Converting FASTQ to FASTA

A little Perl one-liner I borrowed from The Edwards Lab that converts FASTQ to FASTA. Please note that I had to split the command across several lines to make it display properly in this blog entry.


$ cat file_to_convert.fq | perl -e \
'$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' \
> output.fasta

Thanks Edwards Lab!

Update: after I wrote the above post, a reader rightly pointed out that the one-liner is incorrect (see the comments below). I solved the problem of converting FASTQ to FASTA with the following BioPerl script, which seems to work fine:

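use strict;
use warnings;
use Bio::SeqIO;

# Arguments: input FASTQ file, then output FASTA file
my ($file1, $file2) = @ARGV;
my $seqin  = Bio::SeqIO->new(-format => 'fastq', -file => $file1);
my $seqout = Bio::SeqIO->new(-format => 'fasta', -file => ">$file2");

# Copy each record across; the FASTA writer drops the quality scores
while (my $seq_obj = $seqin->next_seq) {
    $seqout->write_seq($seq_obj);
}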

Categories: Genomics, Tutorials


4 replies »

    • Also buggy. There is no guarantee that FASTQ files have a simple alternation of lines. Line wrap can occur (and is allowed). The FASTQ format is rather a crummy design. As it says on the Wikipedia FASTQ page:
      “The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of "@" and "+" as markers (these characters can also occur in the quality string).”
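
One way to cope with the wrapping described in this comment is to count characters instead of lines: collect sequence lines until the "+" separator, then read quality lines until as many quality characters as bases have been seen. The following is only a minimal sketch of that idea in plain Perl, not the method of any library. The name fastq_to_fasta is a hypothetical helper, and the sketch assumes well-formed input with Unix line endings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any library): read FASTQ records from
# the filehandle $in and write FASTA records to the filehandle $out.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    while (defined(my $header = <$in>)) {
        chomp $header;
        next if $header eq '';    # skip blank lines between records
        $header =~ s/^\@// or die "Expected '\@' header, got: $header\n";

        # Sequence lines run until the '+' separator line.
        my $seq = '';
        while (defined(my $line = <$in>)) {
            chomp $line;
            last if $line =~ /^\+/;
            $seq .= $line;
        }

        # Quality lines run until there is one quality character per base.
        # Counting characters means a quality line that starts with '@' or
        # '+' is never mistaken for a header or a separator.
        my $qual_len = 0;
        while ($qual_len < length $seq) {
            my $line = <$in>;
            die "Truncated record: $header\n" unless defined $line;
            chomp $line;
            $qual_len += length $line;
        }
        die "Quality longer than sequence: $header\n"
            if $qual_len > length $seq;

        print {$out} ">$header\n$seq\n";
    }
    return;
}

# Command-line use with the same two arguments as the script in the post:
# input FASTQ file, then output FASTA file.
if (!caller && @ARGV == 2) {
    my ($file1, $file2) = @ARGV;
    open my $in,  '<', $file1 or die "Cannot read $file1: $!\n";
    open my $out, '>', $file2 or die "Cannot write $file2: $!\n";
    fastq_to_fasta($in, $out);
    close $out or die "Cannot close $file2: $!\n";
}
```

Each sequence is written on a single line, so wrapped input comes out unwrapped. The sketch stops with an error on a record whose quality string is shorter or longer than its sequence rather than guessing where the next record starts.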

