Command line tricks for manipulating fastq

From HPCwiki

Extracting a sequence based on read name

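<source lang='bash'>
# Print the header line that starts with @readname plus the three lines after it.
# The "addr,+N" range is a GNU sed extension.
gunzip -c reads.fq.gz | sed -n '/^@readname/,+3 p'
</source>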
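A pattern match on the read name also selects reads whose names merely begin with the same string (read_1 also matches read_10), and it can fire on a quality line, since quality strings may start with "@". The following is a minimal sketch of an exact-name lookup, assuming strict 4-line records; the function name extract_read, the file demo.fq.gz and the read names are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data: three reads; the quality line of read_1 starts with "@".
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@read_1 extra' 'ACGT' '+' '@III' \
  '@read_10' 'GGCC' '+' 'IIII' \
  '@read_2' 'TTAA' '+' 'IIII' | gzip > "$tmpdir/demo.fq.gz"

# extract_read NAME FILE: print the 4-line record whose header name equals NAME.
# Only header lines (line 1 of every 4) are tested, and NAME is compared
# exactly against the first whitespace-delimited token after the leading "@".
extract_read() {
  gunzip -c "$2" | awk -v name="$1" '
    NR % 4 == 1 { split(substr($0, 2), f, /[ \t]/); keep = (f[1] == name) }
    keep'
}

result=$(extract_read read_1 "$tmpdir/demo.fq.gz")
echo "$result"
rm -rf "$tmpdir"
```

On the demo data this prints only the read_1 record, including its quality line "@III", and skips read_10.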

From BAM/SAM to fastq

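<source lang='bash'>
# samtools view prints no header lines unless -h is given, so no grep is needed.
# Caveat: reads aligned to the reverse strand are stored reverse-complemented in
# BAM/SAM and are printed here as stored, not in their original orientation.
samtools view bamfile.bam | awk -F'\t' '{print "@"$1"\n"$10"\n+\n"$11}'
</source>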
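Reads aligned to the reverse strand (FLAG bit 0x10) are stored reverse-complemented in BAM/SAM, with the quality string reversed. Recent samtools releases include a dedicated samtools fastq subcommand for this conversion. Where only awk is at hand, the following sketch restores the original orientation from headerless SAM text. It assumes that only bit 0x10 matters (secondary and supplementary alignments and mate suffixes are not treated specially), and the function name sam_to_fastq and the two demo records are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# sam_to_fastq: read headerless SAM records on stdin, write FASTQ on stdout.
# Assumption: only FLAG bit 0x10 (reverse strand) is handled.
sam_to_fastq() {
  awk -F'\t' '
    # Reverse-complement a sequence; unknown symbols are passed through unchanged.
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    # Reverse a string (used for the quality line).
    function reverse(s,    i, out) {
      out = ""
      for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
      return out
    }
    BEGIN {
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    }
    {
      seq = $10; qual = $11
      # Integer arithmetic instead of a bitwise AND keeps this portable across awks.
      if (int($2 / 16) % 2 == 1) { seq = revcomp(seq); qual = reverse(qual) }
      print "@" $1 "\n" seq "\n+\n" qual
    }'
}

# Hypothetical input: fwd_read has FLAG 0, rev_read has FLAG 16 (reverse strand).
fastq=$(printf 'fwd_read\t0\tchr1\t1\t60\t4M\t*\t0\t0\tAACG\tIIHG\nrev_read\t16\tchr1\t5\t60\t4M\t*\t0\t0\tAACG\tIIHG\n' | sam_to_fastq)
echo "$fastq"
```

With a real file the function would be fed from samtools view bamfile.bam. In the demo, rev_read comes out as CGTT with qualities GHII, while fwd_read is unchanged.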

From fastq to fasta

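<source lang='bash'>
# Line 1 of each 4-line record becomes the ">" header, line 2 is the sequence;
# the "+" and quality lines are dropped.
gunzip -c fastqfile.fq.gz | awk 'NR%4==1 {print ">" substr($0,2)} NR%4==2 {print}'
</source>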

Counting number of bases in a fastq file

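<source lang='bash'>
# Sum the lengths of the sequence lines (line 2 of each 4-line record).
# Piping the sequence lines to wc instead would also count one newline per read.
gunzip -c fastqfile.fq.gz | awk 'NR%4==2 {n += length($0)} END {print n+0}'
</source>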
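Note that wc reports lines, words and bytes, and the byte count includes one newline per sequence line, so the number of bases is the byte count minus the line count. The same pass over the file can also report the number of reads and the mean read length. This is a minimal sketch, assuming strict 4-line records with unwrapped sequences; the function name fastq_stats and the demo file are made up.

```bash
#!/usr/bin/env bash
set -euo pipefail

# fastq_stats FILE: print "reads bases mean_length" for a gzipped FASTQ file.
# Assumes strict 4-line records (no wrapped sequence lines).
fastq_stats() {
  gunzip -c "$1" | awk '
    NR % 4 == 2 { reads++; bases += length($0) }
    END {
      if (reads > 0) printf "%d %d %.1f\n", reads, bases, bases / reads
      else print "0 0 0.0"
    }'
}

# Hypothetical demo data: two reads of 4 and 6 bases.
tmpdir=$(mktemp -d)
printf '%s\n' '@r1' 'ACGT' '+' 'IIII' '@r2' 'ACGTAC' '+' 'IIIIII' | gzip > "$tmpdir/demo.fq.gz"
stats=$(fastq_stats "$tmpdir/demo.fq.gz")
echo "$stats"
rm -rf "$tmpdir"
```

For the demo file this prints "2 10 5.0": two reads, ten bases, mean length 5.0.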