Command line tricks for manipulating fastq

From HPCwiki

Extracting a sequence based on read name

gunzip -c reads.fq.gz | sed -n '/readname/,+3 p'
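
The sed pattern above matches readname anywhere on any line, so a request for read1 also picks up read10, or a quality string that happens to contain the text. A minimal sketch of an exact-name match, assuming strict 4-line records (no wrapped sequences); fastq_get_read is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: print the 4-line record whose name is exactly "$1".
# Reads uncompressed FASTQ on stdin.
fastq_get_read() {
    awk -v name="$1" '
        NR % 4 == 1 {
            # Header line: drop the leading "@"; $1 already excludes
            # any description after the first whitespace.
            id = substr($1, 2)
            keep = (id == name)
        }
        keep    # print header and the three lines that follow it
    '
}
```

Usage: gunzip -c reads.fq.gz | fastq_get_read readname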

From BAM/SAM to fastq

samtools view bamfile.bam | awk '{print "@"$1"\n"$10"\n+\n"$11}'
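
The one-liner above prints SEQ and QUAL exactly as stored in the alignment. Reads aligned to the reverse strand (FLAG 0x10) are stored reverse-complemented, and secondary (0x100) or supplementary (0x800) records repeat a read name. A sketch that restores the original orientation and skips the extra records; sam_to_fastq is a hypothetical helper name, and mates of a pair are not separated into two files:

```shell
# Hypothetical helper: convert SAM text on stdin to FASTQ on stdout.
sam_to_fastq() {
    awk -F '\t' '
        function revcomp(s,    i, out, c) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                # bases without a known complement are passed through
                out = out ((c in comp) ? comp[c] : c)
            }
            return out
        }
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
            return out
        }
        BEGIN {
            n = split("A C G T N a c g t n", from, " ")
            split("T G C A N t g c a n", to, " ")
            for (i = 1; i <= n; i++) comp[from[i]] = to[i]
        }
        /^@/ { next }    # header lines, if any (read names cannot contain "@")
        {
            flag = $2 + 0
            # skip secondary (0x100) and supplementary (0x800) records
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
            seq = $10; qual = $11
            # reverse-strand reads (0x10): undo the reverse complement
            if (int(flag / 16) % 2) { seq = revcomp(seq); qual = reverse(qual) }
            print "@" $1 "\n" seq "\n+\n" qual
        }
    '
}
```

Usage: samtools view bamfile.bam | sam_to_fastq

samtools also provides a fastq subcommand (samtools fastq) for this conversion; see the samtools manual for the options available in the installed version.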

From fastq to fasta

gunzip -c fastqfile.fq.gz | sed 's/^@/>/' | awk '{print;getline;print;getline;getline}'
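
An alternative sketch that selects lines by their position in the record (NR modulo 4), so only the header line is rewritten and quality lines starting with @ are never touched. Assumes strict 4-line records; fastq_to_fasta is a hypothetical helper name:

```shell
# Hypothetical helper: convert uncompressed FASTQ on stdin to FASTA on stdout.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: replace "@" with ">"
         NR % 4 == 2 { print }                     # sequence line, unchanged'
}
```

Usage: gunzip -c fastqfile.fq.gz | fastq_to_fasta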

Counting number of bases in a fastq file

gunzip -c fastqfile.fq.gz | awk '{getline;print;getline;getline}' | tr -d '\n' | wc -c
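
A sketch that reports the number of reads alongside the number of bases in one pass, assuming strict 4-line records; fastq_stats is a hypothetical helper name:

```shell
# Hypothetical helper: count reads and bases in uncompressed FASTQ on stdin.
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         # %.0f keeps large totals from being clipped, which %d can do in some awk builds
         END { printf "reads\t%.0f\nbases\t%.0f\n", reads, bases }'
}
```

Usage: gunzip -c fastqfile.fq.gz | fastq_stats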