Mapping reads with Mosaik: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
(Created page with "Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly. == Pre...")
 
No edit summary
Line 21: Line 21:


=== Align reads with MOSAIK ===
=== Align reads with MOSAIK ===
{| width="100%"
|- valign="top"
<pre>
<pre>
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15
</pre>
</pre>
 
|}
==== Sort the alignment file ===
==== Sort the alignment file ===
<pre>
<pre>

Revision as of 16:33, 25 November 2011

Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.

Prerequisites

input reads

The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.

Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.

reference assembly files for MOSAIK

The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:

/shared/Sus/Sscrofa_build10_2/

Mapping procedure

creating MOSAIK file for the reads

The first step is creating '.dat' file of the sequence reads.

MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger

Align reads with MOSAIK

MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15

= Sort the alignment file

MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat

= convert to BAM

MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam

Finalizing preparation of BAM files with SAMtools

prepare a new header file

samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt

merge BAM files (if applicable)

samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam

reheader merged BAM file

samtools reheader newheader.txt merged.bam >pigname_rh.bam

Variant calling with Samtools

calling raw variants

samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt

filtering variants

samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt

Additional notes

submitting using grid-engine

Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:

#!/bin/bash
#$ -cwd
#$ -S /bin/sh
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/

Tidying up

Please clean up the intermediate files as soon as they are not needed any more.

rm *.dat
rm reads.bam