Mapping reads with Mosaik: Difference between revisions
No edit summary |
No edit summary |
||
Line 23: | Line 23: | ||
{| width="100%" | {| width="100%" | ||
|- valign="top" | |- valign="top" | ||
| width="25%" | | |||
<pre> | <pre> | ||
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15 | MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15 | ||
</pre> | </pre> | ||
|} | |} | ||
=== Sort the alignment file === | |||
<pre> | <pre> | ||
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat | MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat |
Revision as of 16:34, 25 November 2011
Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.
Prerequisites
input reads
The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.
Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.
reference assembly files for MOSAIK
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:
/shared/Sus/Sscrofa_build10_2/
Mapping procedure
creating MOSAIK file for the reads
The first step is creating '.dat' file of the sequence reads.
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger
Align reads with MOSAIK
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15 |
Sort the alignment file
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat
= convert to BAM
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam
Finalizing preparation of BAM files with SAMtools
prepare a new header file
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt
merge BAM files (if applicable)
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam
reheader merged BAM file
samtools reheader newheader.txt merged.bam >pigname_rh.bam
Variant calling with Samtools
calling raw variants
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt
filtering variants
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt
Additional notes
submitting using grid-engine
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:
#!/bin/bash #$ -cwd #$ -S /bin/sh #$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
Tidying up
Please clean up the intermediate files as soon as they are not needed any more.
rm *.dat rm reads.bam