Mapping reads with Mosaik: Difference between revisions
No edit summary |
|||
(5 intermediate revisions by the same user not shown) | |||
Line 9: | Line 9: | ||
=== reference assembly files for MOSAIK === | === reference assembly files for MOSAIK === | ||
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: | The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: | ||
< | <source lang='bash'> | ||
/shared/Sus/Sscrofa_build10_2/ | LIBPATH='/shared/Sus/Sscrofa_build10_2/' | ||
</ | </source> | ||
== Mapping procedure == | == Mapping procedure == | ||
=== creating MOSAIK file for the reads === | === creating MOSAIK file for the reads === | ||
The first step is creating '.dat' file of the sequence reads. | The first step is creating '.dat' file of the sequence reads. | ||
< | <source lang='bash'> | ||
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger | MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger | ||
</ | </source> | ||
=== Align reads with MOSAIK === | === Align reads with MOSAIK === | ||
{| width="100%" | {| width="100%"| | ||
| | <source lang='bash'> | ||
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \ | |||
-p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15 | |||
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia | </source> | ||
</ | |||
|} | |} | ||
=== Sort the alignment file === | === Sort the alignment file === | ||
< | <source lang='bash'> | ||
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat | MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat | ||
</ | </source> | ||
=== convert to BAM === | |||
< | <source lang='bash'> | ||
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam | MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam | ||
</ | </source> | ||
== Finalizing preparation of BAM files with SAMtools == | == Finalizing preparation of BAM files with SAMtools == | ||
=== prepare a new header file === | === prepare a new header file === | ||
< | <source lang='bash'> | ||
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt | samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt | ||
</ | </source> | ||
< | <source lang='bash'> | ||
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt | samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt | ||
</ | </source> | ||
=== merge BAM files (if applicable) === | === merge BAM files (if applicable) === | ||
< | <source lang='bash'> | ||
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam | samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam | ||
</ | </source> | ||
=== reheader merged BAM file === | === reheader merged BAM file === | ||
< | <source lang='bash'> | ||
samtools reheader newheader.txt merged.bam >pigname_rh.bam | samtools reheader newheader.txt merged.bam >pigname_rh.bam | ||
</ | </source> | ||
== Variant calling with Samtools == | == Variant calling with Samtools == | ||
=== calling raw variants === | === calling raw variants === | ||
< | <source lang='bash'> | ||
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt | samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt | ||
</ | </source> | ||
=== filtering variants === | === filtering variants === | ||
< | <source lang='bash'> | ||
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt | samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt | ||
</ | </source> | ||
== Additional notes == | == Additional notes == | ||
=== submitting using grid-engine === | === submitting using grid-engine === | ||
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: | Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: | ||
< | <source lang='bash'> | ||
#!/bin/bash | #!/bin/bash | ||
#$ -cwd | #$ -cwd | ||
#$ -S /bin/sh | #$ -S /bin/sh | ||
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/ | #$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/ | ||
</ | </source> | ||
=== Tidying up === | === Tidying up === | ||
Please clean up the intermediate files as soon as they are not needed any more. | Please clean up the intermediate files as soon as they are not needed any more. | ||
< | <source lang='bash'> | ||
rm *.dat | rm *.dat | ||
rm reads.bam | rm reads.bam | ||
</ | </source> |
Latest revision as of 20:19, 4 March 2012
Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.
Prerequisites
input reads
The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.
Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.
reference assembly files for MOSAIK
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: <source lang='bash'> LIBPATH='/shared/Sus/Sscrofa_build10_2/' </source>
Mapping procedure
creating MOSAIK file for the reads
The first step is creating '.dat' file of the sequence reads. <source lang='bash'> MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger </source>
Align reads with MOSAIK
<source lang='bash'> MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \ -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15 </source>Sort the alignment file
<source lang='bash'> MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat </source>
convert to BAM
<source lang='bash'> MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam </source>
Finalizing preparation of BAM files with SAMtools
prepare a new header file
<source lang='bash'> samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt </source>
<source lang='bash'> samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt </source>
merge BAM files (if applicable)
<source lang='bash'> samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam </source>
reheader merged BAM file
<source lang='bash'> samtools reheader newheader.txt merged.bam >pigname_rh.bam </source>
Variant calling with Samtools
calling raw variants
<source lang='bash'> samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt </source>
filtering variants
<source lang='bash'> samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt </source>
Additional notes
submitting using grid-engine
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: <source lang='bash'>
- !/bin/bash
- $ -cwd
- $ -S /bin/sh
- $ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
</source>
Tidying up
Please clean up the intermediate files as soon as they are not needed any more. <source lang='bash'> rm *.dat rm reads.bam </source>