Mapping reads with Mosaik: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
(Created page with "Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly. == Pre...")
 
 
(7 intermediate revisions by the same user not shown)
Line 9: Line 9:
=== reference assembly files for MOSAIK ===
=== reference assembly files for MOSAIK ===
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:
<pre>
<source lang='bash'>
/shared/Sus/Sscrofa_build10_2/
LIBPATH='/shared/Sus/Sscrofa_build10_2/'
</pre>
</source>


== Mapping procedure ==
== Mapping procedure ==
=== creating MOSAIK file for the reads ===
=== creating MOSAIK file for the reads ===
The first step is creating '.dat' file of the sequence reads.  
The first step is creating '.dat' file of the sequence reads.  
<pre>
<source lang='bash'>
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger
</pre>
</source>


=== Align reads with MOSAIK ===
=== Align reads with MOSAIK ===
<pre>
{| width="100%"|
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15
<source lang='bash'>
</pre>
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \
    -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15
</source>
|}


==== Sort the alignment file ===
=== Sort the alignment file ===
<pre>
<source lang='bash'>
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat
</pre>
</source>
==== convert to BAM ===
=== convert to BAM ===
<pre>
<source lang='bash'>
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam
</pre>
</source>
== Finalizing preparation of BAM files with SAMtools ==
== Finalizing preparation of BAM files with SAMtools ==
=== prepare a new header file ===
=== prepare a new header file ===
<pre>
<source lang='bash'>
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
</pre>
</source>


<pre>
<source lang='bash'>
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt
</pre>
</source>


=== merge BAM files (if applicable) ===
=== merge BAM files (if applicable) ===
<pre>
<source lang='bash'>
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam
</pre>
</source>


=== reheader merged BAM file ===
=== reheader merged BAM file ===
<pre>
<source lang='bash'>
samtools reheader newheader.txt merged.bam >pigname_rh.bam
samtools reheader newheader.txt merged.bam >pigname_rh.bam
</pre>
</source>


== Variant calling with Samtools ==
== Variant calling with Samtools ==
=== calling raw variants ===
=== calling raw variants ===
<pre>
<source lang='bash'>
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt
</pre>
</source>
=== filtering variants ===
=== filtering variants ===


<pre>
<source lang='bash'>
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt
</pre>
</source>


== Additional notes ==
== Additional notes ==
=== submitting using grid-engine ===
=== submitting using grid-engine ===
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:
<pre>
<source lang='bash'>
#!/bin/bash
#!/bin/bash
#$ -cwd
#$ -cwd
#$ -S /bin/sh
#$ -S /bin/sh
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
</pre>
</source>


=== Tidying up ===
=== Tidying up ===
Please clean up the intermediate files as soon as they are not needed any more.
Please clean up the intermediate files as soon as they are not needed any more.
<pre>
<source lang='bash'>
rm *.dat
rm *.dat
rm reads.bam
rm reads.bam
</pre>
</source>

Latest revision as of 20:19, 4 March 2012

Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.

Prerequisites

input reads

The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.

Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.

reference assembly files for MOSAIK

The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: <source lang='bash'> LIBPATH='/shared/Sus/Sscrofa_build10_2/' </source>

Mapping procedure

creating MOSAIK file for the reads

The first step is creating '.dat' file of the sequence reads. <source lang='bash'> MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger </source>

Align reads with MOSAIK

<source lang='bash'> MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \ -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15 </source>

Sort the alignment file

<source lang='bash'> MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat </source>

convert to BAM

<source lang='bash'> MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam </source>

Finalizing preparation of BAM files with SAMtools

prepare a new header file

<source lang='bash'> samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt </source>

<source lang='bash'> samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt </source>

merge BAM files (if applicable)

<source lang='bash'> samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam </source>

reheader merged BAM file

<source lang='bash'> samtools reheader newheader.txt merged.bam >pigname_rh.bam </source>

Variant calling with Samtools

calling raw variants

<source lang='bash'> samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt </source>

filtering variants

<source lang='bash'> samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt </source>

Additional notes

submitting using grid-engine

Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: <source lang='bash'>

  1. !/bin/bash
  2. $ -cwd
  3. $ -S /bin/sh
  4. $ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/

</source>

Tidying up

Please clean up the intermediate files as soon as they are not needed any more. <source lang='bash'> rm *.dat rm reads.bam </source>