Mapping reads with Mosaik: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
No edit summary
 
(5 intermediate revisions by the same user not shown)
Line 9: Line 9:
=== reference assembly files for MOSAIK ===
=== reference assembly files for MOSAIK ===
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:
The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in:
<pre>
<source lang='bash'>
/shared/Sus/Sscrofa_build10_2/
LIBPATH='/shared/Sus/Sscrofa_build10_2/'
</pre>
</source>


== Mapping procedure ==
== Mapping procedure ==
=== creating MOSAIK file for the reads ===
=== creating MOSAIK file for the reads ===
The first step is creating '.dat' file of the sequence reads.  
The first step is creating '.dat' file of the sequence reads.  
<pre>
<source lang='bash'>
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger
MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger
</pre>
</source>


=== Align reads with MOSAIK ===
=== Align reads with MOSAIK ===
{| width="100%"
{| width="100%"|
|- valign="top"
<source lang='bash'>
| width="25%" |
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \
<pre>
    -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15
MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 -p 24 -act 20 -j /shared/Sus/Sscrofa_build10_2/Mosaiklibs/Ssc10_2_wUn.jump15
</source>
</pre>
|}
|}
=== Sort the alignment file ===
=== Sort the alignment file ===
<pre>
<source lang='bash'>
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat
MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat
</pre>
</source>
==== convert to BAM ===
=== convert to BAM ===
<pre>
<source lang='bash'>
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam
MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam
</pre>
</source>
== Finalizing preparation of BAM files with SAMtools ==
== Finalizing preparation of BAM files with SAMtools ==
=== prepare a new header file ===
=== prepare a new header file ===
<pre>
<source lang='bash'>
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt
</pre>
</source>


<pre>
<source lang='bash'>
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt
samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt
</pre>
</source>


=== merge BAM files (if applicable) ===
=== merge BAM files (if applicable) ===
<pre>
<source lang='bash'>
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam
samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam
</pre>
</source>


=== reheader merged BAM file ===
=== reheader merged BAM file ===
<pre>
<source lang='bash'>
samtools reheader newheader.txt merged.bam >pigname_rh.bam
samtools reheader newheader.txt merged.bam >pigname_rh.bam
</pre>
</source>


== Variant calling with Samtools ==
== Variant calling with Samtools ==
=== calling raw variants ===
=== calling raw variants ===
<pre>
<source lang='bash'>
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt
samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt
</pre>
</source>
=== filtering variants ===
=== filtering variants ===


<pre>
<source lang='bash'>
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt
samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt
</pre>
</source>


== Additional notes ==
== Additional notes ==
=== submitting using grid-engine ===
=== submitting using grid-engine ===
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:
Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted:
<pre>
<source lang='bash'>
#!/bin/bash
#!/bin/bash
#$ -cwd
#$ -cwd
#$ -S /bin/sh
#$ -S /bin/sh
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
#$ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/
</pre>
</source>


=== Tidying up ===
=== Tidying up ===
Please clean up the intermediate files as soon as they are not needed any more.
Please clean up the intermediate files as soon as they are not needed any more.
<pre>
<source lang='bash'>
rm *.dat
rm *.dat
rm reads.bam
rm reads.bam
</pre>
</source>

Latest revision as of 20:19, 4 March 2012

Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.

Prerequisites

input reads

The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.

Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.

reference assembly files for MOSAIK

The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: <source lang='bash'> LIBPATH='/shared/Sus/Sscrofa_build10_2/' </source>

Mapping procedure

creating MOSAIK file for the reads

The first step is creating '.dat' file of the sequence reads. <source lang='bash'> MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger </source>

Align reads with MOSAIK

<source lang='bash'> MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \ -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15 </source>

Sort the alignment file

<source lang='bash'> MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat </source>

convert to BAM

<source lang='bash'> MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam </source>

Finalizing preparation of BAM files with SAMtools

prepare a new header file

<source lang='bash'> samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt </source>

<source lang='bash'> samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt </source>

merge BAM files (if applicable)

<source lang='bash'> samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam </source>

reheader merged BAM file

<source lang='bash'> samtools reheader newheader.txt merged.bam >pigname_rh.bam </source>

Variant calling with Samtools

calling raw variants

<source lang='bash'> samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt </source>

filtering variants

<source lang='bash'> samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt </source>

Additional notes

submitting using grid-engine

Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: <source lang='bash'>

  1. !/bin/bash
  2. $ -cwd
  3. $ -S /bin/sh
  4. $ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/

</source>

Tidying up

Please clean up the intermediate files as soon as they are not needed any more. <source lang='bash'> rm *.dat rm reads.bam </source>