Mapping reads with Mosaik

From HPCwiki
Jump to navigation Jump to search

Within the Pig ERC project we have several mapping strategies. This article describes the mapping procedure using Mosaik for paired-end mapping to the reference assembly.

Prerequisites

input reads

The mapping procedure requires paired end reads in fastq format. For the current procedure quality-trimmed reads are used. Forward and reverse reads should be in two separate files, and should be in the same order. Input files are assumed to have Sanger-style quality scaling.

Here the input fastq files are named 'reads_1_QT.fq.gz', for forward reads, and 'reads_2_QT.fq.gz' for reverse reads.

reference assembly files for MOSAIK

The procedure described here requires a number of MOSAIK specific input files: a '.dat' file of the reference assembly, and a so called 'jump' database. These can be found in: <source lang='bash'> LIBPATH='/shared/Sus/Sscrofa_build10_2/' </source>

Mapping procedure

creating MOSAIK file for the reads

The first step is creating '.dat' file of the sequence reads. <source lang='bash'> MosaikBuild -q reads_1_QT.fq.gz -q2 reads_2_QT.fq.gz -out reads_QT.dat -st sanger </source>

Align reads with MOSAIK

<source lang='bash'> MosaikAligner -in reads_QT.dat -out aligned_reads.dat -ia $LIBPATH/Mosaiklibs/Ssc10_2_wUn.dat -hs 15 -mmp 0.07 -m all -mhp 10 \ -p 24 -act 20 -j $LIBPATH/Mosaiklibs/Ssc10_2_wUn.jump15 </source>

Sort the alignment file

<source lang='bash'> MosaikSort -in aligned_reads.dat -out aligned_reads-sorted.dat </source>

convert to BAM

<source lang='bash'> MosaikText -in aligned_reads-sorted.dat -bam aligned_reads-sorted.bam </source>

Finalizing preparation of BAM files with SAMtools

prepare a new header file

<source lang='bash'> samtools view -H aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' >newheader.txt </source>

<source lang='bash'> samtools view -H another_aligned_reads-sorted.bam | sed 's/SM:unknown/SM:pigname/' | sed 's/PL:sanger/PL:ILLUMINA/' | grep @RG >>newheader.txt </source>

merge BAM files (if applicable)

<source lang='bash'> samtools merge merged.bam aligned_reads-sorted.bam another_aligned_reads-sorted.bam </source>

reheader merged BAM file

<source lang='bash'> samtools reheader newheader.txt merged.bam >pigname_rh.bam </source>

Variant calling with Samtools

calling raw variants

<source lang='bash'> samtools view -u pigname_rh.bam | samtools pileup -vcf /shared/Sus/Sscrofa_build10_2/FASTA/Ssc10_2_wUn.fa - >vars-raw_pigname.txt </source>

filtering variants

<source lang='bash'> samtools.pl varFilter -D25 vars-raw_pigname.txt | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' >vars-flt_pigname-final.txt </source>

Additional notes

submitting using grid-engine

Submitting jobs using the grid-engine (qsub) requires a few additional options. The following lines can be added to the script to be submitted: <source lang='bash'>

  1. !/bin/bash
  2. $ -cwd
  3. $ -S /bin/sh
  4. $ -v MOSAIK_TMP=/srv/nfs02/yourhomedir/

</source>

Tidying up

Please clean up the intermediate files as soon as they are not needed any more. <source lang='bash'> rm *.dat rm reads.bam </source>