Assemble mitochondrial genomes from short read data
Revision as of 13:18, 26 March 2014
A simple procedure for assembling mitochondrial genomes from whole-genome re-sequencing data. The first step is to extract reads from the sequence library by aligning them to a fully assembled mitochondrial genome of the same or a closely related species (e.g., for pig, the MT genome as present in the genome build). The extracted reads are then assembled using SOAPdenovo.
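The following are needed:
- a reference mitochondrial genome of a closely related population or species (mt_pig.fa in the script below).
- a bowtie2 index of that reference (make with bowtie2-build)
- a BLAST database of the reference mitochondrial genome (make with makeblastdb)
- a SOAPdenovo configuration file: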
soapdenovo.config
[LIB]
avg_ins=450
reverse_seq=0
asm_flags=1
rank=3
q1=fq1.fq
q2=fq2.fq
Note that the appropriate avg_ins value (the average insert size) differs between libraries, and the value chosen may affect assembly efficiency.
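Because avg_ins differs between libraries, it can be convenient to generate the configuration file per library instead of editing it by hand. The following is a minimal sketch, assuming a hypothetical helper function `write_soap_config` (not part of SOAPdenovo) that takes the insert size as its first argument and an optional output file name as its second; all other values are simply copied from the example configuration above.

```shell
#!/bin/bash
# Hypothetical helper (not part of SOAPdenovo): write a library section
# for a given average insert size. Only avg_ins is parameterised; the
# remaining values are copied from the example configuration.
write_soap_config() {
    local avg_ins="${1:-450}"               # default matches the example
    local out="${2:-soapdenovo.config}"     # output file name
    cat > "$out" <<EOF
[LIB]
avg_ins=${avg_ins}
reverse_seq=0
asm_flags=1
rank=3
q1=fq1.fq
q2=fq2.fq
EOF
}

# Example: a library with inserts of roughly 300 bp (illustrative value).
write_soap_config 300
```

The insert size passed here is only an illustration; the right value depends on how the library was prepared.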
<source lang='bash'>
#!/bin/bash
#SBATCH --time=1000
#SBATCH --mem=16000
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH --constraint=normalmem
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=assemble_mito
#SBATCH --partition=ABGC_Research
#SBATCH --mail-type=ALL
#SBATCH --mail-user=hendrik-jan.megens@wur.nl
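# Arguments: $1 = sample name (output prefix), $2 = Phred offset, $3/$4 = paired FASTQ files
module load bowtie/2-2.2.1 SOAPdenovo2/r240 BLAST+/2.2.28 MUMmer/3.23
# First pass: keep only the first two SAM header lines
bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | head -2 >$1_mito_align.sam
# Second pass: append up to 10000 lines with mapping quality > 0
bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | awk '$5>0' | head -10000 >>$1_mito_align.sam
# Convert the aligned reads back to paired FASTQ files for the assembler
java7 -jar /cm/shared/apps/SHARED/picard-tools/picard-tools-1.109/SamToFastq.jar I=$1_mito_align.sam F=fq1.fq F2=fq2.fq INCLUDE_NON_PF_READS=True
# Assemble the extracted reads
SOAPdenovo-63mer all -K 63 -p 4 -s soapdenovo.config -o $1_mito_assembly.fa
# Compare the scaffolds with the reference (tabular BLAST output)
blastn -query $1_mito_assembly.fa.scafSeq -db mt_pig.fa -outfmt 6
# Find maximal unique matches against the reference and plot them
mummer -mum -b -c mt_pig.fa $1_mito_assembly.fa.scafSeq > mummer.mums
mummerplot -postscript -p mummer mummer.mums
</source>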
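The script above runs bowtie2 twice, once to capture the SAM header and once for the alignments. One possible alternative is to let awk pass header lines through and apply the mapping-quality filter only to alignment records, so that a single bowtie2 run suffices. The sketch below demonstrates such a filter on a small made-up SAM fragment; the function name `filter_sam`, the read names, and all values in the example input are invented for illustration, and this is a suggestion rather than the procedure used above.

```shell
#!/bin/bash
# Hypothetical helper: keep SAM header lines (starting with '@') and at most
# a given number of alignment records whose MAPQ (column 5) is above 0.
filter_sam() {
    awk -v max="${1:-10000}" '
        /^@/ { print; next }                # header line: always keep
        $5 > 0 && n < max { print; n++ }    # aligned record with MAPQ > 0
    '
}

# Made-up example input: two header lines and three reads.
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf '@SQ\tSN:MT\tLN:16000\n'
    printf 'read1\t99\tMT\t100\t42\t4M\t=\t300\t204\tACGT\tIIII\n'
    printf 'read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
    printf 'read3\t147\tMT\t300\t30\t4M\t=\t100\t-204\tACGT\tIIII\n'
} > example.sam

# read2 has MAPQ 0 and is dropped; the header and the other reads are kept.
filter_sam 10000 < example.sam > filtered.sam
```

In the script, the two bowtie2 calls could then be replaced by a single pipeline ending in `| filter_sam 10000 >$1_mito_align.sam`. Unlike `head -2`, this keeps every header line bowtie2 emits.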
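The script takes four arguments: a sample name used as the output prefix, the Phred quality offset (here 33), and the two paired-end FASTQ files. For example:
<source lang='bash'>
sh do_mtalign_bowtie_pig.sh MA01F18 33 /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R1.PF.fastq.gz /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R2.PF.fastq.gz
</source>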