Assemble mitochondrial genomes from short read data: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
No edit summary
No edit summary
 
(2 intermediate revisions by 2 users not shown)
Line 22: Line 22:
#SBATCH --ntasks=8
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH --nodes=1
#SBATCH --constraint=normalmem
#SBATCH --constraint=4gpercpu
#SBATCH --output=output_%j.txt
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=assemble_mito
#SBATCH --job-name=assemble_mito
#SBATCH --partition=ABGC_Research
#SBATCH --mail-type=ALL
#SBATCH --mail-type=ALL
#SBATCH --mail-user=hendrik-jan.megens@wur.nl
#SBATCH --mail-user=hendrik-jan.megens@wur.nl
Line 34: Line 33:
bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | awk '$5>0' | head -10000 >>$1_mito_align.sam
bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | awk '$5>0' | head -10000 >>$1_mito_align.sam


java7 -jar /cm/shared/apps/SHARED/picard-tools/picard-tools-1.109/SamToFastq.jar I=$1_mito_align.sam F=fq1.fq F2=fq2.fq INCLUDE_NON_PF_READS=True
java7 -jar /shared/apps/SHARED/picard-tools/picard-tools-1.109/SamToFastq.jar I=$1_mito_align.sam F=fq1.fq F2=fq2.fq INCLUDE_NON_PF_READS=True


SOAPdenovo-63mer all -K 63 -p 4 -s soapdenovo.config -o $1_mito_assembly.fa
SOAPdenovo-63mer all -K 63 -p 4 -s soapdenovo.config -o $1_mito_assembly.fa
Line 44: Line 43:
</source>
</source>


Invoke like this:
<source lang='bash'>
<source lang='bash'>
sh do_mtalign_bowtie_pig.sh MA01F18 33\
sbatch do_mtalign_bowtie_pig.sh MA01F18 33\
  /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R1.PF.fastq.gz\
  /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R1.PF.fastq.gz\
  /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R2.PF.fastq.gz
  /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R2.PF.fastq.gz

Latest revision as of 09:58, 16 June 2023

A simple procedure for assembling mitochondrial genomes based on whole-genome re-sequencing data. The first step is to extract reads from the sequence library based on a closely related entirely assembled genome (e.g., for pig, the MT genome as present in the genome build, but could also be of a related species). The genome is then assembled using SOAPdenovo.

  • a reference genome of a closely related population or species.
  • a bowtie2 index (make with bowtie2_build)
  • a blastable db of the reference mitochondrial genome
  • a SOAPdenovo configuration file:

soapdenovo.config

 [LIB]
 avg_ins=450
 reverse_seq=0
 asm_flags=1
 rank=3
 q1=fq1.fq
 q2=fq2.fq

Note that the avg_ins flag may vary between libraries; may have an effect on assembly efficiency.

<source lang='bash'>

  1. !/bin/bash
  2. SBATCH --time=1000
  3. SBATCH --mem=16000
  4. SBATCH --ntasks=8
  5. SBATCH --nodes=1
  6. SBATCH --constraint=4gpercpu
  7. SBATCH --output=output_%j.txt
  8. SBATCH --error=error_output_%j.txt
  9. SBATCH --job-name=assemble_mito
  10. SBATCH --mail-type=ALL
  11. SBATCH --mail-user=hendrik-jan.megens@wur.nl

module load bowtie/2-2.2.1 SOAPdenovo2/r240 BLAST+/2.2.28 MUMmer/3.23

bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | head -2 >$1_mito_align.sam bowtie2 --phred$2 --local -p 8 -x mt_pig.fa -1 $3 -2 $4 | awk '$5>0' | head -10000 >>$1_mito_align.sam

java7 -jar /shared/apps/SHARED/picard-tools/picard-tools-1.109/SamToFastq.jar I=$1_mito_align.sam F=fq1.fq F2=fq2.fq INCLUDE_NON_PF_READS=True

SOAPdenovo-63mer all -K 63 -p 4 -s soapdenovo.config -o $1_mito_assembly.fa

blastn -query $1_mito_assembly.fa.scafSeq -db mt_pig.fa -outfmt 6

mummer -mum -b -c mt_pig.fa $1_mito_assembly.fa.scafSeq > mummer.mums mummerplot -postscript -p mummer mummer.mums </source>

Invoke like this: <source lang='bash'> sbatch do_mtalign_bowtie_pig.sh MA01F18 33\

/lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R1.PF.fastq.gz\
/lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ABGSA0071/ABGSA0071_MA01F18_R2.PF.fastq.gz