Short read mapping pipeline pig

From HPCwiki

Revision as of 21:21, 27 December 2013

The latest short-read mapping pipeline for the pig project is based on a Python3 script that generates a shell script. That shell script can then be executed from the command line or submitted to the cluster using SLURM. The latest version of the Python3 script can be found on GitHub.
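The "script that writes a script" approach can be illustrated with a minimal sketch. This is not the code of the actual pipeline script: the function name `build_runfile`, the job naming scheme, and the choice of `#SBATCH` options are assumptions made purely for illustration.

```python
def build_runfile(individual_id, threads, commands):
    """Return the text of a shell script that SLURM can run (illustrative only).

    The #SBATCH header lines and the 'map_<id>' naming are hypothetical;
    the real pipeline script may write a different header.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=map_{individual_id}",
        f"#SBATCH --cpus-per-task={threads}",
        f"#SBATCH --output=map_{individual_id}_%j.out",
        "",
    ]
    # The mapping steps themselves are passed in as plain shell command strings.
    lines.extend(commands)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_runfile("LW22F04", 4, ["echo 'mapping steps would go here'"]))
```

The returned text can be written to a file and either run with `bash` or handed to `sbatch`.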


Prerequisites

Data sources

  • path to sequence archives /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ (for pig only)
  • access to the ABGSA meta-database, currently hosted at scomp1095.wurnet.nl (database='ABGSAschema')
  • path to reference genome, including index for BWA /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa

Hardcoded paths

  • bwa 0.5.9 /cm/shared/apps/WUR/ABGC/bwa/bwa-0.5.9/
  • bwa 0.7.5a /cm/shared/apps/WUR/ABGC/bwa/bwa-0.7.5a/
  • samtools 0.1.19 /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.19/
  • samtools 0.1.12a /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.12a/
  • picard /cm/shared/apps/WUR/ABGC/picard/picard-tools-1.93/
  • GATK /cm/shared/apps/WUR/ABGC/GATK/GATK2.6/
  • Mosaik /path/to/mosaik/ref.dat
  • Mosaik Jump Library /path/to/mosaikjump/ref.j15
  • dbSNP VCF file dbSNPfile=reffolder+'/dbSNP/dbSNP.vcf'
  • gatk_gvcf /cm/shared/apps/WUR/ABGC/GATK/GATK_gVCFmod/
  • gvcftools /cm/shared/apps/WUR/ABGC/gvcftools/gvcftools-0.16/bin/
  • helper scripts /cm/shared/apps/WUR/ABGC/abgsascripts/
  • path to SQLite db (for cow) /path/to/sqlite/Bulls1000/
  • Variant Effect Predictor (VEP) /cm/shared/apps/WUR/ABGC/variant_effect_predictor/VEP231213/
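Because these locations are hardcoded, it can be useful to confirm that they exist before generating a runfile. The following is a minimal sketch of such a check; the helper `missing_paths` and the `TOOL_PATHS` dictionary are hypothetical and not part of the pipeline script. Only a few of the paths listed above are included.

```python
import os


def missing_paths(paths):
    """Return the subset of a {label: path} mapping whose path does not exist."""
    return {label: path for label, path in paths.items() if not os.path.exists(path)}


# A few of the hardcoded locations from the list above (labels are illustrative).
TOOL_PATHS = {
    "bwa": "/cm/shared/apps/WUR/ABGC/bwa/bwa-0.7.5a/",
    "samtools": "/cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.19/",
    "picard": "/cm/shared/apps/WUR/ABGC/picard/picard-tools-1.93/",
    "GATK": "/cm/shared/apps/WUR/ABGC/GATK/GATK2.6/",
}

if __name__ == "__main__":
    for label, path in missing_paths(TOOL_PATHS).items():
        print(f"missing: {label} -> {path}")
```

On a machine other than the cluster, every entry would be reported as missing, which is the expected behaviour.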

Present in the working directory

  • cow_schema.db (SQLite db for 1000 Bulls project - for cow only at the moment).

Basic execution

<source lang='bash'>
(virtenv)[megen002@nfs01 rundir]$ python ABGC_mapping_v2.py -i LW22F04 -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4
</source>

This command should produce a shell script that is ready for execution with SLURM.
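As an aside on the flags used in the command above: judging from the values passed, `-i` appears to take an individual ID, `-a` the sequence archive directory, `-r` the reference genome FASTA, and `-t` a thread count. That reading is an inference from the example, not a description of the script's documented interface. Under that assumption, an equivalent argument parser could be sketched as follows; the destination names and the default for `-t` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the four flags seen in the example command (meanings are inferred)."""
    parser = argparse.ArgumentParser(description="Illustrative argument parser")
    parser.add_argument("-i", dest="individual", required=True,
                        help="individual ID, e.g. LW22F04 (assumed meaning)")
    parser.add_argument("-a", dest="archive_dir", required=True,
                        help="path to the sequence archives (assumed meaning)")
    parser.add_argument("-r", dest="reference", required=True,
                        help="path to the reference genome FASTA (assumed meaning)")
    parser.add_argument("-t", dest="threads", type=int, default=1,
                        help="number of threads (assumed meaning; default is hypothetical)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```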

Automated runfile creation

<source lang='bash'>
# -N (--skip-column-names) keeps the column header out of list.txt
mysql -N -u ABGSAuser -h scomp1095.wurnet.nl -p ABGSAschema -e 'select ABG_individual_id from ABGSAschema_main where archive_name like "ABGSA0%" group by ABG_individual_id' >list.txt
FILES=`cat list.txt`
# create one runfile per individual
for ID in $FILES; do
  python ABGC_mapping_v2.py -i $ID -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4
done
</source>
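The same loop can be sketched in Python, which makes it easier to filter the ID list before any runfiles are created. The helpers `read_ids` and `mapping_command` are hypothetical names. The sketch assumes that list.txt holds one ID per line and may start with the column name `ABG_individual_id`, which the mysql client prints as a header in batch output unless column names are suppressed.

```python
def read_ids(lines, header="ABG_individual_id"):
    """Return the individual IDs from an iterable of lines.

    Blank lines and the column-header line (if present) are skipped.
    """
    ids = []
    for line in lines:
        value = line.strip()
        if value and value != header:
            ids.append(value)
    return ids


def mapping_command(individual_id, archive_dir, reference, threads=4):
    """Build the argument list for one call of the mapping script."""
    return ["python", "ABGC_mapping_v2.py",
            "-i", individual_id,
            "-a", archive_dir,
            "-r", reference,
            "-t", str(threads)]


if __name__ == "__main__":
    archive = "/lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/"
    reference = ("/lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/"
                 "Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa")
    with open("list.txt") as handle:
        for individual in read_ids(handle):
            # Print rather than execute, so the commands can be inspected first.
            print(" ".join(mapping_command(individual, archive, reference)))
```

Each printed command could then be run directly, or the argument list passed to `subprocess.run`.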

Output files

See also

Animal Breeding & Genomics Sequence Archives

External links