Short read mapping pipeline pig: Difference between revisions

Latest revision as of 20:43, 27 December 2013

The latest short-read mapping pipeline for the pig project is based on a Python3 script that creates a shell script, which can then be executed from the command line or submitted to the cluster using SLURM. The latest version of the Python3 script can be found at GitHub. The script must be run with python3.

Prerequisites

Data sources

path to sequence archives /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ (for pig only)
access to the ABGSA meta-database (currently hosted at scomp1095.wurnet.nl, database='ABGSAschema')
path to reference genome, including index for BWA /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa

Hardcoded paths

All paths to data and software are currently hardcoded. This is done for transparency (hardcoded==explicit). Hardcoded paths do, however, require work when migrating to a new environment. A switch to environment variables is being considered.

bwa 0.5.9 /cm/shared/apps/WUR/ABGC/bwa/bwa-0.5.9/ (required if using bwa 0.5.9, e.g. for 1000 Bulls project)
bwa 0.7.5a /cm/shared/apps/WUR/ABGC/bwa/bwa-0.7.5a/ (required if using bwa 0.7.5a, e.g. when using BWA-MEM)
samtools 0.1.19 /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.19/ (required)
samtools 0.1.12a /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.12a/ (required if variant calling)
picard /cm/shared/apps/WUR/ABGC/picard/picard-tools-1.93/ (currently not enabled, not required)
GATK /cm/shared/apps/WUR/ABGC/GATK/GATK2.6/ (required)
Mosaik /path/to/mosaik/ref.dat (required when using Mosaik as mapping tool)
Mosaik Jump Library /path/to/mosaikjump/ref.j15 (required when using Mosaik as mapping tool)
dbSNPfile=reffolder+'/dbSNP/dbSNP.vcf' (required for recalibration)
gatk_gvcf /cm/shared/apps/WUR/ABGC/GATK/GATK_gVCFmod/ (required when variant calling)
gvcftools /cm/shared/apps/WUR/ABGC/gvcftools/gvcftools-0.16/bin/ (required when variant calling)
helper scripts /cm/shared/apps/WUR/ABGC/abgsascripts/ (required)
Variant Effect Predictor (VEP) /cm/shared/apps/WUR/ABGC/variant_effect_predictor/VEP231213/ (required when variant calling)
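
The switch to environment variables mentioned above could look roughly like the sketch below. It is only an illustration: the ABGC_* variable names are hypothetical and are not used by ABGC_mapping_v2.py. Each location is taken from the environment when set and otherwise falls back to the path currently hardcoded on this cluster.

```shell
# Sketch: prefer an environment variable, fall back to the hardcoded path.
# The ABGC_* variable names are hypothetical, chosen for illustration only.
BWA_DIR="${ABGC_BWA_DIR:-/cm/shared/apps/WUR/ABGC/bwa/bwa-0.7.5a/}"
SAMTOOLS_DIR="${ABGC_SAMTOOLS_DIR:-/cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.19/}"
GATK_DIR="${ABGC_GATK_DIR:-/cm/shared/apps/WUR/ABGC/GATK/GATK2.6/}"

# Show which locations would be used in this session.
printf 'bwa:      %s\n' "$BWA_DIR"
printf 'samtools: %s\n' "$SAMTOOLS_DIR"
printf 'GATK:     %s\n' "$GATK_DIR"
```

With this pattern a migration would only require exporting the relevant variables (for example in a module file or login profile), while the current cluster keeps working without any configuration.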

Present in PATH

sickle
pigz
bgzip
tabix
perl
python2 (link to python 2.6 or 2.7 with name python2)
java7 (link to Java v.7 with name java7)
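
Before generating run files it may help to confirm that the tools listed above can actually be resolved from PATH. The following is a minimal sketch of such a check; the function name missing_tools is made up for this example and is not part of the pipeline.

```shell
# Sketch: report which of the required tools cannot be found in PATH.
# missing_tools is a hypothetical helper, not part of ABGC_mapping_v2.py.
missing_tools() {
    # Print the name of every argument that `command -v` cannot resolve.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools sickle pigz bgzip tabix perl python2 java7)
if [ -n "$missing" ]; then
    printf 'Not found in PATH:\n%s\n' "$missing" >&2
else
    echo "All required tools found in PATH."
fi
```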

Present in the working directory

cow_schema.db (SQLite db for 1000 Bulls project - for cow only at the moment).

Basic execution

<source lang='bash'>
(virtenv)[megen002@nfs01 rundir]$ python ABGC_mapping_v2.py -i LW22F04 -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4
</source>

The code should produce the following shell script, ready for execution with SLURM.
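
Once a run file exists it can either be run directly or handed to SLURM. The sketch below picks sbatch when it is available and falls back to bash otherwise. The file name runLW22F04.sh is an assumption made for this example; check the name actually written to the run directory.

```shell
# Sketch: submit the generated script with sbatch when SLURM is available,
# otherwise run it directly with bash.
pick_launcher() {
    if command -v sbatch >/dev/null 2>&1; then
        echo sbatch
    else
        echo bash
    fi
}

# Assumption: the generated run file is named after the individual ID.
RUNFILE="runLW22F04.sh"
if [ -f "$RUNFILE" ]; then
    "$(pick_launcher)" "$RUNFILE"
else
    echo "Run file $RUNFILE not found; generate it first." >&2
fi
```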

Automated runfile creation

<source lang='bash'>
mysql -u ABGSAuser -h scomp1095.wurnet.nl -p ABGSAschema -e 'select ABG_individual_id from ABGSAschema_main where archive_name like "ABGSA0%" group by ABG_individual_id' >list.txt
FILES=`cat list.txt`
for ID in $FILES; do python ABGC_mapping_v2.py -i $ID -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4; done
</source>
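
A slightly more defensive variant of the loop is sketched below. It reads list.txt line by line and skips blank lines as well as the column header ABG_individual_id, which mysql typically writes as the first line in batch mode unless column names are suppressed. DRY_RUN and make_runfiles are hypothetical names introduced for this example; with DRY_RUN=1 the commands are printed rather than executed.

```shell
# Sketch: create one run file per individual listed in list.txt.
# make_runfiles and DRY_RUN are hypothetical names used for illustration.
ARCHIVE=/lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/
REF=/lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa

make_runfiles() {
    # $1: text file with one individual ID per line
    while IFS= read -r id; do
        # Skip blank lines and the mysql column header.
        case "$id" in
            ''|ABG_individual_id) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only show what would be run.
            echo python ABGC_mapping_v2.py -i "$id" -a "$ARCHIVE" -r "$REF" -t 4
        else
            python ABGC_mapping_v2.py -i "$id" -a "$ARCHIVE" -r "$REF" -t 4
        fi
    done < "$1"
}

# Only act when the ID list is present in the working directory.
if [ -f list.txt ]; then
    make_runfiles list.txt
fi
```

Running with DRY_RUN=1 first gives a quick check of which individuals would be processed before any run files are written.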

Output files

External links

NGStools page on GitHub

@@ Line 1: / Line 1: @@
-The latest short-read mapping pipeline for the pig project is based on a Python3 script that creates a shell script that can subsequently be executed
+The latest short-read mapping pipeline for the pig project is based on a Python3 script that creates a shell script that can subsequently be executed from the command line or submitted to the cluster using SLURM.
+The latest version of the Python3 script can be found at [https://github.com/hjmegens/NGStools/blob/master/ABGC_mapping_v2.py GitHub]. The script must be run with python3.
 == Prerequisites ==
+=== Data sources ===
+* path to [[ABGSA | sequence archives]] /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ (for pig only)
+* access to the ABGSA meta-database (currently hosted at scomp1095.wurnet.nl, database='ABGSAschema')
+* path to reference genome, including index for BWA /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa
+=== Hardcoded paths ===
+All paths to data and software are currently hardcoded. This is done for transparency (hardcoded==explicit). Hardcoded paths do, however, require work when migrating to a new environment. A switch to environment variables is being considered.
+* bwa 0.5.9 /cm/shared/apps/WUR/ABGC/bwa/bwa-0.5.9/ (required if using bwa 0.5.9, e.g. for 1000 Bulls project)
+* bwa 0.7.5a /cm/shared/apps/WUR/ABGC/bwa/bwa-0.7.5a/ (required if using bwa 0.7.5a, e.g. when using BWA-MEM)
+* samtools 0.1.19 /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.19/ (required)
+* samtools 0.1.12a /cm/shared/apps/WUR/ABGC/samtools/samtools-0.1.12a/ (required if variant calling)
+* picard /cm/shared/apps/WUR/ABGC/picard/picard-tools-1.93/ (currently not enabled, not required)
+* GATK /cm/shared/apps/WUR/ABGC/GATK/GATK2.6/ (required)
+* Mosaik /path/to/mosaik/ref.dat (required when using Mosaik as mapping tool)
+* Mosaik Jump Library /path/to/mosaikjump/ref.j15 (required when using Mosaik as mapping tool)
+* dbSNPfile=reffolder+'/dbSNP/dbSNP.vcf' (required for recalibration)
+* gatk_gvcf /cm/shared/apps/WUR/ABGC/GATK/GATK_gVCFmod/ (required when variant calling)
+* gvcftools /cm/shared/apps/WUR/ABGC/gvcftools/gvcftools-0.16/bin/ (required when variant calling)
+* helper scripts   /cm/shared/apps/WUR/ABGC/abgsascripts/ (required)
+* Variant Effect Predictor (VEP) /cm/shared/apps/WUR/ABGC/variant_effect_predictor/VEP231213/ (required when variant calling)
+=== Present in PATH ===
+* sickle
+* pigz
+* bgzip
+* tabix
+* perl
+* python2 (link to python 2.6 or 2.7 with name python2)
+* java7 (link to Java v.7 with name java7)
+=== Present in the working directory ===
+* [[1000Bulls_mapping_pipeline_at_ABGC | cow_schema.db]] (SQLite db for 1000 Bulls project - for cow only at the moment).
 == Basic execution ==
 <source lang='bash'>
-(testenv)[megen002@nfs01 rundir]$ python ABGC_mapping_v2.py -i LW22F04 -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4
+(virtenv)[megen002@nfs01 rundir]$ python ABGC_mapping_v2.py -i LW22F04 -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4
 </source>
+The code should produce the [https://github.com/hjmegens/NGStools/blob/master/runLW22F08.sh following shell script], ready for execution with SLURM.
 == Automated runfile creation ==
 <source lang='bash'>
-mysql -u ABGSAuser -h scomp1095.wurnet.nl -p ABGSAschema -e 'select ABG_individual_id from ABGSAschema_main where archive_name like "ABGSA03%" group by ABG_individual_id' >list300.txt
+mysql -u ABGSAuser -h scomp1095.wurnet.nl -p ABGSAschema -e 'select ABG_individual_id from ABGSAschema_main where archive_name like "ABGSA0%" group by ABG_individual_id' >list.txt
-cat list100.txt list200.txt list300.txt | sort | uniq >list100-200-300.txt
+FILES=`cat list.txt`
-FILES=`cat list100-200-300.txt`
 for ID in $FILES; do python ABGC_mapping_v2.py -i $ID -a /lustre/nobackup/WUR/ABGC/shared/Pig/ABGSA/ -r /lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/Ensembl72/Sus_scrofa.Sscrofa10.2.72.dna.toplevel.fa -t 4; done
 </source>
@@ Line 22: / Line 55: @@
 == See also ==
+* [[ABGSA | Animal Breeding & Genomics Sequence Archives]]
+* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls @ABGC implementation of the pipeline]]
 == External links ==
+[https://github.com/hjmegens/NGStools/blob/master/ABGC_mapping_v2.py NGStools page on GitHub]
