Maker protocols Pmajor
== External links ==
* [http://www.yandell-lab.org/software/maker.html Maker homepage]
* [http://gmod.org/wiki/MAKER_Tutorial_2013 Maker tutorial]
Latest revision as of 10:00, 16 June 2023
This page describes the various rounds of Maker-based annotations for the Parus major (Great Tit) genome.
== Round 1 ==
=== Rationale ===
For this round no P. major ESTs were available. The zebra finch (T. guttata) is the closest relative for which a reasonably complete gene-model set is available. As a first pass, it was decided to let gene predictions be driven by ab initio predictions rather than by zebra finch ESTs.
=== Invoking maker script ===
Do not forget to load the <code>maker</code> module:
<source lang='bash'>
module load maker/2.28
</source>
The following script was submitted to SLURM with the <code>sbatch</code> command:
<source lang='bash'>
#!/bin/bash
#SBATCH --time=48000
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=test_maker
#SBATCH --mail-type=ALL
#SBATCH --mail-user=hendrik-jan.megens@wur.nl
maker
</source>
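SLURM interprets a bare integer given to <code>--time</code> as a number of minutes, so <code>--time=48000</code> asks for a little over 33 days of wall time. The sketch below converts minutes into SLURM's <code>days-hours:minutes:seconds</code> notation, which can be easier to read in a job script. The function name <code>minutes_to_slurm_time</code> is made up for this illustration and is not part of SLURM or Maker; whether a partition accepts such a long limit depends on the cluster configuration.

```shell
#!/bin/bash
# Convert a wall-time limit in minutes to SLURM's D-HH:MM:SS notation.
# minutes_to_slurm_time is a hypothetical helper, not a SLURM command.
minutes_to_slurm_time() {
    local total=$1
    local days=$(( total / 1440 ))            # 1440 minutes per day
    local hours=$(( (total % 1440) / 60 ))    # whole hours left after days
    local mins=$(( total % 60 ))              # minutes left after hours
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

minutes_to_slurm_time 48000   # prints 33-08:00:00
```

With that value, the header line could equivalently be written as <code>#SBATCH --time=33-08:00:00</code>.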
=== Maker settings ===
==== Contents of <code>maker_opts.ctl</code> ====
 #-----Genome (these are always required)
 genome=Pam.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
 organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
 #-----Re-annotation Using MAKER Derived GFF3
 maker_gff= #MAKER derived GFF3 file
 est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
 altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
 protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
 rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
 model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
 pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
 other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
 #-----EST Evidence (for best results provide a file for at least one)
 est= #set of ESTs or assembled mRNA-seq in fasta format
 altest= Taeniopygia_guttata.taeGut3.2.4.74.cdna.all.fa #EST/cDNA sequence file in fasta format from an alternate organism
 est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
 altest_gff= #aligned ESTs from a closely related species in GFF3 format
 #-----Protein Homology Evidence (for best results provide a file for at least one)
 protein= Taeniopygia_guttata.taeGut3.2.4.74.pep.all.fa #protein sequence file in fasta format (i.e. from multiple organisms)
 protein_gff= #aligned protein homology evidence from an external GFF3 file
 #-----Repeat Masking (leave values blank to skip repeat masking)
 model_org=Metazoa #select a model organism for RepBase masking in RepeatMasker
 rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
 repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
 rm_gff= #pre-identified repeat elements from an external GFF3 file
 prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
 softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
 #-----Gene Prediction
 snaphmm= /shared/apps/WUR/ABGC/snap/snap-2013-11-29/HMM/mam54.hmm #SNAP HMM file
 gmhmm= #GeneMark HMM file
 augustus_species= chicken #Augustus gene prediction species model
 fgenesh_par_file= #FGENESH parameter file
 pred_gff= #ab-initio predictions from an external GFF3 file
 model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
 est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
 protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
 unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
 #-----Other Annotation Feature Types (features MAKER doesn't recognize)
 other_gff= #extra features to pass-through to final MAKER generated GFF3 file
 #-----External Application Behavior Options
 alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
 cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
 #-----MAKER Behavior Options
 max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
 min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
 pred_flank=200 #flank for extending evidence clusters sent to gene predictors
 pred_stats=0 #report AED and QI statistics for all predictions as well as models
 AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
 min_protein=0 #require at least this many amino acids in predicted proteins
 alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
 always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
 map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
 keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
 split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
 single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
 single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
 correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
 tries=2 #number of times to try a contig if there is a failure for some reason
 clean_try=1 #remove all data from previous run before retrying, 1 = yes, 0 = no
 clean_up=1 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
 TMP= #specify a directory other than the system default temporary directory for temporary files
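Most options in the control file are left blank, so it can help to list only those that carry a value. The following is a minimal sketch, assuming one <code>key=value #comment</code> entry per line and no <code>#</code> character inside a value; <code>list_set_options</code> is a hypothetical helper name, not something shipped with Maker.

```shell
#!/bin/bash
# Print only the options that carry a value in a MAKER control file.
# list_set_options is a hypothetical helper, not part of MAKER.
# Assumption: a '#' always starts a comment and never occurs inside a value.
list_set_options() {
    # 1) drop everything from '#' onwards, 2) drop trailing blanks,
    # 3) split at the first '=' and keep entries with a non-empty value.
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' "$1" |
    awk '{
        pos = index($0, "=")
        if (pos == 0) next
        key = substr($0, 1, pos - 1)
        val = substr($0, pos + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (val != "") print key "=" val
    }'
}

# Example call (file name as used on this page):
#   list_set_options maker_opts.ctl
```

Run against the file above, the list would include for instance <code>genome=Pam.fa</code>, <code>augustus_species=chicken</code> and <code>cpus=16</code>, while blank entries such as <code>est=</code> are left out.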
==== Contents of <code>maker_exe.ctl</code> ====
 #-----Location of Executables Used by MAKER/EVALUATOR
 makeblastdb=/shared/apps/WUR/ABGC/blast/ncbi-blast-2.2.28+/bin/makeblastdb #location of NCBI+ makeblastdb executable
 blastn=/shared/apps/WUR/ABGC/blast/ncbi-blast-2.2.28+/bin/blastn #location of NCBI+ blastn executable
 blastx=/shared/apps/WUR/ABGC/blast/ncbi-blast-2.2.28+/bin/blastx #location of NCBI+ blastx executable
 tblastx=/shared/apps/WUR/ABGC/blast/ncbi-blast-2.2.28+/bin/tblastx #location of NCBI+ tblastx executable
 formatdb= #location of NCBI formatdb executable
 blastall= #location of NCBI blastall executable
 xdformat= #location of WUBLAST xdformat executable
 blasta= #location of WUBLAST blasta executable
 RepeatMasker=/shared/apps/WUR/ABGC/RepeatMasker/RepeatMasker-4-0-3/RepeatMasker #location of RepeatMasker executable
 exonerate=/shared/apps/WUR/ABGC/exonerate/exonerate-2.2.0-x86_64/bin/exonerate #location of exonerate executable
 #-----Ab-initio Gene Prediction Algorithms
 snap=/shared/apps/WUR/ABGC/snap/snap-2013-11-29/snap #location of snap executable
 gmhmme3= #location of eukaryotic genemark executable
 gmhmmp= #location of prokaryotic genemark executable
 augustus=/shared/apps/WUR/ABGC/augustus/augustus.2.7/src/augustus #location of augustus executable
 fgenesh= #location of fgenesh executable
 #-----Other Algorithms
 probuild= #location of probuild executable (required for genemark)
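Because every tool is referenced by an absolute path, a moved or upgraded installation may leave the control file pointing at files that no longer exist. The sketch below reports such entries before a job is submitted. It assumes paths without spaces or <code>#</code> characters, and <code>check_exe_paths</code> is a hypothetical helper name, not part of Maker.

```shell
#!/bin/bash
# Report executables configured in a MAKER exe control file that are
# missing or not executable. check_exe_paths is a hypothetical helper.
# Assumption: paths contain no whitespace and no '#' characters.
check_exe_paths() {
    local status=0 line key path
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%#*}                      # strip the trailing comment
        case $line in
            *=*) ;;                           # looks like key=value
            *) continue ;;                    # section header or blank line
        esac
        key=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        path=$(printf '%s' "${line#*=}" | tr -d '[:space:]')
        [ -z "$path" ] && continue            # left blank: tool not configured
        if [ ! -f "$path" ] || [ ! -x "$path" ]; then
            echo "MISSING: $key -> $path"
            status=1
        fi
    done < "$1"
    return "$status"
}

# Example call (file name as used on this page):
#   check_exe_paths maker_exe.ctl || echo "fix the paths before submitting"
```

The function returns a non-zero status when at least one configured path is unusable, so it can guard an <code>sbatch</code> call in a wrapper script.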
==== Contents of <code>maker_bopts.ctl</code> ====
 #-----BLAST and Exonerate Statistics Thresholds
 blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'
 pcov_blastn=0.8 #Blastn Percent Coverage Threshold EST-Genome Alignments
 pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Alignments
 eval_blastn=1e-10 #Blastn eval cutoff
 bit_blastn=40 #Blastn bit cutoff
 depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
 pcov_blastx=0.5 #Blastx Percent Coverage Threshold Protein-Genome Alignments
 pid_blastx=0.4 #Blastx Percent Identity Threshold Protein-Genome Alignments
 eval_blastx=1e-06 #Blastx eval cutoff
 bit_blastx=30 #Blastx bit cutoff
 depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)
 pcov_tblastx=0.8 #tBlastx Percent Coverage Threshold alt-EST-Genome Alignments
 pid_tblastx=0.85 #tBlastx Percent Identity Threshold alt-EST-Genome Alignments
 eval_tblastx=1e-10 #tBlastx eval cutoff
 bit_tblastx=40 #tBlastx bit cutoff
 depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)
 pcov_rm_blastx=0.5 #Blastx Percent Coverage Threshold For Transposable Element Masking
 pid_rm_blastx=0.4 #Blastx Percent Identity Threshold For Transposable Element Masking
 eval_rm_blastx=1e-06 #Blastx eval cutoff for transposable element masking
 bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
 ep_score_limit=20 #Exonerate protein percent of maximal score threshold
 en_score_limit=20 #Exonerate nucleotide percent of maximal score threshold
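The coverage and identity thresholds are written as fractions, so <code>pcov_blastn=0.8</code> and <code>pid_blastn=0.85</code> correspond to 80% coverage and 85% identity. As a rough illustration of what such cutoffs mean, the sketch below tests a single hit against the two blastn values. It assumes a hit is kept only when both values reach their threshold; this is a simplification for illustration and not Maker's own filtering code, and <code>passes_blastn_cutoffs</code> is a hypothetical name.

```shell
#!/bin/bash
# Test one alignment against the blastn coverage and identity cutoffs above.
# passes_blastn_cutoffs is a hypothetical helper; the rule "both values must
# reach their threshold" is an assumption made for this illustration.
passes_blastn_cutoffs() {
    # $1 = fraction of the EST covered, $2 = fraction of identical positions
    awk -v cov="$1" -v pid="$2" -v min_cov=0.8 -v min_pid=0.85 \
        'BEGIN { exit !((cov + 0) >= min_cov && (pid + 0) >= min_pid) }'
}

if passes_blastn_cutoffs 0.92 0.88; then
    echo "hit kept"
else
    echo "hit discarded"
fi
```

A hit with 92% coverage and 88% identity clears both cutoffs, whereas one with 70% coverage would be rejected regardless of its identity.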
== See also ==
[[Maker_2.2.8 | Maker pipeline as installed on Anunna]]