Provean Sus scrofa: Difference between revisions
Revision as of 12:02, 27 December 2013
From Variant Effect Predictor output, select only protein-altering variants and sort by transcript:
<source lang='bash'>
cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
</source>
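To make the effect of this filter concrete, the sketch below runs the same pipeline on three made-up rows. It assumes the default tab-delimited VEP layout, in which column 2 is the location (<code>chr:pos</code>) and column 11 holds the amino acids (<code>A/T</code> for a change, a single letter for a synonymous variant). The function names and the toy rows are illustrative only. Because <code>sed</code> splits the location into two columns, every later column shifts by one: the transcript ends up in column 6, the protein position in column 11 and the amino acids in column 12.

```shell
#!/bin/bash
# Sketch only: function names and toy data are hypothetical, not part of the original page.

# Same pipeline as above, wrapped in a function that takes the input files as arguments.
select_protein_altering() {
    cat "$@" | awk '$11 ~ /\//' | sed 's/:/\t/' | sort -k6
}

# Write three made-up rows in the assumed 14-column VEP layout to the file given as $1.
write_toy_vep() {
    {
        printf 'var1\t1:1000\tA\tENSSSCG01\tENSSSCT00000000002\tTranscript\tmissense_variant\t10\t10\t4\tA/T\tGcc/Acc\t-\t-\n'
        printf 'var2\t1:2000\tC\tENSSSCG01\tENSSSCT00000000001\tTranscript\tsynonymous_variant\t20\t20\t7\tL\tctG/ctC\t-\t-\n'
        printf 'var3\t2:3000\tG\tENSSSCG02\tENSSSCT00000000001\tTranscript\tmissense_variant\t30\t30\t10\tR/H\tcGt/cAt\t-\t-\n'
    } > "$1"
}

# Example use:
#   write_toy_vep toy_vep.txt
#   select_protein_altering toy_vep.txt
```

Only the two rows with a <code>/</code> in the amino-acid column survive, ordered by transcript ID. The <code>\t</code> in the <code>sed</code> replacement relies on GNU sed.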
Protein models for Sus scrofa:
/lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa
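The scripts on this page use this file to translate a transcript ID into a protein ID by searching the FASTA headers. The sketch below isolates that lookup. It assumes Ensembl-style headers in which the protein ID is the first word after <code>></code> and the transcript appears later as <code>transcript:ID</code>. The function names and the toy records are made up for illustration.

```shell
#!/bin/bash
# Sketch only: function names and toy records are hypothetical.

# Print the protein ID of the FASTA record whose header mentions the transcript.
# Usage: lookup_protein TRANSCRIPT_ID FASTA_FILE
lookup_protein() {
    grep -- "$1" "$2" | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1
}

# Write two made-up records in the assumed header style to the file given as $1.
write_toy_pep() {
    {
        printf '>ENSSSCP00000000001 pep:known chromosome:Sscrofa10.2:1:100:900:1 gene:ENSSSCG00000000001 transcript:ENSSSCT00000000001\n'
        printf 'MAAT\n'
        printf '>ENSSSCP00000000002 pep:known chromosome:Sscrofa10.2:1:2000:2900:1 gene:ENSSSCG00000000002 transcript:ENSSSCT00000000002\n'
        printf 'MKLV\n'
    } > "$1"
}

# Example use:
#   write_toy_pep toy_pep.fa
#   lookup_protein ENSSSCT00000000002 toy_pep.fa
```

If the transcript is not found, the function prints nothing. In that case the scripts would write files named <code>.fa</code>, <code>.var</code> and so on, so checking for an empty result before continuing may be worthwhile.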
The following script submits up to 300 runs every hour. Note that it first cancels any remaining Provean jobs and, importantly, removes leftover Provean temporary folders from the <code>/tmp</code> directories of all nodes. This prevents the error in which Provean fails to create its temporary folders.
<source lang='bash'>
#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
#cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
# Note: if this controller is itself submitted under the job name Provean,
# the scancel step below will also match it.
TELLER=100
echo $TELLER
let TELLER+=1
echo $TELLER
# TELLER is never changed inside the loop, so it repeats until the script is stopped.
while [ $TELLER -gt 99 ]; do
    # Cancel all remaining Provean jobs.
    PROVS=`squeue | grep Provean | sed 's/^ \+//' | sed 's/ \+/\t/' | cut -f1`
    for PROV in $PROVS; do scancel $PROV; done
    sleep 10
    # Remove leftover Provean temporary folders on every node.
    for i in `seq 1 2`; do ssh fat00$i 'rm -rf /tmp/provean*'; done
    for i in `seq 10 60`; do ssh node0$i 'rm -rf /tmp/provean*'; done
    for i in `seq 1 9`; do ssh node00$i 'rm -rf /tmp/provean*'; done
    # Candidate transcripts: those in the first 15000 lines of prot_alt.txt.
    TRANS=`cat prot_alt.txt | head -15000 | cut -f6 | sort | uniq`
    TELLER2=0
    for TRAN in $TRANS; do
        if [ $TELLER2 -lt 300 ]; then
            echo "transcript: $TRAN"
            echo "counter before: $TELLER2"
            PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRAN | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`
            echo "protein: $PROT"
            # A saved supporting set means this protein has already been processed.
            if [ -f $PROT.sss ]; then
                echo "$PROT $TRAN already done"
            else
                echo "will do sbatch runProvean_sub.sh $TRAN"
                sbatch runProvean_sub.sh $TRAN
                let TELLER2+=1
                echo "counter after: $TELLER2"
            fi
        fi
    done
    sleep 3600
done
</source>
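The selection step of this loop (skip what is already done, stop after 300 submissions) can be tried out without touching the queue. The following is a minimal sketch under stated assumptions, not the script above: it reads transcript IDs from standard input and, instead of searching the FASTA file for every transcript, looks up proteins in a two-column tab-separated table (transcript, protein). That table, here called <code>tx2prot.tsv</code>, and the function name <code>select_batch</code> are hypothetical.

```shell
#!/bin/bash
# Sketch only: select_batch and the mapping table are hypothetical helpers.

# Print at most LIMIT transcripts whose protein has no <protein>.sss file yet.
# Usage: select_batch LIMIT MAP_FILE [RESULT_DIR] < transcript_ids
select_batch() {
    local limit="$1" map="$2" dir="${3:-.}"
    local count=0 tran prot
    while read -r tran; do
        # Stop as soon as the batch is full.
        [ "$count" -lt "$limit" ] || break
        # Look up the protein ID in the two-column table.
        prot=$(awk -F'\t' -v t="$tran" '$1 == t { print $2; exit }' "$map")
        # No protein known for this transcript: skip it.
        [ -n "$prot" ] || continue
        # Supporting set already saved: this protein is done.
        [ -f "$dir/$prot.sss" ] && continue
        echo "$tran"
        count=$((count + 1))
    done
}

# Example use (dry run, prints what would be submitted):
#   head -15000 prot_alt.txt | cut -f6 | sort -u | select_batch 300 tx2prot.tsv
```

Piping the output into <code>while read -r t; do sbatch runProvean_sub.sh "$t"; done</code> would turn the dry run into real submissions.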
The <code>runProvean_sub.sh</code> script referred to in the above script consists of the following code:
<source lang='bash'>
#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
TRANS=$1
# Look up the protein ID that belongs to this transcript.
PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRANS | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`
# Variant list as position,reference,alternative.
cat prot_alt.txt | grep $TRANS | awk '{print $11,$12}' | sed 's/ \+/\t/' | sed 's/\//\t/' | awk '{OFS=","; print $1,$2,$3}' | sed 's/\t//g' | sed 's/ \+//g' >$TRANS.var
# Annotation for each variant, with the protein ID added.
cat prot_alt.txt | grep $TRANS | awk -v prot=$PROT '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$8,prot, $11,$12,$13,$14,$15}' >$PROT.var.info
# Extract the protein sequence and run Provean on it.
faOneRecord /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa $PROT >$PROT.fa
mv $TRANS.var $PROT.var
provean.sh -q $PROT.fa -v $PROT.var --save_supporting_set $PROT.sss >$PROT.result.txt 2>$PROT.error
</source>
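The chain of <code>awk</code> and <code>sed</code> calls that writes the <code>.var</code> file turns the protein position (column 11 of <code>prot_alt.txt</code>) and the amino-acid change (column 12, for example <code>R/H</code>) into one <code>position,reference,alternative</code> line per variant. The sketch below is a condensed equivalent for simple substitutions, shown on made-up rows. The function name is hypothetical, and the single <code>awk</code> call is a swapped-in simplification rather than the original pipeline.

```shell
#!/bin/bash
# Sketch only: make_var_lines is a hypothetical, condensed stand-in for the
# awk/sed chain in runProvean_sub.sh.

# Print "position,ref,alt" for every row of PROT_ALT_FILE that mentions the transcript.
# Usage: make_var_lines TRANSCRIPT_ID PROT_ALT_FILE
make_var_lines() {
    grep -- "$1" "$2" | awk '{ split($12, aa, "/"); print $11 "," aa[1] "," aa[2] }'
}

# Example use:
#   make_var_lines ENSSSCT00000000001 prot_alt.txt
```

For a row with position <code>10</code> and change <code>R/H</code> this prints <code>10,R,H</code>, the same line the original chain produces for that row.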
Individual transcripts can also be submitted using the following script:
<source lang='bash'>
#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
#cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
TRANS=$1
# Look up the protein ID that belongs to this transcript.
PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRANS | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`
# A saved supporting set means this protein has already been processed.
if [ -f $PROT.sss ]; then
    echo "$PROT $TRANS already done."
else
    cat prot_alt.txt | grep $TRANS | awk '{print $11,$12}' | sed 's/ \+/\t/' | sed 's/\//\t/' | awk '{OFS=","; print $1,$2,$3}' | sed 's/\t//g' | sed 's/ \+//g' >$TRANS.var
    cat prot_alt.txt | grep $TRANS | awk -v prot=$PROT '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$8,prot, $11,$12,$13,$14,$15}' >$PROT.var.info
    faOneRecord /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa $PROT >$PROT.fa
    mv $TRANS.var $PROT.var
    provean.sh -q $PROT.fa -v $PROT.var --save_supporting_set $PROT.sss >$PROT.result.txt 2>$PROT.error
fi
</source>