Provean Sus scrofa

From HPCwiki
Revision as of 14:23, 27 December 2013 by Megen002 (talk | contribs) (See also)

From the Variant Effect Predictor (VEP) output, select only protein-altering variants and sort by transcript:

cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
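To see what this filter keeps, the same pipeline can be run on a few made-up rows. This is only a sketch: it assumes the default tab-separated VEP column layout, in which column 11 holds the amino acid change (for example A/T), and all identifiers and values below are invented for illustration. Splitting the location on its colon shifts the transcript into column 6, which is the sort key. The `\t` in the sed replacement relies on GNU sed.

```shell
#!/bin/bash
# Three invented VEP-style rows (tab-separated); only var1 and var3 change an amino acid.
sample=$(
  printf 'var1\t1:1000\tA\tENSSSCG00000000001\tENSSSCT00000000002\tTranscript\tmissense_variant\t100\t90\t30\tA/T\tGcc/Acc\t-\t-\n'
  printf 'var2\t1:2000\tG\tENSSSCG00000000001\tENSSSCT00000000001\tTranscript\tsynonymous_variant\t200\t180\t60\tL\tctG/ctA\t-\t-\n'
  printf 'var3\t2:3000\tC\tENSSSCG00000000003\tENSSSCT00000000001\tTranscript\tmissense_variant\t50\t45\t15\tR/H\tcGc/cAc\t-\t-\n'
)

# Keep rows whose column 11 contains a slash, split chromosome:position
# into two columns, then sort on the transcript (now column 6).
filtered=$(printf '%s\n' "$sample" | awk '$11~/\//' | sed 's/:/\t/' | sort -k6)

printf '%s\n' "$filtered"
```

After the split, the protein position sits in column 11 and the amino acid change in column 12.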

Protein models for Sus scrofa:

 /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa

The following script submits up to 300 new runs every hour. Note that at the start of each cycle it cancels any remaining Provean jobs and, importantly, removes leftover Provean temporary folders from the /tmp directories of all nodes. This prevents the error in which Provean fails to create its temporary folders.

#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
#cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
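# TELLER never drops to 99 or below, so the loop below repeats
# until the job is cancelled or reaches its time limit.
TELLER=100
echo $TELLER;
let TELLER+=1;
echo $TELLER;
while [ $TELLER -gt 99 ]; do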

  PROVS=`squeue | grep Provean | sed 's/^ \+//' | sed 's/ \+/\t/' | cut -f1`;
  for PROV in $PROVS; do scancel $PROV; done;
  sleep 10;
  for i in `seq 1 2`; do ssh fat00$i 'rm -rf /tmp/provean*'; done;
  for i in `seq 10 60`; do ssh node0$i 'rm -rf /tmp/provean*'; done;
  for i in `seq 1 9`; do ssh node00$i 'rm -rf /tmp/provean*'; done;
  TRANS=`cat prot_alt.txt | head -15000 | cut -f6 | sort | uniq`;
  TELLER2=0;
  for TRAN in $TRANS; do
     if [ $TELLER2 -lt 300 ]; then
       echo "transcript: $TRAN";
       echo "counter (top): $TELLER2";
       PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRAN | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`;
       echo "protein: $PROT";
       if [ -f $PROT.sss ];
        then
          echo "$PROT $TRAN already done";
        else
          echo "will do sbatch runProvean_sub.sh $TRAN";
          sbatch runProvean_sub.sh $TRAN;
          let TELLER2+=1;
          echo "counter (bottom): $TELLER2";
       fi;
    fi;
  done;
  sleep 3600;
done
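The job-cancelling step relies on pulling job IDs out of the squeue listing. The sketch below runs that same grep/sed/cut chain on an invented listing, so it can be tried without a scheduler. The job IDs, user name and node names are hypothetical, and the real squeue column layout may differ per site configuration.

```shell
#!/bin/bash
# Invented squeue-style listing; real output may use different column widths.
listing='             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            123456 ABGC_Rese  Provean someuser  R       1:23      1 node010
            123457 ABGC_Rese    other someuser  R       0:10      1 node011
            123458 ABGC_Rese  Provean someuser PD       0:00      1 (Priority)'

# Same chain as in the script: keep Provean lines, strip leading spaces,
# turn the first run of spaces into a tab, and take the first column.
job_ids=$(printf '%s\n' "$listing" | grep Provean | sed 's/^ \+//' | sed 's/ \+/\t/' | cut -f1)

printf '%s\n' "$job_ids"
```

Because grep matches any line containing "Provean", every job with that name is selected. This would include the controlling script itself if it were submitted through sbatch under the same job name.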

The 'runProvean_sub.sh' script referred to in the above script consists of the following code:

#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
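# Transcript ID passed as the first argument
TRANS=$1
# Look up the protein ID in the FASTA header that mentions this transcript
PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRANS | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`
# Write the variants as position,reference,alternative lines
cat prot_alt.txt | grep $TRANS | awk '{print $11,$12}' | sed 's/ \+/\t/' | sed 's/\//\t/' | awk '{OFS=","; print $1,$2,$3}' | sed 's/\t//g' | sed 's/ \+//g' >$TRANS.var;
# Keep the matching VEP columns, with the protein ID added, for later reference
cat prot_alt.txt | grep $TRANS | awk -v prot=$PROT '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$8,prot, $11,$12,$13,$14,$15}' >$PROT.var.info;
# Extract the protein sequence (faOneRecord is a UCSC Kent utility)
faOneRecord /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa $PROT >$PROT.fa;
# Rename the variant file after the protein
mv $TRANS.var $PROT.var;
# Run Provean and save the supporting set; the .sss file also marks the protein as done
provean.sh -q $PROT.fa -v $PROT.var --save_supporting_set $PROT.sss >$PROT.result.txt 2>$PROT.error;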
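The variant file written by this script holds one comma-separated position,reference,alternative line per variant. As a rough illustration, the conversion chain can be run on a few invented prot_alt.txt rows. The rows assume the 15-column layout produced by the filtering step at the top of this page, with the protein position in column 11 and the amino acid change in column 12. All identifiers are made up.

```shell
#!/bin/bash
# Invented prot_alt.txt rows (tab-separated, location already split into two columns).
rows=$(
  printf 'var1\t1\t1000\tA\tENSSSCG00000000001\tENSSSCT00000000002\tTranscript\tmissense_variant\t100\t90\t30\tA/T\tGcc/Acc\t-\t-\n'
  printf 'var4\t1\t1010\tG\tENSSSCG00000000001\tENSSSCT00000000002\tTranscript\tmissense_variant\t110\t100\t34\tR/H\tcGc/cAc\t-\t-\n'
  printf 'var3\t2\t3000\tC\tENSSSCG00000000003\tENSSSCT00000000001\tTranscript\tmissense_variant\t50\t45\t15\tR/H\tcGc/cAc\t-\t-\n'
)

TRANS=ENSSSCT00000000002

# Same chain as in runProvean_sub.sh: take position and amino acid change,
# split the change on the slash, and join the three parts with commas.
variants=$(printf '%s\n' "$rows" | grep "$TRANS" | awk '{print $11,$12}' \
  | sed 's/ \+/\t/' | sed 's/\//\t/' | awk '{OFS=","; print $1,$2,$3}' \
  | sed 's/\t//g' | sed 's/ \+//g')

printf '%s\n' "$variants"
```

For the two rows of the chosen transcript this prints `30,A,T` and `34,R,H`.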

Individual transcripts can also be submitted using the following script:

#!/bin/bash
#SBATCH --time=4800
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16000
#SBATCH --nice=1000
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Provean
#SBATCH --partition=ABGC_Research
#cat outVEP_*.txt | awk '$11~/\//' | sed 's/:/\t/' | sort -k6 >prot_alt.txt
TRANS=$1
PROT=`cat /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa | grep $TRANS | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1`
if [ -f $PROT.sss ];
  then
  echo "$PROT $TRANS already done.";
  else
  cat prot_alt.txt | grep $TRANS | awk '{print $11,$12}' | sed 's/ \+/\t/' | sed 's/\//\t/' | awk '{OFS=","; print $1,$2,$3}' | sed 's/\t//g' | sed 's/ \+//g' >$TRANS.var;
  cat prot_alt.txt | grep $TRANS | awk -v prot=$PROT '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$8,prot, $11,$12,$13,$14,$15}' >$PROT.var.info;
  faOneRecord /lustre/nobackup/WUR/ABGC/shared/public_data_store/genomes/pig/Ensembl74/pep/Sus_scrofa.Sscrofa10.2.74.pep.all.fa $PROT >$PROT.fa;
  mv $TRANS.var $PROT.var;
  provean.sh -q $PROT.fa -v $PROT.var --save_supporting_set $PROT.sss >$PROT.result.txt 2>$PROT.error;
fi;
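The transcript-to-protein lookup and the "already done" check can be tried on a small scratch FASTA file. This is a minimal sketch, assuming Ensembl-style peptide headers in which the protein ID comes first and the transcript ID appears later on the same line. The file name pep.fa, the IDs and the sequences are hypothetical.

```shell
#!/bin/bash
# Work in a scratch directory so no real results are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two invented peptide records with Ensembl-style headers.
printf '%s\n' \
  '>ENSSSCP00000000011 pep:known chromosome:Sscrofa10.2:1:1000:2000:1 gene:ENSSSCG00000000001 transcript:ENSSSCT00000000002' \
  'MAATLKV' \
  '>ENSSSCP00000000012 pep:known chromosome:Sscrofa10.2:2:3000:4000:1 gene:ENSSSCG00000000003 transcript:ENSSSCT00000000003' \
  'MKLVRRS' > pep.fa

TRANS=ENSSSCT00000000002

# Same lookup as in the script: find the header naming the transcript,
# turn spaces into tabs, drop the leading '>', and keep the first field.
PROT=$(grep "$TRANS" pep.fa | sed 's/ \+/\t/g' | sed 's/^>//' | cut -f1)

# The supporting-set file doubles as the "already done" marker.
if [ -f "$PROT.sss" ]; then status_before="already done"; else status_before="to do"; fi
touch "$PROT.sss"   # stands in for a finished Provean run
if [ -f "$PROT.sss" ]; then status_after="already done"; else status_after="to do"; fi

echo "$PROT: $status_before -> $status_after"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

If a transcript ID were to match more than one header, PROT would contain several IDs and the file test would fail, so the sketch assumes one header per transcript.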

See also

Provean on the B4F Cluster