Variant annotation tutorial: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
No edit summary
Line 42: Line 42:
== Polyphen ==
== Polyphen ==


  /lustre/nobackup/WUR/ABGC/shared/public_data_store/polyphen/polyphen-2.2.2/bin/run_pph.pl -s test1033.fa test1033.coord
  #o_acc              o_pos o_aa1 o_aa2 rsid      acc        pos aa1 aa2 nt1 nt2         prediction             based_on     effect     site   region     PHAT dScore Score1 Score2 MSAv   Nobs Nstruct Nfilt PDB_id PDB_pos PDB_ch ident lengthNormASA SecStr MapReg   dVol dProp B-fact H-bonds AveNHet MinDHet AveNInt MinDInt AveNSit MinDSit Transv CodPos CpG MinDJxn     PfamHit   IdPmax   IdPSNP   IdQmin
  ENSSSCG00000001099ENSSSCT00000001195   368     S     A           ENSSSCG00000001099ENSSSCT00000001195   368   S   A                 benign           alignment                                   +0.721 -1.180 -1.901   2     68                                                                                                                                                                             12.350   12.350   42.63
  ENSSSCG00000001099ENSSSCT00000001195   453     R     H           ENSSSCG00000001099ENSSSCT00000001195   453   R   H                 benign           alignment                                   +0.351 -2.130 -2.481   2     67                                                                                                                                                                             33.194   33.194   93.76
  ENSSSCG00000001099ENSSSCT00000001195   431     K     T           ENSSSCG00000001099ENSSSCT00000001195   431   K   T     possibly damaging           alignment                                   +1.705 -1.678 -3.383   2     68                                                                                                                                                                             2.382           68.80
What goes in:
  test1033.fa
  >ENSSSCG00000001099ENSSSCT00000001195
  MSSIEQTTEILLCLSPAEAANLKEGINFVRNKSTGKDYILFKNKSRLKACKNMCKHQGGL
  FIKDIEDLNGRSVKCTKHNWKLDVSSMKYINPPGSFCQDELVVEKDEENGVLLLELNPPN
  PWDSEPRSPEDLAFGEVQITYLTHACMDLKLGDKRMVFDPWLIGPAFARGWWLLHEPPSD
  WLERLSRADLIYISHMHSDHLSYPTLKKLAERRPDVPIYVGNTERPVFWNLNQSGVQLTN
  INVVPFGIWQQVDKNLRFMILMDGVHPEMDTCIIVEYKGHKILNTVDCTRPNGGRLPMKV
  ALMMSDFAGGASGFPMTFSGGKFTEEWKAQFIKTERKKLLNYKARLVKDLQPRIYCPFAG
  YFVESHPSDKYIKETNIKNDPNELNNLIKKNSEVVTWTPRPGATLDLGRMLKDPTDSKGI
  VEPPEGTKIYKDSWDFGPYLNILNAAIGDEIFRHSSWIKEYFTWAGFKDYNLVVRMIETD
  EDFSPLPGGYDYLVDFLDLSFPKERPSREHPYEEIRSRVDVIRHVVKNGLLWDDLYIGFQ
  TRLQRDPDIYHHLFWNHFQIKLPLTPPDWKSFLMCSG
  test1033.coord
  ENSSSCG00000001099ENSSSCT00000001195 368        S A
  ENSSSCG00000001099ENSSSCT00000001195 453        R H
  ENSSSCG00000001099ENSSSCT00000001195 431        K T


== Provean ==
== Provean ==

Revision as of 14:05, 6 March 2015


Slicing and dicing of VCF files

 tabix -h allbt.vcf.gz 18 >BT18.vcf
 tabix -p vcf BT18.vcf.gz


Annotating VCF with rs-numbers

 tabix -h BT18.vcf.gz 18:100000-101000 | vcf-annotate -a BT_incl_cons.18.vcf.gz -c CHROM,FROM,ID


something

Extracting variants using BEDtools

select a region of BT18 with genes:

gunzip -c Bos_taurus.UMD3.1.78.gtf.gz | awk '$3=="gene"' | awk '$1==18&&$4>1000000&&$4<2000000'

bedtools intersect -a BT18.vcf.gz -b mygenes.gtf | more

Variant Effect Predictor

 gunzip -c /home/formacion/COMUNES/IAMZ/data/CIHEAM/MULTISAMPLE_VCF/all.fb.vcf.gz | perl /home/formacion/COMUNES/IAMZ/soft/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl --dir /home/formacion/COMUNES/IAMZ/data/CIHEAM/ReferenceGenome/VEP/ --species bos_taurus -o test2.vep --fork 4 --canonical --sift b --coding_only --no_intergenic --offline --force_overwrite --vcf

snpEff

== Inferring function

from VEP output to Polyphen/Provean input

extracting fasta record.

faOneRecord Bos_taurus.UMD3.1.pep.all.fa `cat Bos_taurus.UMD3.1.pep.all.fa | grep ENSBTAT00000063226 | awk '{print $1}' | sed 's/>//'`


Polyphen

 /lustre/nobackup/WUR/ABGC/shared/public_data_store/polyphen/polyphen-2.2.2/bin/run_pph.pl -s test1033.fa test1033.coord 
 #o_acc               	 o_pos	o_aa1	o_aa2	rsid      	acc       	   pos	aa1	aa2	nt1	nt2	        prediction	            based_on	    effect	    site	  region	    PHAT	dScore	Score1	Score2	MSAv	  Nobs	 Nstruct	 Nfilt	PDB_id	PDB_pos	PDB_ch	 ident	lengthNormASA	SecStr	MapReg	  dVol	 dProp	B-fact	 H-bonds	 AveNHet	 MinDHet	 AveNInt	 MinDInt	 AveNSit	 MinDSit	Transv	CodPos	CpG	 MinDJxn	     PfamHit	  IdPmax	  IdPSNP	  IdQmin
 ENSSSCG00000001099ENSSSCT00000001195	   368	    S	    A	          	ENSSSCG00000001099ENSSSCT00000001195	   368	  S	  A	   	   	            benign	           alignment	          	        	        	        	+0.721	-1.180	-1.901	   2	    68	        	      	             	      	      	      	       	      	      	      	      	      	        	        	        	        	        	        	        	      	      	   	        	            	  12.350	  12.350	   42.63
 ENSSSCG00000001099ENSSSCT00000001195	   453	    R	    H	          	ENSSSCG00000001099ENSSSCT00000001195	   453	  R	  H	   	   	            benign	           alignment	          	        	        	        	+0.351	-2.130	-2.481	   2	    67	        	      	             	      	      	      	       	      	      	      	      	      	        	        	        	        	        	        	        	      	      	   	        	            	  33.194	  33.194	   93.76
 ENSSSCG00000001099ENSSSCT00000001195	   431	    K	    T	          	ENSSSCG00000001099ENSSSCT00000001195	   431	  K	  T	   	   	 possibly damaging	           alignment	          	        	        	        	+1.705	-1.678	-3.383	   2	    68	        	      	             	      	      	      	       	      	      	      	      	      	        	        	        	        	        	        	        	      	      	   	        	            	   2.382	        	   68.80

What goes in:

 test1033.fa
 >ENSSSCG00000001099ENSSSCT00000001195
 MSSIEQTTEILLCLSPAEAANLKEGINFVRNKSTGKDYILFKNKSRLKACKNMCKHQGGL
 FIKDIEDLNGRSVKCTKHNWKLDVSSMKYINPPGSFCQDELVVEKDEENGVLLLELNPPN
 PWDSEPRSPEDLAFGEVQITYLTHACMDLKLGDKRMVFDPWLIGPAFARGWWLLHEPPSD
 WLERLSRADLIYISHMHSDHLSYPTLKKLAERRPDVPIYVGNTERPVFWNLNQSGVQLTN
 INVVPFGIWQQVDKNLRFMILMDGVHPEMDTCIIVEYKGHKILNTVDCTRPNGGRLPMKV
 ALMMSDFAGGASGFPMTFSGGKFTEEWKAQFIKTERKKLLNYKARLVKDLQPRIYCPFAG
 YFVESHPSDKYIKETNIKNDPNELNNLIKKNSEVVTWTPRPGATLDLGRMLKDPTDSKGI
 VEPPEGTKIYKDSWDFGPYLNILNAAIGDEIFRHSSWIKEYFTWAGFKDYNLVVRMIETD
 EDFSPLPGGYDYLVDFLDLSFPKERPSREHPYEEIRSRVDVIRHVVKNGLLWDDLYIGFQ
 TRLQRDPDIYHHLFWNHFQIKLPLTPPDWKSFLMCSG 
 test1033.coord
 ENSSSCG00000001099ENSSSCT00000001195 368        S A
 ENSSSCG00000001099ENSSSCT00000001195 453        R H
 ENSSSCG00000001099ENSSSCT00000001195 431        K T

Provean

>ENSSSCP00000018263 pep:novel chromosome:Sscrofa10.2:12:6621092:6624938:1 gene:ENSSSCG00000017236 transcript:ENSSSCT00000018765 gene_biotype:protein_coding transcript_biotype:protein_coding MTPRVGAVWLPSALLLLRVPGCLSLSGPPTAMGTKGGSLSVQCRYEEEYIDDKKYWDKSP CFLSWKHIVETTESAREVRRGRVSIRDDPANLTFTVTLERLTEEDAGTYCCGITAQFSVD PTHEVEVVVFPALGTSRPPSMPGPPTTLPATTWSFVSERETMANNLGKGPASQDPGQHPR SKHPSIRLLLLVFLEVPLFLGMLGAVLWVHRPLRSSESRSVAMDPVPGNTAPSAGWK

235,G,E 5,V,A 22,C,Y 34,T,I 51,D,N 53,K,N 59,S,Y 61,C,R 64,S,L 67,H,P 68,I,T 75,A,V 108,T,K 115,A,T 124,E,D 130,F,Y 133,L,P 142,P,A 155,F,I 158,E,G 186,I,V 21,G,D

    1. PROVEAN scores ##
  1. VARIATION SCORE

235,G,E 0.076 5,V,A 0.287 22,C,Y -8.028 34,T,I -1.932 51,D,N -1.613 53,K,N 1.565 59,S,Y -2.140 61,C,R -5.826 64,S,L -0.437 67,H,P -0.511 68,I,T -2.664 75,A,V -1.061 108,T,K -4.051 115,A,T 1.557 124,E,D -1.587 130,F,Y -0.983 133,L,P 3.203 142,P,A -2.058 155,F,I 0.502 158,E,G -0.077 186,I,V 0.220 21,G,D -6.014