Variant annotation tutorial
Revision as of 14:08, 6 March 2015
Slicing and dicing of VCF files
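Extract chromosome 18 together with the header, then compress the slice with bgzip and index it (tabix only works on bgzip-compressed files):
tabix -h allbt.vcf.gz 18 > BT18.vcf
bgzip BT18.vcf
tabix -p vcf BT18.vcf.gz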
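When no index is available, a linear scan gives the same slice (all header lines plus the records of one chromosome). A minimal sketch, assuming uncompressed VCF text on standard input; `slice_chrom` is a hypothetical helper name, and on large files it is far slower than tabix, which jumps straight to the region through the index.

```shell
# Hypothetical helper: print all header lines plus the records of one chromosome
# from uncompressed VCF text on standard input. This scans the whole stream, so it
# is much slower than an indexed tabix query on large files.
slice_chrom() {
  awk -v c="$1" '/^#/ || $1 == c'
}

# Example: gunzip -c allbt.vcf.gz | slice_chrom 18 > BT18.vcf
```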
Annotating VCF with rs-numbers
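Add rs-numbers from BT_incl_cons.18.vcf.gz (bgzip-compressed and tabix-indexed) to the ID column of the variants in 18:100000-101000:
tabix -h BT18.vcf.gz 18:100000-101000 | vcf-annotate -a BT_incl_cons.18.vcf.gz -c CHROM,FROM,ID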
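With `-c CHROM,FROM,ID` the first three columns of the annotation file are read as chromosome, position and ID, so, as far as this column list goes, IDs are transferred by position. The same position-based transfer can be sketched in awk to check what would be copied. This is an illustration, not vcf-annotate's exact behaviour: `add_ids` is a hypothetical helper, both files are assumed to be uncompressed VCF text, and alleles are not compared.

```shell
# Hypothetical helper: copy the ID column from an annotation VCF ($1) into a
# target VCF ($2) for records with the same CHROM and POS. Alleles are NOT
# compared, so multi-allelic or indel sites may pick up the wrong rs-number.
add_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
       # first file: remember the ID of every CHROM:POS (header lines skipped)
       NR == FNR { if ($1 !~ /^#/) id[$1 ":" $2] = $3; next }
       # second file: pass header lines through unchanged
       /^#/ { print; next }
       # replace the ID only where a matching position exists
       { key = $1 ":" $2; if (key in id) $3 = id[key]; print }' "$1" "$2"
}

# Example (uncompressed inputs):
# add_ids BT_incl_cons.18.vcf BT18.vcf > BT18.ids.vcf
```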
Extracting variants using BEDtools
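Select the genes on chromosome 18 that start between 1 and 2 Mb and save them to mygenes.gtf:
gunzip -c Bos_taurus.UMD3.1.78.gtf.gz | awk '$3=="gene" && $1==18 && $4>1000000 && $4<2000000' > mygenes.gtf
Extract the variants that overlap these genes:
bedtools intersect -a BT18.vcf.gz -b mygenes.gtf | more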
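Converting the selected genes to BED can make the region easier to inspect and reuse with other tools. A sketch, assuming GTF lines on standard input that carry a `gene_id "..."` attribute; `gtf_genes_to_bed` and the file name mygenes.bed are hypothetical. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so only the start is shifted by one.

```shell
# Hypothetical helper: turn GTF "gene" lines (stdin) into BED6 lines.
# GTF is 1-based, fully closed; BED is 0-based, half-open, so only the start shifts.
gtf_genes_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       $3 == "gene" {
         # pull the gene_id value out of the attribute column (field 9)
         name = "."
         if (match($9, /gene_id "[^"]+"/)) name = substr($9, RSTART + 9, RLENGTH - 10)
         print $1, $4 - 1, $5, name, 0, $7
       }'
}

# Example:
# gunzip -c Bos_taurus.UMD3.1.78.gtf.gz | awk '$1==18&&$4>1000000&&$4<2000000' | gtf_genes_to_bed > mygenes.bed
# bedtools intersect -a BT18.vcf.gz -b mygenes.bed | more
```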
Variant Effect Predictor
Annotate the multisample VCF offline with VEP (the input is read from standard input; --vcf writes the annotations into the CSQ field of a VCF, here test2.vep):
gunzip -c /home/formacion/COMUNES/IAMZ/data/CIHEAM/MULTISAMPLE_VCF/all.fb.vcf.gz | \
  perl /home/formacion/COMUNES/IAMZ/soft/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl \
  --dir /home/formacion/COMUNES/IAMZ/data/CIHEAM/ReferenceGenome/VEP/ \
  --species bos_taurus -o test2.vep --fork 4 --canonical --sift b \
  --coding_only --no_intergenic --offline --force_overwrite --vcf
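With --vcf, VEP writes its annotations into the CSQ entry of the INFO column: one comma-separated entry per transcript, each with `|`-separated subfields whose names are listed in the `##INFO=<ID=CSQ` header line. A sketch that pulls out missense consequences as transcript, protein position, reference and alternative amino acid, the information needed for the Polyphen and Provean inputs. It assumes the header contains `Format: ` followed by the field names and that the fields are called Consequence, Feature, Protein_position and Amino_acids; check those names against the header of test2.vep. `vep_missense` is a hypothetical name.

```shell
# Hypothetical helper: list missense consequences from VEP --vcf output (stdin) as
#   Feature <TAB> Protein_position <TAB> ref_aa <TAB> alt_aa
# Subfield positions inside CSQ are read from the ##INFO=<ID=CSQ header, because
# the order depends on the VEP version and the options used.
vep_missense() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##INFO=<ID=CSQ/ {
         fmt = $0
         sub(/.*Format: /, "", fmt); sub(/".*/, "", fmt)
         n = split(fmt, names, "|")
         for (i = 1; i <= n; i++) col[names[i]] = i
         next
       }
       /^#/ { next }
       {
         # find the CSQ=... entry in the INFO column
         m = split($8, info, ";")
         for (j = 1; j <= m; j++) {
           if (info[j] !~ /^CSQ=/) continue
           k = split(substr(info[j], 5), csq, ",")    # one entry per transcript
           for (t = 1; t <= k; t++) {
             split(csq[t], f, "|")
             if (f[col["Consequence"]] !~ /missense_variant/) continue
             split(f[col["Amino_acids"]], aa, "/")    # e.g. "G/E"
             print f[col["Feature"]], f[col["Protein_position"]], aa[1], aa[2]
           }
         }
       }'
}

# Example: vep_missense < test2.vep | sort -u
```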
snpEff
Inferring function
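From VEP output to Polyphen/Provean input
Extract the protein record for transcript ENSBTAT00000063226: look up its protein ID in the FASTA header lines, then pass that ID to faOneRecord: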
faOneRecord Bos_taurus.UMD3.1.pep.all.fa $(grep ENSBTAT00000063226 Bos_taurus.UMD3.1.pep.all.fa | awk '{print $1}' | sed 's/>//')
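If the UCSC faOneRecord utility is not installed, awk can do the same job by keeping every line from a matching header up to the next header. A sketch; `fasta_record` is a hypothetical helper, and the match is a plain substring test on the header line, so an ID that is a prefix of another (for example T1 and T10) would return both records.

```shell
# Hypothetical helper: print the FASTA record(s) whose header line contains a
# given string (e.g. a transcript ID), without the UCSC faOneRecord tool.
fasta_record() {
  local pattern="$1" fasta="$2"
  awk -v p="$pattern" '
    /^>/ { keep = (index($0, p) > 0) }   # decide once per header line
    keep' "$fasta"
}

# Example: fasta_record ENSBTAT00000063226 Bos_taurus.UMD3.1.pep.all.fa
```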
Polyphen
/lustre/nobackup/WUR/ABGC/shared/public_data_store/polyphen/polyphen-2.2.2/bin/run_pph.pl -s test1033.fa test1033.coord
What comes out:
#o_acc o_pos o_aa1 o_aa2 rsid acc pos aa1 aa2 nt1 nt2 prediction based_on effect site region PHAT dScore Score1 Score2 MSAv Nobs Nstruct Nfilt PDB_id PDB_pos PDB_ch ident lengthNormASA SecStr MapReg dVol dProp B-fact H-bonds AveNHet MinDHet AveNInt MinDInt AveNSit MinDSit Transv CodPos CpG MinDJxn PfamHit IdPmax IdPSNP IdQmin
ENSSSCG00000001099ENSSSCT00000001195 368 S A ENSSSCG00000001099ENSSSCT00000001195 368 S A benign alignment +0.721 -1.180 -1.901 2 68 12.350 12.350 42.63
ENSSSCG00000001099ENSSSCT00000001195 453 R H ENSSSCG00000001099ENSSSCT00000001195 453 R H benign alignment +0.351 -2.130 -2.481 2 67 33.194 33.194 93.76
ENSSSCG00000001099ENSSSCT00000001195 431 K T ENSSSCG00000001099ENSSSCT00000001195 431 K T possibly damaging alignment +1.705 -1.678 -3.383 2 68 2.382 68.80
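The output is easier to read after keeping only a few columns. A sketch that selects columns by their header names (o_acc, o_pos, o_aa1, o_aa2 and prediction, as in the header line above), assuming the tab-delimited, space-padded layout written by run_pph.pl; `pph_summary` and the file name pph_output.txt are hypothetical.

```shell
# Hypothetical helper: pull selected columns out of PolyPhen-2 tab-separated
# output (stdin) by header name. Fields are assumed to be padded with spaces,
# so each value is trimmed before printing.
pph_summary() {
  awk 'function trim(s) { sub(/^ +/, "", s); sub(/ +$/, "", s); return s }
       BEGIN { FS = OFS = "\t" }
       /^#o_acc/ {
         for (i = 1; i <= NF; i++) { h = trim($i); sub(/^#/, "", h); col[h] = i }
         next
       }
       /^#/ || NF == 0 { next }
       { print trim($(col["o_acc"])), trim($(col["o_pos"])), trim($(col["o_aa1"])),
               trim($(col["o_aa2"])), trim($(col["prediction"])) }'
}

# Example: pph_summary < pph_output.txt
```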
What goes in:
test1033.fa:
>ENSSSCG00000001099ENSSSCT00000001195
MSSIEQTTEILLCLSPAEAANLKEGINFVRNKSTGKDYILFKNKSRLKACKNMCKHQGGL
FIKDIEDLNGRSVKCTKHNWKLDVSSMKYINPPGSFCQDELVVEKDEENGVLLLELNPPN
PWDSEPRSPEDLAFGEVQITYLTHACMDLKLGDKRMVFDPWLIGPAFARGWWLLHEPPSD
WLERLSRADLIYISHMHSDHLSYPTLKKLAERRPDVPIYVGNTERPVFWNLNQSGVQLTN
INVVPFGIWQQVDKNLRFMILMDGVHPEMDTCIIVEYKGHKILNTVDCTRPNGGRLPMKV
ALMMSDFAGGASGFPMTFSGGKFTEEWKAQFIKTERKKLLNYKARLVKDLQPRIYCPFAG
YFVESHPSDKYIKETNIKNDPNELNNLIKKNSEVVTWTPRPGATLDLGRMLKDPTDSKGI
VEPPEGTKIYKDSWDFGPYLNILNAAIGDEIFRHSSWIKEYFTWAGFKDYNLVVRMIETD
EDFSPLPGGYDYLVDFLDLSFPKERPSREHPYEEIRSRVDVIRHVVKNGLLWDDLYIGFQ
TRLQRDPDIYHHLFWNHFQIKLPLTPPDWKSFLMCSG
test1033.coord:
ENSSSCG00000001099ENSSSCT00000001195 368 S A
ENSSSCG00000001099ENSSSCT00000001195 453 R H
ENSSSCG00000001099ENSSSCT00000001195 431 K T
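The substitutions only make sense if the reference residue matches the sequence at the given position; a wrong transcript or an off-by-one position is easy to miss. A sketch that reports coord lines whose reference residue (third column) does not match the FASTA sequence, assuming the sequence ID is the first word of the FASTA header, as in test1033.fa; `check_ref_aa` is a hypothetical helper.

```shell
# Hypothetical helper: verify that the reference residue in each coord line
# ("acc pos aa1 aa2") matches the protein sequence in the FASTA file.
# Prints every mismatch; prints nothing when all positions agree.
check_ref_aa() {
  local fasta="$1" coord="$2"
  awk '
    NR == FNR {                                   # first file: FASTA
      if (/^>/) { id = substr($1, 2); next }      # ID = header up to first space
      seq[id] = seq[id] $0; next
    }
    {                                             # second file: coord lines
      s = substr(seq[$1], $2, 1)
      if (s != $3) print $1, $2, "expected " $3, "found " (s == "" ? "nothing" : s)
    }' "$fasta" "$coord"
}

# Example: check_ref_aa test1033.fa test1033.coord
```

For the two files above it should print nothing: positions 368, 453 and 431 of test1033.fa carry S, R and K.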
Provean
What goes in:
>ENSSSCP00000018263 pep:novel chromosome:Sscrofa10.2:12:6621092:6624938:1 gene:ENSSSCG00000017236 transcript:ENSSSCT00000018765 gene_biotype:protein_coding transcript_biotype:protein_coding
MTPRVGAVWLPSALLLLRVPGCLSLSGPPTAMGTKGGSLSVQCRYEEEYIDDKKYWDKSP
CFLSWKHIVETTESAREVRRGRVSIRDDPANLTFTVTLERLTEEDAGTYCCGITAQFSVD
PTHEVEVVVFPALGTSRPPSMPGPPTTLPATTWSFVSERETMANNLGKGPASQDPGQHPR
SKHPSIRLLLLVFLEVPLFLGMLGAVLWVHRPLRSSESRSVAMDPVPGNTAPSAGWK
235,G,E
5,V,A
22,C,Y
34,T,I
51,D,N
53,K,N
59,S,Y
61,C,R
64,S,L
67,H,P
68,I,T
75,A,V
108,T,K
115,A,T
124,E,D
130,F,Y
133,L,P
142,P,A
155,F,I
158,E,G
186,I,V
21,G,D
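The Provean variant list carries the same information as a Polyphen coord file, written as position,reference,alternative for a single protein. A sketch converting coord-style lines for one accession; `coord_to_provean` is a hypothetical helper.

```shell
# Hypothetical helper: turn coord lines ("acc pos aa1 aa2", as in test1033.coord)
# for one protein accession into "pos,aa1,aa2" variant lines for Provean.
coord_to_provean() {
  local acc="$1"
  awk -v a="$acc" '$1 == a { print $2 "," $3 "," $4 }'
}

# Example:
# coord_to_provean ENSSSCG00000001099ENSSSCT00000001195 < test1033.coord
```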
What comes out:
## PROVEAN scores ##
# VARIATION SCORE
235,G,E 0.076
5,V,A 0.287
22,C,Y -8.028
34,T,I -1.932
51,D,N -1.613
53,K,N 1.565
59,S,Y -2.140
61,C,R -5.826
64,S,L -0.437
67,H,P -0.511
68,I,T -2.664
75,A,V -1.061
108,T,K -4.051
115,A,T 1.557
124,E,D -1.587
130,F,Y -0.983
133,L,P 3.203
142,P,A -2.058
155,F,I 0.502
158,E,G -0.077
186,I,V 0.220
21,G,D -6.014
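Provean's default cutoff is -2.5: a score at or below the cutoff is predicted deleterious, a score above it neutral. A sketch applying that cutoff to the score table; `provean_calls` is a hypothetical name, and another cutoff can be passed as the first argument.

```shell
# Hypothetical helper: classify PROVEAN scores read from stdin.
# Scores <= cutoff (default -2.5) are called deleterious, all others neutral.
provean_calls() {
  local cutoff="${1:--2.5}"
  awk -v c="$cutoff" '
    /^#/ || NF < 2 { next }                      # skip comment and header lines
    { print $1, $2, ($2 + 0 <= c + 0 ? "deleterious" : "neutral") }'
}

# Example: provean_calls < provean_scores.txt     (hypothetical file name)
```

Applied to the scores above, five substitutions fall at or below -2.5: 22,C,Y, 61,C,R, 68,I,T, 108,T,K and 21,G,D.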