<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.anunna.wur.nl/index.php?action=history&amp;feed=atom&amp;title=Extract_noncall_snps_from_soy</id>
	<title>Extract noncall snps from soy - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.anunna.wur.nl/index.php?action=history&amp;feed=atom&amp;title=Extract_noncall_snps_from_soy"/>
	<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Extract_noncall_snps_from_soy&amp;action=history"/>
	<updated>2026-04-18T01:49:33Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Extract_noncall_snps_from_soy&amp;diff=1293&amp;oldid=prev</id>
		<title>Megen002: Created page with &quot;== Extract the SNPs that are not called in Soy ==  From the &lt;code&gt;.ped&lt;/code&gt; file, only the soy samples were extracted, and call rate calculated like this:  &lt;source lang=&#039;bas...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Extract_noncall_snps_from_soy&amp;diff=1293&amp;oldid=prev"/>
		<updated>2014-03-26T12:28:43Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== Extract the SNPs that are not called in Soy ==  From the &amp;lt;code&amp;gt;.ped&amp;lt;/code&amp;gt; file, only the soy samples were extracted, and call rate calculated like this:  &amp;lt;source lang=&amp;#039;bas...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Extract the SNPs that are not called in Soy ==&lt;br /&gt;
&lt;br /&gt;
From the &amp;lt;code&amp;gt;.ped&amp;lt;/code&amp;gt; file, only the soy samples were extracted, and call rate calculated like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;#039;bash&amp;#039;&amp;gt;&lt;br /&gt;
./plink --file soy  --out soy --missing-genotype N --freq --noweb&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output data looks like this:&lt;br /&gt;
 CHR           SNP   A1   A2          MAF  NCHROBS&lt;br /&gt;
   1   MARC0044150    0    0           NA        0&lt;br /&gt;
   1   ASGA0000014    A    C          0.5       26&lt;br /&gt;
   1   ASGA0000021    G    T        0.375       16&lt;br /&gt;
   1   ALGA0000009    G    A       0.1333       30&lt;br /&gt;
   1   ALGA0000014    T    C          0.5       28&lt;br /&gt;
   1   H3GA0000032    A    C          0.5       38&lt;br /&gt;
   1   ASGA0000005    C    A          0.4       20&lt;br /&gt;
   1   INRA0000015    A    G       0.4643       28&lt;br /&gt;
   1   M1GA0000060    T    C      0.06667       30&lt;br /&gt;
&lt;br /&gt;
Subsequently, only SNPs with less than 4 calls (i.e. 2 individuals) were extracted. This amounts to only a single Soy sample being called for that SNP.&lt;br /&gt;
&amp;lt;source lang=&amp;#039;bash&amp;#039;&amp;gt;&lt;br /&gt;
cat soy.frq | sed &amp;#039;s/ \+/\t/g&amp;#039; | sort -k6 -nr | awk &amp;#039;$6&amp;lt;4&amp;#039; | cut -f3 &amp;gt;noncallsnps2.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that the 3rd column is selected because the first column is shifted due to leading whitespace.&lt;br /&gt;
&lt;br /&gt;
== Make reduced SNP set in Plink ==&lt;br /&gt;
From the original SNP set (bones2.ped), that contains all samples (including the soy samples) for 49802 SNPs that map to autosomes in Ssc10.2, a reduced SNP set is made that only contains a subset of 8552 SNPs for the same individuals (including soy).&lt;br /&gt;
&amp;lt;source lang=&amp;#039;bash&amp;#039;&amp;gt;&lt;br /&gt;
./plink --file bones2  --out bones2_reduced --missing-genotype N --noweb --extract noncallsnps2.txt --recode&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Calculate heterozygosities from &amp;lt;code&amp;gt;.ped&amp;lt;/code&amp;gt; file ==&lt;br /&gt;
&lt;br /&gt;
Python3 script to calculate heterozygosity from &amp;lt;code&amp;gt;.ped&amp;lt;/code&amp;gt; file:&lt;br /&gt;
&amp;lt;source lang=&amp;#039;python&amp;#039;&amp;gt;&lt;br /&gt;
import argparse&lt;br /&gt;
import sys&lt;br /&gt;
import os&lt;br /&gt;
&lt;br /&gt;
parser = argparse.ArgumentParser( description=&amp;#039;calculate heterozygosities for individuals in a ped file&amp;#039;)&lt;br /&gt;
parser.add_argument(&amp;quot;-p&amp;quot;, &amp;quot;--pedfile&amp;quot;, help=&amp;quot;name of pedfile&amp;quot;, nargs=1)&lt;br /&gt;
&lt;br /&gt;
args = parser.parse_args()&lt;br /&gt;
ped=args.pedfile[0]&lt;br /&gt;
&lt;br /&gt;
def pedfile_to_list_of_lists(pedfile):&lt;br /&gt;
  flist=[]&lt;br /&gt;
  with open(pedfile) as pedf:&lt;br /&gt;
    for l in pedf.readlines():&lt;br /&gt;
      l=l.rstrip().split()&lt;br /&gt;
      flist.append(l[:])&lt;br /&gt;
  return flist&lt;br /&gt;
&lt;br /&gt;
def calc_het(geno):&lt;br /&gt;
   het=0&lt;br /&gt;
   hom=0&lt;br /&gt;
   for i in range(0,len(geno),2):&lt;br /&gt;
      pair=geno[i:i+2]&lt;br /&gt;
      if pair[0] == pair[1] and pair[0] != &amp;#039;N&amp;#039;:&lt;br /&gt;
         hom+=1&lt;br /&gt;
      elif pair[0] != pair[1] and pair[0] != &amp;#039;N&amp;#039;:&lt;br /&gt;
         het+=1&lt;br /&gt;
   obshet=het/(het+hom)&lt;br /&gt;
   return [obshet,het,hom]&lt;br /&gt;
&lt;br /&gt;
if __name__==&amp;quot;__main__&amp;quot;:&lt;br /&gt;
  allgenos=pedfile_to_list_of_lists(ped)&lt;br /&gt;
  for genos in allgenos:&lt;br /&gt;
     het=calc_het(genos[6:])&lt;br /&gt;
     print(genos[0],genos[1],het[0],het[1],het[2],sep=&amp;#039;\t&amp;#039;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The script is invoked like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;#039;bash&amp;#039;&amp;gt;&lt;br /&gt;
python3 het_stats.py -p bones2_reduced.ped&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output will look something like this:&lt;br /&gt;
  BABA	HAPmap3_box3_G9_BABA_29	0.003549362	27	7580&lt;br /&gt;
  BABA	HAPmap3_box3_H9_BABA_31	0.007805386	60	7627&lt;br /&gt;
  BK22	WG0093132-DNAH08_SSBK22M01	0.247806248	2118	6429&lt;br /&gt;
  BK23	WG0093131-DNAA02_SSBK23F09	0.243493552	2077	6453&lt;br /&gt;
  DU22	WG0093131-DNAC01_SSDU22F01	0.287118287	2454	6093&lt;br /&gt;
  DU23	WG0093132-DNAD11_SSDU23M06	0.290866511	2484	6056&lt;br /&gt;
  GL100	HAPmap5_Box7_B03_SSRB417_U01	0.137879911	186	1163&lt;br /&gt;
  GL101	HAPmap5_Box7_C03_SSRB420_U01	0.010806217	89	8147&lt;br /&gt;
  GL102	HAPmap5_Box7_D03_SSRB421_U01	0.014296464	114	7860&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
[[Bioinformatics_protocols_ABG_Chairgroup | Bioinformatics tips, tricks, workflows at ABGC]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
[http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml Plink website and manual]&lt;/div&gt;</summary>
		<author><name>Megen002</name></author>
	</entry>
</feed>