Calculate corrected theta from resequencing data: Difference between revisions

Revision as of 10:42, 1 April 2014

This procedure estimates theta (nucleotide diversity) from resequencing data. The method is described in Esteve-Codina et al.

<source lang='bash'>
#!/bin/bash
#SBATCH --time=10000
#SBATCH --mem=4000
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --constraint=normalmem
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=ngstheta
#SBATCH --partition=ABGC_Research

# $1 is the name of the individual.
# MIN and MAX (minimum and maximum depth) are not set in this script;
# they must be supplied before it runs, e.g. through the environment.
REF=/lustre/nobackup/WUR/ABGC/shared/Pig/Sscrofa_build10_2/FASTA/Ssc10_2_chromosomes.fa
BAM=/lustre/nobackup/WUR/ABGC/shared/Pig/BAM_files_hjm_newbuild10_2/$1_rh.bam

echo "$1 min_depth is $MIN"

# call variants
samtools mpileup -uf $REF $BAM | bcftools view -bvcg - > $1.mig.bcf

# filter variants on depth
bcftools view $1.mig.bcf | vcfutils.pl varFilter -d$MIN -D$MAX > $1.mig.vcf

# keep variants with QUAL (column 6) of at least 20
awk '$6 >= 20' $1.mig.vcf > $1.miguel.vcf

# estimate theta in windows (-w 50000)
samtools mpileup -Bq 20 -d 50000 $BAM |
  perl covXwin-v3.1.pl -v $1.miguel.vcf -w 50000 -d $MIN -m $MAX -b $BAM |
  ./ngs_theta -d $MIN -m $MAX > $1.wintheta
</source>
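The script reads $MIN and $MAX but does not set them, so both have to come from outside the script (for instance from the environment at submission time). If either is empty, the depth options passed to vcfutils.pl, covXwin-v3.1.pl and ngs_theta are malformed. A minimal sketch of a guard that could be placed near the top of the script follows; the function name check_depth_bounds is hypothetical and not part of the original pipeline, and it assumes that both bounds are whole numbers.

```shell
# Hypothetical helper (not part of the original pipeline): check the depth
# bounds before any samtools call is made.
# Arguments: $1 = minimum depth, $2 = maximum depth.
check_depth_bounds() {
    # reject empty values and anything that is not a non-negative integer
    case "$1" in
        ''|*[!0-9]*) echo "MIN must be a non-negative integer, got '$1'" >&2; return 1 ;;
    esac
    case "$2" in
        ''|*[!0-9]*) echo "MAX must be a non-negative integer, got '$2'" >&2; return 1 ;;
    esac
    # the minimum depth cannot exceed the maximum depth
    if [ "$1" -gt "$2" ]; then
        echo "MIN ($1) is larger than MAX ($2)" >&2
        return 1
    fi
    return 0
}
```

In the script this would be called as check_depth_bounds "$MIN" "$MAX" || exit 1, so that a job with missing bounds stops immediately instead of producing empty output files.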

Save the script as nucdiv_pipeline.sh and submit it with sbatch once per individual. The code below assumes that the names of the individuals are listed in a file called individuals.txt.
<source lang='bash'>
INDS=$(cat individuals.txt)
for IND in $INDS; do
  sbatch nucdiv_pipeline.sh $IND
done
</source>
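Before submitting a large batch it can help to see which commands would be run. The sketch below wraps the submission loop in a function that accepts the command as an argument, so that it can be prefixed with echo for a dry run. The function name submit_all is hypothetical, and the sketch assumes one name per line in the list file.

```shell
# Hypothetical helper: run a command once per individual listed in a file.
# $1 = file with one name per line; the remaining arguments are the command
# to run (defaults to "sbatch nucdiv_pipeline.sh").
submit_all() {
    list="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- sbatch nucdiv_pipeline.sh
    fi
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while read -r ind || [ -n "$ind" ]; do
        # skip empty lines
        [ -z "$ind" ] && continue
        # stdin is redirected so the command cannot consume the name list
        "$@" "$ind" < /dev/null
    done < "$list"
}
```

Calling submit_all individuals.txt echo sbatch nucdiv_pipeline.sh prints one sbatch command per individual without submitting anything; submit_all individuals.txt submits the jobs.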

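The resulting wintheta files can be summarised in R. The code below computes, for each file, the mean THETA_HET over all windows with BP > 20000, excluding chrUN_nr and Ssc10_2_X, and writes the results to theta_het_results.txt.
<source lang = 'rsplus'>
# all window files in the current directory
files <- list.files(pattern="wintheta")
a <- data.frame("file" = character(), "theta_het" = numeric())
for (file1 in files){
  x <- read.table(file1, header=T)
  # mean over windows with BP > 20000, excluding chrUN_nr and Ssc10_2_X
  mn <- mean(x$THETA_HET[x$BP>20000 & x$CHR != 'chrUN_nr' & x$CHR != 'Ssc10_2_X'])
  print(paste(file1, mn, sep="  "))
  a <- rbind(a, data.frame("file"=file1, "theta_het"=mn))
}
write.table(x=a, file="theta_het_results.txt")
</source>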
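For a quick check of a single file without starting R, the same per-file mean can be approximated on the command line. This is a sketch only: the function name mean_theta_het is hypothetical, and it assumes that a wintheta file is a whitespace-separated table whose first line is a header containing the columns CHR, BP and THETA_HET used in the R code. The number formatting differs from R's.

```shell
# Hypothetical helper: mean THETA_HET of one wintheta file, using the same
# filters as the R code (BP > 20000, CHR not chrUN_nr and not Ssc10_2_X).
# Assumes a whitespace-separated table with a header line.
mean_theta_het() {
    awk '
        NR == 1 {
            # locate the needed columns by their header names
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("CHR" in col) || !("BP" in col) || !("THETA_HET" in col)) {
                print "missing CHR, BP or THETA_HET column" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        $(col["BP"]) > 20000 && $(col["CHR"]) != "chrUN_nr" && $(col["CHR"]) != "Ssc10_2_X" {
            sum += $(col["THETA_HET"])
            n++
        }
        END {
            if (bad) exit 1
            # no window passed the filters
            if (n == 0) { print "NA"; exit }
            printf "%.6g\n", sum / n
        }
    ' "$1"
}
```

A loop such as for f in *wintheta*; do printf '%s\t%s\n' "$f" "$(mean_theta_het "$f")"; done then gives one line per individual.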
