DE expression: Difference between revisions

From HPCwiki
Jump to navigation Jump to search
Line 49: Line 49:
for f in $FILES
for f in $FILES
do
do
       /home/NIOO/martijnd/progs_nobackup/samtools-1.3/samtools sort $f -o $f.sorted
       samtools sort $f -o $f.sorted
/home/NIOO/martijnd/progs_nobackup/samtools-1.3/samtools rmdup $f.sorted $f.sorted.rmdup.bam
samtools rmdup $f.sorted $f.sorted.rmdup.bam
  ftrim=${f:0:-18}
  ftrim=${f:0:-18}
       name=`echo $f | cut -d"/" -f2`
       name=`echo $f | cut -d"/" -f2`

Revision as of 13:59, 26 January 2016

Typical commands used for doing a DE expression study using Tophat (including Bowtie2 as aligner), Stringtie and Cuffdiff

  • Samtools, Trimmomatic, Tophat, Bowtie2, Stringtie, Cufflinks need to be in PATH.
  • Use case in wintermoth. 12 samples (1 Egg, 1 Caterpillar, 3x3 Pupa, 1 Moth).

Trim sequences for quality and adapters

<source lang='bash'> DIRS='/data/ngs/AnE/visser_group/Operophtera_brumata/data/rnaseq/raw/*' mkdir -p TRIM for d in $DIRS do

      	R1=$d/*R1*.fastq.gz

R2=$d/*R2*.fastq.gz

      	lname=`echo $R1 | cut -d"/" -f10`
       echo $lname

mkdir -p TRIM/$lname cd TRIM/$lname

       java -jar ~/progs_nobackup/Trimmomatic-0.35/trimmomatic-0.35.jar PE -phred33 -threads 32 $R1 $R2 $lname\_trimmed_R1.fastq.gz $lname\_trimmed_R1_unpaired.fastq.gz $lname\_trimmed_R2.fastq.gz $lname\_trimmed_R2_unpaired.fastq.gz ILLUMINACLIP:/home/NIOO/martijnd/progs_nobackup/Trimmomatic-0.35/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:7 MINLEN:50 >> ../wm_rnaseq_trimmomatic.log

cd .. done </source>

Perform mapping with tophat2

<source lang='bash'> mkdir -p MAPPING mkdir -p assembly/bowtie2-index bowtie2-build assembly/Obru1.fsa assembly/bowtie2-index/Obru1

DIRS='TRIM/*' for d in $DIRS do

   	log='TRIM/wm_rnaseq_trimmomatic.log'

if "$d" == "$log" ; then continue; fi R1=`echo $d/*R1.fastq.gz`

      	R2=`echo $d/*R2.fastq.gz`
      	R1_U=`echo $d/*R1_unpaired.fastq.gz`
      	R2_U=`echo $d/*R2_unpaired.fastq.gz`
      	lname=`echo $R1 | cut -d"/" -f2`

cat $R1_U $R2_U > $d/unpaired.fastq.gz echo $lname

      	tophat2 -p 32 --read-mismatches 8 --read-gap-length 4 --read-edit-dist 8 -o MAPPING/$lname assembly/bowtie2-index/Obru1 $R1 $R2,$d/unpaired.fastq.gz 

done </source>

Remove PCR duplicates and use stringtie to quantify/assemble transcripts

<source lang='bash'> mkdir -p STRINGTIE FILES='MAPPING/*/*hits.bam' for f in $FILES do

      	samtools sort $f -o $f.sorted

samtools rmdup $f.sorted $f.sorted.rmdup.bam

	ftrim=${f:0:-18}
      	name=`echo $f | cut -d"/" -f2`
       echo $name
      	stringtie -p 32 -G assembly/Obru_genes.gff -o STRINGTIE/$name/transcripts.gtf $f.sorted.rmdup.bam

echo STRINGTIE/$name/transcripts.gtf >> cuffmerge_gtf_list.txt done </source>

Merge transcriptomes

<source lang='bash'> cuffmerge -o MERGED -g assembly/Obru_genes.gff -s assembly/Obru1.fsa -p 32 STRINGTIE/cuffmerge_gtf_list.txt </source>

Differential expression with Cuffdiff

<source lang='bash'> mkdir -p CUFFDIFF cuffdiff -o CUFFDIFF -p 32 -L Egg,Caterpillar,PupaStg1,PupaStg2,PupaStg3,Moth MERGED/merged.gtf \ MAPPING/run0183IL_34_Egg/accepted_hits.bam.sorted.rmdup.bam \ MAPPING/run0183IL_rups_502-5/accepted_hits.bam.sorted.rmdup.bam \ MAPPING/run0183IL_pop1/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop2/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop3/accepted_hits.bam.sorted.rmdup.bam \ MAPPING/run0183IL_pop4/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop5/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop6/accepted_hits.bam.sorted.rmdup.bam \ MAPPING/run0183IL_pop7/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop8/accepted_hits.bam.sorted.rmdup.bam,MAPPING/run0183IL_pop9/accepted_hits.bam.sorted.rmdup.bam \ MAPPING/run0183IL_5__wintervlinder/accepted_hits.bam.sorted.rmdup.bam </source>