<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.anunna.wur.nl/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Moiti001</id>
	<title>HPCwiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.anunna.wur.nl/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Moiti001"/>
	<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php/Special:Contributions/Moiti001"/>
	<updated>2026-04-18T17:28:24Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2180</id>
		<title>Whole genome alignment pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2180"/>
		<updated>2022-07-18T13:56:19Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/whole-genome-alignment here]&lt;br /&gt;
&lt;br /&gt;
= Whole genome alignment pipeline =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/whole-genome-alignment&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline aligns one or more genomes to a specified genome and plots the alignment.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/lh3/minimap2 minimap2]&lt;br /&gt;
* R&lt;br /&gt;
** [https://github.com/tpoorten/dotPlotly/blob/master/pafCoordsDotPlotly.R pafCoordsDotPlotly.R]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:whole-genome-alignment-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-config.yaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;# genome alignment parameters:&lt;br /&gt;
GENOME: /path/to/genome #genome fasta to be compared&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;lt;genome1&amp;gt;: path/to/genome1.fasta&lt;br /&gt;
  &amp;lt;genome2&amp;gt;: path/to/genome2.fasta&lt;br /&gt;
  &amp;lt;genome3&amp;gt;: path/to/genome3.fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&lt;br /&gt;
&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OUTDIR: /path/to/outdir&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* GENOME: path to the genome fasta file (can be compressed). This is the genome that all the others will be compared against&lt;br /&gt;
* COMPARISON_GENOME: genome fasta (can be compressed) for whole genome comparison. Add your species name and the path to the fasta file. ex: chicken: /path/to/chicken.fna.gz. You can add several genomes, one on each line.&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, try lowering these thresholds; this usually means the alignments are shorter than the cutoffs.&lt;br /&gt;
* PREFIX: name of your species (ex: turkey)&lt;br /&gt;
* OUTDIR: directory where snakemake will run and where the results will be written to&lt;br /&gt;
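To see what the two MIN_* thresholds act on: in a PAF file, column 2 is the query-sequence length and column 11 the alignment block length. The pipeline's plotting script applies these cutoffs itself; the sketch below is only an illustration of the same filter with awk on a toy PAF file (space-separated here for readability; real PAF is tab-separated).

```shell
# Toy PAF records; column 2 = query length, column 11 = alignment block length.
cat > toy.paf <<'EOF'
q1 60000 0 12000 + t1 1000000 0 12000 11500 12000 60
q2 40000 0 12000 + t1 1000000 0 12000 11500 12000 60
q3 60000 0 5000 + t1 1000000 0 5000 4800 5000 60
EOF
# Keep alignments of >= 10000 bp (MIN_ALIGNMENT_LENGTH) coming from
# queries of >= 50000 bp (MIN_QUERY_LENGTH): only q1 passes both cutoffs.
awk '$11 >= 10000 && $2 >= 50000' toy.paf
```

q2 fails the query-length cutoff and q3 the alignment-length cutoff, which is exactly the situation that produces a blank dot plot when the thresholds are set too high.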
&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (rather than to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;installing-r-packages&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
&lt;br /&gt;
First load R: &amp;lt;code&amp;gt;module load R/4.0.2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter. Install the packages:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;amp;lt;- c(&amp;amp;quot;optparse&amp;amp;quot;, &amp;amp;quot;data.table&amp;amp;quot;, &amp;amp;quot;ggplot2&amp;amp;quot;)&lt;br /&gt;
new.packages &amp;amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;amp;quot;Package&amp;amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
&#039;lib = &amp;amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
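The linked page has the cluster-specific steps; the general mechanism behind a local install is a per-user library directory that R picks up through the `R_LIBS_USER` environment variable. A minimal sketch, assuming a hypothetical path under your home directory:

```shell
# Hypothetical per-user library location; adjust to your own setup.
mkdir -p "$HOME/R/library"
# Point R at it before starting an R session; install.packages() can then
# use this writable directory instead of the non-writable system library.
export R_LIBS_USER="$HOME/R/library"
echo "$R_LIBS_USER"
```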
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;genome_alignment/{prefix}&#039;&#039;vs&#039;&#039;{species}.paf&#039;&#039;&#039; paf format file w&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2179</id>
		<title>Whole genome alignment pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2179"/>
		<updated>2022-07-18T13:47:58Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/whole-genome-alignment here]&lt;br /&gt;
&lt;br /&gt;
= Whole genome alignment pipeline =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/whole-genome-alignment&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline aligns one or more genomes to a specified genome and plots the alignment.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/lh3/minimap2 minimap2]&lt;br /&gt;
* R&lt;br /&gt;
** [https://github.com/tpoorten/dotPlotly/blob/master/pafCoordsDotPlotly.R pafCoordsDotPlotly.R]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:whole-genome-alignment-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-config.yaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;# genome alignment parameters:&lt;br /&gt;
GENOME: /path/to/genome #genome fasta to be compared&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;lt;genome1&amp;gt;: path/to/genome1.fasta&lt;br /&gt;
  &amp;lt;genome2&amp;gt;: path/to/genome2.fasta&lt;br /&gt;
  &amp;lt;genome3&amp;gt;: path/to/genome3.fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&lt;br /&gt;
&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OUTDIR: /path/to/outdir&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* GENOME: path to the genome fasta file (can be compressed). This is the genome that all the others will be compared against&lt;br /&gt;
* COMPARISON_GENOME: genome fasta (can be compressed) for whole genome comparison. Add your species name and the path to the fasta file. ex: chicken: /path/to/chicken.fna.gz. You can add several genomes, one on each line.&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, try lowering these thresholds; this usually means the alignments are shorter than the cutoffs.&lt;br /&gt;
* PREFIX: name of your species (ex: turkey)&lt;br /&gt;
* OUTDIR: directory where snakemake will run and where the results will be written to&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (rather than to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;installing-r-packages&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
&lt;br /&gt;
First load R: &amp;lt;code&amp;gt;module load R/4.0.2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter. Install the packages:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;amp;lt;- c(&amp;amp;quot;optparse&amp;amp;quot;, &amp;amp;quot;data.table&amp;amp;quot;, &amp;amp;quot;ggplot2&amp;amp;quot;)&lt;br /&gt;
new.packages &amp;amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;amp;quot;Package&amp;amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
&#039;lib = &amp;amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;genome_alignment/{prefix}&#039;&#039;vs&#039;&#039;{species}.paf&#039;&#039;&#039; paf format file w&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Whole-genome-alignment-workflow.png&amp;diff=2178</id>
		<title>File:Whole-genome-alignment-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Whole-genome-alignment-workflow.png&amp;diff=2178"/>
		<updated>2022-07-18T13:46:53Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2177</id>
		<title>Whole genome alignment pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Whole_genome_alignment_pipeline&amp;diff=2177"/>
		<updated>2022-07-18T13:46:02Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added documentation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/whole-genome-alignment here]&lt;br /&gt;
&lt;br /&gt;
= Whole genome alignment pipeline =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/whole-genome-alignment&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline aligns one or more genomes to a specified genome and plots the alignment.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/lh3/minimap2 minimap2]&lt;br /&gt;
* R&lt;br /&gt;
** [https://github.com/tpoorten/dotPlotly/blob/master/pafCoordsDotPlotly.R pafCoordsDotPlotly.R]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:whole-genome-alignment-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-config.yaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;# genome alignment parameters:&lt;br /&gt;
GENOME: /path/to/genome #genome fasta to be compared&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;lt;genome1&amp;gt;: path/to/genome1.fasta&lt;br /&gt;
  &amp;lt;genome2&amp;gt;: path/to/genome2.fasta&lt;br /&gt;
  &amp;lt;genome3&amp;gt;: path/to/genome3.fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&lt;br /&gt;
&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OUTDIR: /path/to/outdir&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* GENOME: path to the genome fasta file (can be compressed). This is the genome that all the others will be compared against&lt;br /&gt;
* COMPARISON_GENOME: genome for whole genome comparison. Add your species name and the path to the fasta file. ex: chicken: /path/to/chicken.fna.gz. You can add several genomes, one on each line.&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, try lowering these thresholds; this usually means the alignments are shorter than the cutoffs.&lt;br /&gt;
* PREFIX: name of your species (ex: turkey)&lt;br /&gt;
* OUTDIR: directory where snakemake will run and where the results will be written to&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (rather than to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;installing-r-packages&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
&lt;br /&gt;
First load R: &amp;lt;code&amp;gt;module load R/4.0.2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter. Install the packages:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;amp;lt;- c(&amp;amp;quot;optparse&amp;amp;quot;, &amp;amp;quot;data.table&amp;amp;quot;, &amp;amp;quot;ggplot2&amp;amp;quot;)&lt;br /&gt;
new.packages &amp;amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;amp;quot;Package&amp;amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
&#039;lib = &amp;amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;genome_alignment/{prefix}&#039;&#039;vs&#039;&#039;{species}.paf&#039;&#039;&#039; paf format file w&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2176</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2176"/>
		<updated>2022-07-18T13:41:09Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added whole genome alignment pipeline page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling| Nanopore assembly and variant calling pipeline]]&lt;br /&gt;
* [[Population variant calling pipeline | Population variant calling pipeline]]&lt;br /&gt;
* [[Single Cell preprocessing pipeline| Single Cell preprocessing pipeline]]&lt;br /&gt;
* [[Whole genome alignment pipeline | Whole genome alignment pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Single-cell-processing-workflow.png&amp;diff=2175</id>
		<title>File:Single-cell-processing-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Single-cell-processing-workflow.png&amp;diff=2175"/>
		<updated>2022-07-18T09:03:43Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Single_Cell_preprocessing_pipeline&amp;diff=2174</id>
		<title>Single Cell preprocessing pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Single_Cell_preprocessing_pipeline&amp;diff=2174"/>
		<updated>2022-07-18T09:02:40Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/single-cell-data-processing here]&lt;br /&gt;
&lt;br /&gt;
= Single Cell preprocessing pipeline =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/single-cell-data-processing&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline includes the first steps in the analysis of Single-cell data.&amp;lt;br /&amp;gt;&lt;br /&gt;
The first step is getting the reference package for your species. This will be used for read alignment and gene expression quantification. If you&#039;re working with human or mouse, you can download the reference from the Cellranger website; if not, the pipeline can create the reference for you (details below).&lt;br /&gt;
&lt;br /&gt;
Once you have the reference package, the pipeline starts by running [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count Cellranger count]. Cellranger count performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis.&amp;lt;br /&amp;gt;&lt;br /&gt;
If the fastq files are not named in the format accepted by Cellranger count, &amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;, you can specify in the config file that they need to be renamed (option &amp;lt;code&amp;gt;RENAME: y&amp;lt;/code&amp;gt;), or you can rename them yourself to follow this naming convention.&lt;br /&gt;
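As an illustration of that renaming (the sample name and lane number below are hypothetical, and the pipeline's own rename step may differ in detail), files named &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;_R1.fastq.gz&amp;lt;/code&amp;gt; can be mapped onto the Cellranger convention like this:

```shell
# Hypothetical input files following the <sample>_R1.fastq.gz convention.
mkdir -p fastq
touch fastq/pig1_R1.fastq.gz fastq/pig1_R2.fastq.gz
for f in fastq/*_R[12].fastq.gz; do
  base=${f##*/}                    # e.g. pig1_R1.fastq.gz
  sample=${base%_R[12].fastq.gz}   # e.g. pig1
  readtype=${base#${sample}_}      # e.g. R1.fastq.gz
  readtype=${readtype%.fastq.gz}   # e.g. R1
  # Assume a single flow-cell lane (L001) and sample index S1.
  mv "$f" "fastq/${sample}_S1_L001_${readtype}_001.fastq.gz"
done
ls fastq
```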
&lt;br /&gt;
The metrics from Cellranger count for all samples are combined into one file &amp;lt;code&amp;gt;cellranger_count_metrics_allsamples.tsv&amp;lt;/code&amp;gt;. This will have information such as &amp;amp;quot;estimated number of cells&amp;amp;quot;, and &amp;amp;quot;mean reads per cell&amp;amp;quot;.&lt;br /&gt;
&lt;br /&gt;
After the Cellranger count step, it&#039;s important to remove the ambient RNA, which is RNA that has been released from degraded or dying cells and is now in the cell suspension. The R package [https://github.com/constantAmateur/SoupX SoupX] is used to correct for ambient RNA. In addition to the output files with the corrected data, one html document is created per sample processed (&amp;lt;code&amp;gt;2_ambient_RNA_correction/Ambient_RNA_correction_&amp;amp;lt;sample&amp;amp;gt;.html&amp;lt;/code&amp;gt;). This html file shows the code used to perform the ambient RNA correction, as well as a few plots that illustrate this process - for the 5 most affected genes and for 5 random genes:&lt;br /&gt;
&lt;br /&gt;
* Plot 1: in which cells the gene is expressed&lt;br /&gt;
* Plot 2: ratio of observed to expected counts&lt;br /&gt;
* Plot 3: change in expression due to correction&lt;br /&gt;
&lt;br /&gt;
Once the data has been corrected for ambient RNA, it&#039;s time for quality control filtering. This step depends on the cell type, the library preparation method used, etc., so you should always check whether the default parameters make sense, supply your own, or even run the step several times with different ones.&lt;br /&gt;
&lt;br /&gt;
QC is run for every sample separately. First [https://scanpy.readthedocs.io/en/stable/ Scanpy] calculates some general QC metrics for genes and cells. It will also calculate the proportion of counts for mitochondrial genes. Several plots will be created to help assess the quality of the data:&amp;lt;br /&amp;gt;&lt;br /&gt;
Before filtering:&lt;br /&gt;
&lt;br /&gt;
* Violin plots showing:&lt;br /&gt;
** n_genes_by_counts: number of genes with positive counts in a cell&lt;br /&gt;
** total_counts: total number of counts for a cell&lt;br /&gt;
** pct_counts_mt: proportion of mitochondrial counts for a cell&lt;br /&gt;
* Scatter plots showing:&lt;br /&gt;
** total_counts vs pct_counts_mt&lt;br /&gt;
** total counts vs n_genes_by_counts&lt;br /&gt;
&lt;br /&gt;
After filtering:&lt;br /&gt;
&lt;br /&gt;
* Percentage of counts per gene for the top 20 genes after filtering&lt;br /&gt;
* Violin plots showing:&lt;br /&gt;
** n_genes_by_counts: number of genes with positive counts in a cell&lt;br /&gt;
** total_counts: total number of counts for a cell&lt;br /&gt;
** pct_counts_mt: proportion of mitochondrial counts for a cell&lt;br /&gt;
&lt;br /&gt;
The final preprocessing step is doublet removal with [https://github.com/swolock/scrublet Scrublet]. This step may be run more than once to determine the ideal doublet score threshold. The histogram shown in &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt; should show a bimodal distribution, and the threshold shown in the &amp;amp;quot;simulated doublets&amp;amp;quot; plot should be at the minimum between the two modes. The first run should be with the parameter &amp;lt;code&amp;gt;SCRUB_THRESHOLD:&amp;lt;/code&amp;gt; left empty. Once the first run has finished, you should look at the histograms of all samples and see if you need to change the threshold. If you do, set it in the config file as explained further below and run the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step again.&lt;br /&gt;
&lt;br /&gt;
There will be an &amp;lt;code&amp;gt;h5ad&amp;lt;/code&amp;gt; object containing the preprocessed data for each sample - ambient RNA removed, QC filtered and doublets removed - in the &amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt; directory.&amp;lt;br /&amp;gt;&lt;br /&gt;
This file can be used for further analysis.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* Cellranger:&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkgtf mkgtf] - filter GTF. [default: off]&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkref mkref] - create reference&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count count] - create feature counts&lt;br /&gt;
* R&lt;br /&gt;
** combine Cellranger count sample metrics&lt;br /&gt;
** [https://github.com/constantAmateur/SoupX SoupX] - remove ambient RNA&lt;br /&gt;
* Python&lt;br /&gt;
** [https://scanpy.readthedocs.io/en/stable/index.html Scanpy] - QC filtering&lt;br /&gt;
** [https://github.com/swolock/scrublet Scrublet] - Doublet removal&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:single-cell-processing-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files-and-set-parameters&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files and set parameters ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;DATA: /path/to/data/dir&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
&lt;br /&gt;
# mkref options&lt;br /&gt;
MKREF: &amp;lt;y/n&amp;gt;&lt;br /&gt;
FASTA: /path/to/fasta&lt;br /&gt;
GTF: /path/to/gtf # if creating own reference&lt;br /&gt;
REF_VERSION: &lt;br /&gt;
  - &amp;quot;--ref-version=&amp;lt;version&amp;gt;&amp;quot;&lt;br /&gt;
CR_MKREF_EXTRA: &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Filter GTF&lt;br /&gt;
FILTER_GTF: y&lt;br /&gt;
# # see here for available biotypes https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkgtf&lt;br /&gt;
ATTRIBUTES:&lt;br /&gt;
  - &amp;quot;--attribute=&amp;lt;biotype&amp;gt;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
PREFIX: &amp;lt;species&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# rename fastq files&lt;br /&gt;
RENAME: &amp;lt;y/n&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Cell ranger count options &lt;br /&gt;
# https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count&lt;br /&gt;
CR_COUNT_extra: &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# QC parameters&lt;br /&gt;
MITO_PERCENTAGE: 10 # keep cells with less than X% mitochondrial read fraction&lt;br /&gt;
NUMBER_GENES_PER_CELL: 500 # keep cells with more than X genes&lt;br /&gt;
NUMBER_UMI_PER_CELL: 1000 # keep cells with more than X UMIs&lt;br /&gt;
ENSEMBLE_BIOMART_SPECIES: &amp;quot;&amp;lt;species&amp;gt;&amp;quot; # ensembl biomart species used to get the mitochondrial genes for that species&lt;br /&gt;
&lt;br /&gt;
# threshold doublet score (should be at the minimum between two modes of the simulated doublet histogram)&lt;br /&gt;
SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;lt;sample1&amp;gt;: &amp;lt;value&amp;gt;&lt;br /&gt;
  &amp;lt;sample2&amp;gt;: &amp;lt;empty&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;DATA&#039;&#039;&#039; - path to directory containing fastq files. Preferably, the files should be named in the format accepted by Cellranger Count &amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;. If they are, set &amp;lt;code&amp;gt;RENAME: n&amp;lt;/code&amp;gt;. If not, they should be in the format &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;_R1.fastq.gz&amp;lt;/code&amp;gt;. In this case, you should set &amp;lt;code&amp;gt;RENAME: y&amp;lt;/code&amp;gt; so that the pipeline renames the files to the format Cellranger Count requires. The path can be to a directory that contains a subdirectory per sample. For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;DATA&lt;br /&gt;
├── SAMPLE_1&lt;br /&gt;
│   ├──  sample_1_R1.fastq.gz&lt;br /&gt;
│   └──  sample_1_R2.fastq.gz&lt;br /&gt;
└── SAMPLE_2&lt;br /&gt;
   ├──  sample_2_R1.fastq.gz&lt;br /&gt;
   └──  sample_2_R2.fastq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or to a directory with fastqs for all samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;DATA&lt;br /&gt;
├── sample_1_R1.fastq.gz&lt;br /&gt;
├── sample_1_R2.fastq.gz&lt;br /&gt;
├── sample_2_R1.fastq.gz&lt;br /&gt;
└── sample_2_R2.fastq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;OUTDIR&#039;&#039;&#039; - directory where snakemake will run and where the results will be written to.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you don&#039;t want the results to be written to a new directory, open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;MKREF&#039;&#039;&#039; &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; if Cell Ranger doesn&#039;t provide a reference package for your species (currently, the species available are Human and Mouse). &amp;lt;code&amp;gt;n&amp;lt;/code&amp;gt; if you&#039;re using an existing reference package. In this case, you should create a directory in the pipeline directory named &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt;, and download the reference from [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;You only need to set the following if &amp;lt;code&amp;gt;MKREF: y&amp;lt;/code&amp;gt;:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;FASTA&#039;&#039;&#039; - path to fasta file&lt;br /&gt;
* &#039;&#039;&#039;GTF&#039;&#039;&#039; - path to GTF file&lt;br /&gt;
* &#039;&#039;&#039;PREFIX&#039;&#039;&#039; - name of your species; used to name the directory containing the reference package&lt;br /&gt;
* &#039;&#039;&#039;REF_VERSION&#039;&#039;&#039; - reference version string to include with the reference&lt;br /&gt;
* &#039;&#039;&#039;CR_MKREF_EXTRA&#039;&#039;&#039;: any other options for cellranger mkref. [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#singl see here]&lt;br /&gt;
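&lt;br /&gt;
As a rough sketch, the mkref-related settings for a non-model species could look like the following (all values are illustrative placeholders, not tested defaults):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;# illustrative example - replace paths and names with your own&lt;br /&gt;
MKREF: y&lt;br /&gt;
FASTA: /path/to/genome.fa&lt;br /&gt;
GTF: /path/to/annotation.gtf&lt;br /&gt;
PREFIX: duck&lt;br /&gt;
REF_VERSION:&lt;br /&gt;
  - &amp;quot;--ref-version=1.0.0&amp;quot;&lt;br /&gt;
CR_MKREF_EXTRA: &amp;quot;&amp;quot;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
With &amp;lt;code&amp;gt;PREFIX: duck&amp;lt;/code&amp;gt;, the reference package would end up in a directory named &amp;lt;code&amp;gt;duck_genome&amp;lt;/code&amp;gt;.&lt;br /&gt;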
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;PREFIX&#039;&#039;&#039; - The name of your organism. The reference package used for cellranger count will be in the &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt; directory&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;RENAME&#039;&#039;&#039; - &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; if your input fastqs are not named in this format &amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;. Use &amp;lt;code&amp;gt;n&amp;lt;/code&amp;gt; if they are.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Options for Cellranger count:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;CR_COUNT_extra&#039;&#039;&#039; - any other options for cellranger count. [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count Find other options here]. [Default: &amp;amp;quot;&amp;amp;quot;]&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;QC parameters&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;MITO_PERCENTAGE&#039;&#039;&#039; - Keep cells with less than X% mitochondrial read fraction. [Default: 10]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;NUMBER_GENES_PER_CELL&#039;&#039;&#039; - keep cells with more than X genes. [Default: 500]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;NUMBER_UMI_PER_CELL&#039;&#039;&#039; - keep cells with more than X UMIs. [Default: 1000]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;ENSEMBLE_BIOMART_SPECIES&#039;&#039;&#039; - Ensembl BioMart species name used to fetch the mitochondrial genes&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Doublet removal score&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;SCRUB_THRESHOLD&#039;&#039;&#039; - doublet score threshold. It should be at the minimum between the two modes of the simulated doublet histogram. For the first run, set &amp;lt;code&amp;gt;SCRUB_THRESHOLD:&amp;lt;/code&amp;gt; with no values. Once that run is done, look at the &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt; plot for each sample and check whether the vertical line on the &amp;amp;quot;simulated doublets&amp;amp;quot; plot sits at the minimum between the two modes. If it doesn&#039;t, set the threshold manually in the config file as:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;lt;sample 1&amp;gt;: &amp;lt;value&amp;gt;&lt;br /&gt;
  &amp;lt;sample 2&amp;gt;:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In &amp;lt;code&amp;gt;SCRUB_THRESHOLD&amp;lt;/code&amp;gt; there should be a line for each sample, even if you don&#039;t need to set the threshold for that sample. If you need to change the threshold, set &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;: &amp;amp;lt;value&amp;amp;gt;&amp;lt;/code&amp;gt;; if not, set &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;:&amp;lt;/code&amp;gt; (without a value).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Additional set up ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;install-cellranger&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Install Cellranger ===&lt;br /&gt;
&lt;br /&gt;
Follow the instructions [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation here]&lt;br /&gt;
&lt;br /&gt;
# First download the package from the [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest downloads page]. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;wget -O cellranger-6.1.2.tar.gz &amp;quot;https://cf.10xgenomics.com/releases/cell-exp/cellranger-6.1.2.tar.gz?Expires=1648081234&amp;amp;Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZi4xMHhnZW5vbWljcy5jb20vcmVsZWFzZXMvY2VsbC1leHAvY2VsbHJhbmdlci02LjEuMi50YXIuZ3oiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2NDgwODEyMzR9fX1dfQ__&amp;amp;Signature=HqCwx6eBEj~Lyw7C7UvsMAHzUH9aiPSM5yFcyflZiL2JRIwqzY2VWz1COtDQHNoJ48Ve41LZ5Q3eGv1yaAEf88SGhtxRUb2wJhFvvixBoR550bQ2wK7qfL6buLL9~u7MPw4q0-c1adXaSCm6otd6Xn0x2FIpZimOGJMYI9QEvNStN1Hi6MH4ZUOHGFFRBAvxlRxHmYBk-Vr~6qdc7nFXJW0C8OBWTn2g~XSKZRD50B5G5StMis0lLmgXZbRS0htQu8LPuUp8ZxqxQv20m9-HV9jEDVYEUP1sNJzAHGhAtq1FajN572Lptq0cWES8fheMexht1l-wRbQA-yOKAp7Bzg__&amp;amp;Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&amp;quot;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Attention: the &amp;lt;code&amp;gt;cellranger-6.1.2.tar.gz&amp;lt;/code&amp;gt; name is an example; a more recent version may be available when you use this pipeline.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Unpack Cellranger:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;tar -xzvf cellranger-6.1.2.tar.gz&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This will create a new directory, &amp;lt;code&amp;gt;cellranger-6.1.2&amp;lt;/code&amp;gt;, that contains cellranger and its dependencies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol start=&amp;quot;3&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Place the path to the &amp;lt;code&amp;gt;cellranger-6.1.2&amp;lt;/code&amp;gt; (or the version you installed) in the config.yaml file. It should look like this&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;CELLRANGER_PATH: /path/to/cellranger-6.1.2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;don&#039;t add a trailing slash (&amp;amp;quot;/&amp;amp;quot;) after the directory name&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;reference-package&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Reference package ===&lt;br /&gt;
&lt;br /&gt;
If you&#039;re working with human or mouse data, download the reference from [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest here] and place it in a folder in the pipeline directory called &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt;.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you&#039;re working with another organism, download the fasta and GTF files for your organism and place them in a directory called &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt; in the pipeline&#039;s main directory (where the Snakefile is). You can download these from [https://www.ensembl.org/index.html Ensembl].&lt;br /&gt;
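&lt;br /&gt;
For example, fetching and placing the files could look roughly like this (the URLs are placeholders - copy the real links for your species and Ensembl release from the Ensembl site):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;# create the genome directory next to the Snakefile&lt;br /&gt;
mkdir -p &amp;lt;prefix&amp;gt;_genome&lt;br /&gt;
cd &amp;lt;prefix&amp;gt;_genome&lt;br /&gt;
# placeholder URLs - get the real ones from Ensembl&lt;br /&gt;
wget http://ftp.ensembl.org/pub/release-&amp;lt;release&amp;gt;/fasta/&amp;lt;species&amp;gt;/dna/&amp;lt;assembly&amp;gt;.dna.toplevel.fa.gz&lt;br /&gt;
wget http://ftp.ensembl.org/pub/release-&amp;lt;release&amp;gt;/gtf/&amp;lt;species&amp;gt;/&amp;lt;annotation&amp;gt;.gtf.gz&lt;br /&gt;
gunzip *.gz&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;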
&lt;br /&gt;
&amp;lt;span id=&amp;quot;how-to-run&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== How to run ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Dry run to check if everything was correctly set up and if the pipeline is ready to run&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake -np&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If all looks good, run the pipeline with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake --profile &amp;lt;name of hpc profile&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Once you have the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; results for each sample, you should look at the histogram, either in the Jupyter notebook &amp;lt;code&amp;gt;4_Doublets/processed_notebook_&amp;amp;lt;sample&amp;amp;gt;.ipynb&amp;lt;/code&amp;gt; or in the saved plot &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt;. The vertical bar (threshold) in the &amp;amp;quot;simulated doublets&amp;amp;quot; plot should be at the lowest point between the two modes. If not, you&#039;ll need to set it in the config file. Only change the threshold for the samples that need it; for some, the automatically set threshold may be good. In the config file, add your sample name (the same as the directory names in &amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt;) followed by the new threshold or, if you don&#039;t need to change the threshold for that sample, add the sample name with nothing after it.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;lt;sample 1&amp;gt;: &amp;lt;value&amp;gt;&lt;br /&gt;
  &amp;lt;sample 2&amp;gt;:&lt;br /&gt;
  &amp;lt;sample 3&amp;gt;: &amp;lt;value&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In this case, the scrublet step for &amp;lt;code&amp;gt;sample 1&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;sample 3&amp;lt;/code&amp;gt; will run with the user-defined threshold and &amp;lt;code&amp;gt;sample 2&amp;lt;/code&amp;gt; will run with the automatically defined threshold.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;After changing the thresholds, you&#039;ll need to run the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step again. Before you do this, you need to copy the previous results (&amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt;) to another directory, or you need to delete those results. Example to move the results to another directory:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;mkdir -p 4_Doublets_first_run&lt;br /&gt;
mv 4_Doublets/* 4_Doublets_first_run&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;5&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Once that&#039;s done you can rerun the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake -np --forcerun remove_doublets&lt;br /&gt;
snakemake --profile &amp;lt;profile name&amp;gt; --forcerun remove_doublets&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
You will have results for each step of the pipeline.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;1_renamed: directory with softlinks to the renamed fastq files&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;2_ambient_RNA_correction: directory containing results from ambient RNA correction&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Ambient_RNA_correction_&amp;lt;sample&amp;gt;.html: shows the code used for the ambient RNA correction, as well as a few plots that illustrate this process - for the 5 most affected genes and for 5 random genes:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Plot 1: in which cells the gene is expressed&amp;lt;br /&amp;gt;&lt;br /&gt;
Plot 2: ratio of observed to expected counts&amp;lt;br /&amp;gt;&lt;br /&gt;
Plot 3: change in expression due to correction&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;2_ambient_RNA_correction_data: the ambient RNA corrected data for each sample, in its corresponding subdirectory.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;3_QC: directory containing results from QC:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;processed_notebook_&amp;lt;sample&amp;gt;.ipynb - Jupyter notebooks used to calculate QC for each sample. These are interactive and can be used to do further QC.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;sample&amp;gt;.h5ad - Filtered data.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;sample&amp;gt; - Directory with QC plots&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;description-of-the-qc-plots&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
===== Description of the QC plots =====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Before filtering:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Violin plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;n_genes_by_counts: number of genes with positive counts in a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts: total number of counts for a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;pct_counts_mt: proportion of mitochondrial counts for a cell&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Scatter plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts vs pct_counts_mt&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts vs n_genes_by_counts&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
After filtering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Percentage of counts per gene for the top 20 genes after filtering&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Violin plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;n_genes_by_counts: number of genes with positive counts in a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts: total number of counts for a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;pct_counts_mt: proportion of mitochondrial counts for a cell&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
These Jupyter notebooks are interactive and can be used to do further QC.&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
* 4_Doublets: directory containing results from doublet removal&lt;br /&gt;
** processed_notebook_&amp;lt;sample&amp;gt;.ipynb - Jupyter notebooks used to do the doublet removal step for each sample. These are interactive and can be used to test different scrublet thresholds&lt;br /&gt;
** &amp;lt;sample&amp;gt; - directory with saved plots&lt;br /&gt;
** &amp;lt;sample&amp;gt;_doublets.h5ad - filtered data&lt;br /&gt;
* &amp;amp;lt;sample&amp;amp;gt; - directory containing output from Cellranger count. See [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/overview here] for more information&lt;br /&gt;
** outs&lt;br /&gt;
*** web_summary.html - summary metrics and automated secondary analysis results. If an issue was detected during the pipeline run, an alert appears on this page.&lt;br /&gt;
*** metrics_summary.csv - metrics like &amp;amp;quot;estimated number of cells&amp;amp;quot;&lt;br /&gt;
*** possorted_genome_bam.bam - BAM file containing position-sorted reads aligned to the genome and transcriptome, as well as unaligned reads. Each read in this BAM file has Chromium cellular and molecular barcode information attached.&lt;br /&gt;
*** raw_feature_bc_matrix.h5 - Contains every barcode from the fixed list of known-good barcode sequences that has at least one read. This includes background and cell associated barcodes&lt;br /&gt;
*** filtered_feature_bc_matrix.h5 - Contains only detected cell-associated barcodes. For Targeted Gene Expression samples, non-targeted genes are removed from the filtered matrix.&lt;br /&gt;
*** analysis - directory containing secondary analysis results: clustering, differential expression analysis, PCA, t-SNE, UMAP&lt;br /&gt;
*** molecule_info.h5 - contains per-molecule information for all molecules that contain a valid barcode, a valid UMI, and were assigned with high confidence to a gene or Feature Barcode.&lt;br /&gt;
*** cloupe.cloupe - file to be used with [https://support.10xgenomics.com/single-cell-gene-expression/software/visualization/latest/what-is-loupe-cell-browser Loupe Browser]&lt;br /&gt;
&amp;lt;span id=&amp;quot;common-issues&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Common issues ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;only-see-create_file_log-and-combine_cellranger_counter_metrics-when-running-snakemake--np&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Only see &amp;lt;code&amp;gt;create_file_log&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;combine_cellranger_counter_metrics&amp;lt;/code&amp;gt; when running &amp;lt;code&amp;gt;snakemake -np&amp;lt;/code&amp;gt; ===&lt;br /&gt;
&lt;br /&gt;
This can happen when your fastq files are not named correctly and the &amp;lt;code&amp;gt;RENAME&amp;lt;/code&amp;gt; option is not set to &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Single_Cell_preprocessing_pipeline&amp;diff=2173</id>
		<title>Single Cell preprocessing pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Single_Cell_preprocessing_pipeline&amp;diff=2173"/>
		<updated>2022-07-18T08:54:06Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added documentation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/single-cell-data-processing here]&lt;br /&gt;
&lt;br /&gt;
= Single Cell preprocessing pipeline =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/single-cell-data-processing&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline includes the first steps in the analysis of Single-cell data.&amp;lt;br /&amp;gt;&lt;br /&gt;
The first step is getting the reference package for your species. This will be used for read alignment and gene expression quantification. If you&#039;re working with human or mouse, you can download the reference from the Cellranger website; otherwise, the pipeline can create the reference for you (details below).&lt;br /&gt;
&lt;br /&gt;
Once you have the reference package, the pipeline starts by running [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count Cellranger count]. Cellranger count performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis.&amp;lt;br /&amp;gt;&lt;br /&gt;
If the fastq files are not named in the format accepted by Cellranger count (&amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;), you can either set &amp;lt;code&amp;gt;RENAME: y&amp;lt;/code&amp;gt; in the config file to have the pipeline rename them, or rename them yourself to follow this naming convention.&lt;br /&gt;
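&lt;br /&gt;
If you prefer to rename the files yourself, a minimal sketch (assuming one lane and paired gzipped R1/R2 files per sample) could be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;# rename &amp;lt;sample&amp;gt;_R1.fastq.gz / &amp;lt;sample&amp;gt;_R2.fastq.gz to the Cellranger naming scheme&lt;br /&gt;
for f in *_R1.fastq.gz; do&lt;br /&gt;
  sample=${f%_R1.fastq.gz}&lt;br /&gt;
  mv &amp;quot;${sample}_R1.fastq.gz&amp;quot; &amp;quot;${sample}_S1_L001_R1_001.fastq.gz&amp;quot;&lt;br /&gt;
  mv &amp;quot;${sample}_R2.fastq.gz&amp;quot; &amp;quot;${sample}_S1_L001_R2_001.fastq.gz&amp;quot;&lt;br /&gt;
done&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;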
&lt;br /&gt;
The metrics from Cellranger count for all samples are combined into one file &amp;lt;code&amp;gt;cellranger_count_metrics_allsamples.tsv&amp;lt;/code&amp;gt;. This will have information such as &amp;amp;quot;estimated number of cells&amp;amp;quot;, and &amp;amp;quot;mean reads per cell&amp;amp;quot;.&lt;br /&gt;
&lt;br /&gt;
After the Cellranger count step, it&#039;s important to remove the ambient RNA, which is RNA that has been released from degraded or dying cells and is now in the cell suspension. The R package [https://github.com/constantAmateur/SoupX SoupX] is used to correct for ambient RNA. In addition to the output files with the corrected data, one html document is created per sample processed (&amp;lt;code&amp;gt;2_ambient_RNA_correction/Ambient_RNA_correction_&amp;amp;lt;sample&amp;amp;gt;.html&amp;lt;/code&amp;gt;). This html file shows the code used to perform the ambient RNA correction, as well as a few plots that illustrate this process - for the 5 most affected genes and for 5 random genes:&lt;br /&gt;
&lt;br /&gt;
* Plot 1: in which cells the gene is expressed&lt;br /&gt;
* Plot 2: ratio of observed to expected counts&lt;br /&gt;
* Plot 3: change in expression due to correction&lt;br /&gt;
&lt;br /&gt;
Once the data has been corrected for ambient RNA, it&#039;s time for quality control filtering. This step depends on the cell type, the library preparation method used, etc., so you should always check whether the default parameters make sense, use your own, or even run the step several times with different ones.&lt;br /&gt;
&lt;br /&gt;
QC is run for every sample separately. First [https://scanpy.readthedocs.io/en/stable/ Scanpy] calculates some general QC metrics for genes and cells. It will also calculate the proportion of counts for mitochondrial genes. Several plots will be created to help assess the quality of the data:&amp;lt;br /&amp;gt;&lt;br /&gt;
Before filtering:&lt;br /&gt;
&lt;br /&gt;
* Violin plots showing:&lt;br /&gt;
** n_genes_by_counts: number of genes with positive counts in a cell&lt;br /&gt;
** total_counts: total number of counts for a cell&lt;br /&gt;
** pct_counts_mt: proportion of mitochondrial counts for a cell&lt;br /&gt;
* Scatter plots showing:&lt;br /&gt;
** total_counts vs pct_counts_mt&lt;br /&gt;
** total_counts vs n_genes_by_counts&lt;br /&gt;
&lt;br /&gt;
After filtering:&lt;br /&gt;
&lt;br /&gt;
* Percentage of counts per gene for the top 20 genes after filtering&lt;br /&gt;
* Violin plots showing:&lt;br /&gt;
** n_genes_by_counts: number of genes with positive counts in a cell&lt;br /&gt;
** total_counts: total number of counts for a cell&lt;br /&gt;
** pct_counts_mt: proportion of mitochondrial counts for a cell&lt;br /&gt;
&lt;br /&gt;
The final preprocessing step is doublet removal with [https://github.com/swolock/scrublet Scrublet]. This step may be run more than once to determine the ideal doublet score threshold. The histogram shown in &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt; should show a bimodal distribution, and the threshold shown in the &amp;amp;quot;simulated doublets&amp;amp;quot; plot should be at the minimum between the two modes. The first run should be with an empty parameter &amp;lt;code&amp;gt;SCRUB_THRESHOLD:&amp;lt;/code&amp;gt;. Once the first run has finished, you should look at the histograms of all samples and see if you need to change the threshold. If you do, set it in the config file as explained further below and run the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step again.&lt;br /&gt;
&lt;br /&gt;
There will be an &amp;lt;code&amp;gt;h5ad&amp;lt;/code&amp;gt; object containing the preprocessed data for each sample - ambient RNA removed, QC filtered and doublets removed - in the &amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt; directory.&amp;lt;br /&amp;gt;&lt;br /&gt;
This file can be used for further analysis.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* Cellranger:&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkgtf mkgtf] - filter GTF. [default: off]&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkref mkref] - create reference&lt;br /&gt;
** [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count count] - create feature counts&lt;br /&gt;
* R&lt;br /&gt;
** combine Cellranger count sample metrics&lt;br /&gt;
** [https://github.com/constantAmateur/SoupX SoupX] - remove ambient RNA&lt;br /&gt;
* Python&lt;br /&gt;
** [https://scanpy.readthedocs.io/en/stable/index.html Scanpy] - QC filtering&lt;br /&gt;
** [https://github.com/swolock/scrublet Scrublet] - Doublet removal&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [https://github.com/CarolinaPB/single-cell-data-processing/blob/master/workflow.png Pipeline workflow (DAG)]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files-and-set-parameters&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files and set parameters ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;DATA: /path/to/data/dir&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
&lt;br /&gt;
# mkref options&lt;br /&gt;
MKREF: &amp;lt;y/n&amp;gt;&lt;br /&gt;
FASTA: /path/to/fasta&lt;br /&gt;
GTF: /path/to/gtf # if creating own reference&lt;br /&gt;
REF_VERSION: &lt;br /&gt;
  - &amp;quot;--ref-version=&amp;lt;version&amp;gt;&amp;quot;&lt;br /&gt;
CR_MKREF_EXTRA: &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Filter GTF&lt;br /&gt;
FILTER_GTF: y&lt;br /&gt;
# # see here for available biotypes https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkgtf&lt;br /&gt;
ATTRIBUTES:&lt;br /&gt;
  - &amp;quot;--attribute=&amp;lt;biotype&amp;gt;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
PREFIX: &amp;lt;species&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# rename fastq files&lt;br /&gt;
RENAME: &amp;lt;y/n&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Cell ranger count options &lt;br /&gt;
# https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count&lt;br /&gt;
CR_COUNT_extra: &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# QC parameters&lt;br /&gt;
MITO_PERCENTAGE: 10 # keep cells with less than X% mitochondrial read fraction&lt;br /&gt;
NUMBER_GENES_PER_CELL: 500 # keep cells with more than X genes&lt;br /&gt;
NUMBER_UMI_PER_CELL: 1000 # keep cells with more than X UMIs&lt;br /&gt;
ENSEMBLE_BIOMART_SPECIES: &amp;quot;&amp;lt;species&amp;gt;&amp;quot; # ensembl biomart species used to get the mitochondrial genes for that species&lt;br /&gt;
&lt;br /&gt;
# threshold doublet score (should be at the minimum between two modes of the simulated doublet histogram)&lt;br /&gt;
SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;lt;sample1&amp;gt;: &amp;lt;value&amp;gt;&lt;br /&gt;
  &amp;lt;sample2&amp;gt;: &amp;lt;empty&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;DATA&#039;&#039;&#039; - path to directory containing fastq files. Preferably, the files should be named in the format accepted by Cellranger Count &amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;. If they are, set &amp;lt;code&amp;gt;RENAME: n&amp;lt;/code&amp;gt;. If not, they should be in the format &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;_R1.fastq.gz&amp;lt;/code&amp;gt;. In this case, you should set &amp;lt;code&amp;gt;RENAME: y&amp;lt;/code&amp;gt; so that the pipeline renames the files to the format Cellranger Count requires. The path can be to a directory that contains a subdirectory per sample. For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;DATA&lt;br /&gt;
├── SAMPLE_1&lt;br /&gt;
│   ├──  sample_1_R1.fastq.gz&lt;br /&gt;
│   └──  sample_1_R2.fastq.gz&lt;br /&gt;
└── SAMPLE_2&lt;br /&gt;
    ├──  sample_2_R1.fastq.gz&lt;br /&gt;
    └──  sample_2_R2.fastq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;
Or to a directory with fastqs for all samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;DATA&lt;br /&gt;
├── sample_1_R1.fastq.gz&lt;br /&gt;
├── sample_1_R2.fastq.gz&lt;br /&gt;
├── sample_2_R1.fastq.gz&lt;br /&gt;
└── sample_2_R2.fastq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;
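The renaming that &amp;lt;code&amp;gt;RENAME: y&amp;lt;/code&amp;gt; triggers can be sketched roughly as below. This is illustrative only: the demo file names and the &amp;lt;code&amp;gt;renamed&amp;lt;/code&amp;gt; directory are assumptions, not the pipeline&#039;s actual implementation.&lt;br /&gt;

```shell
# Illustrative sketch (not part of the pipeline): softlink <sample>_R1.fastq.gz
# style names into the [Sample]_S1_L001_[Read]_001.fastq.gz layout that
# Cellranger count expects. The demo files below are made up.
mkdir -p demo_data renamed
touch demo_data/sample_1_R1.fastq.gz demo_data/sample_1_R2.fastq.gz
for f in demo_data/*_R[12].fastq.gz; do
    base=$(basename "$f")
    sample=${base%_R[12].fastq.gz}      # e.g. sample_1
    readtype=${base#"${sample}"_}       # R1.fastq.gz
    readtype=${readtype%.fastq.gz}      # R1
    ln -sf "$PWD/$f" "renamed/${sample}_S1_L001_${readtype}_001.fastq.gz"
done
ls renamed
```

The pipeline does this renaming for you; the sketch is only meant to show the mapping between the two naming schemes.&lt;br /&gt;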
* &#039;&#039;&#039;OUTDIR&#039;&#039;&#039; - directory where snakemake will run and where the results will be written to.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you don&#039;t want the results to be written to a new directory, open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;MKREF&#039;&#039;&#039; &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; if Cell Ranger doesn&#039;t provide a reference package for your species (currently, the species available are Human and Mouse). &amp;lt;code&amp;gt;n&amp;lt;/code&amp;gt; if you&#039;re using an existing reference package. In this case, you should create a directory in the pipeline directory named &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt;, and download the reference from [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;You only need to set the following if &amp;lt;code&amp;gt;MKREF: y&amp;lt;/code&amp;gt;:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;FASTA&#039;&#039;&#039; - path to fasta file&lt;br /&gt;
* &#039;&#039;&#039;GTF&#039;&#039;&#039;: path to gtf file&lt;br /&gt;
* &#039;&#039;&#039;PREFIX&#039;&#039;&#039;: name of your species. Used to name the directory containing the reference package&lt;br /&gt;
* &#039;&#039;&#039;REF_VERSION&#039;&#039;&#039;: Reference version string to include with reference&lt;br /&gt;
* &#039;&#039;&#039;CR_MKREF_EXTRA&#039;&#039;&#039;: any other options for cellranger mkref. [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#singl see here]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;PREFIX&#039;&#039;&#039; - The name of your organism. The reference package used for cellranger count will be in the &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt; directory&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;RENAME&#039;&#039;&#039; - &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; if your input fastqs are not named in this format &amp;lt;code&amp;gt;[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz&amp;lt;/code&amp;gt;. Use &amp;lt;code&amp;gt;n&amp;lt;/code&amp;gt; if they are.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Options for Cellranger count:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;CR_COUNT_extra&#039;&#039;&#039; - any other options for cellranger count. [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count Find other options here]. [Default: &amp;amp;quot;&amp;amp;quot;]&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;QC parameters&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;MITO_PERCENTAGE&#039;&#039;&#039; - Keep cells with less than X% mitochondrial read fraction. [Default: 10]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;NUMBER_GENES_PER_CELL&#039;&#039;&#039; - keep cells with more than X genes. [Default: 500]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;NUMBER_UMI_PER_CELL&#039;&#039;&#039; - keep cells with more than X UMIs. [Default: 1000]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&#039;&#039;&#039;ENSEMBLE_BIOMART_SPECIES&#039;&#039;&#039; - ensembl biomart species used to get the mitochondrial genes&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Doublet removal score&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;SCRUB_THRESHOLD&#039;&#039;&#039; - threshold doublet score. It should be at the minimum between the two modes of the simulated doublet histogram. For the first run, set it as &amp;lt;code&amp;gt;SCRUB_THRESHOLD:&amp;lt;/code&amp;gt; (with no parameters). Once that run is done, look at the &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt; plot for each sample and check whether the vertical line on the &amp;amp;quot;simulated doublets&amp;amp;quot; plot is at the minimum between the two modes. If it&#039;s not, you should manually set it in the config file as:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;amp;lt;sample 1&amp;amp;gt;: &amp;amp;lt;value&amp;amp;gt;&lt;br /&gt;
  &amp;amp;lt;sample 2&amp;amp;gt;: &amp;amp;lt;empty&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In &amp;lt;code&amp;gt;SCRUB_THRESHOLD&amp;lt;/code&amp;gt; there should be a line for each sample, even if you don&#039;t need to set the threshold for that sample. If you need to change the threshold, set &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;: &amp;amp;lt;value&amp;amp;gt;&amp;lt;/code&amp;gt;; if not, set &amp;lt;code&amp;gt;&amp;amp;lt;sample&amp;amp;gt;:&amp;lt;/code&amp;gt; (without a value).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Additional set up ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;install-cellranger&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Install Cellranger ===&lt;br /&gt;
&lt;br /&gt;
Follow the instructions [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation here]&lt;br /&gt;
&lt;br /&gt;
# First download the package from the [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest downloads page]. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;wget -O cellranger-6.1.2.tar.gz &amp;quot;https://cf.10xgenomics.com/releases/cell-exp/cellranger-6.1.2.tar.gz?Expires=1648081234&amp;amp;Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZi4xMHhnZW5vbWljcy5jb20vcmVsZWFzZXMvY2VsbC1leHAvY2VsbHJhbmdlci02LjEuMi50YXIuZ3oiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2NDgwODEyMzR9fX1dfQ__&amp;amp;Signature=HqCwx6eBEj~Lyw7C7UvsMAHzUH9aiPSM5yFcyflZiL2JRIwqzY2VWz1COtDQHNoJ48Ve41LZ5Q3eGv1yaAEf88SGhtxRUb2wJhFvvixBoR550bQ2wK7qfL6buLL9~u7MPw4q0-c1adXaSCm6otd6Xn0x2FIpZimOGJMYI9QEvNStN1Hi6MH4ZUOHGFFRBAvxlRxHmYBk-Vr~6qdc7nFXJW0C8OBWTn2g~XSKZRD50B5G5StMis0lLmgXZbRS0htQu8LPuUp8ZxqxQv20m9-HV9jEDVYEUP1sNJzAHGhAtq1FajN572Lptq0cWES8fheMexht1l-wRbQA-yOKAp7Bzg__&amp;amp;Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&amp;quot;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Attention: the &amp;lt;code&amp;gt;cellranger-6.1.2.tar.gz&amp;lt;/code&amp;gt; name is an example; a more recent version may be available when you use this pipeline.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Unpack Cellranger:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;tar -xzvf cellranger-6.1.2.tar.gz&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This will create a new directory, &amp;lt;code&amp;gt;cellranger-6.1.2&amp;lt;/code&amp;gt;, that contains cellranger and its dependencies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol start=&amp;quot;3&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Place the path to the &amp;lt;code&amp;gt;cellranger-6.1.2&amp;lt;/code&amp;gt; directory (or the version you installed) in the config.yaml file. It should look like this:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;CELLRANGER_PATH: /path/to/cellranger-6.1.2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;don&#039;t add a trailing slash (&amp;amp;quot;/&amp;amp;quot;) after the directory name&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;reference-package&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Reference package ===&lt;br /&gt;
&lt;br /&gt;
If you&#039;re working with human or mouse data, download the reference from [https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest here] and place it in a folder in the pipeline directory called &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt;.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you&#039;re working with another organism, download the fasta file and gtf file for your organism and place them in a directory called &amp;lt;code&amp;gt;&amp;amp;lt;prefix&amp;amp;gt;_genome&amp;lt;/code&amp;gt; (it should be in the pipeline&#039;s main directory, where the Snakefile is). You can download these from [https://www.ensembl.org/index.html Ensembl].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;how-to-run&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== How to run ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Dry run to check if everything was correctly set up and if the pipeline is ready to run&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake -np&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If all looks good, run the pipeline with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake --profile &amp;lt;name of hpc profile&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Once you have the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; results for each sample, you should look at the histogram, either in the jupyter notebook &amp;lt;code&amp;gt;4_Doublets/processed_notebook_&amp;amp;lt;sample&amp;amp;gt;.ipynb&amp;lt;/code&amp;gt; or in the saved plot &amp;lt;code&amp;gt;4_Doublets/&amp;amp;lt;sample&amp;amp;gt;/histogram_&amp;amp;lt;sample&amp;amp;gt;_doublets.pdf&amp;lt;/code&amp;gt;. The vertical bar (threshold) in the &amp;amp;quot;simulated doublets&amp;amp;quot; plot should be at the lowest point between the two modes. If it isn&#039;t, you&#039;ll need to set it in the config file. Only change the threshold for the samples that need it; for some, the automatically set threshold may be fine. Edit the config file and add your sample name (the same as the directory names in &amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt;) followed by the new threshold or, if you don&#039;t need to set a threshold, the sample name with nothing after it.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;SCRUB_THRESHOLD: &lt;br /&gt;
  &amp;lt;sample 1&amp;gt;: &amp;lt;value&amp;gt;&lt;br /&gt;
  &amp;lt;sample 2&amp;gt;:&lt;br /&gt;
  &amp;lt;sample 3&amp;gt;: &amp;lt;value&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In this case, the scrublet step for &amp;lt;code&amp;gt;sample 1&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;sample 3&amp;lt;/code&amp;gt; will run with the user-defined threshold and &amp;lt;code&amp;gt;sample 2&amp;lt;/code&amp;gt; will run with the automatically defined threshold.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;After changing the thresholds, you&#039;ll need to run the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step again. Before you do, move the previous results (&amp;lt;code&amp;gt;4_Doublets&amp;lt;/code&amp;gt;) to another directory, or delete them. Example of moving the results to another directory:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;mkdir -p 4_Doublets_first_run&lt;br /&gt;
mv 4_Doublets/* 4_Doublets_first_run&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;5&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Once that&#039;s done you can rerun the &amp;lt;code&amp;gt;remove_doublets&amp;lt;/code&amp;gt; step with&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;snakemake -np --forcerun remove_doublets&lt;br /&gt;
snakemake --profile &amp;lt;profile name&amp;gt; --forcerun remove_doublets&amp;lt;/syntaxhighlight&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
You will have results for each step of the pipeline.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;1_renamed: directory with softlinks to the renamed fastq files&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;2_ambient_RNA_correction: directory containing results from ambient RNA correction&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Ambient_RNA_correction_&amp;lt;sample&amp;gt;.html: shows the code used for the ambient RNA correction, as well as a few plots that illustrate this process - for the 5 most affected genes and for 5 random genes:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Plot 1: in which cells the gene is expressed&amp;lt;br /&amp;gt;&lt;br /&gt;
Plot 2: ratio of observed to expected counts&amp;lt;br /&amp;gt;&lt;br /&gt;
Plot 3: change in expression due to correction&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;2_ambient_RNA_correction_data: inside, the ambient RNA corrected data for each sample is in its corresponding directory.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;3_QC: directory containing results from QC:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;processed_notebook_&amp;lt;sample&amp;gt;.ipynb - Jupyter notebooks used to calculate QC for each sample. These are interactive and can be used to do further QC.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;sample&amp;gt;.h5ad - Filtered data.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;sample&amp;gt; - Directory with QC plots&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;description-of-the-qc-plots&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
===== Description of the QC plots =====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Before filtering:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Violin plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;n_genes_by_counts: number of genes with positive counts in a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts: total number of counts for a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;pct_counts_mt: proportion of mitochondrial counts for a cell&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Scatter plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts vs pct_counts_mt&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts vs n_genes_by_counts&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
After filtering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Percentage of counts per gene for the top 20 genes after filtering&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Violin plots showing:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;n_genes_by_counts: number of genes with positive counts in a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;total_counts: total number of counts for a cell&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;pct_counts_mt: proportion of mitochondrial counts for a cell&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
These jupyter notebooks are interactive and can be used to do further QC.&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
* 4_Doublets: directory containing results from doublet removal&lt;br /&gt;
** processed_notebook_&amp;lt;sample&amp;gt;.ipynb - Jupyter notebooks used to do the doublet removal step for each sample. These are interactive and can be used to test different scrublet thresholds&lt;br /&gt;
** &amp;lt;sample&amp;gt; - directory with saved plots&lt;br /&gt;
** &amp;lt;sample&amp;gt;_doublets.h5ad - filtered data&lt;br /&gt;
* &amp;amp;lt;sample&amp;amp;gt; - directory containing output from Cellranger count. See [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/overview here] for more information&lt;br /&gt;
** outs&lt;br /&gt;
*** web_summary.html - summary metrics and automated secondary analysis results. If an issue was detected during the pipeline run, an alert appears on this page.&lt;br /&gt;
*** metrics_summary.csv - metrics like &amp;amp;quot;estimated number of cells&amp;amp;quot;&lt;br /&gt;
*** possorted_genome_bam.bam - BAM file containing position-sorted reads aligned to the genome and transcriptome, as well as unaligned reads. Each read in this BAM file has Chromium cellular and molecular barcode information attached.&lt;br /&gt;
*** raw_feature_bc_matrix.h5 - Contains every barcode from the fixed list of known-good barcode sequences that has at least one read. This includes background and cell associated barcodes&lt;br /&gt;
*** filtered_feature_bc_matrix.h5 - Contains only detected cell-associated barcodes. For Targeted Gene Expression samples, non-targeted genes are removed from the filtered matrix.&lt;br /&gt;
*** analysis - directory containing secondary analysis results: clustering, differential expression analysis, PCA, t-SNE, UMAP&lt;br /&gt;
*** molecule_info.h5 - contains per-molecule information for all molecules that contain a valid barcode, a valid UMI, and were assigned with high confidence to a gene or Feature Barcode.&lt;br /&gt;
*** cloupe.cloupe - file to be used with [https://support.10xgenomics.com/single-cell-gene-expression/software/visualization/latest/what-is-loupe-cell-browser Loupe Browser]&lt;br /&gt;
&amp;lt;span id=&amp;quot;common-issues&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Common issues ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;only-see-create_file_log-and-combine_cellranger_counter_metrics-when-running-snakemake--np&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Only see &amp;lt;code&amp;gt;create_file_log&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;combine_cellranger_counter_metrics&amp;lt;/code&amp;gt; when running &amp;lt;code&amp;gt;snakemake -np&amp;lt;/code&amp;gt; ===&lt;br /&gt;
&lt;br /&gt;
This can happen if your fastq files are not named correctly and the &amp;lt;code&amp;gt;RENAME&amp;lt;/code&amp;gt; option is not set to &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2171</id>
		<title>Nanopore assembly and variant calling</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2171"/>
		<updated>2022-06-13T14:36:22Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added link to github repo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/nanopore-assembly here]&lt;br /&gt;
= Assemble nanopore reads and do variant calling with short and long reads =&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/nanopore-assembly&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline that uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create a nanopore assembly. It also does variant calling with long and short reads.&amp;lt;br /&amp;gt;&lt;br /&gt;
The pipeline starts by using &amp;lt;code&amp;gt;porechop&amp;lt;/code&amp;gt; to trim the adaptors, then it uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create the assembly. After that, &amp;lt;code&amp;gt;ntLink-arks&amp;lt;/code&amp;gt; from &amp;lt;code&amp;gt;LongStitch&amp;lt;/code&amp;gt; is used to scaffold the assembly using the nanopore reads. The scaffolded assembly is polished with &amp;lt;code&amp;gt;polca&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;Bwa-mem2&amp;lt;/code&amp;gt; is used to map the short reads to the assembly and &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; to do variant calling using these reads. &amp;lt;code&amp;gt;Minimap2&amp;lt;/code&amp;gt; is used to map the long reads to the assembly, and &amp;lt;code&amp;gt;longshot&amp;lt;/code&amp;gt; for variant calling using those reads. In the end, in addition to your assembly and variant calling results, you&#039;ll also get assembly statistics and BUSCO scores before and after polishing.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/rrwick/Porechop Porechop] - trim adaptors&lt;br /&gt;
* [https://github.com/fenderglass/Flye Flye] - assembly&lt;br /&gt;
* [https://github.com/lh3/seqtk Seqtk] - convert fasta to one line fasta&lt;br /&gt;
* [https://github.com/bcgsc/longstitch LongStitch (ntLink-arks)] - scaffolding with nanopore reads&lt;br /&gt;
* [https://busco.ezlab.org/ BUSCO] - assess assembly completeness&lt;br /&gt;
* [https://github.com/alekseyzimin/masurca MaSuRCA (polca)] - polish assembly&lt;br /&gt;
* Python - get assembly stats&lt;br /&gt;
* [https://github.com/lh3/minimap2 Minimap2] - map long reads to reference. Genome alignment&lt;br /&gt;
* [http://www.htslib.org/ Samtools] - sort and index mapped reads and vcf files&lt;br /&gt;
* [https://github.com/pjedge/longshot Longshot] - variant calling with nanopore reads&lt;br /&gt;
* [https://github.com/bwa-mem2/bwa-mem2 Bwa-mem2] - map short reads to reference&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* R - [https://github.com/tpoorten/dotPlotly pafCoordsDotPlotly] - plot genome alignment&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:nanopore-assembly-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;LONGREADS: &amp;amp;lt;nanopore_reads.fq.gz&amp;amp;gt;&lt;br /&gt;
SHORTREADS:&lt;br /&gt;
  - /path/to/short/reads_1.fq.gz&lt;br /&gt;
  - /path/to/short/reads_2.fq.gz&lt;br /&gt;
GENOME_SIZE: &amp;amp;lt;approximate genome size&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;prefix&amp;amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
BUSCO_LINEAGE:&lt;br /&gt;
  - &amp;amp;lt;lineage&amp;amp;gt;&lt;br /&gt;
&lt;br /&gt;
# genome alignment parameters:&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&amp;lt;/pre&amp;gt;&lt;br /&gt;
* LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)&lt;br /&gt;
* SHORTREADS - paths to short reads fq.gz&lt;br /&gt;
* GENOME_SIZE - approximate genome size &amp;lt;code&amp;gt;haploid genome size (bp)(e.g. &#039;3e9&#039; for human genome)&amp;lt;/code&amp;gt; from [https://github.com/bcgsc/longstitch#full-help-page longstitch]&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* BUSCO_LINEAGE - lineage used for busco. Can be one or more (one per line). To see available lineages run &amp;lt;code&amp;gt;busco --list-datasets&amp;lt;/code&amp;gt;&lt;br /&gt;
* COMPARISON_GENOME - genome for whole genome comparison. Add your species name and the path to the fasta file. ex: &amp;lt;code&amp;gt;chicken: /path/to/chicken.fna.gz&amp;lt;/code&amp;gt;. You can add several genomes, one on each line.&lt;br /&gt;
** If you don&#039;t want to run the genome alignment step, comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&amp;lt;/pre&amp;gt;&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails with an error, try lowering these thresholds; this usually means the alignments are not long enough to pass the cutoffs.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
If you have your long reads in several fastq files and need to create one compressed file with all the reads:&lt;br /&gt;
&lt;br /&gt;
# In your pipeline directory create one file with all the reads&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cat /path/to/fastq/directory/*.fastq &amp;amp;gt; &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compress the file you just created:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gzip &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
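The two steps above can also be combined into one command that streams straight into &amp;lt;code&amp;gt;gzip&amp;lt;/code&amp;gt;, so no uncompressed intermediate file is written. The sketch below uses made-up demo reads; replace the demo directory with your own fastq directory.&lt;br /&gt;

```shell
# Demo files are made up; point the glob at your real fastq directory.
mkdir -p demo_fastq
printf '@r1\nACGT\n+\nIIII\n' > demo_fastq/a.fastq
printf '@r2\nTTTT\n+\nIIII\n' > demo_fastq/b.fastq
# Concatenate and compress in one go:
cat demo_fastq/*.fastq | gzip > all_reads.fq.gz
zcat all_reads.fq.gz | wc -l    # 8 lines = 2 reads x 4 lines each
```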
&lt;br /&gt;
=== After installing the conda environment (first step of this guide), you&#039;ll need to edit the polca.sh file. ===&lt;br /&gt;
&lt;br /&gt;
First go to the directory where miniconda3 is installed (usually your home directory). Go to &amp;lt;code&amp;gt;/&amp;amp;lt;home&amp;amp;gt;/miniconda/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin&amp;lt;/code&amp;gt; and open the file &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt;. In my case the path looks like this: &amp;lt;code&amp;gt;/home/WUR/&amp;amp;lt;username&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin/&amp;lt;/code&amp;gt;. In your editor open &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; and replace this line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) $BASM.alignSorted 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
With this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) -o $BASM.alignSorted.bam 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
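If you prefer not to edit &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; by hand, the same substitution can be made with &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt;. The sketch below demos it on a local stand-in file; to apply it for real, point it at your own &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; (e.g. &amp;lt;code&amp;gt;~/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin/polca.sh&amp;lt;/code&amp;gt;) and keep a backup copy first.&lt;br /&gt;

```shell
# Demo on a local stand-in file; for the real fix, set POLCA to your env's
# bin/polca.sh and run "cp $POLCA $POLCA.bak" before editing.
POLCA=polca_demo.sh
printf '%s\n' '$SAMTOOLS sort -m $MEM -@ $NUM_THREADS <(samtools view -uhS $BASM.unSorted.sam) $BASM.alignSorted 2>>samtools.err && \' > "$POLCA"
# Insert "-o" and the ".bam" extension into the samtools sort output argument:
sed -i 's/\$BASM\.alignSorted 2/-o $BASM.alignSorted.bam 2/' "$POLCA"
grep -c 'alignSorted.bam' "$POLCA"    # 1
```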
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa&#039;&#039;&#039; final assembly&lt;br /&gt;
** assembly_stats_&amp;amp;lt;prefix&amp;amp;gt;.txt file with assembly statistics for the final assembly&lt;br /&gt;
** &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory with variant calling VCF files with long and short reads, as well as VCF stats&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz.stats&lt;br /&gt;
*** {prefix}_longreads.vcf.gz&lt;br /&gt;
*** {prefix}_longreads.vcf.gz.stats&lt;br /&gt;
Both the short-read and the long-read variant calling VCFs are filtered for &amp;lt;code&amp;gt;QUAL &amp;gt; 20&amp;lt;/code&amp;gt;.&lt;br /&gt;
Freebayes (short-read variant calling) is run with parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. For more details check the Snakefile.&lt;br /&gt;
** &#039;&#039;&#039;genome_alignment&#039;&#039;&#039; directory with results and figure from whole genome alignment&lt;br /&gt;
*** {prefix}_{species}.png&lt;br /&gt;
* &#039;&#039;&#039;mapped&#039;&#039;&#039; directory that contains the bam file with long reads mapped to the new assembly&lt;br /&gt;
** {prefix}_longreads.mapped.sorted.bam&lt;br /&gt;
* &#039;&#039;&#039;busco_{prefix}_before_polish_&#039;&#039;&#039; and &#039;&#039;&#039;busco_{prefix}_after_polish&#039;&#039;&#039; directories - contain busco results before and after polishing&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_before_polish.txt&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_after_polish.txt&lt;br /&gt;
* &#039;&#039;&#039;other_files&#039;&#039;&#039; - directory containing other files created during the pipeline&lt;br /&gt;
* &#039;&#039;&#039;assembly&#039;&#039;&#039; - directory containing files created during the assembly step&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2170</id>
		<title>Population variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2170"/>
		<updated>2022-06-13T14:34:13Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added author info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For up-to-date documentation see [https://github.com/CarolinaPB/population-variant-calling here]&lt;br /&gt;
&lt;br /&gt;
= Population level variant calling =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/population-variant-calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline takes short reads aligned to a genome (in &amp;lt;code&amp;gt;.bam&amp;lt;/code&amp;gt; format) and performs population-level variant calling with &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt;. It annotates the resulting VCF with VEP, computes VCF statistics, and computes and plots a PCA.&lt;br /&gt;
&lt;br /&gt;
It was developed to work with the results of [https://github.com/CarolinaPB/population-mapping this population mapping pipeline]. There are a few &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; requirements that you need to take into account if you don&#039;t use the mapping pipeline mentioned above to map your reads. You should make sure that:&lt;br /&gt;
&lt;br /&gt;
* Alignments have read groups&lt;br /&gt;
* Alignments are sorted&lt;br /&gt;
* Duplicates are marked&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/freebayes/freebayes#calling-variants-from-fastq-to-vcf here] for more details.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* [https://www.cog-genomics.org/plink/ Plink] - compute PCA&lt;br /&gt;
* R - Plot PCA&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:population-var-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;ASSEMBLY: /path/to/fasta&lt;br /&gt;
MAPPING_DIR: /path/to/bams/dir&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
SPECIES: &amp;lt;species&amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;lt;number of chromosomes&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to genome fasta file&lt;br /&gt;
* MAPPING_DIR - path to directory with bam files to be used&lt;br /&gt;
** the pipeline will use all bam files in the directory. If you want to use only a subset of those, create a file named &amp;lt;code&amp;gt;bam_list.txt&amp;lt;/code&amp;gt; that contains the paths to the bam files you want to use, one path per line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;/path/to/file.bam&lt;br /&gt;
/path/to/file2.bam&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
&lt;br /&gt;
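For illustration, a minimal shell sketch of building such a bam_list.txt; the directory and sample names below are made up for the demo and are not part of the pipeline:

```shell
# Hypothetical example: keep only two of three bam files in a scratch directory.
mkdir -p /tmp/bam_list_demo
touch /tmp/bam_list_demo/sampleA.bam /tmp/bam_list_demo/sampleB.bam /tmp/bam_list_demo/sampleC.bam

# One path per line, listing only the bam files the pipeline should use.
ls /tmp/bam_list_demo/sample[AB].bam > /tmp/bam_list_demo/bam_list.txt
cat /tmp/bam_list_demo/bam_list.txt
```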
&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;configuring-vep&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;for-people-using-wurs-anunna&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== For people using WUR&#039;s Anunna: ====&lt;br /&gt;
&lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn&#039;t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where &amp;amp;quot;assembly name&amp;amp;quot; is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;for-those-not-from-wur&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== For those not from WUR: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where &amp;amp;quot;assembly name&amp;amp;quot; is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;.&lt;br /&gt;
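That substitution can be done with a one-line sed. The snippet below runs it on a throwaway demo file rather than your real Snakefile, and assumes GNU sed (the cluster default); the demo file name and contents are stand-ins:

```shell
# Demo stand-in for the rule's shell line; your real Snakefile will differ.
printf '/cm/shared/apps/SHARED/ensembl-vep/vep -i {input.vcf}\n' > /tmp/snakefile_demo

# Replace the absolute VEP path with the 'vep' found on PATH.
sed -i 's|/cm/shared/apps/SHARED/ensembl-vep/vep|vep|g' /tmp/snakefile_demo
cat /tmp/snakefile_demo
```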
&lt;br /&gt;
&amp;lt;span id=&amp;quot;installing-r-packages&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
&lt;br /&gt;
First load R: &amp;lt;code&amp;gt;module load R/3.6.2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter, then install the packages:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;amp;lt;- c(&amp;amp;quot;optparse&amp;amp;quot;, &amp;amp;quot;data.table&amp;amp;quot;, &amp;amp;quot;ggplot2&amp;amp;quot;)&lt;br /&gt;
&lt;br /&gt;
new.packages &amp;amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;amp;quot;Package&amp;amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
&#039;lib = &amp;amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;final_VCF&#039;&#039;&#039; directory with variant calling VCF files, as well as VCF stats&lt;br /&gt;
*** {prefix}.vep.vcf.gz - final VCF file&lt;br /&gt;
*** {prefix}.vep.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;PCA&#039;&#039;&#039; PCA results and plot&lt;br /&gt;
*** {prefix}.eigenvec and {prefix}.eigenval - file with PCA eigenvectors and eigenvalues, respectively&lt;br /&gt;
*** {prefix}.pdf - PCA plot&lt;br /&gt;
&lt;br /&gt;
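As a rough sketch of where NUM_CHRS comes in, the snippet below assembles (but does not run) an illustrative plink 1.9 PCA command for a 38-chromosome species; the flags and file names are assumptions for illustration, not copied from the Snakefile:

```shell
# NUM_CHRS tells plink the chromosome count of a non-human species;
# without a chromosome set, plink assumes the human autosome count.
NUM_CHRS=38
PREFIX=myrun

# Assemble an illustrative plink 1.9 command line and record it.
PLINK_CMD="plink --vcf ${PREFIX}.vep.vcf.gz --chr-set ${NUM_CHRS} --allow-extra-chr --pca --out ${PREFIX}"
echo "$PLINK_CMD" > /tmp/plink_cmd_demo.txt
cat /tmp/plink_cmd_demo.txt
```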
The VCF file has been filtered for &amp;lt;code&amp;gt;QUAL &amp;amp;gt; 20&amp;lt;/code&amp;gt;. Freebayes is run with parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. These parameters can be changed in the Snakefile.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2169</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2169"/>
		<updated>2022-06-13T14:25:46Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling| Nanopore assembly and variant calling pipeline]]&lt;br /&gt;
* [[Population variant calling pipeline | Population variant calling pipeline]]&lt;br /&gt;
* [[Single Cell preprocessing pipeline| Single Cell preprocessing pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2168</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2168"/>
		<updated>2022-06-13T14:21:46Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added single cell page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling| Nanopore assembly and variant calling pipeline]]&lt;br /&gt;
* [[Population variant calling pipeline | Population variant calling pipeline]]&lt;br /&gt;
* [[Single Cell preprocessing | Single Cell preprocessing]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2159</id>
		<title>Population variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2159"/>
		<updated>2022-03-02T14:18:57Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info on how to configure VEP and install R packages&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Population level variant calling =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/population-variant-calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline takes short reads aligned to a genome (in &amp;lt;code&amp;gt;.bam&amp;lt;/code&amp;gt; format) and performs population-level variant calling with &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt;. It annotates the resulting VCF with VEP, computes VCF statistics, and computes and plots a PCA.&lt;br /&gt;
&lt;br /&gt;
It was developed to work with the results of [https://github.com/CarolinaPB/population-mapping this population mapping pipeline]. There are a few &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; requirements that you need to take into account if you don&#039;t use the mapping pipeline mentioned above to map your reads. You should make sure that:&lt;br /&gt;
&lt;br /&gt;
* Alignments have read groups&lt;br /&gt;
* Alignments are sorted&lt;br /&gt;
* Duplicates are marked&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/freebayes/freebayes#calling-variants-from-fastq-to-vcf here] for more details.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* [https://www.cog-genomics.org/plink/ Plink] - compute PCA&lt;br /&gt;
* R - Plot PCA&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:population-var-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;ASSEMBLY: /path/to/fasta&lt;br /&gt;
MAPPING_DIR: /path/to/bams/dir&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
SPECIES: &amp;lt;species&amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;lt;number of chromosomes&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to genome fasta file&lt;br /&gt;
* MAPPING_DIR - path to directory with bam files to be used&lt;br /&gt;
** the pipeline will use all bam files in the directory. If you want to use only a subset of those, create a file named &amp;lt;code&amp;gt;bam_list.txt&amp;lt;/code&amp;gt; that contains the paths to the bam files you want to use, one path per line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;/path/to/file.bam&lt;br /&gt;
/path/to/file2.bam&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-set-up&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;configuring-vep&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;for-people-using-wurs-anunna&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== For people using WUR&#039;s Anunna: ====&lt;br /&gt;
&lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn&#039;t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where &amp;amp;quot;assembly name&amp;amp;quot; is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;for-those-not-from-wur&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== For those not from WUR: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where &amp;amp;quot;assembly name&amp;amp;quot; is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;installing-r-packages&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
&lt;br /&gt;
First load R: &amp;lt;code&amp;gt;module load R/3.6.2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter, then install the packages:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;amp;lt;- c(&amp;amp;quot;optparse&amp;amp;quot;, &amp;amp;quot;data.table&amp;amp;quot;, &amp;amp;quot;ggplot2&amp;amp;quot;)&lt;br /&gt;
&lt;br /&gt;
new.packages &amp;amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;amp;quot;Package&amp;amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
&#039;lib = &amp;amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;final_VCF&#039;&#039;&#039; directory with variant calling VCF files, as well as VCF stats&lt;br /&gt;
*** {prefix}.vep.vcf.gz - final VCF file&lt;br /&gt;
*** {prefix}.vep.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;PCA&#039;&#039;&#039; PCA results and plot&lt;br /&gt;
*** {prefix}.eigenvec and {prefix}.eigenval - file with PCA eigenvectors and eigenvalues, respectively&lt;br /&gt;
*** {prefix}.pdf - PCA plot&lt;br /&gt;
&lt;br /&gt;
The VCF file has been filtered for &amp;lt;code&amp;gt;QUAL &amp;amp;gt; 20&amp;lt;/code&amp;gt;. Freebayes is run with parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. These parameters can be changed in the Snakefile.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Population-var-calling-workflow.png&amp;diff=2158</id>
		<title>File:Population-var-calling-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Population-var-calling-workflow.png&amp;diff=2158"/>
		<updated>2022-03-02T13:45:46Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2157</id>
		<title>Population variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2157"/>
		<updated>2022-03-02T13:45:28Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added workflow image&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Population level variant calling =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/population-variant-calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This pipeline takes short reads aligned to a genome (in &amp;lt;code&amp;gt;.bam&amp;lt;/code&amp;gt; format) and performs population-level variant calling with &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt;. It annotates the resulting VCF with VEP, computes VCF statistics, and computes and plots a PCA.&lt;br /&gt;
&lt;br /&gt;
It was developed to work with the results of [https://github.com/CarolinaPB/population-mapping this population mapping pipeline]. There are a few &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; requirements that you need to take into account if you don&#039;t use the mapping pipeline mentioned above to map your reads. You should make sure that:&lt;br /&gt;
&lt;br /&gt;
* Alignments have read groups&lt;br /&gt;
* Alignments are sorted&lt;br /&gt;
* Duplicates are marked&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/freebayes/freebayes#calling-variants-from-fastq-to-vcf here] for more details.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* [https://www.cog-genomics.org/plink/ Plink] - compute PCA&lt;br /&gt;
* R - Plot PCA&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:population-var-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;ASSEMBLY: /path/to/fasta&lt;br /&gt;
MAPPING_DIR: /path/to/bams/dir&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
SPECIES: &amp;lt;species&amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;lt;number of chromosomes&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to genome fasta file&lt;br /&gt;
* MAPPING_DIR - path to directory with bam files to be used&lt;br /&gt;
** the pipeline will use all bam files in the directory. If you want to use only a subset of those, create a file named &amp;lt;code&amp;gt;bam_list.txt&amp;lt;/code&amp;gt; that contains the paths to the bam files you want to use, one path per line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;/path/to/file.bam&lt;br /&gt;
/path/to/file2.bam&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to the directory where you run the pipeline (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;final_VCF&#039;&#039;&#039; directory with variant calling VCF files, as well as VCF stats&lt;br /&gt;
*** {prefix}.vep.vcf.gz - final VCF file&lt;br /&gt;
*** {prefix}.vep.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;PCA&#039;&#039;&#039; PCA results and plot&lt;br /&gt;
*** {prefix}.eigenvec and {prefix}.eigenval - file with PCA eigenvectors and eigenvalues, respectively&lt;br /&gt;
*** {prefix}.pdf - PCA plot&lt;br /&gt;
&lt;br /&gt;
The VCF file has been filtered for &amp;lt;code&amp;gt;QUAL &amp;amp;gt; 20&amp;lt;/code&amp;gt;. Freebayes is run with parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. These parameters can be changed in the Snakefile.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2156</id>
		<title>Population variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_variant_calling_pipeline&amp;diff=2156"/>
		<updated>2022-03-02T13:43:18Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info on pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Population level variant calling =&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/population-variant-calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;first-follow-the-instructions-here&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== First follow the instructions here ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;about&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline that takes short reads aligned to a genome (in &amp;lt;code&amp;gt;.bam&amp;lt;/code&amp;gt; format) and performs population level variant calling with &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt;. It uses VEP to annotate the resulting VCF, calculates statistics, and calculates and plots a PCA.&lt;br /&gt;
&lt;br /&gt;
It was developed to work with the results of [https://github.com/CarolinaPB/population-mapping this population mapping pipeline]. There are a few &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; requirements that you need to take into account if you don&#039;t use the mapping pipeline mentioned above to map your reads. You should make sure that:&lt;br /&gt;
&lt;br /&gt;
* Alignments have read groups&lt;br /&gt;
* Alignments are sorted&lt;br /&gt;
* Duplicates are marked&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/freebayes/freebayes#calling-variants-from-fastq-to-vcf here] for more details.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;tools-used&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Tools used ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* [https://www.cog-genomics.org/plink/ Plink] - compute PCA&lt;br /&gt;
* R - Plot PCA&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [https://github.com/CarolinaPB/pop-var-calling/blob/master/workflow.png DAG]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;edit-configyaml-with-the-paths-to-your-files&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;ASSEMBLY: /path/to/fasta&lt;br /&gt;
MAPPING_DIR: /path/to/bams/dir&lt;br /&gt;
PREFIX: &amp;lt;prefix&amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
SPECIES: &amp;lt;species&amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;lt;number of chromosomes&amp;gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to genome fasta file&lt;br /&gt;
* MAPPING_DIR - path to directory with bam files to be used&lt;br /&gt;
** the pipeline will use all bam files in the directory. If you want to use a subset of those, create a file named &amp;lt;code&amp;gt;bam_list.txt&amp;lt;/code&amp;gt; that contains the paths to the bam files you want to use, one path per line.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;/path/to/file.bam&lt;br /&gt;
/path/to/file2.bam&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
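&lt;br /&gt;
If you only want a subset of the bam files in &amp;lt;code&amp;gt;MAPPING_DIR&amp;lt;/code&amp;gt;, one way to create &amp;lt;code&amp;gt;bam_list.txt&amp;lt;/code&amp;gt; is (a sketch; adjust the glob pattern to match your file names):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ls /path/to/bams/dir/*.bam &amp;amp;gt; bam_list.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then remove any paths you don&#039;t want with a text editor.&lt;br /&gt;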
&lt;br /&gt;
&amp;lt;span id=&amp;quot;results&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;final_VCF&#039;&#039;&#039; directory with variant calling VCF files, as well as VCF stats&lt;br /&gt;
*** {prefix}.vep.vcf.gz - final VCF file&lt;br /&gt;
*** {prefix}.vep.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;PCA&#039;&#039;&#039; PCA results and plot&lt;br /&gt;
*** {prefix}.eigenvec and {prefix}.eigenval - files with the PCA eigenvectors and eigenvalues, respectively&lt;br /&gt;
*** {prefix}.pdf - PCA plot&lt;br /&gt;
&lt;br /&gt;
The VCF file has been filtered for &amp;lt;code&amp;gt;QUAL &amp;amp;gt; 20&amp;lt;/code&amp;gt;. Freebayes is run with the parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. These parameters can be changed in the Snakefile.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2155</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2155"/>
		<updated>2022-03-02T13:36:56Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: fixed nanopore assembly pipeline page link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling| Nanopore assembly and variant calling pipeline]]&lt;br /&gt;
* [[Population variant calling pipeline | Population variant calling pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2154</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2154"/>
		<updated>2022-03-02T13:33:36Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added page for population variant calling pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling pipeline| Nanopore assembly and variant calling pipeline]]&lt;br /&gt;
* [[Population variant calling pipeline | Population variant calling pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2153</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2153"/>
		<updated>2022-01-05T11:25:11Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info on how to install miniconda&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path   &amp;lt;br/&amp;gt;&lt;br /&gt;
In your terminal go to where you want to download it to and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some setup. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&#039;&#039;Update 05/01/2022:&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
Here I show how to install miniconda in a linux system&amp;lt;br /&amp;gt;&lt;br /&gt;
[https://docs.conda.io/en/latest/miniconda.html Download installer]&amp;lt;br /&amp;gt;&lt;br /&gt;
[https://conda.io/projects/conda/en/latest/user-guide/install/index.html Installation instructions]&lt;br /&gt;
&lt;br /&gt;
# Download the installer to your home directory. Choose the version according to your operating system. You can right-click the link, copy the address, and download it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;wget &amp;amp;lt;link&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
At the time of writing this update, for me it would be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
To install miniconda, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;bash &amp;amp;lt;installer name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
The installer name could be &amp;lt;code&amp;gt;Miniconda3-latest-Linux-x86_64.sh&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Set up the conda channels in this order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda config --add channels defaults&lt;br /&gt;
conda config --add channels bioconda&lt;br /&gt;
conda config --add channels conda-forge&amp;lt;/pre&amp;gt;&lt;br /&gt;
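You can check that the channels were registered in the right order (&amp;lt;code&amp;gt;conda-forge&amp;lt;/code&amp;gt; should be listed first, with the highest priority):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda config --show channels&amp;lt;/pre&amp;gt;&lt;br /&gt;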
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== To deactivate the environment (if you want to leave the conda environment) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
This config is necessary for snakemake to prepare and submit jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start with creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, so there’s no risk of the connection going down and Snakemake stopping with it.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
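&lt;br /&gt;
If your connection drops, or after detaching with &amp;lt;code&amp;gt;Ctrl-a d&amp;lt;/code&amp;gt;, you can reattach to the session with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -r &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;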
&lt;br /&gt;
You&#039;ll need to activate the conda environment again&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake.&lt;br /&gt;
&lt;br /&gt;
If all looks ok, you can now run your pipeline&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake --profile &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If everything was set up correctly, the jobs should be submitted and you should be able to see the progress of the pipeline in your terminal.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2149</id>
		<title>Mapping and variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2149"/>
		<updated>2021-11-22T11:56:38Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: info on how to concatenate fq files&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/mapping-variant-calling   &lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/WUR_mapping-variant-calling Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads, a qualimap report and does variant calling.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Bwa-mem2 - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
* Freebayes - variant calling&lt;br /&gt;
* Bcftools - VCF statistics&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:mapping-variant-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* PREFIX - prefix for the final mapped reads file&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out &amp;lt;pre&amp;gt;OUTDIR: /path/to/output&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For the mapping step you should have one _1 fastq file and one _2 fastq file in &amp;lt;code&amp;gt;READS_DIR&amp;lt;/code&amp;gt;. If you have several _1 and _2 fastq files from the same sample, you can combine them so that you have one file with all the _1 reads and one with all the _2 reads. This can be done by concatenating them with &amp;lt;code&amp;gt;cat&amp;lt;/code&amp;gt;, which works both for uncompressed files (&amp;lt;code&amp;gt;fastq&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fq&amp;lt;/code&amp;gt; extension) and for compressed ones (&amp;lt;code&amp;gt;fastq.gz&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fq.gz&amp;lt;/code&amp;gt; extension), since concatenated gzip files are themselves a valid gzip stream. Note that &amp;lt;code&amp;gt;zcat&amp;lt;/code&amp;gt; would instead write &#039;&#039;uncompressed&#039;&#039; data to the &amp;lt;code&amp;gt;.gz&amp;lt;/code&amp;gt;-named output. Example where your files are in the same directory and are compressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cat *_1.fastq.gz &amp;amp;gt; &amp;amp;lt;new file name&amp;amp;gt;_1.fastq.gz&lt;br /&gt;
cat *_2.fastq.gz &amp;amp;gt; &amp;amp;lt;new file name&amp;amp;gt;_2.fastq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;
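&lt;br /&gt;
As an optional sanity check, both merged files should contain the same number of reads. Each fastq record is four lines, so (&amp;lt;code&amp;gt;zcat -f&amp;lt;/code&amp;gt; also handles uncompressed input):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;zcat -f &amp;amp;lt;new file name&amp;amp;gt;_1.fastq.gz | wc -l&amp;lt;/pre&amp;gt;&lt;br /&gt;
Divide the line count by 4 to get the number of reads; the _1 and _2 counts should match.&lt;br /&gt;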
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;sorted_reads&#039;&#039;&#039; directory with the file containing the mapped reads&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory containing the qualimap results&lt;br /&gt;
* &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory containing the variant calling VCF file and a file with VCF statistics&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2141</id>
		<title>Nanopore assembly and variant calling</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2141"/>
		<updated>2021-11-04T08:44:32Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about variant calling filtering&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Assemble nanopore reads and do variant calling with short and long reads =&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/nanopore-assembly&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline that uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create a nanopore assembly. It also does variant calling with long and short reads.&amp;lt;br /&amp;gt;&lt;br /&gt;
The pipeline starts by using &amp;lt;code&amp;gt;porechop&amp;lt;/code&amp;gt; to trim the adaptors, then it uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create the assembly. After that, &amp;lt;code&amp;gt;ntLink-arks&amp;lt;/code&amp;gt; from &amp;lt;code&amp;gt;LongStitch&amp;lt;/code&amp;gt; is used to scaffold the assembly using the nanopore reads. The scaffolded assembly is polished with &amp;lt;code&amp;gt;polca&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;Bwa-mem2&amp;lt;/code&amp;gt; is used to map the short reads to the assembly and &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; to do variant calling using these reads. &amp;lt;code&amp;gt;Minimap2&amp;lt;/code&amp;gt; is used to map the long reads to the assembly, and &amp;lt;code&amp;gt;longshot&amp;lt;/code&amp;gt; for variant calling using these reads. In the end, in addition to your assembly and variant calling results, you&#039;ll also get assembly statistics and BUSCO scores before and after the polishing.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/rrwick/Porechop Porechop] - trim adaptors&lt;br /&gt;
* [https://github.com/fenderglass/Flye Flye] - assembly&lt;br /&gt;
* [https://github.com/lh3/seqtk Seqtk] - convert fasta to one line fasta&lt;br /&gt;
* [https://github.com/bcgsc/longstitch LongStitch (ntLink-arks)] - scaffolding with nanopore reads&lt;br /&gt;
* [https://busco.ezlab.org/ BUSCO] - assess assembly completeness&lt;br /&gt;
* [https://github.com/alekseyzimin/masurca MaSuRCA (polca)] - polish assembly&lt;br /&gt;
* Python - get assembly stats&lt;br /&gt;
* [https://github.com/lh3/minimap2 Minimap2] - map long reads to reference. Genome alignment&lt;br /&gt;
* [http://www.htslib.org/ Samtools] - sort and index mapped reads and vcf files&lt;br /&gt;
* [https://github.com/pjedge/longshot Longshot] - variant calling with nanopore reads&lt;br /&gt;
* [https://github.com/bwa-mem2/bwa-mem2 Bwa-mem2] - map short reads to reference&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* R - [https://github.com/tpoorten/dotPlotly pafCoordsDotPlotly] - plot genome alignment&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:nanopore-assembly-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;LONGREADS: &amp;amp;lt;nanopore_reads.fq.gz&amp;amp;gt;&lt;br /&gt;
SHORTREADS:&lt;br /&gt;
  - /path/to/short/reads_1.fq.gz&lt;br /&gt;
  - /path/to/short/reads_2.fq.gz&lt;br /&gt;
GENOME_SIZE: &amp;amp;lt;approximate genome size&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;prefix&amp;amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
BUSCO_LINEAGE:&lt;br /&gt;
  - &amp;amp;lt;lineage&amp;amp;gt;&lt;br /&gt;
&lt;br /&gt;
# genome alignment parameters:&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&amp;lt;/pre&amp;gt;&lt;br /&gt;
* LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)&lt;br /&gt;
* SHORTREADS - paths to short reads fq.gz&lt;br /&gt;
* GENOME_SIZE - approximate genome size &amp;lt;code&amp;gt;haploid genome size (bp)(e.g. &#039;3e9&#039; for human genome)&amp;lt;/code&amp;gt; from [https://github.com/bcgsc/longstitch#full-help-page longstitch]&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* BUSCO_LINEAGE - lineage used for busco. Can be one or more (one per line). To see available lineages run &amp;lt;code&amp;gt;busco --list-datasets&amp;lt;/code&amp;gt;&lt;br /&gt;
* COMPARISON_GENOME - genome for whole genome comparison. Add your species name and the path to the fasta file. ex: &amp;lt;code&amp;gt;chicken: /path/to/chicken.fna.gz&amp;lt;/code&amp;gt;. You can add several genomes, one on each line.&lt;br /&gt;
** If you don&#039;t want to run the genome alignment step, comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&amp;lt;/pre&amp;gt;&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, try lowering these thresholds: this usually means the alignments are not long enough to pass the cutoffs.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
If you have your long reads in several fastq files and need to create one compressed file with all the reads:&lt;br /&gt;
&lt;br /&gt;
# In your pipeline directory create one file with all the reads&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cat /path/to/fastq/directory/*.fastq &amp;amp;gt; &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compress the file you just created:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gzip &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
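You can check that the compressed file is intact before running the pipeline (&amp;lt;code&amp;gt;gzip -t&amp;lt;/code&amp;gt; prints nothing when the file is valid):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gzip -t &amp;amp;lt;name of file&amp;amp;gt;.fq.gz&amp;lt;/pre&amp;gt;&lt;br /&gt;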
&lt;br /&gt;
=== After creating the conda environment (first step of this guide) you&#039;ll need to edit the polca.sh file. ===&lt;br /&gt;
&lt;br /&gt;
First go to the directory where miniconda3 is installed (usually your home directory), then to &amp;lt;code&amp;gt;/&amp;amp;lt;home&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin&amp;lt;/code&amp;gt;. In my case the path looks like this: &amp;lt;code&amp;gt;/home/WUR/&amp;amp;lt;username&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin/&amp;lt;/code&amp;gt;. Open &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; in your editor and replace this line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) $BASM.alignSorted 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
With this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) -o $BASM.alignSorted.bam 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
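The same replacement can be scripted with sed instead of a manual edit. This is a hedged sketch run on a stand-in file (the stand-in line abbreviates the real one, which sed does not need in full); on your system you would point it at the real polca.sh and keep the .bak backup:

```shell
# sketch: apply the polca.sh fix as a sed substitution
# (POLCA here is a stand-in file, not your real polca.sh)
POLCA=$(mktemp)
printf '%s\n' 'sort ... $BASM.alignSorted 2>>samtools.err' > "$POLCA"
# add the -o flag and the .bam extension expected by newer samtools
sed -i.bak 's#\$BASM\.alignSorted 2#-o $BASM.alignSorted.bam 2#' "$POLCA"
cat "$POLCA"    # prints: sort ... -o $BASM.alignSorted.bam 2>>samtools.err
```

Using `#` as the sed delimiter avoids clashing with the `/` characters that appear in the surrounding script.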
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa&#039;&#039;&#039; final assembly&lt;br /&gt;
** assembly_stats_&amp;amp;lt;prefix&amp;amp;gt;.txt file with assembly statistics for the final assembly&lt;br /&gt;
** &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory with variant calling VCF files with long and short reads, as well as VCF stats&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz.stats&lt;br /&gt;
*** {prefix}_longreads.vcf.gz&lt;br /&gt;
*** {prefix}_longreads.vcf.gz.stats&lt;br /&gt;
Both the short-read and the long-read variant calling VCFs are filtered for &amp;lt;code&amp;gt;QUAL &amp;gt; 20&amp;lt;/code&amp;gt;.&lt;br /&gt;
Freebayes (short-read variant calling) is run with parameters &amp;lt;code&amp;gt;--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2&amp;lt;/code&amp;gt;. For more details check the Snakefile.&lt;br /&gt;
** &#039;&#039;&#039;genome_alignment&#039;&#039;&#039; directory with results and figure from whole genome alignment&lt;br /&gt;
*** {prefix}_{species}.png&lt;br /&gt;
* &#039;&#039;&#039;mapped&#039;&#039;&#039; directory that contains the bam file with long reads mapped to the new assembly&lt;br /&gt;
** {prefix}_longreads.mapped.sorted.bam&lt;br /&gt;
* &#039;&#039;&#039;busco_{prefix}_before_polish&#039;&#039;&#039; and &#039;&#039;&#039;busco_{prefix}_after_polish&#039;&#039;&#039; directories - contain busco results before and after polishing&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_before_polish.txt&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_after_polish.txt&lt;br /&gt;
* &#039;&#039;&#039;other_files&#039;&#039;&#039; - directory containing other files created during the pipeline&lt;br /&gt;
* &#039;&#039;&#039;assembly&#039;&#039;&#039; - directory containing files created during the assembly step&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Nanopore-assembly-workflow.png&amp;diff=2140</id>
		<title>File:Nanopore-assembly-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Nanopore-assembly-workflow.png&amp;diff=2140"/>
		<updated>2021-11-01T10:03:49Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: Moiti001 uploaded a new version of File:Nanopore-assembly-workflow.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2139</id>
		<title>Nanopore assembly and variant calling</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2139"/>
		<updated>2021-11-01T10:03:12Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info on var calling with short reads&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Assemble nanopore reads and do variant calling with short and long reads =&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/nanopore-assembly&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline that uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create a nanopore assembly. It also does variant calling with long and short reads.&amp;lt;br /&amp;gt;&lt;br /&gt;
The pipeline starts by using &amp;lt;code&amp;gt;porechop&amp;lt;/code&amp;gt; to trim the adaptors, then it uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create the assembly. After that, &amp;lt;code&amp;gt;ntLink-arks&amp;lt;/code&amp;gt; from &amp;lt;code&amp;gt;LongStitch&amp;lt;/code&amp;gt; is used to scaffold the assembly using the nanopore reads. The scaffolded assembly is polished with &amp;lt;code&amp;gt;polca&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;Bwa-mem2&amp;lt;/code&amp;gt; is used to map the short reads to the assembly and &amp;lt;code&amp;gt;Freebayes&amp;lt;/code&amp;gt; to do variant calling using these reads. &amp;lt;code&amp;gt;Minimap2&amp;lt;/code&amp;gt; is used to map the long reads to the assembly, and &amp;lt;code&amp;gt;longshot&amp;lt;/code&amp;gt; for variant calling with them. In the end, in addition to your assembly and variant calling results, you&#039;ll also get assembly statistics and busco scores before and after the polishing.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/rrwick/Porechop Porechop] - trim adaptors&lt;br /&gt;
* [https://github.com/fenderglass/Flye Flye] - assembly&lt;br /&gt;
* [https://github.com/lh3/seqtk Seqtk] - convert fasta to one line fasta&lt;br /&gt;
* [https://github.com/bcgsc/longstitch LongStitch (ntLink-arks)] - scaffolding with nanopore reads&lt;br /&gt;
* [https://busco.ezlab.org/ BUSCO] - assess assembly completeness&lt;br /&gt;
* [https://github.com/alekseyzimin/masurca MaSuRCA (polca)] - polish assembly&lt;br /&gt;
* Python - get assembly stats&lt;br /&gt;
* [https://github.com/lh3/minimap2 Minimap2] - map long reads to reference. Genome alignment&lt;br /&gt;
* [http://www.htslib.org/ Samtools] - sort and index mapped reads and vcf files&lt;br /&gt;
* [https://github.com/pjedge/longshot Longshot] - variant calling with nanopore reads&lt;br /&gt;
* [https://github.com/bwa-mem2/bwa-mem2 Bwa-mem2] - map short reads to reference&lt;br /&gt;
* [https://github.com/freebayes/freebayes Freebayes] - variant calling using short reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* R - [https://github.com/tpoorten/dotPlotly pafCoordsDotPlotly] - plot genome alignment&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:nanopore-assembly-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;LONGREADS: &amp;amp;lt;nanopore_reads.fq.gz&amp;amp;gt;&lt;br /&gt;
SHORTREADS:&lt;br /&gt;
  - /path/to/short/reads_1.fq.gz&lt;br /&gt;
  - /path/to/short/reads_2.fq.gz&lt;br /&gt;
GENOME_SIZE: &amp;amp;lt;approximate genome size&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;prefix&amp;amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
BUSCO_LINEAGE:&lt;br /&gt;
  - &amp;amp;lt;lineage&amp;amp;gt;&lt;br /&gt;
&lt;br /&gt;
# genome alignment parameters:&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&amp;lt;/pre&amp;gt;&lt;br /&gt;
* LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)&lt;br /&gt;
* SHORTREADS - paths to short reads fq.gz&lt;br /&gt;
* GENOME_SIZE - approximate genome size &amp;lt;code&amp;gt;haploid genome size (bp)(e.g. &#039;3e9&#039; for human genome)&amp;lt;/code&amp;gt; from [https://github.com/bcgsc/longstitch#full-help-page longstitch]&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* BUSCO_LINEAGE - lineage used for busco. Can be one or more (one per line). To see available lineages run &amp;lt;code&amp;gt;busco --list-datasets&amp;lt;/code&amp;gt;&lt;br /&gt;
* COMPARISON_GENOME - genome for whole genome comparison. Add your species name and the path to the fasta file. ex: &amp;lt;code&amp;gt;chicken: /path/to/chicken.fna.gz&amp;lt;/code&amp;gt;. You can add several genomes, one on each line.&lt;br /&gt;
** If you don&#039;t want to run the genome alignment step, comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&amp;lt;/pre&amp;gt;&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, the alignments are probably shorter than these thresholds, so try lowering them.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
If you have your long reads in several fastq files and need to create one compressed file with all the reads:&lt;br /&gt;
&lt;br /&gt;
# In your pipeline directory create one file with all the reads&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cat /path/to/fastq/directory/*.fastq &amp;amp;gt; &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compress the file you just created:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gzip &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
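Once the reads are merged, a quick sanity check is that the FASTQ line count is an exact multiple of four (four lines per read). A sketch with demo data; MERGED stands in for your merged .fq file:

```shell
# sketch: sanity-check a merged FASTQ -- the line count must be a multiple of 4
MERGED=$(mktemp)                  # stands in for your merged .fq file
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > "$MERGED"
lines=$(wc -l "$MERGED" | awk '{print $1}')
echo "$((lines / 4)) reads"       # prints: 2 reads
if [ $((lines % 4)) -eq 0 ]; then echo "line count OK"; fi
```

A line count that is not divisible by four usually means one of the input files was truncated mid-read.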
&lt;br /&gt;
=== After installing the conda environment (first step of this guide) you&#039;ll need to edit the polca.sh file. ===&lt;br /&gt;
&lt;br /&gt;
First go to the directory where miniconda3 is installed (usually your home directory), then to &amp;lt;code&amp;gt;/&amp;amp;lt;home&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin&amp;lt;/code&amp;gt;. In my case the path looks like this: &amp;lt;code&amp;gt;/home/WUR/&amp;amp;lt;username&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin/&amp;lt;/code&amp;gt;. Open &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; in your editor and replace this line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) $BASM.alignSorted 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
With this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) -o $BASM.alignSorted.bam 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa&#039;&#039;&#039; final assembly&lt;br /&gt;
** assembly_stats_&amp;amp;lt;prefix&amp;amp;gt;.txt file with assembly statistics for the final assembly&lt;br /&gt;
** &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory with variant calling VCF files with long and short reads, as well as VCF stats&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz.stats&lt;br /&gt;
*** {prefix}_longreads.vcf.gz&lt;br /&gt;
*** {prefix}_longreads.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;genome_alignment&#039;&#039;&#039; directory with results and figure from whole genome alignment&lt;br /&gt;
*** {prefix}_{species}.png&lt;br /&gt;
* &#039;&#039;&#039;mapped&#039;&#039;&#039; directory that contains the bam file with long reads mapped to the new assembly&lt;br /&gt;
** {prefix}_longreads.mapped.sorted.bam&lt;br /&gt;
* &#039;&#039;&#039;busco_{prefix}_before_polish&#039;&#039;&#039; and &#039;&#039;&#039;busco_{prefix}_after_polish&#039;&#039;&#039; directories - contain busco results before and after polishing&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_before_polish.txt&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_after_polish.txt&lt;br /&gt;
* &#039;&#039;&#039;other_files&#039;&#039;&#039; - directory containing other files created during the pipeline&lt;br /&gt;
* &#039;&#039;&#039;assembly&#039;&#039;&#039; - directory containing files created during the assembly step&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2138</id>
		<title>Mapping and variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2138"/>
		<updated>2021-11-01T09:38:22Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: changed mapping tool&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/mapping-variant-calling   &lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/WUR_mapping-variant-calling Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads, a qualimap report and does variant calling.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Bwa-mem2 - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
* Freebayes - variant calling&lt;br /&gt;
* Bcftools - VCF statistics&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:mapping-variant-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* PREFIX - prefix for the final mapped reads file&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the current directory (not to a new directory), comment out &amp;lt;pre&amp;gt;OUTDIR: /path/to/output&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;sorted_reads&#039;&#039;&#039; directory with the file containing the mapped reads&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory containing the qualimap results&lt;br /&gt;
* &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory containing the variant calling VCF file and a file with VCF statistics&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Nanopore-assembly-workflow.png&amp;diff=2137</id>
		<title>File:Nanopore-assembly-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Nanopore-assembly-workflow.png&amp;diff=2137"/>
		<updated>2021-10-29T08:38:21Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2136</id>
		<title>Nanopore assembly and variant calling</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Nanopore_assembly_and_variant_calling&amp;diff=2136"/>
		<updated>2021-10-29T08:37:36Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about nanopore pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Assemble nanopore reads and do variant calling with short and long reads =&lt;br /&gt;
Path to pipeline: /lustre/nobackup/WUR/ABGC/shared/PIPELINES/nanopore-assembly&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline that uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create a nanopore assembly. It also does variant calling with long and short reads.&amp;lt;br /&amp;gt;&lt;br /&gt;
The pipeline starts by using &amp;lt;code&amp;gt;porechop&amp;lt;/code&amp;gt; to trim the adaptors, then it uses &amp;lt;code&amp;gt;Flye&amp;lt;/code&amp;gt; to create the assembly. After that, &amp;lt;code&amp;gt;ntLink-arks&amp;lt;/code&amp;gt; from &amp;lt;code&amp;gt;LongStitch&amp;lt;/code&amp;gt; is used to scaffold the assembly using the nanopore reads. The scaffolded assembly is polished with &amp;lt;code&amp;gt;polca&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;Polca&amp;lt;/code&amp;gt; also does variant calling with the short reads, while &amp;lt;code&amp;gt;longshot&amp;lt;/code&amp;gt; does variant calling with the nanopore reads. To run &amp;lt;code&amp;gt;longshot&amp;lt;/code&amp;gt;, first the long reads are aligned to the assembly with &amp;lt;code&amp;gt;minimap2&amp;lt;/code&amp;gt;.&amp;lt;br /&amp;gt;&lt;br /&gt;
In the end, in addition to your assembly and variant calling results, you&#039;ll also get assembly statistics and busco scores before and after the polishing.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/rrwick/Porechop Porechop] - trim adaptors&lt;br /&gt;
* [https://github.com/fenderglass/Flye Flye] - assembly&lt;br /&gt;
* [https://github.com/lh3/seqtk Seqtk] - convert fasta to one line fasta&lt;br /&gt;
* [https://github.com/bcgsc/longstitch LongStitch (ntLink-arks)] - scaffolding with nanopore reads&lt;br /&gt;
* [https://busco.ezlab.org/ BUSCO] - assess assembly completeness&lt;br /&gt;
* [https://github.com/alekseyzimin/masurca MaSuRCA (polca)] - polish assembly and do variant calling with short reads&lt;br /&gt;
* Python - get assembly stats&lt;br /&gt;
* [https://github.com/lh3/minimap2 Minimap2] - map long reads to reference. Genome alignment&lt;br /&gt;
* [http://www.htslib.org/ Samtools] - sort and index mapped reads and vcf files&lt;br /&gt;
* [https://github.com/pjedge/longshot Longshot] - variant calling with nanopore reads&lt;br /&gt;
* [https://samtools.github.io/bcftools/bcftools.html bcftools] - vcf statistics&lt;br /&gt;
* R - [https://github.com/tpoorten/dotPlotly pafCoordsDotPlotly] - plot genome alignment&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:nanopore-assembly-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;LONGREADS: &amp;amp;lt;nanopore_reads.fq.gz&amp;amp;gt;&lt;br /&gt;
SHORTREADS:&lt;br /&gt;
  - /path/to/short/reads_1.fq.gz&lt;br /&gt;
  - /path/to/short/reads_2.fq.gz&lt;br /&gt;
GENOME_SIZE: &amp;amp;lt;approximate genome size&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;prefix&amp;amp;gt;&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
BUSCO_LINEAGE:&lt;br /&gt;
  - &amp;amp;lt;lineage&amp;amp;gt;&lt;br /&gt;
&lt;br /&gt;
# genome alignment parameters:&lt;br /&gt;
COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&lt;br /&gt;
&lt;br /&gt;
# filter alignments less than cutoff X bp&lt;br /&gt;
MIN_ALIGNMENT_LENGTH: 10000&lt;br /&gt;
MIN_QUERY_LENGTH: 50000&amp;lt;/pre&amp;gt;&lt;br /&gt;
* LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)&lt;br /&gt;
* SHORTREADS - paths to short reads fq.gz&lt;br /&gt;
* GENOME_SIZE - approximate genome size &amp;lt;code&amp;gt;haploid genome size (bp)(e.g. &#039;3e9&#039; for human genome)&amp;lt;/code&amp;gt; from [https://github.com/bcgsc/longstitch#full-help-page longstitch]&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open config.yaml and comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* BUSCO_LINEAGE - lineage used for busco. Can be one or more (one per line). To see available lineages run &amp;lt;code&amp;gt;busco --list-datasets&amp;lt;/code&amp;gt;&lt;br /&gt;
* COMPARISON_GENOME - genome for whole genome comparison. Add your species name and the path to the fasta file. ex: &amp;lt;code&amp;gt;chicken: /path/to/chicken.fna.gz&amp;lt;/code&amp;gt;. You can add several genomes, one on each line.&lt;br /&gt;
** If you don&#039;t want to run the genome alignment step, comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;COMPARISON_GENOME: &lt;br /&gt;
  &amp;amp;lt;species&amp;amp;gt;: /path/to/genome/fasta&amp;lt;/pre&amp;gt;&lt;br /&gt;
* MIN_ALIGNMENT_LENGTH and MIN_QUERY_LENGTH - parameters for plotting. If your plot comes out blank or the plotting step fails, the alignments are probably shorter than these thresholds, so try lowering them.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
If you have your long reads in several fastq files and need to create one compressed file with all the reads:&lt;br /&gt;
&lt;br /&gt;
# In your pipeline directory create one file with all the reads&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cat /path/to/fastq/directory/*.fastq &amp;amp;gt; &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;2&amp;quot; style=&amp;quot;list-style-type: decimal;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compress the file you just created:&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gzip &amp;amp;lt;name of file&amp;amp;gt;.fq&amp;lt;/pre&amp;gt;&lt;br /&gt;
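After compressing, `gzip -t` can verify the archive is intact before a long pipeline run starts from it. A sketch with a throw-away demo file; FQ stands in for your merged .fq file:

```shell
# sketch: verify the compressed reads file is intact before launching the pipeline
FQ=$(mktemp)                    # stands in for your merged .fq file
printf '@r1\nACGT\n+\nIIII\n' > "$FQ"
gzip "$FQ"                      # replaces "$FQ" with "$FQ.gz"
gzip -t "$FQ.gz"; echo "gzip -t exit status: $?"    # prints: gzip -t exit status: 0
```

A non-zero exit status from `gzip -t` means the file is corrupt and should be re-created before running the pipeline.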
&lt;br /&gt;
=== After installing the conda environment (first step of this guide) you&#039;ll need to edit the polca.sh file. ===&lt;br /&gt;
&lt;br /&gt;
First go to the directory where miniconda3 is installed (usually your home directory), then to &amp;lt;code&amp;gt;/&amp;amp;lt;home&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin&amp;lt;/code&amp;gt;. In my case the path looks like this: &amp;lt;code&amp;gt;/home/WUR/&amp;amp;lt;username&amp;amp;gt;/miniconda3/envs/&amp;amp;lt;env_name&amp;amp;gt;/bin/&amp;lt;/code&amp;gt;. Open &amp;lt;code&amp;gt;polca.sh&amp;lt;/code&amp;gt; in your editor and replace this line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) $BASM.alignSorted 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
With this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$SAMTOOLS sort -m $MEM -@ $NUM_THREADS &amp;amp;lt;(samtools view -uhS $BASM.unSorted.sam) -o $BASM.alignSorted.bam 2&amp;amp;gt;&amp;amp;gt;samtools.err &amp;amp;amp;&amp;amp;amp; \&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
The most important files and directories are:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;amp;lt;run_date&amp;amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory that contains&lt;br /&gt;
** &#039;&#039;&#039;{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa&#039;&#039;&#039; final assembly&lt;br /&gt;
** assembly_stats_&amp;amp;lt;prefix&amp;amp;gt;.txt file with assembly statistics for the final assembly&lt;br /&gt;
** &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory with variant calling VCF files with long and short reads, as well as VCF stats&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz&lt;br /&gt;
*** {prefix}_shortreads.vcf.gz.stats&lt;br /&gt;
*** {prefix}_longreads.vcf.gz&lt;br /&gt;
*** {prefix}_longreads.vcf.gz.stats&lt;br /&gt;
** &#039;&#039;&#039;genome_alignment&#039;&#039;&#039; directory with results and figure from whole genome alignment&lt;br /&gt;
*** {prefix}_{species}.png&lt;br /&gt;
* &#039;&#039;&#039;mapped&#039;&#039;&#039; directory that contains the bam file with long reads mapped to the new assembly&lt;br /&gt;
** {prefix}_longreads.mapped.sorted.bam&lt;br /&gt;
* &#039;&#039;&#039;busco_{prefix}_before_polish&#039;&#039;&#039; and &#039;&#039;&#039;busco_{prefix}_after_polish&#039;&#039;&#039; directories - contain busco results before and after polishing&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_before_polish.txt&lt;br /&gt;
** short_summary.specific.{lineage}.{prefix}_after_polish.txt&lt;br /&gt;
* &#039;&#039;&#039;other_files&#039;&#039;&#039; - directory containing other files created during the pipeline&lt;br /&gt;
* &#039;&#039;&#039;assembly&#039;&#039;&#039; - directory containing files created during the assembly step&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2135</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2135"/>
		<updated>2021-10-28T08:50:29Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about split read support option&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python  &lt;br /&gt;
** PyVcf - add depth to vcf and create final table&lt;br /&gt;
** bamgroupreads.py + samblaster - create bam files with split and discordant reads&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;amp;lt;number of chromosomes&amp;amp;gt;&lt;br /&gt;
BWA_MEM_M: &amp;amp;lt;Y/N&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column csv with the sample name in the first column, the name of the bam file to use in the second column, and the name of the corresponding population in the third column. These bams should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
Tip: use the name of the bam file without the .bam extension as the sample name, e.g. sample1.bam becomes sample1&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
* BWA_MEM_M - if you mapped your reads with `bwa mem` using the `-M` parameter and you want split-read support in your VCF, you need to run an extra step: set this to `Y`.  &lt;br /&gt;
For a more detailed explanation see [https://carolinapb.github.io/2021-10-28-smoove-SR-support/ Smoove SR support]  &lt;br /&gt;
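Before submitting jobs it can help to sanity-check the config entries. The sketch below is a hypothetical helper (not part of the pipeline); the keys simply mirror the config fields described above, and the paths in the usage example are placeholders.

```python
import os

def check_config(cfg):
    """Return a list of problems found in a parsed config dict."""
    problems = []
    # These keys should point at files/directories that already exist.
    for key in ("READS_DIR", "SAMPLE_LIST", "REFERENCE", "CONTIGS_IGNORE"):
        path = cfg.get(key)
        if path is None:
            problems.append(f"{key} is missing")
        elif not os.path.exists(path):
            problems.append(f"{key} does not exist: {path}")
    # BWA_MEM_M is documented above as Y/N.
    if cfg.get("BWA_MEM_M") not in ("Y", "N"):
        problems.append("BWA_MEM_M must be Y or N")
    return problems
```

A config dict parsed from config.yaml (e.g. with a YAML reader) can be passed straight to this function; an empty result means the checked paths exist.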
&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
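As a quick illustration of the "check if the cache exists" step, the snippet below lists the species directories under the shared cache path. It is an assumption that each installed cache appears as a subdirectory named after the species; adjust if the layout differs.

```python
import os

CACHE_DIR = "/lustre/nobackup/SHARED/cache/"

def cached_species(cache_dir=CACHE_DIR):
    """List entries in the cache directory that look like installed
    species caches (i.e. subdirectories). Returns [] if the cache
    directory itself is missing."""
    if not os.path.isdir(cache_dir):
        return []
    return sorted(entry for entry in os.listdir(cache_dir)
                  if os.path.isdir(os.path.join(cache_dir, entry)))
```

If your species is not in the returned list, run the INSTALL.pl command above to create the cache.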
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
First load R:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load R/3.6.2&amp;lt;/pre&amp;gt;&lt;br /&gt;
Enter the R environment by typing &amp;lt;pre&amp;gt;R&amp;lt;/pre&amp;gt; and pressing enter.  &lt;br /&gt;
Install the packages:&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;lt;- c(&amp;quot;optparse&amp;quot;, &amp;quot;data.table&amp;quot;, &amp;quot;ggplot2&amp;quot;)&lt;br /&gt;
new.packages &amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;quot;Package&amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
  &#039;lib = &amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here]  and try to install the packages again. &lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF - with VEP annotation, not filtered for quality&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information. The variants have been filtered with Minimum Quality score = 30&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions. It has been filtered with Minimum Quality score = 30 and the DEPTH* field was added&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
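Sorting by ''SVLEN'' can be scripted; the sketch below reads the table with Python's csv module and returns the largest SVs. The &lt;code&gt;SVLEN&lt;/code&gt; column name follows the description above, but check the actual header of your table, which may differ.

```python
import csv

def largest_svs(tsv_path, n=10):
    """Return the n rows with the largest absolute SVLEN.

    abs() is used because deletions are conventionally reported
    with a negative SVLEN.
    """
    with open(tsv_path) as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    return sorted(rows,
                  key=lambda r: abs(int(float(r["SVLEN"]))),
                  reverse=True)[:n]
```

From there you could filter the same rows on the ANNOTATION column (e.g. keeping only entries containing "frameshift_variant").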
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
*The &#039;&#039;&#039;DEPTH&#039;&#039;&#039; field in the vcf has six values, corresponding to the average depth fold change across all samples for each genotype class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;DEPTH=(DHBFC_1/1, DHBFC_0/1, DHBFC_0/0, DHFFC_1/1, DHFFC_0/1, DHFFC_0/0)&amp;lt;/pre&amp;gt;&lt;br /&gt;
Depth fold change as described in [https://github.com/brentp/duphold duphold]: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;&lt;br /&gt;
DHFFC: fold-change for the variant depth relative to Flanking regions.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
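To use these values programmatically, a minimal Python sketch, assuming the six comma-separated numbers appear in exactly the order shown in the DEPTH layout above:

```python
def parse_depth(depth_value):
    """Parse a string like 'DEPTH=(1.9, 1.4, 1.0, 2.1, 1.5, 0.98)' into a
    dict keyed by metric and genotype, in the documented order."""
    labels = ["DHBFC_1/1", "DHBFC_0/1", "DHBFC_0/0",
              "DHFFC_1/1", "DHFFC_0/1", "DHFFC_0/0"]
    # Drop the 'DEPTH=' prefix and the surrounding parentheses.
    inner = depth_value.split("=", 1)[1].strip("()")
    values = [float(v) for v in inner.split(",")]
    return dict(zip(labels, values))
```

For example, a DHBFC_1/1 well above 1 together with a low DHFFC for heterozygotes would be worth inspecting, following the duphold descriptions quoted above.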
&lt;br /&gt;
These fields are also in the &amp;lt;code&amp;gt;{prefix}_DUP_DEL_INV_table.tsv&amp;lt;/code&amp;gt; file&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Pop-sv-calling-workflow.png&amp;diff=2134</id>
		<title>File:Pop-sv-calling-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Pop-sv-calling-workflow.png&amp;diff=2134"/>
		<updated>2021-10-28T08:43:27Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: Moiti001 uploaded a new version of File:Pop-sv-calling-workflow.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;population structural variation calling pipeline workflow&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Mapping-variant-calling-workflow.png&amp;diff=2133</id>
		<title>File:Mapping-variant-calling-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Mapping-variant-calling-workflow.png&amp;diff=2133"/>
		<updated>2021-10-22T11:48:00Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: workflow for mapping and variant calling pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;workflow for mapping and variant calling pipeline&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2132</id>
		<title>Mapping and variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2132"/>
		<updated>2021-10-22T11:46:42Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added pipeline workflow, small details&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/WUR_mapping-variant-calling Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads, a qualimap report and does variant calling.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Bwa - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
* Freebayes - variant calling&lt;br /&gt;
* Bcftools - VCF statistics&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:mapping-variant-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* PREFIX - prefix for the final mapped reads file&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out &amp;lt;pre&amp;gt;OUTDIR: /path/to/output&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;sorted_reads&#039;&#039;&#039; directory with the file containing the mapped reads&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory containing the qualimap results&lt;br /&gt;
* &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory containing the variant calling VCF file and a file with VCF statistics&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2131</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2131"/>
		<updated>2021-10-15T11:49:16Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added new config parameter - num of chrs&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python - add depth to vcf and create final table&lt;br /&gt;
** PyVcf&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&lt;br /&gt;
NUM_CHRS: &amp;amp;lt;number of chromosomes&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column csv with the sample name in the first column, the name of the bam file to use in the second column, and the name of the corresponding population in the third column. These bams should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
Tip: use the name of the bam file without the .bam extension as the sample name, e.g. sample1.bam becomes sample1&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
* NUM_CHRS - number of chromosomes for your species (necessary for plink). ex: 38&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
First load R:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load R/3.6.2&amp;lt;/pre&amp;gt;&lt;br /&gt;
Enter the R environment by typing &amp;lt;pre&amp;gt;R&amp;lt;/pre&amp;gt; and pressing enter.  &lt;br /&gt;
Install the packages:&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;lt;- c(&amp;quot;optparse&amp;quot;, &amp;quot;data.table&amp;quot;, &amp;quot;ggplot2&amp;quot;)&lt;br /&gt;
new.packages &amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;quot;Package&amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
  &#039;lib = &amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here]  and try to install the packages again. &lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF - with VEP annotation, not filtered for quality&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information. The variants have been filtered with Minimum Quality score = 30&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions. It has been filtered with Minimum Quality score = 30 and the DEPTH* field was added&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
*The &#039;&#039;&#039;DEPTH&#039;&#039;&#039; field in the vcf has six values, corresponding to the average depth fold change across all samples for each genotype class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;DEPTH=(DHBFC_1/1, DHBFC_0/1, DHBFC_0/0, DHFFC_1/1, DHFFC_0/1, DHFFC_0/0)&amp;lt;/pre&amp;gt;&lt;br /&gt;
Depth fold change as described in [https://github.com/brentp/duphold duphold]: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;&lt;br /&gt;
DHFFC: fold-change for the variant depth relative to Flanking regions.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These fields are also in the &amp;lt;code&amp;gt;{prefix}_DUP_DEL_INV_table.tsv&amp;lt;/code&amp;gt; file&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2130</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2130"/>
		<updated>2021-10-15T09:15:25Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added nanopore assembly pipeline page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
* [[Nanopore assembly and variant calling | Nanopore assembly and variant calling]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2129</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2129"/>
		<updated>2021-10-08T12:57:02Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path   &amp;lt;br/&amp;gt;&lt;br /&gt;
In your terminal go to where you want to download it to and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some setup. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== To deactivate the environment (if you want to leave the conda environment) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
Necessary for snakemake to prepare and send jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start with creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, and there’s no risk of the connection going down and Snakemake stopping.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You&#039;ll need to activate the conda environment again&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names for any mistakes.&lt;br /&gt;
&lt;br /&gt;
If all looks OK, you can now run your pipeline:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake --profile &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If everything was set up correctly, the jobs should be submitted and you should be able to see the progress of the pipeline in your terminal.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2128</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2128"/>
		<updated>2021-10-08T12:52:38Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python - add depth to vcf and create final table&lt;br /&gt;
** PyVcf&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column CSV with the sample name in the first column, the name of the BAM file to use in the second column, and the name of the corresponding population in the third column. These BAM files should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
Tip: use the name of the bam file without the .bam extension as the sample name. Ex: from sample1.bam to sample1&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the pipeline directory (instead of a new output directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
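To see which species caches are already installed, you can list the shared cache directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ls /lustre/nobackup/SHARED/cache/&amp;lt;/pre&amp;gt;&lt;br /&gt;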
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
First load R:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load R/3.6.2&amp;lt;/pre&amp;gt;&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter.&lt;br /&gt;
Install the packages:&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;lt;- c(&amp;quot;optparse&amp;quot;, &amp;quot;data.table&amp;quot;, &amp;quot;ggplot2&amp;quot;)&lt;br /&gt;
new.packages &amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;quot;Package&amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
  &#039;lib = &amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here]  and try to install the packages again. &lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF - with VEP annotation, not filtered for quality&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information. The variants have been filtered with Minimum Quality score = 30&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions. It has been filtered with Minimum Quality score = 30 and the DEPTH* field was added&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
*The &#039;&#039;&#039;DEPTH&#039;&#039;&#039; field in the VCF holds six values, corresponding to the average depth across all samples.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;DEPTH=(DHBFC_1/1, DHBFC_0/1, DHBFC_0/0, DHFFC_1/1, DHFFC_0/1, DHFFC_0/0)&amp;lt;/pre&amp;gt;&lt;br /&gt;
Depth fold change as described in [https://github.com/brentp/duphold duphold]: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;&lt;br /&gt;
DHFFC: fold-change for the variant depth relative to Flanking regions.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These values are also included in the &amp;lt;code&amp;gt;{prefix}_DUP_DEL_INV_table.tsv&amp;lt;/code&amp;gt; file.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2127</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2127"/>
		<updated>2021-10-08T09:42:37Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added tip on how to define sample name for sample list file&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python - add depth to vcf and create final table&lt;br /&gt;
** PyVcf&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column CSV with the sample name in the first column, the name of the BAM file to use in the second column, and the name of the corresponding population in the third column. These BAM files should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &lt;br /&gt;
&amp;lt;blockquote&amp;gt;sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
Tip: use the name of the bam file without the .bam extension as the sample name. Ex: from sample1.bam to sample1&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the pipeline directory (instead of a new output directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
First load R:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load R/3.6.2&amp;lt;/pre&amp;gt;&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter.&lt;br /&gt;
Install the packages:&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;lt;- c(&amp;quot;optparse&amp;quot;, &amp;quot;data.table&amp;quot;, &amp;quot;ggplot2&amp;quot;)&lt;br /&gt;
new.packages &amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;quot;Package&amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
  &#039;lib = &amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2126</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2126"/>
		<updated>2021-10-08T09:20:57Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about installing R packages&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python - add depth to vcf and create final table&lt;br /&gt;
** PyVcf&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column CSV with the sample name in the first column, the name of the BAM file to use in the second column, and the name of the corresponding population in the third column. These BAM files should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &amp;amp;gt; sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the pipeline directory (instead of a new output directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
=== Configuring VEP ===&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Installing R packages ===&lt;br /&gt;
First load R:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load R/3.6.2&amp;lt;/pre&amp;gt;&lt;br /&gt;
Enter the R environment by typing &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; and pressing Enter.&lt;br /&gt;
Install the packages:&lt;br /&gt;
&amp;lt;pre&amp;gt;list.of.packages &amp;lt;- c(&amp;quot;optparse&amp;quot;, &amp;quot;data.table&amp;quot;, &amp;quot;ggplot2&amp;quot;)&lt;br /&gt;
new.packages &amp;lt;- list.of.packages[!(list.of.packages %in% installed.packages()[,&amp;quot;Package&amp;quot;])]&lt;br /&gt;
if(length(new.packages)) install.packages(new.packages)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you get an error like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;Warning in install.packages(new.packages) :&lt;br /&gt;
  &#039;lib = &amp;quot;/cm/shared/apps/R/3.6.2/lib64/R/library&amp;quot;&#039; is not writable&amp;lt;/pre&amp;gt;&lt;br /&gt;
Follow the instructions on how to install R packages locally [https://wiki.anunna.wur.nl/index.php/Installing_R_packages_locally here] and try to install the packages again.&lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_mapping_pipeline&amp;diff=2125</id>
		<title>Population mapping pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_mapping_pipeline&amp;diff=2125"/>
		<updated>2021-09-29T12:55:31Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added workflow image&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-mapping Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads from several individuals to a reference assembly. It outputs the mapped reads and a qualimap report.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Bwa - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Population-mapping-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ASSEMBLY: /path/to/assembly&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
PATHS_WITH_FILES:&lt;br /&gt;
  path1: /path/to/dir&amp;lt;/pre&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* OUTDIR - directory where snakemake will run and where the results will be written to.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to the pipeline directory (instead of a new output directory), comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* PATHS_WITH_FILES - directory that can contain subdirectories where the &#039;&#039;&#039;fq.gz&#039;&#039;&#039; reads are located. You can add several paths by adding &amp;lt;code&amp;gt;path2: /path/to/dir&amp;lt;/code&amp;gt; under &amp;lt;code&amp;gt;PATHS_WITH_FILES&amp;lt;/code&amp;gt;. (The line you add must be indented)&lt;br /&gt;
&lt;br /&gt;
The script goes through the subdirectories of the directory you choose under &amp;lt;code&amp;gt;PATHS_WITH_FILES&amp;lt;/code&amp;gt; looking for files with the &#039;&#039;&#039;fq.gz&#039;&#039;&#039; extension.&amp;lt;br /&amp;gt;&lt;br /&gt;
Example: if &amp;lt;code&amp;gt;path1: /lustre/nobackup/WUR/ABGC/shared/Chicken/Africa/X201SC20031230-Z01-F006_multipath&amp;lt;/code&amp;gt;, the subdirectory structure could be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/lustre/nobackup/WUR/ABGC/shared/Chicken/Africa/X201SC20031230-Z01-F006_multipath  &lt;br /&gt;
├── X201SC20031230-Z01-F006_1  &lt;br /&gt;
│   └── raw_data  &lt;br /&gt;
│       ├── a109_26_15_1_H  &lt;br /&gt;
│       │   ├── a109_26_15_1_H_FDSW202597655-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
│       │   ├── a109_26_15_1_H_FDSW202597655-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
│       │   └── MD5.txt  &lt;br /&gt;
│       └── a20_10_16_1_H  &lt;br /&gt;
│           ├── a20_10_16_1_H_FDSW202597566-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
│           ├── a20_10_16_1_H_FDSW202597566-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
│           └── MD5.txt  &lt;br /&gt;
└── X201SC20031230-Z01-F006_2  &lt;br /&gt;
    └── raw_data  &lt;br /&gt;
        ├── a349_Be_17_1_C  &lt;br /&gt;
        │   ├── a349_Be_17_1_C_FDSW202597895-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
        │   ├── a349_Be_17_1_C_FDSW202597895-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
        │   └── MD5.txt  &lt;br /&gt;
        └── a360_Be_05_1_H  &lt;br /&gt;
            ├── a360_Be_05_1_H_FDSW202597906-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
            ├── a360_Be_05_1_H_FDSW202597906-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
            └── MD5.txt  &amp;lt;/pre&amp;gt;&lt;br /&gt;
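The pipeline discovers the reads on its own, but you can preview which files it would pick up by mimicking the search with `find`. This is an illustrative sketch on a throwaway directory tree (the real discovery logic lives in the pipeline's Snakefile); only files ending in fq.gz are matched, so companions such as MD5.txt are ignored.

```shell
# Hedged sketch: mimic the pipeline's read discovery on a toy tree.
# Only *.fq.gz files are matched; MD5.txt and other companions are skipped.
demo=$(mktemp -d)
mkdir -p "$demo/batch1/raw_data/sampleA"
touch "$demo/batch1/raw_data/sampleA/sampleA_L3_1.fq.gz" \
      "$demo/batch1/raw_data/sampleA/sampleA_L3_2.fq.gz" \
      "$demo/batch1/raw_data/sampleA/MD5.txt"
find "$demo" -type f -name '*.fq.gz' | sort
```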
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;processed_reads&#039;&#039;&#039; directory with the bam files with the mapped reads for every sample&lt;br /&gt;
* &#039;&#039;&#039;mapping_stats&#039;&#039;&#039; directory containing the qualimap results and a summary of the qualimap results for all samples in &amp;lt;code&amp;gt;sample_quality_summary.tsv&amp;lt;/code&amp;gt;&lt;br /&gt;
** &#039;&#039;&#039;qualimap&#039;&#039;&#039; contains qualimap results per sample&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Population-mapping-workflow.png&amp;diff=2124</id>
		<title>File:Population-mapping-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Population-mapping-workflow.png&amp;diff=2124"/>
		<updated>2021-09-29T12:51:44Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: workflow for population mapping snakemake pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;workflow for population mapping snakemake pipeline&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_mapping_pipeline&amp;diff=2123</id>
		<title>Population mapping pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_mapping_pipeline&amp;diff=2123"/>
		<updated>2021-09-29T12:49:30Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-mapping Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads from several individuals to a reference assembly. It outputs the mapped reads and a qualimap report.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Bwa - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Population-mapping-workflow.png|DAG]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ASSEMBLY: /path/to/assembly&lt;br /&gt;
OUTDIR: /path/to/outdir&lt;br /&gt;
PATHS_WITH_FILES:&lt;br /&gt;
  path1: /path/to/dir&amp;lt;/pre&amp;gt;&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* OUTDIR - directory where Snakemake will run and where the results will be written.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out &amp;lt;code&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/code&amp;gt;&lt;br /&gt;
* PATHS_WITH_FILES - directory that can contain subdirectories where the &#039;&#039;&#039;fq.gz&#039;&#039;&#039; reads are located. You can add several paths by adding &amp;lt;code&amp;gt;path2: /path/to/dir&amp;lt;/code&amp;gt; under &amp;lt;code&amp;gt;PATHS_WITH_FILES&amp;lt;/code&amp;gt;. (The line you add must be indented)&lt;br /&gt;
&lt;br /&gt;
The script goes through the subdirectories of the directory you choose under &amp;lt;code&amp;gt;PATHS_WITH_FILES&amp;lt;/code&amp;gt; looking for files with the &#039;&#039;&#039;fq.gz&#039;&#039;&#039; extension.&amp;lt;br /&amp;gt;&lt;br /&gt;
Example: if &amp;lt;code&amp;gt;path1: /lustre/nobackup/WUR/ABGC/shared/Chicken/Africa/X201SC20031230-Z01-F006_multipath&amp;lt;/code&amp;gt;, the subdirectory structure could be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/lustre/nobackup/WUR/ABGC/shared/Chicken/Africa/X201SC20031230-Z01-F006_multipath  &lt;br /&gt;
├── X201SC20031230-Z01-F006_1  &lt;br /&gt;
│   └── raw_data  &lt;br /&gt;
│       ├── a109_26_15_1_H  &lt;br /&gt;
│       │   ├── a109_26_15_1_H_FDSW202597655-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
│       │   ├── a109_26_15_1_H_FDSW202597655-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
│       │   └── MD5.txt  &lt;br /&gt;
│       └── a20_10_16_1_H  &lt;br /&gt;
│           ├── a20_10_16_1_H_FDSW202597566-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
│           ├── a20_10_16_1_H_FDSW202597566-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
│           └── MD5.txt  &lt;br /&gt;
└── X201SC20031230-Z01-F006_2  &lt;br /&gt;
    └── raw_data  &lt;br /&gt;
        ├── a349_Be_17_1_C  &lt;br /&gt;
        │   ├── a349_Be_17_1_C_FDSW202597895-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
        │   ├── a349_Be_17_1_C_FDSW202597895-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
        │   └── MD5.txt  &lt;br /&gt;
        └── a360_Be_05_1_H  &lt;br /&gt;
            ├── a360_Be_05_1_H_FDSW202597906-1r_HWFFFDSXY_L3_1.fq.gz  &lt;br /&gt;
            ├── a360_Be_05_1_H_FDSW202597906-1r_HWFFFDSXY_L3_2.fq.gz  &lt;br /&gt;
            └── MD5.txt  &amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;processed_reads&#039;&#039;&#039; directory with the bam files with the mapped reads for every sample&lt;br /&gt;
* &#039;&#039;&#039;mapping_stats&#039;&#039;&#039; directory containing the qualimap results and a summary of the qualimap results for all samples in &amp;lt;code&amp;gt;sample_quality_summary.tsv&amp;lt;/code&amp;gt;&lt;br /&gt;
** &#039;&#039;&#039;qualimap&#039;&#039;&#039; contains qualimap results per sample&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2122</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2122"/>
		<updated>2021-09-29T12:42:24Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added page for population mapping pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
* [[Population mapping pipeline | Population mapping pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2121</id>
		<title>Population structural variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Population_structural_variant_calling_pipeline&amp;diff=2121"/>
		<updated>2021-09-22T11:39:24Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: added info about population SV calling pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/population-structural-var-calling-smoove/tree/single_run Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to perform structural variant calling in a population using Smoove. It also runs VEP and performs PCA. In addition to the VCF with the SVs, you also get a .tsv file with some summarized information on the SVs: it includes allele frequency per population, as well as VEP annotation and depth fold change as described in [https://github.com/brentp/duphold duphold]:  &amp;lt;br /&amp;gt; &lt;br /&gt;
&amp;amp;gt; DHBFC: fold-change for the variant depth relative to bins in the genome with similar GC-content.&amp;lt;br /&amp;gt;  &lt;br /&gt;
&amp;amp;gt; DHFFC: fold-change for the variant depth relative to Flanking regions.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* Smoove - SV calling&lt;br /&gt;
* VEP - determines the effect of the variants&lt;br /&gt;
* Plink - perform PCA&lt;br /&gt;
* R - plot PCA&lt;br /&gt;
* SURVIVOR - basic SV stats&lt;br /&gt;
* Python - add depth to vcf and create final table&lt;br /&gt;
** PyVcf&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| [[File:Pop-sv-calling-workflow.png]]&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| &#039;&#039;Pipeline workflow&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t add the reads files, just the directory where they are&lt;br /&gt;
SAMPLE_LIST: /path/to/file&lt;br /&gt;
REFERENCE: /path/to/assembly&lt;br /&gt;
CONTIGS_IGNORE: /path/to/file&lt;br /&gt;
SPECIES: &amp;amp;lt;species_name&amp;amp;gt;&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where Snakemake will run and where the results will be written&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* SAMPLE_LIST - three-column CSV with the sample name in the first column, the name of the BAM file to use in the second column, and the name of the corresponding population in the third column. These BAMs should all be in the same directory (READS_DIR)&lt;br /&gt;
* Example: &amp;amp;gt; sample1,sample1.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample2,sample2.bam,Pop1&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample3,sample3.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;amp;gt; sample4,sample4.bam,Pop2&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* REFERENCE - path to the assembly file&lt;br /&gt;
* CONTIGS_IGNORE - contigs to be excluded from SV calling (usually the small contigs)&lt;br /&gt;
** If you don&#039;t want to exclude contigs you&#039;ll need to edit the Snakefile to remove this line &amp;lt;code&amp;gt;--excludechroms {params.contigs} \&amp;lt;/code&amp;gt;&lt;br /&gt;
* SPECIES - species name to be used for VEP&lt;br /&gt;
* PREFIX - prefix for the created files&lt;br /&gt;
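Before launching, it can help to sanity-check the SAMPLE_LIST file against the layout described above. A hedged sketch; the three-column sample,bam,population layout is taken from this page, while the sample names here are invented for illustration:

```shell
# Hedged sketch: verify that every line of a sample list has exactly
# three comma-separated columns (sample,bam,population); names are made up.
cat > samples.csv <<'EOF'
sample1,sample1.bam,Pop1
sample2,sample2.bam,Pop1
sample3,sample3.bam,Pop2
EOF
awk -F',' 'NF != 3 { printf "line %d has %d columns\n", NR, NF; bad = 1 }
           END { exit bad }' samples.csv && echo "sample list OK"
```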
&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), comment out or remove&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/outdir&amp;lt;/pre&amp;gt;&lt;br /&gt;
== ADDITIONAL SET UP ==&lt;br /&gt;
&lt;br /&gt;
This pipeline uses VEP in offline mode, which increases performance. In order to use it in this mode, the cache for the species used needs to be installed: &lt;br /&gt;
Check if the cache file for your species already exists in &amp;lt;code&amp;gt;/lustre/nobackup/SHARED/cache/&amp;lt;/code&amp;gt;. If it doesn’t, create it with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;/usr/bin/perl /cm/shared/apps/SHARED/ensembl-vep/INSTALL.pl --CACHEDIR /lustre/nobackup/SHARED/cache/ --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found, you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
==== Other option: ====&lt;br /&gt;
&lt;br /&gt;
You can install VEP with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda install -c bioconda ensembl-vep&amp;lt;/pre&amp;gt;&lt;br /&gt;
and install the cache with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;vep_install --CACHEDIR &amp;amp;lt;where/to/install/cache&amp;amp;gt; --AUTO c -n --SPECIES &amp;amp;lt;species&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
When multiple assemblies are found, you need to run it again with &amp;lt;code&amp;gt;--ASSEMBLY &amp;amp;lt;assembly name&amp;amp;gt;&amp;lt;/code&amp;gt;, where “assembly name” is the name of the assembly you want to use.&lt;br /&gt;
&lt;br /&gt;
In the Snakefile, in rule &amp;lt;code&amp;gt;run_vep&amp;lt;/code&amp;gt;, replace &amp;lt;code&amp;gt;/cm/shared/apps/SHARED/ensembl-vep/vep&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;vep&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;&amp;lt;run_date&amp;gt;_files.txt&#039;&#039;&#039; Dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;2_merged&#039;&#039;&#039;&lt;br /&gt;
** {prefix}.smoove-counts.html - shows a summary of the number of reads before and after filtering&lt;br /&gt;
* &#039;&#039;&#039;5_postprocessing&#039;&#039;&#039; directory that contains the final VCF file containing the structural variants found. This file has been annotated with VEP&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz - Final VCF&lt;br /&gt;
** {prefix}.smoove.square.vep.vcf.gz_summary.html - statistics from VEP&lt;br /&gt;
** {prefix}.nosex, {prefix}.log, {prefix}.eigenvec, {prefix}.eigenval - output files from the PCA&lt;br /&gt;
** {prefix}_DUP_DEL_INV_table.tsv - table with the most important information extracted from the VCF. Contains information about the SV, allele frequency for each population, VEP annotation and depth information&lt;br /&gt;
** {prefix}_DUP_DEL_INV.vcf - vcf file with annotated duplications, deletions and inversions&lt;br /&gt;
** {prefix}_BND.vcf - vcf file with variants annotated with BND&lt;br /&gt;
* &#039;&#039;&#039;6_metrics&#039;&#039;&#039; directory that contains general stats about the number of SVs found&lt;br /&gt;
* &#039;&#039;&#039;FIGURES&#039;&#039;&#039; directory that contains the PCA plot&lt;br /&gt;
&lt;br /&gt;
What you do with the results from this structural variant calling pipeline depends on your research question: a possible next step would be to explore the &#039;&#039;&#039;{prefix}_DUP_DEL_INV_table.tsv&#039;&#039;&#039; file and look at the largest SVs found (sort by &#039;&#039;SVLEN&#039;&#039;) or at a specific effect in the ANNOTATION column, such as “frameshift_variant”.&lt;br /&gt;
&lt;br /&gt;
See [https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html VEP effect descriptions] for a short description of the effects annotated by VEP.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:Pop-sv-calling-workflow.png&amp;diff=2120</id>
		<title>File:Pop-sv-calling-workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:Pop-sv-calling-workflow.png&amp;diff=2120"/>
		<updated>2021-09-22T11:30:39Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: population structural variation calling pipeline workflow&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;population structural variation calling pipeline workflow&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2119</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2119"/>
		<updated>2021-09-22T11:19:16Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: /* A list of tutorials, workflows, and recipes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
* [[Population structural variant calling pipeline | Population structural variant calling pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2118</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2118"/>
		<updated>2021-07-01T13:38:17Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path   &amp;lt;br/&amp;gt;&lt;br /&gt;
In your terminal, go to the directory you want to download it to and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some set up. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== To deactivate the environment (if you want to leave the conda environment) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
This file is necessary for Snakemake to prepare and submit jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start with creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
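The two steps above can be combined into a single heredoc. A sketch assuming the profile is named my-pipeline (a placeholder; use the name of the pipeline you cloned) and the SLURM settings shown above:

```shell
# Hedged sketch: create the Snakemake profile directory and its config
# in one go. "my-pipeline" is a placeholder name; adjust resources to taste.
profile="$HOME/.config/snakemake/my-pipeline"
mkdir -p "$profile"
cat > "$profile/config.yaml" <<'EOF'
jobs: 10
cluster: "sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err"
use-conda: true
EOF
```

With this in place, `snakemake --profile my-pipeline` picks up the settings automatically.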
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, so there’s no risk of the connection going down and Snakemake stopping.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You&#039;ll need to activate the conda environment again&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake.&lt;br /&gt;
&lt;br /&gt;
If all looks ok, you can now run your pipeline&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake --profile &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If everything was set up correctly, the jobs should be submitted and you should be able to see the progress of the pipeline in your terminal.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2117</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2117"/>
		<updated>2021-07-01T08:03:10Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path   &amp;lt;br/&amp;gt;&lt;br /&gt;
In your terminal, go to the directory you want to download it to and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some set up. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Deactivate the environment (when you want to leave it) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
This file is necessary for Snakemake to prepare and submit jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start by creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
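The two steps above can be combined into a single shell snippet; “my-pipeline” is an illustrative profile name, and the sbatch options simply mirror the example config above (adjust them to your own jobs):

```shell
# Create the Snakemake profile directory and write its config.yaml in one go.
# "my-pipeline" is a placeholder profile name - use your pipeline's name.
mkdir -p "$HOME/.config/snakemake/my-pipeline"
printf '%s\n' \
  'jobs: 10' \
  'cluster: "sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err"' \
  'use-conda: true' | tee "$HOME/.config/snakemake/my-pipeline/config.yaml"
```

Snakemake will pick this profile up later when you run snakemake with the matching profile name.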
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
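For illustration only, a filled-in config.yaml could look like this (every path and the prefix below are placeholders, not real locations):

```yaml
# placeholder values - replace with your own paths and output name
OUTDIR: /lustre/nobackup/WUR/ABGC/my_project/output
READS_DIR: /lustre/nobackup/WUR/ABGC/my_project/reads/
ASSEMBLY: /lustre/nobackup/WUR/ABGC/my_project/assembly.fa
PREFIX: my_sample
```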
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, so there’s no risk of the connection dropping and Snakemake stopping.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake.&lt;br /&gt;
&lt;br /&gt;
If everything looks OK, you can now run your pipeline:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake --profile &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If everything was set up correctly, the jobs should be submitted and you should be able to see the progress of the pipeline in your terminal.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2116</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2116"/>
		<updated>2021-07-01T06:39:30Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path. In your terminal, go to the directory where you want to download it and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some setup. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Deactivate the environment (when you want to leave it) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
This file is necessary for Snakemake to prepare and submit jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start by creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, so there’s no risk of the connection dropping and Snakemake stopping.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake.&lt;br /&gt;
&lt;br /&gt;
If everything looks OK, you can now run your pipeline:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake --profile &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If everything was set up correctly, the jobs should be submitted and you should be able to see the progress of the pipeline in your terminal.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2115</id>
		<title>Running Snakemake pipelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Running_Snakemake_pipelines&amp;diff=2115"/>
		<updated>2021-06-30T14:24:33Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: Created page with &amp;quot;Author: Carolina Pita Barros &amp;lt;br/&amp;gt; Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt; ABG  &amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt; You can find my pipelines [https://github.com/CarolinaPB/ here]  The Snakemake sha...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br/&amp;gt;&lt;br /&gt;
Contact: carolina.pitabarros@wur.nl &amp;lt;br/&amp;gt;&lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
You can find my pipelines [https://github.com/CarolinaPB/ here]&lt;br /&gt;
&lt;br /&gt;
The Snakemake pipelines shared here use modules loaded from the HPC and tools installed with conda.&lt;br /&gt;
&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== Clone the repository ==&lt;br /&gt;
&lt;br /&gt;
==== From github ====&lt;br /&gt;
&lt;br /&gt;
Go to the repository’s page, click the green “Code” button and copy the path. In your terminal, go to the directory where you want to download it and run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;git clone &amp;amp;lt;path you copied from github&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== From the WUR HPC (Anunna) ====&lt;br /&gt;
&lt;br /&gt;
Go to &amp;lt;code&amp;gt;/lustre/nobackup/WUR/ABGC/shared/PIPELINES/&amp;lt;/code&amp;gt; and choose which pipeline you want to use.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;cp -r &amp;amp;lt;pipeline directory&amp;amp;gt; &amp;amp;lt;directory where you want to save it to&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
First you’ll need to do some setup. Go to the pipeline’s directory.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
Install &amp;lt;code&amp;gt;conda&amp;lt;/code&amp;gt; if you don’t have it&lt;br /&gt;
&lt;br /&gt;
=== Create conda environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda create --name &amp;amp;lt;name-of-pipeline&amp;amp;gt; --file requirements.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;I recommend giving it the same name as the pipeline&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
This environment contains snakemake and the other packages that are needed to run the pipeline.&lt;br /&gt;
&lt;br /&gt;
=== Activate environment ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda activate &amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
=== Deactivate the environment (when you want to leave it) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;conda deactivate&amp;lt;/pre&amp;gt;&lt;br /&gt;
== File configuration ==&lt;br /&gt;
&lt;br /&gt;
=== Create HPC config file ===&lt;br /&gt;
&lt;br /&gt;
This file is necessary for Snakemake to prepare and submit jobs.&lt;br /&gt;
&lt;br /&gt;
==== Start by creating the directory ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;mkdir -p ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&lt;br /&gt;
cd ~/.config/snakemake/&amp;amp;lt;name-of-pipeline&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
==== Create config.yaml and include the following: ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;My pipelines are configured to work with SLURM&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;jobs: 10&lt;br /&gt;
cluster: &amp;amp;quot;sbatch -t 1:0:0 --mem=16000 -c 16 --job-name={rule} --exclude=fat001,fat002,fat101,fat100 --output=logs_slurm/{rule}.out --error=logs_slurm/{rule}.err&amp;amp;quot;&lt;br /&gt;
&lt;br /&gt;
use-conda: true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Here you should configure the resources you want to use.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
=== Go to the pipeline directory and open config.yaml ===&lt;br /&gt;
&lt;br /&gt;
Configure your paths, but keep the variable names that are already in the config file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output&lt;br /&gt;
READS_DIR: /path/to/reads/ &lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you want the results to be written to this directory (not to a new directory), open the Snakefile and comment out &amp;lt;code&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/code&amp;gt; and ignore or comment out the &amp;lt;code&amp;gt;OUTDIR: /path/to/output&amp;lt;/code&amp;gt; in the config file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Now the setup is complete&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== How to run the pipeline ==&lt;br /&gt;
&lt;br /&gt;
Since the pipelines can take a while to run, it’s best to use a [https://linuxize.com/post/how-to-use-linux-screen/ screen session]. In a screen session, Snakemake stays “active” in the shell while it’s running, so there’s no risk of the connection dropping and Snakemake stopping.&lt;br /&gt;
&lt;br /&gt;
Start by creating a screen session:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;screen -S &amp;amp;lt;name of session&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;snakemake -np&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake.&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2114</id>
		<title>Bioinformatics tips tricks workflows</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Bioinformatics_tips_tricks_workflows&amp;diff=2114"/>
		<updated>2021-06-30T14:19:28Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is intended as a portal to pages concerning best practices, workflows and pipelines, and other protocols (including scripts).&lt;br /&gt;
&lt;br /&gt;
== A list of tutorials, workflows, and recipes ==&lt;br /&gt;
* [[Mapping_reads_with_Mosaik | Mapping Illumina GA2/HiSeq reads to the Sus scrofa genome assembly]]&lt;br /&gt;
* [[convert_fastq_to_fasta | A Perl script to convert fastq to fasta file format]]&lt;br /&gt;
* [[Mapping Pair-end reads with Stampy]]&lt;br /&gt;
* [[making_slices_from_BAM_files | Create slices from a collection of BAM files ]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
* [[ssh_without_password | ssh without password]]&lt;br /&gt;
* [[Create_shortcut_log-in_command | Create a shortcut for the ssh log-in command]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[command_line_tricks_for_manipulating_fastq | Command-line tricks for manipulating fastq files]]&lt;br /&gt;
* [[assemble_mitochondrial_genomes_from_short_read_data | Assemble mitochondrial genomes from whole-genome short-read data]]&lt;br /&gt;
* [[1000Bulls_mapping_pipeline_at_ABGC | 1000 Bulls mapping pipeline at ABGC]]&lt;br /&gt;
* [[ABGSA | Animal Breeding and Genomics Sequence Archives (ABGSA)]]&lt;br /&gt;
* [[Short_read_mapping_pipeline_pig | Pig mapping pipeline at ABGC]]&lt;br /&gt;
* [[Extract_noncall_snps_from_soy | Extract a set of pig SNPs not called in a control sample (soybean)]]&lt;br /&gt;
* [[calculate_corrected_theta_from_resequencing_data | Calculate nucleotide diversity (theta) corrected for sequencing depth]]&lt;br /&gt;
* [[RNA-seq analysis | RNA-seq analysis with Tophat]]&lt;br /&gt;
* [[Variant_annotation_tutorial | Variant annotation tutorial]]&lt;br /&gt;
* [[issues_asreml | Issues with ASReml]]&lt;br /&gt;
* [[Checkpointing | Checkpointing]]&lt;br /&gt;
* [[Assembly &amp;amp; Annotation | Assembly and Annotation guidelines (denovo)]]&lt;br /&gt;
* [[DE expression | DE expression analysis with tophat2 / cuffdiff]]&lt;br /&gt;
* [[JBrowse | JBrowse]]&lt;br /&gt;
* [[Running Snakemake pipelines | Running Snakemake pipelines]]&lt;br /&gt;
* [[Mapping and variant calling pipeline | Mapping and variant calling pipeline]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2113</id>
		<title>Mapping and variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2113"/>
		<updated>2021-06-30T14:18:19Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros &amp;lt;br /&amp;gt; &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &amp;lt;br /&amp;gt;&lt;br /&gt;
ABG&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/WUR_mapping-variant-calling Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads and a Qualimap report, and performs variant calling.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* BWA - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
* FreeBayes - variant calling&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t list the read files, just the directory that contains them&lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where Snakemake will run and where the results will be written&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* PREFIX - prefix for the final mapped reads file&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the current directory (instead of a new one), open the Snakefile and comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;sorted_reads&#039;&#039;&#039; directory with the file containing the mapped reads&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory containing the qualimap results&lt;br /&gt;
* &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory containing the variant calling VCF file&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2112</id>
		<title>Mapping and variant calling pipeline</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mapping_and_variant_calling_pipeline&amp;diff=2112"/>
		<updated>2021-06-30T14:17:06Z</updated>

		<summary type="html">&lt;p&gt;Moiti001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Carolina Pita Barros  &lt;br /&gt;
Contact: carolina.pitabarros@wur.nl  &lt;br /&gt;
ABG&lt;br /&gt;
&lt;br /&gt;
[https://github.com/CarolinaPB/WUR_mapping-variant-calling Link to the repository]&lt;br /&gt;
&lt;br /&gt;
== First follow the instructions here: ==&lt;br /&gt;
&lt;br /&gt;
[https://carolinapb.github.io/2021-06-23-how-to-run-my-pipelines/ Step by step guide on how to use my pipelines]&amp;lt;br /&amp;gt;&lt;br /&gt;
Click [https://github.com/CarolinaPB/snakemake-template/blob/master/Short%20introduction%20to%20Snakemake.pdf here] for an introduction to Snakemake&lt;br /&gt;
&lt;br /&gt;
== ABOUT ==&lt;br /&gt;
&lt;br /&gt;
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads and a Qualimap report, and performs variant calling.&lt;br /&gt;
&lt;br /&gt;
==== Tools used: ====&lt;br /&gt;
&lt;br /&gt;
* BWA - mapping&lt;br /&gt;
* Samtools - processing&lt;br /&gt;
* Qualimap - mapping summary&lt;br /&gt;
* FreeBayes - variant calling&lt;br /&gt;
&lt;br /&gt;
=== Edit config.yaml with the paths to your files ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;OUTDIR: /path/to/output &lt;br /&gt;
READS_DIR: /path/to/reads/ # don&#039;t list the read files, just the directory that contains them&lt;br /&gt;
ASSEMBLY: /path/to/assembly&lt;br /&gt;
PREFIX: &amp;amp;lt;output name&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
* OUTDIR - directory where Snakemake will run and where the results will be written&lt;br /&gt;
* READS_DIR - path to the directory that contains the reads&lt;br /&gt;
* ASSEMBLY - path to the assembly file&lt;br /&gt;
* PREFIX - prefix for the final mapped reads file&lt;br /&gt;
&lt;br /&gt;
If you want the results to be written to the current directory (instead of a new one), open the Snakefile and comment out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;workdir: config[&amp;amp;quot;OUTDIR&amp;amp;quot;]&amp;lt;/pre&amp;gt;&lt;br /&gt;
== RESULTS ==&lt;br /&gt;
&lt;br /&gt;
* dated file with an overview of the files used to run the pipeline (for documentation purposes)&lt;br /&gt;
* &#039;&#039;&#039;sorted_reads&#039;&#039;&#039; directory with the file containing the mapped reads&lt;br /&gt;
* &#039;&#039;&#039;results&#039;&#039;&#039; directory containing the qualimap results&lt;br /&gt;
* &#039;&#039;&#039;variant_calling&#039;&#039;&#039; directory containing the variant calling VCF file&lt;/div&gt;</summary>
		<author><name>Moiti001</name></author>
	</entry>
</feed>