Assembly & Annotation
Revision as of 10:07, 21 January 2016
Protocol with typical commands used for de novo assembly and annotation
- Preprocessing
- Assembly
- Annotation
- Submission
- Visualization
Software
Preprocessing
Quality control: Check the quality of your data using FastQC and fastq_stats.py. <source lang='bash'> fastqc ../*.gz </source> Explore the report to assess read quality and to identify potential adapter and primer sequences in the reads.
K-mer analysis: Use the script kmer_analysis.sh to estimate genomic properties from the k-mer distribution. These properties include the genome size and the percentage of heterozygosity.
<source lang='bash'> Preprocessing/kmer_analysis.sh -m <kmer_size> -c <error-cutoff> -s <hash-size> -t <threads> -i <R1.fastq.gz R2.fastq.gz ...> -o <output_dir> </source>
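How kmer_analysis.sh derives the genome size is not documented on this page. As a rough cross-check, the usual estimate divides the number of non-error k-mers by the depth of the main coverage peak. The sketch below assumes a two-column histogram file (depth, count), such as a k-mer counter typically writes; the function name and the file layout are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: estimate genome size from a two-column k-mer histogram
# (column 1 = depth, column 2 = number of distinct k-mers at that depth).
# Assumption: k-mers with a depth below the cutoff are sequencing errors.
estimate_genome_size() {
    local histo="$1" cutoff="$2"
    awk -v cutoff="$cutoff" '
        $1 >= cutoff {
            total += $1 * $2                  # all k-mer occurrences above the cutoff
            if ($2 > peak_count) {            # track the most frequent depth (main peak)
                peak_count = $2
                peak_depth = $1
            }
        }
        END {
            if (peak_depth > 0) {
                printf "%.0f\n", total / peak_depth
            } else {
                print "no k-mers at or above the cutoff" > "/dev/stderr"
                exit 1
            }
        }' "$histo"
}
```

Call it as estimate_genome_size <histogram_file> <error-cutoff>. The result is only as good as the chosen cutoff, and in a strongly heterozygous genome the highest peak may be the heterozygous one, which would roughly double the estimate.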
Trimming: Use Trimmomatic to trim Illumina data. Make sure that the FASTA file with the adapters matches the adapters reported by FastQC. Use the following script: <source lang='bash'> preprocessing/run_trimmomatic.sh -t <num_threads> -f <FW_reads.fastq> -r <RV_reads.fastq> </source>
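The contents of run_trimmomatic.sh are not shown on this page. For orientation, a paired-end Trimmomatic call generally looks like the command printed by the sketch below. The jar location (TRIMMOMATIC_JAR), the output file names, and the trimming steps (taken from the example settings in the Trimmomatic manual) are assumptions, not necessarily what the wrapper script uses. The function only prints the command so that it can be checked before anything is run.

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) a paired-end Trimmomatic command.
# TRIMMOMATIC_JAR and the output file names are placeholders for illustration.
trimmomatic_pe_cmd() {
    local threads="$1" fw="$2" rv="$3" adapters="$4"
    local jar="${TRIMMOMATIC_JAR:-trimmomatic.jar}"
    # Strip ".fastq" or ".fastq.gz" to build the output names.
    local fw_base="${fw%.fastq*}" rv_base="${rv%.fastq*}"
    echo java -jar "$jar" PE -threads "$threads" \
        "$fw" "$rv" \
        "${fw_base}.paired.fastq.gz" "${fw_base}.unpaired.fastq.gz" \
        "${rv_base}.paired.fastq.gz" "${rv_base}.unpaired.fastq.gz" \
        "ILLUMINACLIP:${adapters}:2:30:10" LEADING:3 TRAILING:3 \
        SLIDINGWINDOW:4:15 MINLEN:36
}
```

For example, trimmomatic_pe_cmd 4 FW_reads.fastq RV_reads.fastq adapters.fa prints the full command line. The adapter file passed to ILLUMINACLIP should be the one that matches the adapters flagged by FastQC.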
Error correction: Lighter is a fast tool for error-correcting Illumina data. Use the following script: <source lang='bash'> preprocessing/run_lighter_error_correction.sh -g <genome_size> -c <coverage> -f <FW_reads.fastq> -r <RV_reads.fastq> </source>
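The -c <coverage> value can be approximated as the total number of sequenced bases divided by the genome size, for example the size estimated in the k-mer analysis. The following is a minimal sketch of that arithmetic, assuming standard four-line FASTQ records; the function names are hypothetical and not part of the pipeline.

```bash
#!/usr/bin/env bash
# Sketch only: sum the read lengths in one or more FASTQ files (plain or gzipped).
# Assumption: every record spans exactly four lines (no wrapped sequences).
fastq_total_bases() {
    local f
    for f in "$@"; do
        if [[ "$f" == *.gz ]]; then gzip -dc "$f"; else cat "$f"; fi
    done | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }'
}

# Average coverage = total bases / genome size, printed with one decimal.
estimate_coverage() {
    local genome_size="$1"
    shift
    local bases
    bases=$(fastq_total_bases "$@")
    awk -v b="$bases" -v g="$genome_size" 'BEGIN { printf "%.1f\n", b / g }'
}
```

Call it as estimate_coverage <genome_size> <FW_reads.fastq> <RV_reads.fastq> and pass the result to the -c option.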
Organelle assembly: Download a suitable reference from the NCBI database. Use the IOGA pipeline to assemble the organellar genome. <source lang='bash'> assembly/run_IOGA.sh -a <assembly> -f <fw_reads.fastq> -r <reverse_reads.fastq> -i <insert_size> -t <num_threads> -n <name_prefix> </source> Map your reads to the newly assembled genome and manually check whether it is circular.
Use Pilon to correct remaining errors in the assembly using the mapped reads.
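This page does not name a read mapper for the mapping and polishing steps. One common combination is BWA-MEM with samtools, followed by Pilon. The sketch below prints such a command sequence for review; the choice of BWA and samtools, the jar location (PILON_JAR), and the output names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: print the mapping and polishing commands; nothing is executed.
# PILON_JAR and the output names are placeholders for illustration.
print_polish_commands() {
    local asm="$1" fw="$2" rv="$3" threads="$4" prefix="$5"
    local pilon_jar="${PILON_JAR:-pilon.jar}"
    cat <<EOF
bwa index ${asm}
bwa mem -t ${threads} ${asm} ${fw} ${rv} | samtools sort -@ ${threads} -o ${prefix}.bam -
samtools index ${prefix}.bam
java -jar ${pilon_jar} --genome ${asm} --frags ${prefix}.bam --output ${prefix}.pilon --changes
EOF
}
```

The sorted and indexed BAM file can be loaded into a genome viewer for the manual circularity check, for instance by looking for read pairs that connect the two ends of the contig. Once the printed commands look right, they can be run by piping the output to bash.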
Annotate using MITOS or DOGMA online tools.
Submit here: http://www.ncbi.nlm.nih.gov/LargeDirSubs/dir_submit.cgi