gmh-metagenomics [2019/08/14 03:47] (current)
telatina

====== Metabarcoding analysis ======
  
  * Please log into the VM using the given credentials. The hostname is ''quadram.seq.space''.
  * You can install tools using Miniconda, and (if needed) you can write small scripts using your favourite editor.
  * :!: Should you wish to skip a step, you'll find the pre-computed output in ''/media/data/precomputed/16S/''.
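For instance, the tools used later in this exercise can be pulled in from the Bioconda channel (an illustrative command; install whichever packages you actually need):

<code bash>
# Example only: a read-merging tool and some FASTQ/FASTA utilities from Bioconda
conda install -y -c bioconda flash seqkit seqtk
</code>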
  
===== 16S Dataset =====
This test will ask you to perform some computations related to 16S data, using a simplified and __unrealistic__ protocol.
  
In the shared directory ''/media/data/16S/'' you will find a metadata file and a set of reads.
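As a warm-up for the first step below, a tab-separated metadata file can be queried with standard shell tools. This is a minimal sketch on a mock ''metadata.tsv'' (the real file's columns and values may differ):

<code bash>
# Mock metadata file for illustration; the real one lives in /media/data/16S/
printf 'SampleID\tSource\nA1\tstool\nA2\tskin\nA3\tstool\n' > metadata.tsv

# Print the IDs of the stool samples (assuming the sample source
# is in the second tab-separated column)
awk -F '\t' '$2 == "stool" {print $1}' metadata.tsv
</code>

The printed IDs can then be matched against the read filename prefixes.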
  
  - Examine the metadata file and work out how its entries relate to the read filenames in ''/media/data/16S/''.
  - Create a directory in your home called ''~/16S/reads''.
  - For each stool sample, take 5,000 reads and place them in ''~/16S/reads''.
  - Create a subdirectory ''merged'' inside your ''16S'' folder (//i.e.// ''~/16S/merged'').
  - The read pairs for each sample overlap, so please merge them using a tool such as FLASH, saving the output to ''~/16S/merged''.
  - Create a single file with all the reads from all of the samples produced in the last step (call it ''~/16S/merged/all.fastq''), relabeling each //read name// to begin with the sample name (filename prefix) and a progressive number. You can use a dot as the separator.
  - Create an OTU table by running the command below: <code bash>
merged_to_otus ~/16S/merged/all.fastq <output_directory>
</code>
  - Extract the sequences named //OTU1//, //OTU2//, //OTU3//, //OTU4// and //OTU5// from the output of the previous step and save them as ''~/16S/five.fasta''.
  - Calculate the sum of the counts for each sample (column) of the ''otutab.txt'' and ''otutab.raw'' files.
  - Create a stacked bar chart of the OTU composition of each sample from ''otutab.txt''.
  - Create an ''otutable.sorted.txt'' file, sorting the table by total OTU abundance (the sum of the counts in all samples).
  - Extract the top 5 most abundant OTUs from the OTU database and save them as ''~/16S/top5.fa''.
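The relabel-and-concatenate step above can be sketched with plain awk. This is an illustration on a mock merged FASTQ file; in practice you would loop over the per-sample outputs of your merging tool (the file name ''sampleA.fastq'' is hypothetical):

<code bash>
# Mock merged reads for one sample; real input comes from the merging step
printf '@M01:1:1\nACGT\n+\nIIII\n@M01:1:2\nTTGG\n+\nIIII\n' > sampleA.fastq

# Relabel every read as <sample>.<progressive number> and append to all.fastq
: > all.fastq
for f in sampleA.fastq; do           # in practice: for f in ~/16S/merged/*.fastq
  sample=$(basename "$f" .fastq)
  awk -v s="$sample" 'NR % 4 == 1 {print "@" s "." ++n; next} {print}' "$f" >> all.fastq
done
</code>

Dedicated tools (e.g. ''seqtk rename'') can do the same job; the awk version only shows the idea.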
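The per-sample column sums and the abundance sorting can likewise be done with awk and sort. A minimal sketch on a mock ''otutab.txt''; the layout assumed here (tab-separated, OTU id in the first column, one count column per sample) may differ from the real file:

<code bash>
# Mock OTU table; the real otutab.txt layout may differ
printf '#OTU\tS1\tS2\nOTU1\t5\t1\nOTU2\t10\t20\nOTU3\t2\t2\n' > otutab.txt

# Per-sample (column) sums, skipping the header line
awk -F '\t' 'NR > 1 {for (i = 2; i <= NF; i++) sum[i] += $i}
             END    {for (i = 2; i <= NF; i++) printf "%s%s", sum[i], (i < NF ? "\t" : "\n")}' otutab.txt

# Sort OTUs by total abundance across all samples, keeping the header on top
{ head -n 1 otutab.txt
  tail -n +2 otutab.txt \
    | awk -F '\t' '{t = 0; for (i = 2; i <= NF; i++) t += $i; print t "\t" $0}' \
    | sort -k1,1nr \
    | cut -f 2-
} > otutable.sorted.txt
</code>

The first awk command prints one total per sample; the second prefixes each row with its row total so ''sort -k1,1nr'' can order the OTUs, and ''cut'' then drops the helper column.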