SPAdes assembly

Today we only briefly introduce assembly with SPAdes. SPAdes is already installed on our system (as on any CLIMB VM, I think), but if this weren't the case we could simply run conda install spades to have Miniconda install it.
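Before starting, it can be worth checking whether SPAdes is actually on the PATH. The snippet below is a minimal sketch: the variable name spades_status is only illustrative, and the -c bioconda channel in the suggested command is an assumption about where the package is fetched from (SPAdes is distributed through Bioconda; your Miniconda may already have that channel configured).

```shell
# Check whether spades.py can be found on the PATH.
if command -v spades.py >/dev/null 2>&1; then
    spades_status="found"
    echo "spades.py found at: $(command -v spades.py)"
else
    spades_status="missing"
    # Assumption: the package is installed from the Bioconda channel.
    echo "spades.py not found; try: conda install -c bioconda spades"
fi
```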

Read the manual first

There is a (confusing) online manual for SPAdes, but we are nerdy enough to read the instructions from the shell. Since the program writes this output to a different channel (called standard error), we can't simply pipe it into less: we need to add an extra character (&) to the pipe so that standard error is redirected as well:

spades.py |& less -S 

As always, we can scroll the text with the arrow keys, then quit with q to return to the shell prompt. Here's an extract of the manual:

SPAdes genome assembler v3.11.0

Usage: /home/linuxbrew/.linuxbrew/bin/spades.py [options] -o <output_dir>

Basic options:
-o      <output_dir>    directory to store all the resulting files (required)
--meta                  this flag is required for metagenomic sample data

Input data:
--12    <filename>      file with interlaced forward and reverse paired-end reads
-1      <filename>      file with forward paired-end reads
-2      <filename>      file with reverse paired-end reads
-s      <filename>      file with unpaired reads
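To see why the extra & matters when piping the help text, here is a small sketch that uses a hypothetical stand-in function, show_help, instead of spades.py itself. In Bash, |& is shorthand for 2>&1 |; the longer form is used below because it also works in other shells.

```shell
# Hypothetical stand-in for a program that prints its help on standard error.
show_help() {
    echo "Usage: tool [options] -o <output_dir>" >&2
}

# A plain pipe carries only standard output, so the next command sees nothing.
# (Standard error is discarded here only to keep the demo quiet.)
plain=$(show_help 2>/dev/null | wc -l | tr -d ' ')

# Merging standard error into standard output first lets the text through.
merged=$(show_help 2>&1 | wc -l | tr -d ' ')

echo "lines through a plain pipe: $plain"
echo "lines after 2>&1:           $merged"
```

With SPAdes, the equivalent long form of the command above would be spades.py 2>&1 | less -S.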

Perform the assembly

Default parameters: auto k-mer choice

spades.py -1 /bsb/denovo/phage/reads/shotgun1.fq -2 /bsb/denovo/phage/reads/shotgun2.fq -o ~/bsb01/phage_default/

If you want to browse the output folder, there is an online version. In particular, you can see:

  1. spades.log - this is the text that SPAdes writes to the terminal during the execution to keep us updated on the progress. Generally not so useful, but it lets us discover which k-mer settings were used!
  2. contigs.fasta - usually the output we are most interested in: the contigs!
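A quick way to discover which k-mer sizes were used, without reading the whole log: SPAdes normally leaves one K<size> sub-directory per k-mer in its output directory. The function below, list_kmers (a hypothetical helper name), is a sketch built on that assumption about the directory layout, so check it against your own output folder.

```shell
# Print the k-mer sizes found as K<size> sub-directories, e.g. "21,33,55".
list_kmers() {
    outdir=$1
    for d in "$outdir"/K[0-9]*; do
        [ -d "$d" ] || continue          # skip if nothing matched the pattern
        basename "$d" | sed 's/^K//'     # K21 -> 21
    done | sort -n | paste -sd, -
}

# Example: inspect the default assembly, if it exists.
if [ -d "$HOME/bsb01/phage_default" ]; then
    list_kmers "$HOME/bsb01/phage_default"
fi
```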

Custom parameters: manual k-mer choice

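We can perform a second assembly with a set of k-mers of our choice, and compare the results obtained with different k-mer sets within our group. K-mer sizes have to be odd (SPAdes also requires them to be smaller than 128 and listed in ascending order)!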

Here is an example:

spades.py -1 /bsb/denovo/phage/reads/shotgun1.fq -2 /bsb/denovo/phage/reads/shotgun2.fq -o ~/bsb01/phage_29,47,51,59/ -k 29,47,51,59

As you can see, I chose an output directory name that reminds me which k-mers were used. In this case it is maybe not elegant, but it stresses the concept of choosing useful, unambiguous names.
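Since a bad k-mer list only fails once SPAdes starts, it can help to check it first. The sketch below assumes the rules stated for the -k option (odd values, smaller than 128, ascending order); check_kmers is a hypothetical helper, and the underscore-based directory name is just one possible naming convention.

```shell
# Return 0 if the comma-separated k-mer list looks acceptable, 1 otherwise.
check_kmers() {
    prev=0
    for k in $(printf '%s' "$1" | tr ',' ' '); do
        case $k in
            *[!0-9]*) echo "not a number: $k" >&2; return 1 ;;
        esac
        if [ $((k % 2)) -eq 0 ]; then
            echo "k-mer size must be odd: $k" >&2; return 1
        fi
        if [ "$k" -ge 128 ]; then
            echo "k-mer size must be below 128: $k" >&2; return 1
        fi
        if [ "$k" -le "$prev" ]; then
            echo "k-mer sizes must be ascending: $k after $prev" >&2; return 1
        fi
        prev=$k
    done
    return 0
}

kmers="29,47,51,59"
if check_kmers "$kmers"; then
    # Underscores instead of commas keep the directory name shell-friendly.
    outdir="$HOME/bsb01/phage_k$(printf '%s' "$kmers" | tr ',' '_')"
    echo "k-mer list OK, suggested output directory: $outdir"
fi
```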

Pre-made output

If you want to save some time, there is a pre-made output from the step above here:

/bsb/denovo/phage/spades/

You can evaluate the assembly metrics with this command:

seqkit stats --all /bsb/denovo/phage/spades/contigs.fasta

Or, if you made more than one assembly in your home directory using “phage_” as a prefix:

seqkit stats --all ~/bsb01/phage_*/contigs.fasta

This will work if the suggested directory structure has been used. If you customised it, adjust the paths accordingly.
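Among other metrics, seqkit stats --all reports the number of sequences, the total length and the N50. As a rough cross-check of what those three numbers mean, here is a sketch using only awk and sort. fasta_n50 is a hypothetical helper, not part of seqkit, and it assumes the usual N50 definition: sort the contigs from longest to shortest and take the length of the contig at which the running total reaches half of the assembly size.

```shell
# Print "<number of contigs> <total length> <N50>" (tab-separated) for a FASTA file.
fasta_n50() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: flush previous contig
             { len += length($0) }                      # sequence line: accumulate
        END  { if (len > 0) print len }                 # flush the last contig
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "%d\t%d\t%d\n", NR, total, n50
        }'
}

# Example: one line per assembly made with the "phage_" prefix, if any exist.
for f in "$HOME"/bsb01/phage_*/contigs.fasta; do
    [ -f "$f" ] || continue
    printf '%s\t' "$f"
    fasta_n50 "$f"
done
```

For a side-by-side comparison of real assemblies, seqkit itself remains the better tool; its --tabular (-T) option prints the same statistics in a tab-separated form that is easier to compare or import elsewhere.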