De novo assembly of long reads

Effect of coverage

To check how coverage affects the assembly, we can subsample the input dataset (12k reads in total, 6 kb average length):

seqkit sample -n 5000  minion_2d.fq > minion_2d_5000.fq
seqkit sample -n 7500  minion_2d.fq > minion_2d_7500.fq
seqkit sample -n 10000 minion_2d.fq > minion_2d_10000.fq

and then, after running a de novo assembly with canu on each subsample and on the full dataset (canu.12k; canu.7k is the 7,500-read subsample), check the results:

file                             format  type  num_seqs    sum_len    min_len    avg_len    max_len  sum_gap        N50  L50
canu.12k/nanoraw.contigs.fasta   FASTA   DNA          1  4,507,919  4,507,919  4,507,919  4,507,919        0  4,507,919    1
canu.10k/nanoraw.contigs.fasta   FASTA   DNA          9  4,461,878      2,034  495,764.2  1,136,406        0  1,116,796    2
canu.7k/nanoraw.contigs.fasta    FASTA   DNA         39  4,412,109      1,527    113,131    568,484        0    228,632    7
canu.5k/nanoraw.contigs.fasta    FASTA   DNA         83  2,856,834      6,086   34,419.7    145,036        0     45,128   22
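
To relate these read counts to sequencing depth, a rough expected coverage can be estimated as reads x mean read length / genome size. The sketch below is only an approximation under two assumptions that are not stated in the text above: every subsample keeps the 6 kb mean read length of the full dataset, and the genome size equals the length of the single contig from the 12k assembly (4,507,919 bp). The function name expected_coverage is hypothetical.

```shell
# Rough expected coverage = number of reads * mean read length / genome size.
# Assumptions: mean read length is 6000 bp for every subsample, and the
# genome size is the single-contig 12k assembly length (4,507,919 bp).
expected_coverage() {
    awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * len / g }'
}

# Print the estimate for each read count used above.
for n in 5000 7500 10000 12000; do
    printf '%s reads: ~%sx\n' "$n" "$(expected_coverage "$n" 6000 4507919)"
done
```

Under these assumptions the estimates are about 6.7x, 10.0x, 13.3x and 16.0x. Read against the table, the full dataset at roughly 16x yields a single contig, while at roughly 7x only about 2.9 Mb of the genome is assembled, spread over 83 contigs.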