De novo assembly of long reads
Effect of coverage
To check the effect of coverage on the assembly results, we can subsample the input dataset (about 12k reads in total, 6 kb average length):
seqkit sample -n 5000 minion_2d.fq > minion_2d_5000.fq
seqkit sample -n 7500 minion_2d.fq > minion_2d_7500.fq
seqkit sample -n 10000 minion_2d.fq > minion_2d_10000.fq
Then, after performing a de novo assembly of each read set with canu, we compare the assembly statistics:
file                            format  type  num_seqs    sum_len    min_len    avg_len    max_len  sum_gap        N50  L50
canu.12k/nanoraw.contigs.fasta  FASTA   DNA          1  4,507,919  4,507,919  4,507,919  4,507,919        0  4,507,919    1
canu.10k/nanoraw.contigs.fasta  FASTA   DNA          9  4,461,878      2,034  495,764.2  1,136,406        0  1,116,796    2
canu.7k/nanoraw.contigs.fasta   FASTA   DNA         39  4,412,109      1,527    113,131    568,484        0    228,632    7
canu.5k/nanoraw.contigs.fasta   FASTA   DNA         83  2,856,834      6,086   34,419.7    145,036        0     45,128   22
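The assembly goes from a single 4.5 Mb contig with the full dataset to 83 contigs covering only about 2.9 Mb with 5,000 reads. To relate these read counts to sequencing depth, a rough estimate is (number of reads x mean read length) / genome size. The sketch below is only a back-of-the-envelope calculation under assumptions that the text above does not state: every subsample keeps the roughly 6 kb mean read length, the single-contig assembly length (4,507,919 bp) approximates the genome size, and the canu.7k run used the 7,500-read subsample. The function name estimate_coverage is hypothetical.

```shell
#!/bin/sh
# Rough depth-of-coverage estimate: reads * mean read length / genome size.
# Assumptions (not from the original text): subsamples keep the ~6 kb mean
# read length, and the 4,507,919 bp single-contig assembly approximates the
# true genome size.

estimate_coverage() {
    # $1 = number of reads, $2 = mean read length (bp), $3 = genome size (bp)
    awk -v n="$1" -v len="$2" -v g="$3" \
        'BEGIN { printf "%.1f\n", n * len / g }'
}

# Read counts: full dataset (about 12k reads) and the three subsamples.
for n in 12000 10000 7500 5000; do
    printf '%s reads: ~%sx\n' "$n" "$(estimate_coverage "$n" 6000 4507919)"
done
```

Under these assumptions the four datasets correspond to roughly 16x, 13x, 10x and 7x coverage. The real values depend on the actual read length distribution of each subsample, which can be measured directly on the subsampled FASTQ files.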