Genome assembly using Flye
Another very popular assembler for long reads such as PacBio and Oxford Nanopore is Flye. In contrast to the minimap and miniasm pipeline, Flye also produces a polished consensus sequence for the assembly, which significantly reduces the error rate.
Change into the flye directory in the assembler_practical folder and run Flye on the filtered read FASTQ file:
course_user> flye --nano-raw \
course_user> ~/biosec_course/qc_practical/filtered.fastq \
course_user> --genome-size 3m --out-dir ./flye_output
As you can see, Flye requires the input reads (--nano-raw) and an output directory (--out-dir), as well as the expected size of the final assembly (--genome-size). Here it is set to 3 megabases (3,000,000 bases), which we estimate from the B. fermentans genome. Flye writes several output files, including the assembly in FASTA format (assembly.fasta).
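The expected genome size can also be used for a quick sanity check of the sequencing depth of the filtered reads. The following is a minimal sketch, not part of Flye: fastq_coverage is a hypothetical helper name, and it assumes a standard FASTQ file with exactly four lines per read.

```shell
# Hypothetical helper (not part of Flye): estimate read depth from a FASTQ file.
# Assumes exactly four lines per read, so the sequence is line 2 of each record.
# $1 = FASTQ file, $2 = expected genome size in bases
fastq_coverage() {
    awk -v genome="$2" '
        NR % 4 == 2 { bases += length($0) }    # add up the sequence lines only
        END { printf "bases=%.0f coverage=%.1f\n", bases, bases / genome }
    ' "$1"
}

# Example call with the 3 Mb estimate used above:
# fastq_coverage ~/biosec_course/qc_practical/filtered.fastq 3000000
```

The reported coverage is simply the total number of read bases divided by the expected genome size, so it is only as good as that estimate.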
course_user> cp ~/biosec_course/misc/assembler_prac/flye_output/* ./flye_output
When Flye has finished, use assembly-stats to get a first overview of the assembly.
- Does the assembly differ from the miniasm assembly, e.g. with respect to total length, number of contigs, and contig lengths?
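To see where these numbers come from, the basic statistics can also be computed directly from a FASTA file. This is only a rough sketch for cross-checking, not a replacement for assembly-stats; fasta_summary is a hypothetical helper name, and it assumes every sequence starts with a ">" header line.

```shell
# Hypothetical helper: number of contigs, total length and longest contig
# in a FASTA file. Assumes each sequence starts with a ">" header line.
fasta_summary() {
    awk '
        /^>/ { n++; next }                               # header: start a new contig
        { gsub(/[ \t\r]/, ""); len[n] += length($0) }    # sequence line: add its length
        END {
            for (i = 1; i <= n; i++) {
                total += len[i]
                if (len[i] > max) max = len[i]
            }
            printf "contigs=%d total_length=%d longest=%d\n", n, total, max
        }
    ' "$1"
}

# Example call on the Flye output:
# fasta_summary flye_output/assembly.fasta
```

Running the same function on the miniasm assembly gives the two sets of numbers side by side for the comparison asked for above.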
Now change back into the assembly_practical/flye directory and align the Flye assembly to the B. fermentans genome using dnadiff:
course_user> dnadiff -p flye_dnadiff \
course_user> ~/biosec_course/misc/assembly_prac/b_fermentans.fna \
course_user> flye_output/assembly.fasta
Open the flye_dnadiff.report file (e.g. double-click on the file).
- How many contigs aligned with the reference? What is the error rate?
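The relevant lines can also be pulled out of the report on the command line. The sketch below assumes the usual dnadiff report layout, in which each row has a label followed by a reference column and a query column, and in which the first AvgIdentity row belongs to the 1-to-1 alignments. report_summary is a hypothetical helper name, and 100 minus the average identity is only a rough stand-in for the error rate.

```shell
# Hypothetical helper: extract aligned sequences and average identity
# from a dnadiff report. Assumes rows of the form: label, reference, query.
report_summary() {
    awk '
        $1 == "AlignedSeqs" { print "aligned_ref=" $2, "aligned_qry=" $3 }
        # Only the first AvgIdentity row (assumed to be the 1-to-1 alignments) is used
        $1 == "AvgIdentity" && !seen {
            seen = 1
            printf "avg_identity=%s mismatch_pct=%.2f\n", $2, 100 - $2
        }
    ' "$1"
}

# Example call on the report written by dnadiff:
# report_summary flye_dnadiff.report
```

Since the assembly was given as the second input to dnadiff, its contigs are counted in the query column (aligned_qry).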