Genome assembly using Flye

Another very popular assembler that can be used for long-reads such as PacBio and Oxford Nanopore is Flye. In contrast to the minimap and miniasm pipeline Flye also produces a polished consensus sequence for the assembly which significantly reduces the error rate.

Change into the flye directory in the assembler_practical folder and run flye on the filtered read fastq file.

course_user> flye --nano-raw \
course_user> ~/biosec_course/qc_practical/filtered.fastq \
course_user> --genome-size 3m --out-dir ./flye_output
In most cases Flye does not require reads to be pre=processed, i.e., we could use the fasq file of unfiltered reads. However, to make the output of Flye more comparable to Miniasm we use teh same input file for this example.

As you can see, flye requires the input reads (–nano-raw) as well as an output directory and the (expected) size of the final assembly which, in this case is set to 3 megabases (3,000,000 bases) which we estimate from the B. fermentans genome. The output of Flye are several files including the assembly in fasta format.

Flye takes significantly more time and resources to run. On an average laptop this assembly may take 20 minutes and more. If you don’t want to wait just stop flye (press 'Ctrl-c') and copy the provided result files from directory ~/biosec_course/misc/assembler_prac/flye_output into the flye_output directory using
cp ~/biosec_course/misc/assembler_prac/flye_output/* ./flye_output.

When Flye is finished use assembly-stats to get a first overview over the finished assembly.

  1. Does the assembly differ from the miniasm assembly, e.g., wrt total length, number of contigs and length of the contigs?

Answer

Now change back into the assembly_practical/flye directory abd align the flye assembly to the B. fermentans genome using dnadiff

course_user> dnadiff -p flye_dnadiff \
course_user> ~/biosec_course/misc/assembly_prac/b_fermentans.fna \
course_user> flye_output/assembly.fasta

Open the flye_dnadiff.report file (e.g. double-click on the file).

  1. How many contigs aligned with the reference? What is the error rate?

Answer

CONTINUE ->