Integrated metagenomic assembly pipeline for short reads




ABSTRACT        
Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis, but its high throughput and extremely short reads (~100 bp, as produced by Illumina sequencers) also complicate metagenomic DNA sequence assembly. Despite recent progress, generating a high-quality draft assembly from such short reads remains difficult in metagenomic sequencing projects, and many NGS metagenomes are deposited in public databases unassembled. To clarify how state-of-the-art de novo assemblers perform on NGS metagenomic reads, we carried out a comprehensive evaluation of current assemblers using simulated metagenomic data. Our analysis revealed that no individual assembler performs equally well on short reads across different sequencing depths. Instead, different assemblers complement each other at two levels: in assembling sequences at different coverage levels, and in assembling different sequences from the same genome, especially at lower sequencing coverage. On the basis of these findings, we developed InteMAP (Integrated Metagenomic Assembly Pipeline for short reads), a pipeline that integrates individual assemblers so that their advantages complement one another in assembling metagenomic sequences. Comparing InteMAP with individual assemblers on both synthetic and real NGS metagenomic data, we showed that InteMAP produces better assemblies, with a longer total contig length, higher contiguity, and more genes than individual assemblers.

Please direct your questions or comments to hqzhu(at)pku.edu.cn or laibinbin(at)ctb.pku.edu.cn

RELEASE        
DATA        
Benchmarking data on simulated dataset sim-113sp.
Benchmarking data on Sample MH0012.
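
The abstract compares assemblies by total contig length and contiguity. As a rough illustration of how such statistics might be computed for a set of contigs, here is a minimal Python sketch. It is not part of InteMAP. The function names (contig_lengths, n50, summarize) are hypothetical, and the use of N50 as the contiguity measure is an assumption made for this example, since the text above does not name a specific metric.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect simple assembly statistics (assumed metrics, see lead-in)."""
    return {
        "num_contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Toy input; a real run would pass an open contig FASTA file instead.
    example = [">c1", "ACGTACGTAC", ">c2", "ACGT", "AC", ">c3", "ACG"]
    print(summarize(contig_lengths(example)))
```

Running the same summary over the contigs produced by each assembler for one dataset would give a side-by-side view of total length and contiguity. Gene content, the third criterion mentioned in the abstract, would need a separate gene-prediction step that this sketch does not cover.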

CITATION        
Binbin Lai, Fumeng Wang, Xiaoqi Wang, Liping Duan, Huaiqiu Zhu. InteMAP: Integrated Metagenomic Assembly Pipeline for Short Reads.