|
INTRODUCTION
|
Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues. In this article, we present MetaGUN, a novel gene prediction method for metagenomic fragments based on support vector machines (SVM). It predicts genes with a three-stage strategy. First, it classifies input fragments into phylogenetic groups with a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by a modified version of MetaTISA. Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts both the 3' and 5' ends of genes more accurately on fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two human gut microbiome samples. It identified thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders.
Please direct your questions or comments to liuyc(at)ctb.pku.edu.cn
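To make the second stage described above more concrete, here is a minimal Python sketch of how an ORF could be turned into an input pattern that combines the EDP of codon usage, a TIS score and the ORF length. It is an illustration only, not MetaGUN's code. It assumes the common EDP form s_i = -(p_i log p_i) / H, where p_i is the frequency of codon i and H is the Shannon entropy of the codon distribution; the exact definition and scaling used by MetaGUN may differ. The function names, the feature ordering and the externally supplied `tis_score` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(orf):
    """Return in-frame codon frequencies; codons with non-ACGT letters are ignored."""
    orf = orf.upper()
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return {c: 0.0 for c in CODONS}
    return {c: counts[c] / total for c in CODONS}


def entropy_density_profile(freqs):
    """Assumed EDP form: s_i = -(p_i * log p_i) / H, with H the Shannon entropy."""
    terms = {c: (-p * math.log(p) if p > 0 else 0.0) for c, p in freqs.items()}
    entropy = sum(terms.values())
    if entropy == 0:
        # Empty or single-codon distribution: the profile is undefined, return zeros.
        return {c: 0.0 for c in terms}
    return {c: t / entropy for c, t in terms.items()}


def feature_vector(orf, tis_score):
    """Hypothetical input pattern: 64 EDP values, then the TIS score and ORF length."""
    edp = entropy_density_profile(codon_frequencies(orf))
    return [edp[c] for c in CODONS] + [float(tis_score), float(len(orf))]


# Toy ORF with codons ATG, AAA, ATG, TAA; the TIS score is a made-up placeholder.
toy_orf = "ATGAAAATGTAA"
toy_edp = entropy_density_profile(codon_frequencies(toy_orf))
toy_features = feature_vector(toy_orf, tis_score=0.8)
```

A vector of this kind could then be passed to any SVM implementation; in practice the ORF length and TIS score would likely need rescaling so that they do not dominate the EDP components.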
|
RELEASE
|
|
DATA
|
The 261 species used to train the sequence binning model for fragment classification and the SVM classifiers for gene prediction. The 12 species used to evaluate the prediction performance for the 3' ends of genes. The 6 species with experimentally characterized TISs used to evaluate the prediction performance for the 5' ends of genes.
|
CITATION
|
Liu, Y., Guo, J., Hu, G. and Zhu, H. Gene Prediction in Metagenomic Fragments Based on the SVM Algorithm.
|
REFERENCES
|
Noguchi, H., Park, J., and Takagi, T. (2006) MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res, 34(19), 5623–5630.
Hoff, K. J., Tech, M., Lingner, T., Daniel, R., Morgenstern, B., and Meinicke, P. (2008) Gene prediction in metagenomic fragments: a large scale machine learning approach. BMC Bioinformatics, 9, 217.
Hu, G. Q., Guo, J. T., Liu, Y. C., and Zhu, H. (2009) MetaTISA: Metagenomic Translation Initiation Site Annotator for improving gene start prediction. Bioinformatics, 25(14), 1843–1845.
Zhu, W., Lomsadze, A., and Borodovsky, M. (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res, 38(12), e132.
Rho, M., Tang, H., and Ye, Y. (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res, 38(20), e191.
Hyatt, D., Locascio, P. F., Hauser, L. J., and Uberbacher, E. C. (2012) Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics, 28(17), 2223–2230.
Kelley, D. R., Liu, B., Delcher, A. L., Pop, M., and Salzberg, S. L. (2012) Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res, 40(1), e9.
|
|