About LightCUD

Diagnosing inflammatory bowel disease (IBD) and discriminating between its types are clinically important. IBD is associated with marked changes in the intestinal microbiota. Advances in next-generation sequencing technology and the improving bioinformatics analysis capabilities of hospitals motivated us to develop a diagnostic method based on the gut microbiome. Using whole-genome sequencing (WGS) data from 349 human gut microbiota samples, covering two types of IBD and healthy controls, we processed the metagenomic short reads to obtain feature profiles of strains and genera. The genus and strain profiles were used to construct the 16S-based and WGS-based diagnostic modules, respectively. For each module, we designed several statistical analysis steps to select case-specific features. With these features, we built discrimination models using different machine learning algorithms. LightGBM outperformed the other algorithms and was therefore chosen as the core algorithm of the study. In particular, we identified two small sets of strain biomarkers, one for the WGS-based healthy vs IBD diagnostic module and one for the ulcerative colitis (UC) vs Crohn’s disease (CD) diagnostic module, which contributed to optimizing model performance during pre-training. We released LightCUD as an IBD diagnostic program based on LightGBM. Its high performance was validated with five-fold cross-validation and with a test set from another study. LightCUD is implemented in Python and packaged for free installation with customized databases. With WGS data or 16S rRNA sequencing data from gut microbiota samples as input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program LightCUD is released here as open source.
The identified strain biomarkers can be used to study the critical factors in disease development and treatment with regard to changes in the gut microbial community.
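
The two-stage decision described above (healthy vs IBD first, then UC vs CD) can be illustrated with a minimal sketch. This is not LightCUD's actual interface: the function `diagnose`, the 0.5 threshold, the feature names `strain_A` and `strain_B`, and the toy scoring functions are all hypothetical stand-ins for trained models, used only to show how the two modules might be chained.

```python
from typing import Callable, Mapping

# Hypothetical types: a profile maps feature names to relative abundances,
# and a scorer maps a profile to a probability in [0, 1].
Profile = Mapping[str, float]
Scorer = Callable[[Profile], float]


def diagnose(profile: Profile, ibd_score: Scorer, cd_score: Scorer,
             threshold: float = 0.5) -> str:
    """Chain two binary modules: healthy vs IBD, then UC vs CD.

    The threshold of 0.5 is an assumption made for illustration.
    """
    # Stage 1: samples scoring below the threshold are called healthy.
    if ibd_score(profile) < threshold:
        return "healthy"
    # Stage 2: only samples called IBD are typed as CD or UC.
    return "CD" if cd_score(profile) >= threshold else "UC"


# Toy stand-ins for trained classifiers (assumptions, not real models).
def toy_ibd_score(profile: Profile) -> float:
    return min(1.0, profile.get("strain_A", 0.0) * 2.0)


def toy_cd_score(profile: Profile) -> float:
    return min(1.0, profile.get("strain_B", 0.0) * 2.0)


if __name__ == "__main__":
    sample = {"strain_A": 0.4, "strain_B": 0.1}
    print(diagnose(sample, toy_ibd_score, toy_cd_score))
```

In a real setting, each scorer would wrap a trained classifier's predicted probability for the corresponding module; the cascade only runs the second module on samples the first one calls IBD.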
Latest update on October 20th, 2020
Biomedical Informatics and System Biology Laboratory, BME, PKU