LightCUD
A program for diagnosing IBD based on human gut microbiome data
       
The diagnosis of inflammatory bowel disease (IBD) and discrimination between the types of IBD are clinically important. IBD is associated with marked changes in the intestinal microbiota. Advances in next-generation sequencing technology and the growing bioinformatics analysis capacity of hospitals motivated us to develop a diagnostic method based on the gut microbiome. Using whole-genome sequencing (WGS) data from 349 human gut microbiota samples covering two types of IBD and healthy controls, we processed the metagenomic short reads to obtain feature profiles of strains and genera. The genus profiles were used to construct the 16S-based diagnostic modules and the strain profiles to construct the WGS-based diagnostic modules. For each module, we designed several steps of statistical analysis to select case-specific features. With these features, we built discrimination models using different machine learning algorithms. LightGBM outperformed the other algorithms and was therefore chosen as the core algorithm of the study. In particular, we identified two small sets of biomarkers (strains), one for the WGS-based healthy vs IBD module and one for the WGS-based ulcerative colitis (UC) vs Crohn’s disease (CD) module, which contributed to the optimization of model performance during pre-training. We released LightCUD as an IBD diagnostic program based on LightGBM. Its high performance has been validated with five-fold cross-validation and with a test set from another study. LightCUD is implemented in Python and is freely available as an installable package with customized databases. With WGS data or 16S rRNA sequencing data of gut microbiota samples as input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program LightCUD is released here as open source.
The identified strain biomarkers can be used to study the critical factors for disease development and treatment with regard to changes in the gut microbial community.
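
The five-fold cross-validation mentioned above can be illustrated with a minimal Python 3 sketch. This is not the LightCUD training code: the function names (stratified_kfold, cross_validate, majority_baseline) are hypothetical, the stratified split is an assumption about how folds might be built, and the placeholder classifier only marks the spot where a LightGBM model would be plugged in. It assumes every class has at least k samples.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled samples of this class round-robin into the folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def majority_baseline(train_x, train_y, test_x):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common for _ in test_x]


def cross_validate(fit_predict, features, labels, k=5, seed=0):
    """Return the mean accuracy of fit_predict over k stratified folds."""
    accuracies = []
    for train_idx, test_idx in stratified_kfold(labels, k, seed):
        predicted = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

To evaluate a real model, majority_baseline would be replaced by a function that fits a classifier on the training profiles and returns its predictions for the held-out fold.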

CONTACT        
Please direct your questions or comments to hqzhu(at)pku.edu.cn or xucm(at)pku.edu.cn

RELEASE        
  • Current version: October 20th, 2020 - Release 1.0


CITATION        

Congmin Xu, Man Zhou, Zhongjie Xie, Mo Li, Xi Zhu* and Huaiqiu Zhu*. LightCUD: a program for diagnosing IBD based on human gut microbiome data.



DATA

Supplementary Table 1 Samples used for model construction (4.68T of paired-end short-read sequences downloaded from NCBI GenBank).
Supplementary Table 2 Assembly result evaluation of the training short reads.
Supplementary Table 3 Ratio of reads mapped to annotated contigs.
Supplementary Table 4 Annotated strains from the training data.
Supplementary Table 5 Annotated genera from the training data.
Supplementary Table 6 Features/strains left after three-step feature selection.
Supplementary Table 7 49 most important features/strains for the WGS-based healthy vs IBD module.
Supplementary Table 8 12 most important features/strains for the WGS-based UC vs CD module.
Supplementary Table 9 Test data.
Supplementary Table 10 Taxonomic profile of test samples with the LightCUD WGS-based healthy vs IBD module.
Supplementary Table 11 Taxonomic profile of test samples with the LightCUD WGS-based UC vs CD module.
Codes Code for feature selection, comparison of machine learning algorithms, and model training and testing.


LightCUD PACKAGE        
       

REQUIREMENT        
Python version >=2.7.6
NumPy version >=1.8.2
LightGBM package
NCBI BLAST+ package
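
The requirements above can be checked before running LightCUD with a short Python 3 script. This is a minimal sketch, not part of the LightCUD package: check_prerequisites is a hypothetical helper, and using blastn as the BLAST program to look for is an assumption, since this page does not say which BLAST executable LightCUD calls. The NumPy version itself is not checked.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(modules=("numpy", "lightgbm"), tools=("blastn",)):
    """Return a dict mapping each prerequisite name to True if it was found."""
    # The interpreter must be at least the minimum version listed above.
    status = {"python": sys.version_info >= (2, 7, 6)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which searches the directories listed in PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{:<10} {}".format(name, "found" if found else "MISSING"))
```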

INSTALLATION        
Please install NumPy, LightGBM and BLAST according to their manuals. The following examples show how to install these prerequisites.

NumPy and LightGBM are Python packages, which can be installed with pip, for example:
  • NumPy:  pip install numpy    #NumPy v1.13.1

  • LightGBM:  pip install lightgbm    #lightgbm-2.1.1


Install blast from zipped file
  • Download the zipped file:  wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.7.1+-x64-linux.tar.gz

  • Unpack the zipped file:  tar -zxvf ncbi-blast-2.7.1+-x64-linux.tar.gz

  • The executable files are now in the bin folder.
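
After unpacking, the bin folder usually has to be visible on PATH before other programs can call the BLAST executables. The Python 3 sketch below shows one way to do that for the current process. The helper name add_blast_to_path is hypothetical, the choice of blastn is an assumption, and whether LightCUD locates BLAST through PATH is not stated on this page.

```python
import os
import shutil


def add_blast_to_path(blast_dir, program="blastn"):
    """Prepend <blast_dir>/bin to PATH and return the resolved program path, or None."""
    bin_dir = os.path.join(os.path.abspath(blast_dir), "bin")
    # Put the BLAST bin folder first so it takes precedence over other installs.
    os.environ["PATH"] = bin_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(program)
```

For example, add_blast_to_path("ncbi-blast-2.7.1+") would be the call if the archive unpacked into a directory of that name (the directory name is an assumption here). A return value of None means the program was not found.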


Install LightCUD from zipped file
  • Download the zipped file:  wget http://cqb.pku.edu.cn/ZhuLab/LightCUD/LightCUD.tar.gz

  • Unpack the zipped file:  tar -zxvf LightCUD.tar.gz

  • Get help:  python lightCUD.py -h
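
The unpack and help steps above can also be scripted from Python 3, which may be convenient in automated setups. This is a minimal sketch under stated assumptions: unpack_archive and show_help are hypothetical helpers, and the location of lightCUD.py inside the unpacked archive is not given on this page, so the script path is left as a parameter. Only the -h option documented above is used.

```python
import subprocess
import sys
import tarfile


def unpack_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir, like `tar -zxvf`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


def show_help(script_path):
    """Run `python <script> -h` and return the help text it prints."""
    result = subprocess.run(
        [sys.executable, script_path, "-h"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if the script exits with an error
    )
    return result.stdout
```

Only archives from a trusted source should be unpacked this way, since extractall writes whatever paths the archive contains.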


