About    
Motivation: Considerable effort has gone into describing the impact of the gut microbiome on human health and disease. A series of NGS-based case-control metagenome-wide association studies has proven valuable for investigating genetic variation in the intestinal microbial community. However, the large volume of gut metagenomic data has been processed in quite different ways and is scattered across public resources, which makes the information difficult to integrate and use and leaves current knowledge provisional and supported only by fragmentary evidence.
Results: We constructed DREEM, a database of disease-related marker genes in the human gut microbiome, by comprehensively retrieving all currently available data resources, amounting to 18.63T of WGS data from 1,729 samples, and processing them with state-of-the-art bioinformatics tools and carefully designed statistical analysis. The database contains a total of 1,953,046 DREEM genes covering six diseases, which are further classified into six groups, one per disease. In addition, 5,100 Core-DREEM genes are defined as a common set shared by the diseases. The DREEM database is then used for functional annotation, taxonomic classification, and metabolic network analysis. Each Core-DREEM gene is manually linked to its best hit in NCBI according to sequence similarity. To search for items of interest, switch to the SEARCH page.
User guide: All the datasets described above can be downloaded from the DOWNLOAD page. Each gene ID consists of a serial number in DREEM, followed by the source series of its short reads in GenBank or EMBL, disease categories, an integrity description, species classification, and functional annotation. For complete genes, i.e. those with both a complete 3' end and a complete 5' end, the length is also stored in the ID. Each entry in a DREEM dataset consists of a gene ID followed by the nucleotide sequence. Eight sets are available in total: one set of all the DREEM genes, six disease-specific sets, and one set of the Core-DREEM genes. On the SEARCH page, users can search the Core-DREEM genes for items of interest. Each search result provides, in addition to the functional annotation and taxonomic classification, a link to NCBI for further investigation.
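Since every entry is a gene ID followed by a nucleotide sequence, the downloaded .fa files can be read with a few lines of code. The following is a minimal sketch, assuming the files follow the usual FASTA convention of a header line starting with ">" followed by one or more sequence lines. The record IDs in the example are placeholders, not real DREEM IDs, and the sketch makes no assumption about how the fields inside a DREEM header are delimited.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Placeholder records for illustration only (not real DREEM entries).
example = """>gene_0001 example header
ATGGCT
AAATAG
>gene_0002 another header
ATGCCC
""".splitlines()

records = list(read_fasta(example))
# Map the first whitespace-delimited token of each header to the sequence length.
lengths = {header.split()[0]: len(seq) for header, seq in records}
```

For a downloaded file, the same function can be applied to an open file handle, e.g. `read_fasta(open(path))`, where `path` points to one of the .fa files listed on the DOWNLOAD page.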

Please direct your questions or comments to hqzhu(at)pku.edu.cn or xucmpku(at)pku.edu.cn
Datasets download    
DREEM genes     DREEM genes_v1.0.fa: the final DREEM dataset, comprising a total of 1,953,046 DREEM genes and covering the six diseases.
DREEM_T2D    DREEM_T2D_v1.0.fa: DREEM genes corresponding to T2D.
DREEM_Crohn    DREEM_Crohns_v1.0.fa: DREEM genes corresponding to Crohn's disease.
DREEM_UC    DREEM_UC_v1.0.fa: DREEM genes corresponding to ulcerative colitis.
DREEM_Obesity    DREEM_Obesity_v1.0.fa: DREEM genes corresponding to obesity.
DREEM_LC    DREEM_LiverCirrhosis_v1.0.fa: DREEM genes corresponding to liver cirrhosis.
DREEM_Athero    DREEM_Atherosclerosis_v1.0.fa: DREEM genes corresponding to atherosclerosis.
Core-DREEM genes    Core-DREEM genes_v1.0.fa: DREEM genes shared by the five sets of DREEM_T2D, DREEM_Obesity, DREEM_Crohn, DREEM_UC and DREEM_LC.

Supplementary tables download    
Supplementary table 1    Supplementary table 1.xlsx: original publications of samples.
Supplementary table 2    Supplementary table 2.xlsx: detailed information of samples and DREEM genes.
Supplementary table 3    Supplementary table 3.xlsx: evaluation of the assembled contigs.
Supplementary table 4    Supplementary table 4.xlsx: detailed information of significant genes.
Supplementary table 5    Supplementary table 5.xlsx: AUC of rarefaction curves of number of significant genes as function of sample numbers.
Supplementary table 6    Supplementary table 6.xlsx: detailed information of DREEM genes.
Supplementary table 7    Supplementary table 7.xlsx: number of DREEM genes shared by different data sets.
Supplementary table 8    Supplementary table 8.xlsx: detailed information of Core-DREEM genes.
Supplementary table 9    Supplementary table 9.xlsx: length distribution of Core-DREEM genes.
Supplementary table 10    Supplementary table 10.xlsx: taxonomic annotation of Core-DREEM genes.
Supplementary table 11    Supplementary table 11.xlsx: functional annotation of DREEM genes.
Supplementary table 12    Supplementary table 12.xlsx: functional annotation of Core-DREEM genes.

Supplementary figures download    
Supplementary figure 1A    Supplementary figure 1A.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.1.
Supplementary figure 1B    Supplementary figure 1B.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.2.
Supplementary figure 1C    Supplementary figure 1C.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.3.
Supplementary figure 1D    Supplementary figure 1D.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.4.
Supplementary figure 1E    Supplementary figure 1E.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.5.
Supplementary figure 1F    Supplementary figure 1F.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.6.
Supplementary figure 1G    Supplementary figure 1G.tif: rarefaction curves of the number of significant genes as a function of the number of samples from Ref.7.
Supplementary figure 2    Supplementary figure 2.tif: distribution of DREEM gene length in four integrality categories.
Supplementary figure 3A    Supplementary figure 3A.tif: taxonomic annotation of the universal DREEM genes.
Supplementary figure 3B    Supplementary figure 3B.tif: taxonomic annotation of the Core-DREEM genes.
Supplementary figure 4A    Supplementary figure 4A.tif: functional annotation of the universal DREEM genes.
Supplementary figure 4B    Supplementary figure 4B.tif: functional annotation of the Core-DREEM genes.
Data analysis result illustration        
       
Rarefaction curves of the number of significant genes as a function of the number of samples from each publication. The numbers of samples and of significant genes were each normalized by their maximum values so that all curves fit on one graph. Because the AUC values (Supplementary Table 5) exceed 0.85, the sample numbers are sufficient to capture almost all potentially significant genes related to the six diseases.
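The normalization and AUC described above can be sketched as follows. This is an illustration only: the source does not state how the AUC was computed, so the trapezoidal rule used here is an assumption, and the numbers in the example are invented rather than taken from Supplementary Table 5.

```python
from typing import Sequence


def normalized_auc(samples: Sequence[float], genes: Sequence[float]) -> float:
    """Area under a rarefaction curve after scaling both axes to [0, 1].

    samples: increasing numbers of samples (x axis)
    genes:   numbers of significant genes found at each sample count (y axis)
    The trapezoidal rule is an assumption, not the documented method.
    """
    if len(samples) != len(genes) or len(samples) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    max_s, max_g = max(samples), max(genes)
    xs = [s / max_s for s in samples]  # normalize by the maximum value
    ys = [g / max_g for g in genes]
    area = 0.0
    for i in range(1, len(xs)):
        # trapezoid between consecutive points
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# Invented example curve that saturates quickly.
auc = normalized_auc([0, 10, 20, 40], [0, 60, 90, 100])
```

On normalized axes a straight diagonal gives an area of 0.5, so values approaching 1 indicate a curve that saturates early, which is the reading the caption gives to values above 0.85.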
       
       
       
Distribution of DREEM gene length in four integrality categories. All the DREEM genes are classified into four groups by integrality. The length distribution of each group is shown as a boxplot, which indicates the interquartile range, midhinge, range, mid-range, and trimean. The average length of complete genes is 964 bp, and the average length of all the DREEM genes is 795 bp.
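The summary statistics named in this caption have standard definitions, sketched below for a list of gene lengths. The quartile convention (Python's "inclusive" method) is an assumption, since the source does not say which one was used, and the lengths in the example are invented.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(lengths: Sequence[float]) -> Dict[str, float]:
    """Compute the boxplot-related statistics for a list of gene lengths."""
    # Quartiles with the "inclusive" convention (an assumption).
    q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    lo, hi = min(lengths), max(lengths)
    return {
        "interquartile_range": q3 - q1,
        "midhinge": (q1 + q3) / 2,        # mean of the two hinges
        "range": hi - lo,
        "mid_range": (lo + hi) / 2,       # mean of the extremes
        "trimean": (q1 + 2 * q2 + q3) / 4,
        "mean": statistics.mean(lengths),
    }


# Invented gene lengths in bp, for illustration only.
summary = boxplot_summary([300, 600, 900, 1200, 2100])
```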
       
       
       
Number of DREEM genes shared by different data sets. U, C, O, L and T stand for the DREEM_UC, DREEM_Crohn, DREEM_Obesity, DREEM_LC and DREEM_T2D gene sets, respectively. There are 5,100 Core-DREEM genes, which are shared by all five data sets. Most DREEM genes are unique to one data set. Nevertheless, DREEM_UC and DREEM_Crohn share more genes than any other pair of data sets, indicating a stronger correlation between the two types of IBD.
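The sharing statistics in this caption amount to set intersections over gene identifiers. The sketch below shows the idea on toy data; it assumes that a gene shared between data sets can be recognized by an identical identifier, which the source does not confirm, and the identifiers `g1` to `g8` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_shared(sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count the genes shared by every pair of data sets."""
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


def core_genes(sets: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in every data set (the analogue of a core set)."""
    return set.intersection(*sets.values())


# Toy data with hypothetical gene identifiers.
toy = {
    "U": {"g1", "g2", "g3", "g4"},
    "C": {"g1", "g2", "g3", "g5"},
    "O": {"g1", "g6"},
    "L": {"g1", "g2", "g7"},
    "T": {"g1", "g8"},
}

shared = pairwise_shared(toy)
core = core_genes(toy)
# Pair of data sets with the largest overlap.
top_pair = max(shared, key=shared.get)
```

In the toy data the U and C sets overlap most, mirroring the pattern the caption reports for DREEM_UC and DREEM_Crohn.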
       
       
                                                                                                                  
Taxonomic annotation of genes in DREEM (at phylum level). The area of each slice of the pie shows the proportion of the corresponding phylum. The phyla Proteobacteria, Firmicutes and Bacteroidetes dominate the gut microbial community.
       
       
                                                                                                                 
Functional annotation via BLAST against the COG database (e-value cutoff of 10^-5). A. Functional annotation of the integrated DREEM genes. Most of the DREEM genes are functionally classified into metabolism, and cellular processes and cell signaling. B. Functional annotation of the Core-DREEM genes. 4,255 (83%) of the 5,100 genes are assigned a function; of these, about 41.1% are involved in metabolism, 23.1% in cellular processes and cell signaling, and 27.8% in information storage and processing.
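Annotation by BLAST with an e-value cutoff typically means keeping, for each query gene, the best hit that passes the threshold. The following is a minimal sketch under stated assumptions: the hits are in BLAST's standard 12-column tabular format (e-value in column 11, bit score in column 12), hits at or below the cutoff are kept, and the best hit is the one with the highest bit score. The source does not state the output format or the tie-breaking rule, and the query and subject names in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the best passing hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular record
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the cutoff (<= is an assumption)
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular hits for illustration only.
example_hits = [
    "geneA\tCOG_X\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150",
    "geneA\tCOG_Y\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200",
    "geneB\tCOG_Z\t40.0\t50\t30\t0\t1\t50\t1\t50\t2e-3\t30",
]

hits = best_hits(example_hits)
```

Genes with no hit passing the cutoff, like `geneB` here, are simply absent from the result, which corresponds to genes left without a functional assignment.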
       
       
       
Metabolic network of the Core-DREEM genes. The network gives insight into the complexity of the connections among DREEM genes belonging to the various COG categories, shown as bubbles in the figure (N.A. denotes genes without any matched COG category). The height of each peripheral column shows the activity of the corresponding COG category, while the width of each line reflects the degree of interaction between the two categories it connects. The network suggests that the membrane transport related categories, i.e. amino acid transport and metabolism (E), inorganic ion transport and metabolism (P), carbohydrate transport and metabolism (G), and nucleotide and lipid transport and metabolism (F and I), are the most complex groups and are more intensively connected with each other.
       
Release        
  • Current version: July 8th, 2017 - Release 1.0      Manual

Citation        
Congmin Xu, Zhe Wang, Xiaoqi Wang, Mo Li, Xiao Guo, and Huaiqiu Zhu, Database of disease-related marker genes in human gut microbiome.