Understanding the effects of mutations and how phenotypes are encoded in the genome are two of the primary goals of modern biology. The ability to make quantitative predictions as to the effects of complex genetic changes will enable breakthroughs in personalized medicine, designer organisms for agriculture and bioremediation, and many other applications.

However, despite major advances in measuring genotypes and phenotypes, predicting the effects of even single mutations in well-studied systems remains difficult, and the consequences of most mutations still cannot be predicted. This is true both for disease phenotypes and for relatively simple molecular phenotypes such as gene expression.

One reason for this difficulty is that mutations do not have independent effects. Instead, the consequence of each mutation depends on both the genetic background and the intracellular state in which that mutation occurs.
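As a minimal illustration of what non-independent effects mean, the sketch below computes pairwise epistasis as the deviation of a double mutant from the multiplicative expectation on a log-fitness scale. This is a standard textbook definition, not a method taken from the work described here; the function name `epistasis` and all fitness values are hypothetical and chosen only for illustration.

```python
import math


def epistasis(f_wt, f_a, f_b, f_ab):
    """Log-scale deviation of the double mutant from the multiplicative expectation.

    A value of zero means the two mutations act independently; a nonzero
    value means the effect of one mutation depends on the presence of the other.
    """
    return math.log(f_ab) - math.log(f_a) - math.log(f_b) + math.log(f_wt)


# Hypothetical relative fitness values (illustration only, not measured data).
f_wt, f_a, f_b = 1.0, 0.8, 0.5

# Double mutant matches the product 0.8 * 0.5: the mutations act independently.
independent = epistasis(f_wt, f_a, f_b, f_ab=0.4)

# Double mutant is less fit than expected: negative epistasis.
interacting = epistasis(f_wt, f_a, f_b, f_ab=0.1)
```

In this toy case `independent` is zero up to floating-point error, while `interacting` is negative, meaning the combined effect is worse than either single-mutant measurement would predict.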

Our group takes a systems biology approach to identify the genetic and non-genetic determinants of phenotypic variability, with a focus on gene expression, proliferation and drug resistance in single cells. To do so we combine synthetic biology, high-throughput quantitative experiments with millions of designed sequence variants, and mathematical modeling to determine how the genotype-to-phenotype map is modulated by genetic and non-genetic heterogeneity. Our research is supported by American, European, Spanish, and Catalan funding sources.

Some results from our group

The central challenge in personalized medicine, agricultural genetics, and evolutionary biology is the mechanistic understanding and quantitative prediction of how changes in genotype lead to changes in phenotype. However, the consequences of most mutations still cannot be predicted. My group combines mathematical modeling with high-throughput quantitative experiments to develop a quantitative mechanistic understanding of why each individual is unique.


Why do identical mutations and drug treatments have different outcomes in different cells?

Within an isogenic population, not all microbes are killed by an antibiotic and not all cancer cells are killed by chemotherapy. In addition, the effect of a mutation varies across individuals; identical mutations often have no effect in some people but result in a severe disease phenotype in others. Why are only some individuals affected by a drug or mutation?


Work from my group has shown that much of this phenotypic variability is due to epigenetic heterogeneity in the intracellular states of single cells. We use high-throughput time-lapse microscopy, flow cytometry, single-cell RNA & DNA sequencing, machine learning and quantitative data-driven mathematical models to understand and predict the fates of single cells and of organisms. We developed novel flow-cytometry- and sequencing-based methods to investigate the causes and consequences of non-genetic heterogeneity, and found that, in yeast, the main driver of heterogeneity in proliferation, mutational outcome, and drug resistance is cell-to-cell variability in mitochondrial activity (van Dijk et al. Nature Communications 2015, Carey. eLife 2015, Dhar et al. in review).


This work has immediate implications for understanding both antifungal and anti-cancer treatment. Cells within an otherwise homogeneous population vary greatly in their organelle content and state. There is a subpopulation of slowly proliferating drug-resistant cells that can give rise to both drug-sensitive and drug-resistant progeny. This knowledge will enable the development of novel treatments that take epigenetic heterogeneity into account.


Machine learning to predict mutational impact in heterogeneous genetic backgrounds

We also work on discovering fundamental principles that govern how interactions between genotypes, and interactions between cell state and genotype, determine phenotype. In one collaborative project we measured the fitness of over 4,000,000 genetic variants of a single gene, the largest set ever by over an order of magnitude. To understand this large multi-dimensional dataset we developed a novel machine-learning-based approach to quantify and predict the impact of each mutation on fitness. We found that, while the same mutation can be neutral, deleterious or beneficial depending on the genetic background, the majority of this variability is predictable from genotype (Pokusaeva et al. in review, related: Espinar et al. Genome Research 2018).


The majority of genetic variation among individuals and the majority of significant hits in genome-wide association studies (GWAS) are in non-coding regions. However, predicting the effects of simple genetic changes even in well-studied regulatory systems is remarkably difficult. To determine how genetic variation in regulatory regions affects gene expression we developed a high-throughput experimental system in which we can measure the effects of defined genetic changes on gene expression using thousands of synthetic designed promoters. By testing several mathematical models of gene expression we obtained a mechanistic understanding of why the impact of many mutations depends on both the genetic background and the intracellular state of the cell (van Dijk*, Sharon*, Lotan-Pompan, Weinberger, Segal, Carey. Genome Research 2017).
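To make the dependence on intracellular state concrete, here is a minimal sketch using a generic single-site equilibrium binding model, in which promoter occupancy by a transcription factor is [TF] / ([TF] + Kd). This is a textbook simplification and is not claimed to be one of the models tested in the work cited above; the function names, dissociation constants, and concentrations are all hypothetical.

```python
def occupancy(tf_conc, kd):
    """Equilibrium probability that a single binding site is occupied."""
    return tf_conc / (tf_conc + kd)


def mutation_effect(tf_conc, kd_wt, kd_mut):
    """Fold change in occupancy caused by a binding-site mutation.

    Assumes (for illustration) that expression is proportional to occupancy.
    """
    return occupancy(tf_conc, kd_mut) / occupancy(tf_conc, kd_wt)


# Hypothetical parameters: the mutation weakens binding tenfold.
kd_wt, kd_mut = 1.0, 10.0

# Same mutation, two intracellular states (low vs. high TF concentration).
effect_low_tf = mutation_effect(0.1, kd_wt, kd_mut)
effect_high_tf = mutation_effect(100.0, kd_wt, kd_mut)
```

Under these assumptions the same mutation strongly reduces occupancy when the transcription factor is scarce but is nearly neutral when the factor is abundant, because the site is saturated in both the wild-type and mutant promoters.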