RNA polymerase errors: non-heritable variability & experiment-free mutagenesis

RNA polymerase errors cause splicing defects and can be regulated by differential expression of RNA polymerase subunits

Lucas B. Carey

eLife 2015;4:e09945 DOI: 10.7554/eLife.09945

Errors during transcription may play an important role in determining cellular phenotypes: the RNA polymerase error rate is >4 orders of magnitude higher than that of DNA polymerase and errors are amplified >1000-fold due to translation. However, current methods to measure RNA polymerase fidelity are low-throughput, technically challenging, and organism-specific. Here I show that changes in RNA polymerase fidelity can be measured using standard RNA sequencing protocols. I find that RNA polymerase is error-prone, and these errors can result in splicing defects. Furthermore, I find that differential expression of RNA polymerase subunits causes changes in RNA polymerase fidelity, and that coding sequences may have evolved to minimize the effect of these errors. These results suggest that errors caused by RNA polymerase may be a major source of stochastic variability at the level of single cells.

Figure 1. A computational framework to measure relative changes in RNA polymerase fidelity. (a) Pipeline to identify potential RNA polymerase errors in RNA-seq data. High quality full-length RNA-seq reads are mapped to the reference genome or transcriptome using bwa, and only reads that map completely with two or fewer mismatches are kept. (b) Then 10 bp from the front and 10 bp from the end of the read are discarded as these regions have high error rates and are prone to poor quality local alignments. (c) Errors that occur multiple times (purple boxes) are discarded, as these are likely due to subclonal DNA mutations or motifs that sequence poorly on the HiSeq. Unique errors in the middle of reads (cyan box) are kept and counted.
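The filtering steps in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: it assumes reads have already been mapped (for example with bwa) and reduced to ungapped, full-length alignments, each given as a hypothetical tuple `(ref_name, ref_start, read_seq, ref_seq)`. The function names, the tuple layout, and the choice to count recurrence only among mismatches in the kept middle of reads are assumptions made for the example.

```python
from collections import Counter


def find_mismatches(read_seq, ref_seq, ref_start):
    """Return (ref_pos, ref_base, read_base, read_offset) for each mismatch."""
    return [
        (ref_start + i, ref_base, read_base, i)
        for i, (read_base, ref_base) in enumerate(zip(read_seq, ref_seq))
        if read_base != ref_base
    ]


def count_unique_errors(alignments, max_mismatches=2, trim=10):
    """Apply the three filters from the caption to pre-aligned reads.

    alignments: iterable of (ref_name, ref_start, read_seq, ref_seq) tuples
    (a hypothetical input format; ungapped alignments only).
    Returns (unique_errors, bases_examined).
    """
    candidates = []
    bases_examined = 0
    for ref_name, ref_start, read_seq, ref_seq in alignments:
        # (a) keep only reads that align over their full length
        #     with two or fewer mismatches
        if len(read_seq) != len(ref_seq):
            continue
        mismatches = find_mismatches(read_seq, ref_seq, ref_start)
        if len(mismatches) > max_mismatches:
            continue
        # (b) ignore the first and last `trim` bases of each read
        lo, hi = trim, len(read_seq) - trim
        if hi <= lo:
            continue
        bases_examined += hi - lo
        for ref_pos, ref_base, read_base, offset in mismatches:
            if lo <= offset < hi:
                candidates.append((ref_name, ref_pos, ref_base, read_base))
    # (c) discard mismatches seen more than once (assumed here to mean the
    #     same position and the same substitution); keep unique ones
    counts = Counter(candidates)
    unique_errors = [error for error, n in counts.items() if n == 1]
    return unique_errors, bases_examined


def relative_error_rate(alignments, **kwargs):
    """Unique mismatches per examined base, for comparing samples."""
    unique_errors, bases_examined = count_unique_errors(alignments, **kwargs)
    return len(unique_errors) / bases_examined if bases_examined else 0.0
```

Because sequencing and reverse-transcription errors also produce unique mismatches, a rate computed this way is only meaningful as a relative measure between samples processed identically, which matches the caption's emphasis on relative changes in fidelity.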

 

To better understand how slow- and fast-proliferating subpopulations differ, we developed a method to sort cells by quantitative differences in single-cell fitness (FitFlow) (van Dijk et al. 2015). RNA-seq on the slow-growing subpopulation revealed that it exhibits more transcriptional diversity and an increased RNA polymerase error rate. This was the first transcriptome-wide characterization of isogenic slow- and fast-growing subpopulations in any organism.

 

Our RNA-seq data suggested that DNA damage might cause cells to become slow growing. Consistent with this, we found that addition of an antioxidant reduced the size of the slow-growing subpopulation, suggesting that oxidative damage causes DNA damage, which in turn causes cells to grow slowly. We thus successfully combined RNA-seq data with targeted experiments to identify molecular mechanisms. In a follow-up project, we developed a new set of RNA-seq analysis tools that allow changes in RNA polymerase fidelity to be measured using standard RNA sequencing techniques. We found that both yeast and human cells regulate RNA polymerase fidelity through differential expression of specific RNA polymerase subunits, and that errors made by the polymerase result in splicing defects (Carey 2015). This has opened up a line of research in the lab in which we use sequencing data and the stochastic sequence variability generated by polymerase errors to determine how regulatory information is encoded in the genome.