Genome-wide RNA polymerase stalling shapes the transcriptome during aging


Mouse housing and experiments were performed according to the Animal Welfare Act of the Dutch government, following the Guide for the Care and Use of Laboratory Animals and with the guidelines approved by the Dutch Ethical Committee in full accordance with European legislation. The institutional ethical committee for animal care and usage approved all animal protocols. WT male mice (Mus musculus) in F1 C57BL6J/FVB (1:1) hybrid background, were euthanized at 15 weeks and 104 weeks of age. WT C57BL6J and FVB strains are frequently obtained from the Jackson Laboratories to maintain a standard genetic background. DNA repair-deficient premature aging mouse models that were generated in house and WT littermates in F1 C57BL6J/FVB (1:1) hybrid background were euthanized at 4 and 10 weeks for Ercc1Δ/− mutants35; and 7 and 14 weeks for Xpg−/− mutants34. All animals were bred and maintained on AIN93G synthetic pellets (Research Diet Services; gross energy content 4.9 kcal g−1 dry mass, digestible energy 3.97 kcal g−1). Animals were maintained in a controlled environment (20–22 °C, 12 h light:12 h dark cycle) and were individually housed in individual ventilated cages under specific pathogen-free conditions at the Animal Resource Center (Erasmus University Medical Center). No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications8,18,26,29,38,41. Data collection and analysis were not performed blind to the conditions of the experiments and randomization of animals to experimental groups was not applicable. No animals were excluded from the experimental groups in any analysis.

Cell culture

To assess endogenous DNA damage-induced de novo RNA synthesis, Ercc1∆/−, Xpg−/− and WT (all in C57BL6J/FVB F1 hybrid genetic background) MDFs were isolated from the tails of the corresponding mouse models and cultured in DMEM supplemented with 10% FCS and 1% PS at 5% CO2 and 3% O2.

Nascent RNA labeling in vivo

Mice were injected intraperitoneally with 5-EU (AXXORA) 0.088 mg per gram of body weight. Five hours after intraperitoneal injection, mice were euthanized. Tissue samples were formalin-fixed for fluorescence staining or snap-frozen for the RNA isolation and ChIP experiments.

Immunofluorescence staining

Slices measuring 3–5 µm were cut from paraffin-embedded, formalin-fixed liver pieces. Slices were mounted on microscope slides (Superfrost Ultra Plus Adhesion Slides, Thermo Fisher Scientific). For RNAPII staining, samples were deparaffinized with xylene, rehydrated with an alcohol gradient and washed with Milli-Q water before antigen retrieval (30 min in citrate buffer, pH 6). The antibodies used were: Alexa Fluor 594-RPB1 antibody (in 1:250 dilution), recognizing all forms of RNAPII independently of the phosphorylation status of their CTD (cat. no. 664908, BioLegend); RNAPII-ser2p (cat. no. ab5095, Abcam); RNAPII-ser5p (cat. no. ab5131, Abcam); phospho-ATM (Ser1981, cat. no. 4526, Cell Signaling Technology); and phospho-histone H2A.X (Ser139, cat. no. 9718; Cell Signaling Technology) all in 1:500 dilution. To reduce the background fluorescence level, a mouse-on-mouse detection kit was used (cat. no. BMK-2202, Vector Laboratories). Sections were counterstained using DAPI or Hoechst 33342.

EU-labeled nascent RNA staining

After the xylene-based paraffin removal and rehydration steps (the antigen retrieval step was omitted for EU staining) we used the Click-iT RNA Alexa Fluor 488 Imaging Kit (cat. no. c10329; Thermo Fisher Scientific) according to the standard immunofluorescence protocol. Images were taken by a ZEISS LSM 700 system. Nascent RNA staining intensity was quantified by calculating the integrated density values for each nuclear staining using the Fiji software61. Statistical significance was calculated from normalized fluorescence intensity values using an unpaired Student’s t-test in Prism version 7.04 (GraphPad Software). For EU-labeled nascent RNA staining in vitro, cells were grown on coverslips. In confluent growth dishes, medium was replaced with fresh medium supplemented with 1% FCS and (when indicated) moved to 20% O2 for the indicated time. Medium was renewed twice a week. To measure de novo RNA synthesis after UVC treatment, Ercc1∆/− and WT MDFs (both C57BL6J/FVB F1 hybrid background) were cultured to confluency and maintained as described above followed by UVC irradiation (0, 2, 4 and 6 J m2 UVC) using a 254-nm germicidal lamp (Philips). The assays were performed 24 h later to allow the MDFs to recover from the immediate transcriptional effects in trans. To assess their transcriptional level, 1 mM EU was added to the medium for 1 h before cell collection for total RNA extraction or fixed for fluorescence staining. Cells were washed with ice-cold tris-buffered saline (TBS) and fixed for 20 min on ice in 4% formalin. Subsequently, cells were washed in 3% bovine serum albumin (BSA) in TBS and permeabilized using 0.5% Triton X-100 in TBS for 20 min at room temperature. The coverslips were then washed twice with 3% BSA in TBS and incubated with Click-iT reaction mix (Invitrogen) for 30 min. After the Click-iT reaction, cells were washed once with 3% BSA in TBS and once with TBS before being incubated in TBS containing 1:1,000 Hoechst 33342 for 30 min. Samples were mounted using Prolong Diamond (Invitrogen). Images were obtained with a LSM700 ZEISS Microscope and EU staining intensity in nuclei was quantified with Fiji (Image J 1.53q).

Total RNA-seq and EU-seq library generation and sequencing

Total RNA was isolated from snap-frozen liver and kidney slices or scraped cells using the miRNeasy kit (QIAGEN) including the on-column DNase step (RNase-Free DNase Set, QIAGEN). RNA quality and quantity were estimated with the Bioanalyzer (Agilent Technologies) and only high-quality RNA (RNA integrity number >8) was used for further analyses. Total RNA sequencing was performed as described elsewhere62. To selectively isolate EU-labeled nascent RNA, we used the Click-iT nascent RNA Capture Kit (cat. no. c10365, Thermo Fisher Scientific): biotin azide was attached to the ethylene groups of the EU-labeled RNA using Click-iT chemistry. The EU-labeled nascent RNA was purified using MyOne Streptavidin T1 magnetic beads. Captured EU-RNA attached on streptavidin beads was immediately subjected to on-bead sequencing library generation using the TruSeq mRNA Sample Preparation Kit v2 (Illumina) according to the manufacturer’s protocols with modifications. The first steps of the protocol were skipped; directly on-bead complementary DNA (cDNA) was synthesized by reverse transcriptase (Super-Script II) using random hexamer primers. The cDNA fragments were then blunt-ended through an end-repair reaction, followed by dA-tailing. Subsequently, specific double-stranded barcoded adapters were ligated and library amplification for 15 cycles was performed. PCR libraries were cleaned up, measured on an Agilent Bioanalyzer using the DNA1000 assay, pooled at equal concentrations and sequenced per three in one lane on a HiSeq 2500.

ChIP–seq library generation and sequencing

Snap-frozen liver was minced in ice-cold PBS, homogenized in a Dounce homogenizer and filtered through a cell strainer (Falcon). After adding formaldehyde (total 1%), the homogenate was shaken on ice for 10 min and quenched with glycine. Pelleted homogenate was washed with ice-cold PBS, resuspended in cell lysis buffer (0.25% Triton X-100, 10 mM EDTA, 0.5 mM EGTA, 20 mM HEPES, pH 8.0, cOmplete EDTA-free and PhosSTOP, Sigma-Aldrich) and incubated for 10 min on ice. Samples were centrifuged and resuspended in nuclei lysis buffer (0.15 M NaCl, 1 mM EDTA, 20 mM HEPES, pH8.0, cOmplete EDTA-free and PhosSTOP). Samples were further homogenized by Dounce homogenizer and incubated on ice for 10 min. The nuclear fraction was resuspended in sonication buffer (50 mM HEPES, pH 7.8, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, 1% sodium dodecyl sulfate, cOmplete EDTA-free and PhosSTOP). The chromatin was sonicated with a Bioruptor (Diagenode) sonicator into 100–500 bp fragments and centrifuged to remove any remaining cell debris. From the supernatant, 15 µg chromatin was used for one round of immunoprecipitation. For ChIP–seq, five samples were pooled. Dynabeads M-280 Sheep Anti-Rabbit IgG beads were used for the immunoprecipitation step. Chromatin samples were precleared with beads at 4 °C, for 2 h. The precleared chromatin samples were rotated overnight at 4 °C with the RNAPII antibodies: RNAPII-ser2p, RNAPII-ser5p or RNAPII RPB1-NTD-specific antibody (clone D8L4Y, Cell Signaling Technology). Dynabeads were added for 2 h to the samples to pull down protein–DNA complexes. After immunoprecipitation, samples were washed twice with the following buffers: sonication buffer, twice with buffer A (50 mM HEPES, pH 7.8, 500 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS, cOmplete EDTA-free and PhosSTOP), twice with buffer B (20 mM Tris, pH 8, 1 mM EDTA, 250 mM LiCl, 0.5% NP-40, 0.5% sodium deoxycholate, cOmplete EDTA-free and PhosSTOP) and finally twice with Tris-EDTA buffer (10 mM Tris, pH 8, 1 mM EDTA). The bound fraction of the chromatin was isolated using the IPURE DNA recovery for ChIP Kit (Diagenode). Sequencing libraries were generated using the Illumina TruSeq ChIP Library Preparation Kit. Samples were sequenced on the HiSeq 4000 platform.

Sequence read mapping

EU-seq reads were preprocessed with the quality control software FastQC v.0.11.9, FastQScreen v.0.14.0 and Trimmomatic v.0.35 (ref. 63) using the parameters: SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 ILLUMINACLIP:adapter.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:36. The remaining reads were successively aligned to the mouse ribosomal DNA (BK000964.3), mitochondrial sequences (UCSC, mm10) and mouse reference genome (GRCm38/mm10) using Tophat2 v.2.0.9 (ref. 64) with default settings except for the -g 1 option. ChIP–seq reads were aligned to the mm10 mouse reference genome using Bowtie65 v.2.1.0. The public total RNA-seq dataset applied the same mapping algorithm with EU-seq using the corresponding reference genome (hg19 and ce10) to study the nascent RNA dynamics in aging among species. The same mapping algorithm as used with ChIP–seq was applied to the other public data.

Definition of unique intron, exon, gene regions and gene groups

All RefSeq (release: 95) genes, exons and introns were extracted from the UCSC Genome Browser66 and the gene lists were collapsed to the longest transcript for each gene. Genes with regions overlapping another coding or noncoding gene were removed. Thus, genes having only regions unique to a specific RefSeq gene were used for further analysis. In some experiments as indicated in this study, specific genomic regions (from TSS to 1 kb downstream; intronic regions only; from TSS to 20 kb downstream; 3′UTR; first and last exon of expressed genes; around TSS region (−0.75 kb to +0.75 kb and −0.3 kb to +0.3 kb) were generated in the same manner. To investigate the productive elongation process per gene, genomic regions around the TSS and TTS of genes were divided into k proportional bins (k = 20 by default; due to the data quality in different datasets, the number of bins varies (details following)). Genes with length smaller than 10 kb were removed from the study to avoid too many reads mapping to short genes overlapping bins in the gene proportional elongation analysis.

A Python pipeline ( was created that takes aligned RNA/EU/ChIP–seq reads in BAM/BW format as input to quantify reads in the transcription elongation region of genes and HTSeq was performed for read quantification in the aforementioned genomic regions. Reads per million (RPM) was applied to normalize different sequencing libraries to exclude technical variation (especially sequencing depth) in further studies. The ‘all genes’ gene set comprises all genes with at least one read mapping in the first kilobase. A gene set with genes that have at least 1 RPM in each of the 20 bins was constructed and termed ‘all expressed genes’. To study read distribution across gene bodies and due to sequence depth and data quality, different number of bins (k) and the number of all expressed genes (n) were selected for each EU-seq and public total RNA dataset collection. Therefore, the ‘all expressed genes’ set in WT aging contain: n = 3,970 genes and were divided into k = 20 bins (>90% of all EU-seq reads are mapped to these genes). Ercc1Δ/− mice (k = 20, n = 2,430); Xpg−/− mice (k = 20, n = 3,842); UV-treated Ercc1Δ/− MDFs (k = 10, n = 1974); WT aging kidney (k = 20, n = 2,135); human tendon (k = 5, n = 773); C. elegans (k = 5, n = 2,872). To match WT aging EU-seq data with the corresponding RNAPII ChIP–seq data (generated from the same liver), the corresponding genes from the ‘all expressed genes’ gene set were also selected in the RNAPII ChIP–seq datasets. The intra-sample-specific background was determined by calculating the reads in the intergenic regions and proportionally removed. The overall background signal was subtracted using the DNA input samples. To biologically define the ‘all expressed genes’ (n = 3,970) in WT aging, we performed a k-mean clustering analysis combined with EU-seq and total ChIP–seq reads between adult and old samples. Under the criterion describes in Extended Data Fig. 3a, we defined the four main patterns found in k-mean cluster analysis as four biological groups: promoter-upregulated genes, n = 778 (EU-seq and RNAPII ChIP–seq level increased across three bins); promoter-downregulated genes, n = 394 (EU-seq and RNAPII ChIP–seq level decreased across three bins); GLPThigh genes, n = 914 (steep EU-seq level progressive decrease, steep RNAPII ChIP–seq level increase across three bins); remainder genes, n = 1,884 (mild EU-seq level progressive decrease, mild RNAPII ChIP–seq increase across three bins). To study the relationship between gene length and transcriptional stress phenotype, the expressed genes (n = 3,970) in WT mice were divided into six groups according to their length, each containing a similar number of genes: 10–22 kb (n = 662, average = 16.47 kb, median = 16.75 kb); 22–30 kb (n = 644, average = 26.87 kb, median = 26.94 kb); 30–50 kb (n = 788, average = 40.19 kb, median = 39.68 kb); 50–70 kb (n = 587, average = 59.18 kb, median = 59.02 kb); 70–110 kb (n = 643, average = 87.93 kb, median = 86.75 kb) and >110 kb (n = 646, average = 199.47 kb, median = 160 kb). In figures measuring gene class behavior, we first calculated the per gene the average signal from n = 3 mice followed by averaging the signal for all genes in the gene class.

Gene function enrichment analysis

Gene ontology and functional clustering analyses of TShigh genes were performed by using multiple databases and software: Ingenuity Pathway Analysis, GSEA v.4.2.2 that also includes the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome Pathway Databases, Aging Perturbations from GEO and the down datasets from Enrichr27,28,67,68. Datasets in the Aging Perturbations from the GEO and the down datasets from Enrichr were only included if young/adult >8 weeks, old is >14 months and age difference between young/adult and aged organs is >6 months (mouse, rat); human old is >56 years with at least an age difference of >12 years. We adopted a threshold FDR < 0.05. Aggregated P values for the main identified biological processes were calculated by combining the P values of the corresponding detected subpathways using Fisher’s exact test.

Genome-wide characterization of the aging transcriptome

The RNAPII travel ratio is the ratio between RNAPII on the TSS and RNAPII in the first 1-kb gene body in old and adult. The total RNAPII density value around the TSS (±300 bp) was divided by the total RNAPII density value on the first 1-kb gene body measured from 300 bp downstream of TSS for each gene. To compare nascent RNA synthesis from the first 1-kb region of the first introns or from the TSS to 1 kb downstream of all genes between adult and old mouse livers, all datasets were only normalized based on sequencing read depth, not correcting for the approximately 1.5-fold reduced number of intronic sequences in aged liver. To measure productive transcription elongation, all expressed genes were divided into k proportional bins, where mean read counts from the EU-seq and RNAPII ChIP–seq dataset was calculated. Counts were subsequently normalized to the first bin and plotted. The percentage density changes per bin were calculated by: old(readcount) / adult(readcount) × 100%. Since the first and last bin includes the signal at the TSS in which RNAPII promoter proximal pausing is present, and the TTS in which RNAPII accumulates, we defined the middle 18 bins as the transcription elongation phase. The 3-bin heatmap is derived from the 18-bin data by aggregating 6 subsequent bins and calculating log2 fold changes of every gene between old and adult samples for both EU-seq and total RNAPII ChIP–seq reads. To determine the percentage increase RNAPII stalling or unproductive RNAPII in aged livers (Extended Data Fig. 3f), we assumed a baseline in adult livers in which the total nascent RNA level in the elongation phase (18 bins) is the result of the total RNAPII levels in the elongation phase (18 bins). The mean relationship across n = 3 mice is set as the baseline. Since there is an increase in RNAPII levels and a reduction in nascent RNA levels across the gene body in aged liver, the expected number of total RNAPII ChIP–seq reads was calculated that should support these nascent RNA levels based on the adult liver baseline, that is, the ratio of EU-seq read counts and total RNAPII ChIP–seq read counts in the elongation phase (18 bins). Subsequently, the observed total RNAPII ChIP–seq read count per sample was determined in each aged liver sample. Then, we subtracted the expected total RNAPII ChIP–seq read count from the observed total RNAPII ChIP–seq read count and divided this by the expected read count in aged liver: RNAPII stalling in aging (%) = (observed RNAPII read count − expected RNAPII read count) / expected RNAPII read count × 100%. To analyze nucleotide composition, the top 50 genes from GLPThigh and bottom 50 from the remainder genes were selected within the length range of 70–110 kb and were divided into 35 bins from TSS to TSS + 70 kb (2 kb per bin). The nucleotide composition percentage (cytidine, thymine, adenine and guanine) was determined by Qualimap v.2.21 (ref. 69). The transcription error ratio in the EU-seq datasets was calculated using BioConductor seqTools (R v.1.2.0. and IRanges packages)70. Total error rates were calculated as the percentage of total reads with a mismatched base at each read position during the alignment step. Analysis of EU-seq read abundance at splicing donor and acceptor sites was carried out using a custom-written script: Splicingdonor& in which the expression values from ±49 bp around the splicing donor and acceptor site for all selected genes were captured by the HTSeq v.0.6.0. Alternative splicing events were detected by Astalavista v.4.0 (ref. 71) with default settings. DNA methylation status was detected by Qualimap v.2.21 and deepTools v.2.0 (ref. 72). Average RNAPII profiles at promoters (±750 bp around the TSS) and average histone modification profiles (H3K27ac, H3K4me3 and DNA methylation) at the TSS and gene bodies were plotted using HOMER v.4.11 software ( command)73.

To calculate the number of RNAPII stalled on a lesion compared to queuing behind the initially stalled RNAPII, we first estimated the number of DNA lesions in our expressed genes dataset, which has a total length of 280,010,046 bp. With a lesion density of 1.6 per 100,000 bp in a diploid genome, we expect approximately 8,960 DNA lesions. Since DNA lesions are equally occurring on both the coding and template strands and the latter is only important for RNAPII stalling, there are 4,480 DNA lesions in the template strand of the selected gene set. If we assume that all DNA lesions are obstructing an RNAPII complex and we have an estimated 18,000 stalled RNAPII complexes per cell, we estimate that for every RNAPII stalled on a DNA lesion three RNAPII complexes are queuing. The strand bias analysis in the RNAPII ChIP–seq data was done as described elsewhere38, which is based on the observation that PCR amplification of RNAPII ChIP–seq libraries is biased toward the coding strand if there is a transcription-blocking DNA lesion in the template strand on which RNAPII is stalled, with some modifications. We first monitored whether our specific ChIP–seq protocol could detect strand bias and optimized the analysis by using our previously published RNAPII ChIP–seq data after UVB treatment74. In short, forward and reverse reads from RNAPII ChIP–seq were separated and processed by SAMtools v.1.9 (ref. 75) and counted by BEDtools v.2.27.1 (ref. 76). For each gene in the selected gene set, we first corrected for the orientation of the template strand (forward/reverse strand) because genes are located on both the forward and reverse strand. Then, we calculated for each gene in the dataset the fraction bias toward the coding strand and subsequently the strand bias was calculated across all expressed genes in the gene set.

Statistical reproducibility and modeling

In vitro experiments are based on triplicates of independent experiments and the plots are presented as the means, unless otherwise indicated. Details of the statistical tests and quantifications used in this study are described in the corresponding parts of the main text, figure legends or Methods. Data distribution was assumed to be normal but this was not formally tested. All statistical tests were performed with Prism or the packages or functions implemented in R (edgeRpackage and fisher.test functions) except for the enrichment analysis with, GSEA, Enrichr and Ingenuity Pathway Analysis, which were performed by and thoroughly described in their Web applications.

A statistical and probabilistic framework was generated for EU incorporation for a range of distances between EU molecules, and in case of a 1.5-fold reduction for a range of EU incorporation distance differences. The probability that at least one EU is incorporated into nascent RNA was modeled in the situation where there is a 1.5-fold reduction in EU incorporation in old mouse livers due to lower or slower uptake or processing. The assumptions were: (1) as there is at least a >400-fold surplus of biotin for every incorporated EU in nascent RNA in the Click-iT reaction, the reaction is saturated or follows the same asymptote; (2) only one EU incorporation per RNA molecule is sufficient to isolate that specific molecule; (3) EU incorporation is a stochastic process in which the concentration of available EU in the total nucleotide pool linearly correlates with the distance between EU molecules in the nascent RNAs. If there is an EU availability difference between adult and old mice, it is expected that in short RNA species (≤300 nucleotides) the probability of at least one EU incorporation is significantly lower and thus we would empirically observe a lower percentage sequence read mapping to such small RNA species in aged liver. The process of EU incorporation was modeled into nascent RNA species by means of a Poisson process. Specifically, one can think of the number of EU incorporations into nascent RNA as a Poisson process not in time, as it is generally used, but in length as measured in nucleotides. Mathematically, if X(t) is a Poisson process then the probability that there is no event in a time interval (0,t) reads exp(-λt) where λ is the intensity of the Poisson process. Equally, the probability that there is at least one event in the time interval (0,t) is thus 1 − exp(-λt). For each RNA species in our specified RNA length classes identified in the EU-seq datasets, the probability that at least one EU has been incorporated was subsequently computed using the formula above. Clearly, since \(1 – {\mathrm{e}}^{ – {\textstyle{1 \over x}}} > 1 – {\mathrm{e}}^{ – {\textstyle{1 \over y}}}\) for all x < y, Poisson processes with higher intensity will necessarily exhibit a larger probability that at least one EU has been incorporated than Poisson processes of lower intensity. Three groups of RNA species were examined: (1) ≤300 nucleotides (number of RNA species n = 7,932); (2) between 1,000 and 3,000 nucleotides (number of RNA species, n = 1983); and (3) between 2,000 and 4,000 nucleotides (number of species, n = 1,802). The number of RNA species reflect the total number present in the Mus musculus genome database (Ensembl). The latter two classes, although still representing short RNA species, are incorporated as a positive control in which a difference, if there is 1.5-fold less EU available, is not expected. In all cases, the probability vectors were not Gaussian as calculated by Kolmogorov–Smirnov test; thus, for each fixed intensity of the underlying Poisson process, the median and interquartile range (IQR) for the probability that at least one EU is incorporated are calculated. Significance between 1.5-fold-apart intensities was calculated by the Mann–Whitney U-test.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link

Leave a Comment