Revolutionizing Gene Annotation: The Power of Ribosome Profiling

By Gary Loughran, Head of Biology

Gene annotation is a cornerstone of genomics, providing insights into the functional elements of the genome and enabling researchers to understand the roles of genes in health, disease, and development. However, due to the complexity of many genomes, gene annotation is a daunting task, and traditional methods often fall short in capturing the full complexity of the genome. One of the key objectives of gene annotation is to define the functional elements of the genome, which primarily involves delineating the boundaries of protein-coding sequences. Annotators must first define the transcriptome, as only RNA transcripts can be used as templates for protein synthesis. In this regard, RNA sequencing (RNA-Seq) has revolutionized transcriptomics, shifting the next major challenge to defining the translated regions within mRNAs, known as translons. Enter ribosome profiling (Ribo-Seq), a cutting-edge technique that offers unparalleled precision in identifying translated regions of the transcriptome. This blog delves into the transformative potential of ribosome profiling for gene annotation, exploring its methodology, applications, and the profound implications for genomic research.

The Basics of Ribosome Profiling

Ribosome profiling involves sequencing the regions of mRNAs that are within ribosomes at the time of the experiment, providing a high-resolution snapshot of ribosome positions across the transcriptome (Ingolia et al., 2009). This technique allows researchers to identify not only which regions of mRNAs are being actively translated into proteins but also, due to the sub-codon resolution afforded by ribosome profiling, to infer which reading frame is being translated. This makes ribosome profiling an especially powerful tool for translated open reading frame (ORF) prediction, which is invaluable for annotators.
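
As a rough illustration of how sub-codon resolution supports reading-frame inference, the sketch below tallies footprint positions by frame. It assumes each footprint has already been reduced to a single P-site coordinate on the transcript (the offset calibration step is not shown), and the function names and toy coordinates are hypothetical, not taken from any particular Ribo-Seq tool.

```python
from collections import Counter

def frame_counts(psite_positions, orf_start):
    """Count P-sites in each of the three frames, relative to orf_start."""
    counts = Counter(
        (pos - orf_start) % 3 for pos in psite_positions if pos >= orf_start
    )
    return [counts.get(frame, 0) for frame in range(3)]

def dominant_frame(psite_positions, orf_start):
    """Return (frame, fraction of P-sites in that frame), or (None, 0.0) if empty."""
    counts = frame_counts(psite_positions, orf_start)
    total = sum(counts)
    if total == 0:
        return None, 0.0
    best = max(range(3), key=lambda frame: counts[frame])
    return best, counts[best] / total

# Toy P-site coordinates (hypothetical data, 0-based transcript positions).
psites = [10, 13, 16, 19, 20, 22, 25, 28, 30, 31]
print(frame_counts(psites, orf_start=10))    # [8, 1, 1]
print(dominant_frame(psites, orf_start=10))  # (0, 0.8)
```

A strong excess of P-sites in one frame is the triplet periodicity signal that ORF-prediction methods rely on; real data would also need read-length-specific offsets and a statistical test rather than a simple majority.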

Some Translation-specific Challenges in Gene Annotation

Traditional gene annotation methods rely heavily on comparative genomics. If the longest ORF of a particular mRNA encodes an amino acid sequence similar to that encoded by an orthologous mRNA, then annotation would seem straightforward. While this approach is valid for annotating most protein-coding regions, there are some noteworthy limitations:

  1. ORF Definition: Some define an ORF as the region between an AUG and the next in-frame stop codon, while others define it as the region between a stop codon and the next in-frame stop codon. Only the latter approach can account for the possibility of initiation at non-AUG codons.

  2. ORF Length: Short ORFs can be dismissed because their small size may not be sufficient to detect significant similarity with their orthologs (i.e., they appear not to be conserved).

  3. ORF Choice: Two ribosomes translating the same mRNA molecule can translate different ORFs and produce different proteins.
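
The two ORF definitions in the first point can be compared with a short sketch. It assumes an RNA sequence written in the A/C/G/U alphabet and treats the 5' end of the sequence as an implicit upstream boundary for the stop-to-stop definition; both choices, the function names, and the toy sequence are illustrative assumptions.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def codons(seq, frame):
    """Yield (position, codon) for every complete codon in the given frame."""
    for i in range(frame, len(seq) - 2, 3):
        yield i, seq[i:i + 3]

def aug_to_stop_orfs(seq):
    """ORFs from an AUG to the next in-frame stop (end exclusive, stop included)."""
    orfs = []
    for frame in range(3):
        start = None
        for i, codon in codons(seq, frame):
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

def stop_to_stop_orfs(seq):
    """ORFs from just after a stop (or the frame start) to the next in-frame stop."""
    orfs = []
    for frame in range(3):
        start = frame
        for i, codon in codons(seq, frame):
            if codon in STOP_CODONS:
                if i > start:  # skip empty regions between adjacent stops
                    orfs.append((start, i + 3))
                start = i + 3
    return sorted(orfs)

# Toy mRNA: UAA CUG GCC AUG GGA UAA (hypothetical sequence)
mrna = "UAACUGGCCAUGGGAUAA"
print(aug_to_stop_orfs(mrna))   # [(9, 18)]
print(stop_to_stop_orfs(mrna))  # [(3, 18)]
```

In this toy case the stop-to-stop region begins at the CUG codon, six nucleotides upstream of the AUG, so only that definition leaves room for a non-AUG start.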

Ribosome Profiling: A Game-Changer for Gene Annotation

Ribosome profiling can help address some of these challenges by providing empirical evidence of translation, refining and enhancing gene annotation in several key ways:

  1. ORF Definition: Assuming that translation starts exclusively at an AUG triplet can result in misannotation. Ribosome profiling, particularly when performed with compounds like harringtonine or lactimidomycin that arrest ribosomes at initiation sites, has revealed that near-cognate initiation (at codons that differ from AUG by a single nucleotide) is pervasive (Ingolia et al., 2011). Therefore, protein-coding prediction tools should be trained to detect in-frame protein-coding signatures upstream of the predicted AUG start codon. Potential N-terminal extensions could be particularly important since many subcellular targeting signals are located at the N-terminus. Generally, the efficiency of near-cognate initiation is much lower than that of cognate (AUG) initiation. In eukaryotes, the small subunit of the ribosome is loaded onto the extreme 5’ end of the mRNA before scanning in the 3’ direction in search of a suitable start site. Most ribosomes encountering a near-cognate triplet will simply inspect it, reject it, and continue scanning. An mRNA with a near-cognate codon upstream of, and in-frame with, the predicted AUG initiation codon could encode two proteins with different N-termini from the same mRNA (but different ribosomes). Ribosome profiling can detect such instances: the density of mapped reads is low upstream of the in-frame AUG codon and increases dramatically just downstream of it. However, several unusual mRNAs have been identified in ribosome profiling data where protein synthesis is exclusively initiated at a near-cognate triplet (Fedorova et al., 2022). Here, the next downstream AUG is generally in a different reading frame and likely acts as a ribosome sink for those ribosomes that fail to initiate at the near-cognate triplet.

An analogous scenario can occur at the 3’ end of a predicted ORF when ribosomes encountering a weak stop codon continue translation to encode a C-terminal extension. This readthrough of stop codons is generally inefficient, again leading to the synthesis of two proteins (with different C-termini) from the same mRNA. Several novel instances of stop codon readthrough have been detected by ribosome profiling (Dunn et al., 2013).

  2. ORF Length: Until very recently, annotators searching for protein-coding ORFs generally excluded those with fewer than 100 codons, even though they could potentially encode functional microproteins. Although somewhat arbitrary, the rationale for requiring a cut-off can be readily justified by the increasing power of comparative genomics with longer sequences. This raises another challenge faced by annotators: how to detect non-conserved translated ORFs or translons. The 100-codon cut-off has likely resulted in the misclassification of many mRNAs as non-coding RNAs. Ribosome profiling does not have such limitations and excels in identifying small ORFs, pinpointing even the briefest translation events and highlighting regions where microproteins are being synthesized (Chothani et al., 2023). Ribosome profiling is also agnostic to whether an ORF is conserved. While conservation implies functionality, lack of conservation does not preclude functionality. Translated ORFs initiated within the supposed 5’ untranslated region (uORFs for upstream ORFs) can significantly impact the number of small ribosomal subunits that reach the longest, and often the main, protein-coding ORF. These uORFs could hardly be described as non-functional, yet only a handful of uORFs encode conserved peptides. Here, it is the act of translation rather than the product of translation that is important. Of course, some of these uORF products could be shaped over time to acquire function, and there are several such examples (Andreev et al., 2015a; Wang et al., 2018; Rathore et al., 2018). Ribosome profiling has revealed that uORF translation occurs on most human mRNAs and is likely prevalent in most higher eukaryotes (Ingolia et al., 2009). Many uORFs are initiated at near-cognate codons, and few produce conserved peptides, posing a significant challenge for annotators.

The unpredictability of uORF translation may be a critical consideration for designers of therapies relying on protein expression, such as mRNA therapeutics.

  3. ORF Choice: In higher eukaryotes, predicting the products of protein synthesis from an mRNA sequence isn’t always straightforward because translation is not a deterministic process. Different ribosomes can synthesize completely different products from the same mRNA molecule, sometimes even from the same ORF. For example, some scanning ribosomes may initiate translation at a near-cognate codon within the 5’ untranslated region (5’UTR), while other ribosomes may bypass this codon, continuing to search for a more optimal start codon. In a phenomenon known as ribosomal frameshifting, some ribosomes shift reading frames during elongation, producing two products with identical N-termini but different C-termini (Atkins et al., 2016). While frameshifting is rare in vertebrate genes, it is often employed by viruses. Another example of atypical translation is stop codon readthrough, where a tRNA, rather than a release factor, recognizes the stop codon, allowing ribosomes to continue translation until the next in-frame stop codon is encountered. This decoding complexity poses a significant challenge for annotators. Furthermore, environmental conditions such as stress (Andreev et al., 2015a, 2015b), or the abundance of certain translation factors (Fijalkowska et al., 2017), can completely alter the translatome. Since ribosome profiling can readily detect instances of uORF translation and stop codon readthrough, it can greatly enhance the predictive power for annotators. Frameshifting is more challenging to detect by ribosome profiling, but higher frameshifting efficiency and/or long ORFs in the shifted reading frame can help identify such instances (Michel et al., 2012).
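
The N-terminal extension scenario described in the list above can be sketched in a few lines. The code enumerates near-cognate codons using the definition given in the text (a single-nucleotide difference from AUG), then walks upstream of an annotated AUG, in frame, until it meets a stop codon. The function names and the toy sequence are hypothetical; a hit found this way is only a candidate start site and would still need supporting footprint evidence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def near_cognate_codons(cognate="AUG", alphabet="ACGU"):
    """All codons that differ from the cognate start codon at exactly one position."""
    variants = set()
    for pos in range(3):
        for base in alphabet:
            if base != cognate[pos]:
                variants.add(cognate[:pos] + base + cognate[pos + 1:])
    return sorted(variants)

def upstream_inframe_near_cognates(seq, aug_pos):
    """Near-cognate codons upstream of, and in frame with, the AUG at aug_pos.

    The walk stops at the first in-frame stop codon, because an extension
    cannot continue through it.
    """
    near = set(near_cognate_codons())
    hits = []
    i = aug_pos - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in near:
            hits.append((i, codon))
        i -= 3
    return sorted(hits)

print(near_cognate_codons())
# ['AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU', 'CUG', 'GUG', 'UUG']

# Toy mRNA: UAA CUG GCC AUG GGA UAA, annotated AUG at position 9 (hypothetical)
print(upstream_inframe_near_cognates("UAACUGGCCAUGGGAUAA", aug_pos=9))
# [(3, 'CUG')]
```

Sequence alone cannot say whether ribosomes actually initiate at such a codon; that is where initiation-site profiling with harringtonine or lactimidomycin, as mentioned above, provides the empirical evidence.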


Ribosome profiling is transforming the field of gene annotation by providing high-resolution data on sites of mRNA translation, uncovering previously unannotated translated ORFs, and reclassifying ncRNAs (Mudge et al., 2022). The importance of ribosome profiling for enabling more accurate models for predicting the phenotypic effects of genetic variants affecting translation cannot be overstated. As this technology continues to evolve, it promises to unlock new dimensions of the genomic landscape, paving the way for groundbreaking discoveries in biology, medicine, and biotechnology. The integration of ribosome profiling with other omics technologies and the development of advanced computational tools will further enhance its impact. As we continue to explore the hidden depths of the genome, ribosome profiling will undoubtedly remain at the forefront of genomic research, driving our understanding of the molecular mechanisms that underpin life.


References

  1. Andreev, D. E., O’Connor, P. B., Fahey, C., Kenny, E. M., Terenin, I. M., Dmitriev, S. E., Cormican, P., Morris, D. W., Shatsky, I. N., & Baranov, P. V. (2015a). Translation of 5′ leaders is pervasive in genes resistant to eIF2 repression. eLife, 4, e03971.

  2. Andreev, D. E., O’Connor, P. B., Zhdanov, A. V., Dmitriev, R. I., Shatsky, I. N., Papkovsky, D. B., & Baranov, P. V. (2015b). Oxygen and glucose deprivation induces widespread alterations in mRNA translation within 20 minutes. Genome Biology, 16(1), 90.

  3. Atkins, J. F., Loughran, G., Bhatt, P. R., Firth, A. E., & Baranov, P. V. (2016). Ribosomal frameshifting and transcriptional slippage: From genetic steganography and cryptography to adventitious use. Nucleic Acids Research, 44(15), 7007-7078.

  4. Chothani, S., Ho, L., Schafer, S., & Rackham, O. (2023). Discovering microproteins: Making the most of ribosome profiling data. RNA Biology.

  5. Dunn, J. G., Foo, C. K., Belletier, N. G., Gavis, E. R., & Weissman, J. S. (2013). Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster. eLife, 2, e01179.

  6. Fedorova, A. D., Kiniry, S. J., Andreev, D. E., Mudge, J. M., & Baranov, P. V. (2022). Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals. Nature Communications, 13(1), 7910.

  7. Fijalkowska, D., Verbruggen, S., Ndah, E., Jonckheere, V., Menschaert, G., & Van Damme, P. (2017). eIF1 modulates the recognition of suboptimal translation initiation sites and steers gene expression via uORFs. Nucleic Acids Research, 45(13), 7997-8013.

  8. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R., & Weissman, J. S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324(5924), 218-223.

  9. Ingolia, N. T., Lareau, L. F., & Weissman, J. S. (2011). Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell, 147(4), 789-802.

  10. Michel, A. M., Choudhury, K. R., Firth, A. E., Ingolia, N. T., Atkins, J. F., & Baranov, P. V. (2012). Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Research, 22(11), 2219-2229.

  11. Mudge, J. M., Ruiz-Orera, J., Prensner, J. R., Brunet, M. A., Calvet, F., Jungreis, I., Gonzalez, J. M., Magrane, M., Martinez, T. F., Schulz, J. F., Yang, Y. T., Albà, M. M., Aspden, J. L., Baranov, P. V., Bazzini, A. A., Bruford, E., Martin, M. J., Calviello, L., Carvunis, A. R., Chen, J., Couso, J. P., Deutsch, E. W., Flicek, P., Frankish, A., Gerstein, M., Hubner, N., Ingolia, N. T., Kellis, M., Menschaert, G., Moritz, R. L., Ohler, U., Roucou, X., Saghatelian, A., Weissman, J. S., & van Heesch, S. (2022). Standardized annotation of translated open reading frames. Nature Biotechnology, 40(7), 994-999.

  12. Rathore, A., Chu, Q., Tan, D., Martinez, T. F., Donaldson, C. J., Diedrich, J. K., Yates, J. R. 3rd, & Saghatelian, A. (2018). MIEF1 microprotein regulates mitochondrial translation. Biochemistry, 57(38), 5564-5575.

  13. Wang, Y. J., Vaidyanathan, P. P., Rojas-Duran, M. F., Udeshi, N. D., Bartoli, K. M., Carr, S. A., & Gilbert, W. V. (2018). Lso2 is a conserved ribosome-bound protein required for translational recovery in yeast. PLoS Biology, 16(9), e2005903.
