Micropeptide Prediction

Micropeptides are peptides that are generally considered to be less than 150 amino acids in length. The identification of these short peptides has been limited by their small size, their low abundance, and the standard parameters currently used to define protein-coding regions. They are also notable in that the protein-coding potential of their ORFs tends to be less evolutionarily conserved than that of annotated protein-coding regions. Micropeptides have been well documented in prokaryotes and have also been found to play key roles in eukaryotic development and physiology. In Escherichia coli, for example, the 49-amino-acid protein AcrZ renders cells less susceptible to specific antibiotics. In mammals, the small proteins myoregulin, sarcolipin, and phospholamban affect calcium transport, thereby regulating muscle activity.
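To illustrate how a length cut-off enters small-ORF (smORF) prediction, the Python sketch below scans the three forward reading frames of a DNA sequence for ATG-initiated ORFs and keeps only those encoding at most 150 amino acids, the definition used above. The function name `find_small_orfs`, the 10-amino-acid minimum, the restriction to ATG starts and the forward-strand-only scan are illustrative assumptions, not a published pipeline. Sequence features alone say nothing about whether an ORF is translated, which is why the studies below rely on ribosome profiling.

```python
# Hypothetical sketch: naive forward-strand smORF scan with a length filter.
# Assumptions: ATG-only starts, + strand only, min/max lengths are arbitrary.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, max_aa=150, min_aa=10):
    """Return sorted (start, end, aa_length) tuples for ATG...stop ORFs.

    start and end are 0-based; end is exclusive and includes the stop codon.
    aa_length counts the initiator Met but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon to the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        aa_len = (j - i) // 3
                        if min_aa <= aa_len <= max_aa:
                            orfs.append((i, j + 3, aa_len))
                        i = j  # resume scanning after the stop codon
                        break
                else:
                    # No in-frame stop: this ORF (and any later one) runs off the end.
                    break
            i += 3
    return sorted(orfs)
```

For example, `find_small_orfs("CCATGAAATTTGGGTAGCC", min_aa=1)` reports a single 4-amino-acid ORF in the third frame, while an ATG followed by 200 sense codons is rejected by the default 150-amino-acid ceiling.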

Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity and Dynamics of Mammalian Proteomes

Cell, 2011; 147(4), pp.789-802
Ingolia, N.T., Lareau, L.F. and Weissman, J.S.

Advances in DNA sequencing technology have made it possible to gather comprehensive genomic information quickly and cost-effectively. However, decoding the information in these genomes remains challenging. Here, the authors utilise ribosome profiling of mammalian cells to provide a genome-wide map of protein synthesis, together with a pulse-chase strategy for measuring translation elongation rates.

Key Findings

    • Identification of a wide range of unannotated or modified ORFs, including highly translated short ORFs in the majority of annotated lincRNAs.
    • Annotation of translation start sites using a modified ribosome profiling strategy. This relies on treating cells with harringtonine, which causes ribosomes to accumulate precisely at initiation codons.
    • Identification of over a thousand strong translational pauses that could act as key regulatory sites.
    • Estimation of an average codon decoding rate of approximately 6 codons per second.
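The elongation-rate estimate rests on simple kinematics: once harringtonine has blocked new initiation, ribosomes that are already elongating keep running off, so the footprint-depleted region at the 5′ end of the ORF grows with time. Assuming the position of this depletion front (in codons) has already been measured at several time points, the least-squares slope of front position against time gives codons per second. This is a hypothetical sketch: the function names and example numbers are ours, and it omits the careful front estimation a real analysis needs.

```python
# Hypothetical sketch: elongation rate as the slope of run-off front vs time.
# Assumes the depletion-front positions have already been estimated.


def elongation_rate(times_s, front_codons):
    """Ordinary least-squares slope (codons per second).

    Fitting an intercept absorbs any constant delay, e.g. drug entry time.
    """
    n = len(times_s)
    if n < 2 or n != len(front_codons):
        raise ValueError("need at least two paired (time, position) points")
    mean_t = sum(times_s) / n
    mean_x = sum(front_codons) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(times_s, front_codons))
    var = sum((t - mean_t) ** 2 for t in times_s)
    return cov / var


def transit_time_s(n_codons, rate_codons_per_s):
    """Time for one ribosome to traverse an ORF at a constant rate."""
    return n_codons / rate_codons_per_s
```

With toy front positions of 180, 360 and 540 codons at 30, 60 and 90 s, `elongation_rate` returns 6.0; at that rate, `transit_time_s(150, 6.0)` gives 25 s for a 150-codon ORF.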


These studies reveal a previously unappreciated complexity in mammalian proteomes. The approach is applicable to other cells and organisms, with potential uses in decoding complex genomes, monitoring rates of protein production, and exploring the molecular mechanisms of translational regulation.

Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation

EMBO Journal, 2014; 33, pp.981-993
Bazzini, A.A., Johnstone, T.G., Christiano, R., Mackowiak, S.D., Obermayer, B., Fleming, E.S., Vejnar, C.E., Lee, M.T., Rajewsky, N., Walther, T.C. and Giraldez, A.J.

The comprehensive identification of small ORFs has proven difficult and has relied on evolutionary conservation, known codon-usage patterns and mass spectrometry. Here, the authors use the three-nucleotide periodicity of ribosome movement along the mRNA, captured by ribosome footprinting, to define actively translated ORFs.

Key Findings

    • The authors developed ORFscore, a metric to indicate the likelihood that each ORF was actively translated based on the framing bias in ribosome profiling data.
    • The authors experimentally identified hundreds of translated ORF regions that encode micropeptides. Specifically, they defined 303 ORFs in genes previously annotated as non-coding, 311 ORFs in the 5′UTR and 93 in the 3′UTR, several of which were then validated by mass spectrometry.
    • A statistically significant (though small) overlap between evolutionary conservation of small ORFs (as measured by phyloCSF) and likelihood of translation as measured by ORFscore.
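The framing-bias idea behind ORFscore can be sketched as a chi-squared-like statistic over the number of footprint P-sites falling in each of the three reading frames, log-transformed and made positive only when the ORF's own frame dominates. The formula below follows our reading of Bazzini et al.; it omits any additional filtering or normalisation the authors may have applied, and the function name and input format (precomputed P-site coordinates) are assumptions.

```python
# Hypothetical sketch of a frame-bias score in the spirit of ORFscore.
import math
from collections import Counter


def orfscore(psite_positions, orf_start, orf_end):
    """Score one ORF from read P-site positions (0-based transcript coords).

    orf_end is exclusive. Frame 0 is the ORF's own reading frame.
    """
    counts = Counter(
        (p - orf_start) % 3 for p in psite_positions if orf_start <= p < orf_end
    )
    f = [counts.get(k, 0) for k in range(3)]
    total = sum(f)
    if total == 0:
        return 0.0  # no reads, no evidence either way
    mean = total / 3
    # Chi-squared-like deviation from a uniform distribution over frames.
    chi = sum((fi - mean) ** 2 / mean for fi in f)
    score = math.log2(chi + 1)
    # Positive only if the ORF's own frame has the most reads.
    return score if f[0] > f[1] and f[0] > f[2] else -score
```

Reads that all fall in frame 0 give a high positive score, reads spread evenly across frames give 0, and reads concentrated in another frame give a negative score, which is how a frame-bias metric separates genuinely translated ORFs from background.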


The identification of hundreds of translated smORFs significantly expands the set of micropeptide-encoding vertebrate genes, prompting future investigation of their function in vivo.

Translation of 5′ leaders is pervasive in genes resistant to eIF2 repression

eLife, 2015; 4, e03971
Andreev, D.E., O’Connor, P.B., Fahey, C., Kenny, E.M., Terenin, I.M., Dmitriev, S.E., Cormican, P., Morris, D.W., Shatsky, I.N. and Baranov, P.V.

Eukaryotic cells can rapidly reduce protein synthesis in response to stress through phosphorylation-mediated inactivation of a key translation initiation factor, eukaryotic initiation factor 2 (eIF2). This is a key step in the integrated stress response (ISR). However, translation of certain mRNAs must be maintained for an adequate stress response. In this study, the authors carried out ribosome profiling of human cells under severe stress induced with sodium arsenite.

Key Findings

    • Despite a 5.4-fold general repression of translation, the protein-coding open reading frames (ORFs) of a small number of mRNAs were resistant to the inhibition.
    • Almost all such resistant transcripts possess at least one efficiently translated upstream open reading frame (uORF) that represses translation of the main coding ORF under normal conditions.
    • Phylogenetic analysis suggests that at least two regulatory uORFs (in SLC35A4 and MIEF1) encode functional protein products.


This study confirms previous work showing the importance of uORFs in regulating genes that resist repression during the ISR. Upon eIF2 inactivation, the ratio of footprints mapped to the uORF versus the main ORF changes for both bicistronic mRNAs identified in this study (MIEF1 and SLC35A4). This may be advantageous for coordinating the expression of their two products.
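Comparing uORF and main-ORF translation between conditions amounts to comparing footprint densities (reads per codon). Because both ORFs are measured in the same library, sequencing depth cancels out of their ratio, so no separate library-size normalisation is needed for this particular comparison. A minimal sketch, with hypothetical class and function names and toy counts rather than data from the study:

```python
# Hypothetical sketch: uORF/main-ORF footprint density ratio and its change.
import math
from dataclasses import dataclass


@dataclass
class OrfCounts:
    reads: int          # footprints mapped to the ORF
    length_codons: int  # ORF length in codons

    @property
    def density(self) -> float:
        return self.reads / self.length_codons


def uorf_to_main_ratio(uorf: OrfCounts, main: OrfCounts) -> float:
    """Footprint density of the uORF relative to the main ORF."""
    return uorf.density / main.density


def log2_ratio_change(uorf_ctrl: OrfCounts, main_ctrl: OrfCounts,
                      uorf_stress: OrfCounts, main_stress: OrfCounts) -> float:
    """log2(stress ratio / control ratio); negative means the main ORF gains."""
    return math.log2(
        uorf_to_main_ratio(uorf_stress, main_stress)
        / uorf_to_main_ratio(uorf_ctrl, main_ctrl)
    )
```

For example, a toy uORF with 100 reads over 50 codons and a main ORF with 400 reads over 400 codons give a ratio of 2.0; if the uORF drops to 50 reads under stress while the main ORF is unchanged, the log2 change is -1.0.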


A Regression-Based Analysis of Ribosome-Profiling Data Reveals a Conserved Complexity to Mammalian Translation

Molecular Cell, 2015; 60(5), pp.816-827
Fields, A.P., Rodriguez, E.H., Jovanovic, M., Stern-Ginossar, N., Haas, B.J., Mertins, P., Raychowdhury, R., Hacohen, N., Carr, S.A., Ingolia, N.T., Regev, A. and Weissman, J.S.

Previously, proteins were generally assumed to be conserved, not to overlap, and to exceed a minimum length. However, proteins that do not meet these criteria are being uncovered at an increasing rate. In addition, alternative initiation can produce an N-terminally truncated or extended version of a protein that behaves differently from the canonical form. Here, the authors describe the ORF Regression Algorithm for Translational Evaluation of RPFs (ORF-RATER). This framework uses ribosome profiling data to identify and quantify translation from CDSs regardless of start codon, length, or overlap with other CDSs.
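The core regression idea can be illustrated with a toy model: each candidate ORF is assumed to contribute a uniform footprint density at its in-frame codon positions, and the observed footprint profile is decomposed into per-ORF contributions with non-negative least squares, so overlapping ORFs such as an annotated CDS and its N-terminally truncated isoform can be quantified separately. This is a heavily simplified sketch, not ORF-RATER itself: the uniform profile, the coordinate-descent solver and all names are our assumptions, and the published framework should be consulted for how it actually models footprint profiles.

```python
# Hypothetical toy model of regression-based ORF quantification.
# Assumption: each ORF adds a constant density at its in-frame positions.


def design_matrix(orfs, transcript_len):
    """Rows = nucleotide positions, columns = candidate ORFs (start, end)."""
    return [
        [1.0 if (s <= pos < e and (pos - s) % 3 == 0) else 0.0 for (s, e) in orfs]
        for pos in range(transcript_len)
    ]


def nnls_coordinate_descent(A, b, n_iter=500):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    r = list(b)  # residual b - A x (x starts at zero)
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_iter):
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue  # ORF with no positions cannot be estimated
            grad = sum(A[i][j] * r[i] for i in range(n_rows))
            # Exact 1-D minimiser for coordinate j, clipped at zero.
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta:
                for i in range(n_rows):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x
```

For a full-length ORF spanning positions 0 to 30 and an in-frame truncated ORF starting at position 15, a toy profile built from densities of 2 and 3 reads per codon is decomposed back into approximately `[2.0, 3.0]`; the non-negativity constraint keeps an untranslated candidate at zero rather than letting it absorb negative weight.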

Key Findings

    • The protein-coding potential of all ORFs in a transcriptome was assessed by ORF-RATER, revealing translated CDSs that were overlooked by existing annotation pipelines, including those that are short, overlapping, or that do not initiate at AUG codons.
    • This approach was applied in lipopolysaccharide-stimulated mouse dendritic cells and HCMV-infected human fibroblasts. Although translation of these unannotated CDSs appears to be conserved from mouse to human, the majority (∼60%) of them give rise to polypeptides lacking evidence of codon-level conservation. Instead, what appears to be maintained is the length of the open reading frame, suggesting that in these cases translation may be conserved for regulatory purposes.
    • Identification of a GUG-initiated N-terminal extension of Fxr2, which encodes fragile X mental retardation syndrome-related protein 2.
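A first, sequence-only step toward candidates like the Fxr2 extension is to list in-frame start codons upstream of the annotated AUG, stopping at the first in-frame stop codon (beyond which no extension is possible). The sketch below treats the nine codons that differ from AUG at a single position (including GUG and CUG) as near-cognate candidates. The function names and this codon set are assumptions, and whether any candidate is actually used has to come from ribosome profiling data, as in the study.

```python
# Hypothetical sketch: in-frame upstream AUG/near-cognate codons that could
# produce an N-terminal extension of an annotated CDS.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons():
    """Codons differing from ATG at exactly one position (9 in total)."""
    out = set()
    for pos in range(3):
        for base in "ACGT":
            if base != "ATG"[pos]:
                out.add("ATG"[:pos] + base + "ATG"[pos + 1:])
    return out


NEAR_COGNATE = near_cognate_codons()


def upstream_inframe_starts(mrna, cds_start):
    """Return (position, codon, extra_aa) for candidate upstream starts.

    Walks 5' from the annotated start codon one codon at a time and stops
    at the first in-frame stop codon. Accepts RNA (U) or DNA (T) input.
    """
    seq = mrna.upper().replace("U", "T")
    hits = []
    i = cds_start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG" or codon in NEAR_COGNATE:
            hits.append((i, codon, (cds_start - i) // 3))
        i -= 3
    return hits
```

For the toy leader `UAAGUGAAACCCAUGGGG` with the annotated AUG at position 12, the only candidate is the GUG at position 3, which would add three amino acids to the N-terminus; the upstream UAA bounds the search.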


This work reveals an unforeseen complexity in mammalian translation, capable of providing both conserved regulatory and protein-based functions.

Identifying Small Proteins by Ribosome Profiling with Stalled Initiation Complexes

mBio, 2019; 10(2), e02819-18
Weaver, J., Mohammad, F., Buskirk, A.R. and Storz, G.

Small proteins of 50 or fewer amino acids have been found to be key regulators of larger proteins in prokaryotes and eukaryotes. Despite this, the full extent of the small proteome remains unknown, as existing annotation pipelines usually do not take small open reading frames into account. The identification, characterization, and purification of microproteins also continue to be hindered by their small size. This group has previously uncovered a number of small proteins in Escherichia coli using bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. In this research article, the authors present an empirical approach for discovering new proteins that takes advantage of recent advances in ribosome profiling.

Key Findings

    • A novel approach to identifying translation start sites, using two treatments that both stall ribosomes at initiation sites (Onc112 and retapamulin).
    • Forty-one high-confidence putative small ORFs were identified, and protein synthesis was subsequently detected for all but three. Ribosome profiling with stalled initiation complexes therefore enabled the discovery of 38 new small proteins.
    • The corresponding 38 genes are mostly intergenic but are also found antisense to other genes, overlapping other ORFs, and in operons.
    • Small ORFs were found to overlap the 5′ ends of larger protein-coding genes. Further analysis indicates that they could have a regulatory role for downstream ORFs.
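The benefit of using two initiation-stalling drugs can be expressed as an intersection: a position is called as a start site only if footprints are enriched there in both treated samples relative to an untreated control, so drug-specific artefacts in either sample are filtered out. The same peak-calling logic applies in spirit to harringtonine data in mammalian cells. The thresholds, pseudocount and function names below are hypothetical choices for illustration; they are not the criteria used by Weaver et al.

```python
# Hypothetical sketch: consensus start-site calls from two stalling drugs.
# Inputs map transcript/genome position -> footprint count at that position.


def call_start_sites(treated, untreated, min_reads=5, min_fold=3.0, pseudocount=1.0):
    """Positions with enough reads and enrichment over the untreated sample."""
    sites = set()
    for pos, n in treated.items():
        background = untreated.get(pos, 0)
        fold = (n + pseudocount) / (background + pseudocount)
        if n >= min_reads and fold >= min_fold:
            sites.add(pos)
    return sites


def consensus_start_sites(drug_a, drug_b, untreated, **thresholds):
    """Keep only sites called independently in both drug-treated samples."""
    return (call_start_sites(drug_a, untreated, **thresholds)
            & call_start_sites(drug_b, untreated, **thresholds))
```

With toy counts in which one position is enriched in both samples, a second only in the retapamulin-like sample and a third only in the Onc112-like sample, only the first survives the intersection.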


This article reveals a complex gene organisation in prokaryotes, in which translation of small upstream ORFs and antisense genes may have regulatory functions. Using two treatments to capture initiating ribosomes substantially reduces false-positive detection of initiation sites. The approach should also help define the small proteome of less well characterized bacteria, which will help in understanding the regulation that allows these organisms to grow and survive.
