Mini-Review: Alternative Proteoform Prediction
Humans are estimated to have approximately 20,000 protein-coding genes; when splice variants are considered, however, the number of distinct proteins rises to an estimated 70,000. Moreover, post-translational modifications, including phosphorylation and glycosylation among many others, can produce thousands of further protein variants. Each of these forms of an expressed protein is termed a proteoform. Identifying and characterising the full range of proteoforms is challenging. Although the detection of proteoforms has been limited in the past, the methods for measuring and identifying them continue to evolve (1).
The long-standing standard of proteomic analysis, the “bottom-up” strategy devised by Eng and Yates over two decades ago, does not directly predict alternative proteoforms (2). Current demands call for the identification and quantification of proteoforms, and the information gained would be of great benefit to the biomedical industry. Current standard proteomic workflows involve tandem mass spectrometry and yield invaluable knowledge of protein expression; however, because different gene products and proteoforms can contain the same peptide, direct information about individual proteoforms may be lost. The “top-down” strategy, in contrast, studies whole proteins by tandem mass spectrometry, though each method has its own limitations. RNA sequencing can be used to build sample-specific databases of proteoforms, and integrating it with the existing strategies can provide a more comprehensive analysis (3).
Alternative proteoforms may also arise from a single mRNA through the use of different translation initiation sites (TISs), and sequence information alone may not be enough to identify TISs, and in turn alternative proteoforms, precisely. Such proteoforms cannot be detected using RNA-Seq; however, ribosome profiling (Ribo-Seq) offers a significant advantage in predicting them because it identifies translated open reading frames (ORFs) (4).
In the majority of eukaryotic mRNAs, a single TIS is annotated for the main protein-coding ORF. However, because of leaky scanning, downstream AUG codons can also be used as TISs; if such a codon is in the same reading frame as the annotated TIS, the resulting alternative proteoform is truncated at its N-terminus relative to the annotated proteoform (5). An example of a truncated proteoform uncovered using ribosome profiling is encoded by the human PRKAA1 gene (6, 7). Alternative proteoforms with N-terminal extensions can arise via translation initiation at upstream in-frame AUG codons, though this is less likely because the first in-frame AUG codon is usually annotated as the TIS (5).
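The in-frame condition described above can be made concrete with a short sketch. The function name, the 0-based coordinate convention and the toy sequence are assumptions introduced here for illustration only; the sketch checks reading frame and nothing else, so it says nothing about whether a given AUG is actually used in vivo.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def downstream_inframe_augs(mrna, annotated_start):
    """List in-frame AUG codons downstream of the annotated TIS.

    mrna            -- transcript sequence (DNA or RNA alphabet)
    annotated_start -- 0-based index of the A of the annotated AUG
    Returns (position, residues_removed) tuples, where residues_removed is
    the number of N-terminal residues missing from the truncated proteoform.
    """
    mrna = mrna.upper().replace("T", "U")
    if mrna[annotated_start:annotated_start + 3] != "AUG":
        raise ValueError("annotated_start does not point at an AUG codon")

    hits = []
    # Step codon by codon through the annotated reading frame.
    for pos in range(annotated_start + 3, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # end of the annotated ORF
        if codon == "AUG":
            hits.append((pos, (pos - annotated_start) // 3))
    return hits


# Toy transcript (not a real gene): 5' leader, annotated AUG, one internal
# in-frame AUG, a stop codon, and an AUG beyond the stop.
toy_mrna = "GGC" + "AUG" + "GCU" + "AUG" + "CCC" + "UAA" + "AUG"
toy_hits = downstream_inframe_augs(toy_mrna, annotated_start=3)
```

In the toy transcript only the internal AUG at index 9 is reported, corresponding to a proteoform lacking the first two residues; the AUG after the stop codon is ignored because it lies outside the annotated ORF.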
Alternative proteoforms with N-terminal extensions have also proven difficult to predict because initiation usually occurs at non-AUG codons. Nevertheless, in the human tumor suppressor gene PTEN, a 173-amino-acid N-terminal extension initiated at a CUG codon allows an alternative proteoform of PTEN to be secreted (8, 9, 10). Further analysis of the PTEN 5′ leader led to the discovery of at least two more non-AUG-initiated proteoforms of PTEN, which were found in several human cell lines (11).
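A sequence-only scan for candidate N-terminal extensions might look like the sketch below. It assumes that a candidate start is any codon differing from AUG at exactly one position (a near-cognate codon) that lies upstream of, and in frame with, the annotated TIS with no intervening in-frame stop codon. The function name and the toy leader are hypothetical; this is not the procedure used in the cited PTEN studies, and such a scan only proposes candidates that would still need experimental support.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# All codons that differ from AUG at exactly one position (nine in total).
NEAR_COGNATE = {
    "AUG"[:i] + base + "AUG"[i + 1:]
    for i in range(3)
    for base in "ACGU"
    if base != "AUG"[i]
}


def upstream_inframe_candidates(mrna, annotated_start, include_aug=True):
    """Scan the 5' leader for in-frame candidate start codons.

    Walks upstream from the annotated TIS one codon at a time and stops at
    the first in-frame stop codon, because an extension cannot read through
    it. Returns (position, codon, extension_length_in_codons) tuples,
    ordered from the annotated TIS outwards.
    """
    mrna = mrna.upper().replace("T", "U")
    wanted = NEAR_COGNATE | ({"AUG"} if include_aug else set())

    hits = []
    for pos in range(annotated_start - 3, -1, -3):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in wanted:
            hits.append((pos, codon, (annotated_start - pos) // 3))
    return hits


# Toy leader (not the PTEN sequence): in-frame stop, CUG, GCC, AUU, CCC,
# followed by the annotated AUG at index 15.
toy_leader = "UAG" + "CUG" + "GCC" + "AUU" + "CCC" + "AUG" + "GCU"
toy_candidates = upstream_inframe_candidates(toy_leader, annotated_start=15)
```

For the toy leader the scan reports an AUU giving a 2-codon extension and a CUG giving a 4-codon extension, then stops at the upstream in-frame UAG.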
Proteoform prediction will continue to advance as knowledge of naturally occurring proteoforms increases. A comprehensive map of proteoforms in humans and other species is already in progress and is expected to yield unique insights and ultimately aid diagnostics and therapeutics (3). Ribosome profiling can be used to predict such alternative proteoforms by using elongating and initiating Ribo-Seq data to analyse ribosome occupancy at the 5′ ends of protein-coding genes, as well as at AUG and non-AUG TISs.
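One simple way to combine the two data types is to ask whether a candidate TIS carries a larger share of the initiating-ribosome signal than of the elongating-ribosome signal. The sketch below is an illustrative heuristic under stated assumptions, not the scoring scheme of any cited study: both inputs are assumed to be per-nucleotide read counts for one transcript (already assigned to ribosome positions), and the function name, the `flank` parameter and the toy counts are all hypothetical.

```python
def tis_enrichment(init_counts, elong_counts, candidates, flank=1):
    """Score candidate TIS positions on a single transcript.

    init_counts  -- per-nucleotide counts from initiating Ribo-Seq
    elong_counts -- per-nucleotide counts from elongating Ribo-Seq
    candidates   -- 0-based positions of candidate start codons
    flank        -- nucleotides either side of each candidate to include
    Returns {position: score}, where score is the fraction of initiating
    signal in the window minus the fraction of elongating signal there.
    """
    if len(init_counts) != len(elong_counts):
        raise ValueError("count tracks must cover the same transcript")

    # Normalise by track totals so differing library depth does not
    # dominate the score; "or 1" avoids division by zero on empty tracks.
    init_total = sum(init_counts) or 1
    elong_total = sum(elong_counts) or 1

    scores = {}
    for pos in candidates:
        lo, hi = max(0, pos - flank), pos + flank + 1
        init_frac = sum(init_counts[lo:hi]) / init_total
        elong_frac = sum(elong_counts[lo:hi]) / elong_total
        scores[pos] = init_frac - elong_frac
    return scores


# Toy count tracks (invented numbers): a sharp initiating peak at index 3
# and a weaker one at index 9 over a roughly uniform elongating signal.
toy_init = [0, 0, 0, 8, 0, 0, 0, 0, 0, 2, 0, 0]
toy_elong = [0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1]
toy_scores = tis_enrichment(toy_init, toy_elong, candidates=[3, 9])
```

With these toy counts the position at index 3 scores higher than the one at index 9, which would make it the stronger candidate under this heuristic. In practice, candidate positions could come from sequence scans for in-frame AUG and non-AUG codons, and any threshold on the score would have to be chosen and validated against real data.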
References:
1. Aebersold R, Agar JN, […] Zhang B. How many human proteoforms are there? Nat Chem Biol. 2018;14;206–214.
2. Eng JK, McCormack AL, Yates III JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry. 1994;5(11);976–989.
3. Smith LM, Kelleher NL. Proteoforms as the next proteomics currency. Science. 2018;359(6380);1106–1107.
4. Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324(5924);218–223.
5. Michel AM, Ahern AM, Donohue CA, Baranov PV. GWIPS-viz as a tool for exploring ribosome profiling evidence supporting the synthesis of alternative proteoforms. Proteomics. 2015;15(14);2410–2416.
6. Lee S, Liu B, Lee S, Huang SX, Shen B, Qian SB. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. PNAS. 2012;109(37);14728–14729.
7. Van Damme P, Gawron D, Van Criekinge W, Menschaert G. N-terminal proteomics and ribosome profiling provide a comprehensive view of the alternative translation initiation landscape in mice and men. Molecular & Cellular Proteomics. 2014;13(5);1245–1261.
8. Ivanov IP, Firth AE, Michel AM, Atkins JF, Baranov PV. Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences. Nucleic Acids Research. 2011;39(10);4220–4234.
9. Hopkins BD, Fine B, […] Parsons R. A secreted PTEN phosphatase that enters cells to alter signaling and survival. Science. 2013;341(6144);399–402.
10. Liang H, He S, Yang J, Jia X, Wang P, Chen X, Zhang Z, Zou X, McNutt MA, Shen WH, Yin Y. PTENα, a PTEN isoform translated through alternative initiation, regulates mitochondrial function and energy metabolism. Cell Metabolism. 2014;19(5);836–848.
11. Tzani I, Ivanov IP, Andreev DE, Dmitriev RI, Dean KA, Baranov PV, Atkins JF, Loughran G. Systematic analysis of the PTEN 5′ leader identifies a major AUU initiated proteoform. Open Biology. 2016;6(5);150203.