The extent of Ds1 transposon to enrich transcriptomes and proteomes by exonization
© Charng and Liu; licensee Springer. 2013
Received: 25 October 2012
Accepted: 12 March 2013
Published: 21 August 2013
Exonization is an event in which an intronic transposed element (TE) provides splice sites, leading to alternatively spliced cassette exons. Without disrupting the function of the host gene, TEs can expand proteome diversity by adding a splice variant that encodes a different, yet functional, protein. Previously, we found that the main contribution of Ds exonization to gene divergence is not providing genetic messages but incorporating intron sequences with different reading frame patterns to enrich the plant proteome. Ds1, another member of the Ac/Ds transposon system, differs from Ds in providing 3 splice donor sites and 2 acceptor sites for alternative splicing, which may greatly increase its potential for proteome expansion.
In this study, we performed a genome-wide survey of Ds1 exonization events to assess their potential to enrich plant proteomes. Using its splice donor and/or acceptor sites, each Ds1 insertion yielded 11 transcript isoforms; together these made up the full set of simulated exonized transcript orthologs from the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice). The exonized transcripts were classified by the location of the premature termination codon (PTC), and putative targets of the nonsense-mediated decay (NMD) pathway were then excluded. Compared with the Ds element, Ds1 yields a larger proportion of non-NMD transcripts encoding protein isoforms.
The contribution of Ds1 exonization to gene divergence lies in incorporating intron sequences with different reading frame patterns to enrich the plant proteome. These simulation results provide a basis for new experimental analyses at the molecular level.
Keywords: Alternative splicing; Ds1 transposon; Exonization; Nonsense-mediated decay pathway
Insertion of transposed elements (TEs) within eukaryotic genes is thought to be an important contributor to evolution and speciation (Sela et al., 2010). A well-known effect of TEs is to disrupt the function of the gene into which they insert, mostly when the insertion is in an exon. However, TEs inserted into intronic sequences may leave the target gene intact but, through alternative splicing (AS) and exonization, alter the regular splicing pattern of a pre-mRNA and result in the translation of new protein isoforms (Feschotte, 2008). With AS, the inserted TE interferes with the normal splicing of a gene’s transcribed region. With exonization, the inserted TE offers cryptic splice sites, so that part of its sequence is incorporated (exonized) as an alternative exon. While the prevailing original splice variant maintains functionality, the additional sequence, free from selection pressure, may evolve a new function or eventually vanish. If the new splice variant is advantageous, selection might operate to optimize the new splice sites and consequently increase the proportion of the alternative splice variant (Schmitz and Brosius, 2011).
Even in the absence of TE insertions, AS is widespread in higher eukaryotes, allowing a single gene transcript to give rise to different mRNAs. More than 60% of human genes and around 20–30% of plant genes undergo AS (Campbell et al., 2006; Kim et al., 2007; Wang and Brendel, 2006). Yet the extent to which AS leads to functional protein isoforms, and to proteome expansion at large, is still disputed. Severing et al. (2009) performed a detailed comparison of AS events in alternatively spliced orthologs from the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice) and concluded that AS has a limited role in the functional expansion of the plant proteome. This conclusion was based on the ability of AS to add or delete functional protein domains. AS events that add or remove only small stretches of amino acids, and therefore modify rather than add or delete protein domains, require further structural and experimental analyses.
Unlike AS, exonization may insert portions of the TE sequence into the transcript of the target gene and alter reading frames, increasing the complexity of proteomes. Recent studies of exonization have mostly addressed mammalian TEs in silico (Levy et al., 2008; Mersch et al., 2007; Mola et al., 2007; Sela et al., 2007). Many of these studies also provided mechanistic insights into exonization, especially into the formation of 5’ and 3’ splice sites (i.e., splice donors and acceptors) in Alu exons (Krull et al., 2007; Lev-Maor et al., 2007; Ram et al., 2008; Sorek et al., 2004). In plants, we previously assessed the ability of a TE to provide splice donor/acceptor sites by inserting a mini Ds transposon into each intron of the epsps gene (Charng et al., 2008; Huang et al., 2012) and found that Ds is biased towards providing splice donor sites from the beginning of the inserted Ds sequence. We also performed a genome-wide survey of Ds exonization to enrich transcriptomes and proteomes in plants (Liu and Charng, 2012) and found that up to 71% of the exonized transcripts were putative targets of the nonsense-mediated mRNA decay (NMD) pathway (Chang et al., 2007). Although the non-NMD exonized transcripts of Ds could be translated into abundant protein isoforms, it is of interest to assess the extent to which proteomes can be enriched by another TE, Ds1, which can provide both donor and acceptor sites (Wessler, 1991).
In this study, we propose a computational approach to assess the role of Ds1 exonization in plants genome-wide. We simulated all possible exonized transcript orthologs from the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice). The resulting exonized transcripts were divided into 5 types by the location of the premature termination codon (PTC) (Figure 1). The protein isoforms encoded by exonized transcripts that bypass the NMD pathway were further classified as C-terminal or interior variants to reveal the possible complexity of the proteome caused by Ds1 exonization. Compared with the Ds element, Ds1 offers more possibilities for proteome enrichment because its splice donors and acceptors can be combined for exonization. Ds1 insertion may therefore contribute to evolution by incorporating the intronic sequences of the host genes into exonized transcript orthologs.
Data sources and exonized transcript construction
Arabidopsis and rice chromosome GenBank data and whole-genome sequences were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html), and the protein-coding sequence (CDS) of each gene was extracted. For rice, every gene has only one CDS record; for Arabidopsis, some genes have multiple CDS records, so only the first CDS record was used to avoid redundancy. Exonization was defined as an event in which a transcript variant is created by insertion of a TE into an intron of a gene. Therefore, we considered only genes that were completely sequenced and had at least 1 intron.
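The gene filtering described above can be sketched in Python. This is a minimal illustration, not the authors' R code: the `CdsRecord` layout and the function names are hypothetical, coordinates are assumed to be 1-based, inclusive, and on the plus strand, and the "completely sequenced" criterion is not checked. The last helper derives the intron lengths used in the notation that follows.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CdsRecord:
    # Hypothetical record layout: a gene ID plus its CDS exon segments as
    # (start, end) coordinates, 1-based, inclusive, plus strand, ordered 5' to 3'.
    gene_id: str
    exons: List[Tuple[int, int]]


def select_genes(records: List[CdsRecord]) -> Dict[str, CdsRecord]:
    """Keep only the first CDS record per gene, then drop intronless genes."""
    first: Dict[str, CdsRecord] = {}
    for rec in records:
        # setdefault stores a record only if the gene has not been seen yet
        first.setdefault(rec.gene_id, rec)
    # k exon segments imply k - 1 introns; at least one intron is required
    return {gid: rec for gid, rec in first.items() if len(rec.exons) >= 2}


def intron_lengths(rec: CdsRecord) -> List[int]:
    """Lengths n_i of the introns lying between consecutive exon segments."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(rec.exons, rec.exons[1:])]
```

For example, a gene whose exons span 1 to 10 and 21 to 30 has a single 10-nt intron, while a second record for the same gene ID is ignored.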
Exonized transcripts were constructed in R (R Development Core Team, 2008) using a three-step procedure applied to every intron of each gene. Let a target gene $G$ have $I$ introns (and hence $I + 1$ exons), with the $i$th intron of length $n_i$.
First, the 512-bp Ds1 sequence was inserted in the forward or reverse orientation after the $j$th nt ($j = 0, \ldots, n_i$) of the $i$th intron of $G$. Here, insertion literally means inserting the letters of Ds1 after the $j$th nucleotide. This is equivalent to a biological insertion of Ds1 8 bp before the assigned position: biologically, Ds1 insertion duplicates the 8 bp of $G$ immediately after the insertion site, so the Ds1 sequence starts at the 9th nt after the insertion position.
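A Python sketch of this first step, under stated assumptions: all names are hypothetical, "reverse orientation" is taken to mean the reverse complement of the Ds1 sequence, and the 8-nt duplication is placed after the element as described in the text (the 8 nt preceding the literal insertion point reappear right after Ds1). It also counts how many insertion events one gene contributes for one orientation.

```python
from typing import List

TSD_LEN = 8  # 8-bp target site duplication described in the text

# Hypothetical helper names throughout; lowercase bases are complemented too.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Ds1 as read on the gene's strand when inserted in reverse (assumption)."""
    return seq.translate(_COMPLEMENT)[::-1]


def insertion_points(n_i: int) -> range:
    """Literal insertion points j = 0, ..., n_i for an intron of length n_i."""
    return range(n_i + 1)


def count_insertion_events(intron_lengths: List[int]) -> int:
    """Number of (intron, j) insertion events for one gene and one orientation."""
    return sum(n + 1 for n in intron_lengths)


def insert_ds1(upstream: str, downstream: str, ds1: str, reverse: bool) -> str:
    """Pre-mRNA after inserting Ds1 at the literal insertion point.

    `upstream` is everything before the insertion point (exons 1..i plus the
    first j nt of intron i); its last TSD_LEN nt are duplicated after Ds1.
    """
    element = reverse_complement(ds1) if reverse else ds1
    tsd = upstream[-TSD_LEN:]  # shorter than 8 nt only if upstream itself is
    return upstream + element + tsd + downstream
```

Passing the whole upstream sequence, rather than the intron alone, means that for small $j$ the duplicated 8 nt reach back into the preceding exon.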
Second, we obtained all exonized sequences by recognizing the appropriate splice donor/acceptor sites. From our previous observations, Ds1 provides 3 donors (at most 35 nt from the donor end) and 2 acceptors (24 nt from the acceptor end, retained in the subsequent transcripts). Using a donor alone (3), an acceptor alone (2), or one donor together with one acceptor (3 × 2 = 6) yields 11 transcript isoforms for each Ds1 insertion.
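The count of 11 isoforms per insertion can be reproduced by enumerating how the Ds1 splice sites combine. The labels are hypothetical, modelled on the D1-D3 and A1-A2 names used for the transcript classes in the Results, and the sketch assumes that every donor can pair with every acceptor.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical labels for the 3 Ds1 donors and 2 Ds1 acceptors.
DONORS = ["D1", "D2", "D3"]
ACCEPTORS = ["A1", "A2"]

Pattern = Tuple[Optional[str], Optional[str]]  # (donor, acceptor); None = unused


def ds1_splice_patterns() -> List[Pattern]:
    """All ways a single Ds1 insertion can contribute splice sites."""
    donor_only = [(d, None) for d in DONORS]                # group D: 3 patterns
    acceptor_only = [(None, a) for a in ACCEPTORS]          # group A: 2 patterns
    donor_and_acceptor = list(product(DONORS, ACCEPTORS))   # group DA: 3 x 2
    return donor_only + acceptor_only + donor_and_acceptor


def label(pattern: Pattern) -> str:
    """Concatenate the site labels, e.g. ("D2", "A1") -> "D2A1"."""
    return "".join(site for site in pattern if site is not None)
```

`itertools.product` emits the combined patterns in the order D1A1, D1A2, D2A1, D2A2, D3A1, D3A2, which matches the order in which those transcripts are listed later.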
Finally, the exonized transcripts were constructed by joining the sequences. When Ds1 provides only a donor, the exonized transcript joins exons 1 to $i$, the first $j$ nt of the $i$th intron, the Ds1 sequence up to the junction site, and exons $i + 1$ to $I + 1$. When Ds1 provides only an acceptor, the exonized transcript joins exons 1 to $i$, the Ds1 sequence up to the junction site, the $i$th intron from nt $j + 1$ to its end, and exons $i + 1$ to $I + 1$. When Ds1 provides both a donor and an acceptor, the exonized transcript joins exons 1 to $i$, the first $j$ nt of the $i$th intron, the Ds1 sequence up to the junction site, the 8-nt upstream sequence, the $i$th intron from nt $j + 1$ to its end, and exons $i + 1$ to $I + 1$. Note that the 8-nt upstream sequence consists of the last 8 nt of the concatenation of exons 1 to $i$ and the first $j$ nt of the $i$th intron.
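The joining rules can be sketched as a single Python function. The function and its inputs `donor_part` and `acceptor_part` are hypothetical: they stand for the Ds1 segment retained up to a donor and the segment retained from an acceptor. For the donor-plus-acceptor case the sketch assumes the two segments are simply concatenated, and the acceptor-only branch follows the text literally, adding no 8-nt duplication.

```python
from typing import List, Optional

TSD_LEN = 8  # length of the duplicated upstream sequence


def build_exonized(exons: List[str], introns: List[str], i: int, j: int,
                   donor_part: Optional[str] = None,
                   acceptor_part: Optional[str] = None) -> str:
    """Join one exonized transcript following the three cases in the text.

    exons: the I + 1 exon sequences; introns: the I intron sequences.
    i: 1-based index of the intron carrying Ds1; j: Ds1 sits after its jth nt.
    donor_part / acceptor_part: hypothetical Ds1 segments kept at each site.
    """
    head = "".join(exons[:i])               # exons 1..i
    tail = "".join(exons[i:])               # exons i+1..I+1
    intron = introns[i - 1]
    first_j, rest = intron[:j], intron[j:]  # nt 1..j and nt j+1..end of intron i
    if donor_part is not None and acceptor_part is None:
        return head + first_j + donor_part + tail
    if acceptor_part is not None and donor_part is None:
        # As written in the text, no 8-nt duplication is listed for this case.
        return head + acceptor_part + rest + tail
    if donor_part is not None and acceptor_part is not None:
        tsd = (head + first_j)[-TSD_LEN:]   # last 8 nt of exons 1..i + first j nt
        return head + first_j + donor_part + acceptor_part + tsd + rest + tail
    raise ValueError("Ds1 must provide a donor, an acceptor, or both")
```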
Analysis of exonized transcript variants and prediction of isoforms
All exonized transcripts were subjected to open reading frame (ORF) analysis, starting at the original start codon and ending at the first in-frame stop codon. The transcripts were designated type I, II, III, or IV if the in-frame stop codon occurred in the conserved region of the original splice junction, the intron into which Ds1 was inserted, Ds1 itself, or any exon downstream of the Ds1 insertion, respectively. If no in-frame stop codon was found during ORF analysis, the transcript was designated type V, and the incomplete transcript without a stop codon was output directly. All transcripts containing a termination codon more than 55 nt upstream of the last exon-exon junction were considered putative targets of the NMD pathway (Chang et al., 2007; Hori and Watanabe, 2007) and were excluded from isoform prediction.
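A minimal Python sketch of the ORF scan, type assignment, and 55-nt NMD rule. It assumes each transcript comes annotated with 0-based, half-open regions whose labels (`conserved`, `intron`, `ds1`, `downstream_exon`) are hypothetical stand-ins for the regions named above. A stop codon is assigned to the region containing its first nucleotide, and the NMD distance is measured from the end of the stop codon to `last_junction`, the 0-based start of the last exon in the transcript; both conventions are assumptions.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical region labels and the transcript type each one implies.
REGION_TO_TYPE = {
    "conserved": "I",         # original sequence at the splice junction
    "intron": "II",           # retained part of the intron carrying Ds1
    "ds1": "III",             # Ds1-derived sequence
    "downstream_exon": "IV",  # any exon after the Ds1 insertion
}


def first_in_frame_stop(seq: str, start: int) -> Optional[int]:
    """0-based start of the first stop codon in frame with `start`, or None."""
    seq = seq.upper()
    for k in range(start, len(seq) - 2, 3):
        if seq[k:k + 3] in STOP_CODONS:
            return k
    return None


def classify_transcript(seq: str, start: int,
                        regions: List[Tuple[int, int, str]]) -> Tuple[str, Optional[int]]:
    """Return (type, stop position); regions are (begin, end, label), half-open."""
    stop = first_in_frame_stop(seq, start)
    if stop is None:
        return "V", None
    for begin, end, region in regions:
        if begin <= stop < end:  # assign by the stop codon's first nucleotide
            return REGION_TO_TYPE[region], stop
    raise ValueError("stop codon falls outside the annotated regions")


def is_nmd_target(stop: Optional[int], last_junction: int, threshold: int = 55) -> bool:
    """True if the stop codon ends more than `threshold` nt before the last junction."""
    if stop is None:
        return False
    return last_junction - (stop + 3) > threshold
```

Type V transcripts return no stop position and are never flagged as NMD targets, consistent with their being output directly.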
The proteins encoded by transcripts not targeted by the NMD pathway were further classified into 2 subtypes: an interior isoform if the termination codon was the same as that of the reference transcript (the transcript without Ds1 insertion), and otherwise a C-terminal isoform. For an interior variant, the number of additional amino acids inserted in the middle was recorded. For a C-terminal variant, its similarity to the corresponding reference protein was defined as the number of identical amino acids shared by the 2 sequences divided by the total length of the reference protein.
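The two isoform subtypes and the statistic recorded for each can be sketched as follows. The function name is hypothetical, and the similarity calculation is an assumption about how identity was counted: residues are compared position by position from the N-terminus, without gaps. For interior variants the sketch assumes a pure in-frame insertion, so the length difference equals the number of added amino acids.

```python
from typing import Tuple, Union


def classify_isoform(isoform: str, reference: str,
                     same_stop_as_reference: bool) -> Tuple[str, Union[int, float]]:
    """Interior vs C-terminal isoform, with the statistic recorded for each."""
    if same_stop_as_reference:
        # Interior variant (assumed pure insertion): count the added residues.
        return "interior", len(isoform) - len(reference)
    # C-terminal variant: identical residues at matching positions (ungapped,
    # N-terminus aligned) divided by the reference length.
    identical = sum(1 for a, b in zip(isoform, reference) if a == b)
    return "C-terminal", identical / len(reference)
```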
More than half of Ds1 exonized transcripts are targeted by the NMD pathway or yield truncated protein isoforms lacking any TE-derived genetic message
A previous study revealed that an exonic Ds1 can provide splice donor as well as acceptor sites for alternative splicing (Wessler, 1991). Accordingly, we performed a genome-wide computational analysis simulating all possible Ds1 exonized transcripts in each intron of rice and Arabidopsis genes, which yielded 422,960,068 and 196,284,528 exonized transcripts, respectively (Additional file 1: Table S1 and Table S2). The resulting transcripts in each genome were classified into 5 types by the location of the PTC (Figure 1).
Ds1 or the downstream flanking intron provides the termination codons of the exonized transcripts
The TE itself may contain PTCs upstream of the donor sites or downstream of the acceptor sites. This gives rise to the third type of exonization event, in which the inserted Ds1 provides the in-frame termination codon of the exonized transcript. Of the 11 transcript isoforms produced by each Ds1 insertion, consider first those in which Ds1 provides a single splice site: among type III transcripts in rice, the donor-only transcripts (3D1-3D3) account for about 2.5% and the acceptor-only transcripts (3A1 and 3A2) for about 13.6%. About 23.5% of all simulated transcripts are type III: 18.0% are NMD targets (Figure 2), and the remaining 5.5%, although they escape NMD, would yield protein isoforms carrying at most 59 bp of Ds1 sequence. As shown in Figure 1a, Ds1 begins with 2 non-adjacent in-frame PTCs. About 45.9% of all type III transcripts terminate at the first PTC of Ds1; these transcripts would yield truncated protein isoforms without any genetic message from Ds1. For the remaining 54.1% of type III transcripts, the PTCs are located in the inserted intron sequences downstream of Ds1.
In type IV Ds1 exonization, PTCs are located in exons downstream of Ds1
The above results, together with our previous studies, indicate that the major potential protein isoforms arising from Ds1 exonization depend on the location of termination codons downstream of the Ds1 transposon. Type IV comprises events in which stop codons that are out of frame in the original transcripts become in frame in the simulated transcripts. A single Ds1 insertion site can yield 11 different patterns of type IV transcripts. To clarify the effect of Ds1 exonization on type IV transcripts, we divided the outcomes into 3 groups: group D (donor alone), group A (acceptor alone), and group DA (one donor and one acceptor). The type IV outcomes were termed 4D1, 4D2, and 4D3 (donor only); 4A1 and 4A2 (acceptor only); and D1A1, D1A2, D2A1, D2A2, D3A1, and D3A2 (donor and acceptor). Among all type IV transcripts, about 51.4% were non-NMD targets and would yield protein isoforms. The translated products of transcripts D1A1, D1A2, D2A1, D2A2, D3A1, and D3A2 would retain genetic messages from the intron into which Ds1 was inserted. Type V exonization events are simulated transcripts that contain no in-frame termination codon before the end of the target gene. In rice, these occur at low frequency, less than 0.2% (Figure 2). Type V exonized transcripts may add to the functional protein isoforms of the Ds1-inserted genes.
Characterization of protein isoforms created by Ds1 exonized transcripts
To assess similarity to the reference proteins, C-terminal variants were further graded by the proportion of reference amino acids retained in the isoform: <25% (Very low), 25%-50% (Low), 50%-75% (Medium), and >75% (High). In general, High and Medium variants accounted for more than 40% and 20% of C-terminal variants, respectively.
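The grading scheme can be sketched as below, with a helper that tallies the share of variants in each grade. The function names are hypothetical, and the text does not say which grade owns the exact boundary values: this sketch assumes half-open bins for Very low and Low, and places exactly 75% in Medium so that High means strictly above 75%.

```python
from collections import Counter
from typing import Dict, Iterable


def grade(similarity: float) -> str:
    """Map the retained fraction of reference residues to the four grades."""
    # Boundary handling (0.25, 0.50, 0.75) is an assumption.
    if similarity < 0.25:
        return "Very low"
    if similarity < 0.50:
        return "Low"
    if similarity <= 0.75:
        return "Medium"
    return "High"


def grade_proportions(similarities: Iterable[float]) -> Dict[str, float]:
    """Fraction of C-terminal variants falling in each grade."""
    counts = Counter(grade(s) for s in similarities)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}
```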
Our previous studies suggested that the main contribution of Ds exonization to gene diversity is to provide, from a single transposition event, different splice donors for different reading frames rather than to provide genetic messages (Huang et al., 2012). On this basis, we performed a genome-wide computational analysis to assess the role of Ds exonization in plants. We simulated a large set of Ds exonized transcript orthologs from the dicot Arabidopsis thaliana and the monocot Oryza sativa (Liu and Charng, 2012). In addition, the protein isoforms were classified as C-terminal or interior variants to reveal the possible complexity of the proteome caused by Ds exonization. However, the extent of this enrichment is limited by the fact that Ds is biased towards providing donors for exonization. BLAST analyses indicate that many other members of the Ac/Ds family, e.g. Ds2 (D1-6) transposons, contain the same splice consensus sequences. In contrast, the Ds1 transposon shows a different pattern: it has been reported to offer 3 splice donors and 2 acceptors when located in an exon. Therefore, the Ac/Ds family is expected to enrich the plant proteome in two ways, one through Ds members and the other through Ds1 members. This prompted us to examine the effects of Ds1-mediated exonization events, i.e. either combined use of one donor and one acceptor or single use of one donor or one acceptor. In this report, genome-wide exonized transcript orthologs with Ds1 insertion from rice and Arabidopsis were simulated to study their impact on proteome complexity in plants. Unlike Ds, whose forward and reverse orientations both contain splice sites (Huang et al., 2012), all splice sites of Ds1 were observed in the reverse orientation. Therefore, all simulated transcripts were created by assuming that Ds1 inserts into introns in the reverse orientation.
Following our previous report, we assigned an equal probability of Ds1 exonization to each position to simulate all possible exonized transcripts and assess the extent to which Ds1 exonization leads to proteome expansion. This yielded 422,960,068 transcripts from rice and 196,284,528 from Arabidopsis for further analysis. Previously, we studied the exonization effect of a single Ds insertion site, where Ds provides 5 splice donor sites (1 in the forward and 4 in the reverse orientation) and may therefore yield 5 exonized transcripts. In contrast, Ds1 provides 3 donors and 2 acceptors, resulting in 11 additional transcript isoforms from a single insertion event. Therefore, Ds1 exonization in plants may yield more than twice as many transcript isoforms as Ds exonization. In addition, the sequence difference between Ds1 and Ds at position 20 eliminates another PTC (Figure 1), which should greatly decrease the proportion of exonized transcripts targeted by the NMD pathway. Thus, Ds1 exonization has a greater potential effect than Ds exonization. To examine whether parts of the Ds1 sequence can be found in plant full-length cDNA data, we performed a BLAST search of Ds1 sequences with megablast (BLASTN 2.2.27+) against the NCBI nr nucleotide database restricted to vascular plants (txid58023). The search returned 63 matching transcripts resulting from exonization events that used a donor or an acceptor alone or both (Additional file 1: Table S3).
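The "more than 2-fold" comparison follows from the per-insertion isoform counts given above. Written out, with $N$ denoting the number of exonized isoforms from one insertion site (notation introduced here for illustration only):

$$
\begin{aligned}
N_{\mathrm{Ds1}} &= \underbrace{3}_{\text{donor only}} + \underbrace{2}_{\text{acceptor only}} + \underbrace{3 \times 2}_{\text{donor and acceptor}} = 11,\\
N_{\mathrm{Ds}} &= \underbrace{1}_{\text{forward}} + \underbrace{4}_{\text{reverse}} = 5,\\
\frac{N_{\mathrm{Ds1}}}{N_{\mathrm{Ds}}} &= \frac{11}{5} = 2.2 .
\end{aligned}
$$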
In conclusion, the main contribution of Ds1 and Ds exonization to gene divergence lies not in providing genetic messages but in incorporating intron sequences with different reading frame patterns to enrich the plant proteome. These simulation results provide a basis for new experimental analyses at the molecular level.
PTC: Premature termination codon
This project was supported by the National Science Council of Taiwan.
- Baek J-M, Han P, Iandolino A, Cook D: Characterization and comparison of intron structure and alternative splicing between Medicago truncatula, Populus trichocarpa, Arabidopsis and rice. Plant Mol Biol 2008, 67: 499–510. 10.1007/s11103-008-9334-4
- Campbell M, Haas B, Hamilton J, Mount S, Buell CR: Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics 2006, 7: 327. 10.1186/1471-2164-7-327
- Chang Y-F, Imam JS, Wilkinson MF: The Nonsense-Mediated Decay RNA Surveillance Pathway. Annu Rev Biochem 2007, 76: 51–74. 10.1146/annurev.biochem.76.050106.093909
- Charng Y-C, Li K-T, Tai H-K, Lin N-S, Tu J: An inducible transposon system to terminate the function of a selectable marker in transgenic plants. Mol Breeding 2008, 21: 359–368. 10.1007/s11032-007-9137-3
- R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2008.
- Feschotte C: Transposable elements and the evolution of regulatory networks. Nat Rev Genet 2008, 9: 397–405. 10.1038/nrg2337
- Hori K, Watanabe Y: Context Analysis of Termination Codons in mRNA that are Recognized by Plant NMD. Plant Cell Physiol 2007, 48: 1072–1078. 10.1093/pcp/pcm075
- Huang K-C, Yang H-C, Li K-T, Liu L-Y, Charng Y-C: Ds transposon is biased towards providing splice donor sites for exonization in transgenic tobacco. Plant Mol Biol 2012, 79: 509–519. 10.1007/s11103-012-9927-9
- Kim E, Magen A, Ast G: Different levels of alternative splicing among eukaryotes. Nucleic Acids Res 2007, 35: 125–131. 10.1093/nar/gkm529
- Krull M, Petrusma M, Makalowski W, Brosius J, Schmitz J: Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs). Genome Res 2007, 17: 1139–1145. 10.1101/gr.6320607
- Lev-Maor G, Sorek R, Levanon E, Paz N, Eisenberg E, Ast G: RNA-editing-mediated exon evolution. Genome Biol 2007, 8: R29. 10.1186/gb-2007-8-2-r29
- Levy A, Sela N, Ast G: TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates. Nucleic Acids Res 2008, 36: D47-D52.
- Liu L-YD, Charng Y-C: Genome-Wide Survey of Ds Exonization to Enrich Transcriptomes and Proteomes in Plants. Evol Bioinform 2012, 8: 575.
- Liu J-W, Chandra D, Tang S-H, Chopra D, Tang DG: Identification and Characterization of Bimgamma, a Novel Proapoptotic BH3-only Splice Variant of Bim. Cancer Res 2002, 62: 2976–2981.
- Mersch B, Sela N, Ast G, Suhai S, Hotz-Wagenblatt A: SERpredict: Detection of tissue- or tumor-specific isoforms generated through exonization of transposable elements. BMC Genet 2007, 8: 78.
- Mola G, Vela E, Fernández-Figueras MT, Isamat M, Muñoz-Mármol AM: Exonization of Alu-generated Splice Variants in the Survivin Gene of Human and Non-human Primates. J Mol Biol 2007, 366: 1055–1063. 10.1016/j.jmb.2006.11.089
- Ram O, Schwartz S, Ast G: Multifactorial Interplay Controls the Splicing Profile of Alu-Derived Exons. Mol Cell Biol 2008, 28: 3513–3525. 10.1128/MCB.02279-07
- Schmitz JR, Brosius JR: Exonization of transposed elements: A challenge and opportunity for evolution. Biochimie 2011, 93: 1928–1934. 10.1016/j.biochi.2011.07.014
- Schuler MA: Splice Site Requirements and Switches in Plants. In Nuclear pre-mRNA Processing in Plants. Edited by: Reddy ASN, Golovkin M. Berlin Heidelberg: Springer; 2008:39–59.
- Sela N, Mersch B, Gal-Mark N, Lev-Maor G, Hotz-Wagenblatt A, Ast G: Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu’s unique role in shaping the human transcriptome. Genome Biol 2007, 8: R127. 10.1186/gb-2007-8-6-r127
- Sela N, Kim E, Ast G: The role of transposable elements in the evolution of non-mammalian vertebrates and invertebrates. Genome Biol 2010, 11: R59. 10.1186/gb-2010-11-6-r59
- Severing E, van Dijk A, Stiekema W, van Ham R: Comparative analysis indicates that alternative splicing in plants has a limited role in functional expansion of the proteome. BMC Genomics 2009, 10: 154. 10.1186/1471-2164-10-154
- Sorek R, Lev-Maor G, Reznik M, Dagan T, Belinky F, Graur D, Ast G: Minimal Conditions for Exonization of Intronic Sequences: 5’ Splice Site Formation in Alu Exons. Mol Cell 2004, 14: 221–231. 10.1016/S1097-2765(04)00181-9
- Wang B-B, Brendel V: Genomewide Comparative Analysis of Alternative Splicing in Plants. Proc Natl Acad Sci U S A 2006, 103: 7175–7180. 10.1073/pnas.0602039103
- Wessler SR: The maize transposable Ds1 element is alternatively spliced from exon sequences. Mol Cell Biol 1991, 11: 6192–6196.
- Wu M, Li L, Sun Z: Transposable element fragments in protein-coding regions and their contributions to human functional proteins. Gene 2007, 401: 165–171. 10.1016/j.gene.2007.07.012
- Yi P, Zhang W, Zhai Z, Miao L, Wang Y, Wu M: Bcl-rambo beta, a special splicing variant with an insertion of an Alu-like cassette, promotes etoposide- and Taxol-induced cell death. FEBS Lett 2003, 534: 61–68. 10.1016/S0014-5793(02)03778-X
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.