ResearchPad - exon-mapping https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Differences in splicing defects between the grey and white matter in myotonic dystrophy type 1 patients]]> https://www.researchpad.co/article/elastic_article_14627

Myotonic dystrophy type 1 (DM1) is a multi-system disorder caused by expanded CTG repeats in the myotonic dystrophy protein kinase (DMPK) gene. This leads to the sequestration of splicing factors such as muscleblind-like 1/2 (MBNL1/2) and aberrant splicing in the central nervous system. We investigated the splicing patterns of MBNL1/2 and genes controlled by MBNL2 in several regions of the brain and between the grey matter (GM) and white matter (WM) in DM1 patients using RT-PCR. Compared with amyotrophic lateral sclerosis (ALS, as disease controls), the percent spliced-in (PSI) values for most of the examined exons were significantly altered in most of the brain regions of DM1 patients, except for the cerebellum. The splicing of many genes was differently regulated between the GM and WM in both DM1 and ALS. In 7 out of the 15 examined splicing events, the level of PSI change between DM1 and ALS was significantly higher in the GM than in the WM. The differences in alternative splicing between the GM and WM may be related to the effect of DM1 on the WM of the brain.
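
The PSI comparison described above can be illustrated with a minimal sketch. It assumes the common definition of PSI as the inclusion signal divided by the sum of inclusion and exclusion signals (for RT-PCR, typically band intensities); the function names and the example numbers are hypothetical and are not taken from the study.

```python
def percent_spliced_in(inclusion: float, exclusion: float) -> float:
    """Return PSI (0-100) from inclusion and exclusion signal.

    Assumes PSI = inclusion / (inclusion + exclusion) * 100, where the
    signals could be RT-PCR band intensities or junction read counts.
    """
    total = inclusion + exclusion
    if total <= 0:
        raise ValueError("no signal: PSI is undefined")
    return 100.0 * inclusion / total


def delta_psi(psi_case: float, psi_control: float) -> float:
    """Difference in PSI between a case sample and a control sample."""
    return psi_case - psi_control


# Hypothetical band intensities for one exon in two samples.
psi_dm1 = percent_spliced_in(inclusion=30.0, exclusion=70.0)
psi_als = percent_spliced_in(inclusion=80.0, exclusion=20.0)
print(psi_dm1, psi_als, delta_psi(psi_dm1, psi_als))
```

A negative difference in this sketch would indicate lower exon inclusion in the case sample than in the control; whether a difference is significant requires a statistical test across replicates, which is not shown here.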

]]>
<![CDATA[Variants encoding a restricted carboxy-terminal domain of SLC12A2 cause hereditary hearing loss in humans]]> https://www.researchpad.co/article/Nd1837fa5-7737-42fc-aa07-ce2092d99c03

Hereditary hearing loss is challenging to diagnose because of the heterogeneity of the causative genes. Further, some genes involved in hereditary hearing loss have yet to be identified. Using whole-exome analysis of three families with congenital, severe-to-profound hearing loss, we identified a missense variant of SLC12A2 in five affected members of one family showing a dominant inheritance mode, along with de novo splice-site and missense variants of SLC12A2 in two sporadic cases, as promising candidates associated with hearing loss. Furthermore, we detected another de novo missense variant of SLC12A2 in a sporadic case. SLC12A2 encodes Na+, K+, 2Cl− cotransporter (NKCC) 1 and plays critical roles in the homeostasis of K+-enriched endolymph. Slc12a2-deficient mice have congenital, profound deafness; however, no human variant of SLC12A2 has been reported as associated with hearing loss. All identified SLC12A2 variants mapped to exon 21 or its 3’-splice site. In vitro analysis indicated that the splice-site variant generates an exon 21-skipped SLC12A2 mRNA transcript expressed at much lower levels than the exon 21-included transcript in the cochlea, suggesting a tissue-specific role for the exon 21-encoded region in the carboxy-terminal domain. In vitro functional analysis demonstrated that Cl− influx was significantly decreased in all SLC12A2 variants studied. Immunohistochemistry revealed that SLC12A2 is located on the plasma membrane of several types of cells in the cochlea, including the strial marginal cells, which are critical for endolymph homeostasis. Overall, this study suggests that variants affecting exon 21 of the SLC12A2 transcript are responsible for hereditary hearing loss in humans.

]]>
<![CDATA[Modeling the structural implications of an alternatively spliced Exoc3l2, a paralog of the tunneling nanotube-forming M-Sec]]> https://www.researchpad.co/article/5b8687db40307c73f6bbfec3

The exocyst is a molecular tether that retains secretory vesicles at the plasma membrane prior to SNARE-mediated docking and fusion. However, individual exocyst complex components (EXOCs) may also function independently of exocyst assembly. Alternative splice variants of EXOC mRNA and paralogs of EXOC genes have been described and several have been attributed functions that may be independent of the exocyst complex. Here we describe a novel splice variant of murine Exoc3l2, which we term Exoc3l2a. We discuss possible functional implications of the resulting domain excision from this isoform of EXOC3L2 based on structural similarities with its paralog M-Sec (EXOC3L3), which is implicated in tunneling nanotube formation. The identification of this Exoc3l2 splice variant expands the potential for subunit diversity within the exocyst and for alternative functionality of this component independently of the exocyst.

]]>
<![CDATA[The host ubiquitin-dependent segregase VCP/p97 is required for the onset of human cytomegalovirus replication]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be0162

The human cytomegalovirus major immediate early proteins IE1 and IE2 are critical drivers of virus replication and are considered pivotal in determining the balance between productive and latent infection. IE1 and IE2 are derived from the same primary transcript by alternative splicing, and regulation of their expression likely involves a complex interplay between cellular and viral factors. Here we show that knockdown of the host ubiquitin-dependent segregase VCP/p97 results in loss of IE2 expression, subsequent suppression of early and late gene expression and, ultimately, failure in virus replication. RNAseq analysis showed increased levels of IE1 splicing, with a corresponding decrease in IE2 splicing following VCP knockdown. Global analysis of viral transcription showed that the expression of a subset of viral genes, including UL112/113, is not reduced despite the loss of IE2 expression. Furthermore, immunofluorescence studies demonstrated that VCP strongly colocalised with the viral replication compartments in the nucleus. Finally, we show that NMS-873, a small molecule inhibitor of VCP, is a potent HCMV antiviral with potential as a novel host targeting therapeutic for HCMV infection.

]]>
<![CDATA[Computational Identification of Tissue-Specific Splicing Regulatory Elements in Human Genes from RNA-Seq Data]]> https://www.researchpad.co/article/5989d9feab0ee8fa60b73345

Alternative splicing is a vital process for regulating gene expression and promoting proteomic diversity. It plays a key role in genes with tissue-specific expression. This specificity is mainly regulated by splicing factors that bind to specific sequences called splicing regulatory elements (SREs). Here, we report a genome-wide analysis to study alternative splicing on multiple tissues, including brain, heart, liver, and muscle. We propose a pipeline to identify differential exons across tissues and hence tissue-specific SREs. In our pipeline, we utilize the DEXSeq package along with our previously reported algorithms. Utilizing the publicly available RNA-Seq data set from the Human BodyMap project, we identified 28,100 differentially used exons across the four tissues. We identified tissue-specific exonic splicing enhancers that overlap with various previously published experimental and computational databases. A complicated exonic enhancer regulatory network was revealed, where multiple exonic enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the exonic enhancers are found to be co-occurring with multiple exonic silencers and vice versa, which demonstrates a complicated relationship between tissue-specific exonic enhancers and silencers.

]]>
<![CDATA[Functional classification of DNA variants by hybrid minigenes: Identification of 30 spliceogenic variants of BRCA2 exons 17 and 18]]> https://www.researchpad.co/article/5989db53ab0ee8fa60bdceb1

Mutation screening of the breast cancer genes BRCA1 and BRCA2 identifies a large fraction of variants of uncertain clinical significance (VUS) whose functional and clinical interpretations pose a challenge for genomic medicine. Likewise, an increasing amount of evidence indicates that genetic variants can have deleterious effects on pre-mRNA splicing. Our goal was to investigate the impact on splicing of a set of reported variants of BRCA2 exons 17 and 18 to assess their role in hereditary breast cancer and to identify critical regulatory elements that may constitute hotspots for spliceogenic variants. A splicing reporter minigene with BRCA2 exons 14 to 20 (MGBR2_ex14-20) was constructed in the pSAD vector. Fifty-two candidate variants were selected with splicing prediction programs, introduced in MGBR2_ex14-20 by site-directed mutagenesis and assayed in triplicate in MCF-7 cells. Wild type MGBR2_ex14-20 produced a stable transcript of the expected size (1,806 nucleotides) and structure (V1-[BRCA2_exons_14–20]–V2). Functional mapping by microdeletions revealed essential sequences for exon recognition on the 3’ end of exon 17 (c.7944-7973) and the 5’ end of exon 18 (c.7979-7988, c.7999-8013). Thirty out of the 52 selected variants induced anomalous splicing in minigene assays with >16 different aberrant transcripts, where exon skipping was the most common event. A wide range of splicing motifs were affected including the canonical splice sites (15 variants), novel alternative sites (3 variants), the polypyrimidine tract (3 variants) and enhancers/silencers (9 variants). According to the guidelines of the American College of Medical Genetics and Genomics (ACMG), 20 variants could be classified as pathogenic (c.7806-2A>G, c.7806-1G>A, c.7806-1G>T, c.7806-1_7806-2dup, c.7976+1G>A, c.7977-3_7978del, c.7977-2A>T, c.7977-1G>T, c.7977-1G>C, c.8009C>A, c.8331+1G>T and c.8331+2T>C) or likely pathogenic (c.7806-9T>G, c.7976G>C, c.7976G>A, c.7977-7C>G, c.7985C>G, c.8023A>G, c.8035G>T and c.8331G>A), accounting for 30.8% of all pathogenic/likely pathogenic variants of exons 17–18 at the BRCA Share database. The remaining 8 variants (c.7975A>G, c.7977-6T>G, c.7988A>T, c.7992T>A, c.8007A>G, c.8009C>T, c.8009C>G, and c.8072C>T) induced partial splicing anomalies with important ratios of the full-length transcript (≥70%), so that they remained classified as VUS. Aberrant splicing is therefore especially prevalent in BRCA2 exons 17 and 18 due to the presence of active ESEs involved in exon recognition. Splicing functional assays with minigenes are a valuable strategy for the initial characterization of the splicing outcomes and the subsequent clinical interpretation of variants of any disease gene, although these results should be checked, whenever possible, against patient RNA.

]]>
<![CDATA[The identification of switch-like alternative splicing exons among multiple samples with RNA-Seq data]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be006d

Alternative splicing is a ubiquitous phenomenon in most human genes and has important functions. A switch-like exon is an exon that has a high level of usage in some tissues but a low level of usage in the other tissues. Such exons usually undergo strong tissue-specific regulation. There is still a lack of a systematic method to identify switch-like exons from multiple RNA-seq samples. We proposed a novel method called iterative Tertile Absolute Deviation around the mode (iTAD) to profile the distribution of exon relative usages among multiple samples and to identify switch-like exons and other types of exons using a robust statistic estimator. We validated the method with simulation data, and applied it to RNA-seq data of 16 human body tissues and detected 3,100 switch-like exons. We found that switch-like exons tend to be more associated with Alu elements in their flanking intron regions than other types of exons.

]]>
<![CDATA[Altered Expression of Human Smooth Muscle Myosin Phosphatase Targeting (MYPT) Isovariants with Pregnancy and Labor]]> https://www.researchpad.co/article/5989da97ab0ee8fa60ba2594

Background

Myosin light-chain phosphatase is a trimeric protein that hydrolyses phosphorylated myosin II light chains (MYLII) to cause relaxation in smooth muscle cells including those of the uterus. A major component of the phosphatase is the myosin targeting subunit (MYPT), which directs a catalytic subunit to dephosphorylate MYLII. There are 5 main MYPT family members: MYPT1 (PPP1R12A), MYPT2 (PPP1R12B), MYPT3 (PPP1R16A), myosin binding subunit 85 (MBS85; PPP1R12C) and TGF-beta-inhibited membrane-associated protein (TIMAP; PPP1R16B). Nitric oxide (NO)-mediated smooth muscle relaxation has in part been attributed to activation of the phosphatase by PKG binding to a leucine zipper (LZ) dimerization domain located at the carboxyl-terminus of PPP1R12A. In animal studies, alternative splicing of PPP1R12A can lead to the inclusion of a 31-nucleotide exonic segment that generates an LZ-negative (LZ-) isovariant, rendering the phosphatase less sensitive to NO vasodilators, and alterations in PPP1R12ALZ- and LZ+ expression have been linked to phenotypic changes in smooth muscle function. Moreover, PPP1R12B and PPP1R12C, but not PPP1R16A or PPP1R16B, have the potential for LZ+/LZ- alternative splicing. Yet, by comparison to animal studies, the information on human MYPT genomic sequences/mRNA expressions is scant. As uterine smooth muscle undergoes substantial remodeling during pregnancy, we were interested in establishing the patterns of expression of human MYPT isovariants during this process and also following labor onset, as this could have important implications for determining successful pregnancy outcome.

Objectives

We used cross-species genome alignment to infer putative human sequences not available in the public domain, and isovariant-specific quantitative PCR to analyse the expression of mRNA encoding putative LZ+ and LZ- forms of PPP1R12A, PPP1R12B and PPP1R12C as well as canonical PPP1R16A and PPP1R16B genes in human uterine smooth muscle from non-pregnant, pregnant and in-labor donors.

Results

We found a reduction in the expression of PPP1R12A, PPP1R12BLZ+, PPP1R16A and PPP1R16B mRNA in late pregnancy (not-in-labor) relative to non-pregnancy. PPP1R12ALZ+ and PPP1R12ALZ- mRNA levels were similar in the non-pregnant and pregnant not-in-labor groups. There was a further reduction in the uterine expression of PPP1R12ALZ+, PPP1R12CLZ+ and PPP1R12ALZ- mRNA with labor relative to the pregnant not-in-labor group. PPP1R12A, PPP1R12BLZ+, PPP1R16A and PPP1R16B mRNA levels were invariant between the not-in-labor and in-labor groups.

Conclusions

MYPT proteins are crucial determinants of smooth muscle function. Therefore, these alterations in human uterine smooth muscle MYPT isovariant expression during pregnancy and labor may be part of the important molecular physiological transition between uterine quiescence and activation.

]]>
<![CDATA[Deep RNA sequencing reveals the smallest known mitochondrial micro exon in animals: The placozoan cox1 single base pair exon]]> https://www.researchpad.co/article/5989db5cab0ee8fa60bdfea1

The phylum Placozoa holds a key position for our understanding of the evolution of mitochondrial genomes in Metazoa. Placozoans possess large mitochondrial genomes which harbor several remarkable characteristics such as a fragmented cox1 gene and trans-splicing cox1 introns. A previous study also suggested the existence of cox1 mRNA editing in Trichoplax adhaerens, so far the only formally described species in the phylum Placozoa. We have analyzed RNA-seq data of the undescribed sister species, Placozoa sp. H2 (“Panama” clone), with special focus on the mitochondrial mRNA. While we did not find support for a previously postulated cox1 mRNA editing mechanism, we surprisingly found two independent transcripts representing intermediate cox1 mRNA splicing stages. Both transcripts consist of partial cox1 exon as well as overlapping intron fragments. The data suggest that the cox1 gene harbors a single base pair (cytosine) micro exon. Furthermore, conserved group I intron structures also flank this unique micro exon in other placozoans. We discuss the evolutionary origin of this micro exon in the context of a self-splicing intron gain in the cox1 gene of the last common ancestor of extant placozoans.

]]>
<![CDATA[MAP3K19 Is a Novel Regulator of TGF-β Signaling That Impacts Bleomycin-Induced Lung Injury and Pulmonary Fibrosis]]> https://www.researchpad.co/article/5989da3fab0ee8fa60b894ee

Idiopathic pulmonary fibrosis (IPF) is a progressive, debilitating disease for which two medications, pirfenidone and nintedanib, have only recently been approved for treatment. The cytokine TGF-β has been shown to be a central mediator in the disease process. We investigated the role of a novel kinase, MAP3K19, upregulated in IPF tissue, in TGF-β-induced signal transduction and in bleomycin-induced pulmonary fibrosis. MAP3K19 has a very limited tissue expression, restricted primarily to the lungs and trachea. In pulmonary tissue, expression was predominantly localized to alveolar and interstitial macrophages, bronchial epithelial cells and type II pneumocytes of the epithelium. MAP3K19 was also found to be overexpressed in bronchoalveolar lavage macrophages from IPF patients compared to normal patients. Treatment of A549 or THP-1 cells with either MAP3K19 siRNA or a highly potent and specific inhibitor reduced phospho-Smad2 & 3 nuclear translocation following TGF-β stimulation. TGF-β-induced gene transcription was also strongly inhibited by both the MAP3K19 inhibitor and nintedanib, whereas pirfenidone had a much less pronounced effect. In combination, the MAP3K19 inhibitor appeared to act synergistically with either pirfenidone or nintedanib, at the level of target gene transcription or protein production. Finally, in an animal model of IPF, inhibition of MAP3K19 strongly attenuated bleomycin-induced pulmonary fibrosis when administered either prophylactically or therapeutically. In summary, these results strongly suggest that inhibition of MAP3K19 may have a beneficial therapeutic effect in the treatment of IPF and represents a novel strategy to target this disease.

]]>
<![CDATA[Association of IFIH1 and pro-inflammatory mediators: Potential new clues in SLE-associated pathogenesis]]> https://www.researchpad.co/article/5989db4fab0ee8fa60bdbc8a

Antiviral defenses are inappropriately activated in systemic lupus erythematosus (SLE) and association between SLE and the antiviral helicase gene, IFIH1, is well established. We sought to extend the previously reported association of pathogenic soluble mediators and autoantibodies with mouse Mda5 to its human ortholog, IFIH1. To better understand the role this gene plays in human lupus, we assessed association of IFIH1 variants with soluble mediators and autoantibodies in 357 European-American SLE patients, first-degree relatives (FDRs), and unrelated, unaffected healthy controls. Associations between each of 135 genotyped SNPs in IFIH1 and four lupus-associated plasma mediators, IL-6, TNF-α, IFN-β, and IP-10, were investigated via linear regression. No significant associations were found with SNPs orthologous to those identified in exon 13 of the mouse. However, outside of this region there were significant associations between IL-6 and rs76162067 (p = 0.008), as well as IP-10 and rs79711023 (p = 0.003), located in a region of IFIH1 previously shown to directly influence MDA-5 mediated IP-10 and IL-6 secretion. SLE patients and FDRs carrying the minor allele for rs79711023 demonstrated lower levels of IP-10, while only FDRs carrying the minor allele for rs76162067 demonstrated an increased level of IL-6. This would suggest that the change in IP-10 is genotypically driven, while the change in IL-6 may be reflective of SLE transition status. These data suggest that IFIH1 may contribute to SLE pathogenesis via altered inflammatory mechanisms.

]]>
<![CDATA[FCGR2C Polymorphisms Associated with HIV-1 Vaccine Protection Are Linked to Altered Gene Expression of Fc-γ Receptors in Human B Cells]]> https://www.researchpad.co/article/5989dae7ab0ee8fa60bbdfe5

The phase III Thai RV144 vaccine trial showed an estimated vaccine efficacy (VE) to prevent HIV-1 infection of 31.2%, which has motivated the search for immune correlates of vaccine protection. In a recent report, several single nucleotide polymorphisms (SNPs) in FCGR2C were found to be associated with the level of VE in the RV144 trial. To investigate the functional significance of these SNPs, we utilized a large-scale B cell RNA sequencing database of 462 individuals from the 1000 Genomes Project to examine associations between FCGR2C SNPs and gene expression. We found that the FCGR2C SNPs that were associated with vaccine efficacy in RV144 were also strongly associated with the expression of FCGR2A/C, and one of them was also associated with the expression of Fc receptor-like A (FCRLA), another Fc-γ receptor (FcγR) gene that was not examined in the previous report. These results suggest that the expression of FcγR genes is influenced by these SNPs either directly or in linkage with other causal variants. More importantly, these results motivate further investigations into the potential for a causal association of expression and alternative splicing of FCGR2C and other FcγR genes with the HIV-1 vaccine protection in the RV144 trial and other similar studies.

]]>
<![CDATA[Mutation Frequency of the Major Frontotemporal Dementia Genes, MAPT, GRN and C9ORF72 in a Turkish Cohort of Dementia Patients]]> https://www.researchpad.co/article/5989daa5ab0ee8fa60ba73bf

‘Microtubule-associated protein tau’ (MAPT), ‘granulin’ (GRN) and ‘chromosome 9 open reading frame 72’ (C9ORF72) gene mutations are the major known genetic causes of frontotemporal dementia (FTD). Recent studies suggest that mutations in these genes may also be associated with other forms of dementia. Therefore, we investigated whether MAPT, GRN and C9ORF72 gene mutations are major contributors to dementia in a random, unselected Turkish cohort of dementia patients. A combination of whole-exome sequencing, Sanger sequencing and fragment analysis/Southern blot was performed in order to identify pathogenic mutations and novel variants in these genes as well as other FTD-related genes such as the ‘charged multivesicular body protein 2B’ (CHMP2B), the ‘FUS RNA binding protein’ (FUS), the ‘TAR DNA binding protein’ (TARDBP), the ‘sequestosome 1’ (SQSTM1), and the ‘valosin containing protein’ (VCP). We identified one pathogenic MAPT mutation (c.1906C>T, p.P636L) and one novel missense variant (c.38A>G, p.D13G). In GRN we identified a probably pathogenic TGAG deletion in the splice donor site of exon 6. Three patients were found to carry the GGGGCC expansions in the non-coding region of the C9ORF72 gene. In summary, a complete screening for mutations in MAPT, GRN and C9ORF72 genes revealed a frequency of 5.4% of pathogenic mutations in a random cohort of 93 Turkish index patients with dementia.
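
The reported 5.4% frequency can be checked with a short calculation. This sketch assumes, as an inference from the abstract rather than a stated fact, that the pathogenic carriers are the one MAPT p.P636L patient, the one GRN deletion patient and the three C9ORF72 expansion carriers, and that the novel MAPT missense variant is not counted; the function name is hypothetical.

```python
def mutation_frequency_percent(carriers: int, cohort_size: int) -> float:
    """Percentage of index patients carrying a pathogenic mutation."""
    if cohort_size <= 0:
        raise ValueError("cohort size must be positive")
    return round(100.0 * carriers / cohort_size, 1)


# Assumed carrier counts per gene, inferred from the abstract above.
carriers_by_gene = {"MAPT": 1, "GRN": 1, "C9ORF72": 3}
total_carriers = sum(carriers_by_gene.values())

print(total_carriers)                                  # 5
print(mutation_frequency_percent(total_carriers, 93))  # 5.4
```

Under that assumption, 5 of 93 index patients gives about 5.4%, which matches the figure in the summary sentence.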

]]>
<![CDATA[Transcriptome Analysis of Targeted Mouse Mutations Reveals the Topography of Local Changes in Gene Expression]]> https://www.researchpad.co/article/5989da64ab0ee8fa60b917db

The unintended consequences of gene targeting in mouse models have not been thoroughly studied and a more systematic analysis is needed to understand the frequency and characteristics of off-target effects. Using RNA-seq, we evaluated targeted and neighboring gene expression in tissues from 44 homozygous mutants compared with C57BL/6N control mice. Two allele types were evaluated: 15 targeted trap mutations (TRAP); and 29 deletion alleles (DEL), usually a deletion between the translational start and the 3’ UTR. Both targeting strategies insert a bacterial beta-galactosidase reporter (LacZ) and a neomycin resistance selection cassette. Evaluating transcription of genes in +/- 500 kb of flanking DNA around the targeted gene, we found up-regulated genes more frequently around DEL compared with TRAP alleles; however, the frequency of alleles with local down-regulated genes flanking DEL and TRAP targets was similar. Down-regulated genes around both DEL and TRAP targets were found at a higher frequency than expected from a genome-wide survey. However, only around DEL targets were up-regulated genes found with a significantly higher frequency compared with genome-wide sampling. Transcriptome analysis confirms targeting in 97% of DEL alleles, but in only 47% of TRAP alleles, probably due to non-functional splice variants and some splicing around the gene trap. Local effects on gene expression are likely due to a number of factors including compensatory regulation, loss or disruption of intragenic regulatory elements, the exogenous promoter in the neo selection cassette, removal of insulating DNA in the DEL mutants, and local silencing due to disruption of normal chromatin organization or presence of exogenous DNA. An understanding of local position effects is important for understanding and interpreting any phenotype attributed to targeted gene mutations, or to spontaneous indels.

]]>
<![CDATA[Prediction of Poly(A) Sites by Poly(A) Read Mapping]]> https://www.researchpad.co/article/5989db4fab0ee8fa60bdb887

RNA-seq reads containing part of the poly(A) tail of transcripts (denoted as poly(A) reads) provide the most direct evidence for the position of poly(A) sites in the genome. However, due to reduced coverage of poly(A) tails by reads, poly(A) reads are not routinely identified during RNA-seq mapping. Nevertheless, recent studies for several herpesviruses successfully employed mapping of poly(A) reads to identify herpesvirus poly(A) sites using different strategies and customized programs. To more easily allow such analyses without requiring additional programs, we integrated poly(A) read mapping and prediction of poly(A) sites into our RNA-seq mapping program ContextMap 2. The implemented approach essentially generalizes previously used poly(A) read mapping approaches and combines them with the context-based approach of ContextMap 2 to take into account information provided by other reads aligned to the same location. Poly(A) read mapping using ContextMap 2 was evaluated on real-life data from the ENCODE project and compared against a competing approach based on transcriptome assembly (KLEAT). This showed high positive predictive value for our approach, evidenced also by the presence of poly(A) signals, and considerably lower runtime than KLEAT. Although sensitivity is low for both methods, we show that this is in part due to a high extent of spurious results in the gold standard set derived from RNA-PET data. Sensitivity improves for poly(A) sites of known transcripts or determined with a more specific poly(A) sequencing protocol and increases with read coverage on transcript ends. Finally, we illustrate the usefulness of the approach in a high read coverage scenario by a re-analysis of published data for herpes simplex virus 1. Thus, with current trends towards increasing sequencing depth and read length, poly(A) read mapping will prove to be increasingly useful and can now be performed automatically during RNA-seq mapping with ContextMap 2.
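
The notion of a poly(A) read described above can be illustrated with a minimal sketch. This is not the ContextMap 2 algorithm; it only shows the basic idea of splitting a read into a transcript-derived body and a trailing run of adenines. The function name and the thresholds `min_tail` and `min_body` are hypothetical choices made for illustration.

```python
from typing import Optional, Tuple


def split_poly_a_read(
    read: str, min_tail: int = 10, min_body: int = 20
) -> Optional[Tuple[str, int]]:
    """Split a read into (body, tail_length) if it ends in a poly(A) run.

    Returns None when the trailing run of 'A' is shorter than min_tail
    or when too little non-tail sequence remains to be aligned.
    Thresholds are illustrative assumptions, not published defaults.
    """
    seq = read.upper()
    body = seq.rstrip("A")          # drop the trailing run of adenines
    tail_len = len(seq) - len(body)
    if tail_len >= min_tail and len(body) >= min_body:
        return body, tail_len
    return None


# Hypothetical read: 24 nt of transcript sequence followed by 12 adenines.
candidate = "ACGT" * 6 + "A" * 12
print(split_poly_a_read(candidate))
print(split_poly_a_read("ACGT" * 6))  # no tail, so not a poly(A) read
```

In a real pipeline, the body would then be aligned to the genome and the position just after its last aligned base taken as a candidate poly(A) site, after checking that the genome itself does not encode the adenine run at that location.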

]]>
<![CDATA[Prediction and Quantification of Splice Events from RNA-Seq Data]]> https://www.researchpad.co/article/5989da09ab0ee8fa60b76eed

Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 x 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package SGSeq.

]]>