The Journals of Gerontology Series A: Biological Sciences and Medical Sciences
Allele-Specific Transcript Abundance: A Pilot Study in Healthy Centenarians
Volume: 75, Issue: 6
DOI 10.1093/gerona/glz188
  • PDF   
  • XML   

The genetic basis of healthy aging and longevity remains largely unexplained. One hypothesis as to why long-lived individuals do not appear to have a lower number of common-complex disease variants, is that despite carrying risk variants, they express disease-linked alleles at a lower level than the wild-type alleles. Allele-specific abundance (ASA) is the different transcript abundance of the two haplotypes of a diploid individual. We sequenced the transcriptomes of four healthy centenarians and four mid-life controls. CIBERSORT was used to estimate blood cell fractions: neutrophils were the most abundant source of RNA, followed by CD8+ T cells, resting NK cells, and monocytes. ASA variants were more common in noncoding than coding regions. Centenarians and controls had a comparable distribution of ASA variants by predicted effect, and we did not observe an overall bias in expression toward major or minor alleles. Immune pathways were most highly represented among the gene set that showed ASA. Although we found evidence of ASA in disease-associated genes and transcription factors, we did not observe any differences in the pattern of expression between centenarians and controls in this small pilot study.

Tindale, Thiessen, Leach, Brooks-Wilson, and Le Couteur: Allele-Specific Transcript Abundance: A Pilot Study in Healthy Centenarians

Several studies have shown that long-lived individuals do not appear to carry a substantially lower number of common-complex disease risk alleles compared with typical individuals (1–3). It may be that despite carrying risk alleles, they express the disease-related variants at a lower level. Allele-specific abundance (ASA) is the different transcript abundance of the two haplotypes of a diploid individual. ASA occurs across the genome in different biological processes and across multiple tissues (4). ASA can be caused by allele-specific alternative splicing, variation in transcriptional start or stop sites, cis- acting regulatory variants, epigenetic differences, such as imprinting, DNA methylation, chromatin state, and differences in mRNA stability (4). ASA has been shown to be under genetic control using lymphoblastoid cell lines where monozygotic twins were found to have a more similar degree of ASA than unrelated controls (5).

Although different tissues have been shown to have important differences in expression, whole blood (WB) analysis has advantages. Peripheral blood is a cost-effective and clinically relevant tissue, it provides an overall view of the body since it comes into contact with almost all organs and tissues (6), and it is useful when looking for biomarkers of biological age (7) and diseases where the tissue of interest is not readily available. WB collected in PAXgene tubes have identified and validated a gene expression signature that distinguished between Alzheimer’s disease (AD) patients and cognitively healthy controls, showing that AD could be detected away from the primary site of the disease (8). Mutations causing expression differences in cardiac-restricted genes for Long QT syndrome, Marfan syndrome, and hypertrophic cardiomyopathy were also identified from WB from PAXgene tubes (9).

In a pilot study of four healthy centenarians and four mid-life controls, we hypothesized that healthy centenarians may exhibit an overall pattern of lower expression of disease-associated alleles. This would effectively allow them to favor expression of nonrisk alleles, despite carrying disease-causing variants. Heterozygous SNPs in the transcribed region of a gene may be used as indicators to determine if one copy of the gene is more highly expressed than the other. We tested whether preferential transcript abundance was observed, and whether it was skewed in centenarians versus controls by examining ASA in disease-associated genes and variants.


The Super-Seniors Study

Super-Seniors are aged 85 years and older with no reported history of cancer, cardiovascular disease (CVD), diabetes, major pulmonary disease, or dementia (10). Four Super-Senior centenarians aged 100–104 years (two males, two females), and four controls aged 50–56 years (two males, two females) who were not selected for health, participated in a transcriptome pilot study. Research ethics board (REB) approval was received from the joint Clinical REB of BC Cancer and the University of British Columbia, and the REB of Simon Fraser University. Participants gave written informed consent.

Sample Preparation

WB was collected in PAXgene tubes, which preserve in vivo transcription levels. RNA was extracted using the PAXgene blood miRNA kit (QIAGEN) and globin depleted using the GLOBINclear kit (Thermo Fisher Scientific, MA). Transcriptome and exome libraries were constructed using the Illumina TruSeq stranded mRNA kit and Agilent Technologies SureSelect v4+UTR kit, respectively.


Transcriptomes were aligned in a two-step process using bwa-mem 0.7.7a (11) and JAGuaR (12) 2.2: (a) reads were aligned to a genomic plus exon–exon junction reference created using hg38_no_alt reference with annotations from ensembl79 and refseq, and (b) reads that aligned to the exon–exon junction sequence were repositioned into standard genomic coordinates to become large-gapped genomic alignments. Exomes were aligned using bwa-mem 0.7.7a. Variants were called using bcftools mpileup.

CIBERSORT (13), an in silico flow cytometry tool, was used to estimate immune cell type abundances from gene expression data. The signature gene file of 22 distinct immune cell types provided by CIBERSORT was used as a reference for gene expression signatures.

Exome sequence calling was used to determine heterozygous SNPs, which were then matched to the transcriptome sequences. Autosomal variants were filtered for a read depth (RD) ≥30 and an alternate allele frequency >0.7 or <0.3.

Variants with ASA were compared by SNPEff predicted effect (14) and by major or minor allele bias as determined from minor allele frequencies reported in dbSNP build 150 (15). Gene lists were entered into QIAGEN Ingenuity® Pathway Analysis (IPA) to determine enriched pathways and physiological functions. To look for disease associations we compared genes showing ASA to lists of publically available disease associations. We looked for evidence of ASA in the 60 genes recommended by ACMG for reporting of incidental findings (16), as well as variants with SNP-trait associations that reached genome-wide significance listed in the NHGRI-EBI GWAS catalog v1.0.1 (17). Variants with ASA were input into SNP2TFBS to map SNPs to transcription factors (TF) (18).


CIBERSORT estimated the mean major cell types to be neutrophils, followed by CD8+ T cells, monocytes, and resting natural killer (NK) cells (Supplementary Figure S1; Supplementary Table S1). Cell type distribution varied between participants. However, a larger sample size would be required to determine if there is a pattern of differences between centenarians and controls.

After transcriptome alignment, there were ~240–320 million mapped reads per sample. Centenarians had a mean of 344,167 (SD = 14,369) transcriptome variants and 130,068 (SD = 10,188) heterozygous exome variants; controls had a mean of 355,220 (SD = 136,358) transcriptome variants and 128,775 (SD = 3,029) heterozygous exome variants. There was a mean of 30,335 (SD = 1,480) matches between the transcriptome and heterozygous exome variants in centenarians and 28,670 (SD = 2,822) matches in controls. After filtering for RD ≥30, there were 16,679 (SD = 1,108) matches in centenarians and 15,000 (SD = 1,967) matches in controls. After filtering for alternate allele frequency >0.7 or <0.3, the final list contained 1,145 (SD = 92) ASA variants in centenarians and 1,048 (SD = 136) ASA variants in controls.

Distribution and Function of Variants With ASA

Variants with ASA were most frequently noncoding: 3′UTR variants were most frequent, followed by intronic variants, downstream gene variants, and then coding missense and synonymous variants (Supplementary Figure S2).

There was no distinguishable difference in the proportion of variants with major allele bias versus minor allele bias when comparing centenarians and controls (Figure 1). Among ASA variants with an allelic ratio ≥0.7:0.3, centenarians had 49.1% of variants skewed to the minor allele, compared with 59.9% in controls. At the more extreme ASA variants with an allelic ratio ≥0.9:0.1, centenarians had 53.3% of variants skewed to the minor allele, compared with 51.2% among controls. There was no difference in the proportion of variants skewed to the minor allele between centenarians and controls for allelic ratio ≥0.7:0.3 (χ 2 = 3.4; p = .065) or for allelic ratio ≥0.9:0.1 (χ 2 = 3.33; p = .068). There was no difference in the proportion in % skewed to the minor allele between the allelic ratio ≥0.7:0.3 and the allelic ratio ≥0.9:0.1 sets (χ 2 = 0.06; p = .806).

(A) Proportion of major and minor allele bias by group. (B) Coding region variants by major and minor allele bias (n = 3,700 coding variants with ASA ≥ 0.7:0.3). (C) Coding region variants by major and minor allele bias with alternate allele frequency >0.9 or <0.1 (n = 360 coding variants with ASA ≥ 0.9:0.1).
Figure 1.
(A) Proportion of major and minor allele bias by group. (B) Coding region variants by major and minor allele bias (n = 3,700 coding variants with ASA ≥ 0.7:0.3). (C) Coding region variants by major and minor allele bias with alternate allele frequency >0.9 or <0.1 (n = 360 coding variants with ASA ≥ 0.9:0.1).

Disease Association of Genes and Variants With ASA

Eighty-two SNPs observed to have ASA ≥0.7:0.3 in either Super-Seniors or controls overlapped with SNPs in the NHGRI-EBI GWAS catalog. Trait associations from the catalog were filtered to keep only associations with exclusion criteria diseases for Super-Seniors. Twenty-one disease-associated SNPs remained. Of these, 9 were skewed to the minor allele and 12 were skewed to the major allele. Seven were skewed toward the disease-associated allele, 9 were skewed away from the disease-associated allele, and 5 did not have a clear disease-associated allele (Table 1). Seven SNPs had ASA in multiple subjects, all of which were skewed in the same direction. Among the set of 21 disease-associated SNPs, there were 20 incidences of ASA in centenarians and 13 in controls.

Table 1.
NHGRI-EBI GWAS Catalog SNPs With Evidence of Allele-Specific Expression ≥0.7:0.3 in Centenarians or Controls
Geners IDAssociated Disease/TraitSkewed to Minor Allele?Skewed to GWAS Disease- Associated Allele?CentsControls
MTX1rs1057941Multiple cancersNoNo10
HIP1rs1167827Body mass indexNon/a10
SREBF1rs11868035Parkinson’s diseaseYesNo01
LYSMD4rs12185079Post bronchodilator FEV1/FVC ratio in COPDYesYes10
SLC25A44rs2072499Testicular germ cell tumorNoNo01
OR52K2-OR52K1rs2278170Amyotrophic lateral sclerosisNon/a20
PRC1rs2290203Breast cancerNoYes10
GSDMBrs2290400Type 1 diabetes, bronchial hyper-responsiveness in asthmaNoNo31
GSDMBrs2305480Asthma, ulcerative colitisNon/a21
TNS1rs2571445Pulmonary functionYesYes11
CASP8rs3769823Basal cell carcinomaNoYes10
TRIM8rs3850699Prostate cancerYesNo01
SFTPDrs721917Chronic obstructive pulmonary diseaseNoNo10
ASAH1rs7508Atrial fibrillationYesYes11
CD226rs763361Type 1 diabetesYesn/a01
C9orf72rs774359Amyotrophic lateral sclerosisYesn/a01
RCCD1rs79548680Type 2 diabetesYesYes21
RNF207rs846111QT intervalNoNo10
SDCCAG8rs953492Diastolic blood pressureYesNo01

There were 2,182 and 2,052 genes where ASA was observed in centenarians or controls, respectively. When the gene list was entered into QIAGEN IPA to determine enriched pathways and physiological functions, the top canonical pathways in centenarians were: antigen presentation; autophagy; crosstalk between dendritic cells and NK cells; CD28 signaling in T helper cells. The top canonical pathways in controls were: antigen presentation; NK cell signaling; Th1 pathway; Th1 and Th2 activation.

Top centenarian genes with ASA were defined as genes for which there was evidence of ASA in four centenarians and only one or no controls, or in three centenarians and no controls; and vice versa for top genes with ASA in controls. There were 35 top genes with ASA in centenarians and 23 top genes with ASA in controls (Supplementary Table S2). In QIAGEN IPA, the top canonical pathways for the top genes with ASA in centenarians were, in order: citrulline-nitric oxide cycle; LXR/RXR activation; IL-12 signaling and production in macrophages; superpathway of citrulline metabolism; production of nitric oxide. The top canonical pathways for the top genes with ASA in controls were: LXR/RXR activation; role of osteoblast, osteoclasts, and chondrocytes in rheumatoid arthritis; IL-10 signaling; CCR5 signaling in macrophages.

Among the 60 “ACMG” genes, we found that centenarians and controls showed ASA ≥0.7:0.3 in 8 and 11 genes, respectively (Table 2). Of note, in LDLR ASA was observed in one centenarian and all four controls; and in TMEM43 ASA was observed in three centenarians and no controls. All other ASA variants in ACMG genes were unique with the exception of two SNPs in PRKAG2, which were each present in two subjects. One variant had a high predicted functional impact: a TTCT>TTCTCT indel at chr18:31068033 in DSC2.

Table 2.
ACMG Genes With Allele-Specific Abundance ≥0.7:0.3 in Centenarians or Controls
APCAdenomatous polyposis coli1
BRCA1Breast-ovarian cancer, familial 11
DSC2Arrhythmogenic right ventricular cardiomyopathy, type 111
KCNH2Long QT syndrome 21
LDLRFamilial hypercholesterolemia14
MYH11Aortic aneurysm, familial thoracic 41
NF2Neurofibromatosis, type 21
PRKAG2Familial hypertrophic cardiomyopathy 621
SDHAF2Paragangliomas 21
SMAD3Loeys–Dietz syndrome type 31
SMAD4Juvenile polyposis syndrome,1
TGFBR1Loeys–Dietz syndrome type 1A1
Loeys–Dietz syndrome type 2A
Marfan’s syndrome
TMEM43Arrhythmogenic right ventricular cardiomyopathy, type 53
TNNI3Familial hypertrophic cardiomyopathy 71
TPM1Familial hypertrophic cardiomyopathy 31
TSC1Tuberous sclerosis 11
VHLVon Hippel–Lindau syndrome1

Six thousand thirty-nine unique variants with rsIDs were entered into SNP2TFBS (15). TF enrichment was determined by the ratio (total TF sites identified by SNP2TFBS): (TFs identified in the input list) (Supplementary Table S3). ZBTB33, Tcfcp2l1, NHLH1, TFAP2C, and Pax2 were the top five most enriched TFs.


Overall we found that centenarians and controls had comparable distributions of ASA variants with no difference in proportion by predicted effect. ASA variants were more common in noncoding than coding regions. We did not observe an overall bias in expression toward the major or minor alleles in centenarians or controls.

Since this is an exploratory study, to detect allelic ratios with a difference ≥0.7:0.3, samples were sequenced to 330–400 million raw reads per sample. Li and colleagues (19) calculated that at RD 30, there is ~60% statistical power to detect an allelic ratio of 0.7:0.3, ~90% power to detect an allelic ratio of 0.8:0.2, and nearly 100% power to detect an allelic ratio of 0.9:0.1. They further calculated that to achieve RD 30 in ~90% and ~95% of exonic variants would require ~190 million and ~240 million reads, respectively.

Immune pathways were most highly represented among the gene set that showed ASA in both centenarians and controls. We cannot determine whether this observation is related to aging; however, the presence of ASA in genes in these pathways is nonetheless of interest. The occurrence of immune dysfunction with aging is well established, including increased susceptibility to infection, frequency of neoplasia, inflammation, and autoimmune response (20). A common factor underlying the pathogenesis of many age-related chronic diseases, including CVD, cancer, neurodegenerative diseases, and diabetes is inflammation (21). It has been previously suggested that centenarians may have genetic factors that allow them to better maintain immune function as they age (22,23).

When looking at ACMG genes with potential clinical utility (16), we found evidence of ASA in some genes related to disease etiology. Although it may be an artifact of small sample size, LDLR, for which ASA was observed in one centenarian and all four controls, and TMEM43, for which ASA was observed in three centenarians and no controls, are potentially interesting to examine in a larger study. LDLR is associated with familial hypercholesterolemia and is a major apoE receptor; variants of the APOE gene are the most replicated association with longevity and healthy aging phenotypes (24) including the Super-Senior phenotype (25). TMEM43 is associated with arrhythmogenic right ventricular cardiomyopathy, type 5. ASA variants also overlapped with disease-associated NHGRI-EBI GWAS catalogue variants and were slightly more prevalent in centenarians; this difference should be explored further in a larger study. It is possible that among patients with disorders associated with these genes, even when a pathogenic variant is not identified, ASA may play a role in disease pathogenicity if a cis-acting factor is affecting gene expression. There was evidence that ASA variants overlapped with numerous TFs, which are possible mechanisms for cis-acting affects.

ASA can be caused by numerous mechanisms that cannot be distinguished in the present study. It is possible that some of our observed cases of ASA could be the result of alternative transcripts that have been collapsed into single loci. ASA differences between blood cell types could confound our analyses. By using a RD 30 cutoff we are unable to examine ASA in low-abundance genes. A caveat to the ASA observed is that it may not reflect true in vivo RNA expression levels. Limitations of this study include that we cannot determine if ASA differences are associated with healthy aging, or are a consequence of getting old.

As a pilot study, the sample size was very small, and may lead to false negative findings. We did not find evidence for a pattern of lower expression of disease-associated variants in centenarians compared to mid-life controls. The presence of ASA in some ACMG genes suggests that in patients with disorders that are highly associated with a certain gene, but for which no causal variant is apparent, looking at ASA may be of interest. As well, the representation of immune pathways among the ASA gene set may be related to decreased immune function with aging.


This work was supported by the Lotte and John Hecht Memorial Foundation.


L.C.T. and A.R.B.W. were involved in study design; L.C.T. and S.L. were involved in lab work; L.C.T. and N.T. were involved in analysis; and L.C.T. drafted the article. All authors read and approved the article.

Conflict of Interest

None reported.



Stevenson M, Bae H, Schupf N, et al. Burden of disease variants in participants of the Long Life Family Study. Aging (Albany NY). 2015;7:, pp.123–132. doi:10.18632/aging.100724


Tindale LC, Zeng A, Bretherick KL, Leach S, Thiessen N, Brooks-Wilson AR. Burden of common complex disease variants in the exomes of two healthy centenarian brothers. Gerontology. 2016;62:, pp.58–62. doi:10.1159/000430462


Beekman M, Nederstigt C, Suchiman HE, et al. Genome-wide association study (GWAS)-identified disease risk alleles do not compromise human longevity. Proc Natl Acad Sci U S A. 2010;107:, pp.18046–18049. doi:10.1073/pnas.1003540107


Pastinen T. Genome-wide allele-specific analysis: insights into regulatory variation. Nat Rev Genet. 2010;11:, pp.533–538. doi:10.1038/nrg2815


Cheung VG, Bruzel A, Burdick JT, Morley M, Devlin JL, Spielman RS. Monozygotic twins reveal germline contribution to allelic expression differences. Am J Hum Genet. 2008;82:, pp.1357–1360. doi:10.1016/j.ajhg.2008.05.003


Mastrokolias A, den Dunnen JT, van Ommen GB, ‘t Hoen PA, van Roon-Mom WM. Increased sensitivity of next generation sequencing-based expression profiling after globin reduction in human blood RNA. BMC Genomics. 2012;13:, pp.28. doi:10.1186/1471-2164-13-28


Lin H, Lunetta KL, Zhao Q, et al. Whole blood gene expression associated with clinical biological age. J Gerontol A Biol Sci Med Sci. 2019;74:, pp.81–88. doi:10.1093/gerona/gly164


Booij BB, Lindahl T, Wetterberg P, et al. A gene expression pattern in blood for the early detection of Alzheimer’s disease. J Alzheimers Dis. 2011;23:, pp.109–119. doi:10.3233/JAD-2010-101518


Miller TE, You L, Myerburg RJ, Benke PJ, Bishopric NH. Whole blood RNA offers a rapid, comprehensive approach to genetic diagnosis of cardiovascular diseases. Genet Med. 2007;9:, pp.23–33. doi:10.1097GIM.0b013e31802d74de


Halaschek-Wiener J, Tindale LC, Collins JA, et al. The Super-Seniors Study: phenotypic characterization of a healthy 85+ population. PLoS One. 2018;13:, pp.e0197578. doi:10.1371/journal.pone.0197578


Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:, pp.1754–1760. doi:10.1093/bioinformatics/btp324


Butterfield YS, Kreitzman M, Thiessen N, et al. JAGuaR: junction alignments to genome for RNA-seq reads. PLoS One. 2014;9:, pp.e102398. doi:10.1371/journal.pone.0102398


Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:, pp.453–457. doi:10.1038/nmeth.3337


Cingolani P, Platts A, Wang le L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6:, pp.80–92. doi:10.4161/fly.19695


Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:, pp.308–311. doi:10.1093/nar/29.1.308


Green RC, Berg JS, Grody WW, et al.; American College of Medical Genetics and Genomics. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med. 2013;15:, pp.565–574. doi:10.1038/gim.2013.73


MacArthur J, Bowler E, Cerezo M, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:, pp.D896–D901. doi:10.1093/nar/gkw1133


Kumar S, Ambrosini G, Bucher P. SNP2TFBS—a database of regulatory SNPs affecting predicted transcription factor binding site affinity. Nucleic Acids Res. 2017;45:, pp.D139–D144. doi:10.1093/nar/gkw1064


Li G, Bahn JH, Lee JH, et al. Identification of allele-specific alternative mRNA processing via transcriptome sequencing. Nucleic Acids Res. 2012;40:, pp.e104. doi:10.1093/nar/gks280


Ponnappan S, Ponnappan U. Aging and immune function: molecular mechanisms to interventions. Antioxid Redox Signal. 2011;14:, pp.1551–1585. doi:10.1089/ars.2010.3228


Fulop T, Witkowski JM, Le Page A, Fortin C, Pawelec G, Larbi A. Intracellular signalling pathways: targets to reverse immunosenescence. Clin Exp Immunol. 2017;187:, pp.35–43. doi:10.1111/cei.12836


Ostan R, Bucci L, Capri M, et al. Immunosenescence and immunogenetics of human longevity. Neuroimmunomodulation. 2008;15:, pp.224–240. doi:10.1159/000156466


Franceschi C, Bonafè M. Centenarians as a model for healthy aging. Biochem Soc Trans. 2003;31(2):, pp.457–461. doi:10.1042/bst0310457


Murabito JM, Yuan R, Lunetta KL. The search for longevity and healthy aging genes: insights from epidemiological studies and samples of long-lived individuals. J Gerontol A Biol Sci Med Sci. 2012;67:, pp.470–479. doi:10.1093/gerona/gls089


Tindale LC, Leach S, Ushey K, Daley D, Brooks-Wilson AR. Rare and common variants in the Apolipoprotein E gene in healthy oldest old. Neurobiol Aging. 2014;35:, pp.727.e1–727.e3. doi:10.1016/j.neurobiolaging.2013.09.010 Transcript Abundance: A Pilot Study in Healthy Centenarians&author=Lauren C Tindale,Nina Thiessen,Stephen Leach,Angela R Brooks-Wilson,David Le Couteur,&keyword=Longevity,Human aging,Genetics,Transcriptomics,&subject=THE JOURNAL OF GERONTOLOGY: Biological Sciences,Human Aging,AcademicSubjects/MED00280,AcademicSubjects/SCI00960,