The genetic basis of healthy aging and longevity remains largely unexplained. One hypothesis as to why long-lived individuals do not appear to have a lower number of common-complex disease variants, is that despite carrying risk variants, they express disease-linked alleles at a lower level than the wild-type alleles. Allele-specific abundance (ASA) is the different transcript abundance of the two haplotypes of a diploid individual. We sequenced the transcriptomes of four healthy centenarians and four mid-life controls. CIBERSORT was used to estimate blood cell fractions: neutrophils were the most abundant source of RNA, followed by CD8+ T cells, resting NK cells, and monocytes. ASA variants were more common in noncoding than coding regions. Centenarians and controls had a comparable distribution of ASA variants by predicted effect, and we did not observe an overall bias in expression toward major or minor alleles. Immune pathways were most highly represented among the gene set that showed ASA. Although we found evidence of ASA in disease-associated genes and transcription factors, we did not observe any differences in the pattern of expression between centenarians and controls in this small pilot study.
Several studies have shown that long-lived individuals do not appear to carry a substantially lower number of common-complex disease risk alleles compared with typical individuals (1–3). It may be that despite carrying risk alleles, they express the disease-related variants at a lower level. Allele-specific abundance (ASA) is the different transcript abundance of the two haplotypes of a diploid individual. ASA occurs across the genome in different biological processes and across multiple tissues (4). ASA can be caused by allele-specific alternative splicing, variation in transcriptional start or stop sites, cis- acting regulatory variants, epigenetic differences, such as imprinting, DNA methylation, chromatin state, and differences in mRNA stability (4). ASA has been shown to be under genetic control using lymphoblastoid cell lines where monozygotic twins were found to have a more similar degree of ASA than unrelated controls (5).
Although different tissues have been shown to have important differences in expression, whole blood (WB) analysis has advantages. Peripheral blood is a cost-effective and clinically relevant tissue, it provides an overall view of the body since it comes into contact with almost all organs and tissues (6), and it is useful when looking for biomarkers of biological age (7) and diseases where the tissue of interest is not readily available. WB collected in PAXgene tubes have identified and validated a gene expression signature that distinguished between Alzheimer’s disease (AD) patients and cognitively healthy controls, showing that AD could be detected away from the primary site of the disease (8). Mutations causing expression differences in cardiac-restricted genes for Long QT syndrome, Marfan syndrome, and hypertrophic cardiomyopathy were also identified from WB from PAXgene tubes (9).
In a pilot study of four healthy centenarians and four mid-life controls, we hypothesized that healthy centenarians may exhibit an overall pattern of lower expression of disease-associated alleles. This would effectively allow them to favor expression of nonrisk alleles, despite carrying disease-causing variants. Heterozygous SNPs in the transcribed region of a gene may be used as indicators to determine if one copy of the gene is more highly expressed than the other. We tested whether preferential transcript abundance was observed, and whether it was skewed in centenarians versus controls by examining ASA in disease-associated genes and variants.
Super-Seniors are aged 85 years and older with no reported history of cancer, cardiovascular disease (CVD), diabetes, major pulmonary disease, or dementia (10). Four Super-Senior centenarians aged 100–104 years (two males, two females), and four controls aged 50–56 years (two males, two females) who were not selected for health, participated in a transcriptome pilot study. Research ethics board (REB) approval was received from the joint Clinical REB of BC Cancer and the University of British Columbia, and the REB of Simon Fraser University. Participants gave written informed consent.
WB was collected in PAXgene tubes, which preserve in vivo transcription levels. RNA was extracted using the PAXgene blood miRNA kit (QIAGEN) and globin depleted using the GLOBINclear kit (Thermo Fisher Scientific, MA). Transcriptome and exome libraries were constructed using the Illumina TruSeq stranded mRNA kit and Agilent Technologies SureSelect v4+UTR kit, respectively.
Transcriptomes were aligned in a two-step process using bwa-mem 0.7.7a (11) and JAGuaR (12) 2.2: (a) reads were aligned to a genomic plus exon–exon junction reference created using hg38_no_alt reference with annotations from ensembl79 and refseq, and (b) reads that aligned to the exon–exon junction sequence were repositioned into standard genomic coordinates to become large-gapped genomic alignments. Exomes were aligned using bwa-mem 0.7.7a. Variants were called using bcftools mpileup.
CIBERSORT (13), an in silico flow cytometry tool, was used to estimate immune cell type abundances from gene expression data. The signature gene file of 22 distinct immune cell types provided by CIBERSORT was used as a reference for gene expression signatures.
Exome sequence calling was used to determine heterozygous SNPs, which were then matched to the transcriptome sequences. Autosomal variants were filtered for a read depth (RD) ≥30 and an alternate allele frequency >0.7 or <0.3.
Variants with ASA were compared by SNPEff predicted effect (14) and by major or minor allele bias as determined from minor allele frequencies reported in dbSNP build 150 (15). Gene lists were entered into QIAGEN Ingenuity® Pathway Analysis (IPA) to determine enriched pathways and physiological functions. To look for disease associations we compared genes showing ASA to lists of publically available disease associations. We looked for evidence of ASA in the 60 genes recommended by ACMG for reporting of incidental findings (16), as well as variants with SNP-trait associations that reached genome-wide significance listed in the NHGRI-EBI GWAS catalog v1.0.1 (17). Variants with ASA were input into SNP2TFBS to map SNPs to transcription factors (TF) (18).
CIBERSORT estimated the mean major cell types to be neutrophils, followed by CD8+ T cells, monocytes, and resting natural killer (NK) cells (Supplementary Figure S1; Supplementary Table S1). Cell type distribution varied between participants. However, a larger sample size would be required to determine if there is a pattern of differences between centenarians and controls.
After transcriptome alignment, there were ~240–320 million mapped reads per sample. Centenarians had a mean of 344,167 (SD = 14,369) transcriptome variants and 130,068 (SD = 10,188) heterozygous exome variants; controls had a mean of 355,220 (SD = 136,358) transcriptome variants and 128,775 (SD = 3,029) heterozygous exome variants. There was a mean of 30,335 (SD = 1,480) matches between the transcriptome and heterozygous exome variants in centenarians and 28,670 (SD = 2,822) matches in controls. After filtering for RD ≥30, there were 16,679 (SD = 1,108) matches in centenarians and 15,000 (SD = 1,967) matches in controls. After filtering for alternate allele frequency >0.7 or <0.3, the final list contained 1,145 (SD = 92) ASA variants in centenarians and 1,048 (SD = 136) ASA variants in controls.
Variants with ASA were most frequently noncoding: 3′UTR variants were most frequent, followed by intronic variants, downstream gene variants, and then coding missense and synonymous variants (Supplementary Figure S2).
There was no distinguishable difference in the proportion of variants with major allele bias versus minor allele bias when comparing centenarians and controls (Figure 1). Among ASA variants with an allelic ratio ≥0.7:0.3, centenarians had 49.1% of variants skewed to the minor allele, compared with 59.9% in controls. At the more extreme ASA variants with an allelic ratio ≥0.9:0.1, centenarians had 53.3% of variants skewed to the minor allele, compared with 51.2% among controls. There was no difference in the proportion of variants skewed to the minor allele between centenarians and controls for allelic ratio ≥0.7:0.3 (χ 2 = 3.4; p = .065) or for allelic ratio ≥0.9:0.1 (χ 2 = 3.33; p = .068). There was no difference in the proportion in % skewed to the minor allele between the allelic ratio ≥0.7:0.3 and the allelic ratio ≥0.9:0.1 sets (χ 2 = 0.06; p = .806).
Eighty-two SNPs observed to have ASA ≥0.7:0.3 in either Super-Seniors or controls overlapped with SNPs in the NHGRI-EBI GWAS catalog. Trait associations from the catalog were filtered to keep only associations with exclusion criteria diseases for Super-Seniors. Twenty-one disease-associated SNPs remained. Of these, 9 were skewed to the minor allele and 12 were skewed to the major allele. Seven were skewed toward the disease-associated allele, 9 were skewed away from the disease-associated allele, and 5 did not have a clear disease-associated allele (Table 1). Seven SNPs had ASA in multiple subjects, all of which were skewed in the same direction. Among the set of 21 disease-associated SNPs, there were 20 incidences of ASA in centenarians and 13 in controls.
|Gene||rs ID||Associated Disease/Trait||Skewed to Minor Allele?||Skewed to GWAS Disease- Associated Allele?||Cents||Controls|
|HIP1||rs1167827||Body mass index||No||n/a||1||0|
|LYSMD4||rs12185079||Post bronchodilator FEV1/FVC ratio in COPD||Yes||Yes||1||0|
|SLC25A44||rs2072499||Testicular germ cell tumor||No||No||0||1|
|OR52K2-OR52K1||rs2278170||Amyotrophic lateral sclerosis||No||n/a||2||0|
|GSDMB||rs2290400||Type 1 diabetes, bronchial hyper-responsiveness in asthma||No||No||3||1|
|GSDMB||rs2305480||Asthma, ulcerative colitis||No||n/a||2||1|
|CASP8||rs3769823||Basal cell carcinoma||No||Yes||1||0|
|SFTPD||rs721917||Chronic obstructive pulmonary disease||No||No||1||0|
|CD226||rs763361||Type 1 diabetes||Yes||n/a||0||1|
|C9orf72||rs774359||Amyotrophic lateral sclerosis||Yes||n/a||0||1|
|RCCD1||rs79548680||Type 2 diabetes||Yes||Yes||2||1|
|SDCCAG8||rs953492||Diastolic blood pressure||Yes||No||0||1|
There were 2,182 and 2,052 genes where ASA was observed in centenarians or controls, respectively. When the gene list was entered into QIAGEN IPA to determine enriched pathways and physiological functions, the top canonical pathways in centenarians were: antigen presentation; autophagy; crosstalk between dendritic cells and NK cells; CD28 signaling in T helper cells. The top canonical pathways in controls were: antigen presentation; NK cell signaling; Th1 pathway; Th1 and Th2 activation.
Top centenarian genes with ASA were defined as genes for which there was evidence of ASA in four centenarians and only one or no controls, or in three centenarians and no controls; and vice versa for top genes with ASA in controls. There were 35 top genes with ASA in centenarians and 23 top genes with ASA in controls (Supplementary Table S2). In QIAGEN IPA, the top canonical pathways for the top genes with ASA in centenarians were, in order: citrulline-nitric oxide cycle; LXR/RXR activation; IL-12 signaling and production in macrophages; superpathway of citrulline metabolism; production of nitric oxide. The top canonical pathways for the top genes with ASA in controls were: LXR/RXR activation; role of osteoblast, osteoclasts, and chondrocytes in rheumatoid arthritis; IL-10 signaling; CCR5 signaling in macrophages.
Among the 60 “ACMG” genes, we found that centenarians and controls showed ASA ≥0.7:0.3 in 8 and 11 genes, respectively (Table 2). Of note, in LDLR ASA was observed in one centenarian and all four controls; and in TMEM43 ASA was observed in three centenarians and no controls. All other ASA variants in ACMG genes were unique with the exception of two SNPs in PRKAG2, which were each present in two subjects. One variant had a high predicted functional impact: a TTCT>TTCTCT indel at chr18:31068033 in DSC2.
|APC||Adenomatous polyposis coli||1|
|BRCA1||Breast-ovarian cancer, familial 1||1|
|DSC2||Arrhythmogenic right ventricular cardiomyopathy, type 11||1|
|KCNH2||Long QT syndrome 2||1|
|MYH11||Aortic aneurysm, familial thoracic 4||1|
|NF2||Neurofibromatosis, type 2||1|
|PRKAG2||Familial hypertrophic cardiomyopathy 6||2||1|
|SMAD3||Loeys–Dietz syndrome type 3||1|
|SMAD4||Juvenile polyposis syndrome,||1|
|TGFBR1||Loeys–Dietz syndrome type 1A||1|
|Loeys–Dietz syndrome type 2A|
|TMEM43||Arrhythmogenic right ventricular cardiomyopathy, type 5||3|
|TNNI3||Familial hypertrophic cardiomyopathy 7||1|
|TPM1||Familial hypertrophic cardiomyopathy 3||1|
|TSC1||Tuberous sclerosis 1||1|
|VHL||Von Hippel–Lindau syndrome||1|
Six thousand thirty-nine unique variants with rsIDs were entered into SNP2TFBS (15). TF enrichment was determined by the ratio (total TF sites identified by SNP2TFBS): (TFs identified in the input list) (Supplementary Table S3). ZBTB33, Tcfcp2l1, NHLH1, TFAP2C, and Pax2 were the top five most enriched TFs.
Overall we found that centenarians and controls had comparable distributions of ASA variants with no difference in proportion by predicted effect. ASA variants were more common in noncoding than coding regions. We did not observe an overall bias in expression toward the major or minor alleles in centenarians or controls.
Since this is an exploratory study, to detect allelic ratios with a difference ≥0.7:0.3, samples were sequenced to 330–400 million raw reads per sample. Li and colleagues (19) calculated that at RD 30, there is ~60% statistical power to detect an allelic ratio of 0.7:0.3, ~90% power to detect an allelic ratio of 0.8:0.2, and nearly 100% power to detect an allelic ratio of 0.9:0.1. They further calculated that to achieve RD 30 in ~90% and ~95% of exonic variants would require ~190 million and ~240 million reads, respectively.
Immune pathways were most highly represented among the gene set that showed ASA in both centenarians and controls. We cannot determine whether this observation is related to aging; however, the presence of ASA in genes in these pathways is nonetheless of interest. The occurrence of immune dysfunction with aging is well established, including increased susceptibility to infection, frequency of neoplasia, inflammation, and autoimmune response (20). A common factor underlying the pathogenesis of many age-related chronic diseases, including CVD, cancer, neurodegenerative diseases, and diabetes is inflammation (21). It has been previously suggested that centenarians may have genetic factors that allow them to better maintain immune function as they age (22,23).
When looking at ACMG genes with potential clinical utility (16), we found evidence of ASA in some genes related to disease etiology. Although it may be an artifact of small sample size, LDLR, for which ASA was observed in one centenarian and all four controls, and TMEM43, for which ASA was observed in three centenarians and no controls, are potentially interesting to examine in a larger study. LDLR is associated with familial hypercholesterolemia and is a major apoE receptor; variants of the APOE gene are the most replicated association with longevity and healthy aging phenotypes (24) including the Super-Senior phenotype (25). TMEM43 is associated with arrhythmogenic right ventricular cardiomyopathy, type 5. ASA variants also overlapped with disease-associated NHGRI-EBI GWAS catalogue variants and were slightly more prevalent in centenarians; this difference should be explored further in a larger study. It is possible that among patients with disorders associated with these genes, even when a pathogenic variant is not identified, ASA may play a role in disease pathogenicity if a cis-acting factor is affecting gene expression. There was evidence that ASA variants overlapped with numerous TFs, which are possible mechanisms for cis-acting affects.
ASA can be caused by numerous mechanisms that cannot be distinguished in the present study. It is possible that some of our observed cases of ASA could be the result of alternative transcripts that have been collapsed into single loci. ASA differences between blood cell types could confound our analyses. By using a RD 30 cutoff we are unable to examine ASA in low-abundance genes. A caveat to the ASA observed is that it may not reflect true in vivo RNA expression levels. Limitations of this study include that we cannot determine if ASA differences are associated with healthy aging, or are a consequence of getting old.
As a pilot study, the sample size was very small, and may lead to false negative findings. We did not find evidence for a pattern of lower expression of disease-associated variants in centenarians compared to mid-life controls. The presence of ASA in some ACMG genes suggests that in patients with disorders that are highly associated with a certain gene, but for which no causal variant is apparent, looking at ASA may be of interest. As well, the representation of immune pathways among the ASA gene set may be related to decreased immune function with aging.
L.C.T. and A.R.B.W. were involved in study design; L.C.T. and S.L. were involved in lab work; L.C.T. and N.T. were involved in analysis; and L.C.T. drafted the article. All authors read and approved the article.