Motivation
Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data.

Results
We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource.

Availability and implementation
The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.
Genome-wide association studies (GWAS) are a powerful method to analyse common diseases in a large cohort. Taking advantage of affordable large-scale genotyping chip technologies, such studies are now routinely run across cohorts large enough to detect even weak associations between common variants and a phenotype of interest. There are now enough GWAS to warrant the existence of specialized databases, such as the GWAS Catalog (MacArthur et al., 2017).
Despite this wealth of data, GWAS have not succeeded in translating into many therapeutic success stories (Huang, 2015). The main bottleneck is inferring truly causal genes from the GWAS results that can then be used as drug targets and thus lead to new therapies. This gap between genetics research and translational applications is largely explained by the difficulty in functionally interpreting non-coding variants. Although annotating and prioritizing coding variants is a well-studied problem, determining the regulatory effect of non-coding variants is still difficult. In effect, many of the drug targets tested by the pharmaceutical industry fail to yield a new drug because they are revealed to be unrelated to the phenotype (Cook et al., 2014).
To close this gap, a number of experimental techniques have been developed, such as molecular Quantitative Trait Loci (QTL) (Brem et al., 2002), covariance analysis in chromatin state between distant regions of the genome (Thurman et al., 2012) or sequencing-based assays, such as Promoter-Capture Hi-C (Javierre et al., 2016). Existing pipelines (Shen et al., 2017) integrate all these datasets but they do not connect directly to databases to gather their latest results.
We present here a pipeline that compares GWAS results to a collection of useful cis-regulatory datasets. We have run our pipeline across all GWAS Catalog studies and present the results in an interactive web interface, which can be used to examine the evidence supporting the candidate causal genes.
The analysis automates a number of standard post-GWAS data integration steps as follows:
The pipeline is coded in Python and was designed to be easily installed locally and run privately, whether against public databases or on a private dataset (provided as summary statistics in a tab-delimited file).
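As an illustration of the kind of input described above, the following minimal sketch reads tab-delimited summary statistics and keeps the variants that pass the conventional genome-wide significance threshold. It is not the pipeline's own code: the column names (`variant_id`, `chromosome`, `position`, `p_value`), the placeholder rsIDs and the helper `significant_variants` are assumptions made for this example, and the header actually expected by the pipeline may differ.

```python
import csv
import io

# Hypothetical column names and placeholder values; the header expected by
# the real pipeline may differ.
EXAMPLE_TSV = (
    "variant_id\tchromosome\tposition\tp_value\n"
    "rs0000001\t1\t1000\t3e-9\n"
    "rs0000002\t1\t2000\t0.2\n"
    "rs0000003\t2\t3000\t4e-8\n"
)

# Conventional genome-wide significance threshold (an assumption here,
# not a value taken from the pipeline).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_variants(handle, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Return the rows of a tab-delimited summary-statistics file
    whose p-value is strictly below the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["p_value"]) < threshold]


# An in-memory file stands in for a private summary-statistics file on disk.
hits = significant_variants(io.StringIO(EXAMPLE_TSV))
for row in hits:
    print(row["variant_id"], row["chromosome"], row["position"], row["p_value"])
```

Passing an open file handle instead of the in-memory example would apply the same filter to a file on disk.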
The Open Targets post-GWAS web browser allows users to browse through the pre-computed results of the pipeline run across all of the GWAS Catalog. When searching by SNP rsID or gene symbol, the browser displays either a genomic view or a table of associations (see Fig. 1). When searching by disease, a table of known associations is displayed.
We ran our pipeline on all GWAS Catalog studies at the time (last update December 7, 2018). This comprised 2092 phenotypes and diseases, described in 3187 publications, and a total of 67 771 significant SNPs. After LD expansion, a total of 923 891 unique SNPs were analysed, each SNP being involved in 290 publications on average. The average run time for each study was 40 min.
GWAS is a powerful approach to understanding disease mechanism but requires functional analysis to produce actionable results. The Open Targets post-GWAS pipeline facilitates this process, both through an automated tool and pre-processed results, freeing GWAS analysts from the laborious process of data integration.