ResearchPad - genome-wide-association-studies https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Genome-wide association study of partial resistance to sclerotinia stem rot of cultivated soybean based on the detached leaf method]]> https://www.researchpad.co/article/elastic_article_15721 Sclerotinia stem rot (SSR) is a devastating fungal disease that causes severe soybean yield losses worldwide. In the present study, a representative population of 185 soybean accessions was used to identify quantitative trait nucleotides (QTNs) for partial resistance to SSR via a genome-wide association study (GWAS). A total of 22,048 single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 5% and missing data < 3% were used to assess linkage disequilibrium (LD) levels. Association signals for SSR partial resistance were identified using two models: the compressed mixed linear model (CMLM) and the multi-locus random-SNP-effect mixed linear model (mrMLM). In total, seven QTNs with major effects (one known locus and six novel loci) were detected via CMLM, and nine novel QTNs with minor effects were detected via mrMLM. One of the novel loci (Gm05:14834789 on Chr.05) was detected by both methods and may therefore be a stable locus with high significance for SSR partial resistance. Additionally, 71 major-effect and 85 minor-effect candidate genes were found within the 200-kb genomic region around each peak SNP detected by CMLM and mrMLM, respectively. Gene-based association analysis identified six SNPs from three major-effect genes and eight SNPs from four minor-effect genes. Of these, Glyma.18G012200 has been characterized as a significant element in controlling fungal disease in plants.
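The SNP quality-control step above (MAF > 5%, missing data < 3%) can be sketched as follows. This is a minimal Python illustration under assumed conventions (genotypes coded 0/1/2, `None` for a missing call, a hypothetical toy panel); the function names are hypothetical and not part of the study's pipeline.

```python
from typing import List, Optional, Sequence


def passes_qc(genotypes: Sequence[Optional[int]],
              min_maf: float = 0.05,
              max_missing: float = 0.03) -> bool:
    """True if a SNP has MAF > min_maf and a missing-call rate < max_missing.

    Genotypes are coded 0/1/2 (copies of the alternate allele); None is a missing call.
    """
    n_total = len(genotypes)
    if n_total == 0:
        return False
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = (n_total - len(called)) / n_total
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid call
    maf = min(alt_freq, 1.0 - alt_freq)          # fold to the minor allele
    return maf > min_maf and missing_rate < max_missing


def filter_snps(snp_rows: Sequence[Sequence[Optional[int]]], **thresholds) -> List[int]:
    """Indices of SNPs (one row per SNP, one column per accession) that pass QC."""
    return [i for i, row in enumerate(snp_rows) if passes_qc(row, **thresholds)]


# Hypothetical toy panel of 40 accessions.
panel = [
    [0] * 20 + [1] * 20,    # MAF 0.25, no missing calls -> kept
    [0] * 40,               # monomorphic -> dropped
    [1] * 38 + [None] * 2,  # 5% missing -> dropped
    [0] * 39 + [1],         # MAF 0.0125 -> dropped
]
print(filter_snps(panel))  # [0]
```

Thresholds are strict inequalities to match the wording "MAF > 5%" and "missing data < 3%".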

]]>
<![CDATA[A genome-wide association study of deafness in three canine breeds]]> https://www.researchpad.co/article/elastic_article_14705 Congenital deafness in the domestic dog is usually related to the presence of white pigmentation, which is controlled primarily by the piebald locus on chromosome 20 and also by merle on chromosome 10. Pigment-associated deafness is also seen in other species, including cats, mice, sheep, alpacas, horses, cows, pigs, and humans, but the genetic factors that determine why some piebald or merle dogs develop deafness while others do not remain unknown. Here we perform a genome-wide association study (GWAS) to identify regions of the canine genome significantly associated with deafness in three dog breeds carrying piebald: Dalmatian, Australian cattle dog, and English setter. We include bilaterally deaf, unilaterally deaf, and matched control dogs from the same litter, phenotyped using the brainstem auditory evoked response (BAER) hearing test. Principal component analysis showed different distributions of cases and controls in genetically distinct Dalmatian populations; therefore, GWAS was performed separately for North American and UK samples. We identified one genome-wide significant association and 14 suggestive (chromosome-wide) associations in the comparison of bilaterally deaf vs. control Australian cattle dogs. However, these associations were not located on the same chromosome as the piebald locus, indicating the complexity of the genetics underlying this disease in the domestic dog. Because of this apparently complex genetic architecture, larger sample sizes may be needed to detect the genetic loci modulating risk in piebald dogs.

]]>
<![CDATA[Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models]]> https://www.researchpad.co/article/elastic_article_14653 This work addresses a recurring challenge in the analysis and interpretation of genetic association studies: which genetic variants can best predict, and are independently associated with, a given phenotype in the presence of population structure? Failing to control for confounding due to geographic population structure, family relatedness and/or cryptic relatedness can lead to spurious associations. Much of the existing research has therefore focused on modeling the association between a phenotype and a single genetic variant in a linear mixed model with a random effect. However, this univariate approach may miss true associations due to the stringent significance thresholds required to reduce the number of false positives, and it also ignores the correlations between markers. We propose an alternative method for fitting high-dimensional multivariable models, which selects SNPs that are independently associated with the phenotype while also accounting for population structure. We provide an efficient implementation of our algorithm and show through simulation studies and real data examples that our method outperforms existing methods in terms of prediction accuracy and controlling the false discovery rate.

]]>
<![CDATA[An improved 7K SNP array, the C7AIR, provides a wealth of validated SNP markers for rice breeding and genetics studies]]> https://www.researchpad.co/article/elastic_article_14581 Single nucleotide polymorphisms (SNPs) are highly abundant, amenable to high-throughput genotyping, and useful for a number of breeding and genetics applications in crops. SNP frequencies vary depending on the species and populations under study, and therefore target SNPs need to be carefully selected to be informative for each application. While multiple SNP genotyping systems are available for rice (Oryza sativa L. and its relatives), they vary in their informativeness, cost, marker density, speed, flexibility, and data quality. In this study, we report the development and performance of the Cornell-IR LD Rice Array (C7AIR), a second-generation SNP array containing 7,098 markers that improves upon the previously released C6AIR. The C7AIR is designed to detect genome-wide polymorphisms within and between subpopulations of O. sativa, as well as O. glaberrima, O. rufipogon and O. nivara. The C7AIR combines top-performing SNPs from several previous rice arrays, including 4,007 SNPs from the C6AIR, 2,056 SNPs from the High Density Rice Array (HDRA), 910 SNPs from the 384-SNP GoldenGate sets, 189 SNPs from the 44K array selected to add information content for elite U.S. tropical japonica rice varieties, and 8 trait-specific SNPs. To demonstrate its utility, we carried out a genome-wide association analysis of plant height using the C7AIR across a diversity panel of 189 rice accessions and identified 20 QTLs contributing to plant height. The C7AIR SNP chip has so far been used for genotyping >10,000 rice samples. It successfully differentiates the five subpopulations of Oryza sativa, identifies introgressions from wild and exotic relatives, and is useful for quantitative trait loci (QTL) and association mapping in diverse materials.
Moreover, data from the C7AIR provides valuable information that can be used to select informative and reliable SNP markers for conversion to lower-cost genotyping platforms for genomic selection and other downstream applications in breeding.

]]>
<![CDATA[A framework for gene mapping in wheat demonstrated using the Yr7 yellow rust resistance gene]]> https://www.researchpad.co/article/N8aa5bdf2-6390-43c2-aef2-b7a76659179a

We used three approaches to map the yellow rust resistance gene Yr7 and identify associated SNPs in wheat. First, we used a traditional QTL mapping approach with a doubled haploid (DH) population and mapped Yr7 to a low-recombination region of chromosome 2B. To fine map the QTL, we then used an association mapping panel. Both populations were genotyped with a SNP array, allowing alignment of the QTL and genome-wide association scans based on common segregating SNPs. Analysis of the association panel spanning the QTL interval narrowed the interval down to a single haplotype block. Finally, we used mapping-by-sequencing of resistant and susceptible DH bulks to identify a candidate gene in the interval showing high homology to a previously suggested Yr7 candidate, and to populate the Yr7 interval with a higher density of polymorphisms. We highlight the power of combining mapping-by-sequencing, which delivers a complete list of gene-based segregating polymorphisms in the interval, with the high-recombination, low-LD precision of the association mapping panel. Our mapping-by-sequencing methodology is applicable to any trait, and our results validate the approach in wheat, where, with a near-complete reference genome sequence, we were able to define a small interval containing the causative gene.
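The comparison of resistant and susceptible DH bulks in mapping-by-sequencing is often summarized per SNP as a difference in alternate-allele frequency between the bulks. The sketch below computes such a "delta SNP-index" from read counts; the statistic itself, the 0.8 threshold, the class and function names, and the toy counts are illustrative assumptions, not the authors' published methodology.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class BulkCounts:
    """Read counts for one SNP in the resistant (res) and susceptible (sus) bulks."""
    position: int
    res_ref: int
    res_alt: int
    sus_ref: int
    sus_alt: int


def snp_index(ref: int, alt: int) -> float:
    """Fraction of reads carrying the alternate allele (NaN when there is no coverage)."""
    depth = ref + alt
    return alt / depth if depth > 0 else math.nan


def delta_snp_index(site: BulkCounts) -> float:
    """Resistant-bulk SNP-index minus susceptible-bulk SNP-index, in [-1, 1]."""
    return snp_index(site.res_ref, site.res_alt) - snp_index(site.sus_ref, site.sus_alt)


def candidate_positions(sites: List[BulkCounts], threshold: float = 0.8) -> List[int]:
    """Positions where the two bulks are close to fixed for opposite alleles.

    NaN deltas (no coverage in a bulk) never pass, because comparisons with NaN are False.
    """
    return [s.position for s in sites if abs(delta_snp_index(s)) >= threshold]


# Hypothetical read counts at four SNPs.
sites = [
    BulkCounts(100, 0, 30, 28, 2),    # delta ~ +0.93: linked
    BulkCounts(200, 15, 15, 14, 16),  # delta ~ -0.03: unlinked
    BulkCounts(300, 30, 0, 0, 30),    # delta = -1.0: linked, opposite phase
    BulkCounts(400, 0, 0, 10, 10),    # no coverage in the resistant bulk
]
print(candidate_positions(sites))  # [100, 300]
```

Because DH lines are essentially homozygous, SNPs linked to the resistance gene are expected to approach opposite fixation in the two bulks, which is why the absolute difference is used.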

]]>
<![CDATA[Mixed evidence for the relationship between periodontitis and Alzheimer’s disease: A bidirectional Mendelian randomization study]]> https://www.researchpad.co/article/N89b89fe7-2f39-423b-9f5f-6e2e7b2736b5

Recent experimental studies indicated that a periodontitis-causing bacterium might be a causal factor for Alzheimer’s disease (AD). We applied a two-sample Mendelian randomization (MR) approach to examine the potential causal relationship between chronic periodontitis and AD bidirectionally in populations of European ancestry, using publicly available data from genome-wide association studies (GWAS) of periodontitis and AD. Five single-nucleotide polymorphisms (SNPs) were used as instrumental variables for periodontitis. For the MR analysis of periodontitis on risk of AD, the causal odds ratio (OR) and 95% confidence interval (CI) were derived from the GWAS of periodontitis (4,924 cases vs. 7,301 controls) and from the GWAS of AD (21,982 cases vs. 41,944 controls). Seven non-overlapping SNPs from another, more recent GWAS of periodontitis were used to validate this association. Twenty SNPs were used as instrumental variables for AD. For the MR analysis of liability to AD on risk of periodontitis, the causal OR was derived from the GWAS of AD including 30,344 cases and 52,427 controls and from the GWAS of periodontitis consisting of 12,289 cases and 22,326 controls. We employed multiple MR methods. Using the five SNPs as instruments for periodontitis, there was suggestive evidence of genetically predicted periodontitis being associated with a higher risk of AD (OR 1.10, 95% CI 1.02 to 1.19, P = 0.02). However, this association was not verified using the seven independent SNPs (OR 0.97, 95% CI 0.87 to 1.08, P = 0.59). There was no association of genetically predicted AD with the risk of periodontitis (OR 1.00, 95% CI 0.96 to 1.04, P = 0.85). In summary, we did not find convincing evidence to support periodontitis being a causal factor for the development of AD. There was also limited evidence to suggest that genetic liability to AD is associated with the risk of periodontitis.
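The abstract does not say which MR estimators were used beyond "multiple methods". As a minimal sketch, the widely used fixed-effect inverse-variance weighted (IVW) estimator combines per-SNP summary statistics as below. The function names and the three-SNP input are hypothetical, and every instrument is assumed to have a non-zero effect on the exposure.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_x: Sequence[float], beta_y: Sequence[float],
                 se_y: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) MR estimate and its standard error.

    beta_x: per-SNP effects on the exposure (e.g. periodontitis liability);
    beta_y, se_y: per-SNP effects on the outcome (log odds of AD) and their SEs.
    Each Wald ratio beta_y / beta_x is weighted by beta_x**2 / se_y**2.
    """
    weights = [bx * bx / (s * s) for bx, s in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]  # assumes bx != 0
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    return beta, math.sqrt(1.0 / total)


def to_odds_ratio(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into (OR, lower 95% bound, upper 95% bound)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three instruments.
beta, se = ivw_estimate([0.2, 0.1, 0.3], [0.1, 0.05, 0.15], [0.05, 0.05, 0.05])
print(round(beta, 6), round(se, 4))  # 0.5 0.1336
print(to_odds_ratio(beta, se))
```

Other MR estimators (for example, those robust to pleiotropy) would weight or model the same per-SNP ratios differently; IVW is shown only because it is the simplest common choice.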

]]>
<![CDATA[Genome-wide haplotype-based association analysis of key traits of plant lodging and architecture of maize identifies major determinants for leaf angle: hapLA4]]> https://www.researchpad.co/article/5c89773ed5eed0c4847d27e7

Traits related to plant lodging and architecture are important determinants of plant productivity in intensive maize cultivation systems. To identify genomic associations with leaf angle, plant height (PH), ear height (EH) and the EH/PH ratio, we characterized approximately 7,800 haplotypes from a set of high-quality single nucleotide polymorphisms (SNPs) in an association panel consisting of tropical maize inbred lines. The proportion of the phenotypic variation explained by individual SNPs ranged from 7%, for the SNP S1_285330124 (located on chromosome 9 and associated with the EH/PH ratio), to 22%, for the SNP S1_317085830 (located on chromosome 6 and associated with leaf angle). A total of 40 haplotype blocks were significantly associated with the traits of interest, explaining up to 29% of the phenotypic variation for leaf angle; this maximum corresponded to the haplotype hapLA4.04, which was stable over two growing seasons. Overall, the associations for PH, EH and the EH/PH ratio were environment-specific, which was confirmed by a model comparison analysis using the Akaike and Schwarz information criteria. In addition, five stable haplotypes (83%) and 15 SNPs (75%) were identified for leaf angle. Finally, approximately 62% of the associated haplotypes (25/40) did not contain SNPs detected in the association study using individual SNP markers. This result confirms the advantage of haplotype-based genome-wide association studies for examining genomic regions that control the traits determining architecture and lodging in maize plants.
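The model comparison mentioned above relies on the Akaike (AIC) and Schwarz (BIC) information criteria. A minimal sketch of how the two criteria are computed, and how they can disagree because BIC penalizes extra parameters more heavily, using hypothetical log-likelihoods, parameter counts and model names (not values from the study):

```python
import math
from typing import Dict, Tuple


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Schwarz (Bayesian) information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def preferred(models: Dict[str, Tuple[float, int]], n_obs: int) -> Tuple[str, str]:
    """Return (best model by AIC, best model by BIC) from {name: (log_lik, n_params)}."""
    best_aic = min(models, key=lambda m: aic(*models[m]))
    best_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n_obs))
    return best_aic, best_bic


# Hypothetical fits: one effect across seasons vs. season-specific effects.
fits = {"across_env": (-210.0, 3), "env_specific": (-204.0, 6)}
print(preferred(fits, n_obs=200))  # ('env_specific', 'across_env')
```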

]]>
<![CDATA[Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci]]> https://www.researchpad.co/article/5c7ee7c7d5eed0c4848f4db2

Cystic Fibrosis (CF) exhibits morbidity in several organs, including progressive lung disease in all patients and intestinal obstruction at birth (meconium ileus) in ~15%. Individuals with the same causal CFTR mutations show variable disease presentation, which is partly attributed to modifier genes. With >6,500 participants from the International CF Gene Modifier Consortium, genome-wide association investigation identified a new modifier locus for meconium ileus encompassing ATP12A on chromosome 13 (min p = 3.83 × 10⁻¹⁰); replicated loci encompassing SLC6A14 on chromosome X and SLC26A9 on chromosome 1 (min p < 2.2 × 10⁻¹⁶ and 2.81 × 10⁻¹¹, respectively); and replicated a suggestive locus on chromosome 7 near PRSS1 (min p = 2.55 × 10⁻⁷). PRSS1 is exclusively expressed in the exocrine pancreas and was previously associated with non-CF pancreatitis, with functional characterization demonstrating an impact on PRSS1 gene expression. We thus asked whether the other meconium ileus modifier loci impact gene expression, and in which organ. We developed and applied a colocalization framework called the Simple Sum (SS) that integrates regulatory and genetic association information and also contrasts colocalization evidence across tissues or genes. The associated modifier loci colocalized with expression quantitative trait loci (eQTLs) for ATP12A (p = 3.35 × 10⁻⁸), SLC6A14 (p = 1.12 × 10⁻¹⁰) and SLC26A9 (p = 4.48 × 10⁻⁵) in the pancreas, even though meconium ileus manifests in the intestine. The meconium ileus susceptibility locus on chromosome X appeared shifted in location from a previously identified locus for CF lung disease severity. Using the SS, we integrated the lung disease association locus with eQTLs from nasal epithelia of 63 CF participants and demonstrated evidence of colocalization with airway-specific regulation of SLC6A14 (p = 2.3 × 10⁻⁴).
Cystic Fibrosis is realizing the promise of personalized medicine, and identifying the contributing organ and understanding the tissue specificity of a gene modifier are essential for the next phase of personalizing therapeutic strategies.

]]>
<![CDATA[Fast and flexible linear mixed models for genome-wide genetics]]> https://www.researchpad.co/article/5c6730aed5eed0c484f37eb1

Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully specified models are computationally demanding, and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time required by existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first focuses on accounting for spatial variability and non-additive genetic variance while scanning for QTL; the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.

]]>
<![CDATA[An African-specific haplotype in MRGPRX4 is associated with menthol cigarette smoking]]> https://www.researchpad.co/article/5c706741d5eed0c4847c6cc6

In the U.S., more than 80% of African-American smokers use mentholated cigarettes, compared to less than 30% of Caucasian smokers. The reasons for these differences are not well understood. To determine if genetic variation contributes to mentholated cigarette smoking, we performed an exome-wide association analysis in a multiethnic population-based sample from Dallas, TX (N = 561). Findings were replicated in an independent cohort of African Americans from Washington, DC (N = 741). We identified a haplotype of MRGPRX4 (composed of rs7102322[G], encoding N245S, and rs61733596[G], T43T) that was associated with a 5- to 8-fold increase in the odds of menthol cigarette smoking. The variants are present solely in persons of African ancestry. Functional studies indicated that the variant G protein-coupled receptor encoded by MRGPRX4 displays reduced agonism in both arrestin-based and G protein-based assays, as well as altered agonism by menthol. These data indicate that genetic variation in MRGPRX4 contributes to inter-individual and inter-ethnic differences in the preference for mentholated cigarettes, and that the existence of genetic factors predisposing vulnerable populations to mentholated cigarette smoking can inform tobacco control and public health policies.

]]>
<![CDATA[Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA)]]> https://www.researchpad.co/article/5c6dc9d5d5eed0c48452a2b4

Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple-testing adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) to identify associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data from 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF on 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001), which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
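The topic-modeling step can be illustrated with a small NMF sketch: a non-negative patient × phenotype matrix V is factored into patient-topic loadings W and topic-phenotype weights H using the Lee-Seung multiplicative updates for squared (Frobenius) error, and one topic's loadings are then correlated with allele dosage. The update rule, the Pearson correlation as the association measure, the toy matrix and the dosages are assumptions for illustration; the abstract does not give the study's exact NMF settings or statistical test.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, k: int, n_iter: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (patients x phenotypes) into W (patients x topics)
    and H (topics x phenotypes) with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)  # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(k)]
        ht = transpose(h)  # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)] for i in range(n)]
    return w, h


def pearson(x: List[float], y: List[float]) -> float:
    """Correlation between per-patient topic loadings and allele dosage."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical 6 patients x 6 phenotype codes with two clear "topics".
V = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
W, H = nmf(V, k=2)
dosage = [2, 1, 2, 0, 0, 1]  # hypothetical rs10455872 allele counts
print(pearson([row[0] for row in W], dosage))
```

Which column of W corresponds to which topic depends on the random initialization, so in practice topics are labeled by inspecting their top-weighted phenotypes in H.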

]]>
<![CDATA[A novel resistance gene for bacterial blight in rice, Xa43(t) identified by GWAS, confirmed by QTL mapping using a bi-parental population]]> https://www.researchpad.co/article/5c6c75dbd5eed0c4843d0337

Bacterial blight (BB), caused by the pathogen Xanthomonas oryzae pv. oryzae (Xoo), is a significant disease in most rice cultivation areas. The disease is estimated to cause annual rice production losses of 20–30 percent throughout rice-growing countries in Asia. The discovery and deployment of durable resistance genes for BB is an effective and sustainable means of mitigating production losses. In this study, QTL analysis and fine mapping were performed using an F2 and a BC2F2 population derived from a cross with a new R-donor having broad-spectrum resistance to Korean BB races. The QTL qBB11 was identified by composite interval mapping, explained 31.25% of the phenotypic variation (R²) with a LOD score of 43.44, and harbored two SNP markers. The single major R-gene was designated Xa43(t). By dissecting the target region, we narrowed it to 27.83–27.95 Mbp, a physical interval of about 119 kb delimited by the two flanking markers IBb27os11_14 and S_BB11.ssr_9. Of the nine ORFs in the target region, two showed significantly different expression levels. From these results we developed a marker specific to this R-gene, which will be useful for future BB resistance breeding and/or R-gene pyramiding using marker-assisted selection. Further characterization of the R-gene would help to enhance understanding of the mechanisms of BB resistance in rice.

]]>
<![CDATA[Variance components for bovine tuberculosis infection and multi-breed genome-wide association analysis using imputed whole genome sequence data]]> https://www.researchpad.co/article/5c6f1539d5eed0c48467af0c

Bovine tuberculosis (bTB) is an infectious disease of cattle generally caused by Mycobacterium bovis, a bacterium that can also cause disease in humans. Since the 1950s, the objective of the national bTB eradication program in the Republic of Ireland has been the biological extinction of bTB; that goal has yet to be achieved. The objectives of the present study were to develop the statistical methodology and estimate the variance components needed to undertake routine genetic evaluations for resistance to bTB; also of interest was the detection of regions of the bovine genome putatively associated with bTB infection in dairy and beef breeds. The novelty of the present study, in terms of research on bTB infection, was the use of beef breeds in the genome-wide association analysis and the use of imputed whole genome sequence data. Phenotypic bTB data on 781,270 animals, together with imputed whole genome sequence data on 7,346 of these animals’ sires, were available. Linear mixed models were used to quantify variance components for bTB, and estimated breeding values (EBVs) were validated. Within-breed and multi-breed genome-wide associations were undertaken using a single-SNP regression approach. The estimated genetic standard deviation (0.09), heritability (0.12), and repeatability (0.30) substantiate that genetic selection can help to eradicate bTB. The multi-breed genome-wide association analysis identified 38 SNPs and 64 QTL regions associated with bTB infection; two QTL regions (both on BTA23) identified in the multi-breed analysis overlapped with those from the within-breed analyses of Charolais, Limousin, and Holstein-Friesian. Results from the association analysis, coupled with previous studies, suggest bTB is controlled by an infinitely large number of loci, each having a small effect. The methodology and results from the present study will be used to develop national genetic evaluations for bTB in the Republic of Ireland.
In addition, results can also be used to help uncover the biological architecture underlying resistance to bTB infection in cattle.
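The reported genetic standard deviation, heritability and repeatability are linked through the variance components of a repeatability animal model. Assuming additive genetic, permanent environmental and residual components (an assumption; the abstract does not list the model terms), the sketch below computes both ratios and back-calculates the components implied by the reported values (0.09, 0.12, 0.30). The implied components are arithmetic illustrations, not estimates reported by the study, and the function names are hypothetical.

```python
from typing import Tuple


def heritability(var_a: float, var_pe: float, var_e: float) -> float:
    """Narrow-sense heritability: additive genetic variance / phenotypic variance."""
    return var_a / (var_a + var_pe + var_e)


def repeatability(var_a: float, var_pe: float, var_e: float) -> float:
    """Share of phenotypic variance repeatable across an animal's records
    (additive genetic + permanent environment)."""
    return (var_a + var_pe) / (var_a + var_pe + var_e)


def implied_components(genetic_sd: float, h2: float, rep: float) -> Tuple[float, float, float]:
    """Back-calculate (var_a, var_pe, var_e) from a genetic SD, h2 and repeatability,
    assuming the genetic SD refers to the additive genetic component."""
    var_a = genetic_sd ** 2
    var_p = var_a / h2            # phenotypic variance
    var_pe = rep * var_p - var_a  # repeatable part not explained by additive effects
    var_e = var_p - rep * var_p   # non-repeatable residual
    return var_a, var_pe, var_e


components = implied_components(0.09, 0.12, 0.30)
print([round(c, 5) for c in components])  # [0.0081, 0.01215, 0.04725]
```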

]]>
<![CDATA[Training set optimization of genomic prediction by means of EthAcc]]> https://www.researchpad.co/article/5c75ac8dd5eed0c484d08a24

Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows the deployment of genomic selection is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in the case of a structured population, and in accordance with the achieved accuracy, EthAcc showed that the largest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as the start of the optimization algorithm. EthAcc’s precision and algorithm issues prevent it from reaching a good training set from a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.

]]>
<![CDATA[Genes encoding SATB2-interacting proteins in adult cerebral cortex contribute to human cognitive ability]]> https://www.researchpad.co/article/5c648d17d5eed0c484c81f75

During CNS development, the nuclear protein SATB2 is expressed in superficial cortical layers and determines projection neuron identity. In the adult CNS, SATB2 is expressed in pyramidal neurons of all cortical layers and is a regulator of synaptic plasticity and long-term memory. Common variation in the SATB2 locus confers risk of schizophrenia, whereas rare, de novo structural and single nucleotide variants cause severe intellectual disability and absent or limited speech. To characterize differences in SATB2 molecular function in developing vs adult neocortex, we isolated SATB2 protein interactomes at the two ontogenetic stages and identified multiple novel SATB2 interactors. SATB2 interactomes are highly enriched for proteins that stabilize de novo chromatin loops. The comparison between the neonatal and adult SATB2 protein complexes indicates a developmental shift in SATB2 molecular function, from transcriptional repression towards organization of chromosomal superstructure. Accordingly, gene sets regulated by SATB2 in the neocortex of neonatal and adult mice show limited overlap. Genes encoding SATB2 protein interactors were grouped for gene set analysis of human GWAS data. Common variants associated with human cognitive ability are enriched within the genes encoding adult but not neonatal SATB2 interactors. Our data support a shift in the function of SATB2 in cortex over the lifetime and indicate that regulation of spatial chromatin architecture by the SATB2 interactome contributes to cognitive function in the general population.

]]>
<![CDATA[Evidence of a causal relationship between body mass index and psoriasis: A mendelian randomization study]]> https://www.researchpad.co/article/5c5ca31cd5eed0c48441f191

Background

Psoriasis is a common inflammatory skin disease that has been reported to be associated with obesity. We aimed to investigate a possible causal relationship between body mass index (BMI) and psoriasis.

Methods and findings

Following a review of published epidemiological evidence of the association between obesity and psoriasis, mendelian randomization (MR) was used to test for a causal relationship with BMI. We used a genetic instrument comprising 97 single-nucleotide polymorphisms (SNPs) associated with BMI as a proxy for BMI (expected to be much less confounded than measured BMI). One-sample MR was conducted using individual-level data (396,495 individuals) from the UK Biobank and the Nord-Trøndelag Health Study (HUNT), Norway. Two-sample MR was performed with summary-level data (356,926 individuals) from published BMI and psoriasis genome-wide association studies (GWASs). The one-sample and two-sample MR estimates were meta-analysed using a fixed-effect model. To test for a potential reverse causal effect, we used MR analysis with genetic instruments comprising variants from recent genome-wide analyses of psoriasis to assess whether genetic risk for this skin disease has a causal effect on BMI.
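The fixed-effect meta-analysis of the one-sample and two-sample MR estimates can be sketched as standard inverse-variance weighting of log odds ratios. The inputs below are hypothetical, not the study's estimates, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect pooling of log odds ratios.

    Returns (pooled log OR, its standard error, two-sided P value from a normal z-test).
    """
    weights = [1.0 / (s * s) for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled, se, p


# Hypothetical one-sample and two-sample MR estimates (log OR per 1 kg/m^2 BMI).
pooled, se, p = fixed_effect_meta([0.10, 0.20], [0.10, 0.10])
print(round(math.exp(pooled), 3), round(se, 4), round(p, 4))  # 1.162 0.0707 0.0339
```

A fixed-effect model assumes both MR designs estimate the same underlying causal effect; a random-effects model would add a between-estimate variance term.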

Published observational data showed an association of higher BMI with psoriasis. A mean difference in BMI of 1.26 kg/m² (95% CI 1.02–1.51) between psoriasis cases and controls was observed in adults, while a 1.55 kg/m² mean difference (95% CI 1.13–1.98) was observed in children. The observational association was confirmed in the UK Biobank and HUNT data sets. Overall, a 1 kg/m² increase in BMI was associated with 4% higher odds of psoriasis (meta-analysis odds ratio [OR] = 1.04; 95% CI 1.03–1.04; P = 1.73 × 10⁻⁶⁰). MR analyses provided evidence that higher BMI causally increases the odds of psoriasis (by 9% per 1 unit increase in BMI; OR = 1.09 (1.06–1.12) per 1 kg/m²; P = 4.67 × 10⁻⁹). In contrast, MR estimates gave little support to a possible causal effect of psoriasis genetic risk on BMI (0.004 kg/m² change in BMI per doubling of the odds of psoriasis [−0.003 to 0.011]). Limitations of our study include possible misreporting of psoriasis by patients, as well as potential misdiagnosis by clinicians. In addition, there is limited ethnic variation in the cohorts studied.

Conclusions

Our study, using genetic variants as instrumental variables for BMI, provides evidence that higher BMI leads to a higher risk of psoriasis. This supports the prioritization of therapies and lifestyle interventions aimed at controlling weight for the prevention or treatment of this common skin disease. Mechanistic studies are required to improve understanding of this relationship.

]]>
<![CDATA[Genome-wide association mapping of grain yield in a diverse collection of spring wheat (Triticum aestivum L.) evaluated in southern Australia]]> https://www.researchpad.co/article/5c61e8c8d5eed0c48496f180

Wheat landraces, wild relatives and other ‘exotic’ accessions are important sources of new favorable alleles. The use of those exotic alleles is facilitated by access to information on the association of specific genomic regions with desirable traits. Here, we conducted a genome-wide association study (GWAS) using a wheat panel that includes landraces, synthetic hexaploids and other exotic wheat accessions to identify loci that contribute to increases in grain yield in southern Australia. The 568 accessions were grown in the field during the 2014 and 2015 seasons and measured for plant height, maturity, spike length, spike number, grain yield, plant biomass, harvest index (HI) and thousand-grain weight (TGW). We used the 90K SNP array and two GWAS approaches (GAPIT and QTCAT) to identify loci associated with the different traits. We identified 17 loci with GAPIT and 25 with QTCAT. Ten of these loci were associated with known genes that are routinely employed in marker-assisted selection, such as Ppd-D1 for maturity and Rht-D1 for plant height, and seven of those were detected with both methods. We identified one locus for yield per se in 2014, on chromosome 6B with QTCAT, and three in 2015, on chromosomes 4B and 5A with GAPIT and 6B with QTCAT. The 6B loci corresponded to the same region in both years. The favorable haplotypes for yield at the 5A and 6B loci are widespread in Australian accessions, with 112 out of 153 carrying the favorable haplotype at the 5A locus and 136 out of 146 carrying the favorable haplotype at the 6B locus, while the favorable haplotype at 4B is present in only 65 out of 149 Australian accessions. The low number of yield QTL in our study is consistent with other GWAS for yield in wheat, where most of the identified loci have very small effects.

]]>
<![CDATA[Finding the needle in a haystack: Mapping antifungal drug resistance in fungal pathogen by genomic approaches]]> https://www.researchpad.co/article/5c5ca2fad5eed0c48441eee3
]]>
<![CDATA[Integrating predicted transcriptome from multiple tissues improves association detection]]> https://www.researchpad.co/article/5c50c43bd5eed0c4845e8359

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and the absence of relevant developmental and disease contexts restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than when using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual-level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual-level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.

]]>
<![CDATA[SNP variable selection by generalized graph domination]]> https://www.researchpad.co/article/5c536b67d5eed0c484a48932

Background

High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information regarding the p ≫ n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as interpretability of the derived models.

Methods and findings

The k-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph, and (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP excluded from the subset has at least k neighbors in the selected subset. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated on a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples. C++ source code that implements SNP-SELECT and uses the Gurobi optimization solver for k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
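The graph construction and selection criterion above can be sketched in Python. Note that SNP-SELECT solves for a minimum k-dominating set with the Gurobi solver; the greedy heuristic below is a swapped-in technique that always returns a valid k-dominating set but not necessarily the smallest one. The similarity matrix, threshold and function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def build_snp_graph(similarity: List[List[float]], threshold: float) -> Graph:
    """Vertices are SNPs; i and j are adjacent when similarity[i][j] exceeds threshold."""
    n = len(similarity)
    graph: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def is_k_dominating(graph: Graph, selected: Set[int], k: int) -> bool:
    """Every SNP outside `selected` must have at least k neighbors inside it."""
    return all(len(graph[v] & selected) >= k for v in graph if v not in selected)


def greedy_k_dominating_set(graph: Graph, k: int) -> Set[int]:
    """Greedy heuristic (not guaranteed minimum): repeatedly add the SNP that
    removes the most remaining domination deficit."""
    selected: Set[int] = set()

    def deficit(v: int) -> int:
        return 0 if v in selected else max(0, k - len(graph[v] & selected))

    def gain(u: int) -> int:
        # u's own deficit disappears, and each still-deficient neighbor gains one.
        return deficit(u) + sum(1 for w in graph[u] if deficit(w) > 0)

    while any(deficit(v) > 0 for v in graph):
        best = max((u for u in graph if u not in selected), key=gain)
        selected.add(best)
    return selected


# Hypothetical r^2 matrix: SNPs 0-2 form one LD block, SNPs 3-4 another.
r2 = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.85, 1.0],
]
g = build_snp_graph(r2, threshold=0.8)
print(sorted(greedy_k_dominating_set(g, k=1)))  # [0, 3]
```

With k = 1 the sketch keeps one representative per LD block; larger k forces more representatives per block, and SNPs with fewer than k neighbors are always selected.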

]]>