ResearchPad - computer-and-information-sciences https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae]]> https://www.researchpad.co/article/elastic_article_11231 In Rubiaceae phylogenetics, the limited number of markers has often prevented authors from producing well-supported trees at tribal and generic levels. A robust phylogeny is a prerequisite for studying the evolutionary patterns of traits at different taxonomic levels. Advances in next-generation sequencing technologies have revolutionized biology by providing, at reduced cost, huge amounts of data for an increased number of species. Because of their highly conserved structure, general lack of recombination, and mostly uniparental inheritance, chloroplast DNA sequences have long been used as markers of choice for plant phylogeny reconstruction. The main objectives of this study are: 1) to gain insight into chloroplast genome evolution in the Rubiaceae (Ixoroideae) through an efficient methodology for de novo assembly of plastid genomes; and 2) to test the efficiency of mining SNPs in the nuclear genome of Ixoroideae using a coffee reference genome to produce well-supported nuclear trees. We assembled whole chloroplast genome sequences for 27 species of the Rubiaceae subfamily Ixoroideae using next-generation sequences. Analysis of the plastid genome structure reveals relatively good conservation of gene content and order. Generally, low variation was observed between taxa in the boundary regions, with the exception of the inverted repeat at both the large and small single-copy junctions for some taxa. An average of 79% of the SNPs identified in the genus Coffea are transferable to Ixoroideae, with variation ranging from 35% to 96%. In general, the plastid and the nuclear genome phylogenies are congruent with each other. 
They are well-resolved with well-supported branches. Generally, the tribes form well-identified clades but the tribe Sherbournieae is shown to be polyphyletic. The results are discussed relative to the methodology used and the chloroplast genome features in Rubiaceae and compared to previous Rubiaceae phylogenies.

]]>
<![CDATA[A model for the assessment of bluetongue virus serotype 1 persistence in Spain]]> https://www.researchpad.co/article/elastic_article_11225 Bluetongue virus (BTV) is an arbovirus of ruminants that has been circulating in Europe continuously for more than two decades and has become endemic in some countries such as Spain. Spain is ideal for BTV epidemiological studies since BTV outbreaks from different sources and serotypes have occurred continuously there since 2000; BTV-1 was reported there from 2007 to 2017. Here we develop a model of the BTV-1 endemic scenario to estimate the risk of an area becoming endemic, as well as to identify the most influential factors for BTV-1 persistence. We created abundance maps at 1-km² spatial resolution for the main vectors in Spain, Culicoides imicola and the Obsoletus and Pulicaris complexes, by combining environmental satellite data with occurrence models and a random forest machine learning algorithm. The endemic model included vector abundance and host-related variables (farm density). The three most relevant variables in the endemic model were the abundance of C. imicola, the abundance of the Obsoletus complex, and the density of goat farms (AUC 0.86); this model suggests that BTV-1 is more likely to become endemic in central and southwestern regions of Spain. The model requires only host- and vector-related variables to identify areas at greater risk of becoming endemic for bluetongue. Our results highlight the importance of suitable Culicoides spp. prediction maps for bluetongue epidemiological studies and decision-making about control and eradication measures.

]]>
<![CDATA[The Language of Innovation]]> https://www.researchpad.co/article/elastic_article_10245 Predicting innovation is a peculiar problem in data science. By definition, an innovation is always a never-seen-before event, leaving no room for traditional supervised learning approaches. Here we propose a strategy to address the problem in the context of innovative patents, by defining innovations as never-seen-before associations of technologies and exploiting self-supervised learning techniques. We think of the technological codes present in patents as a vocabulary and the whole technological corpus as written in a specific, evolving language. We leverage this structure with techniques borrowed from Natural Language Processing by embedding technologies in a high-dimensional Euclidean space where relative positions are representative of learned semantics. Proximity in this space is an effective predictor of specific innovation events that outperforms a wide range of standard link-prediction metrics. The success of patented innovations follows complex dynamics characterized by different patterns, which we analyze in detail with specific examples. The methods proposed in this paper provide a new way of understanding and forecasting innovation and open interesting scenarios for a number of applications and further analytic approaches.
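
As a rough illustration of how proximity in an embedding space can rank candidate technology pairs, the sketch below scores pairs by cosine similarity. The choice of cosine similarity, the code names, and the toy vectors are all assumptions made for illustration; the abstract does not specify the proximity measure or the embedding model used.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidate_pairs(embeddings, pairs):
    """Return (score, code_a, code_b) tuples, most similar pair first."""
    scored = [(cosine_similarity(embeddings[a], embeddings[b]), a, b)
              for a, b in pairs]
    return sorted(scored, reverse=True)

# Hypothetical 3-dimensional embeddings for made-up technology codes.
toy_embeddings = {
    "codeA": [1.0, 0.0, 0.0],
    "codeB": [0.9, 0.1, 0.0],
    "codeC": [0.0, 0.0, 1.0],
}
ranking = rank_candidate_pairs(
    toy_embeddings, [("codeA", "codeB"), ("codeA", "codeC")]
)
```

In this toy setup the pair whose vectors point in nearly the same direction is ranked first, which is the sense in which a never-seen-before pairing of nearby codes would be flagged as a likely future association.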

]]>
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869 Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.

]]>
<![CDATA[Medusa: Software to build and analyze ensembles of genome-scale metabolic network reconstructions]]> https://www.researchpad.co/article/elastic_article_7734 Uncertainty in the structure and parameters of networks is ubiquitous across computational biology. In constraint-based reconstruction and analysis of metabolic networks, this uncertainty is present both during the reconstruction of networks and in simulations performed with them. Here, we present Medusa, a Python package for the generation and analysis of ensembles of genome-scale metabolic network reconstructions. Medusa builds on the COBRApy package for constraint-based reconstruction and analysis by compressing a set of models into a compact ensemble object, providing functions for the generation of ensembles using experimental data, and extending constraint-based analyses to ensemble scale. We demonstrate how Medusa can be used to generate ensembles and perform ensemble simulations, and how machine learning can be used in conjunction with Medusa to guide the curation of genome-scale metabolic network reconstructions. Medusa is available under the permissive MIT license from the Python Package Index (https://pypi.org) and from GitHub (https://github.com/opencobra/Medusa), and comprehensive documentation is available at https://medusa.readthedocs.io/en/latest.

]]>
<![CDATA[Mechanism to prevent the abuse of IPv6 fragmentation in OpenFlow networks]]> https://www.researchpad.co/article/elastic_article_7717 OpenFlow makes a network highly flexible and fast-evolving by separating the control and data planes. The control plane thus becomes responsive to changes in topology and load balancing requirements. OpenFlow also offers a new approach to handling security threats accurately and responsively. It is therefore used as an innovative firewall that acts as first-hop security to protect networks against malicious users. However, the firewall provided by OpenFlow is vulnerable to Internet Protocol version 6 (IPv6) fragmentation, which can be used to bypass it. The OpenFlow firewall cannot identify the message payload unless the switch implements IPv6 fragment reassembly. This study tests the IPv6 fragmented packets that can evade the OpenFlow firewall, and proposes a new mechanism to guard against attacks in which malicious users exploit the IPv6 fragmentation loophole in OpenFlow networks. The proposed mechanism is evaluated in a simulated environment using six scenarios, and the results show that it effectively fixes the loophole and successfully prevents the abuse of IPv6 fragmentation in OpenFlow networks.
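
To make the fragmentation problem more concrete, the following sketch decodes the 8-byte IPv6 Fragment extension header as laid out in RFC 8200 (Next Header value 44). It is only an illustration of why a filter that matches on upper-layer fields has nothing to match in non-first fragments; it is not the mechanism proposed in the study, and the function names are hypothetical.

```python
import struct

FRAGMENT_NEXT_HEADER = 44  # Next Header value that announces a Fragment header

def parse_fragment_header(data):
    """Decode the 8-byte IPv6 Fragment extension header (RFC 8200 layout)."""
    if len(data) < 8:
        raise ValueError("Fragment header needs 8 bytes")
    # 1 byte Next Header, 1 byte reserved, 2 bytes offset/flags, 4 bytes ID.
    next_header, _reserved, offset_flags, identification = struct.unpack(
        "!BBHI", data[:8]
    )
    return {
        "next_header": next_header,
        "fragment_offset": offset_flags >> 3,       # in units of 8 octets
        "more_fragments": bool(offset_flags & 0x1),  # M flag
        "identification": identification,
    }

def is_first_fragment(header):
    """Only the fragment at offset 0 can carry the start of the payload."""
    return header["fragment_offset"] == 0

# Example: a first fragment whose payload starts with TCP (protocol 6).
sample = struct.pack("!BBHI", 6, 0, (0 << 3) | 1, 0xDEADBEEF)
parsed = parse_fragment_header(sample)
```

A fragment with a non-zero offset carries no upper-layer header at all, so a rule keyed on TCP or UDP ports cannot evaluate it without reassembly or per-flow state.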

]]>
<![CDATA[SimSurvey: An R package for comparing the design and analysis of surveys by simulating spatially-correlated populations]]> https://www.researchpad.co/article/elastic_article_8465 Populations often show complex spatial and temporal dynamics, creating challenges in designing and implementing effective surveys. Inappropriate sampling designs can potentially lead to both under-sampling (reducing precision) and over-sampling (through the extensive and potentially expensive sampling of correlated metrics). These issues can be difficult to identify and avoid in sample surveys of fish populations as they tend to be costly and comprised of multiple levels of sampling. Population estimates are therefore affected by each level of sampling as well as the pathway taken to analyze such data. Though simulations are a useful tool for exploring the efficacy of specific sampling strategies and statistical methods, there are a limited number of tools that facilitate the simulation testing of a range of sampling and analytical pathways for multi-stage survey data. Here we introduce the R package SimSurvey, which has been designed to simplify the process of simulating surveys of age-structured and spatially-distributed populations. The package allows the user to simulate age-structured populations that vary in space and time and explore the efficacy of a range of built-in or user-defined sampling protocols to reproduce the population parameters of the known population. SimSurvey also includes a function for estimating the stratified mean and variance of the population from the simulated survey data. We demonstrate the use of this package using a case study and show that it can reveal unexpected sources of bias and be used to explore design-based solutions to such problems. In summary, SimSurvey can serve as a convenient, accessible and flexible platform for simulating a wide range of sampling strategies for fish stocks and other populations that show complex structuring. 
Various statistical approaches can then be applied to the results to test the efficacy of different analytical approaches.
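
For readers unfamiliar with the stratified estimator mentioned above, here is a minimal Python sketch of the textbook stratified mean and its variance with a finite population correction. It is not SimSurvey's R function; the function name, input layout, and toy numbers are assumptions made for illustration.

```python
from statistics import mean, variance

def stratified_estimate(strata):
    """Textbook stratified mean and its estimated variance.

    strata: list of (n_units, samples), where n_units is the total number
    of sampling units in the stratum and samples are the observed values
    (at least two per stratum).
    """
    total_units = sum(n_units for n_units, _ in strata)
    est_mean = 0.0
    est_var = 0.0
    for n_units, samples in strata:
        weight = n_units / total_units   # stratum weight W_h
        n = len(samples)
        fpc = 1 - n / n_units            # finite population correction
        est_mean += weight * mean(samples)
        est_var += weight ** 2 * fpc * variance(samples) / n
    return est_mean, est_var

# Toy survey: two strata with 100 and 300 sampling units.
toy_strata = [(100, [2.0, 4.0]), (300, [10.0, 14.0])]
strat_mean, strat_var = stratified_estimate(toy_strata)
```

Comparing such an estimate with the known mean of a simulated population is one way a simulation study can expose bias in a survey design.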

]]>
<![CDATA[Managing possible serious bacterial infection of young infants where referral is not possible: Lessons from the early implementation experience in Kushtia District learning laboratory, Bangladesh]]> https://www.researchpad.co/article/elastic_article_7649

Serious infections account for 25% of global newborn deaths annually, most in low-resource settings where hospital-based treatment is not accessible or feasible. In Bangladesh, one-third of neonatal deaths are attributable to serious infection; in 2014, the government adopted a new policy for outpatient management of danger signs indicating possible serious bacterial infections (PSBI) when referral was not possible. We conducted implementation research to understand what it takes for a district health team to implement quality outpatient PSBI management per national guidelines.

Methods

PSBI management was introduced as part of the Comprehensive Newborn Care Package in 2015. The study piloted this package through government health systems with limited partner support to inform scale-up efforts. Data collection included facility register reviews for cases seen at primary level facilities; facility readiness and provider knowledge and skills assessments; household surveys capturing caregiver knowledge of newborn danger signs and care-seeking for newborn illness; and follow-up case tracking, capturing treatment adherence and outcomes. Analysis consisted of descriptive statistics.

Results

Over the 15-month implementation period, 1432 young infants received care, of whom 649 (45%) were classified as PSBI. Estimated coverage of care-seeking increased from 22% to 42% during the implementation period. Although facility readiness and providers’ skills increased, providers’ adherence to guidelines was not optimal. Among locally managed PSBI cases, 75% completed the oral antibiotic course and 15% received the fourth day follow-up. Care-seeking from private providers remained high (95%), predominantly from village health doctors (over 80%).

Conclusions

Facility readiness, including health care provider knowledge and skills, was strengthened; future efforts should focus on improving provider adherence to guidelines. Social and behavior change strategies targeting families and communities should explore shifting care-seeking away from private, possibly less-qualified providers. Strategies to improve private sector management of PSBI cases and improved linkages between private and public sector providers could be explored.

]]>
<![CDATA[Adherence to antiretroviral therapy and associated factors among Human immunodeficiency virus positive patients accessing treatment at Nekemte referral hospital, west Ethiopia, 2019]]> https://www.researchpad.co/article/elastic_article_7637

Antiretroviral therapy has a remarkable clinical effect in slowing the progression of Acquired Immune Deficiency Syndrome. The clinical outcome of antiretroviral therapy depends on strict adherence. Poor adherence reduces the effectiveness of antiretroviral therapy and increases viral replication. With changes in service delivery over time and differences in socio-demographic status from region to region, it is essential to measure adherence. Therefore, this study aimed to assess adherence to antiretroviral therapy and its associated factors among HIV/AIDS patients accessing treatment at Nekemte referral hospital, West Ethiopia.

Methods

An institution-based cross-sectional study was conducted on 311 HIV/AIDS patients from March 01 to March 30, 2019. The study participants were selected by a simple random sampling method and interviewed using structured questionnaires. Bivariable logistic regression was conducted to find an association between each independent variable and adherence to antiretroviral medication. Multivariable logistic regression was used to find the independent variables which best predict adherence. Statistical significance was assessed using odds ratios with 95% confidence intervals and a p-value of less than 0.05.

Results

Out of a total of 311 patients sampled, 305 participated in the study, giving a response rate of 98.07%. Of these 305 study participants, 73.1% (95% CI = 68.2, 78.0) were adherent to their medication. Having knowledge about HIV and its treatment (AOR = 8.24, 95% CI: 3.10, 21.92), having strong family/social support (AOR = 6.21, 95% CI: 1.39, 27.62), absence of adverse drug reaction (AOR = 5.33, 95% CI: 1.95, 14.57), absence of comorbidity of other chronic diseases (AOR = 5.72, 95% CI: 1.91, 17.16) and disclosing HIV status to the family (AOR = 5.08, 95% CI: 2.09, 12.34) were significantly associated with an increased likelihood of adherence to antiretroviral medication.

Conclusion

The level of adherence to antiretroviral therapy was found to be low compared with the WHO recommendation. Clinicians should emphasize reducing adverse drug reactions, detecting and treating co-morbidities early, improving knowledge through health education, and encouraging patients to disclose their HIV status to their families.

]]>
<![CDATA[Long-term outcomes after extracorporeal membrane oxygenation in patients with dialysis-requiring acute kidney injury: A cohort study]]> https://www.researchpad.co/article/5c92b361d5eed0c4843a3f31

Background

Acute kidney injury (AKI) is a common complication of extracorporeal membrane oxygenation (ECMO) treatment. The aim of this study was to elucidate the long-term outcomes of adult patients with AKI who receive ECMO.

Materials and methods

The study analyzed encrypted datasets from Taiwan’s National Health Insurance Research Database. The data of 3251 patients who received first-time ECMO treatment between January 1, 2003, and December 31, 2013, were analyzed. Characteristics and outcomes were compared between patients who required dialysis for AKI (D-AKI) and those who did not in order to evaluate the impact of D-AKI on long-term mortality and major adverse kidney events.

Results

Of the 3251 patients, 54.1% had D-AKI. Compared with the patients without D-AKI, those with D-AKI had higher rates of all-cause mortality (52.3% vs. 33.3%; adjusted hazard ratio [aHR] 1.82, 95% confidence interval [CI] 1.53–2.17), chronic kidney disease (13.7% vs. 8.1%; adjusted subdistribution HR [aSHR] 1.66, 95% CI 1.16–2.38), and end-stage renal disease (5.2% vs. 0.5%; aSHR 14.28, 95% CI 4.67–43.62). The long-term mortality of patients who survived more than 90 days after discharge was 22.0% (153/695), 32.3% (91/282), and 50.0% (10/20) in the patients without D-AKI, with recovery D-AKI, and with nonrecovery D-AKI who required long-term dialysis, respectively, demonstrating a significant trend (P for trend <0.001).

Conclusion

AKI is associated with an increased risk of long-term mortality and major adverse kidney events in adult patients who receive ECMO.

]]>
<![CDATA[Specific clones of Trichomonas tenax are associated with periodontitis]]> https://www.researchpad.co/article/5c900d3bd5eed0c48407e3b6

Trichomonas tenax, an anaerobic protist that is difficult to cultivate and lacks a reliable molecular identification method, has been suspected of involvement in periodontitis, a multifactorial inflammatory dental disease affecting the soft tissue and bone of the periodontium. A cohort of 106 periodontitis patients classified by stages of severity and 85 healthy adult control patients was constituted. An efficient culture protocol, a new identification tool based on real-time qPCR of T. tenax, and a Multi-Locus Sequence Typing (MLST) system based on the T. tenax NIH4 reference strain were created. Fifty-three strains of Trichomonas sp. were obtained from periodontal samples. T. tenax was detected by culture in 37/106 (34.90%) patients with periodontitis and in 16/85 (18.80%) control patients (p = 0.018). Sixty of the 191 samples tested positive for T. tenax by qPCR: 24/85 (28%) controls and 36/106 (34%) periodontitis patients (p = 0.089). By combining both results, 45/106 (42.5%) patients were positive by culture and/or PCR, as compared to 24/85 (28.2%) controls (p = 0.042). A link was established between the carriage of Trichomonas tenax in patients and the severity of the disease. Genotyping demonstrates the presence of strain diversity with three major clusters and a relation between disease strains and periodontitis severity (p<0.05). More frequently detected in periodontal cases, T. tenax is likely to be related to the onset and/or evolution of periodontal diseases.

]]>
<![CDATA[‘In search of lost time’: Identifying the causative role of cumulative competition load and competition time-loss in professional tennis using a structural nested mean model]]> https://www.researchpad.co/article/N4f3da08e-598e-44d5-a4f3-a2c64fcebd1f

Injury prevention is critical to the achievement of peak performance in elite sport. For professional tennis players, the topic of injury prevention has gained even greater importance in recent years as several of the best male players have been sidelined owing to injury. Identifying potential causative factors of injury is essential for the development of effective prevention strategies, yet such research is hampered by incomplete data, the complexity of injury etiology, and observational study biases. The present study attempts to address these challenges by focusing on competition load and time-loss to competition (a completely observable risk factor and outcome) and using a structural nested mean model (SNMM) to identify the potential causal role of cumulative competition load on the risk of time-loss. Using inverse probability of treatment weights to balance exposure histories with respect to player ability, past injury, and consecutive competition weeks at each time point, the SNMM analysis of 389 professional male players and 55,773 weeks of competition found that total load significantly increases the risk of time-loss (HR = 1.05 per 1,000 games of additional load, 95% CI 1.01-1.10) and that this effect becomes magnified with age. Standard regression showed a protective effect of load, highlighting the value of more robust causal methods in the study of dynamic exposures and injury in sport and the need for further applications of these methods for understanding how time-loss and injuries of elite athletes might be prevented in the future.

]]>
<![CDATA[The draft mitochondrial genome of Magnolia biondii and mitochondrial phylogenomics of angiosperms]]> https://www.researchpad.co/article/N1f661d3e-d0c0-407e-92c0-bb72cd78029d

The mitochondrial genomes of flowering plants are well known for their large size, variable coding-gene set and fluid genome structure. The available mitochondrial genomes of the early angiosperms show extreme genetic diversity in genome size, structure, and sequences, such as rampant HGTs in Amborella mt genome, numerous repeated sequences in Nymphaea mt genome, and conserved gene evolution in Liriodendron mt genome. However, currently available early angiosperm mt genomes are still limited, hampering us from obtaining an overall picture of the mitogenomic evolution in angiosperms. Here we sequenced and assembled the draft mitochondrial genome of Magnolia biondii Pamp. from Magnoliaceae (magnoliids) using Oxford Nanopore sequencing technology. We recovered a single linear mitochondrial contig of 967,100 bp with an average read coverage of 122 × and a GC content of 46.6%. This draft mitochondrial genome contains a rich 64-gene set, similar to those of Liriodendron and Nymphaea, including 41 protein-coding genes, 20 tRNAs, and 3 rRNAs. Twenty cis-spliced and five trans-spliced introns break ten protein-coding genes in the Magnolia mt genome. Repeated sequences account for 27% of the draft genome, with 17 out of the 1,145 repeats showing recombination evidence. Although partially assembled, the approximately 1-Mb mt genome of Magnolia is still among the largest in angiosperms, which is possibly due to the expansion of repeated sequences, retention of ancestral mtDNAs, and the incorporation of nuclear genome sequences. Mitochondrial phylogenomic analysis of the concatenated datasets of 38 conserved protein-coding genes from 91 representatives of angiosperm species supports the sister relationship of magnoliids with monocots and eudicots, which is congruent with plastid evidence.

]]>
<![CDATA["Clicks, likes, shares and comments" a systematic review of breast cancer screening discourse in social media]]> https://www.researchpad.co/article/N8d8d3073-6769-4a60-aed8-e2beb958c228

Background

Unsatisfactory participation in population-based organised breast cancer screening is a long-standing problem. Social media, with 3.2 billion users in 2019, is potentially an important site of breast cancer related discourse. Determining whether these platforms might be used as channels by screening providers to reach under-screened women may have considerable public health significance.

Objectives

By systematically reviewing original research studies on breast cancer related social media discourse, we had two aims: first, to assess the volume, participants and content of breast screening social media communication and second, to find out whether social media can be used by screening organisers as a channel of patient education.

Methods

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). After searching PubMed, ScienceDirect, Web of Science, Springer and Ebsco, 17 studies were found that met our criteria. A systematic narrative framework was used for data synthesis. Owing to the high degree of heterogeneity in social media channels, outcomes and measurement included in this study, a meta-analytic approach was not appropriate.

Results

The volume of breast cancer related social media discourse is considerable. The majority of participants are lay individuals as opposed to healthcare professionals or advocacy groups. Lay misunderstandings surrounding the harms and benefits of mammography are well mirrored in the content of social media discourse. Although there is criticism, breast cancer screening sentiment on social media ranges from neutral to positive. Social media is suitable for offering peer emotional support for potential participants.

Conclusion

Dedicated breast screening websites operated by screening organisers would ensure much-needed quality-controlled information and also provide space for reliable question-and-answer forums, the sharing of personal experience and the provision of peer and professional support.

]]>
<![CDATA[Dysregulated biodynamics in metabolic attractor systems precede the emergence of amyotrophic lateral sclerosis]]> https://www.researchpad.co/article/Nd64c8bc4-d849-4cf6-88a9-792b4ee4d346

Evolutionarily conserved mechanisms maintain homeostasis of essential elements, and are believed to be highly time-variant. However, current approaches measure elemental biomarkers at a few discrete time-points, ignoring complex higher-order dynamical features. To study dynamical properties of elemental homeostasis, we apply laser ablation inductively-coupled plasma mass spectrometry (LA-ICP-MS) to tooth samples to generate 500 temporally sequential measurements of elemental concentrations from birth to 10 years. We applied dynamical system and Information Theory-based analyses to reveal the longest-known attractor system in mammalian biology underlying the metabolism of nutrient elements, and identify distinct and consistent transitions between stable and unstable states throughout development. Extending these dynamical features to disease prediction, we find that attractor topography of nutrient metabolism is altered in amyotrophic lateral sclerosis (ALS), as early as childhood, suggesting these pathways are involved in disease risk. Mechanistic analysis was undertaken in a transgenic mouse model of ALS, where we find marked disruptions in elemental attractor systems similar to those in humans. Our results demonstrate the application of a phenomenological analysis of dynamical systems underlying elemental metabolism, and emphasize the utility of these measures in characterizing risk of disease.

]]>
<![CDATA[Transcriptomic analysis of polyketide synthases in a highly ciguatoxic dinoflagellate, Gambierdiscus polynesiensis and low toxicity Gambierdiscus pacificus, from French Polynesia]]> https://www.researchpad.co/article/Nca210627-69b7-4a50-96ce-ecb4ce1a2ae1

Marine dinoflagellates produce a diversity of polyketide toxins that are accumulated in marine food webs and are responsible for a variety of seafood poisonings. Reef-associated dinoflagellates of the genus Gambierdiscus produce toxins responsible for ciguatera poisoning (CP), which causes over 50,000 cases of illness annually worldwide. The biosynthetic machinery for dinoflagellate polyketides remains poorly understood. Recent transcriptomic and genomic sequencing projects have revealed the presence of Type I modular polyketide synthases in dinoflagellates, as well as a plethora of single domain transcripts with Type I sequence homology. The current transcriptome analysis compares polyketide synthase (PKS) gene transcripts expressed in two species of Gambierdiscus from French Polynesia: a highly toxic ciguatoxin producer, G. polynesiensis, versus a non-ciguatoxic species G. pacificus, each assembled from approximately 180 million Illumina 125 nt reads using Trinity, and compares their PKS content with previously published data from other Gambierdiscus species and more distantly related dinoflagellates. Both modular and single-domain PKS transcripts were present. Single domain β-ketoacyl synthase (KS) transcripts were highly amplified in both species (98 in G. polynesiensis, 99 in G. pacificus), with smaller numbers of standalone acyl transferase (AT), ketoacyl reductase (KR), dehydratase (DH), enoyl reductase (ER), and thioesterase (TE) domains. G. polynesiensis expressed both a larger number of multidomain PKSs, and larger numbers of modules per transcript, than the non-ciguatoxic G. pacificus. The largest PKS transcript in G. polynesiensis encoded a 10,516 aa, 7 module protein, predicted to synthesize part of the polyether backbone. Transcripts and gene models representing portions of this PKS are present in other species, suggesting that its function may be performed in those species by multiple interacting proteins. 
This study contributes to the building consensus that dinoflagellates utilize a combination of Type I modular and single domain PKS proteins, in an as yet undefined manner, to synthesize polyketides.

]]>
<![CDATA[Deep learning assessment of breast terminal duct lobular unit involution: Towards automated prediction of breast cancer risk]]> https://www.researchpad.co/article/N8ae86a5e-90c1-41b1-ba31-58a5a939e3bc

Terminal duct lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses’ Health Study. A set of 92 WSIs was annotated for acini, TDLUs and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73±0.07, and segmented TDLUs and adipose tissue with Dice scores of 0.84±0.13 and 0.87±0.04, respectively. The inter-observer ICC scores for manual assessments on 40 WSIs of number of TDLUs per tissue area, median TDLU span, and median acini count per TDLU were 0.71, 0.81 and 0.73, respectively. Intra-observer reliability was evaluated on 10/40 WSIs with ICC scores of >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were: 0.80 for number of TDLUs per tissue area, 0.57 for median TDLU span, and 0.80 for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status. We developed a computational pathology method to measure TDLU involution. This technology eliminates the labor-intensiveness and subjectivity of manual TDLU assessment, and can be applied to future breast cancer risk studies.
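
The two agreement metrics quoted above can be written down in a few lines. The sketch below uses the standard definitions of the Dice coefficient for binary masks and the F1 score from detection counts; it is not the study's evaluation code, and the flat-list mask representation and toy inputs are assumptions for illustration.

```python
def dice_score(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total

def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * true_pos + false_pos + false_neg
    return 0.0 if denom == 0 else 2 * true_pos / denom

# Toy segmentation: one overlapping pixel out of two predicted and two true.
toy_dice = dice_score([1, 1, 0, 0], [1, 0, 1, 0])

# Toy detection: 8 correct detections, 2 spurious, 2 missed.
toy_f1 = f1_score(8, 2, 2)
```

For a single binary mask the two quantities coincide numerically; they are usually reported separately because Dice is computed per pixel for segmentation while F1 is computed per object for detection.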

]]>
<![CDATA[Predicting 30-day hospital readmissions using artificial neural networks with medical code embedding]]> https://www.researchpad.co/article/N1f40719a-4631-45e6-bedb-5cf8a42ecf53

Reducing unplanned readmissions is a major focus of current hospital quality efforts. To avoid unfair penalization, administrators and policymakers use prediction models to adjust for the performance of hospitals from healthcare claims data. Regression-based models are commonly used for such risk-standardization across hospitals; however, these models often suffer in accuracy. In this study, we compare four prediction models for unplanned readmission of patients hospitalized with acute myocardial infarction (AMI), congestive heart failure (HF), and pneumonia (PNA) within the Nationwide Readmissions Database in 2014. We evaluated hierarchical logistic regression and compared its performance with gradient boosting and two models that utilize artificial neural networks. We show that unsupervised Global Vectors for Word Representation (GloVe) embeddings of administrative claims data, combined with artificial neural network classification models, improve prediction of 30-day readmission. Our best models increased the AUC for prediction of 30-day readmissions from 0.68 to 0.72 for AMI, 0.60 to 0.64 for HF, and 0.63 to 0.68 for PNA compared to hierarchical logistic regression. Furthermore, risk-standardized hospital readmission rates calculated from our artificial neural network model that employed embeddings led to reclassification of approximately 10% of hospitals across categories of hospital performance. This finding suggests that prediction models that incorporate new methods classify hospitals differently than traditional regression-based approaches and that their role in assessing hospital performance warrants further investigation.

]]>
<![CDATA[Chalcone synthase (CHS) family members analysis from eggplant (Solanum melongena L.) in the flavonoid biosynthetic pathway and expression patterns in response to heat stress]]> https://www.researchpad.co/article/N0c4703df-5c43-4557-a077-ba839b092c8d

Enzymes of the chalcone synthase (CHS) family participate in the synthesis of multiple secondary metabolites in plants, fungi and bacteria. CHS showed a significant correlation with the accumulation patterns of anthocyanin. The peel color, which is primarily determined by the content of anthocyanin, is an economically important trait for eggplants that is affected by heat stress. A total of 7 putative CHS genes (SmCHS1-7) were identified in a genome-wide analysis of eggplants (S. melongena L.). The SmCHS genes were distributed on 7 scaffolds and were classified into 3 clusters. Phylogenetic relationship analysis showed that 73 CHS genes from 7 Solanaceae species were classified into 10 groups. SmCHS5, SmCHS6 and SmCHS7 were continuously down-regulated under 38°C and 45°C treatment, while SmCHS4 was up-regulated under 38°C but showed little change at 45°C in peel. Expression profiles of key anthocyanin biosynthesis gene families showed that the PAL, 4CL and AN11 genes were primarily expressed in all five tissues. The CHI, F3H, F3’5’H, DFR, 3GT and bHLH1 genes were expressed in flower and peel. Under heat stress, the expression levels of 52 key genes were reduced. In contrast, eight key genes with expression patterns similar to SmCHS4 were up-regulated after treatment at 38°C for 3 hours. Comparative analysis of putative CHS protein evolutionary relationships, cis-regulatory elements, and regulatory networks indicated that the SmCHS gene family has a conserved gene structure and functional diversification. The SmCHS genes showed two or more expression patterns. These results may facilitate further research into the regulatory mechanism governing peel color in eggplants.

]]>
<![CDATA[LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data]]> https://www.researchpad.co/article/Na533cb35-b26a-447b-bd62-8e125a165db4

Intensive care data are valuable for improvement of health care, policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a major challenge for regulatory, operational and security reasons. One potential solution is federated machine learning, a method that sends machine learning algorithms simultaneously to all data sources, trains models at each source and aggregates the learned models. This strategy allows valuable data to be used without moving them. One challenge in applying federated machine learning is that data from diverse sources may follow different distributions. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated the performance of learning in IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.
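The train-locally-then-aggregate loop described above can be sketched as plain federated averaging. This is a generic baseline illustration, not the LoAdaBoost algorithm, whose loss-based adaptation is not specified in this abstract. The toy 1-D linear model, the function names, and the sample-count weighting are all assumptions made for the example.

```python
def local_update(weights, data, lr=0.1):
    """One pass of stochastic gradient descent for y = w*x + b on one client's data."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y      # prediction error on this sample
        w -= lr * err * x          # gradient step for the slope
        b -= lr * err              # gradient step for the intercept
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Aggregate client models, weighting each by its number of samples (assumed scheme)."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def federated_round(global_weights, clients, lr=0.1):
    """Send the global model to every client, train locally, then aggregate."""
    updates = [local_update(global_weights, data, lr) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical data silos, each holding samples of y = 2x; raw data never leaves a client.
    clients = [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0), (1.5, 3.0)]]
    model = (0.0, 0.0)
    for _ in range(50):
        model = federated_round(model, clients)
    print(model)
```

Only model parameters cross the client boundary in this sketch, which is the property the abstract highlights: the data are used without being moved.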

]]>