ResearchPad - artificial-intelligence https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis]]> https://www.researchpad.co/article/elastic_article_13837 In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires careful consideration of program efficiency and method selection. To reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they do not assume that the data follow particular statistical distributions. The package is extensible and modular, which will facilitate the development of new functionality with the open-source community as future requirements arise. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.
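As a purely illustrative sketch of one of the operations listed above, the following shows a naive k-nearest-neighbour imputation of gene dropouts. It is not scedar's API; scedar's own implementation differs and is built to scale.

```python
import numpy as np

def knn_impute_dropouts(counts, k=2):
    """Replace zero counts in each cell with the mean of its k nearest
    cells' values for those genes. Naive O(n^2) illustration only."""
    imputed = counts.astype(float).copy()
    # Pairwise Euclidean distances between cells (rows)
    dists = np.linalg.norm(counts[:, None, :] - counts[None, :, :], axis=2)
    for i in range(counts.shape[0]):
        neighbors = np.argsort(dists[i])[1:k + 1]  # skip the cell itself
        zeros = counts[i] == 0
        # Fill dropouts with the neighbours' average expression
        imputed[i, zeros] = counts[neighbors][:, zeros].mean(axis=0)
    return imputed
```

Production imputation methods replace the brute-force distance matrix with indexed nearest-neighbour search and treat zeros more carefully, but the cell-similarity idea is the same.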

]]>
<![CDATA[Insight into the protein solubility driving forces with neural attention]]> https://www.researchpad.co/article/elastic_article_13832 The solubility of proteins is a crucial biophysical property for understanding many human diseases and for improving industrial processes for protein production. Given its relevance, computational methods have been devised to study and possibly optimize the solubility of proteins. In this work we apply a deep-learning technique, called neural attention, to predict protein solubility while “opening” the model itself to interpretability, even though machine-learning models are usually considered black boxes. Thanks to the attention mechanism, we show that i) our model implicitly learns complex patterns related to emergent, protein folding-related aspects, such as recognizing β-amyloidosis regions, and that ii) the N- and C-termini are the regions with the highest signal for solubility prediction. When it comes to enhancing the solubility of proteins, we propose, for the first time, to investigate the synergistic effects of tandem mutations instead of “single” mutations, suggesting that this could minimize the number of mutations that need to be proposed.
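The attention mechanism described above can be sketched minimally as a learned weighting over per-residue embeddings; the embeddings, scoring vector and dimensions below are hypothetical stand-ins for the paper's model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(residue_embeddings, scoring_vector):
    """Score each residue position, turn the scores into attention
    weights, and pool the sequence into one vector. The weights are
    the interpretable part: high-weight positions carry the most
    signal for the prediction (e.g. the N- and C-termini)."""
    scores = residue_embeddings @ scoring_vector  # one score per residue
    alphas = softmax(scores)                      # attention distribution
    pooled = alphas @ residue_embeddings          # weighted sum
    return pooled, alphas
```

Inspecting `alphas` for a trained model is what lets one ask which regions of the sequence drive the solubility prediction.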

]]>
<![CDATA[Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019]]> https://www.researchpad.co/article/elastic_article_13811 The identification of statistical models for the accurate and timely forecasting of infectious disease outbreaks is very important for the healthcare system. This study was therefore conducted to assess and compare the performance of four machine-learning methods in modeling and forecasting brucellosis time series data based on climatic parameters. Methods: In this cohort study, human brucellosis cases and climatic parameters were analyzed on a monthly basis for Qazvin province (located in northwestern Iran) over a period of 9 years (2010–2018). The data were split into training (80%) and testing (20%) subsets. Artificial neural network methods (radial basis function and multilayer perceptron), support vector machine and random forest models were fitted to each set. Performance was assessed using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Root Error (MARE), and R2 criteria. Results: The incidence rate of brucellosis in Qazvin province was 27.43 per 100,000 during 2010–2019. The RMSE (0.22), MAE (0.175) and MARE (0.007) values were smaller for the multilayer perceptron neural network than for the other three models, and its R2 value (0.99) was higher. The multilayer perceptron neural network therefore exhibited the best forecasting performance on the studied data. Average wind speed and mean temperature were the most influential climatic parameters for the incidence of this disease. Conclusions: The multilayer perceptron neural network can be used as an effective method for detecting the behavioral trend of brucellosis over time. 
Nevertheless, further studies focusing on the application and comparison of such methods are needed to identify the most appropriate forecasting method for this disease. ]]> <![CDATA[Any unique image biomarkers associated with COVID-19?]]> https://www.researchpad.co/article/elastic_article_13335 To assess the uniqueness of chest CT infiltrative features associated with COVID-19 as potential diagnostic biomarkers. Methods: We retrospectively collected chest CT exams, including n = 498 exams from 151 unique patients RT-PCR positive for COVID-19 and n = 497 unique patients with community-acquired pneumonia (CAP). Both the COVID-19 and CAP image sets were partitioned into three groups for training, validation, and testing. In an attempt to discriminate COVID-19 from CAP, we developed several classifiers based on three-dimensional (3D) convolutional neural networks (CNNs). We also asked two experienced radiologists to visually interpret the testing set and discriminate COVID-19 from CAP. The classification performance of the computer algorithms and the radiologists was assessed using receiver operating characteristic (ROC) analysis and nonparametric approaches, with multiplicity adjustments when necessary. Results: One of the considered models showed non-trivial, but moderate, diagnostic ability overall (AUC of 0.70 with 99% CI 0.56–0.85). This model allowed for the identification of 8–50% of CAP patients while misclassifying only 2% of COVID-19 patients. Conclusions: Professional or automated interpretation of CT exams has a moderately low ability to distinguish between COVID-19 and CAP cases. However, automated image analysis is promising for targeted decision-making, since it can accurately identify a sizable subset of non-COVID-19 cases. Key Points: • Both human experts and artificial intelligence models were used to classify the CT scans. 
• ROC analysis and nonparametric approaches were used to analyze the performance of the radiologists and computer algorithms. • Unique image features or patterns may not exist for reliably distinguishing all COVID-19 cases from CAP; however, there may be imaging markers that can identify a sizable subset of non-COVID-19 cases. ]]> <![CDATA[A model for the assessment of bluetongue virus serotype 1 persistence in Spain]]> https://www.researchpad.co/article/elastic_article_11225 Bluetongue virus (BTV) is an arbovirus of ruminants that has been circulating in Europe continuously for more than two decades and has become endemic in some countries, such as Spain. Spain is ideal for BTV epidemiological studies, since BTV outbreaks from different sources and serotypes have occurred there continuously since 2000; BTV-1 was reported there from 2007 to 2017. Here we develop a model for the BTV-1 endemic scenario to estimate the risk of an area becoming endemic and to identify the most influential factors for BTV-1 persistence. We created abundance maps at 1-km2 spatial resolution for the main vectors in Spain, Culicoides imicola and the Obsoletus and Pulicaris complexes, by combining environmental satellite data with occurrence models and a random forest machine-learning algorithm. The endemic model included vector abundance and host-related variables (farm density). The three most relevant variables in the endemic model were the abundances of C. imicola and the Obsoletus complex and the density of goat farms (AUC 0.86); this model suggests that BTV-1 is more likely to become endemic in central and southwestern regions of Spain. The model requires only host- and vector-related variables to identify areas at greater risk of becoming endemic for bluetongue. Our results highlight the importance of suitable Culicoides spp. prediction maps for bluetongue epidemiological studies and for decision-making about control and eradication measures.
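A minimal sketch of the occurrence-modelling step described above, using scikit-learn's random forest on hypothetical environmental covariates (the real covariates, labels and 1-km2 resolution come from satellite data and field captures):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical covariates per grid cell:
# [mean temperature, vegetation index, farm density]
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Synthetic occurrence rule: vectors present in warm, vegetated cells
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.4)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Per-cell presence probability: one layer of a suitability map
suitability = model.predict_proba(X)[:, 1]
```

Stacking such per-cell probabilities for each vector species yields abundance/suitability layers that can feed a downstream endemicity model.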

]]>
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869 Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence—for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts accuracy across six classification tasks—site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.
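A toy sketch of the add-on idea: given fixed-length vectors produced by any base text classifier for each report in a case, append a context vector summarizing the earlier reports. The fixed exponential decay below is an illustrative stand-in; the paper's add-on learns its aggregation.

```python
import numpy as np

def add_case_context(report_vectors, decay=0.5):
    """Append to each report vector a context vector: a recency-
    weighted average of the preceding reports in the same case. The
    base vectors can come from any per-report text classifier."""
    contexts, ctx = [], np.zeros(report_vectors.shape[1])
    for v in report_vectors:
        contexts.append(ctx.copy())          # context excludes the report itself
        ctx = decay * ctx + (1 - decay) * v  # fold the report into the context
    return np.hstack([report_vectors, np.array(contexts)])
```

Because the add-on only concatenates extra features, the base architecture for individual reports is left untouched, which is what makes the approach modular.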

]]>
<![CDATA[Medusa: Software to build and analyze ensembles of genome-scale metabolic network reconstructions]]> https://www.researchpad.co/article/elastic_article_7734 Uncertainty in the structure and parameters of networks is ubiquitous across computational biology. In constraint-based reconstruction and analysis of metabolic networks, this uncertainty is present both during the reconstruction of networks and in simulations performed with them. Here, we present Medusa, a Python package for the generation and analysis of ensembles of genome-scale metabolic network reconstructions. Medusa builds on the COBRApy package for constraint-based reconstruction and analysis by compressing a set of models into a compact ensemble object, providing functions for the generation of ensembles using experimental data, and extending constraint-based analyses to the ensemble scale. We demonstrate how Medusa can be used to generate ensembles and perform ensemble simulations, and how machine learning can be used in conjunction with Medusa to guide the curation of genome-scale metabolic network reconstructions. Medusa is available under the permissive MIT license from the Python Packaging Index (https://pypi.org) and from GitHub (https://github.com/opencobra/Medusa), and comprehensive documentation is available at https://medusa.readthedocs.io/en/latest.

]]>
<![CDATA[FaceLift: a transparent deep learning framework to beautify urban scenes]]> https://www.researchpad.co/article/N5fd42e94-d295-4df2-a0e6-a8f34580eb5a

In the area of computer vision, deep learning techniques have recently been used to predict whether urban scenes are likely to be considered beautiful: it turns out that these techniques are able to make accurate predictions. Yet they fall short when it comes to generating actionable insights for urban design. To support urban interventions, one needs to go beyond predicting beauty, and tackle the challenge of recreating beauty. Unfortunately, deep learning techniques have not been designed with that challenge in mind. Given their ‘black-box nature’, these models cannot be directly used to explain why a particular urban scene is deemed to be beautiful. To partly fix that, we propose a deep learning framework (which we name FaceLift) that is able to both beautify existing urban scenes (Google Street Views) and explain which urban elements make those transformed scenes beautiful. To quantitatively evaluate our framework, we cannot resort to any existing metric (as the research problem at hand has never been tackled before) and need to formulate new ones. These new metrics should ideally capture the presence (or absence) of elements that make urban spaces great. Upon a review of the urban planning literature, we identify five main metrics: walkability, green spaces, openness, landmarks and visual complexity. We find that, across all five metrics, the beautified scenes meet the expectations set by the literature on what great spaces tend to be made of. This result is further confirmed by a 20-participant expert survey in which FaceLift was found to be effective in promoting citizen participation. All this suggests that, in the future, as our framework’s components are further researched and refined, such technologies will be able to accurately and efficiently support architects and planners in the design of the spaces we intuitively love.

]]>
<![CDATA[Reconciling periodic rhythms of large-scale biological networks by optimal control]]> https://www.researchpad.co/article/Nd3fd2fe7-1722-490f-9f77-9cf9436cd0cd

Periodic rhythms are ubiquitous phenomena that illuminate the underlying mechanism of cyclic activities in biological systems, which can be represented by cyclic attractors of the related biological network. Disorders of periodic rhythms are detrimental to the natural behaviours of living organisms. Previous studies have shown that the state transition from one attractor to another can be accomplished by regulating external signals. However, most of these studies have so far focused on point attractors while ignoring cyclic ones. The aim of this study is to investigate an approach for reconciling abnormal periodic rhythms, such as diminished circadian amplitude and phase delay, to the regular rhythms of complex biological networks. For this purpose, we formulate and solve a mixed-integer nonlinear dynamic optimization problem to simultaneously identify regulation variables and determine optimal control strategies for state transition and adjustment of periodic rhythms. Numerical experiments are carried out on three examples: a chaotic system, a mammalian circadian rhythm system and a gastric cancer gene regulatory network. The results show that regulating a small number of biochemical molecules in the network is sufficient to successfully drive the system to the target cyclic attractor by implementing an optimal control strategy.

]]>
<![CDATA[Intermediacy of publications]]> https://www.researchpad.co/article/Nff3da153-262d-4273-9b64-46b5cf2760ab

Citation networks of scientific publications offer fundamental insights into the structure and development of scientific knowledge. We propose a new measure, called intermediacy, for tracing the historical development of scientific knowledge. Given two publications, an older and a more recent one, intermediacy identifies publications that seem to play a major role in the historical development from the older to the more recent publication. The identified publications are important in connecting the older and the more recent publication in the citation network. After providing a formal definition of intermediacy, we study its mathematical properties. We then present two empirical case studies, one tracing historical developments at the interface between the community detection literature and the scientometric literature and one examining the development of the literature on peer review. We show both conceptually and empirically how intermediacy differs from main path analysis, which is the most popular approach for tracing historical developments in citation networks. Main path analysis tends to favour longer paths over shorter ones, whereas intermediacy has the opposite tendency. We conclude that, compared with main path analysis, intermediacy offers a more principled approach for tracing the historical development of scientific knowledge.
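Under the measure's probabilistic reading (each citation link is active independently with probability p, and a publication is intermediate when active paths connect it to both endpoints), intermediacy can be estimated by Monte Carlo on a toy network. The graph, p and trial count below are illustrative.

```python
import random

def intermediacy(edges, source, target, node, p=0.7, trials=2000, seed=0):
    """Monte Carlo estimate: the probability that, with every citation
    link independently active with probability p, active paths run
    both from source to node and from node to target."""
    rng = random.Random(seed)

    def reachable(frm, to, active):
        seen, stack = {frm}, [frm]
        while stack:
            u = stack.pop()
            if u == to:
                return True
            for a, b in active:
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False

    hits = 0
    for _ in range(trials):
        active = [e for e in edges if rng.random() < p]
        if reachable(source, node, active) and reachable(node, target, active):
            hits += 1
    return hits / trials

# Toy citation network: s -> a -> t, plus a dead end s -> b
edges = [("s", "a"), ("a", "t"), ("s", "b")]
```

On this toy graph the on-path publication `a` scores near p² = 0.49, while the dead-end publication `b`, which never connects to the target, scores zero; shorter connecting paths thus earn higher intermediacy, matching the tendency noted above.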

]]>
<![CDATA[Dealing with uncertainty in agent-based models for short-term predictions]]> https://www.researchpad.co/article/Nb7fb9af3-6a06-4655-b0d5-864323a6b15d

Agent-based models (ABMs) are gaining traction as one of the most powerful modelling tools within the social sciences. They are particularly suited to simulating complex systems. Despite many methodological advances within ABM, one of the major drawbacks is their inability to incorporate real-time data to make accurate short-term predictions. This paper presents an approach that allows ABMs to be dynamically optimized. Through a combination of parameter calibration and data assimilation (DA), the accuracy of model-based predictions made with an ABM in real time is increased. We use the exemplar of a bus route system to explore these methods. The bus route ABMs developed in this research are examples of ABMs that can be dynamically optimized by a combination of parameter calibration and DA. The proposed model and framework constitute a novel, transferable approach that can be used in any passenger information system, or in intelligent transport systems, to provide forecasts of bus locations and arrival times.
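A minimal sketch of the DA step on a toy bus model, using a bootstrap particle filter (one common DA scheme; the noise levels and observation model below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bus: true position advances by a fixed speed each tick
true_speed, n_particles, n_steps = 1.0, 500, 30
truth = 0.0
particles = rng.normal(0.0, 1.0, n_particles)  # initial ensemble

for _ in range(n_steps):
    truth += true_speed
    # Forecast step: run the (noisy) model forward for every particle
    particles += true_speed + rng.normal(0.0, 0.3, n_particles)
    # Assimilation step: weight particles by a noisy GPS observation
    obs = truth + rng.normal(0.0, 0.5)
    weights = np.exp(-0.5 * ((particles - obs) / 0.5) ** 2)
    weights /= weights.sum()
    # Resample in proportion to the weights
    particles = particles[rng.choice(n_particles, n_particles, p=weights)]

estimate = particles.mean()
```

In a real ABM each "particle" would be a full copy of the model's agent states, and the observation would be live vehicle-location data; the forecast/assimilate cycle is the same.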

]]>
<![CDATA[Predicting 30-day hospital readmissions using artificial neural networks with medical code embedding]]> https://www.researchpad.co/article/N1f40719a-4631-45e6-bedb-5cf8a42ecf53

Reducing unplanned readmissions is a major focus of current hospital quality efforts. In order to avoid unfair penalization, administrators and policymakers use prediction models to adjust for the performance of hospitals from healthcare claims data. Regression-based models are a commonly utilized method for such risk-standardization across hospitals; however, these models often suffer from limited accuracy. In this study, we compare four prediction models for unplanned patient readmission among patients hospitalized with acute myocardial infarction (AMI), congestive heart failure (HF), and pneumonia (PNA) within the Nationwide Readmissions Database in 2014. We evaluated hierarchical logistic regression and compared its performance with gradient boosting and two models that utilize artificial neural networks. We show that unsupervised Global Vectors for Word Representation (GloVe) embeddings of administrative claims data combined with artificial neural network classification models improve prediction of 30-day readmission. Our best models increased the AUC for prediction of 30-day readmissions from 0.68 to 0.72 for AMI, 0.60 to 0.64 for HF, and 0.63 to 0.68 for PNA compared to hierarchical logistic regression. Furthermore, risk-standardized hospital readmission rates calculated from our artificial neural network model that employed embeddings led to reclassification of approximately 10% of hospitals across categories of hospital performance. This finding suggests that prediction models that incorporate new methods classify hospitals differently than traditional regression-based approaches and that their role in assessing hospital performance warrants further investigation.
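A schematic of the embedding idea: an admission is a bag of medical codes, each code maps to a pretrained embedding, and the averaged vector feeds a classifier. The codes, dimensions and one-layer "network" below are hypothetical stand-ins for the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical code vocabulary with pretrained (here: random) 8-d
# GloVe-style embeddings; real models use thousands of codes
vocab = {"I21.9": 0, "I50.9": 1, "J18.9": 2, "Z51.11": 3}
embeddings = rng.normal(size=(len(vocab), 8))

def admission_vector(codes):
    """Average the embeddings of the codes on one admission into a
    fixed-length vector for the downstream classifier."""
    return embeddings[[vocab[c] for c in codes]].mean(axis=0)

def predict_readmission(codes, w, b):
    """One-layer stand-in for the paper's neural network classifier:
    returns a 30-day readmission probability."""
    z = admission_vector(codes) @ w + b
    return 1.0 / (1.0 + np.exp(-z))
```

The gain over one-hot codes is that embeddings place clinically related codes near one another, so admissions with different but similar codes get similar vectors.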

]]>
<![CDATA[LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data]]> https://www.researchpad.co/article/Na533cb35-b26a-447b-bd62-8e125a165db4

Intensive care data are valuable for the improvement of health care, policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a major challenge for regulatory, operational and security reasons. One potential solution is federated machine learning, a method that sends machine-learning algorithms simultaneously to all data sources, trains a model in each source and aggregates the learned models. This strategy allows valuable data to be utilized without moving them. One challenge in applying federated machine learning is the possibly different distributions of data from diverse sources. To tackle this problem, we propose an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated the performance of learning in IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieves higher predictive accuracy with lower computational complexity than the baseline method.
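The aggregate-without-moving-data idea can be sketched as a single federated round over toy silos. The loss-based weighting below only gestures at LoAdaBoost, whose actual boosting scheme is described in the paper.

```python
import numpy as np

def local_fit(X, y, epochs=200, lr=0.1):
    """Least-squares gradient descent at one data silo; returns the
    local weights and the final local training loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w, float(np.mean((X @ w - y) ** 2))

def federated_round(silos):
    """Train locally at every silo, then aggregate the local models,
    giving better-fitting (lower-loss) silos more weight. Only the
    models travel; the data never leave their silos."""
    results = [local_fit(X, y) for X, y in silos]
    losses = np.array([loss for _, loss in results])
    alphas = 1.0 / (losses + 1e-8)
    alphas /= alphas.sum()
    return sum(a * w for a, (w, _) in zip(alphas, results))

# Two hospital silos drawn from the same underlying linear relation
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])
silos = [(X, X @ w_true) for X in (rng.normal(size=(40, 2)),
                                   rng.normal(size=(40, 2)))]
w_global = federated_round(silos)
```

With non-IID silos the local models diverge, which is exactly the setting where a loss-aware aggregation scheme is meant to help.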

]]>
<![CDATA[A compound attributes-based predictive model for drug induced liver injury in humans]]> https://www.researchpad.co/article/Ndeb57c49-a1cc-41d4-9618-08dc56c45dac

Drug induced liver injury (DILI) is one of the key safety concerns in drug development. To assess the likelihood that drug candidates will cause adverse liver reactions, we propose a compound attributes-based approach to predicting the hepatobiliary disorders that are routinely reported to the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). Specifically, we developed a support vector machine (SVM) model with recursive feature elimination, using physicochemical and structural properties of compounds as model input. Cross-validation demonstrates that the predictive model has robust performance, averaging 70% for both sensitivity and specificity over 500 trials. An independent validation was performed on public benchmark drugs, and the results suggest the potential utility of our model for identifying safety alerts. This in silico approach, upon further validation, could ultimately be implemented, together with other in vitro safety assays, for screening compounds early in drug development.
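A minimal sketch of the modelling step with scikit-learn, on synthetic stand-ins for the compound descriptors and DILI labels:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Synthetic descriptors: only the first two drive the (synthetic) label
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Linear-kernel SVM so RFE can rank features by coefficient magnitude
selector = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X, y)
kept = np.flatnonzero(selector.support_)
```

Recursive feature elimination repeatedly drops the lowest-weighted descriptors and refits, leaving a compact set of properties that carry the predictive signal.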

]]>
<![CDATA[Early warning of some notifiable infectious diseases in China by the artificial neural network]]> https://www.researchpad.co/article/Nf6c3af52-397d-45c0-99c9-48bcd25792d4

In order to accurately grasp the timing of disease prevention and control, we established an artificial neural network model to issue early warning signals. The real-time recurrent learning (RTRL) and extended Kalman filter (EKF) methods were used to analyse four types of respiratory infectious diseases and four types of digestive tract infectious diseases in China, in order to comprehensively determine epidemic intensities and whether to issue early warning signals. The numbers of new confirmed cases per month between January 2004 and December 2017 were used as the training set; the data from 2018 were used as the test set. The results of RTRL showed that the number of new confirmed cases of respiratory infectious diseases in September 2018 increased abnormally. The results of the EKF showed that the number of new confirmed cases of respiratory infectious diseases increased abnormally in January and February of 2018. Neither algorithm detected any abnormal increase in new confirmed cases of digestive tract infectious diseases in the test set. Neural networks and machine learning can thus further enrich and develop early warning theory.
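As a simplified stand-in for the RTRL/EKF machinery, even a scalar Kalman filter over monthly counts can issue a warning when a month's count exceeds its one-step forecast by more than a set number of standard deviations; all parameters below are illustrative.

```python
import numpy as np

def kalman_warnings(series, q=1.0, r=4.0, threshold=2.0):
    """Scalar Kalman filter (random-walk state, process variance q,
    observation variance r) over monthly case counts. A month is
    flagged when its standardized innovation, the gap between the
    observed count and the one-step forecast, exceeds the threshold."""
    x, p = float(series[0]), 1.0
    flags = []
    for z in series[1:]:
        p += q                          # predict
        s = p + r                       # innovation variance
        innovation = z - x
        flags.append(bool(innovation / np.sqrt(s) > threshold))
        gain = p / s
        x += gain * innovation          # update state estimate
        p *= 1.0 - gain                 # update state variance
    return flags
```

The real EKF additionally tracks nonlinear model dynamics, but the warning logic, flagging abnormally large innovations, is the same in spirit.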

]]>
<![CDATA[Fluid–structure interaction simulations outperform computational fluid dynamics in the description of thoracic aorta haemodynamics and in the differentiation of progressive dilation in Marfan syndrome patients]]> https://www.researchpad.co/article/N35b6edf0-5fe1-4fe2-8f14-835eea74ba8a

Abnormal fluid dynamics at the ascending aorta may be at the origin of aortic aneurysms. This study aimed to compare the performance of computational fluid dynamics (CFD) and fluid–structure interaction (FSI) simulations against four-dimensional (4D) flow magnetic resonance imaging (MRI) data, and to assess the capacity of advanced fluid dynamics markers to stratify aneurysm progression risk. Eight Marfan syndrome (MFS) patients, four with stable and four with dilating aneurysms of the proximal aorta, and four healthy controls were studied. FSI and CFD simulations were performed with MRI-derived geometry, inlet velocity field and Young's modulus. Flow displacement, jet angle and maximum velocity evaluated from FSI and CFD simulations were compared to 4D flow MRI data. A dimensionless parameter, the shear stress ratio (SSR), was evaluated from FSI and CFD simulations and assessed as a potential correlate of aneurysm progression. FSI simulations successfully matched MRI data regarding descending to ascending aorta flow rates (R2 = 0.92) and pulse wave velocity (R2 = 0.99). Compared to CFD, FSI simulations showed significantly lower percentage errors in the ascending and descending aorta in flow displacement (−46% ascending, −41% descending), jet angle (−28% ascending, −50% descending) and maximum velocity (−37% ascending, −34% descending) with respect to 4D flow MRI. FSI- but not CFD-derived SSR differentiated between stable and dilating MFS patients. Fluid dynamic simulations of the thoracic aorta require fluid–solid interaction to properly reproduce complex haemodynamics. FSI- but not CFD-derived SSR could help stratify MFS patients.

]]>
<![CDATA[Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality]]> https://www.researchpad.co/article/Nbf30117c-7bf3-4987-ad3a-597177b037e8

The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projection for visualization. A class of problems that has not gained much attention is the detection of outliers in datasets, arising from causes such as gross experimental, reporting or labelling errors. Outliers can also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, a serious problem for omics data, which are often very high-dimensional. We develop an outlier detection method based on structured low-rank approximation methods. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method has better clustering and visualization performance on the recovered low-dimensional projection when compared with popular dimensionality reduction techniques.
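The subspace idea can be sketched without the paper's graph-Laplacian regularizer: approximate the data with its top singular subspace and score each sample by its reconstruction error.

```python
import numpy as np

def lowrank_outlier_scores(X, rank=2):
    """Score each sample by how far it lies from the data's top
    singular subspace; samples the low-rank structure cannot
    explain get large scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    recon = (Xc @ Vt[:rank].T) @ Vt[:rank]
    return np.linalg.norm(Xc - recon, axis=1)

# 50-dimensional data lying near a 2-d subspace, plus one planted outlier
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 50))
X = rng.normal(size=(100, 2)) @ basis + 0.01 * rng.normal(size=(100, 50))
X[7] = 5.0 * rng.normal(size=50)
scores = lowrank_outlier_scores(X)
```

Unlike a density estimate, this score depends only on distance from the recovered low-rank structure, which is why it degrades far more gracefully as the ambient dimension grows.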

]]>
<![CDATA[Neuroimaging modality fusion in Alzheimer’s classification using convolutional neural networks]]> https://www.researchpad.co/article/N4bce0426-e39d-45a0-9dc9-42db4f6cba04

Automated methods for Alzheimer’s disease (AD) classification have the potential for great clinical benefits and may provide insight for combating the disease. Machine learning, and more specifically deep neural networks, have been shown to have great efficacy in this domain. These algorithms often use neurological imaging data such as MRI and FDG PET, but a comprehensive and balanced comparison of the MRI and amyloid PET modalities has not been performed. In order to accurately determine the relative strength of each imaging variant, this work performs a comparison study in the context of Alzheimer’s dementia classification using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with identical neural network architectures. Furthermore, this work analyzes the benefits of using both modalities in a fusion setting and discusses how these data types may be leveraged in future AD studies using deep learning.

]]>
<![CDATA[The Tumor Target Segmentation of Nasopharyngeal Cancer in CT Images Based on Deep Learning Methods]]> https://www.researchpad.co/article/Nce691920-3d51-4e3c-8252-d1d422007d89

Radiotherapy is the main treatment strategy for nasopharyngeal carcinoma. A major factor affecting radiotherapy outcome is the accuracy of target delineation. Target delineation is time-consuming, and the results can vary depending on the experience of the oncologist. Using deep learning methods to automate target delineation may increase its efficiency. We used a modified deep learning model called U-Net to automatically segment and delineate tumor targets in patients with nasopharyngeal carcinoma. Patients were randomly divided into a training set (302 patients), validation set (100 patients), and test set (100 patients). The U-Net model was trained using labeled computed tomography images from the training set. The U-Net was able to delineate nasopharyngeal carcinoma tumors with an overall dice similarity coefficient of 65.86% for lymph nodes and 74.00% for the primary tumor, with respective Hausdorff distances of 32.10 and 12.85 mm. Delineation accuracy decreased with increasing cancer stage. Automatic delineation took approximately 2.6 hours, compared to 3 hours for an entirely manual procedure. Deep learning models can therefore improve the accuracy, consistency, and efficiency of target delineation for the primary tumor (T stage), but additional physician input may be required for lymph nodes.
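The two evaluation metrics reported above can be computed as follows for binary masks (in voxel units here; the paper reports Hausdorff distances in mm, i.e. scaled by voxel spacing):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the foreground voxel
    coordinates of two binary masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])
```

Dice rewards volumetric overlap while the Hausdorff distance penalizes the single worst boundary error, which is why the two are usually reported together for segmentation.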

]]>
<![CDATA[Exploit fully automatic low-level segmented PET data for training high-level deep learning algorithms for the corresponding CT data]]> https://www.researchpad.co/article/5c8823d0d5eed0c484639091

In this study, we present an approach for fully automatic urinary bladder segmentation in CT images using artificial neural networks. Automatic medical image analysis has become an invaluable tool in the different treatment stages of diseases. Medical image segmentation in particular plays a vital role, since segmentation is often the initial step in an image analysis pipeline. Since deep neural networks have made a large impact on the field of image processing in the past years, we use two different deep learning architectures to segment the urinary bladder. Both of these architectures are based on pre-trained classification networks that are adapted to perform semantic segmentation. Since deep neural networks require a large amount of training data, specifically images and corresponding ground truth labels, we furthermore propose a method to generate such a suitable training data set from Positron Emission Tomography/Computed Tomography image data. This is done by applying thresholding to the Positron Emission Tomography data to obtain a ground truth, and by utilizing data augmentation to enlarge the dataset. In this study, we discuss the influence of data augmentation on the segmentation results, and compare and evaluate the proposed architectures in terms of qualitative and quantitative segmentation performance. The results presented in this study allow us to conclude that deep neural networks can be considered a promising approach for segmenting the urinary bladder in CT images.
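The PET-thresholding and augmentation steps can be sketched as follows; the relative threshold and the flip-only augmentation are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def pet_ground_truth(pet_volume, rel_threshold=0.4):
    """Binary ground-truth mask: voxels above a fraction of the
    maximum PET uptake are labelled as bladder."""
    return pet_volume > rel_threshold * pet_volume.max()

def augment(ct_image, mask):
    """Yield simple paired augmentations (identity plus flips) of a
    CT image and its mask, enlarging the training set threefold."""
    yield ct_image, mask
    yield np.fliplr(ct_image), np.fliplr(mask)
    yield np.flipud(ct_image), np.flipud(mask)
```

The key point is that the mask derived from the co-registered PET channel serves as a label for the CT channel, so no manual annotation is needed, and every geometric augmentation must be applied identically to image and mask.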

]]>