ResearchPad - data-science https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Individual behavioral type captured by a Bayesian model comparison of cap making by sponge crabs]]> https://www.researchpad.co/article/elastic_article_8336 ‘Animal personality’ is considered to be developed through complex interactions of an individual with its surrounding environment. How can we quantify the ‘personality’ of an individual? Quantifying intra- and inter-individual variability of behavior, or individual behavioral type, appears to be a prerequisite in the study of animal personality. We propose a statistical method from a predictive point of view to measure the appropriateness of our assumption of ‘individual’ behavior in repeatedly measured behavioral data from several individuals. As a model case, we studied the sponge crab Lauridromia dehaani, which is known to make and carry a ‘cap’ from a natural sponge for camouflage. Because a cap is most likely to be rebuilt and replaced repeatedly, we hypothesized that each individual crab would develop a unique behavioral type that could be observed under experimentally controlled environmental conditions. To test the hypothesis, we conducted behavioral experiments and employed a new Bayesian model-based comparison method to examine whether crabs have individual behavioral types in cap-making behavior. Crabs were given behavioral choices by using artificial sponges of three different sizes. We modeled the choice of sponges, size of the trimmed part of a cap, size of the cavity of a cap, and the latency to produce a cap, as random variables in 26 models, including hierarchical models specifying the behavioral types. In addition, we calculated marginal-level widely applicable information criterion (mWAIC) values for the hierarchical models and compared them with the non-hierarchical models from the predictive point of view.
As a result, crabs smaller than about 9 cm were found to make caps from the sponges. Body size explained the behavioral variables, namely choice, trimmed cap characteristics, and cavity size, but not latency. Furthermore, we captured the behavioral type as a probabilistic distribution structure of the behavioral data by comparing WAIC values. Our statistical approach is not limited to behavioral data but is also applicable to physiological or morphological data when examining whether some group structure exists behind fluctuating empirical data.
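To make the model comparison concrete, the following is a minimal sketch of how a WAIC value can be computed from posterior draws. It uses the standard conditional WAIC on the deviance scale; the marginal-level variant (mWAIC) named in the abstract would additionally integrate out the individual-level parameters before this step, which is not shown. The function name `waic` and the input layout (draws in rows, observations in columns) are assumptions made for illustration.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from pointwise log-likelihoods.

    log_lik: array of shape (S, N) holding log p(y_i | theta_s) for
    S posterior draws and N observations (hypothetical input layout).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood
    # per observation, computed with a max shift for numerical stability.
    shift = log_lik.max(axis=0)
    lppd_i = shift + np.log(np.exp(log_lik - shift).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the
    # log-likelihood, per observation.
    p_i = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd_i.sum() - p_i.sum())
```

Under this convention a lower value indicates better expected predictive performance, so candidate models fitted to the same data can be ranked by it.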

]]>
<![CDATA[Comparison of mobile and clinical EEG sensors through resting state simultaneous data collection]]> https://www.researchpad.co/article/Na9ad671e-9020-41e5-9873-c25142c66f31 Development of mobile sensors brings new opportunities to medical research. In particular, mobile electroencephalography (EEG) devices can potentially be used in low-cost screening for epilepsy and other neurological and psychiatric disorders. The necessary condition for such applications is thoughtful validation in the specific medical context. As part of validation and quality assurance, we developed a computer-based analysis pipeline, which aims to compare the EEG signal acquired by a mobile EEG device to the one collected by a medically approved clinical-grade EEG device. Both signals are recorded simultaneously during 30-minute sessions in resting state. The data are collected from 22 patients with epileptiform abnormalities in EEG. In order to compare two multichannel EEG signals with differently placed references and electrodes, a novel data processing pipeline is proposed. It allows deriving matching pairs of time series which are suitable for similarity assessment through Pearson correlation. An average correlation of 0.64 is achieved on a test dataset, which can be considered a promising result given the shift in electrode positions that simultaneous placement of both devices entails.
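The similarity step can be sketched as follows, assuming the pipeline has already produced matched pairs of equally long time series. The helper names `pearson_r` and `mean_pairwise_r` are hypothetical, and the re-referencing and channel matching described in the abstract are not reproduced here.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equally long 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("signals must have the same length")
    xc = x - x.mean()  # remove each signal's mean
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def mean_pairwise_r(pairs):
    """Average correlation over matched (mobile, clinical) series pairs."""
    return float(np.mean([pearson_r(a, b) for a, b in pairs]))
```

Averaging the per-pair coefficients gives one summary figure of the kind reported above, although the abstract does not state exactly how its average was formed.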

]]>
<![CDATA[RNA binding proteins involved in regulation of protein synthesis to initiate biogenesis of secondary tumor in hepatocellular carcinoma in mice]]> https://www.researchpad.co/article/Ncfae7dcf-9371-483c-b88b-8aff35c6389a

Background

The tumor microenvironment (TM) in close contact with cancer cells is highly related to tumor growth and cancer metastasis. This study explores the biogenesis mechanism of a secondary hepatocellular carcinoma (HCC) based on the function of RNA binding proteins (RBPs)-encoding genes in the physiological microenvironment (PM).

Methods

Healthy and HCC mice were used to isolate the PM, pre-tumor microenvironment (PTM), and TM. The samples were analyzed using RNA-seq and bioinformatics. The differentially expressed RBPs-encoding genes (DERs) and differentially expressed DERs-associated genes (DEDs) were screened and subjected to GO and KEGG analysis.

Results

In total, 18 DERs and DEDs were identified in the PTM vs. PM, 87 in the TM vs. PTM, and 87 in the TM vs. PM. Those DERs and DEDs participated in the regulation of gene expression at the levels of chromatin conformation, gene activation and silencing, splicing and degradation of mRNA, biogenesis of piRNA and miRNA, ribosome assembly, and translation of proteins.

Conclusion

The genes encoding RBPs and their associated genes are involved in the transformation from PM to PTM and then in constructing the TM by regulating protein synthesis. This regulation spans the whole process of genetic information transmission, from chromatin conformation through gene activation and silencing, mRNA splicing, and ribosome assembly to translation of proteins and degradation of mRNA. Abnormalities of these functions in the organic microenvironments promoted the metastasis of HCC and initiated the biogenesis of a secondary HCC in a PM when the PM was invaded by cancer cells.

]]>
<![CDATA[Prognostic analysis of histopathological images using pre-trained convolutional neural networks: application to hepatocellular carcinoma]]> https://www.researchpad.co/article/N7c23648b-97a0-433c-b24f-48e8e5e7a36b

Histopathological images contain rich phenotypic descriptions of the molecular processes underlying disease progression. Convolutional neural networks (CNNs), state-of-the-art image analysis techniques in computer vision, automatically learn representative features from such images which can be useful for disease diagnosis, prognosis, and subtyping. Hepatocellular carcinoma (HCC) is the most common type of primary liver malignancy. Despite the high mortality rate of HCC, little previous work has used CNN models to explore histopathological images for prognosis and clinical survival prediction of HCC. We applied three pre-trained CNN models (VGG 16, Inception V3 and ResNet 50) to extract features from HCC histopathological images. Sample visualization and classification analyses based on these features showed a very clear separation between cancer and normal samples. In a univariate Cox regression analysis, 21.4% and 16% of image features on average were significantly associated with overall survival (OS) and disease-free survival (DFS), respectively. We also observed significant correlations between these features and integrated biological pathways derived from gene expression and copy number variation. Using an elastic net regularized Cox Proportional Hazards model of OS constructed from Inception image features, we obtained a concordance index (C-index) of 0.789 and a significant log-rank test (p = 7.6E−18). We also performed unsupervised classification to identify HCC subgroups from image features. The optimal two subgroups discovered using Inception model image features showed significant differences in both OS (C-index = 0.628 and p = 7.39E−07) and DFS (C-index = 0.558 and p = 0.012).
Our work demonstrates the utility of extracting image features using pre-trained models: the features support accurate prognostic models of HCC and show significant correlations with clinical survival and relevant biological pathways. Image features extracted from HCC histopathological images using the pre-trained CNN models VGG 16, Inception V3 and ResNet 50 can accurately distinguish normal and cancer samples. Furthermore, these image features are significantly correlated with survival and relevant biological pathways.
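For readers unfamiliar with the C-index values quoted above, here is a minimal sketch of Harrell's concordance index for right-censored data. The function name and argument layout are assumptions for illustration, pairs with tied follow-up times are simply skipped, and the quadratic loop is meant for clarity rather than speed. It is not the evaluation code used in the study.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    time:  follow-up times
    event: 1 if the event was observed, 0 if censored
    risk:  predicted risk scores (higher means earlier expected event)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which figures such as 0.789 are read.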

]]>
<![CDATA[Speeding up training of automated bird recognizers by data reduction of audio features]]> https://www.researchpad.co/article/N40f36632-3f00-4944-a631-cf570fa0d134

Automated acoustic recognition of birds is considered an important technology in support of biodiversity monitoring and biodiversity conservation activities. These activities require processing large amounts of soundscape recordings. Typically, recordings are transformed to a number of acoustic features, and a machine learning method is used to build models and recognize the sound events of interest. The main problem is the scalability of data processing, either for developing models or for processing recordings made over long time periods. In those cases, the processing time and resources required might become prohibitive for the average user. To address this problem, we evaluated the applicability of three data reduction methods. These methods were applied to a series of acoustic feature vectors as an additional postprocessing step, which aims to reduce the computational demand during training. The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance.

]]>
<![CDATA[Rapid digitization to reclaim thematic maps of white-tailed deer density from 1982 and 2003 in the conterminous US]]> https://www.researchpad.co/article/Nfd5ff598-716d-4aab-b712-e3eabb28e781

Background

Despite availability of valuable ecological data in published thematic maps, manual methods to transfer published maps to a more accessible digital format are time-intensive. Application of object-based image analysis makes digitization faster.

Methods

Using object-based image analysis followed by random forests classification, we rapidly digitized choropleth maps of white-tailed deer (Odocoileus virginianus) densities in the conterminous US during 1982 and 2001 to 2005 (hereafter, 2003), allowing access to deer density information stored in images.

Results

The digitization process took about one day per deer density map, of which about two hours was computer processing time, which will differ due to factors such as resolution and number of objects. Deer were present in 4.75 million km² (60% of the area) and 5.56 million km² (70%) during 1982 and 2003, respectively. Population and density in areas with deer presence were 17.15 million and 3.6 deer/km² during 1982 and 29.93 million and 5.4 deer/km² during 2003. Greatest densities were 7.2 deer/km² in Georgia during 1982 and 14.6 deer/km² in Wisconsin during 2003. Six states had deer densities ≥9.8 deer/km² during 2003. Colorado, Idaho, and Oregon had the greatest increases in population and area of deer presence, and deer expansion is likely to continue into western states. Error in these estimates may be similar to error resulting from differential reporting by state agencies. Deer densities likely are within historical levels in most of the US.

Discussion

This method rapidly reclaimed the informational value of deer density maps and may similarly be applied to digitize a variety of published maps into geographic information system layers, which permit greater analysis.

]]>
<![CDATA[Clinical characteristics and prognostic value of MEX3A mRNA in liver cancer]]> https://www.researchpad.co/article/N94e06f4b-4970-421c-89de-5164c61bf812

Background

MEX3A is an RNA-binding protein (RBP) that promotes the proliferation, invasion, migration and viability of cancer cells. The aim of this study was to explore the clinicopathological characteristics and prognostic significance of MEX3A mRNA expression in liver cancer.

Methods

RNA-Seq and clinical data were collected from The Cancer Genome Atlas (TCGA). Boxplots were used to represent discrete variables of MEX3A. Chi-square tests were used to analyze the correlation between clinical features and MEX3A expression. Receiver operating characteristic (ROC) curves were used to confirm diagnostic ability. Independent prognostic ability and values were assessed using Kaplan–Meier curves and Cox analysis.

Results

We acquired MEX3A RNA-Seq data from 50 normal liver tissues and 373 liver cancer patients, along with clinical data. We found that MEX3A was up-regulated in liver cancer and that its expression increased with histological grade (p < 0.001). MEX3A showed moderate diagnostic ability for liver cancer (AUC = 0.837). Kaplan–Meier curves and Cox analysis revealed that high expression of MEX3A was significantly associated with poor survival (OS and RFS) (p < 0.001). Moreover, MEX3A was identified as an independent prognostic factor of liver cancer (p < 0.001).

Conclusions

MEX3A expression shows promise as an independent predictor of liver cancer prognosis.

]]>
<![CDATA[Assessing distinct patterns of cognitive aging using tissue-specific brain age prediction based on diffusion tensor imaging and brain morphometry]]> https://www.researchpad.co/article/5c16d4fcd5eed0c4845461eb

Multimodal imaging enables sensitive measures of the architecture and integrity of the human brain, but the high-dimensional nature of advanced brain imaging features poses inherent challenges for the analyses and interpretations. Multivariate age prediction reduces the dimensionality to one biologically informative summary measure with potential for assessing deviations from normal lifespan trajectories. A number of studies have documented remarkably accurate age prediction, but the differential age trajectories and the cognitive sensitivity of distinct brain tissue classes have yet to be adequately characterized. Exploring differential brain age models driven by tissue-specific classifiers provides a hitherto unexplored opportunity to disentangle independent sources of heterogeneity in brain biology. We trained machine-learning models to estimate brain age using various combinations of FreeSurfer based morphometry and diffusion tensor imaging based indices of white matter microstructure in 612 healthy controls aged 18–87 years. To compare the tissue-specific brain ages and their cognitive sensitivity, we applied each of the 11 models in an independent and cognitively well-characterized sample (n = 265, 20–88 years). Correlations between true and estimated age and mean absolute error (MAE) in our test sample were highest for the most comprehensive brain morphometry (r = 0.83, CI:0.78–0.86, MAE = 6.76 years) and white matter microstructure (r = 0.79, CI:0.74–0.83, MAE = 7.28 years) models, confirming sensitivity and generalizability. Deviations from chronological age were sensitive to performance on several cognitive tests for various models, including spatial Stroop and symbol coding, indicating poorer performance in individuals with an over-estimated age. Tissue-specific brain age models provide sensitive measures of brain integrity, with implications for the study of a range of brain disorders.

]]>
<![CDATA[A scalable discrete-time survival model for neural networks]]> https://www.researchpad.co/article/5c605d8cd5eed0c4847d0748

There is currently great interest in applying neural networks to prediction tasks in medicine. It is important for predictive models to be able to use survival data, where each patient has a known follow-up time and event/censoring indicator. This avoids information loss when training the model and enables generation of predicted survival curves. In this paper, we describe a discrete-time survival model that is designed to be used with neural networks, which we refer to as Nnet-survival. The model is trained with the maximum likelihood method using mini-batch stochastic gradient descent (SGD). The use of SGD enables rapid convergence and application to large datasets that do not fit in memory. The model is flexible, so that the baseline hazard rate and the effect of the input data on hazard probability can vary with follow-up time. It has been implemented in the Keras deep learning framework, and source code for the model and several examples is available online. We demonstrate the performance of the model on both simulated and real data and compare it to existing models Cox-nnet and Deepsurv.
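The likelihood behind a discrete-time survival model of this kind can be sketched in plain NumPy. This is an illustration under stated assumptions, not the Nnet-survival source code: the function names and input layout are hypothetical, follow-up is assumed to be already binned into intervals, and a censored subject is credited only with the intervals that precede the one in which follow-up ended.

```python
import numpy as np

def discrete_survival_nll(hazard, interval_idx, event, eps=1e-7):
    """Negative log-likelihood of a discrete-time survival model.

    hazard:       (n, m) predicted conditional event probabilities, one per
                  subject and follow-up interval (e.g. sigmoid network outputs)
    interval_idx: (n,) index of the interval in which follow-up ended
    event:        (n,) 1 if the event occurred in that interval, 0 if censored
    """
    hazard = np.clip(np.asarray(hazard, dtype=float), eps, 1.0 - eps)
    interval_idx = np.asarray(interval_idx, dtype=int)
    event = np.asarray(event, dtype=bool)
    n, m = hazard.shape
    # Mask of intervals each subject is known to have survived.
    survived = np.arange(m)[None, :] < interval_idx[:, None]
    log_lik = (survived * np.log1p(-hazard)).sum(axis=1)
    # Subjects with an observed event also contribute the hazard of the
    # interval in which the event happened.
    event_hazard = hazard[np.arange(n), interval_idx]
    log_lik += np.where(event, np.log(event_hazard), 0.0)
    return -log_lik.sum()

def survival_curve(hazard):
    """Predicted survival probability at the end of each interval."""
    return np.cumprod(1.0 - np.asarray(hazard, dtype=float), axis=-1)
```

Because the loss is a sum over subjects, it can be evaluated on mini-batches, which is what makes training with SGD on datasets larger than memory possible. Each interval has its own hazard output, so the hazard is free to vary with follow-up time.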

]]>
<![CDATA[A direct approach to estimating false discovery rates conditional on covariates]]> https://www.researchpad.co/article/5c26b155d5eed0c48475a309

Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini–Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
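The plug-in estimator described above can be sketched in a few lines. This is a simplified illustration, not the swfdr implementation: the function names are hypothetical, and the null proportion is estimated here by ordinary least squares on the indicator that a p-value exceeds a single threshold `lam`, which is an assumed simplification of the regression framework.

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

def pi0_by_regression(p, covariates, lam=0.8):
    """Estimate the null proportion conditional on covariates.

    Regresses the indicator 1(p > lam) on the covariates and rescales
    by 1 / (1 - lam); lam = 0.8 is an illustrative choice.
    """
    p = np.asarray(p, dtype=float)
    x = np.asarray(covariates, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(len(p)), x])  # add an intercept
    y = (p > lam).astype(float)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.clip(design @ beta / (1.0 - lam), 0.0, 1.0)

def plug_in_fdr(p, pi0):
    """Multiply BH-adjusted p-values by the estimated null proportion."""
    return np.clip(np.asarray(pi0, dtype=float) * bh_adjust(p), 0.0, 1.0)
```

Calling `plug_in_fdr(p, pi0_by_regression(p, x))` gives each test its own multiplication factor, so hypotheses whose covariates suggest a lower null proportion receive smaller FDR estimates than the unweighted Benjamini–Hochberg values.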

]]>