ResearchPad - probability-distribution https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[ToyArchitecture: Unsupervised learning of interpretable models of the environment]]> https://www.researchpad.co/article/elastic_article_15730 Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are often uncomputable, or lack practical implementations. In this paper we attempt to follow a big picture view while also providing a particular theory and its implementation to present a novel, purposely simple, and interpretable hierarchical architecture. This architecture incorporates the unsupervised learning of a model of the environment, learning the influence of one’s own actions, model-based reinforcement learning, hierarchical planning, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations which are increasingly more abstract, but can retain details when needed. We demonstrate the universality of the architecture by testing it on a series of diverse environments ranging from audio/visual compression to discrete and continuous action spaces, to learning disentangled representations.

]]>
<![CDATA[Agent-based and continuous models of hopper bands for the Australian plague locust: How resource consumption mediates pulse formation and geometry]]> https://www.researchpad.co/article/elastic_article_14654 Locusts aggregate in swarms that threaten agriculture worldwide. Initially these aggregations form as aligned groups, known as hopper bands, whose individuals alternate between marching and paused (associated with feeding) states. The Australian plague locust (for which there are excellent field studies) forms wide crescent-shaped bands with a high density at the front where locusts slow in uneaten vegetation. The density of locusts rapidly decreases behind the front where the majority of food has been consumed. Most models of collective behavior focus on social interactions as the key organizing principle. We demonstrate that the formation of locust bands may be driven by resource consumption. Our first model treats each locust as an individual agent with probabilistic rules governing motion and feeding. Our second model describes locust density with deterministic differential equations. We use biological observations of individual behavior and collective band shape to identify numerical values for the model parameters and conduct a sensitivity analysis of outcomes to parameter changes. Our models are capable of reproducing the characteristics observed in the field. Moreover, they provide insight into how resource availability influences collective locust behavior that may eventually aid in disrupting the formation of locust bands, mitigating agricultural losses.

]]>
<![CDATA[The Childbirth Experience Questionnaire (CEQ)—Validation of its use in a Danish-speaking population of new mothers stimulated with oxytocin during labour]]> https://www.researchpad.co/article/elastic_article_14579

When determining optimal treatment regimens, patient-reported outcomes, including satisfaction, are increasingly appreciated. It is well established that the birth experience may affect the postnatal attachment to the newborn and the management of subsequent pregnancies and deliveries. As no robust validated Danish tool to evaluate the childbirth experience exists, we aimed to perform a transcultural adaptation of the Childbirth Experience Questionnaire (CEQ) to a Danish context.

Methods

In accordance with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), we translated the Swedish-CEQ to Danish. The Danish-CEQ was tested for content validity among 10 new mothers. In a population of women who had their labour induced, we then assessed the electronic questionnaire for validity and reliability using factor analytical design, hypothesis testing, and internal consistency. Based on these data, we determined criterion and construct responsiveness in addition to floor and ceiling effects.

Results

The content validation resulted in minor adjustments to two items, which improved comprehensibility. The electronic questionnaire was completed by 377 of 495 women (76.2%). The original Swedish-CEQ was four-dimensional; however, an exploratory factor analysis revealed a three-dimensional structure in our Danish population (Own capacity, Participation, and Professional support). Parous women, women who delivered vaginally, and women with a labour duration <12 hours had a higher score in each domain. The internal consistency (Cronbach’s alpha) ranged between 0.75 and 0.89 and the ICC between 0.68 and 0.93.
We found ceiling effects of 57.6% in the domain Professional support and of 25.5% in the domain Participation.

Conclusion

This study offers a transcultural adaptation of the Swedish-CEQ to a Danish context. The three-dimensional Danish-CEQ demonstrates construct validity and reliability. Our results revealed significant ceiling effects, especially in the domain Professional support, which need to be acknowledged when considering implementing the Danish-CEQ in trials and clinical practice.

]]>
<![CDATA[A non-stationary Markov model for economic evaluation of grass pollen allergoid immunotherapy]]> https://www.researchpad.co/article/elastic_article_14555

Allergic rhino-conjunctivitis (ARC) is an IgE-mediated disease that occurs after exposure to indoor or outdoor allergens, or to non-specific triggers. Effective treatment options for seasonal ARC are available, but the economic aspects and burden of these therapies are not of secondary importance, considering also that the prevalence of ARC has been estimated at 23% in Europe. For these reasons, we propose a novel, flexible cost-effectiveness analysis (CEA) model, intended to provide healthcare professionals and policymakers with useful information aimed at cost-effective interventions for grass-pollen-induced ARC.

Methods

Treatments compared are: 1. no AIT, first-line symptomatic drug therapy with no allergoid immunotherapy (AIT). 2. SCIT, subcutaneous immunotherapy. 3. SLIT, sublingual immunotherapy. The proposed model is a non-stationary Markovian model that is flexible enough to reflect those treatment-related problems often encountered in real-life and clinical practice but that cannot be adequately represented in randomized clinical trials (RCTs).
At the same time, we described in detail all the structural elements of the model as well as its input parameters, in order to minimize any issue of transparency and facilitate the reproducibility and circulation of the results among researchers.

Results

Using the no AIT strategy as a comparator, and the Incremental Cost-Effectiveness Ratio (ICER) as a statistic to summarize the cost-effectiveness of a health care intervention, we could conclude that:

SCIT systematically outperforms SLIT, except when a full societal perspective is considered. For example, for T = 9 and a pollen season of 60 days, we have ICER = €16,729 for SCIT vs. ICER = €15,116 for SLIT (in the full societal perspective).

For longer pollen seasons or longer follow-up durations the ICER decreases, because each patient experiences a greater clinical benefit over a larger time span, and the Quality-Adjusted Life Years (QALYs) gained per cycle increase accordingly.

Assuming that no clinical benefit is achieved after premature discontinuation, and that at least three years of immunotherapy are required to improve clinical manifestations and perceive a better quality of life, ICERs become far greater than €30,000.

If the immunotherapy is effective only at the peak of the pollen season, the relative ICERs rise sharply. For example, in the scenario where no clinical benefit is present after premature discontinuation of immunotherapy, we have ICER = €74,770 for SCIT vs. ICER = €152,110 for SLIT.

The distance between SCIT and SLIT depends strongly on the model under which the interventions are meta-analyzed.

Conclusions

Even though there is considerable evidence that SCIT outperforms SLIT, we could not state that both SCIT and SLIT (or only one of these two) can be considered cost-effective for ARC, as a reliable threshold value for cost-effectiveness set by national regulatory agencies for pharmaceutical products is missing.
Moreover, the impact of uncertainty in the model input parameters on the reliability of our conclusions needs to be investigated further.

]]>
<![CDATA[Class enumeration false positive in skew-t family of continuous growth mixture models]]> https://www.researchpad.co/article/Ne0623f60-4058-4fc0-9606-ac0f597752dc

Growth Mixture Modeling (GMM) has gained great popularity in recent decades as a methodology for longitudinal data analysis. The usual assumption of normally distributed repeated measures has been shown to be problematic in real-life data applications. Namely, performing normal GMM on data that is even slightly skewed can lead to over-selection of the number of latent classes. To ameliorate this unwanted result, GMM based on the skew t family of continuous distributions has been proposed. This family of distributions includes the normal, skew normal, t, and skew t. This simulation study aims to determine the efficiency of selecting the “true” number of latent groups in GMM based on the skew t family of continuous distributions, using fit indices and likelihood ratio tests (LRTs). Results show that the skew t GMM was the only model considered that showed fit indices and LRT false positive rates under the 0.05 cutoff value across sample sizes and for normal, skewed, and kurtic data. Simulation results are corroborated by a real educational data application example. These findings favor the development of practical guides to the benefits and risks of using GMM based on this family of distributions.

]]>
<![CDATA[Intra-individual variation of particles in exhaled air and of the contents of Surfactant protein A and albumin]]> https://www.researchpad.co/article/N3daed577-6f93-4f19-9dc8-54ce3f8d7d6e

Introduction

Particles in exhaled air (PEx) provide samples of respiratory tract lining fluid from small airways containing, for example, Surfactant protein A (SP-A) and albumin, potential biomarkers of small airway disease. We hypothesized that there are differences between morning, noon, and afternoon measurements and that the variability of repeated measurements is larger between days than within days.

Methods

PEx was obtained in sixteen healthy non-smoking adults on 11 occasions, within one day and between days. SP-A and albumin were quantified by ELISA. The coefficient of repeatability (CR), intraclass correlation coefficient (ICC), and coefficient of variation (CV) were used to assess the variation of repeated measurements.
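
As a rough illustration of two of the variation measures named above, the following sketch computes a coefficient of variation and a coefficient of repeatability. The abstract does not state which formulas were used, so the definitions here are assumptions: CV is taken as sample SD divided by the mean, and CR follows the common Bland-Altman convention of 1.96 × √2 × within-subject SD, with an equal number of repeats per subject. The function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (returned as a fraction)."""
    return statistics.stdev(values) / statistics.mean(values)


def coefficient_of_repeatability(measurements_by_subject):
    """Bland-Altman style CR = 1.96 * sqrt(2) * within-subject SD.

    Assumption: every subject has the same number of repeated measurements,
    so the within-subject variance is the mean of the per-subject variances.
    """
    variances = [statistics.variance(m) for m in measurements_by_subject]
    within_subject_sd = math.sqrt(statistics.mean(variances))
    return 1.96 * math.sqrt(2) * within_subject_sd


# Made-up numbers for two subjects with two repeats each.
repeats = [[1.0, 3.0], [2.0, 4.0]]
print(coefficient_of_repeatability(repeats))
print(coefficient_of_variation([1.0, 3.0, 2.0, 4.0]))
```

Under this convention, two repeated measurements on the same subject would be expected to differ by less than the CR in about 95% of cases.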

Results

SP-A and albumin increased significantly from morning towards the noon and afternoon by 13% and 25% on average, respectively, whereas PEx number concentration and particle mean mass did not differ significantly between the morning, noon and afternoon. Between-day CRs were not larger than within-day CRs.

Conclusions

Time of the day influences the contents of SP-A and albumin in exhaled particles. The variation of repeated measurements was rather high but was not influenced by the time intervals between measurements.

]]>
<![CDATA[Projections of Ebola outbreak size and duration with and without vaccine use in Équateur, Democratic Republic of Congo, as of May 27, 2018]]> https://www.researchpad.co/article/5c8accd5d5eed0c4849900f7

As of May 27, 2018, 6 suspected, 13 probable and 35 confirmed cases of Ebola virus disease (EVD) had been reported in Équateur Province, Democratic Republic of Congo. We used reported case counts and time series from prior outbreaks to estimate the total outbreak size and duration with and without vaccine use. We modeled Ebola virus transmission using a stochastic branching process model that included reproduction numbers from past Ebola outbreaks and a particle filtering method to generate a probabilistic projection of the outbreak size and duration conditioned on its reported trajectory to date. Projections were modeled using high (62%), low (44%), and zero (0%) estimates of vaccination coverage (after deployment). Additionally, we used the time series for 18 prior Ebola outbreaks from 1976 to 2016 to parameterize the Theil-Sen regression model predicting the outbreak size from the number of observed cases from April 4 to May 27. We used these techniques on probable and confirmed case counts with and without inclusion of suspected cases. Probabilistic projections were scored against the actual outbreak size of 54 EVD cases, using a log-likelihood score. With the stochastic model, using high, low, and zero estimates of vaccination coverage, the median outbreak sizes for probable and confirmed cases were 82 cases (95% prediction interval [PI]: 55, 156), 104 cases (95% PI: 58, 271), and 213 cases (95% PI: 64, 1450), respectively. With the Theil-Sen regression model, the median outbreak size was estimated to be 65.0 probable and confirmed cases (95% PI: 48.8, 119.7). Among our three mathematical models, the stochastic model with suspected cases and high vaccine coverage predicted total outbreak sizes closest to the true outcome. Relatively simple mathematical models updated in real time may inform outbreak response teams with projections of total outbreak size and duration.
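
To make the idea of a stochastic branching process projection concrete, here is a minimal sketch. It is not the authors' model: it omits the particle filter and the outbreak-specific reproduction numbers, and it assumes Poisson-distributed secondary cases with an effective reproduction number `r0 * (1 - coverage * efficacy)`. The values of `r0` and `efficacy` below are hypothetical placeholders; only the coverage levels (62%, 0%) and the 48 probable plus confirmed cases come from the abstract.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_outbreak(initial_cases, r_eff, rng, max_cases=10_000):
    """Run one branching process and return the total number of cases."""
    total = active = initial_cases
    while active > 0 and total < max_cases:
        new_cases = sum(poisson(r_eff, rng) for _ in range(active))
        total += new_cases
        active = new_cases
    return total


def median_outbreak_size(initial_cases, r0, coverage, efficacy, runs=1000, seed=0):
    """Median final size over many simulated outbreaks."""
    rng = random.Random(seed)
    r_eff = r0 * (1.0 - coverage * efficacy)  # assumed effect of vaccination
    sizes = [simulate_outbreak(initial_cases, r_eff, rng) for _ in range(runs)]
    return statistics.median(sizes)


# Hypothetical r0 and efficacy; coverage levels as quoted in the abstract.
for coverage in (0.62, 0.0):
    print(coverage, median_outbreak_size(48, r0=0.8, coverage=coverage, efficacy=1.0))
```

Repeating the simulation many times yields a distribution of final sizes, from which a median and a prediction interval can be read off in the same spirit as the figures quoted above.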

]]>
<![CDATA[Cross-comparative analysis of evacuation behavior after earthquakes using mobile phone data]]> https://www.researchpad.co/article/5c76fe4dd5eed0c484e5b867

Despite the importance of predicting evacuation mobility dynamics after large-scale disasters for effective first response and disaster relief, our general understanding of evacuation behavior remains limited because of the lack of empirical evidence on the evacuation movement of individuals across multiple disaster instances. Here we investigate the GPS trajectories of a total of more than 1 million anonymized mobile phone users whose positions were tracked for a period of 2 months before and after four of the major earthquakes that occurred in Japan. Through a cross-comparative analysis between the four disaster instances, we find that, in contrast to the assumed complexity of evacuation decision-making mechanisms in crisis situations, an individual’s evacuation probability is strongly dependent on the seismic intensity that they experience. In fact, we show that the evacuation probabilities in all earthquakes collapse into a similar pattern, with a critical threshold at around seismic intensity 5.5. This indicates that despite the diversity in the earthquake profiles and urban characteristics, evacuation behavior is similarly dependent on seismic intensity. Moreover, we found that probability density functions of the distances that individuals evacuate are not dependent on the seismic intensities that individuals experience. These insights from empirical analysis of evacuation from multiple earthquake instances using large-scale mobility data contribute to a deeper understanding of how people react to earthquakes, and can potentially assist decision makers in simulating and predicting the number of evacuees in urban areas with little computational time and cost. This can be achieved by utilizing only the information on population density distribution and seismic intensity distribution, which can be observed instantaneously after the shock.

]]>
<![CDATA[Population dynamics and entrainment of basal ganglia pacemakers are shaped by their dendritic arbors]]> https://www.researchpad.co/article/5c65dcf2d5eed0c484dec628

The theory of phase oscillators is an essential tool for understanding population dynamics of pacemaking neurons. GABAergic pacemakers in the substantia nigra pars reticulata (SNr), a main basal ganglia (BG) output nucleus, receive inputs from the direct and indirect pathways at distal and proximal regions of their dendritic arbors, respectively. We combine theory, optogenetic stimulation and electrophysiological experiments in acute brain slices to ask how dendritic properties impact the propensity of the various inputs, arriving at different locations along the dendrite, to recruit or entrain SNr pacemakers. By combining cable theory with sinusoidally-modulated optogenetic activation of either proximal somatodendritic regions or the entire somatodendritic arbor of SNr neurons, we construct an analytical model that accurately fits the empirically measured somatic current response to inputs arising from illuminating the soma and various portions of the dendritic field. We show that the extent of the dendritic tree that is illuminated generates measurable and systematic differences in the pacemaker’s phase response curve (PRC), causing a shift in its peak. Finally, we show that the divergent PRCs correctly predict differences in two major features of the collective dynamics of SNr neurons: the fidelity of population responses to sudden step-like changes in inputs; and the phase latency at which SNr neurons are entrained by rhythmic stimulation, which can occur in the BG under both physiological and pathophysiological conditions. Our novel method generates measurable and physiologically meaningful spatial effects, and provides the first empirical demonstration of how the collective responses of SNr pacemakers are determined by the transmission properties of their dendrites. 
SNr dendrites may serve to delay distal striatal inputs so that they impinge on the spike initiation zone simultaneously with pallidal and subthalamic inputs in order to guarantee a fair competition between the influence of the monosynaptic direct- and polysynaptic indirect pathways.

]]>
<![CDATA[Rainfall trend and variability in Southeast Florida: Implications for freshwater availability in the Everglades]]> https://www.researchpad.co/article/5c6c759ed5eed0c4843cff23

Freshwater demand in Southeast Florida is predicted to increase over the next few decades. However, shifting patterns in the intensity and frequency of drought create considerable pressure on local freshwater availability. Well-established water resources management requires evaluating and understanding long-term rainfall patterns, drought intensity and cycle, and related rainfall deficit. In this study, the presence of monotonic rainfall trends was analyzed using linear regression and Mann–Kendall trend tests. Pettitt's single change-point detection test examined the presence of an abrupt change in rainfall. Drought in Southeast Florida is assessed using the Standardized Precipitation Index (SPI) at 3-, 6-, 12-, and 24-month scales, and the Fast Fourier Transform is applied to evaluate the frequency of each drought intensity. Rainfall increased in most of the wet season months, the total wet season, and the annual total. The wet season duration showed a decrease driven by a decrease in October rainfall. Since 1990, wet season and total annual rainfall have exhibited an abrupt increase. The SPI analysis has indicated that extended wetness characterizes the contemporary rainfall regime since 1995, except for the incidence of intermittent dry spells. Short-term droughts have 3-year to 5-year recurrence intervals, and sustained droughts have 10-year and 20-year recurrence intervals. In Southeast Florida, prolonged drought limits freshwater availability by decreasing recharge, resulting in a longer hydro-period to maintain the health of the Everglades Ecosystem, and to control saltwater intrusion. The increasing dry season duration suggests the growing importance of promoting surface water storage and demand-side management practices.
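
The Mann–Kendall trend test mentioned above can be sketched in a few lines. This is the textbook form of the test, under the assumptions that the series has no tied values and no serial correlation; it is not necessarily the exact variant used in the study, and the rainfall figures in the usage example are made up.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no tied values, so the variance of S needs no tie correction.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p_value


# Hypothetical annual rainfall totals (mm), for illustration only.
rainfall = [1310, 1295, 1402, 1388, 1450, 1433, 1501, 1476, 1540, 1522]
print(mann_kendall(rainfall))
```

A positive S with a small p-value indicates a monotonic increase; the test makes no assumption that the trend is linear.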

]]>
<![CDATA[Histogram analysis of prostate cancer on dynamic contrast-enhanced magnetic resonance imaging: A preliminary study emphasizing on zonal difference]]> https://www.researchpad.co/article/5c6c75d9d5eed0c4843d02ed

Background

This study evaluated the performance of histogram analysis in the time course of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for differentiating cancerous tissues from benign tissues in the prostate.

Methods

We retrospectively analyzed the histograms of DCE-MRI of 30 patients. Histograms within regions of interest (ROI) in the peripheral zone (PZ) and transitional zone (TZ) were separately analyzed. The maximum difference wash-in slope (MWS) and delay phase slope (DPS) were defined for each voxel. Differences in histogram parameters, namely the mean, standard deviation (SD), coefficient of variation (CV), kurtosis, skewness, interquartile range (IQR), percentiles (P10, P25, P75, P90, and P90P10), Range, and modified full width at half-maximum (mFWHM), between cancerous and benign tissues were assessed.
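
For readers unfamiliar with these histogram parameters, the sketch below computes most of them for a list of voxel values. Several choices are assumptions rather than the study's definitions: percentiles use linear interpolation, skewness and kurtosis are moment-based (kurtosis is not excess kurtosis), and P90P10 is taken to mean P90 minus P10. The mFWHM is omitted because the abstract does not define it.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def histogram_parameters(values):
    """Summary statistics of voxel values inside one region of interest."""
    data = sorted(values)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    # Central moments (population form) for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in data) / len(data)
    m3 = sum((v - mean) ** 3 for v in data) / len(data)
    m4 = sum((v - mean) ** 4 for v in data) / len(data)
    p = {q: percentile(data, q) for q in (10, 25, 75, 90)}
    return {
        "mean": mean,
        "SD": sd,
        "CV": sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "IQR": p[75] - p[25],
        "P10": p[10],
        "P25": p[25],
        "P75": p[75],
        "P90": p[90],
        "P90P10": p[90] - p[10],  # assumed to be the P90 minus P10 spread
        "range": data[-1] - data[0],
    }


# Hypothetical voxel-wise slope values for one ROI.
print(histogram_parameters([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]))
```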

Results

In the TZ, CV for ROIs of 7.5 and 10 mm was the only significantly different parameter of the MWS (P = 0.034 and P = 0.004, respectively), whereas many parameters of the DPS (mean, skewness, P10, P25, P50, P75 and P90) differed significantly (P = <0.001–0.016 and area under the curve [AUC] = 0.73–0.822). In the PZ, all parameters of the MWS exhibited significant differences, except kurtosis and skewness in the ROI of 7.5 mm (P = <0.001–0.017 and AUC = 0.865–0.898). SD, IQR, mFWHM, P90P10 and Range of the DPS also differed significantly (P = 0.001–0.035).

Conclusion

The histogram analysis of DCE-MRI is a potentially useful approach for differentiating prostate cancer from normal tissues. Different histogram parameters of the MWS and DPS should be applied in the TZ and PZ.

]]>
<![CDATA[Systematically false positives in early warning signal analysis]]> https://www.researchpad.co/article/5c648ce2d5eed0c484c819e6

Many systems in various scientific fields like medicine, ecology, economics or climate science exhibit so-called critical transitions, through which a system abruptly changes from one state to a different state. Typical examples are epileptic seizures, changes in the climate system or catastrophic shifts in ecosystems. In order to predict imminent critical transitions, a mathematical apparatus called early warning signals has been developed, and this method is used successfully in many scientific areas. However, not all critical transitions can be detected by this approach (false negative), and the appearance of early warning signals does not necessarily prove that a critical transition is imminent (false positive). Furthermore, there are whole classes of systems that always show early warning signals, even though they do not feature critical transitions. In this study we identify such classes in order to provide a safeguard against a misinterpretation of the results of an early warning signal analysis of such systems. Furthermore, we discuss strategies to avoid such systematic false positives and test our theoretical insights by applying them to real-world data.
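
The abstract does not say which indicators it analyzes. Two indicators commonly used as early warning signals are the variance and the lag-1 autocorrelation of a time series, computed in a sliding window; a sustained rise in either is usually read as a warning. The sketch below assumes those two indicators and uses hypothetical function names.

```python
import statistics


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation of one window (0.0 if the window is constant)."""
    mean = statistics.mean(window)
    denominator = sum((v - mean) ** 2 for v in window)
    if denominator == 0:
        return 0.0
    numerator = sum(
        (window[i] - mean) * (window[i + 1] - mean) for i in range(len(window) - 1)
    )
    return numerator / denominator


def rolling_indicators(series, window):
    """Return a list of (variance, lag-1 autocorrelation), one per window."""
    indicators = []
    for start in range(len(series) - window + 1):
        segment = series[start:start + window]
        indicators.append((statistics.pvariance(segment), lag1_autocorrelation(segment)))
    return indicators


# Made-up series whose fluctuations grow over time.
series = [0.1, -0.1, 0.2, -0.1, 0.3, -0.4, 0.6, -0.5, 0.9, -1.1]
for variance, ac1 in rolling_indicators(series, window=5):
    print(round(variance, 3), round(ac1, 3))
```

As the abstract stresses, a rising indicator alone does not establish that a transition is imminent, so output like this should be read as a prompt for further analysis.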

]]>
<![CDATA[Identification of movement synchrony: Validation of windowed cross-lagged correlation and -regression with peak-picking algorithm]]> https://www.researchpad.co/article/5c6b26b9d5eed0c484289f1e

In psychotherapy, movement synchrony seems to be associated with higher patient satisfaction and treatment outcome. However, it remains unclear whether movement synchrony rated by humans and movement synchrony identified by automated methods reflect the same construct. To address this issue, video sequences showing movement synchrony of patients and therapists (N = 10) or not (N = 10), were analyzed using motion energy analysis. Three different synchrony conditions with varying levels of complexity (naturally embedded, naturally isolated, and artificial) were generated for time series analysis with windowed cross-lagged correlation/ -regression (WCLC, WCLR). The concordance of ratings (human rating vs. automatic assessment) was computed for 600 different parameter configurations of the WCLC/WCLR to identify the parameter settings that measure movement synchrony best. A parameter configuration was rated as having a good identification rate if it yields high concordance with human-rated intervals (Cohen’s kappa) and a low amount of over-identified data points. Results indicate that 76 configurations had a good identification rate (IR) in the least complex condition (artificial). Two had an acceptable IR with regard to the naturally isolated condition. Concordance was low with regard to the most complex (naturally embedded) condition. A valid identification of movement synchrony strongly depends on parameter configuration and goes beyond the identification of synchrony by human raters. Differences between human-rated synchrony and nonverbal synchrony measured by algorithms are discussed.
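
The core of a windowed cross-lagged correlation with peak picking can be sketched as follows. This is an illustrative simplification, not the validated WCLC/WCLR procedure: the window size, maximum lag, and step are the kind of parameters the study varies, the peak is simply the lag with the highest Pearson correlation in each window, and both series are assumed to have the same length. Function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def wclc(x, y, window, max_lag, step=1):
    """Map each window start to {lag: r}; a positive lag means y follows x."""
    result = {}
    for start in range(max_lag, len(x) - window - max_lag + 1, step):
        segment_x = x[start:start + window]
        result[start] = {
            lag: pearson(segment_x, y[start + lag:start + lag + window])
            for lag in range(-max_lag, max_lag + 1)
        }
    return result


def pick_peaks(correlations):
    """For every window, keep the (lag, r) pair with the highest correlation."""
    return {
        start: max(lags.items(), key=lambda item: item[1])
        for start, lags in correlations.items()
    }


# Synthetic motion-energy series: the second one copies the first, 2 samples later.
rng = random.Random(0)
patient = [rng.random() for _ in range(60)]
therapist = [0.0, 0.0] + patient[:-2]
print(pick_peaks(wclc(patient, therapist, window=10, max_lag=3)))
```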

]]>
<![CDATA[Testing of library preparation methods for transcriptome sequencing of real life glioblastoma and brain tissue specimens: A comparative study with special focus on long non-coding RNAs]]> https://www.researchpad.co/article/5c6b26afd5eed0c484289e7d

Current progress in the field of next-generation transcriptome sequencing has contributed significantly to the study of various malignancies including glioblastoma multiforme (GBM). Differential sequencing of transcriptomes of patients and non-tumor controls has the potential to reveal novel transcripts with a significant role in GBM. One such candidate group of molecules is long non-coding RNAs (lncRNAs), which have been shown to be involved in processes such as carcinogenesis, epigenetic modifications and resistance to various therapeutic approaches. To maximize the value of transcriptome sequencing, a proper protocol for library preparation from tissue-derived RNA needs to be found which would produce high-quality transcriptome sequencing data and increase the number of detected lncRNAs. It is important to mention that the success of library preparation is determined by the quality of input RNA, which in the case of real-life tissue specimens is very often altered in comparison to the high-quality RNA commonly used by manufacturers for development of library preparation chemistry. In the present study, we used GBM and non-tumor brain tissue specimens and compared three different commercial library preparation kits, namely NEXTflex Rapid Directional qRNA-Seq Kit (Bioo Scientific), SENSE Total RNA-Seq Library Prep Kit (Lexogen) and NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB). Libraries generated using the SENSE kit were characterized by the most normal distribution of normalized average GC content, the least amount of over-represented sequences and the percentage of ribosomal RNA reads (0.3–1.5%) and highest numbers of uniquely mapped reads and reads aligning to coding regions. However, the NEBNext kit performed better, having relatively low duplication rates, even transcript coverage and the highest number of hits in the Ensembl database for every biotype of interest, including lncRNAs. 
Our results indicate that, of the three approaches, the NEBNext library preparation kit was most suitable for the study of lncRNAs via transcriptome sequencing. This was further confirmed by highly consistent data obtained in an independent validation on an expanded cohort.

]]>
<![CDATA[University-industry-government relations of the Ministry of Industry and Information Technology (MIIT) universities: The perspective of the mutual information]]> https://www.researchpad.co/article/5c67306ed5eed0c484f37b1a

The Ministry of Industry and Information Technology (MIIT) universities are important bases for science and technology research and play a critical role in China’s National Innovation System. Based on the Web of Science (WoS), this article analyzes the statistics of papers published by MIIT universities and by universities from across China, including MIIT universities. The results are as follows: (1) Both the MIIT universities and universities nationwide in China have increased their international academic publications, and MIIT universities have shown a greater increase over the past decade. (2) In terms of U-I-G interaction, for UG relations, the Tug value of MIIT universities has remained stable, while that of universities in China has declined. For UI relations, the Tui value of both MIIT universities and universities in China has shown steady growth. For UIG relations, MIIT universities have a greater synergistic effect of the Triple Helix relationship than universities in China. (3) Looking in more detail at the seven MIIT universities, those selected for “Project 985”, including HIT, BUAA, BIT and NPU, have published more papers and been more synergistic with government and industry (UIG relations) than the other three universities, NUAA, NUST and HEU. Based on the empirical results, we discuss our findings and make certain suggestions regarding policy incentives, a reasonable administrative system and the U-I-G interaction mode, which are significant not only for Chinese universities but also for universities in other developing countries.

]]>
<![CDATA[Threshold response to stochasticity in morphogenesis]]> https://www.researchpad.co/article/5c5b52e0d5eed0c4842bd1b9

During development of biological organisms, multiple complex structures are formed. In many instances, these structures need to exhibit a high degree of order to be functional, although many of their constituents are intrinsically stochastic. Hence, it has been suggested that biological robustness ultimately must rely on complex gene regulatory networks and clean-up mechanisms. Here we explore developmental processes that have evolved inherent robustness against stochasticity. In the context of the Drosophila eye disc, multiple optical units, ommatidia, develop into crystal-like patterns. During the larva-to-pupa stage of metamorphosis, the centers of the ommatidia are specified initially through the diffusion of morphogens, followed by the specification of R8 cells. Establishing the R8 cell is crucial in setting up the geometric, and functional, relationships of cells within an ommatidium and among neighboring ommatidia. Here we study a PDE mathematical model of these spatio-temporal processes in the presence of parametric stochasticity, defining and applying measures that quantify order within the resulting spatial patterns. We observe a universal sigmoidal response to increasing transcriptional noise. Ordered patterns persist up to a threshold noise level in the model parameters. In accordance with prior qualitative observations, as the noise is further increased past a threshold point of no return, these ordered patterns rapidly become disordered. Such robustness in development allows for the accumulation of genetic variation without any observable changes in phenotype. We argue that the observed sigmoidal dependence introduces robustness allowing for sizable amounts of genetic variation and transcriptional noise to be tolerated in natural populations without resulting in phenotype variation.

]]>
<![CDATA[Dynamical analogues of rank distributions]]> https://www.researchpad.co/article/5c61e933d5eed0c48496f97e

We present an equivalence between stochastic and deterministic variable approaches to represent ranked data and find the expressions obtained to be suggestive of statistical-mechanical meanings. We first reproduce size-rank distributions N(k) from real data sets by straightforward considerations based on the assumed knowledge of the background probability distribution P(N) that generates samples of random variable values similar to real data. The choice of different functional expressions for P(N) (power law, exponential, Gaussian, etc.) leads to different classes of distributions N(k) for which we find examples in nature. Then we show that all of these types of functions can be alternatively obtained from deterministic dynamical systems. These correspond to one-dimensional nonlinear iterated maps near a tangent bifurcation whose trajectories are proved to be precise analogues of the N(k). We provide explicit expressions for the maps and their trajectories and find they operate under conditions of vanishing or small Lyapunov exponent, therefore at or near a transition to or out of chaos. We give explicit examples ranging from exponential to logarithmic behavior, including Zipf’s law. Adoption of the nonlinear map as the central character of the formalism is a useful viewpoint, as variation of its few parameters, which modify its tangency property, translates into the different classes of N(k).
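
One standard way to pass from a background distribution P(N) to a size-rank function N(k) is sketched below. It is offered as an assumed construction consistent with the abstract, not as the authors' exact derivation; the symbols M (number of ranked items), N_min (lower cutoff), α and λ are introduced here for illustration. The rank of an item of size N is taken to be the expected number of items at least that large.

```latex
% Rank as the expected number of items of size N or larger (M items in total):
k(N) = M \int_{N}^{\infty} P(N')\,\mathrm{d}N'

% Exponential background, P(N) = \lambda e^{-\lambda N}:
k = M e^{-\lambda N}
\quad\Longrightarrow\quad
N(k) = \frac{1}{\lambda}\,\ln\frac{M}{k}

% Power-law background, P(N) = (\alpha - 1)\,N_{\min}^{\alpha-1} N^{-\alpha}
% for N \ge N_{\min} and \alpha > 1:
k = M \left(\frac{N_{\min}}{N}\right)^{\alpha-1}
\quad\Longrightarrow\quad
N(k) = N_{\min}\left(\frac{M}{k}\right)^{1/(\alpha-1)}
```

Under these assumptions an exponential P(N) gives a logarithmic N(k), while the power law with α = 2 gives N(k) proportional to 1/k, which is Zipf's law.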

]]>
<![CDATA[The finite state projection based Fisher information matrix approach to estimate information and optimize single-cell experiments]]> https://www.researchpad.co/article/5c478c61d5eed0c484bd1f74

Modern optical imaging experiments not only measure single-cell and single-molecule dynamics with high precision, but they can also perturb the cellular environment in myriad controlled and novel settings. Techniques such as single-molecule fluorescence in-situ hybridization, microfluidics, and optogenetics have opened the door to a large number of potential experiments, which raises the question of how to choose the best possible experiment. The Fisher information matrix (FIM) estimates how well potential experiments will constrain model parameters and can be used to design optimal experiments. Here, we introduce the finite state projection (FSP) based FIM, which uses the formalism of the chemical master equation to derive and compute the FIM. The FSP-FIM makes no assumptions about the distribution shapes of single-cell data, and it does not require precise measurements of higher order moments of such distributions. We validate the FSP-FIM against well-known Fisher information results for the simple case of constitutive gene expression. We then use numerical simulations to demonstrate the use of the FSP-FIM to optimize the timing of single-cell experiments with more complex, non-Gaussian fluctuations. We validate optimal simulated experiments determined using the FSP-FIM with Monte-Carlo approaches and contrast these to experiment designs chosen by traditional analyses that assume Gaussian fluctuations or use the central limit theorem. By systematically designing experiments to use all of the measurable fluctuations, our method enables a key step to improve co-design of experiments and quantitative models.
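
To make the idea of computing Fisher information directly from a probability distribution concrete, here is a minimal sketch for the simplest textbook case. It assumes that the steady-state copy number under constitutive expression is Poisson distributed with mean `lam`, for which the Fisher information per observed cell is known to be 1/lam. The truncation to `n_states` states is only loosely analogous to a finite state projection and is not the authors' FSP-FIM implementation; all names and defaults are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of observing x molecules when the mean is lam."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def fisher_information(lam, n_states=200, h=1e-5):
    """Fisher information about lam from one observation.

    Sums (dp/dlam)**2 / p over a truncated state space 0..n_states-1,
    with the derivative taken by central finite differences.
    """
    total = 0.0
    for x in range(n_states):
        p = poisson_pmf(x, lam)
        if p == 0.0:
            continue  # skip states whose probability underflows to zero
        dp = (poisson_pmf(x, lam + h) - poisson_pmf(x, lam - h)) / (2.0 * h)
        total += dp * dp / p
    return total


if __name__ == "__main__":
    for lam in (2.0, 5.0, 20.0):
        print(lam, fisher_information(lam), 1.0 / lam)
```

The same sum works for any distribution that can be tabulated on a finite set of states, which is why no Gaussian assumption is needed in this sketch.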

]]>
<![CDATA[A novel scale-space approach for multinormality testing and the k-sample problem in the high dimension low sample size scenario]]> https://www.researchpad.co/article/5c50c44ed5eed0c4845e84bb

Two classical multivariate statistical problems, testing of multivariate normality and the k-sample problem, are explored by a novel analysis on several resolutions simultaneously. The presented methods do not invert any estimated covariance matrix. Thereby, the methods work in the High Dimension Low Sample Size situation, i.e. when n ≪ p. The output, a significance map, is produced by doing a one-dimensional test for all possible resolution/position pairs. The significance map shows for which resolution/position pairs the null hypothesis is rejected. For the testing of multinormality, the Anderson-Darling test is utilized to detect potential departures from multinormality at different combinations of resolutions and positions. In the k-sample case, it is tested whether k data sets can be said to originate from the same unspecified discrete or continuous multivariate distribution. This is done by testing the k vectors corresponding to the same resolution/position pair of the k different data sets through the k-sample Anderson-Darling test. Successful demonstrations of the new methodology on artificial and real data sets are presented, and a feature selection scheme is demonstrated.
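
The one-dimensional building block mentioned above can be sketched as follows. This is a plain Anderson-Darling normality statistic with mean and variance estimated from the sample, using the commonly tabulated small-sample adjustment and 5% critical value of 0.752. The scale-space construction of resolution/position pairs is not reproduced here, and the function and constant names are assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, stdev

# Tabulated 5% critical value for the adjusted statistic when both the
# mean and the variance are estimated from the data.
CRITICAL_5_PERCENT = 0.752


def anderson_darling_normal(sample):
    """Adjusted Anderson-Darling statistic for normality of a 1-D sample."""
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    z = sorted((x - mu) / sigma for x in sample)
    sqrt2 = math.sqrt(2.0)
    total = 0.0
    for i in range(1, n + 1):
        # log of the normal CDF at the i-th smallest standardized value
        log_cdf = math.log(0.5 * math.erfc(-z[i - 1] / sqrt2))
        # log of the normal survival function at the i-th largest value
        log_sf = math.log(0.5 * math.erfc(z[n - i] / sqrt2))
        total += (2 * i - 1) * (log_cdf + log_sf)
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


if __name__ == "__main__":
    n = 50
    # Deterministic stand-ins for data: quantiles of a normal and of an
    # exponential distribution.
    normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    skewed = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    for name, data in (("normal-like", normal_like), ("skewed", skewed)):
        stat = anderson_darling_normal(data)
        print(name, stat, "reject" if stat > CRITICAL_5_PERCENT else "keep")
```

Repeating such a test over many one-dimensional summaries avoids inverting a covariance matrix, at the cost of a multiple-testing problem that this sketch does not address.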

]]>
<![CDATA[A sequential Monte Carlo algorithm for inference of subclonal structure in cancer]]> https://www.researchpad.co/article/5c6448e4d5eed0c484c2f100

Tumors are heterogeneous in the sense that they consist of multiple subpopulations of cells, referred to as subclones, each of which is characterized by a distinct profile of genomic variations such as somatic mutations. Inferring the underlying clonal landscape has become an important topic because it can help in understanding cancer development and progression, and thereby help in improving treatment. We describe a novel state-space model, based on the feature allocation framework and an efficient sequential Monte Carlo (SMC) algorithm, using the somatic mutation data obtained from tumor samples to estimate the number of subclones, as well as their characterization. Our approach, by design, is capable of handling any number of mutations. In extensive simulations, our method exhibits high accuracy in most cases and compares favorably with existing methods. Moreover, we demonstrate the validity of our method by analyzing real tumor samples from patients with multiple cancer types (breast, prostate, and lung). Our results reveal driver mutation events specific to cancer types, and indicate clonal expansion by manual phylogenetic analysis. MATLAB code and datasets are available to download at: https://github.com/moyanre/tumor_clones.

]]>