ResearchPad - forecasting https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia]]> https://www.researchpad.co/article/elastic_article_13868

Early T-cell precursor (ETP) is the only subtype of acute T-cell lymphoblastic leukemia (T-ALL) listed in the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia. Patients with ETP tend to have worse disease outcomes. ETP is defined by a series of immune markers, and the diagnosis of ETP status can be ambiguous because of the limitations of current measurement methods. In this study, we performed unsupervised clustering and supervised prediction to investigate whether a molecular biomarker can be used to identify ETP status in order to stratify risk groups. We found that ETP status can be predicted from the expression level of lymphoid enhancer binding factor 1 (LEF1) with high accuracy (area under the ROC curve, AUC = 0.957 and 0.933 in two T-ALL cohorts). Patients with the ETP subtype have a lower level of LEF1 than those without ETP. We suggest that combining the LEF1 biomarker with traditional immunophenotyping will improve the diagnosis of ETP.

]]>
<![CDATA[Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019]]> https://www.researchpad.co/article/elastic_article_13811

The identification of statistical models for the accurate forecast and timely determination of the outbreak of infectious diseases is very important for the healthcare system. Thus, this study was conducted to assess and compare the performance of four machine-learning methods in modeling and forecasting brucellosis time series data based on climatic parameters.

Methods

In this cohort study, human brucellosis cases and climatic parameters were analyzed on a monthly basis for Qazvin province (located in northwestern Iran) over a period of 9 years (2010–2018). The data were split into two subsets, training (80%) and testing (20%). Artificial neural network methods (radial basis function and multilayer perceptron), support vector machine and random forest were fitted to each set. The performance of the models was assessed using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Root Error (MARE), and R² criteria.

Results

The incidence rate of brucellosis in Qazvin province was 27.43 per 100,000 during 2010–2019. The RMSE (0.22), MAE (0.175), and MARE (0.007) were lower for the multilayer perceptron neural network than for the other three models. Moreover, R² (0.99) was higher for this model. Therefore, the multilayer perceptron neural network exhibited better performance in forecasting the studied data. The average wind speed and mean temperature were the most influential climatic parameters for the incidence of this disease.

Conclusions

The multilayer perceptron neural network can be used as an effective method for detecting the behavioral trend of brucellosis over time. Nevertheless, further studies focusing on the application and comparison of these methods are needed to identify the most appropriate forecasting method for this disease.

]]>
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.

]]>
<![CDATA[Early transmission dynamics of COVID-19 in a southern hemisphere setting: Lima-Peru: February 29th–March 30th, 2020]]> https://www.researchpad.co/article/elastic_article_8469

The COVID-19 pandemic that emerged in Wuhan, China has caused substantial morbidity and mortality around the world during the last four months. The daily trend in reported cases has been rapidly rising in Latin America since March 2020, with the great majority of the cases reported in Brazil followed by Peru as of April 15th, 2020. Although Peru implemented a range of social distancing measures soon after the confirmation of its first case on March 6th, 2020, new COVID-19 cases continue to accumulate in this country. We assessed the early COVID-19 transmission dynamics and the effect of social distancing interventions in Lima, Peru.

We estimated the reproduction number, R, during the early transmission phase in Lima from the daily series of imported and autochthonous cases by the date of symptoms onset as of March 30th, 2020. We also assessed the effect of social distancing interventions in Lima by generating short-term forecasts grounded on the early transmission dynamics before interventions were put in place.

Prior to the implementation of the social distancing measures in Lima, the local incidence curve by the date of symptoms onset displays near exponential growth dynamics with the mean scaling of growth parameter, p, estimated at 0.96 (95% CI: 0.87, 1.0) and the reproduction number at 2.3 (95% CI: 2.0, 2.5). Our analysis indicates that school closures and other social distancing interventions have helped slow down the spread of the novel coronavirus, with the nearly exponential growth trend shifting to an approximately linear growth trend soon after the broad scale social distancing interventions were put in place by the government.
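The role of the scaling of growth parameter can be illustrated with a small numerical sketch. It assumes the generalized-growth form dC/dt = r·C^p for cumulative cases C, in which p = 1 gives exponential growth and p = 0 gives linear growth; the abstract does not state the equation, and the values of `r`, `c0`, and the time horizon below are hypothetical, chosen only for illustration.

```python
def generalized_growth(c0, r, p, days, dt=0.01):
    """Integrate dC/dt = r * C**p with forward Euler; return C at each whole day.

    Assumed model form (not quoted from the abstract): C is the cumulative
    case count, r the growth rate, p the scaling of growth parameter in [0, 1].
    """
    steps_per_day = round(1 / dt)
    c = float(c0)
    series = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * r * c ** p
        series.append(c)
    return series


# Hypothetical parameters: only p = 0.96 is taken from the abstract.
exp_like = generalized_growth(c0=1.0, r=0.3, p=1.0, days=20)   # exponential
near_exp = generalized_growth(c0=1.0, r=0.3, p=0.96, days=20)  # near exponential
linear = generalized_growth(c0=1.0, r=0.3, p=0.0, days=20)     # linear

for name, series in [("p=1.00", exp_like), ("p=0.96", near_exp), ("p=0.00", linear)]:
    print(name, round(series[-1], 1))
```

With p close to 1 the trajectory stays close to the exponential curve, while p = 0 adds a constant number of cases per day, which matches the shift from near exponential to approximately linear growth described above.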

While the interventions appear to have slowed the transmission rate in Lima, the number of new COVID-19 cases continues to accumulate, highlighting the need to strengthen social distancing and active case finding efforts to mitigate disease transmission in the region.

]]>
<![CDATA[Assessment of the genomic prediction accuracy for feed efficiency traits in meat-type chickens]]> https://www.researchpad.co/article/5989db51ab0ee8fa60bdc4ab

Feed represents the major cost of chicken production. Selection for improving feed utilization is a feasible way to reduce feed cost and greenhouse gas emissions. The objectives of this study were to investigate the efficiency of genomic prediction for feed conversion ratio (FCR), residual feed intake (RFI), average daily gain (ADG) and average daily feed intake (ADFI) and to assess the impact of selection for feed efficiency traits FCR and RFI on eviscerating percentage (EP), breast muscle percentage (BMP) and leg muscle percentage (LMP) in meat-type chickens. Genomic prediction was assessed using a 4-fold cross-validation for two validation scenarios. The first scenario was a random family sampling validation (CVF), and the second scenario was a random individual sampling validation (CVR). Variance components were estimated based on the genomic relationship built with single nucleotide polymorphism markers. Genomic estimated breeding values (GEBV) were predicted using a genomic best linear unbiased prediction model. The accuracies of GEBV were evaluated in two ways: the correlation between GEBV and corrected phenotypic value divided by the square root of heritability, i.e., the correlation-based accuracy, and model-based theoretical accuracy. Breeding values were also predicted using a conventional pedigree-based best linear unbiased prediction model in order to compare accuracies of genomic and conventional predictions. The heritability estimates of FCR and RFI were 0.29 and 0.50, respectively. The heritability estimates of ADG, ADFI, EP, BMP and LMP ranged from 0.34 to 0.53. In the CVF scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were slightly higher than those for RFI. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.360, 0.284, 0.574 and 0.520, respectively, and the model-based theoretical accuracies were 0.420, 0.414, 0.401 and 0.382, respectively. 
In the CVR scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were lower than those for RFI, in contrast to the CVF scenario. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.449, 0.593, 0.581 and 0.627, respectively, and the model-based theoretical accuracies were 0.577, 0.629, 0.631 and 0.638, respectively. The accuracies of genomic predictions were 0.371 and 0.322 higher than the conventional pedigree-based predictions for the CVF and CVR scenarios, respectively. The genetic correlations of FCR with EP, BMP and LMP were -0.427, -0.156 and -0.338, respectively. The correlations between RFI and the three carcass traits were -0.320, -0.404 and -0.353, respectively. These results indicate that RFI and FCR have a moderate accuracy of genomic prediction. Improving RFI and FCR could be favourable for EP, BMP and LMP. Compared with FCR, which can be improved by selection for ADG in typical meat-type chicken breeding programs, selection for RFI could lead to extra improvement in feed efficiency.
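The correlation-based accuracy defined above (the correlation between GEBV and corrected phenotypic value, divided by the square root of heritability) can be written as a short sketch. The function names and the toy vectors are hypothetical; only the formula follows the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_based_accuracy(gebv, corrected_phenotype, h2):
    """cor(GEBV, corrected phenotype) / sqrt(heritability)."""
    return pearson(gebv, corrected_phenotype) / math.sqrt(h2)


# Toy validation fold (hypothetical numbers, not data from the study).
gebv = [1.0, 2.0, 3.0, 4.0]
phenotype = [2.0, 1.0, 4.0, 3.0]
accuracy = correlation_based_accuracy(gebv, phenotype, h2=0.5)
print(round(accuracy, 3))
```

In a 4-fold cross-validation such a value would be computed on each held-out fold and then averaged; how the study aggregated folds is not stated in the abstract.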

]]>
<![CDATA[Analysis and modeling of coolants and coolers for specimen transportation]]> https://www.researchpad.co/article/N4e3aeb5c-7b13-42da-a06e-637c738940f8

Maintaining the cold chain while transporting medical supplies and samples is difficult in remote settings. Failure to maintain temperature requirements can lead to degraded sample quality and inaccuracies in sample analysis. We performed a systematic analysis of different types of transport coolers (polystyrene foam, injection-molded, and rotational molded) and transport coolants (ice, cold packs, frozen water bottles) frequently in use in many countries. Polystyrene foam coolers stayed below our temperature threshold (6°C) longer than almost all other types of coolers, but were not durable. Injection-molded coolers were durable, but warmed to 6°C the quickest. Rotational molded coolers were able to keep temperatures below our threshold for 24 hours longer than injection-molded coolers and were highly durable. Coolant systems were evaluated in terms of cost and their ability to maintain cold temperatures. Long-lasting commercial cold packs were found to be less cost effective and were below freezing for the majority of the testing period. Frozen plastic water bottles were found to be a reusable and economical choice for coolant and were only below freezing briefly. Finally, we modeled the coolers' performance at maintaining internal temperatures below 6°C and built a highly accurate linear model to predict how long a cooler will remain below 6°C. We believe these data may be useful in the planning and design of specimen transportation systems in the field, particularly in remote or resource-limited settings.

]]>
<![CDATA[Predicting 30-day hospital readmissions using artificial neural networks with medical code embedding]]> https://www.researchpad.co/article/N1f40719a-4631-45e6-bedb-5cf8a42ecf53

Reducing unplanned readmissions is a major focus of current hospital quality efforts. In order to avoid unfair penalization, administrators and policymakers use prediction models to adjust for the performance of hospitals from healthcare claims data. Regression-based models are a commonly utilized method for such risk-standardization across hospitals; however, these models often suffer in accuracy. In this study, we compare four prediction models for unplanned readmission of patients hospitalized with acute myocardial infarction (AMI), congestive heart failure (HF), and pneumonia (PNA) within the Nationwide Readmissions Database in 2014. We evaluated hierarchical logistic regression and compared its performance with gradient boosting and two models that utilize artificial neural networks. We show that unsupervised Global Vectors for Word Representation (GloVe) embeddings of administrative claims data combined with artificial neural network classification models improve prediction of 30-day readmission. Our best models increased the AUC for prediction of 30-day readmissions from 0.68 to 0.72 for AMI, 0.60 to 0.64 for HF, and 0.63 to 0.68 for PNA compared to hierarchical logistic regression. Furthermore, risk-standardized hospital readmission rates calculated from our artificial neural network model that employed embeddings led to reclassification of approximately 10% of hospitals across categories of hospital performance. This finding suggests that prediction models that incorporate new methods classify hospitals differently than traditional regression-based approaches and that their role in assessing hospital performance warrants further investigation.

]]>
<![CDATA[A compound attributes-based predictive model for drug induced liver injury in humans]]> https://www.researchpad.co/article/Ndeb57c49-a1cc-41d4-9618-08dc56c45dac

Drug induced liver injury (DILI) is one of the key safety concerns in drug development. To assess the likelihood that drug candidates cause adverse liver reactions, we propose a compound attributes-based approach to predicting hepatobiliary disorders that are routinely reported to the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). Specifically, we developed a support vector machine (SVM) model with recursive feature extraction, using physicochemical and structural properties of compounds as model input. Cross validation demonstrates that the predictive model performs robustly, with sensitivity and specificity both averaging 70% over 500 trials. An independent validation was performed on public benchmark drugs and the results suggest potential utility of our model for identifying safety alerts. This in silico approach, upon further validation, would ultimately be implemented, together with other in vitro safety assays, for screening compounds early in drug development.

]]>
<![CDATA[Propagation analysis and prediction of the COVID-19]]> https://www.researchpad.co/article/N8eed231d-a3c4-460f-9998-75e54b9c5a58

Using a model built on official data, this paper studies the transmission process of Corona Virus Disease 2019 (COVID-19). The error between the model and the official data curve is quite small. The model also supports forward prediction and backward inference of the epidemic situation, and the resulting analysis can help the countries concerned make decisions.

]]>
<![CDATA[Why is it difficult to accurately predict the COVID-19 epidemic?]]> https://www.researchpad.co/article/Nfef213eb-1508-48fb-b79f-0644c88064d2

Since the COVID-19 outbreak in Wuhan City in December of 2019, numerous model predictions on the COVID-19 epidemics in Wuhan and other parts of China have been reported. These model predictions have shown a wide range of variations. In our study, we demonstrate that nonidentifiability in model calibrations using the confirmed-case data is the main reason for such wide variations. Using the Akaike Information Criterion (AIC) for model selection, we show that an SIR model performs much better than an SEIR model in representing the information contained in the confirmed-case data. This indicates that predictions using more complex models may not be more reliable compared to using a simpler model. We present our model predictions for the COVID-19 epidemic in Wuhan after the lockdown and quarantine of the city on January 23, 2020. We also report our results of modeling the impacts of the strict quarantine measures undertaken in the city after February 7 on the time course of the epidemic, and modeling the potential of a second outbreak after the return-to-work in the city.
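The model-selection step can be sketched with the least-squares form of the AIC, n·ln(RSS/n) + 2k, which holds up to an additive constant under Gaussian errors. This is an assumption about the form used; the abstract does not give the likelihood. The residual sums of squares, sample size, and parameter counts below are hypothetical.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant.

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters.
    """
    return n * math.log(rss / n) + 2 * k


# Hypothetical fits to the same 40 data points: the more complex model
# lowers the residual error only slightly but uses two more parameters.
aic_simple = aic_least_squares(rss=1200.0, n=40, k=2)
aic_complex = aic_least_squares(rss=1190.0, n=40, k=4)

preferred = "simple" if aic_simple < aic_complex else "complex"
print(preferred)
```

When additional parameters cannot be identified from the data, they barely reduce the residual error, so the 2k penalty dominates and the simpler model is preferred.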

]]>
<![CDATA[Magma Degassing as a Source of Long‐Term Seismicity at Volcanoes: The Ischia Island (Italy) Case]]> https://www.researchpad.co/article/N978e9b20-749d-4a9f-ac68-8956762af4c2

Abstract

Transient seismicity at active volcanoes poses a significant risk in addition to eruptive activity. This risk is compounded by the common belief that volcanic seismicity cannot be forecast, even in the long term. Here we investigate the nature of volcanic seismicity to try to improve our forecasting capacity. To this aim, we consider Ischia volcano (Italy), which suffered similar earthquakes along its uplifted resurgent block. We show that this seismicity marks an acceleration of decades‐long subsidence of the resurgent block, driven by degassing of magma that previously produced the uplift, a process not observed at other volcanoes. Degassing will continue for hundreds to thousands of years, causing protracted seismicity, and will likely be accompanied by moderate and damaging earthquakes. The possibility to constrain the future duration of seismicity at Ischia indicates that our capacity to forecast earthquakes might be enhanced when seismic activity results from long‐term magmatic processes, such as degassing.

]]>
<![CDATA[Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020]]> https://www.researchpad.co/article/Ne00ead13-ea6f-4654-adc5-7cb9ff13ce2c

The initial cluster of severe pneumonia cases that triggered the COVID-19 epidemic was identified in Wuhan, China in December 2019. While early cases of the disease were linked to a wet market, human-to-human transmission has driven the rapid spread of the virus throughout China. The Chinese government has implemented containment strategies of city-wide lockdowns, screening at airports and train stations, and isolation of suspected patients; however, the cumulative case count keeps growing every day. The ongoing outbreak presents a challenge for modelers, as limited data are available on the early growth trajectory, and the epidemiological characteristics of the novel coronavirus are yet to be fully elucidated.

We use phenomenological models that have been validated during previous outbreaks to generate and assess short-term forecasts of the cumulative number of confirmed reported cases in Hubei province, the epicenter of the epidemic, and for the overall trajectory in China, excluding the province of Hubei. We collect daily reported cumulative confirmed cases for the 2019-nCoV outbreak for each Chinese province from the National Health Commission of China. Here, we provide 5, 10, and 15 day forecasts for five consecutive days, February 5th through February 9th, with quantified uncertainty based on a generalized logistic growth model, the Richards growth model, and a sub-epidemic wave model.
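Two of the growth models named above can be sketched numerically. The equations are the commonly used forms, assumed here rather than quoted from the abstract: generalized logistic growth dC/dt = r·C^p·(1 − C/K) and Richards growth dC/dt = r·C·(1 − (C/K)^a), where K is the final epidemic size. All parameter values are hypothetical, and the sub-epidemic wave model is not covered.

```python
def euler(rate, c0, days, dt=0.01):
    """Integrate dC/dt = rate(C) with forward Euler; return C at each whole day."""
    steps_per_day = round(1 / dt)
    c = float(c0)
    series = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * rate(c)
        series.append(c)
    return series


def generalized_logistic(r, p, K):
    """Assumed form: dC/dt = r * C**p * (1 - C/K)."""
    return lambda c: r * c ** p * (1 - c / K)


def richards(r, a, K):
    """Assumed form: dC/dt = r * C * (1 - (C/K)**a)."""
    return lambda c: r * c * (1 - (c / K) ** a)


K = 1000.0  # hypothetical final cumulative case count
glm_curve = euler(generalized_logistic(r=0.3, p=0.9, K=K), c0=1.0, days=100)
richards_curve = euler(richards(r=0.3, a=0.5, K=K), c0=1.0, days=100)

# Both curves level off near K, the saturation behavior the forecasts rely on.
print(round(glm_curve[-1]), round(richards_curve[-1]))
```

In practice the parameters would be estimated from the reported cumulative case series and the fitted curve extended 5, 10, or 15 days ahead; uncertainty quantification is omitted from this sketch.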

Our most recent forecasts reported here, based on data up until February 9, 2020, largely agree across the three models presented and suggest an average range of 7409–7496 additional confirmed cases in Hubei and 1128–1929 additional cases in other provinces within the next five days. Models also predict an average total cumulative case count between 37,415 and 38,028 in Hubei and 11,588–13,499 in other provinces by February 24, 2020.

Mean estimates and uncertainty bounds for both Hubei and other provinces have remained relatively stable in the last three reporting dates (February 7th – 9th). We also observe that each of the models predicts that the epidemic has reached saturation in both Hubei and other provinces. Our findings suggest that the containment strategies implemented in China are successfully reducing transmission and that the epidemic growth has slowed in recent days.

]]>
<![CDATA[An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov)]]> https://www.researchpad.co/article/N6a6056b1-3479-4fb6-9569-57adefa5c776

The basic reproduction number of an infectious agent is the average number of infections one case can generate over the course of the infectious period in a naïve, uninfected population. It is well known that estimates of this number may vary because of several methodological issues, including different assumptions and parameter choices, the models and datasets used, and the estimation period. As the novel coronavirus (2019-nCoV) infection has spread, the reproduction number has been found to vary, reflecting the dynamics of transmission of the coronavirus outbreak as well as the case reporting rate. Control strategies have varied significantly and changed over time, and rapidly improving detection technologies have shortened the time from infection or symptom onset to diagnosis, leading to faster confirmation of new coronavirus cases. For these reasons, our previous estimates of the transmission risk of 2019-nCoV need to be revised. Using time-dependent contact and diagnosis rates, we refit our previously proposed transmission dynamics model to the data available until January 29th, 2020 and re-estimated the effective daily reproduction ratio, which better quantifies the evolution of the interventions. We estimated when the effective daily reproduction ratio fell below 1 and when the epidemic will peak. Our updated findings suggest that the best measure is persistent and strict self-isolation. The epidemic will continue to grow and can peak soon, with the peak time depending highly on the public health interventions practically implemented.

]]>
<![CDATA[Insignificant QBO‐MJO Prediction Skill Relationship in the SubX and S2S Subseasonal Reforecasts]]> https://www.researchpad.co/article/N80d62339-fdd5-4539-8af8-fbb9aaf795dd

Abstract

The impact of the stratospheric quasi‐biennial oscillation (QBO) on the prediction of the tropospheric Madden‐Julian oscillation (MJO) is evaluated in reforecasts from nine models participating in subseasonal prediction projects, including the Subseasonal Experiment (SubX) and Subseasonal to Seasonal (S2S) projects. When MJO prediction skill is analyzed for December to February, MJO prediction skill is higher in the easterly phase of the QBO than the westerly phase, consistent with previous studies. However, the relationship between QBO phase and MJO prediction skill is not statistically significant for most models. This insignificant QBO‐MJO skill relationship is further confirmed by comparing two subseasonal reforecast experiments with the Community Earth System Model v1 using both a high‐top (46‐level) and low‐top (30‐level) version of the Community Atmosphere Model v5. While there are clear differences in the forecasted QBO between the two model top configurations, a negligible change is shown in the MJO prediction, indicating that the QBO in this model may not directly control the MJO prediction and supporting the insignificant QBO‐MJO skill relationship found in SubX and S2S models.

]]>
<![CDATA[Transient Deformation in California From Two Decades of GPS Displacements: Implications for a Three‐Dimensional Kinematic Reference Frame]]> https://www.researchpad.co/article/Nd6dcf325-6fcb-4a25-9772-8945673692b3

Abstract

Our understanding of plate boundary deformation has been enhanced by transient signals observed against the backdrop of time‐independent secular motions. We make use of a new analysis of displacement time series from about 1,000 continuous Global Positioning System (GPS) stations in California from 1999 to 2018 to distinguish tectonic and nontectonic transients from secular motion. A primary objective is to define a high‐resolution three‐dimensional reference frame (datum) for California that can be rapidly maintained with geodetic data to accommodate both secular and time‐dependent motions. To this end, we compare the displacements to those predicted by a horizontal secular fault slip model for the region and construct displacement and strain rate fields. Over the past 19 years, California has experienced 19 geodetically detectable earthquakes and widespread postseismic deformation. We observe postseismic strain rate variations as large as 1,000 nstrain/year with moment releases equivalent up to an Mw6.8 earthquake. We find significant secular differences up to 10 mm/year with the fault slip model, from the Mendocino Triple Junction to the southern Cascadia subduction zone, the northern Basin and Range, and the Santa Barbara channel. Secular vertical uplift is observed across the Transverse Ranges, Coastal Ranges, Sierra Nevada, as well as large‐scale postseismic uplift after the 1999 Mw7.1 Hector Mine and 2010 Mw7.2 El Mayor‐Cucapah earthquakes. We also identify areas of vertical land motions due to anthropogenic, natural, and magmatic processes. Finally, we demonstrate the utility of the kinematic datum by improving the accuracy of high‐spatial‐resolution 12‐day repeat‐cycle Sentinel‐1 Interferometric Synthetic Aperture Radar displacement and velocity maps.

]]>
<![CDATA[Prediction model for dengue fever based on interactive effects between multiple meteorological factors in Guangdong, China (2008–2016)]]> https://www.researchpad.co/article/Nfe4e2064-ca0a-4d6d-a8b7-4f75eb296e9a

Introduction

In order to improve the prediction accuracy of dengue fever incidence, we constructed a prediction model with interactive effects between meteorological factors, based on weekly dengue fever cases in Guangdong, China from 2008 to 2016.

Methods

Dengue fever data were derived from statistical data from the China National Notifiable Infectious Disease Reporting Information System. Daily meteorological data were obtained from the China Integrated Meteorological Information Sharing System. The minimum temperature for transmission was identified using data fitting and the Ross-Macdonald model. Correlations and interactive effects were examined using Spearman’s rank correlation and multivariate analysis of variance. A probit regression model to describe the incidence of dengue fever from 2008 to 2016 and forecast the 2017 incidence was constructed, based on key meteorological factors, interactive effects, mosquito-vector factors, and other important factors.

Results

We found the minimum temperature suitable for dengue transmission was ≥18°C, and as 97.91% of cases occurred when the minimum temperature was above 18 °C, the data were used for model training and construction. Epidemics of dengue are related to mean temperature, maximum/minimum and mean atmospheric pressure, and mean relative humidity. Moreover, interactions occur between mean temperature, minimum atmospheric pressure, and mean relative humidity. Our weekly probit regression prediction model is 0.72. Prediction of dengue cases for the first 41 weeks of 2017 exhibited goodness of fit of 0.60.

Conclusion

Our model was accurate and timely, with consideration of interactive effects between meteorological factors.

]]>
<![CDATA[Current state of the global operational aerosol multi‐model ensemble: An update from the International Cooperative for Aerosol Prediction (ICAP)]]> https://www.researchpad.co/article/N8f12c2c4-071c-4383-bd9c-676c4687640f

Since the first International Cooperative for Aerosol Prediction (ICAP) multi‐model ensemble (MME) study, the number of ICAP global operational aerosol models has increased from five to nine. An update of the current ICAP status is provided, along with an evaluation of the performance of ICAP‐MME over 2012–2017, with a focus on June 2016–May 2017. Evaluated with ground‐based Aerosol Robotic Network (AERONET) aerosol optical depth (AOD) and data assimilation quality MODerate‐resolution Imaging Spectroradiometer (MODIS) retrieval products, the ICAP‐MME AOD consensus remains the overall top‐scoring and most consistent performer among all models in terms of root‐mean‐square error (RMSE), bias and correlation for total, fine‐ and coarse‐mode AODs as well as dust AOD; this is similar to the first ICAP‐MME study. Further, over the years, the performance of ICAP‐MME is relatively stable and reliable compared to more variability in the individual models. The extent to which the AOD forecast error of ICAP‐MME can be predicted is also examined. Leading predictors are found to be the consensus mean and spread. Regression models of absolute forecast errors were built for AOD forecasts of different lengths for potential applications. ICAP‐MME performance in terms of modal AOD RMSEs of the 21 regionally representative sites over 2012–2017 suggests a general tendency for model improvements in fine‐mode AOD, especially over Asia. No significant improvement in coarse‐mode AOD is found overall for this time period.

]]>
<![CDATA[Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes]]> https://www.researchpad.co/article/5c8c193ed5eed0c484b4d25f

In a recidivism prediction context, there is no consensus on which modeling strategy should be followed for obtaining an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning methods should be tried out for obtaining an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.

]]>
<![CDATA[Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults]]> https://www.researchpad.co/article/5c89772fd5eed0c4847d264d

Background

Payers and providers still primarily use ordinary least squares (OLS) to estimate expected economic and clinical outcomes for risk adjustment purposes. Penalized linear regression represents a practical and incremental step forward that provides transparency and interpretability within the familiar regression framework. This study conducted an in-depth comparison of prediction performance of standard and penalized linear regression in predicting future health care costs in older adults.

Methods and findings

This retrospective cohort study included 81,106 Medicare Advantage patients with 5 years of continuous medical and pharmacy insurance from 2009 to 2013. Total health care costs in 2013 were predicted with comorbidity indicators from 2009 to 2012. Using 2012 predictors only, OLS performed poorly (e.g., R² = 16.3%) compared to penalized linear regression models (R² ranging from 16.8 to 16.9%); using 2009–2012 predictors, the gap in prediction performance increased (R²: 15.0% versus 18.0–18.2%). OLS with a reduced set of predictors selected by lasso showed improved performance (R² = 16.6% with 2012 predictors, 17.4% with 2009–2012 predictors) relative to OLS without variable selection but still lagged behind the prediction performance of penalized regression. Lasso regression consistently generated prediction ratios closer to 1 across different levels of predicted risk compared to other models.

Conclusions

This study demonstrated the advantages of using transparent and easy-to-interpret penalized linear regression for predicting future health care costs in older adults relative to standard linear regression. Penalized regression showed better performance than OLS in predicting health care costs. Applying penalized regression to longitudinal data increased prediction accuracy. Lasso regression in particular showed superior prediction ratios across low and high levels of predicted risk. Health care insurers, providers and policy makers may benefit from adopting penalized regression such as lasso regression for cost prediction to improve risk adjustment and population health management and thus better address the underlying needs and risk of the populations they serve.

]]>
<![CDATA[Video loss prediction model in wireless networks]]> https://www.researchpad.co/article/5c8977a4d5eed0c4847d3245

This work discusses video communications over wireless networks (IEEE 802.11ac standard). The videos are in three different resolutions: 720p, 1080p, and 2160p. It is essential to study the performance of these media in access technologies to enhance the current coding and communications techniques. This study sets out a video quality prediction model that covers the different resolutions and is based on wireless network conditions, an approach that has not previously been adopted in the literature. The model involves obtaining Quality of Service and Quality of Experience metrics, such as PSNR (Peak Signal-to-Noise Ratio) and packet loss. This article outlines a methodology and mathematical model for video quality loss in the wireless network from simulated data, and its accuracy is assessed through the use of performance metrics (RMSE and standard deviation). The methodology is based on two mathematical functions (logarithmic and exponential), and their parameters are defined by linear regression. The model obtained an RMSE of 2.32 dB and a standard deviation of 2.2 dB for the predicted values. The results should lead to CODEC (Coder-Decoder) improvements and contribute to better wireless network design.
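Fitting logarithmic and exponential functions by linear regression, as described above, can be sketched as follows. A logarithmic curve y = a + b·ln(x) is linear in ln(x), and an exponential curve y = A·exp(B·x) is linear after taking ln(y). The abstract does not say which quantity each function relates, so the variable names and synthetic data points here are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_logarithmic(xs, ys):
    """Fit y = a + b * ln(x) by regressing y on ln(x); returns (a, b)."""
    return fit_line([math.log(x) for x in xs], ys)


def fit_exponential(xs, ys):
    """Fit y = A * exp(B * x) by regressing ln(y) on x; returns (A, B)."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Synthetic, noise-free data (hypothetical): quality in dB versus loss level.
loss = [1.0, 2.0, 4.0, 8.0]
quality = [40.0 - 3.0 * math.log(x) for x in loss]
a, b = fit_logarithmic(loss, quality)
predicted = [a + b * math.log(x) for x in loss]
print(round(a, 3), round(b, 3), round(rmse(quality, predicted), 6))
```

With real measurements the RMSE would be nonzero and could be reported in dB alongside the standard deviation of the prediction errors.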

]]>