Aims: Our aim was to develop a machine learning (ML)-based risk stratification system to predict 1-, 2-, 3-, 4-, and 5-year all-cause mortality from pre-implant parameters of patients undergoing cardiac resynchronization therapy (CRT).
Methods and results: Multiple ML models were trained on a retrospective database of 1510 patients undergoing CRT implantation to predict 1- to 5-year all-cause mortality. Thirty-three pre-implant clinical features were selected to train the models. The best performing model [SEMMELWEIS-CRT score (perSonalizEd assessMent of estiMatEd risk of mortaLity With machinE learnIng in patientS undergoing CRT implantation)], along with pre-existing scores (Seattle Heart Failure Model, VALID-CRT, EAARN, ScREEN, and CRT-score), was tested on an independent cohort of 158 patients. There were 805 (53%) deaths in the training cohort and 80 (51%) deaths in the test cohort during the 5-year follow-up period. Among the trained classifiers, random forest demonstrated the best performance. For the prediction of 1-, 2-, 3-, 4-, and 5-year mortality, the areas under the receiver operating characteristic curves of the SEMMELWEIS-CRT score were 0.768 (95% CI: 0.674–0.861; P < 0.001), 0.793 (95% CI: 0.718–0.867; P < 0.001), 0.785 (95% CI: 0.711–0.859; P < 0.001), 0.776 (95% CI: 0.703–0.849; P < 0.001), and 0.803 (95% CI: 0.733–0.872; P < 0.001), respectively. The discriminative ability of our model was superior to other evaluated scores.
Conclusion: The SEMMELWEIS-CRT score (available at semmelweiscrtscore.com) exhibited good discriminative capabilities for the prediction of all-cause death in CRT patients and outperformed the already existing risk scores. By capturing the non-linear association of predictors, the utilization of ML approaches may facilitate optimal candidate selection and prognostication of patients undergoing CRT implantation.
Cardiac resynchronization therapy (CRT) is a key component in the management of symptomatic heart failure with reduced ejection fraction and wide QRS complex.1 Based on the report of the European Heart Rhythm Association, over 90 CRT implantations per million population are performed annually in the ESC countries.2 Although CRT improves mortality, functional capacity, clinical symptoms, and quality of life in a certain patient subpopulation, not everyone benefits equally and mortality rates still remain high among these patients.3–7
The recognition of this variability in outcomes has prompted efforts in the risk stratification of CRT patients based on pre-implant assessments. However, the currently available risk scores have several shortcomings (e.g. lack of generalizability and impact analyses, omitting routinely assessed, powerful predictors) which hamper their utilization in everyday clinical practice.8 Therefore, more precise and personalized methods are required. The recent improvements in computation power and software technologies have led to the flourishing of machine learning (ML), a field of artificial intelligence (AI), which seems to be a promising tool to meet this compelling demand.9
Machine learning refers to a collection of techniques that gives AI the ability to learn complex rules and to identify patterns from multidimensional datasets, without being explicitly programmed or applying any a priori assumptions. It has been effectively utilized in many areas of cardiology such as precision phenotyping, diagnostics, and prognostication including the prediction of hospital readmissions and mortality.10–12 Although heart failure patients undergoing CRT implantation represent another important target population for mortality prediction, only a few studies have applied ML to tackle this issue.13–15
Accordingly, our aim was to design and evaluate an ML-based risk stratification system to predict 1-, 2-, 3-, 4-, and 5-year mortality from pre-implant parameters of patients undergoing CRT implantation. We hypothesized that ML can capture high-dimensional, non-linear relationships among clinical features and that a risk stratification system can be developed that predicts mortality for individual patients more accurately than the currently available risk scores.
We identified 2282 patients who underwent successful CRT implantation at the Heart and Vascular Center of Semmelweis University (Budapest, Hungary) between September 2000 and December 2017. For each of these patients, pre-implant clinical characteristics such as demographics, medical history, physical status and vitals, currently applied medical therapy, electrocardiogram, echocardiographic, and laboratory parameters were extracted retrospectively from electronic medical records and entered into our structured database.
An additional prospective database of patients undergoing CRT implantation between January 2009 and December 2011 was also utilized. Patients included in both the retrospective and the prospective databases were removed from the retrospective database. In this way, the two cohorts were completely independent and they could be used as training and test cohorts for ML algorithms.
The study protocol (Supplementary material online, Figure S1) complies with the Declaration of Helsinki and it was approved by the Regional and Institutional Committee of Science and Research Ethics (approval No. 161/2019).
Follow-up data [status (dead or alive), date of death] were obtained for all patients from the National Health Insurance Database of Hungary. Patients with a follow-up duration shorter than 5 years (614 patients in the retrospective and 0 patients in the prospective database) were excluded from all analyses. The primary endpoint of our study was all-cause mortality.
Our structured database initially comprised over 100 easily obtainable clinical variables (so-called features). Firstly, features included in both the retrospective and the prospective databases were identified (n = 49). Then, features missing for >40% of cases (n = 16) were excluded. The final set of input features included 33 pre-implant clinical variables (Supplementary material online, Table S1).
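The missingness filter described above can be illustrated with a minimal sketch. The function name, the dictionary layout, the use of `None` for a missing value, and the toy feature names are all assumptions made for illustration; they are not taken from the study's code.

```python
def keep_features(table, max_missing=0.40):
    """Return the names of features whose share of missing values is at most max_missing.

    table maps a feature name to its list of values; None marks a missing value.
    A feature missing in exactly 40% of cases is kept, because only features
    missing for >40% of cases are excluded.
    """
    kept = []
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing:
            kept.append(name)
    return kept


# Toy data with hypothetical feature names (not the study's variables).
toy = {
    "age": [60, 70, None, 65, 80],            # 20% missing, kept
    "marker_x": [None, None, None, 300, None],  # 80% missing, dropped
}
selected = keep_features(toy)
```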
Missing values were imputed using the mean imputation method which replaces the missing values of a certain variable with the mean of the available cases. As the range of different features varied widely and some of the utilized algorithms required the data to be normalized, Z-score normalization was performed after imputation.
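As a rough illustration of this preprocessing, the sketch below imputes a single feature with its mean and then applies Z-score normalization. Estimating the mean and standard deviation on the training cohort and reusing them for new patients is an assumption, as is the use of the population standard deviation; the text does not specify either detail, and the function names and values are hypothetical.

```python
import math


def fit_column(train_values):
    """Estimate the imputation mean and the standard deviation of one feature.

    None marks a missing value. The standard deviation is computed after
    imputation, mirroring the order described in the text.
    """
    observed = [v for v in train_values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in train_values]
    # Imputing with the mean leaves the column mean unchanged.
    sd = math.sqrt(sum((v - mean) ** 2 for v in filled) / len(filled))
    return mean, sd


def transform_column(values, mean, sd):
    """Impute missing values with the mean, then apply Z-score normalization."""
    if sd == 0:
        return [0.0 for _ in values]
    return [((mean if v is None else v) - mean) / sd for v in values]


train = [80.0, None, 120.0, 100.0]  # made-up values for one feature
mean, sd = fit_column(train)
z_train = transform_column(train, mean, sd)
```

An imputed value always maps to a Z-score of zero, which is one reason heavy missingness in an important feature can blunt the prediction.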
We used the follow-up data to generate six classes of possible outcomes: death during the 1st (class 1), the 2nd (class 2), the 3rd (class 3), the 4th (class 4), the 5th year after CRT implantation (class 5), and no death during the first 5 years following the implantation (class 6). The task of ML algorithms was to predict the probability distribution (i.e. class membership probabilities) of each patient over these classes based on the pre-implant clinical features.
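The six outcome classes can be derived from the follow-up data roughly as follows. This is a sketch under stated assumptions: time to death is expressed in years since implantation, and a death falling exactly on an anniversary is assigned to the following year. The function name and the boundary convention are illustrative choices, not details given in the text.

```python
def outcome_class(died, years_to_death=None):
    """Map follow-up data to one of the six outcome classes.

    Classes 1 to 5: death during the 1st to 5th year after implantation.
    Class 6: no death during the first 5 years.
    """
    if not died or years_to_death is None or years_to_death >= 5:
        return 6
    # Assumed boundary convention: 0 <= t < 1 is class 1, 1 <= t < 2 is class 2, etc.
    return int(years_to_death) + 1


labels = [
    outcome_class(True, 0.4),   # died in the 1st year
    outcome_class(True, 4.9),   # died in the 5th year
    outcome_class(True, 6.2),   # died after the 5-year window
    outcome_class(False),       # alive at the end of follow-up
]
```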
Model development included trials of several ML classifiers such as logistic regression, ridge regression, support vector machines, k-nearest neighbours classifier, gradient boosting classifier, random forest, conditional inference random forest, and multi-layer perceptron. Models were trained with stratified 10-fold cross-validation on the training cohort and a grid search approach was used to tune the hyper-parameters of each ML algorithm (Supplementary material online, Table S2).
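To make the cross-validation scheme concrete, the sketch below assigns patients to folds so that each fold has a similar mix of outcome classes, and enumerates a hyper-parameter grid. It uses only the Python standard library; in practice, scikit-learn's `StratifiedKFold` and `GridSearchCV` provide the same functionality. The hyper-parameter names and values are placeholders, not the grid from Table S2.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample to a fold while keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds;
        # the offset keeps leftovers from always landing in the first folds.
        for pos, idx in enumerate(members):
            fold_of[idx] = (pos + offset) % n_folds
        offset += len(members)
    return fold_of


def parameter_grid(grid):
    """Yield every combination of the candidate hyper-parameter values."""
    names = sorted(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


labels = [1] * 20 + [6] * 30  # toy outcome classes
fold_of = stratified_folds(labels)
# Hypothetical hyper-parameters for illustration only.
candidates = list(parameter_grid({"n_trees": [100, 500], "max_depth": [4, 8, None]}))
```

A grid search would train each candidate on nine folds, score it on the held-out fold, repeat for all ten folds, and keep the candidate with the best average score.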
The output of each model was a series of six values representing the previously defined class membership probabilities (Figure 1A). The sum of these probabilities is equal to one in each patient. To create binary classifiers, we calculated cumulative class membership probabilities by summing these values until the given year of follow-up (Figure 1B). The computed cumulative probabilities were then calibrated using Platt’s scaling and the survival curve could be plotted for each patient (Figure 1C). The calibration of the model was evaluated using the Brier score, which is defined as the mean squared difference between the observed and the predicted outcome. Expected survival was also calculated from the annual calibrated cumulative probabilities (Figure 1D).
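The step from class membership probabilities to yearly mortality estimates, and the Brier score, can be sketched as follows. The function names and the probability values are made up for illustration. The Platt scaling step and the expected survival calculation are deliberately left out, because the text does not give enough detail to reproduce them faithfully.

```python
def cumulative_mortality(class_probs):
    """Turn six class probabilities into P(death by year k) for k = 1..5.

    class_probs lists the probabilities of classes 1 to 6 and sums to one;
    class 6 (no death within 5 years) never enters the running sum.
    """
    cumulative, running = [], 0.0
    for p in class_probs[:5]:
        running += p
        cumulative.append(running)
    return cumulative


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


probs = [0.10, 0.15, 0.05, 0.10, 0.10, 0.50]   # made-up model output for one patient
mortality = cumulative_mortality(probs)        # about 0.10, 0.25, 0.30, 0.40, 0.50
survival = [1.0 - m for m in mortality]        # points of the patient's survival curve

# Brier score for two toy patients at one time point (lower is better).
score = brier_score([0.2, 0.8], [0, 1])
```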
To quantify the model’s discriminative capabilities in each year, receiver operating characteristic (ROC) curve analysis was performed and area under the curve (AUC) was calculated. The mean AUC of 1-, 2-, 3-, 4-, and 5-year calibrated cumulative probabilities was calculated and it served as the major metric to assess a model’s performance.
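The selection metric can be illustrated with a small sketch that computes the AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor (ties count as one half), and then averages it over the time points. The function names and toy data are assumptions, and only two time points are shown for brevity.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(scores_by_year, labels_by_year):
    """Return the mean AUC across time points together with the yearly AUCs."""
    aucs = [roc_auc(s, y) for s, y in zip(scores_by_year, labels_by_year)]
    return sum(aucs) / len(aucs), aucs


# Toy predictions for four patients at two time points.
scores_by_year = [[0.9, 0.8, 0.3, 0.1], [0.9, 0.8, 0.3, 0.1]]
labels_by_year = [[1, 0, 1, 0], [1, 1, 0, 0]]
average, yearly = mean_auc(scores_by_year, labels_by_year)
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of this size; rank-based implementations scale better.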
The model with the highest mean AUC was selected for further evaluation and it is referenced as the SEMMELWEIS-CRT (perSonalizEd assessMent of estiMatEd risk of mortaLity With machinE learnIng in patientS undergoing CRT implantation) score throughout the entire manuscript. To determine whether the model remains accurate when new data are fed into it, we tested it on the patients of the test cohort.
For each patient in the test cohort, we also computed pre-existing risk scores (Seattle Heart Failure Model, VALID-CRT, EAARN, ScREEN, and CRT-score).16–20 Their prediction capabilities were quantified annually with AUCs and they were compared with SEMMELWEIS-CRT score using the DeLong test.
To determine the major predictors of all-cause mortality in our patient population, permutation feature importances were computed from the final model. Permutation feature importance measures the importance of an input feature by calculating the increase in the model’s prediction error after permuting its values. A feature is considered important if shuffling its values decreases the model’s discriminative capability as the model relies heavily on that feature for the prediction. A feature is unimportant if shuffling its values leaves the AUC unchanged as in this case the model ignores the feature for the prediction.
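A minimal sketch of permutation feature importance is shown below, using the drop in AUC as the measure of lost discriminative capability. The toy model, the data, the number of repeats, and the function names are assumptions for illustration; the toy model deliberately uses only the first feature, so shuffling the second feature cannot change the AUC.

```python
import random


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, feature_idx, n_repeats=20, seed=0):
    """Average drop in AUC after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = roc_auc([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature_idx] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original one.
        shuffled = [r[:feature_idx] + [v] + r[feature_idx + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - roc_auc([predict(r) for r in shuffled], labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical risk model that relies on feature 0 and ignores feature 1."""
    return row[0]


rows = [[0.1, 5.0], [0.4, 3.0], [0.6, 9.0], [0.9, 1.0]]
labels = [0, 0, 1, 1]
importance_used = permutation_importance(toy_model, rows, labels, feature_idx=0)
importance_ignored = permutation_importance(toy_model, rows, labels, feature_idx=1)
```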
The final training cohort included 1510 patients [66 ± 10 years, 1141 (76%) males] who underwent CRT implantation. A total of 158 CRT patients [67 ± 10 years, 127 (80%) males] were prospectively enrolled and entered into the test database. During the 5-year follow-up period, 805 (53%) patients died in the training cohort and there were 80 (51%) deaths in the test cohort. Supplementary material online, Table S3 shows the baseline characteristics of both cohorts and the comparisons between patients who were dead and alive at 5-year follow-up.
Among the evaluated ML classifiers, random forest (i.e. SEMMELWEIS-CRT score) yielded the highest AUCs for the prediction of all-cause mortality at 1-, 2-, 3-, 4-, and 5-year follow-up in the test cohort (Table 1 and Figure 2). Calibration improved the Brier scores of the final model (Supplementary material online, Table S5).
Table 1 (column headings: 1 year, 2 years, 3 years, 4 years, 5 years, Mean)
When compared with the pre-existing risk scores, the SEMMELWEIS-CRT score demonstrated significantly better response prediction and greater discrimination of mortality (Table 1). The CRT-score exhibited the best performance among the pre-existing risk scores; however, our random forest-based classifier was still superior to it for the prediction of 5-year outcome. Regarding the rest of the risk scores, the SEMMELWEIS-CRT score significantly outperformed them at all of the investigated time points.
Leading predictors of all-cause mortality are presented in Figure 3 and the full list of feature importances is provided as Supplementary material online, Table S6. Older age, higher serum levels of creatinine, lower values of left ventricular ejection fraction, serum sodium, haemoglobin concentration, and glomerular filtration rate were associated with higher predicted probability of all-cause mortality (Figure 4). However, as random forest captures complex high-level interactions among a multitude of variables, it is challenging to determine the effect of a single feature on the predicted probability of mortality and these individual relationships should be interpreted with caution.
Based on the predicted probability of death, patients were split into quartiles at each year of follow-up. As depicted by Kaplan–Meier curves, there was a significant difference in the distribution of events across the quartiles at all years and a graded increase in event rates could be observed while moving from the 2nd quartile through the 4th quartile (Figure 5). At 1-year follow-up, being categorized to the 4th quartile was associated with a more than 7-fold increased risk of death compared with those in the 1st quartile (Table 2). At 2-, 3-, 4-, and 5-year follow-up, patients in the 3rd and 4th quartiles exhibited a significantly increased risk of mortality compared with those in the 1st quartile (Table 2). The expected survival of patients decreased monotonically from the 1st through the 4th quartile in each year (Supplementary material online, Table S7).
|Comparison|1 year|2 years|3 years|4 years|5 years|
|---|---|---|---|---|---|
|2nd vs. 1st quartile|1.89 (P = 0.301)|5.55 (P = 0.010)|2.18 (P = 0.142)|1.81 (P = 0.203)|1.40 (P = 0.439)|
|3rd vs. 1st quartile|1.56 (P = 0.487)|7.30 (P < 0.001)|4.18 (P = 0.002)|2.88 (P = 0.012)|3.75 (P < 0.001)|
|4th vs. 1st quartile|7.92 (P < 0.001)|21.55 (P < 0.001)|10.59 (P < 0.001)|8.16 (P < 0.001)|6.71 (P < 0.001)|
In the present study, we developed and tested an ML-based risk stratification tool to predict all-cause mortality of CRT patients during a 5-year follow-up period (Take home figure). Among the evaluated ML classifiers, random forest demonstrated the best performance; therefore, this algorithm was used to create the SEMMELWEIS-CRT score. With an average AUC over 0.700, the SEMMELWEIS-CRT score significantly outperformed the other currently available risk scores. We also developed an online calculator (available at semmelweiscrtscore.com) to enable a convenient, interactive, and personalized calculation of predicted mortality in patients undergoing CRT implantation.
Cardiac resynchronization therapy induces reverse left ventricular remodelling and improves outcomes in a certain subgroup of heart failure patients.2,21 Despite these well-known beneficial effects, individual outcomes vary substantially. In the past years, several studies have investigated predictors that contribute to this variation and numerous prognostic models have been developed by combining multiple risk factors.16–19 However, these currently available risk scores have shortcomings and physicians are still reluctant to use them in daily clinical practice.8
The major limitation is the insufficient reliability and ineffectiveness for risk assessment at the individual patient level as outcome estimates have been extrapolated from large clinical trials. Although these scores offer general guidance and are effective at predicting outcomes at the population level, there remains a significant gap in the capability to predict outcomes for an individual patient.23 On the other hand, individual prognostication remains essential to develop appropriate personalized treatment plans and to make critical medical decisions based on life expectancy. These facts emphasize the need for more precise assessment through capturing the complex underlying interactions of predictors. With the SEMMELWEIS-CRT score, we intended to develop a more personalized approach for the risk assessment of patients undergoing CRT implantation.
Simultaneously interpreting the myriad risk predictors in an individual patient is challenging for clinicians. As a vast number of clinical variables associated with mortality needs to be considered, the complexity of assessment increases, making it more difficult for clinicians to draw an overall conclusion regarding risk in an individual patient. Moreover, the potential influence of complex and hidden interactions between several weaker predictors is often overlooked. In this study, we demonstrated that ML is capable of overcoming these challenges by leveraging complex higher-level interactions among a multitude of clinical features. Accordingly, our model exhibited improved discrimination and predictive range with respect to all-cause mortality compared with the pre-existing risk scores. Moreover, the SEMMELWEIS-CRT score was capable of identifying patients with robustly increased risk of all-cause mortality (4th quartile) during the entire follow-up period.
With the increasing availability of enormous electronic datasets, ML algorithms have emerged as highly effective methods for medical prediction problems, with the potential to augment risk stratification.9 By making no a priori assumptions about causative factors, ML enables an agnostic exploration of all available data for non-linear patterns that may predict a particular individual’s risk, i.e. personalized risk stratification.
Our evaluation of ML algorithms was rigorous, including trials of numerous different classifiers within a wide hyper-parameter space. Among the evaluated algorithms, the best performing model was the random forest classifier which is consistent with previous studies using ML to predict clinical endpoints.13,24–26
There are various risk models available for the risk assessment of patients from the entire heart failure spectrum.20,27 However, in our analysis, we focused exclusively on CRT recipients and we generated models that recognize patterns in the clinical characteristics of this specific subset of heart failure patients. Moreover, many of the pre-existing scores provide risk estimates for only a distinct time interval. In contrast, our goal was to build a model that could assess the risk of mortality annually from 1 to 5 years. Recently, Kalscheur et al.13 developed an ML-based risk assessment tool and their model exhibited comparable discriminative capabilities to ours. However, their model was limited to predicting 1-year outcomes, while the SEMMELWEIS-CRT score offers prediction of mortality risk at 1-, 2-, 3-, 4-, and 5-year follow-up.
Ideally, ML models, such as the one developed in the present study, will be integrated into electronic medical record systems and they will operate in the background providing real-time, personalized risk assessment based on the electronically available clinical features. Consequently, clinicians would not have to calculate a patient’s risk manually, which may enhance the model’s feasibility in clinical practice. Another potential benefit of ML algorithms is their capability to assimilate new data in real time to continuously improve their own predictive accuracy.
The SEMMELWEIS-CRT score uses 33 clinical variables. The majority of them are routinely assessed during the management of heart failure; therefore, they are readily available from electronic medical records. Moreover, our model was designed to tolerate a moderate number of missing parameters; however, a high percentage of missing values, especially among the most important features, may reduce the reliability of the prediction.
We also identified the most important predictors of all-cause mortality in this patient cohort. Many of these features have been described previously as influencing CRT outcomes, such as advanced age, male gender, non-left bundle branch block QRS morphology, history of or present atrial fibrillation at implantation, impaired renal function, and increased comorbidity burden.28–30 However, it is challenging to assess the independent impact of each variable on the predicted risk of mortality as ML models capture higher dimensional, non-linear interactions among features.
The observed high efficacy of our random forest model suggests that ML should be integrated into the individual risk assessment of patients undergoing CRT implantation. We foresee that the role of ML-based prognostic risk scores will become increasingly relevant in the near future and structured, dense databases in combination with state-of-the-art analytic approaches will pave the way to precision cardiovascular medicine.
This study has several strengths and limitations to be acknowledged. To ensure the generalizability of our model, we trained our models with 10-fold cross-validation on a large database and we performed additional testing of the final model on an independent cohort of patients. However, our study represents results from a single centre; therefore, the SEMMELWEIS-CRT score should be validated in external centres to confirm its generalizability. Our score requires a broad spectrum of input variables, which might discourage clinicians from using it at first glance. Thus, we designed our score to tolerate missing values; nevertheless, it might be less reliable when a large number of variables are missing. In spite of including established predictors of mortality in the final model, some relevant input features were excluded during model development due to their high proportion of missing values. Inclusion of the omitted parameters (e.g. other comorbidities) might further improve the predictive capabilities of our model. Besides missing values, the relatively long time course of retrospective data collection carries inherent limitations, including changes in guideline-directed medical therapy over that period. Another major limitation of risk score models is the lack of impact analyses to determine how the utilization of the models improves patient care and outcomes. Accordingly, future investigations should target the identification of treatment plans that specifically fit different levels of risk assessed by the SEMMELWEIS-CRT score. As the application of ML depends on the robustness of the database, practical use of our model in patient care would require careful and structured collection of data. However, this issue should ease as large and structured databases become widely available.
Moreover, our model could be linked with electronic medical record systems to automatically calculate the risk score, obviating the manual computation of patients’ risk and potentially increasing the model’s use in clinical practice.
Using commonly available clinical variables, we developed and tested a random forest-based risk stratification system, the SEMMELWEIS-CRT score, to effectively predict all-cause mortality in patients undergoing CRT implantation. Our ML-based risk assessment tool outperformed the pre-existing conventional risk scores. By capturing the non-linear association of predictors, the SEMMELWEIS-CRT score effectively outlined patient subgroups at high risk for mid- and long-term mortality. Therefore, the integration of these approaches into daily clinical practice may facilitate optimal candidate selection and prognostication of patients undergoing CRT implantation.
This work was supported by the National Research, Development and Innovation Office of Hungary (NKFIA; NVKP_16-1-2016-0017 National Heart Program) and the Higher Education Institutional Excellence Program of the Ministry for Innovation and Technology in Hungary, within the framework of the Therapeutic Development thematic program of the Semmelweis University.
Conflicts of interest: B.M. receives lecture fees from Biotronik, Medtronic and Abbott. Other authors declare no conflicts of interest regarding this manuscript.