Journal of Drug Assessment
Taylor & Francis
Measuring problem prescription opioid use among patients receiving long-term opioid analgesic treatment: development and evaluation of an algorithm for use in EHR and claims data
Volume: 9, Issue: 1
DOI 10.1080/21556660.2020.1750419

Objective: Opioid surveillance in response to the opioid epidemic will benefit from scalable, automated algorithms for identifying patients with clinically documented signs of problem prescription opioid use. Existing algorithms lack accuracy. We sought to develop a high-sensitivity, high-specificity classification algorithm based on widely available structured health data to identify patients receiving chronic extended-release/long-acting (ER/LA) therapy with evidence of problem use to support subsequent epidemiologic investigations.

Methods: Outpatient medical records of a probability sample of 2,000 Kaiser Permanente Washington patients receiving ≥60 days’ supply of ER/LA opioids in a 90-day period from 1 January 2006 to 30 June 2015 were manually reviewed to determine the presence of clinically documented signs of problem use and used as a reference standard for algorithm development. Using 1,400 patients as training data, we constructed candidate predictors from demographic, enrollment, encounter, diagnosis, procedure, and medication data extracted from medical claims records or the equivalent from electronic health record (EHR) systems, and we used adaptive least absolute shrinkage and selection operator (LASSO) regression to develop a model. We evaluated this model in a comparable 600-patient validation set. We compared this model to ICD-9 diagnostic codes for opioid abuse, dependence, and poisoning. This study was registered with ClinicalTrials.gov as study NCT02667262 on 28 January 2016.

Results: We operationalized 1,126 potential predictors characterizing patient demographics, procedures, diagnoses, timing, dose, and location of medication dispensing. The final model incorporating 53 predictors had a sensitivity of 0.582 at positive predictive value (PPV) of 0.572. ICD-9 codes for opioid abuse, dependence, and poisoning had a sensitivity of 0.390 at PPV of 0.599 in the same cohort.

Conclusions: Scalable methods using widely available structured EHR/claims data to accurately identify problem opioid use among patients receiving long-term ER/LA therapy were unsuccessful. This approach may be useful for identifying patients needing clinical evaluation.

Carrell, Albertson-Junkans, Ramaprasan, Scull, Mackwood, Johnson, Cronkite, Baer, Hansen, Green, Hazlehurst, Janoff, Coplan, DeVeaugh-Geiss, Grijalva, Liang, Enger, Lange, Shortreed, and Von Korff: Measuring problem prescription opioid use among patients receiving long-term opioid analgesic treatment: development and evaluation of an algorithm for use in EHR and claims data



The federal government has declared the epidemic of opioid-related harms in the United States1–4 to be a public health emergency5, and a committee convened by the National Academies of Sciences, Engineering, and Medicine has concluded that a coordinated response will be needed to reverse the escalating prevalence of these harms6. Opioid surveillance, a key component in this response, is hampered by the absence of accurate, scalable surveillance methods for identifying patients with problem opioid use7, 8. To date, most large-scale investigations of problem use have relied on International Classification of Diseases, Ninth Revision (ICD-9) diagnostic codes for opioid abuse (305.*), dependence or addiction (304.*) and/or poisoning (965.00, 965.02, 965.09, E850; Supplementary Appendix A)9–15 despite their poor sensitivity16, 17. Recent research indicates some patients without formal diagnoses have clinical documentation of problem opioid use in encounter notes (e.g. discussion of opioid use disorder treatment options)17, suggesting that more sophisticated structured data algorithms might allow for more accurate identification of patients with problem opioid use.

This study is one of 11 post-marketing requirements (PMR) studies for extended-release, long-acting opioid analgesics (ER/LA).


The objective of this study was to use a moderate amount of manually curated gold standard data to develop a computable algorithm that accurately identified patients experiencing problem prescription opioid use, and to use this algorithm to generate gold standard data to support epidemiologic investigations among a collection of 11 PMR studies. In order to allow the resulting algorithm to be applied in very large healthcare data sets, inputs to the algorithm were restricted to structured health data such as diagnosis, procedure and medication codes that are widely available from medical claims records or their equivalent derived from electronic health records (EHRs). This study focuses on ER/LA recipients because it was conducted pursuant to a United States Food and Drug Administration (FDA) request to companies holding New Drug Applications for ER/LA opioids (as distinct from immediate-release opioids) to conduct post-marketing studies to assess the serious risks associated with long-term ER/LA use18–20. The study design was reviewed by a panel of experts at a two-day FDA public meeting in 201421. The protocol (PMR 3033-7) is available at www.clinicaltrials.gov22. Gold standard data generated using the algorithm developed in this study were to be combined with gold standard data on opioid-related overdoses developed in a companion study and used to investigate the incidence and epidemiology of problem opioid use and opioid-related overdose and death23 in a very large patient cohort combining data from Kaiser Permanente Northwest (KPNW), Kaiser Permanente Washington (KPWA), Optum, and Tennessee Medicaid. As such, this study also contributes to an emerging literature on automated methods to determine patient phenotypes or case status in “big” healthcare data to support clinical, epidemiological and surveillance research without the need for expensive, sample-constraining manual chart review24–26.

Our operational definition of clinically-documented problem opioid use is described elsewhere27. Briefly, we define problem opioid use as a spectrum of behaviors and symptoms associated with the unhealthy use of prescription opioid medications. This definition includes, but does not require, clinically-documented evidence of the behavioral or physiological manifestations of substance use disorder as defined in the Diagnostic and Statistical Manual of Mental Disorders, version 5 (DSM-5). We prefer this more inclusive definition because (1) chart notes often lack details needed to support a rigorous clinical diagnosis of substance use disorder – even for patients with substance use disorders, and (2) the public health motivation for this research is not limited to clinically diagnosed opioid use disorder (OUD). By “clinically documented” we simply mean that the information is recorded in patient charts; this does not imply that a formal clinical diagnosis of substance use disorder has been made. We aimed to produce an algorithm with sensitivity ≥0.90 at a positive predictive value (PPV) ≥0.90. However, given the limitations of structured EHR/claims data we specified in advance minimally acceptable sensitivity of ≥0.75 at PPV ≥0.75. As a secondary objective, we compared our algorithm to a simple algorithm based on diagnosis codes commonly used in the scientific literature (Supplementary Appendix A)9–15.



The setting for this study was Kaiser Permanente Washington (KPWA, formerly Group Health Cooperative), where over 890,000 patients received outpatient care documented in an Epic EHR system28 during the study period, 1 January 2006 to 30 June 2015. Data used were limited to structured health data (including diagnosis, procedure and medication codes) widely available from medical claims records or their equivalent derived from EHRs (hereafter referred to as EHR/claims data). We deliberately focused on EHR/claims data so that the resulting algorithm could be applied in a wide variety of settings, including claims databases representing tens of millions of lives29. To the KPWA EHR data, we added claims data for outpatient, urgent, inpatient, and chemical dependence care received by KPWA patients outside KPWA. Medications for outside chemical dependence care were represented in the KPWA EHR. Encounter, diagnosis, procedure, and medication records were combined and transformed into the Sentinel Common Data Model (CDM, version 6)30, 31, which is applicable to large sectors of the US population32. A research team at Kaiser Permanente Washington Health Research Institute had access to study patients’ complete outpatient (including primary and specialty care) EHR charts and manually reviewed this information to create reference standard data regarding the presence of documented signs of problem opioid use27.

Study cohort and sample

Patients eligible for this study were ≥18 years of age by 1 January 2006 and had received ≥60 days’ supply of extended-release or long-acting (ER/LA) opioid analgesics (including transdermal or oral opioids and excluding buprenorphine) in any 90-day span during the study period (“long-term ER/LA”). We did not exclude patients exposed to ER/LA medications prior to the start of the study period (i.e. we studied a “prevalent user” cohort). We excluded patients receiving nursing home or hospice services during the study period. Study eligibility was independent of exposure to immediate-release (IR) opioids or the presence or absence of other conditions or diagnoses. Study patients were required to have ≥24 months of continuous enrollment, including ≥6 months prior to and ≥18 months following the first ER/LA dispensing in a patient’s earliest qualifying long-term ER/LA episode (the patient’s index date). We also required patients to have at least eight study quarters with EHR-documented encounters to assure opportunities for clinicians to observe and document patient issues.
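The ≥60 days’ supply in any 90-day span criterion can be illustrated with a short sketch. The function name, the tuple layout, and the rule that a dispensing contributes its full days' supply to any span containing its fill date are assumptions made for illustration; the study's exact operationalization is not given in the text.

```python
from datetime import date, timedelta

def first_qualifying_index_date(dispensings, min_supply=60, span_days=90):
    """Return the fill date opening the earliest qualifying span, or None.

    `dispensings` is a list of (fill_date, days_supply) tuples for ER/LA
    opioids (hypothetical layout). Assumption: a dispensing counts in full
    toward any span that contains its fill date.
    """
    fills = sorted(dispensings)
    for i, (start, _) in enumerate(fills):
        end = start + timedelta(days=span_days)  # span is [start, end)
        total = sum(supply for fill, supply in fills[i:] if fill < end)
        if total >= min_supply:
            return start  # first dispensing of the earliest qualifying span
    return None
```

Only spans that begin on a fill date need to be checked, because any other 90-day span can be shifted forward to its first fill without losing a dispensing.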

Our stratified random sample of 2,000 patients was enriched with patients 18–35 years of age and patients with diagnoses during the study period of opioid dependence, abuse, and/or poisoning (Supplementary Appendix A), both of which are known correlates of problem opioid use9, 33–35. We randomly assigned 70% (n = 1,400) to an algorithm training set and reserved 30% (n = 600) for a one-time evaluation of the final algorithm. Assuming a 20% prevalence of problem use and algorithm performance of 80% sensitivity and 80% specificity, the 95% confidence intervals for sensitivity and specificity in this validation set would be 71–89% and 76–84%, respectively.
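As a rough check on the precision available in a 600-patient validation set, the sketch below computes normal-approximation (Wald) intervals under the stated assumptions of 20% prevalence and 80% sensitivity and specificity. The interval method used in the study is not stated, and this sketch ignores the stratified sampling design, so its output need not reproduce the reported intervals exactly.

```python
import math

def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

n_validation = 600
prevalence = 0.20
n_pos = round(n_validation * prevalence)  # patients with problem use
n_neg = n_validation - n_pos              # patients without problem use

sens_ci = wald_interval(0.80, n_pos)  # sensitivity is estimated among positives
spec_ci = wald_interval(0.80, n_neg)  # specificity is estimated among negatives
```

The sensitivity interval is wider than the specificity interval because, at 20% prevalence, only about one in five validation patients contributes to the sensitivity estimate.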

Reference standard

The creation of reference standard data by manual chart review is described elsewhere27. Briefly, experienced chart abstractors following a written protocol manually reviewed each patient’s entire outpatient chart to determine whether signs of problem opioid use were clinically documented, and if so the earliest date of documentation (“onset date”). Determinations regarding problem use were based on the totality of the evidence in the chart; determinations were negative if evidence was weak or ambiguous27. Inter-rater reliability among charts reviewed by two abstractors was high (Cohen’s kappa = 0.83).

Algorithm development

Each patient’s EHR and claims data were the source data for algorithm development. A study team of clinicians, epidemiologists and medical records experts formed operational definitions of a large number of candidate predictor variables using training data informed by findings reported in the literature8, 36–40, clinical experience, and qualitative insights gained from the manual review of 80 charts comparable to but not included in the study sample. Candidate predictors were typically binary (yes/no) measures reflecting patient demographics, diagnoses, encounters, and utilization data elements, individually or in combination.

To gauge potential “signal” in individual candidate predictors we calculated the following risk ratio (RR):

$$\mathrm{RR} = \frac{\text{Percentage of problem use POSITIVES with predictor set to TRUE}}{\text{Percentage of problem use NEGATIVES with predictor set to TRUE}}$$

We considered candidate predictors with larger values of RR and larger numbers of patients positive for the predictor (or, for interval level predictors, above a reasonable cut-point) to indicate greater discriminating signal. Using this information, we iteratively refined candidate predictors. We used a similar analytic approach to dichotomize some continuous candidate predictors. We included age-group interactions with candidate predictors when such interactions were scientifically compelling.
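For a binary candidate predictor, the risk ratio defined above could be computed along the following lines. This is a minimal sketch; the function name and the list-of-booleans inputs are illustrative assumptions rather than the study's code.

```python
def risk_ratio(predictor, outcome):
    """RR = % of problem-use positives with predictor TRUE
          / % of problem-use negatives with predictor TRUE.

    `predictor` and `outcome` are equal-length lists of booleans.
    """
    among_pos = [p for p, o in zip(predictor, outcome) if o]
    among_neg = [p for p, o in zip(predictor, outcome) if not o]
    if not among_pos or not among_neg:
        raise ValueError("need at least one positive and one negative patient")
    pct_pos = 100.0 * sum(among_pos) / len(among_pos)
    pct_neg = 100.0 * sum(among_neg) / len(among_neg)
    if pct_neg == 0:
        # Predictor never TRUE among negatives: RR is unbounded or undefined.
        return float("inf") if pct_pos > 0 else float("nan")
    return pct_pos / pct_neg
```

A large RR based on very few predictor-positive patients is weak evidence, which is why the count of patients positive for the predictor was considered alongside RR.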

We used adaptive least absolute shrinkage and selection operator (LASSO) logistic regression41, 42, as implemented in the “lqa” R package43 to identify a subset of candidate predictors for the final algorithm. We used adaptive LASSO because we wanted a parsimonious and transparent prediction model. Traditional LASSO is a regression analysis method that selects predictors by penalizing, or “shrinking toward zero,” coefficients of candidate predictors that do not substantially improve algorithm accuracy; adaptive LASSO extends traditional LASSO by favoring predictors with stronger initial associations with the outcome44. Implementing adaptive LASSO requires a gamma parameter, an exponent applied to the coefficient weights that determines how much the initial estimates of associations with the outcome influence the model fitting, and a lambda parameter, which influences how sparse the final model will be. We used the inverse of the absolute value of coefficients obtained from ridge regression to estimate the coefficient weights, as is recommended when the ratio of predictors to sample size is large43.
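In its standard textbook form, adaptive LASSO logistic regression minimizes a penalized negative log-likelihood in which each coefficient carries its own penalty weight. The notation below is introduced here for illustration (it does not appear in the original text), and the exact parameterization used by the “lqa” package may differ:

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{ -\sum_{i=1}^{n} \left[ y_i \eta_i - \log\!\left(1 + e^{\eta_i}\right) \right]
  + \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert \right\},
\qquad
\eta_i = \beta_0 + x_i^{\top}\beta,
\qquad
w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}
```

Here $\tilde{\beta}_j$ is the initial (ridge) estimate for predictor $j$, so predictors with weak initial associations receive large weights $w_j$ and are penalized more heavily; $\lambda$ controls overall sparsity and $\gamma$ controls how strongly the initial estimates shape the penalty.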

To select the parameter values, we used eight-fold cross-validation on the training data, performing a grid search over values of both gamma and lambda. We avoided smaller folds because they may lack enough events to estimate a rich model. Our metric for evaluating model fit given lambda and gamma was the sum of squares in the left-out portion of the cross-validation sample: $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the predicted value of the $i$th data point in the left-out portion of the cross-validation sample using the prediction model estimated in the cross-validation sample. After selecting both lambda and gamma using cross validation, we estimated the predictive model on the entire training set using adaptive LASSO with these lambda and gamma values; this produced the model for the final classification algorithm, which predicted the logit of the probability of chart-documented problem use as a linear combination of the retained terms, plus selected interactions between these. The model-specified (“fitted”) probability was used as a risk score for each patient. Because both training and validation data oversampled higher-risk patients, we calculated weights based on the inverse of each patient’s probability of selection45–47 (i.e. design weights) to reweight the analytic datasets back to the pool of eligible patients to estimate prevalence.
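The cross-validated grid search described above can be sketched generically. In this sketch `fit` is a hypothetical callable, supplied by the user, that takes training rows, training outcomes, lambda, and gamma and returns a prediction function; the fold assignment and all names are assumptions for illustration, not the study's implementation.

```python
import random
from itertools import product

def k_fold_indices(n, k=8, seed=0):
    """Randomly partition row indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_sse(X, y, fit, lam, gamma, k=8, seed=0):
    """Sum of squared errors over the left-out folds for one (lambda, gamma)."""
    total = 0.0
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # `fit` is a hypothetical model-fitting callable returning predict(row).
        predict = fit([X[i] for i in train], [y[i] for i in train], lam, gamma)
        total += sum((y[i] - predict(X[i])) ** 2 for i in held_out)
    return total

def grid_search(X, y, fit, lambdas, gammas, k=8, seed=0):
    """Return the (lambda, gamma) pair with the smallest cross-validated SSE."""
    return min(product(lambdas, gammas),
               key=lambda pair: cv_sse(X, y, fit, pair[0], pair[1], k, seed))
```

Using the same seed for every grid point keeps the fold assignment fixed, so differences in the metric reflect the parameter values rather than the split.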

Observation period for algorithm implementation

Performance of claims-based algorithms may improve as the data collection period increases12, but the duration of continuous enrollment may vary considerably across the diverse healthcare settings where this algorithm was intended to be used48, 49. We, therefore, used a 36-month observation period, including 12 months before and 24 months after a patient’s ER/LA index date, because >50% of study-eligible KPWA, KPNW, Optum/Humedica, and Tennessee Medicaid patients (the settings where the algorithm was to be applied) had ≥36 months of continuous enrollment. This period allowed for adequate capture of patient information without bias toward patients with longer enrollment. Including 12 months pre-index allowed us to assess patients’ experience prior to long-term ER/LA use.

We operationalized reference standard outcomes to reflect the 36-month observation period. Patients with signs of problem use before or during the 36-month period were considered positive, and patients without evidence or whose onset occurred after the 36-month period were considered negative.
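The labeling rule for reference standard outcomes can be expressed compactly. The sketch below assumes the observation period ends exactly 24 calendar months after the index date and treats the end as exclusive; both are illustrative assumptions, as are the function names.

```python
from datetime import date

def add_months(d, months):
    """Shift a date by whole calendar months (day clamped to 28 for simplicity)."""
    month_index = d.year * 12 + (d.month - 1) + months
    return date(month_index // 12, month_index % 12 + 1, min(d.day, 28))

def reference_label(index_date, onset_date, months_after=24):
    """True if problem use onset is documented before or during the
    observation period; False if there is no onset or it occurs afterwards."""
    if onset_date is None:
        return False
    window_end = add_months(index_date, months_after)  # assumed exclusive end
    return onset_date < window_end
```

Note that onset before the start of the observation period also counts as positive, so only the end of the period matters for the label.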

Algorithm evaluation

During algorithm development and for final evaluation we used cut points on algorithm-calculated risk scores to classify patients as positive (values at or above the cut point) or negative (all other values) for problem use. We did this for selected cut points chosen to optimize performance with (a) desirable sensitivity, (b) desirable specificity, (c) desirable PPV, or (d) balanced sensitivity and PPV. All cut points were selected based on training data. To evaluate the final algorithm, we used these cut points and reported algorithm performance in validation data by comparing algorithm classifications to reference standard classifications.
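One way to pick a cut point for a target sensitivity in training data, and then apply it unchanged elsewhere, is sketched below. This is an unweighted illustration with hypothetical function names; it ignores the design weights used in the study.

```python
import math

def cut_point_for_sensitivity(scores, labels, target):
    """Highest cut point whose training sensitivity is at least `target`.

    `scores` are risk scores; `labels` are reference-standard booleans.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("no reference-standard positives in training data")
    # Number of positives that must score at or above the cut point.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]

def classify(scores, cut_point):
    """Positive if the risk score is at or above the cut point."""
    return [s >= cut_point for s in scores]
```

Because the cut point is fixed using training data only, the sensitivity actually achieved in validation data can fall short of the target, as Table 3 shows.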

Our algorithm evaluation metrics were:

    Sensitivity (recall or true positive rate):
    true positives/(true positives + false negatives),
    Specificity (true negative rate):
    true negatives/(false positives + true negatives),
    Positive predictive value (PPV or precision):
    true positives/(true positives + false positives), and
    Negative predictive value (NPV):
    true negatives/(true negatives + false negatives).
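The four metrics above follow directly from the confusion-matrix counts. A minimal unweighted sketch, with an illustrative function name, is:

```python
def classification_metrics(predicted, reference):
    """Sensitivity, specificity, PPV and NPV from paired boolean lists."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)

    def ratio(num, den):
        # Undefined (NaN) when the denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, fp + tn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Sensitivity and specificity condition on the reference standard, whereas PPV and NPV condition on the algorithm's classification and therefore depend on prevalence in the evaluated sample.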

We characterize tradeoffs in algorithm sensitivity and specificity graphically using receiver operating characteristic (ROC) curves.

To compare the final algorithm’s performance to an approach commonly reported in the literature, we operationalized a simple ICD-9 code-based algorithm which classified a patient positive if they had an ICD-9 diagnosis code for prescription opioid dependence, abuse, or poisoning (Supplementary Appendix A) at any time during the observation period and negative otherwise.
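The simple code-based comparator amounts to a membership test on a patient's diagnosis codes. The sketch below uses the code patterns as printed in the introduction (304.*, 305.*, 965.00, 965.02, 965.09, E850) as placeholders; the authoritative list is in Supplementary Appendix A, which is not reproduced here. Treating E850 as a prefix is an assumption, as are the function and constant names.

```python
# Placeholder patterns taken from the introduction; Supplementary Appendix A
# holds the actual code list and may be narrower.
PREFIX_PATTERNS = ("304.", "305.", "E850.")
EXACT_CODES = {"965.00", "965.02", "965.09", "E850"}

def icd9_positive(diagnosis_codes):
    """True if any code recorded during the observation period matches.

    `diagnosis_codes` is an iterable of ICD-9 code strings already
    restricted to the observation period.
    """
    for raw in diagnosis_codes:
        code = raw.strip().upper()
        if code in EXACT_CODES or code.startswith(PREFIX_PATTERNS):
            return True
    return False
```

Unlike the risk-score algorithm, this rule has no tunable cut point, which is why it appears as a single point rather than a curve in Figure 1.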

This study was approved by the Human Subjects Review Board of Kaiser Permanente Washington.


The study sample and manual chart review results are described elsewhere27. Briefly, 3,728 patients met the study inclusion and exclusion criteria (Table 1). Median total days’ supply of ER/LA medications dispensed during each patient’s earliest qualifying continuous enrollment period was 1,208 days (interquartile range [IQR] 257–1,837 days; range 60–6,684 days). The median age was 52 years (IQR: 44–60, range: 20–96), 55% were women, and 79% were white (Table 1). The prevalence of reference-standard problem use at any time during the 9.5-year study period, weighted to account for sampling probabilities, was 29.3%, and 23.0% when limited to the 36-month observation period used for algorithm evaluation.
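The weighted prevalence estimates reported here rely on design weights, that is, the inverse of each sampled patient's probability of selection. A minimal sketch of that calculation, with hypothetical names and toy inputs, is:

```python
def weighted_prevalence(outcomes, selection_probs):
    """Design-weighted prevalence of a boolean outcome.

    Each sampled patient is weighted by 1 / (probability of selection),
    so oversampled strata count less per patient.
    """
    weights = [1.0 / p for p in selection_probs]
    positive_weight = sum(w for w, y in zip(weights, outcomes) if y)
    return positive_weight / sum(weights)
```

For example, if a high-risk stratum is sampled with certainty and a low-risk stratum with probability 0.25, each low-risk patient stands in for four eligible patients, pulling the weighted prevalence below the raw sample proportion.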

Table 1.
Demographic characteristics of study-eligible Kaiser Permanente Washington patients (n = 3,728), patients sampled for inclusion in Study 3B (n = 2,000), and patients randomly assigned to the training (n = 1,400) and validation (n = 600) samples.
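| Demographic characteristic | Eligible for study, n | Eligible for study, % | Full study sample, n | Full study sample, % | Training sample, n | Training sample, % | Validation sample, n | Validation sample, % |
|---|---|---|---|---|---|---|---|---|
| Number of patients | 3,728 | 100% | 2,000 | 100% | 1,400 | 100% | 600 | 100% |
| Age at ER/LA index date | | | | | | | | |
| Mean (SD) | 55 (13.4) | | 52 (13.4) | | 52 (13.3) | | 52 (13.6) | |
| Min | 20 | | 20 | | 20 | | 20 | |
| Median | 52 | | 52 | | 52 | | 51 | |
| Max | 96 | | 96 | | 96 | | 94 | |
| 18–34 years | 229 | 6.1 | 229 | 11.5 | 159 | 11.3 | 70 | 11.7 |
| 35–54 years | 1,734 | 46.5 | 958 | 47.9 | 662 | 47.3 | 296 | 49.3 |
| 55–64 years | 1,008 | 27.0 | 484 | 24.2 | 346 | 24.7 | 138 | 23.0 |
| 65+ years | 757 | 20.3 | 329 | 16.5 | 233 | 16.6 | 96 | 16.0 |
| Black/African American | 143 | 3.8 | 73 | 3.7 | 54 | 3.9 | 19 | 3.1 |
| Native American/Alaska Native | 120 | 3.2 | 69 | 3.5 | 46 | 3.3 | 23 | 3.8 |
| Hawaiian/Pacific Islander | 20 | 0.5 | 11 | 0.6 | 8 | 0.6 | 3 | 0.5 |
| Unknown/not specified | 398 | 10.6 | 196 | 11.5 | 162 | 11.5 | 68 | 11.5 |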

We operationalized 1,126 candidate predictor variables. Briefly, these included demographic measures; the Charlson Comorbidity Index; other medication; medications used to treat opioid use disorder; diagnoses of pain, mental health conditions, other substance use/disorders, and opioid overdose; emergency room utilization; physical therapy utilization; measures characterizing opioid prescription fill patterns and morphine-equivalent dose; and a variety of clinically-relevant interaction terms (summarized in Table 2; details in Supplementary Appendix C). Our candidate predictors did not include the administration of naloxone. This was because we found, in a companion study of opioid overdose, that naloxone is often not captured in structured EHR data and, in any case, is often administered presumptively by emergency care personnel before opioid involvement is assessed, thereby reducing the predictive power of naloxone administration50. A plurality of candidate predictors characterized opioid dispensing. For example, one such predictor indicated whether a patient received during any 3-month period ≥3 partially overlapping IR dispensings with ≤14 days’ supply on a Saturday, Sunday, or Monday. Information about encounters and non-opioid medications was also commonly represented in predictors. Some predictors were created by varying the values of key elements if doing so preserved face validity (e.g. morphine equivalent dose [MEQ] of ≥33% versus ≥50% versus ≥75% over consecutive calendar quarters).

Table 2.
Categories of 1,126 candidate predictor variables operationalized from Sentinel demographics, encounters, diagnoses, procedures and medications EHR/claims data considered for inclusion in the classification algorithm to identify patients with chart-documented problem opioid use.
| Category | Operationalization notes(a) |
|---|---|
| Pain diagnoses | Back pain, other back or neck disorder, headache or migraine, neuropathic pain, fibromyalgia, arthritis |
| Change in pain location over time | Change during various time intervals (days, weeks, months) |
| Count of distinct pain locations | Lower back, other back or neck disorder, headache or migraine, neuropathic pain, fibromyalgia, arthritis |
| Mental health disorders | Depression, bipolar disorder, anxiety disorder, other mental health disorders, other mood disorder, schizophrenia/schizoaffective |
| Problem opioid use | Dependence, abuse, poisoning (excluding heroin), heroin |
| Non-opioid substance use disorder | Alcohol disorder, specified drug dependence, cannabis dependence, combination of drug dependence, nondependent drug abuse, tobacco use disorder |
| Sleep disorder | Insomnia, psychophysiological insomnia, inadequate sleep hygiene, insomnia due to drug or substance, insomnia due to medical condition, physiologic (organic) insomnia, hypersomnia of central origin, central sleep apnea syndrome, isolated sleep symptoms, concurrent use of opioids and insomnia diagnosis |
| Psycho-social trauma | Post-traumatic stress disorder (PTSD), domestic violence (E-codes, V-codes) |
| Hepatitis/cirrhosis | Ever/never; counts (overall, by month, by quarter); percent of quarters |
| Endocarditis | Ever/never; counts (overall, by month, by quarter); percent of quarters |
| Comorbidities | Charlson comorbidity index; point in time and change over time |
| Accidental injury or poisoning due to drugs (E-codes) | Opioids, non-narcotic analgesics, barbiturates and sedatives, psychoactive medications, other drugs |
| Adverse effects from psychoactive drugs (E-codes) | Ever/never; counts (overall, by month, by quarter); percent of quarters |
| Days’ supply | Total days’ supply overall, per month, per quarter; ER/LA and SA/IR combined and by type; percent change in days’ supply over time; ever/never and count of quarters with excess days’ supply |
| Medications used for the treatment of substance use disorder | Total days’ supply overall, per month, per quarter; ever/never use at various points in time and relative to index date |
| Opioid dispensings | Ever/never by month, by quarter; counts overall, by month, by quarter; in proximity with other medication dispensings (days, weeks, quarters); by day of the week |
| Psychoactive medications | Various versions, including antidepressant medications, antianxiety medications, muscle relaxers, homeopathic dispensings, benzodiazepine, barbiturate, hypnotics, anticonvulsants, ADD medication, lithium, stimulants |
| Concomitant use of opioids and other psychoactive medications | Ever/never; counts (overall, by month, by quarter); percent of quarters; number of different medications used concomitantly |
| Overlapping dispensings ("early fills") | Ever/never; counts (overall, by month, by quarter); percent of quarters; operationalized in a variety of ways including by NDC, by opioid type, by day of the week and other characteristics of dispensings |
| Morphine equivalence dosing (MEQ or MED) | Various versions, including average daily MEQ, MEQ per day of supply, changes in MEQ over time, high MEQ by dispensing and by time period (month, quarter), by opioid type (short acting versus long acting) |
| Medications used to treat opioid use disorder | Total days’ supply overall, per month, per quarter; ever/never use at various points in time and relative to onset date; frequency of dispensings |
| Concurrent use of opioids and pain diagnosis | Ever/never; counts (overall, by month, by quarter); percent of quarters |
| Emergency room (ER) encounters | Various versions, including opioids dispensed on the same date as emergency room encounters, day of week, ever/never and count of emergency room encounters during opioid use, emergency room encounters during concomitant use of opioids and other psychoactive medication(s) |
| Treatment of substance use disorder | Ever/never; counts (overall, by month, by quarter); percent of quarters |
| Urine drug screening | Ever/never; counts (overall, by month, by quarter); percent of quarters; number of urine drug screens in close proximity to other risk indicators such as overlapping dispensings and high MEQ |
| Surgery | Various versions, based on type, opioid use prior to and after surgery, diagnoses in close proximity to surgery |
| Combinations and interactions | |
| Combinations of data from multiple sources | Various versions, including frequency of urine drug screening during periods of overlapping opioid dispensings, emergency room encounters during periods of overlapping opioid dispensings, emergency room encounters during periods of excess days’ supply of opioids, emergency room encounters during concomitant use of opioids and other psychoactive medications, emergency room encounters during periods of high morphine equivalence dose |
| Interactions | Over 100 interaction terms including interactions with patient age, patient gender, and interactions between selected diagnoses |

(a) Most potential predictors were derived in a variety of ways in both continuous and binary forms, including but not limited to: ever/never, frequency (overall, by month, by quarter), percent of time or visits, and/or in combination with other variables.

The final adaptive LASSO model incorporated 53 of the 1,126 candidate predictors. These 53 predictors (Supplementary Appendix B) included age, sex, diagnosis of opioid-dependence; diagnoses of comorbidities including mental health disorders, alcohol use disorder, non-opioid drug dependence, tobacco use disorder and anxiety disorder; various measures of opioid dispensings based on days’ supply and MEQ; dispensing of opioids concomitantly with other medications such as benzodiazepines; various measures of early refills; opioid dispensing in proximity to ER encounters; the history of receiving medications used to treat drug dependence; the coincidence of urine drug screening and dispensing of opioid medications; pain diagnoses; and interaction terms based on patient age.

The performance of the final classification model is summarized in Table 3 and Figure 1. Performance in training data where algorithm sensitivity and PPV were balanced was 0.706 and 0.703, respectively, decreasing to 0.582 and 0.572, respectively, in validation data (Table 3, row 10), well below our a priori minimally acceptable level. A risk score cut point with high sensitivity (0.900 in training data and 0.850 in validation data; Table 3, row 1) yielded modest PPV (0.429 in training data and 0.412 in validation data). Conversely, a risk score cut point with high PPV (0.900 in training data and 0.774 in validation data; Table 3, row 7) yielded low sensitivity (0.356 in training data and 0.296 in validation data). The ROC curve (Figure 1) reveals consistent tradeoffs between sensitivity and specificity throughout the range of scores.

Figure 1.
Receiver operating characteristic (ROC) curve for the problem opioid use classification algorithm in the training set (solid line), validation set (dashed lines), and sensitivity and specificity of the simple binary algorithm based on ICD-9 diagnosis codes for opioid abuse, dependence and poisoning (circle).
Table 3.
Problem opioid use classification algorithm performance in the 1,400-patient training set and the 600-patient validation set, for selected values of the algorithm-generated risk score with desired performance characteristics (based on training data), as measured by sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).
| Row | Desired performance characteristic (based on training data) | Risk score cut-point | Sensitivity, training | Sensitivity, validation | Specificity, training | Specificity, validation | PPV§, training | PPV§, validation | NPV, training | NPV, validation | Pred. prevalence¥, training | Pred. prevalence¥, validation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Sensitivity: Excellent (0.90) | 0.122 | 0.900 | 0.850 | 0.641 | 0.640 | 0.429 | 0.412 | 0.955 | 0.935 | 56% | 56% |
| 2 | Sensitivity: Good (0.80) | 0.229 | 0.800 | 0.729 | 0.827 | 0.786 | 0.581 | 0.503 | 0.933 | 0.907 | 40% | 42% |
| 3 | Sensitivity: Acceptable (0.75) | 0.278 | 0.752 | 0.629 | 0.879 | 0.841 | 0.651 | 0.541 | 0.922 | 0.884 | 35% | 35% |
| 4 | Specificity: Excellent (0.90) | 0.311 | 0.736 | 0.620 | 0.900 | 0.867 | 0.688 | 0.580 | 0.919 | 0.885 | 32% | 33% |
| 5 | Specificity: Good (0.80) | 0.202 | 0.821 | 0.738 | 0.800 | 0.764 | 0.551 | 0.481 | 0.937 | 0.907 | 43% | 44% |
| 6 | Specificity: Acceptable (0.75) | 0.169 | 0.861 | 0.776 | 0.751 | 0.727 | 0.509 | 0.457 | 0.948 | 0.916 | 47% | 48% |
| 7 | PPV: Excellent (0.90) | 0.705 | 0.356 | 0.296 | 0.988 | 0.974 | 0.900 | 0.774 | 0.837 | 0.823 | 14% | 13% |
| 8 | PPV: Good (0.80) | 0.478 | 0.545 | 0.486 | 0.959 | 0.934 | 0.800 | 0.685 | 0.876 | 0.859 | 22% | 23% |
| 9 | PPV: Acceptable (0.75) | 0.393 | 0.629 | 0.544 | 0.937 | 0.905 | 0.750 | 0.631 | 0.894 | 0.870 | 26% | 28% |
| 10 | Sensitivity and PPV are balanced | 0.330 | 0.706 | 0.582 | 0.911 | 0.871 | 0.703 | 0.572 | 0.912 | 0.875 | 30% | 31% |

Sensitivity is the proportion of people correctly classified as having problem opioid use by the algorithm, defined as: Number of people identified with chart review to have problem opioid use and correctly classified by the algorithm to have problem opioid use/the number of people identified with chart review to have problem opioid use.
Specificity is the proportion of people correctly classified as not having problem opioid use by the algorithm, defined as: Number of people identified with chart review to not have problem opioid use and correctly classified by the algorithm to not have problem opioid use/the number of people identified with chart review to not have problem opioid use.
§ Positive predictive value is the proportion of people the algorithm classifies as having problem opioid use who have problem opioid use identified by chart review, defined as: Number of people identified with chart review to have problem opioid use and classified by the algorithm to have problem opioid use/the number of people identified to have problem opioid use by the algorithm.
Negative predictive value is the proportion of people the algorithm classifies as not having problem opioid use who do not have problem opioid use identified by chart review, defined as: Number of people identified with chart review to not have problem opioid use and classified by the algorithm to not have problem opioid use/the number of people identified to not have problem opioid use by the algorithm.
¥ This is the unadjusted predicted prevalence, defined as the percent of patients in the training sample predicted to be problem opioid use positive using the corresponding risk score cut point. The unadjusted prevalence of problem opioid use positive patients in the training sample was 36.5% (511/1,400).

The simple ICD-9 algorithm yielded a sensitivity of 0.399, PPV of 0.599, a specificity of 0.922 and a negative predictive value of 0.836 (Figure 1).


Our algorithm to detect clinician-documented signs of problem prescription opioid use based on a rich set of candidate predictors derived from medical claims data performed better than commonly used algorithms based on a simple set of ICD-9 diagnosis codes. However, performance in a cohort of long-term ER/LA opioid recipients was below our minimally acceptable level and not, therefore, suitable for gold standard case identification in epidemiologic investigations. If the balanced sensitivity/PPV version of the algorithm were used to classify patients it would overlook over 40% of actual cases, and 40% of patients classified as having problem use would be wrongly classified. Versions of the algorithm that preserved sensitivity would severely sacrifice PPV and vice-versa.
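The two error rates quoted above follow from the validation-set values in Table 3, row 10 (sensitivity 0.582, PPV 0.572):

```latex
\text{cases missed} = 1 - \text{sensitivity} = 1 - 0.582 = 0.418
\qquad
\text{classified positives that are wrong} = 1 - \text{PPV} = 1 - 0.572 = 0.428
```

That is, roughly 42% of reference-standard cases would be overlooked and roughly 43% of patients classified as having problem use would be false positives.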

Despite its shortcomings for generating gold standard data, the modeling approach used here may be useful for developing clinical screening algorithms, applicable to all recipients of long-term opioid therapy (not just ER/LA recipients), that identify patients at elevated risk of developing problem opioid use51. Such algorithms would use a patient’s EHR data preceding an upcoming encounter to calculate risk as of that encounter (rather than using data before and after ER/LA initiation, as in the present algorithm). A problem opioid use risk score would be calibrated to emphasize specificity (rather than sensitivity), as is common in screening efforts to avoid high false-positive rates52, 53.
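
One way a risk score could be calibrated to emphasize specificity is to choose the lowest cut point that reaches a target specificity in labeled data and then report the sensitivity retained at that cut point. The Python sketch below is illustrative only: the function names, the toy scores and labels, and the 0.80 target are assumptions, not values or procedures from this study.

```python
def specificity_at(scores, labels, cut):
    """Share of reference-negative patients scoring below the cut point."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < cut for s in negatives) / len(negatives)


def sensitivity_at(scores, labels, cut):
    """Share of reference-positive patients scoring at or above the cut point."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= cut for s in positives) / len(positives)


def lowest_cut_for_specificity(scores, labels, target):
    """Return the lowest observed score usable as a cut point that meets
    the target specificity, or None if no observed score does.

    Patients with score >= cut point are classified as positive, so
    specificity never decreases as the cut point rises; the first
    candidate meeting the target is therefore the lowest one.
    """
    for cut in sorted(set(scores)):
        if specificity_at(scores, labels, cut) >= target:
            return cut
    return None


if __name__ == "__main__":
    # Toy data, for illustration only (1 = problem use on chart review).
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
    cut = lowest_cut_for_specificity(scores, labels, target=0.80)
    print("cut point:", cut)
    print("specificity:", specificity_at(scores, labels, cut))
    print("sensitivity:", sensitivity_at(scores, labels, cut))
```

Raising the target specificity moves the cut point upward and lowers sensitivity, which is the trade-off a screening application would need to weigh.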

We can speculate about possible reasons for the limited success of this algorithm. First, though it was not anticipated when this study was planned in 2014, focusing on a prevalent ER/LA user cohort, most of whom had substantial exposure to prescription opioids prior to their study index dates, may have severely complicated the algorithm development task. By not beginning observation at patients’ first exposure to long-term opioid therapy (including immediate-release formulations), the indicators of cause and effect related to problem use may have been confounded, making algorithm training more difficult. It is possible, for example, that clinicians may have transitioned some patients to ER/LA therapy because of concerns about problematic use, a reasonable strategy given reports that ER/LA formulations carry reduced abuse/addiction potential54, 55. Such channeling bias may also have inflated the observed prevalence of problem use.

Second, and also unanticipated when this study was planned, structured EHR/claims data alone may lack the nuance required to accurately identify signs of problem opioid use, a highly complex phenomenon56, 57. To accurately identify this outcome algorithmically, it may be necessary to incorporate richer EHR data, including information from unstructured chart notes, thereby precluding the algorithm’s use in medical claims databases. Previous attempts to identify patients experiencing problem opioid use have yielded varying results7, 58. Multiple screening tools have been developed8, but alternative approaches have sometimes given discordant results59. Distinguishing among subgroups of patients receiving long-term opioid therapy – based on age group, comorbidity profiles, or concurrent use of medications that amplify risks, such as benzodiazepines – rather than attempting to use a single algorithm to identify all patients with problem use may improve algorithm performance. It is possible that more detailed diagnostic coding in the ICD-10 era (which began after our study period) may contain additional useful information.

Limitations of this study should be noted. First, we used professional chart abstractors rather than clinicians to create the reference standard, and some may consider clinician review to be superior. However, inter-rater agreement, the most objective indicator of high-quality abstraction, was very strong in this study and abstraction was guided by a detailed protocol27. Second, while adaptive LASSO is an appropriate method when candidate predictors exceed the number of outcome events, it is possible that other modeling methods, such as neural networks, would have yielded somewhat better results. Third, this work was conducted in a single site; results elsewhere may vary. It is noteworthy that in a companion study of opioid overdose, the performance of an opioid overdose algorithm developed at Kaiser Permanente Northwest, which was very good, was very similar in Optum claims data, Medicaid data for the State of Tennessee, and Kaiser Permanente Washington50.


Our attempt to develop a single automated algorithm for generating gold standard classifications regarding the presence or absence of problem opioid use in a prevalent user cohort of patients receiving long-term ER/LA therapy was unsuccessful. The approach reported here may have utility for developing screening tools to identify patients for whom further clinical evaluation is warranted. Future work should focus on incident long-term opioid recipients (without distinguishing ER/LA from IR) and target subgroups of patients whose clinical course may be more homogeneous and, therefore, more likely to be reflected in structured EHR/claims data.




Declaration of funding

This study was funded by the Opioid PMR Consortium (OPC), which is comprised of companies that hold NDAs of extended-release and long-acting analgesics, working in response to collective post-marketing requirements from the US Food and Drug Administration. The study was designed in collaboration between OPC members and investigators with input from the FDA. Investigators maintained intellectual freedom in terms of publishing final results.

The study is part of a program of 11 post-marketing study requirements being implemented by the OPC. At time of study conduct, the OPC consisted of the following companies: Allergan; Assertio Therapeutics, Inc.; BioDelivery Sciences, Inc.; Collegium Pharmaceutical, Inc.; Daiichi Sankyo, Inc.; Egalet Corporation; Endo Pharmaceuticals, Inc.; Hikma Pharmaceuticals USA Inc.; Janssen Pharmaceuticals, Inc.; Mallinckrodt Inc.; Pernix Therapeutics Holdings, Inc.; Pfizer, Inc.; and Purdue Pharma, LP.

Declaration of financial/other interests

DSC, LAJ, AR, GS, MM, EJ, DJC, KH, SMS, and MVK are employees of Kaiser Permanente Washington. CAG, BH, and SLJ are employees of Kaiser Permanente Northwest. CAG has since retired. PMC and ADG were employees of Purdue Pharma, LP at the time the work was conducted and are currently employees of Johnson & Johnson and Indivior, Inc. respectively. CL and CLE are employees of Optum, Inc. AB is an employee of Amazon, CGG is an employee of Vanderbilt University, and JL is an employee of The Fred Hutchinson Cancer Research Center. Prior to conducting the work described here DSC, AR, DSC, KH, and MVK worked on projects funded by grants to Kaiser Permanente Health Research Institute for research on opioid risks funded by Pfizer, Inc. Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Author contributions

DSC, CAG, BH, SLJ, PMC, ADG, CGG, CL, CLE, SMS, and MVK contributed to the conception or design of the work. LAJ, AR, GS, MM, EJ, DJC, AB, SLJ, CL, JL, and SMS contributed to the acquisition and analysis of data. All authors contributed to the interpretation of data and the drafting or critical revision of the manuscript for important intellectual content. All authors gave final approval of the manuscript and agreement to be accountable for all aspects of the work.

Ethical approval

This observational, retrospective study was approved by the Human Subjects Review Board of Kaiser Permanente Washington and was conducted in accordance with ethical best practices. The study was based on secondary use of historical patient data; it did not entail any patient contact.

Informed consent

This study was conducted under a waiver of informed consent obtained from the Human Subjects Review Board of Kaiser Permanente Washington.



1. Guy GP, Zhang K, Bohm MK, et al. Vital signs: changes in opioid prescribing in the United States, 2006-2015. MMWR Morb Mortal Wkly Rep. 2017;66(26):697–704.

2. Manchikanti L, Fellows B, Ailinani H, et al. Therapeutic use, abuse, and nonmedical use of opioids: a ten-year perspective. Pain Physician. 2010;13(5):401–435.

3. Substance Abuse and Mental Health Services Administration (SAMHSA) [Internet]. Rockville (MD): SAMHSA. Results from the 2013 National Survey on Drug Use and Health: Summary of national findings; 2014 [cited 2017 Nov 1]; Available from:

4. Rudd RA, Seth P, David F, et al. Increases in drug and opioid-involved overdose deaths – United States, 2010-2015. MMWR Morb Mortal Wkly Rep. 2016;65(5051):1445–1452.

5. Hirschfield DJ. Trump declares opioid crisis a ‘health emergency’ but requests no funds. New York (NY): The New York Times Company; 2017.

6. National Academies of Sciences Engineering and Medicine. Pain management and the opioid epidemic: balancing societal and individual benefits and risks of prescription opioid use. Washington (DC): The National Academies Press; 2017.

7. Vowles KE, McEntee ML, Julnes PS, et al. Rates of opioid misuse, abuse, and addiction in chronic pain: a systematic review and data synthesis. Pain. 2015;156(4):569–576.

8. Sehgal N, Manchikanti L, Smith HS. Prescription opioid abuse in chronic pain: a review of opioid abuse predictors and strategies to curb opioid abuse. Pain Physician. 2012;15(3):ES67–ES92.

9. Sullivan MD, Edlund MJ, Fan MY, et al. Risks for possible and probable opioid misuse among recipients of chronic opioid therapy in commercial and medicaid insurance plans: the TROUP Study. Pain. 2010;150(2):332–339.

10. White AG, Birnbaum H, Schiller M. Economic impact of opioid abuse, dependence, and misuse. Am J Pharm Benefits. 2011;3:e59–e70.

11. White AG, Birnbaum HG, Mareva MN, et al. Direct costs of opioid abuse in an insured population in the United States. J Managed Care Special Pharma. 2005;11(6):469–479.

12. White AG, Birnbaum HG, Schiller M, et al. Analytic models to identify patients at risk for prescription opioid abuse. Am J Manag Care. 2009;15:897–906.

13. Rice JB, White AG, Birnbaum HG, et al. A model to identify patients at risk for prescription opioid abuse, dependence, and misuse. Pain Med. 2012;13(9):1162–1173.

14. Shei A, Rice JB, Kirson NY, et al. Sources of prescription opioids among diagnosed opioid abusers. Curr Med Res Opin. 2015;31(4):779–784.

15. Shei A, Rice JB, Kirson NY, et al. Characteristics of high-cost patients diagnosed with opioid abuse. J Managed Care Special Pharma. 2015;21(10):902–912.

16. Rowe C, Vittinghoff E, Santos GM, et al. Performance measures of diagnostic codes for detecting opioid overdose in the emergency department. Acad Emerg Med. 2017;24(4):475–483.

17. Carrell DS, Cronkite D, Palmer RE, et al. Using natural language processing to identify problem usage of prescription opioids. Int J Med Inform. 2015;84(12):1057–1064.

18. United States Food and Drug Administration [Internet]. Silver Spring (MD): FDA. New safety measures announced for extended-release and long-acting opioids. Updated 5 November 17; 2017 [cited 2017 Nov 1]; Available from:

19. United States Food and Drug Administration [Internet]. Silver Spring (MD): FDA. Labeling supplement and PMR required; 2017 [cited 2017 Nov 1]; Available from:

20. Coplan PM, Cepeda MS, Petronis KR, et al. Postmarketing studies program to assess the risks and benefits of long-term use of extended-release/long-acting opioids among chronic pain patients. Postgrad Med. 2020;132(1):44–51.

21. Federal Register [Internet]. Washington (DC): Federal Register. Postmarketing requirements for the class-wide extended-release/long-acting opioid analgesics; Public Meeting; Request for Comments; 2014 [cited 2017 Nov 1]; Available from:

22. ClinicalTrials.gov [Internet]. Bethesda (MD): U.S. National Library of Medicine. An observational study to develop algorithms for identifying opioid abuse and addiction based on admin claims data; 2017 [cited 2017 Nov 1]: [ClinicalTrials.gov identifier: NCT02667262]. Available from:

23. ClinicalTrials.gov [Internet]. Bethesda (MD): U.S. National Library of Medicine. Incidence and predictors of opioid overdose and death in ER/LA opioid users as measured by diagnoses and death records; 2017 [cited 2017 Nov 1]; Available from:

24. Yu S, Ma Y, Gronsbell J, et al. Enabling phenotypic big data with PheNorm. J Am Med Inform Assoc. 2018;25(1):54–60.

25. Kaur H, Sohn S, Wi CI, et al. Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC Pulm Med. 2018;18(1):34.

26. Martin S, Wagner J, Lupulescu-Mann N, et al. Comparison of EHR-based diagnosis documentation locations to a gold standard for risk stratification in patients with multiple chronic conditions. Appl Clin Inform. 2017;08(03):794–809.

27. Carrell DS, Albertson-Junkans L, Ramaprasan A, et al. Problem opioid use among patients receiving long-term opioid therapy established by manual chart review. 2020. (Under Review).

28. Epic Systems Corporation. Verona (WI): Epic; 1979 [cited 2014 Oct 6]; Available from:

29. Curtis LH, Brown J, Platt R. Four health data networks illustrate the potential for a shared national multipurpose big-data network. Health Aff (Millwood). 2014;33(7):1178–1186.

30. United States Food and Drug Administration [Internet]. Silver Spring (MD): FDA. FDA’s sentinel initiative: transforming how we monitor the safety of FDA-regulated products; 2019 [cited 2012 Oct 29]; Available from:

31. United States Food and Drug Administration [Internet]. Silver Spring (MD): FDA. Sentinel distributed database and common data model; 2017 [cited 2017 Jul 21]; Available from:

32. Platt R, Wilson M, Chan KA, et al. The new Sentinel Network–improving the evidence of medical-product safety. N Engl J Med. 2009;361(7):645–647.

33. Edlund MJ, Steffick D, Hudson T, et al. Risk factors for clinically recognized opioid abuse and dependence among veterans using opioids for chronic non-cancer pain. Pain. 2007;129:355–362.

34. Chou R, Fanciullo GJ, Fine PG, et al. Opioids for chronic noncancer pain: prediction and identification of aberrant drug-related behaviors: a review of the evidence for an American Pain Society and American Academy of Pain Medicine clinical practice guideline. J Pain. 2009;10(2):131–146.

35. Passik SD, Kirsh KL, Donaghy KB, et al. Pain and aberrant drug-related behaviors in medically ill patients with and without histories of substance abuse. Clin J Pain. 2006;22:173–181.

36. Chou R, Turner JA, Devine EB, et al. The effectiveness and risks of long-term opioid therapy for chronic pain: a systematic review for a National Institutes of Health Pathways to Prevention Workshop. Ann Intern Med. 2015;162(4):276–286.

37. Katz C, El-Gabalawy R, Keyes KM, et al. Risk factors for incident nonmedical prescription opioid use and abuse and dependence: results from a longitudinal nationally representative sample. Drug Alcohol Depend. 2013;132(1–2):107–113.

38. Saha TD, Kerridge BT, Goldstein RB, et al. Nonmedical prescription opioid use and DSM-5 nonmedical prescription opioid use disorder in the United States. J Clin Psychiatry. 2016;77(06):772–780.

39. Goldner EM, Lusted A, Roerecke M, et al. Prevalence of Axis-1 psychiatric (with focus on depression and anxiety) disorder and symptomatology among non-medical prescription opioid users in substance use treatment: systematic review and meta-analyses. Addict Behav. 2014;39(3):520–531.

40. Amari E, Rehm J, Goldner E, et al. Nonmedical prescription opioid use and mental health and pain comorbidities: a narrative review. Can J Psychiatry. 2011;56(8):495–502.

41. Tibshirani RJ. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58:14.

42. Hastie T, Tibshirani RJ, Friedman J. The elements of statistical learning. New York (NY): Springer; 2009.

43. Ulbricht J. lqa: Penalized Likelihood Inference for GLMs, version 1.0-3. 2012 [updated 2012 Oct 29; cited 2017 Jul 21]; Available from:

44. Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006;101:12.

45. Kim JK, Skinner CJ. Weighting in survey analysis under informative sampling. Biometrika. 2013;100(2):385–398.

46. Fuller W. Sampling statistics. Hoboken (NJ): Wiley; 2009.

47. Chambers RL, Skinner CJ. Analysis of survey data. Chichester (UK): Wiley; 2003.

48. Vertelney H, Yarger J, Tilley L, et al. Health insurance churning among young adults in California: by the numbers. San Francisco (CA): Phillip R. Lee Institute for Health Policy Studies; 2017 [cited 2017 November 1]; Available from:

49. Callahan ST, Cooper WO. Continuity of health insurance coverage among young adults with disabilities. Pediatrics. 2007;119(6):1175–1180.

50. Green CA, Perrin NA, Hazlehurst B, et al. Identifying and classifying opioid-related overdoses: a validation study. Pharmacoepidemiol Drug Saf. 2019;28(8):1127–1137.

51. Canan C, Polinski JM, Alexander GC, et al. Automatable algorithms to identify nonmedical opioid use using electronic data: a systematic review. J Am Med Inform Assoc. 2017;24(6):1204–1210.

52. Maxim LD, Niebo R, Utell MJ. Screening tests: a review with examples. Inhal Toxicol. 2014;26(13):811–828.

53. Smith DC, Bennett KM, Dennis ML, et al. Sensitivity and specificity of the gain short-screener for predicting substance use disorders in a large national sample of emerging adults. Addict Behav. 2017;68:14–17.

54. Cicero TJ, Ellis MS, Kasper ZA. Relative preferences in the abuse of immediate-release versus extended-release opioids in a sample of treatment-seeking opioid abusers. Pharmacoepidemiol Drug Saf. 2017;26(1):56–62.

55. Juurlink DN, Dhalla IA. Dependence and addiction during chronic opioid therapy. J Med Toxicol. 2012;8(4):393–399.

56. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5). 5th (revised) ed. Arlington (VA): American Psychiatric Publishing; 2013.

57. Smith SM, Dart RC, Katz NP, et al. Classification and definition of misuse, abuse, and related events in clinical trials: ACTTION systematic review and recommendations. Pain. 2013;154(11):2287–2296.

58. Vowles KE, McEntee ML, Siyahhan Julnes P, et al. On the importance of clear comparisons and a methodologically rigorous empirical literature in evaluating opioid use in chronic pain: a response to Scholten and Henningfield. Pain. 2015;156(8):1577–1578.

59. Nikulina V, Guarino H, Acosta MC, et al. Patient vs provider reports of aberrant medication-taking behavior among opioid-treated patients with chronic pain who report misusing opioid medication. Pain. 2016;157(8):1791–1798.