A Machine Learning Algorithm Outperforms Traditional Multiple Regression to Predict Risk of Unplanned Overnight Stay Following Outpatient Medial Patellofemoral Ligament Reconstruction
Purpose
To determine whether conventional logistic regression or machine learning algorithms were more precise in identifying the risk factors for unplanned overnight admission after medial patellofemoral ligament (MPFL) reconstruction.
Methods
A retrospective review of the prospectively collected National Surgical Quality Improvement Program database was performed to identify patients who underwent outpatient MPFL reconstruction from 2006–2018. Patients admitted overnight were identified as those with length of stay of 1 or more days. Models were generated using random forest, extreme gradient boosting, adaptive boosting, or elastic net penalized logistic regression, and an additional model was produced as a weighted ensemble of the 4 final algorithms. The predictive capacity of these models was compared to that of logistic regression.
Results
Of the 1307 patients identified, 221 (16.9%) required at least one overnight stay after MPFL reconstruction. Multivariate logistic regression found the following variables to be predictors of inpatient admission: age (odds ratio [OR] = 1.03 [95% confidence interval {CI} 1.02-1.04]; P < .001), spinal anesthesia (OR = 3.42 [95% CI 1.98-6.08]; P < .001), American Society of Anesthesiologists (ASA) class 3/4 (OR = 1.96 [95% CI 1.25-3.06]; P < .001), history of chronic obstructive pulmonary disease (COPD) (OR = 6.44 [95% CI 1.58-26.17]; P = .02), and body mass index (BMI) (OR = 1.03 [95% CI 1.01-1.05]; P < .001). The ensemble model achieved the best performance based on discrimination assessed via internal validation (area under the curve = 0.722). The variables determined most important by the ensemble model were increasing BMI, increasing age, ASA class, anesthesia, smoking, hypertension, lateral release, and history of COPD.
Conclusions
An internally validated machine learning algorithm outperformed logistic regression modeling in predicting the need for unplanned overnight hospitalization after MPFL reconstruction. In this model, the most significant risk factors for admission were age, BMI, ASA class, smoking status, hypertension, lateral release, and history of COPD. This tool can be deployed to augment provider assessment to identify high-risk candidates and appropriately set postoperative expectations for patients.
Clinical Relevance
Identifying and mitigating patient risk factors to prevent adverse surgical outcomes and hospitalizations is one of our primary goals. There may be a key role for machine learning algorithms to help successfully and efficiently risk stratify patients to decrease costs, appropriately set postoperative expectations, and increase the quality of delivered care.
The medial patellofemoral ligament (MPFL), which prevents excessive lateral patellar translation, is the primary anatomic restraint injured during a dislocation of the patella.
MPFL reconstruction is a safe and reliable surgical option for the treatment of patellar instability and has been described as the cornerstone for patients in whom nonoperative medical management failed.
MPFL reconstructions are primarily performed on an outpatient basis in the hospital or ambulatory surgical center, which allows for improved patient satisfaction, decreased operation time, and decreased procedure cost.
In some cases, overnight admission (planned or unplanned) after elective outpatient surgery is necessary, which increases cost and the risk of adverse events and nosocomial infections.
Despite the relative importance of identifying patients who may require overnight admission after planned outpatient procedures, there is a paucity of literature analyzing patient factors that underlie this surgical precaution.
Machine learning algorithms improve their performance automatically through experience and the use of data. They build a model based on sample data, known as training data, to make predictions and decisions about future events. Machine learning models have been shown to have a higher predictive ability compared to other statistical tests. The purpose of this study was to determine whether conventional logistic regression or machine learning algorithms were more precise in identifying the risk factors for unplanned overnight admission after MPFL reconstruction. Our hypothesis was that both logistic regression and a machine learning algorithm would identify significant predictors for unplanned overnight admission in patients undergoing outpatient MPFL reconstruction and that our machine learning model would have superior capabilities in identifying risk factors compared to conventional logistic regression.
Materials and Methods
Database
Data were collected using the ACS National Surgical Quality Improvement Program (NSQIP) database, which was queried to identify all patients who underwent an outpatient elective medial patellofemoral ligament reconstruction from 2006 to 2018. This study used the ACS-NSQIP database because it is a nationally represented surgical database that prospectively collects preoperative and postoperative patient variables, as well as complication rates, readmission information, and unplanned surgeries within 30 days of the original procedure.
The current procedural terminology code used for identifying the subset of patients who underwent a medial patellofemoral ligament reconstruction was CPT-27428. Patients undergoing concomitant tibial tubercle osteotomy were excluded because this population is commonly admitted overnight for pain control and compartment checks. Patients admitted overnight after surgery were identified as those with total length of stay (LOS) ≥ 1 day, whereas those with LOS < 1 day were defined as same-day discharge. In essence, patients whose surgery date and discharge date were different were considered to have an overnight stay even if the total number of hours between events was less than 24.
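The date-based definition of an overnight stay above can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical and are not NSQIP field names; the only assumption carried over from the text is that a later discharge date counts as LOS ≥ 1 day, regardless of elapsed hours.

```python
from datetime import date


def is_overnight_stay(surgery_date: date, discharge_date: date) -> bool:
    """Return True when discharge falls on a later calendar day than surgery.

    Mirrors the text's rule: differing surgery and discharge dates count as
    an overnight stay (LOS >= 1 day) even if fewer than 24 hours elapsed.
    """
    return (discharge_date - surgery_date).days >= 1
```

For example, surgery on one calendar day with discharge the next morning is flagged as an overnight stay, whereas discharge on the day of surgery is treated as same-day discharge.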
Candidate Covariates
The following variables were considered for utilization in logistic regression and the machine learning modeling: age, sex, American Society of Anesthesiologists’ (ASA) Physical Status Classification, body mass index (BMI), functional status, level of dyspnea, admission source, anesthesia type, operative time, diabetes mellitus, congestive heart failure, chronic obstructive pulmonary disease (COPD), smoking history, transfusion, hypertension, long-term steroid use, and preoperative laboratory values (creatinine, albumin, leukocyte count, platelets, and hematocrit), as well as the following concomitant procedures: lateral release, loose body removal, autograft harvest, synovectomy, and soft tissue flap reconstruction. Because many patients who undergo tibial tubercle osteotomy are intentionally admitted to the hospital after surgery for observation, patients with concurrent tibial tubercle osteotomy at the time of MPFL reconstruction were excluded from this study. Preoperative variables with missing values in >30% of patients were excluded from analysis, whereas variables with <30% missing information were imputed using the missForest imputation method.
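The missing-data rule above can be sketched as a simple screening step. This is an illustration only: the function name and the representation of missing values as `None` are assumptions, the handling of a variable with exactly 30% missing values is not specified in the text (it is kept here), and the imputation step itself is not shown.

```python
from typing import Dict, List, Optional, Tuple


def split_by_missingness(
    columns: Dict[str, List[Optional[float]]],
    threshold: float = 0.30,
) -> Tuple[List[str], List[str]]:
    """Split variable names into (kept, dropped) by fraction of missing values.

    Variables whose missing fraction exceeds `threshold` are dropped; the rest
    are kept for later imputation. Missing values are represented as None
    (an assumption of this sketch).
    """
    kept: List[str] = []
    dropped: List[str] = []
    for name, values in columns.items():
        # An empty column is treated as entirely missing.
        frac = 1.0 if not values else sum(v is None for v in values) / len(values)
        if frac > threshold:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped
```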
Logistic Regression
Multiple variable logistic regression was used to calculate the adjusted odds ratios with 95% confidence intervals for unplanned overnight admission in patients with an MPFL reconstruction. The independent variables included in this model were the demographic characteristics, preoperative laboratory values and comorbidities that were shown to be statistically different between patients with an unplanned admission and patients who were not admitted (Tables 1 and 2).
Table 1. Baseline Characteristics of Study Population (n = 1307)

| Variable | Value | Missing |
| --- | --- | --- |
| Demographics and intraoperative variables | | |
| Age, median (IQR) | 28 (22-38) | |
| Sex | | |
| Female | 723 (55.3%) | |
| Male | 584 (44.7%) | |
| ASA Class | | |
| 1-2 Mild disturbance | 1164 (89.1%) | |
| 3-4 Severe disturbance | 143 (10.9%) | |
| Body mass index | 29 (25-34) | 2 (0.15%) |
| Dependent functional status | 29 (2.0%) | |
| Dyspnea | 20 (1.4%) | |
| Anesthesia | | 9 (0.62%) |
| General | 1248 (96.2%) | |
| Spinal | 59 (3.8%) | |
| Operative time, median (IQR) | 80 (56-109) | |
| Concomitant lateral release | 104 (7.9%) | |
| Comorbidities | | |
| Diabetes | 50 (3.8%) | |
| Smoking | 254 (19.4%) | |
| Chronic obstructive pulmonary disease | 11 (0.8%) | |
| Medicated hypertension | 146 (11.2%) | |
| Long-term steroid use | 12 (0.9%) | 2 (0.15%) |
| Overnight admission | | |
| Yes | 221 (16.9%) | |
| No | 1086 (83.1%) | |

ASA, American Society of Anesthesiologists; IQR, interquartile range.
Statistical analysis was performed using R 4.0.4 (R Foundation for Statistical Computing, Vienna, Austria). Statistical significance for all tests was defined as P < .05.
Outcome and Analysis
The main outcome of interest was overnight admission, defined as having an LOS ≥ 1 day. Feature selection was used to determine preoperative variables that had a significant impact on overnight admission. After feature selection, modeling was performed using the determined preoperative variables with each of the following candidate machine learning algorithms: extreme gradient boosting, adaptive boosting, random forest, and linear discriminant, which were chosen based on their usage in prior studies, as well as an ensemble algorithm built from a combination of all four models. Ensemble methods are a type of meta-algorithm that incorporates the learning techniques of each individual model into a unique predictive model.
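One simple way to combine base learners is a weighted average of their predicted probabilities. The sketch below illustrates that idea only; the text does not describe how the study's ensemble was actually weighted or fitted, and the function name, inputs, and weights are hypothetical.

```python
from typing import Sequence


def ensemble_probability(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-model predicted probabilities for one patient.

    `probs` holds one probability per base model; `weights` holds the
    (hypothetical) non-negative weight given to each model.
    """
    if not probs or len(probs) != len(weights):
        raise ValueError("probs and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total
```

With equal weights this reduces to a plain mean of the base-model probabilities; unequal weights let better-performing models contribute more to the final prediction.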
Recursive feature elimination was subsequently performed to remove individual predictors that did not accurately fit the model. The model was then rebuilt with the most precise set of predictors.
Models were validated via 0.632 bootstrapping with 1000 resample datasets because of this technique’s ability to optimize evaluation of both model bias and variance compared to traditional train-test splits in sample sizes of <10,000.
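The 0.632 bootstrap combines the apparent (resubstitution) performance with the average out-of-bag performance using weights of 0.368 and 0.632. The sketch below shows that general scheme under stated assumptions: `fit` and `score` are hypothetical user-supplied callables, and applying the weighting to a performance metric such as AUROC rather than to an error rate is an assumption of this illustration, not a detail confirmed by the text.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def combine_632(apparent: float, out_of_bag: float) -> float:
    """0.632 bootstrap estimate from apparent and out-of-bag performance."""
    return 0.368 * apparent + 0.632 * out_of_bag


def bootstrap_632(
    data: Sequence,
    fit: Callable[[List], object],
    score: Callable[[object, List], float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate performance with the 0.632 bootstrap.

    Each resample draws len(data) rows with replacement, fits a model on the
    resample, and scores it on the rows that were not drawn (out-of-bag).
    """
    rng = random.Random(seed)
    n = len(data)
    apparent = score(fit(list(data)), list(data))
    oob_scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [data[i] for i in range(n) if i not in drawn]
        if not oob:  # rare: every row was drawn, so nothing is left to test on
            continue
        model = fit([data[i] for i in idx])
        oob_scores.append(score(model, oob))
    if not oob_scores:
        raise ValueError("no resample produced an out-of-bag set")
    return combine_632(apparent, mean(oob_scores))
```

Because each resample leaves out roughly 36.8% of the rows on average, the out-of-bag score tends to be pessimistic and the apparent score optimistic; the weighted combination is intended to balance the two.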
Model evaluations consisted of reiterative partitions of the complete dataset into train and test sets. For each combination of train and test set, the model was trained on the train set using tenfold cross-validation. The performance of this model was then evaluated on the respective test set. The ideal model was chosen based on area under the receiver operating characteristic curve (AUROC). Models were compared by discrimination, calibration, and Brier score values.
Discriminative power was assessed via the AUROC. Models that assign the correct label for every output have an AUROC of 1, whereas completely random predictions have an AUROC of 0.5 (Fig 1). Calibration of the model’s predicted probabilities as a function of observed frequencies within the test population is summarized in a calibration plot. Finally, the Brier score was assessed for each candidate model, with smaller values indicating better performance.
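The two scalar metrics described above can be computed directly from observed outcomes and predicted probabilities. The following is a minimal sketch with hypothetical function names; it uses the pairwise (rank-based) definition of AUROC and the mean squared difference for the Brier score, and it is not the code used in the study.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random negative.

    Ties between a positive and a negative case count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A model that ranks every admitted patient above every non-admitted patient reaches an AUROC of 1, while a model that gives all patients the same probability scores 0.5.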
Fig 1. Discrimination and calibration of the ensemble model. An AUC of 1 indicates a model that assigned the correct label for every output. An AUC of 0.5 indicates a model with completely random predictions. (AUC, area under the curve.)
Individual explanations for model behavior were provided using local-interpretable model-agnostic explanations. Decision curve analysis was used to determine the benefit of implementing the predictive algorithm in a real setting. The curve plots net benefit against the predicted probabilities of the outcome of interest, in this case overnight admission, and provides the cost-benefit ratio for every value of the predicted probability. Decision curves for changing management for no patients or all patients are plotted for comparison. In addition, a randomized permutation test was used to compare performances across each model.
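Net benefit in decision curve analysis is commonly computed as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold probability. The sketch below uses that standard formula with hypothetical function names; it is an illustration and not the study's implementation.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of the default strategy of changing management for everyone."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

The "change management for no patients" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value.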
Results
Variable Breakdown
A total of 1307 patients who underwent planned outpatient medial patellofemoral ligament reconstruction were included following eligibility assessment. The full collection of variables analyzed for feature selection is provided in Table 1. Within the cohort, 723 (55.3%) of the patients were female and the median age was 28 (interquartile range 22-38). The most common comorbidities in our study population were hypertension (n = 146 [11.2%]) and a positive smoking history (n = 254 [19.4%]).
A total of 221 patients (16.9%) required at least one overnight stay following elective MPFL reconstruction. After recursive feature elimination, the following features were identified to be important for the construction of the model: BMI, age, ASA classification, smoking history, history of COPD, anesthesia type, concomitant lateral release, and medicated hypertension (Fig 2A).
Fig 2. (A) The relative importance of each preoperative variable identified in the ensemble model, which was the most predictive model of the machine learning algorithm. (B) Decision curve analysis whereby net benefit is plotted against risk preference. The downward sloping gray line represents intervention for all, which in this context represents changing management (requiring all patients to have an overnight stay after MPFL reconstruction). The downward sloping blue line represents the net benefit realized when choosing who stays overnight based on conventional logistic regression, and the downward sloping red line represents the net benefit realized when choosing who stays overnight based on the ensemble model. There is a theoretical horizontal line at zero representing “intervention for none” that represents no patients staying overnight after MPFL reconstruction. As can be seen, the ensemble model (red line) has more net benefit than the conventional logistic regression (blue line) at all ranges except for very low risk patients. (MPFL, medial patellofemoral ligament.)
On multivariate logistic regression, increasing age (OR = 1.03 [95% CI 1.02-1.04]; P < .001), spinal anesthesia (OR = 3.42 [95% CI 1.92-6.08]; P < .001), ASA class III/IV (OR = 1.96 [95% CI 1.25-3.06]; P < .001), history of COPD (OR = 6.44 [95% CI 1.58-26.17]; P = .02) and increasing BMI (OR = 1.03 [95% CI 1.01-1.05]; P < .001) were shown to be significant contributors to an unexpected admission after elective MPFL reconstruction (Table 2).
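Odds ratios for continuous predictors such as age and BMI are reported per one-unit increase, so their effect over a larger difference compounds multiplicatively. The short sketch below illustrates this arithmetic with a hypothetical helper; it assumes the log-linear relationship implied by a logistic model and ignores the confidence intervals.

```python
import math


def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio implied for a difference of `units` in a continuous predictor.

    In a logistic model the log-odds change linearly, so OR_k = OR_1 ** k.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.exp(units * math.log(or_per_unit))
```

Under these assumptions, an OR of 1.03 per year of age corresponds to an OR of roughly 1.34 for a 10-year difference in age.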
Model Performance
After training and ensemble learning, model performance was compared. Discrimination was assessed via AUROC and bootstrapping was used to internally validate each model. The ensemble model achieved the best performance on generating predictions, with an AUC of 0.722, a calibration intercept of 0.006, a calibration slope of 0.968, and a Brier score of 0.116 (Table 3). In comparison, the generalized linear model (logistic regression) performed worse on generating predictions, with an AUC of 0.69, a calibration intercept of 0.063, a calibration slope of 0.649, and a Brier score of 0.142.
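Calibration intercept and slope are often obtained by regressing the observed outcome on the logit of the predicted probability; an intercept near 0 and a slope near 1 indicate good calibration. The sketch below fits both parameters jointly with Newton's method. It is an assumption-laden illustration: the function name is hypothetical, some authors instead estimate the intercept with the slope fixed at 1, and the text does not state which convention the study used.

```python
import math
from typing import Sequence, Tuple


def calibration_intercept_slope(
    y_true: Sequence[int], y_prob: Sequence[float], n_iter: int = 50
) -> Tuple[float, float]:
    """Fit outcome ~ intercept + slope * logit(predicted probability).

    Predicted probabilities must lie strictly between 0 and 1.
    """
    x = [math.log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        p_hat = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(y - p for y, p in zip(y_true, p_hat))
        g1 = sum((y - p) * xi for y, p, xi in zip(y_true, p_hat, x))
        # Entries of the 2x2 information matrix X'WX.
        w = [p * (1.0 - p) for p in p_hat]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b
```

A slope below 1, such as the 0.649 reported for logistic regression above, suggests predictions that are too extreme: high predicted risks are too high and low predicted risks are too low.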
Table 3. Model Assessment on Internal Validation Using 0.632 Bootstrapping With 1000 Resampled Datasets (n = 1451)

| Model | Area under the curve (apparent) | Area under the curve (internal validation) | Calibration slope | Calibration intercept | Brier score |
| --- | --- | --- | --- | --- | --- |
| Elastic net | 0.687 (0.651-0.722) | 0.649 (0.647-0.651) | 0.967 (0.956-0.978) | 0.006 (0.004-0.008) | 0.14 (0.127-0.152) |
| Random forest | 0.966 (0.956-0.97) | 0.710 (0.709-0.732) | 0.969 (0.964-0.975) | 0.006 (0.004-0.007) | 0.121 (0.11-0.133) |
| XGBoost | 0.995 (0.994-0.997) | 0.690 (0.687-0.699) | 0.969 (0.963-0.975) | 0.006 (0.004-0.007) | 0.126 (0.113-0.139) |
| SVM | 0.763 (0.761-0.764) | 0.633 (0.641-0.635) | 0.963 (0.951-0.974) | 0.007 (0.004-0.009) | 0.142 (0.129-0.155) |
| Neural Network | 0.692 (0.69-0.693) | 0.629 (0.627-0.631) | 0.987 (0.975-0.999) | 0.002 (0-0.005) | 0.142 (0.13-0.155) |
| Ensemble | 0.801 (0.8-0.802) | 0.722 (0.707-0.764) | 0.968 (0.965-0.971) | 0.006 (0.005-0.007) | 0.116 (0.104-0.128) |

GLM, generalized linear model; SVM, support vector machine; XGBoost, extreme gradient boosting.
The ensemble model was compared against conventional logistic regression, a simplified version of the ensemble model, as well as 2 default strategies using decision curve analysis (Fig 2B). The simplified model was an ensemble model fitted using preoperative use as the primary predictor, whereas the 2 default models were the scenarios of changing management for all or no patients. On decision curve analysis, the complete ensemble model yielded a more significant increase in net benefit compared to the other models.
Explanations
Explanations accompanying the predicted probability of the outcome of interest are generated on an individual basis. An example from a theoretical patient is provided in Fig 3. This patient was given a probability of 0.39 for plausible overnight admission following elective MPFL reconstruction. Features that supported this analysis may be found in Fig 3. The final model was subsequently incorporated into a web-based application for patient education and demonstration purposes only. Partial dependence curves comparing unplanned admission risk to 2 main continuous variables (age and BMI) were included to display the increasing risk when associated with increasing BMI and bimodal distribution of age when compared to unplanned admission risk (Fig 4).
Fig 3. Example of an individual patient-level explanation for the ensemble algorithm predictions. This patient had a 39% individual probability of inpatient hospitalization; features that supported this included ASA class >2, BMI > 28, and a history of smoking. Features that contradicted this prediction in the patient’s history included general anesthesia, no concomitant ligamentous procedures, no history of COPD, young age, and no history of medicated hypertension. (BMI, body mass index; COPD, chronic obstructive pulmonary disease.)
Fig 4. Partial dependence curves demonstrating dependence of inpatient admission risk on the range of values for the two continuous variables included in the model. Age demonstrates bimodal dependence with increased risk at less than 25 or greater than 40 years, while risk of admission increases linearly with BMI at values >30. The x-axis represents age range and the y-axis represents level of risk. (BMI, body mass index.)
Discussion
The ensemble model demonstrated the best performance when compared with the other candidate machine learning algorithms. Additionally, feature elimination identified patient risk factors that increased the likelihood of overnight admission following outpatient MPFL reconstruction. The most important variables were increased BMI, increased age, ASA classification, and mode of anesthetic administration. Smoking status, history of COPD, history of hypertension, and lateral release were also found to be important, although less so. Finally, the algorithm was integrated into an open-access web application that can be used to aid providers in identifying patients at an increased risk of complications, decreasing costs associated with the surgical procedure, and optimizing patient satisfaction.
Risk factors associated with poor short-term clinical outcomes in patients undergoing an MPFL reconstruction are well documented in the literature.
collected preoperative and postoperative demographic and clinical data, as well as the Banff Patellofemoral Instability Instrument (BPII) scores at 12 and 24 months to determine whether age at the time of surgery influenced patient-reported quality of life and clinical outcomes after MPFL reconstruction. It was observed that age at the time of surgery was correlated with postoperative BPII scores, with lower BPII scores apparent for each 10-year increase in age at the time of MPFL reconstruction. Enderlein et al.
also identified age greater than 30 and obesity as markers of poor subjective outcomes.
In our analysis, we found concomitant lateral retinacular release to be an isolated risk factor for overnight admission, which was not observed when analyzing traditional logistic regression. Although a paucity of evidence exists on this relationship, previous studies have commented on the short-term functional outcomes of this additional procedure. A systematic review done by Migliorini et al.
found that there were no differences in terms of range of motion, positivity to apprehension test, rate of postoperative complications, re-dislocations, and revision surgeries between patients with an MPFL reconstruction and an MPFL reconstruction with concomitant lateral retinacular release. Malatray et al.,
in their analysis, found there to be no significant differences in subjective IKDC scores or patellar tilt based on the addition of an arthroscopic lateral release to an MPFL reconstruction in patients with recurrent patellar dislocation. Current literature seems to conclude that there is no additional functional benefit to lateral retinacular release; however, it is not clear whether patients who received a lateral release had a more severe degree of patellar instability than patients who did not and actually did benefit functionally. Our analysis suggests that lateral release is a risk factor for overnight admission; however, other factors could potentially be confounding this observation, such as extra pain or increased bleeding due to the additional procedure, which could have been the primary reason why these patients were admitted overnight. It is also plausible that, due to the extra procedure, these patients required drain placement, which could have lengthened the procedure and led to an overnight admission. As a result, the potential benefits or risks of lateral retinacular release and lateral retinacular lengthening are still a topic of discussion, and further research should be done to clarify their effects in patients undergoing an MPFL reconstruction.
The recursive feature elimination also identified smoking and history of COPD as important contributors to the model. While previous investigations into MPFL reconstruction have not examined these correlations, the effects of tobacco smoke on perioperative and postoperative outcomes in patients undergoing orthopaedic surgical procedures are well documented. Teng et al.,
in their meta-analysis, found that patients who smoked and underwent a total hip arthroplasty were at a significantly increased risk of aseptic loosening of hip prosthesis, deep infections, and all-cause revisions compared to patients who did not smoke. Trivedi et al.,
in their review of the ACS-NSQIP database, found that the main risk factors for developing adverse events in Black patients undergoing a total knee arthroplasty were tobacco smoking, ASA score > 2, congestive heart failure, COPD and chronic kidney disease. Because these comorbidities are relatively common, it is important to risk stratify to determine which patients may require extra accommodations or closer follow-up.
As the preferred modality of health care delivery for many elective surgical procedures shifts from inpatient hospital settings to outpatient settings, it is becoming increasingly imperative to effectively and efficaciously risk stratify patients. As we turn our focus toward value-based care, identifying and mitigating patient risk factors to prevent adverse surgical outcomes and hospitalizations should be one of our primary goals. To aid this goal, there may be a key role for machine learning algorithms to help successfully and efficiently risk stratify patients to decrease costs and increase the quality of delivered care. Through careful implementation, external validation, and real-time learning on retrospectively collected patient data, the algorithm developed herein can aid orthopaedic surgeons in identifying high-risk patients, mobilizing appropriate resources, and optimizing both clinical and economic outcomes.
Limitations
This study has several limitations that should be considered when interpreting the results. The major limitation is that the NSQIP database includes surgical cases from hospital networks. For same-day procedures, ambulatory surgical centers remain one of the main locations where these types of operations are performed.
As a result, there exists a potential selection bias that may ignore differences between the population receiving an MPFL reconstruction at an ambulatory surgical center and the population reported in our analysis. This limitation was addressed by only including patients with an LOS < 1 day as coded in the NSQIP database. Additionally, previous studies have shown that there is not a major difference in cohort dynamics or risk profiles between patients receiving a same-day procedure at a hospital outpatient versus an ambulatory surgical center.
Second, the NSQIP database relies primarily on a proper and standardized method of data collection and documentation. Because NSQIP collects data from a variety of surgical procedures, the variables included in the database are broad, which hinders complete evaluation of individual surgical procedures. However, NSQIP remains the gold standard as data are documented prospectively by trained clinical staff and the heterogeneity of reported data allows for analyses using this database to be generalizable to different subpopulations. In addition, because NSQIP primarily includes deidentified data, it is not possible to determine whether there were factors specific to individual hospitals that influenced hospital admission rates. Finally, the algorithm’s performance for prediction is dependent on the training data used, and deviations in input from the training values may result in the model’s inability to accurately predict outcomes.
Conclusions
An internally validated machine learning algorithm outperformed logistic regression modeling in predicting the need for unplanned overnight hospitalization after MPFL reconstruction. In this model, the most significant risk factors for admission were age, BMI, ASA class, smoking status, hypertension, lateral release, and history of COPD. This tool can be deployed to augment provider assessment to identify high-risk candidates and appropriately set postoperative expectations for patients.
The authors report that they have no conflicts of interest in the authorship and publication of this article. Full ICMJE author disclosure forms are available for this article online, as supplementary material.