Original Article | Volume 5, Issue 1, e11-e19, February 2023

Quantifying Surgeon Intuition Using a Judgment Analysis Model: Surgeon Accuracy of Predicting Patient-Reported Outcomes in Patients Undergoing Hip Arthroscopy for Femoroacetabular Impingement Is Moderate at Best

Open Access | Published: December 09, 2022 | DOI: https://doi.org/10.1016/j.asmr.2022.09.010

      Purpose

      To quantify surgeon intuition, determine whether a surgeon’s prediction of outcomes after hip arthroscopy correlates with actual patient-reported outcomes (PRO), and identify differences in clinical judgment between expert and novice examiners.

      Methods

      This prospective, longitudinal study was conducted at an academic medical center on adults undergoing primary hip arthroscopy for treatment of femoroacetabular impingement. A Surgeon Intuition and Prediction (SIP) score was completed preoperatively by an attending surgeon (expert) and a physician assistant (novice). Baseline and postoperative outcome measures included legacy hip scores (e.g., Modified Harris Hip Score) and Patient-Reported Outcomes Measurement Information System tools. Mean differences were assessed using t-tests. Generalized estimating equations assessed longitudinal changes. Pearson correlation coefficients (r) evaluated associations between SIP score and PRO scores.

      Results

      Data from 98 patients (mean age 36 years, 68% female) with complete data sets at 12-month follow-up were analyzed. Weak-to-moderate correlations were seen between SIP score and PRO scores (r = 0.36 to r = 0.53) for pain, activity, and physical function. Significant improvements were seen in all primary outcome measures at 6 and 12 months postoperatively compared with baseline scores (P < .05), with approximately 50% to 80% of patients achieving the minimum clinically important difference and patient acceptable symptomatic state thresholds postoperatively.

      Conclusions

      An experienced, high-volume hip arthroscopist had only weak-to-moderate ability to intuitively predict PRO. Surgical intuition and judgment were not superior in an expert examiner compared to a novice.

      Level of Evidence

      Level III, retrospective comparative prognostic trial.
      Predicting outcomes after surgery has powerful clinical implications for improving patient care.
      [Baumhauer J.F. Patient-reported outcomes: Are they living up to their potential; Woodfield J.C., Pettigrew R.A., Plank L.D., Landmann M., Van Rij A.M. Accuracy of the surgeons’ clinical prediction of perioperative complications using a visual analog scale]
      Accurately forecasting which patients will have the best outcomes is difficult, however, because of multiple risk factors and other patient-specific variables. It is not surprising, then, that human cognition is regarded conceptually as a “black box,” a term describing a system in which the internal algorithms and processes linking known inputs to outputs are not clearly understood.
      [Makhni E.C., Makhni S.M., Ramkumar P.N. Artificial intelligence for the orthopaedic surgeon: An overview of potential benefits, limitations, and clinical applications; Sevdalis N., Rosamond K. Opening the “black box” of surgeon’s risk estimation: From intuition to quantitative modeling]
      Although poorly understood, excellent surgical judgment is widely regarded as a cornerstone of becoming a competent surgeon.
      [Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk]
      Judgment is thought to develop over time based on past experiences and knowledge of risk factors.
      Numerous studies have identified individual risk factors for positive and negative surgical outcomes after hip arthroscopy in the treatment of femoroacetabular impingement (FAI).
      [Kunze K.N., Polce E.M., Nwachukwu B.U., Chala J., Nho S.J. Development and internal validation of supervised machine learning algorithms for predicting clinically significant functional improvement in a mixed population of primary hip arthroscopy; Kunze K.N., Polce E.M., Rasio J., Nho S.J. Machine learning algorithms predict clinically significant improvements in satisfaction after hip arthroscopy; Kyin C., Maldonado D.R., Go C.C., Shapira J., Lall A.C., Domb B.G. Mid- to long-term outcomes of hip arthroscopy: A systematic review; Sogbein O.A., Shah A., Kay J., Memon M., Simunovic N., Belzile E., Ayeni O.R. Predictors of outcomes after hip arthroscopic surgery for femoroacetabular impingement: A systematic review]
      Predictors of positive outcomes include younger age, male sex, body mass index less than 25, Tönnis grade 0, pain relief from preoperative intra-articular hip injections, and lower preoperative baseline patient-reported outcomes (PRO) scores.
      [Sogbein O.A., Shah A., Kay J., Memon M., Simunovic N., Belzile E., Ayeni O.R. Predictors of outcomes after hip arthroscopic surgery for femoroacetabular impingement: A systematic review; Mullins K., Carton P. Arthroscopic correction of sports-related femoroacetabular impingement in competitive athletes: 2-year clinical outcome and predictors for achieving minimal clinically important difference]
      Similarly, studies have identified predictors of negative outcomes, including symptom duration greater than 8 months, age greater than 45 years, chondral defects, decreased joint space greater than 2 mm, and increased lateral center edge angle.
      [Kyin C., Maldonado D.R., Go C.C., Shapira J., Lall A.C., Domb B.G. Mid- to long-term outcomes of hip arthroscopy: A systematic review; Sogbein O.A., Shah A., Kay J., Memon M., Simunovic N., Belzile E., Ayeni O.R. Predictors of outcomes after hip arthroscopic surgery for femoroacetabular impingement: A systematic review; Mullins K., Carton P. Arthroscopic correction of sports-related femoroacetabular impingement in competitive athletes: 2-year clinical outcome and predictors for achieving minimal clinically important difference]
      Although individual risk factors provide useful information, such studies are limited because actual patient outcomes are not determined by a single risk factor but are complex and multifactorial.
      [Woodfield J.C., Pettigrew R.A., Plank L.D., Landmann M., Van Rij A.M. Accuracy of the surgeons’ clinical prediction of perioperative complications using a visual analog scale]
      Judgment analysis (JA) is an experimental model using linear regression analyses to provide a quantitative assessment of human performance.
      [Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk]
      Use of JA models can provide a valid means to quantitatively study surgical judgment and decision-making, thus overcoming the black box concerns of human cognition.
      [Makhni E.C., Makhni S.M., Ramkumar P.N. Artificial intelligence for the orthopaedic surgeon: An overview of potential benefits, limitations, and clinical applications; Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk]
      Jacklin et al. used a JA model to assess the ability of trainee surgeons to predict the likelihood that a patient undergoing a laparoscopic cholecystectomy would need to be converted to an open approach [Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk]. They reported a mean prediction correlation of 0.48 compared with the gold standard epidemiologic model, although there was large variation among individual surgeons. Woodfield et al. similarly used a JA model to show that surgeons were able to make meaningful preoperative predictions of major complications after abdominal surgery [Woodfield J.C., Pettigrew R.A., Plank L.D., Landmann M., Van Rij A.M. Accuracy of the surgeons’ clinical prediction of perioperative complications using a visual analog scale].
      PRO measures are powerful tools commonly used in orthopaedics to track changes in a patient’s physical, social, and mental health after treatment, and they can also be used as prediction tools.
      [Baumhauer J.F. Patient-reported outcomes: Are they living up to their potential]
      Patient-Reported Outcomes Measurement Information System (PROMIS) tools have previously been validated against legacy outcome measures for hip arthroscopy, including the Modified Harris Hip Score (mHHS), Non-Arthritic Hip Score (NAHS), Hip Outcome Score (HOS), and visual analog scale (VAS) pain score.
      [Childs S., Canham C., Kenney R.J., Silas D.R., Adler K., Giordano B.D. Correlation of PROMIS CAT with validated hip outcome scores in patients undergoing hip arthroscopy; Kollmorgen R.C., Hutyra C.A., Green C., Lewis B., Olson S.A., Mather R.C. Relationship between PROMIS computer adaptive tests and legacy hip measures among patients presenting to a tertiary care hip preservation center]
      Using PRO scores in a JA model to determine whether a surgeon’s intuition and judgment actually predict patient outcomes after hip arthroscopy would provide valuable insight into the accuracy of surgeons’ predictions. The purposes of this study were to quantify surgeon intuition, determine whether a surgeon’s prediction of outcomes after hip arthroscopy correlates with actual PRO, and identify differences in clinical judgment between expert and novice examiners. It was hypothesized that an experienced hip arthroscopist would demonstrate strong surgical judgment in intuitively predicting patient outcomes. It was also hypothesized that an experienced examiner would have stronger overall clinical judgment than a novice examiner.

      Methods

      After institutional review board approval, patients 18 years of age and older who elected to undergo hip arthroscopy for FAI were recruited to participate in this prospective, longitudinal cohort study. All surgeries were performed at a single academic medical center by a single surgeon during an active enrollment period between November 2017 and April 2019. All patients had failed to respond to initial conservative treatment and met standard indications for hip arthroscopy. Patients were excluded if they had any evidence of osteoarthritis (e.g., >2 mm joint space narrowing anywhere along the sourcil) or previous hip surgery, or if they were undergoing hip arthroscopy as part of a staged procedure with another operation (e.g., periacetabular osteotomy).

      Quantification of Surgeon Intuition

      A Surgeon Intuition and Prediction (SIP) questionnaire was created to assess the surgeon’s prediction of patient outcomes based on perceived patient response to treatment. Unlike traditional scales, which aim to minimize examiner bias, the SIP score deliberately captures it by incorporating the examiner’s “gut reaction” to sensory perceptions and objective findings identified during the patient encounter that are thought to positively or negatively influence patient outcomes. These cognitive transactions take place mainly during the patient history, physical examination, and review of imaging studies; the initial and final impressions formed during the encounter can provide further information. The SIP questionnaire was developed to incorporate all 5 of these domains (history, physical examination, imaging review, initial impression, and final impression) and is detailed in Figure 1.
      Fig 1. Surgeon Intuition and Prediction (SIP) questionnaire.
      The SIP questionnaire was completed electronically at the preoperative office visit, 1 to 2 weeks before the surgery date. One attending surgeon (expert) and one physician assistant (novice), whose training and experience differed by 10 years, both completed the questionnaire. The senior author (B.G.) is an experienced hip arthroscopist with more than 12 years of posttraining surgical experience who performs more than 400 arthroscopic hip procedures annually, teaches at an international level, and maintains a practice committed to comprehensive hip-preservation medicine. Both examiners were given the same instructions: to place a mark on a VAS in each of the 5 domains according to the perceived effect on patient outcome after surgery. The distance measured along the standardized 100-mm VAS line was converted into a calibrated 20-point score for each domain, giving a maximum total SIP score of 100. Higher SIP scores indicate better predicted outcomes.
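      The VAS-to-SIP conversion described above can be sketched as follows. This is a minimal illustration, not the authors' scoring software: it assumes the "calibrated 20-point score" is a simple linear rescaling of the 0 to 100 mm mark (mm × 20 / 100), and the domain keys and function names are hypothetical labels for the 5 SIP domains.

```python
"""Sketch of SIP scoring (hypothetical helper, not the study's software).

ASSUMPTION: each 100-mm VAS mark is rescaled linearly to 0-20 points
(mark_mm * 20 / 100); the article does not publish the exact calibration.
"""

VAS_LENGTH_MM = 100.0
POINTS_PER_DOMAIN = 20.0

# Hypothetical keys for the 5 SIP domains named in the text.
SIP_DOMAINS = (
    "history",
    "physical_examination",
    "imaging_review",
    "initial_impression",
    "final_impression",
)


def domain_score(mark_mm: float) -> float:
    """Rescale one VAS mark (0-100 mm) to a 0-20 domain score."""
    if not 0.0 <= mark_mm <= VAS_LENGTH_MM:
        raise ValueError(f"VAS mark must lie between 0 and 100 mm, got {mark_mm}")
    return mark_mm * POINTS_PER_DOMAIN / VAS_LENGTH_MM


def sip_total(marks_mm: dict[str, float]) -> float:
    """Sum the 5 domain scores (0-100); higher totals mean better predicted outcomes."""
    missing = [d for d in SIP_DOMAINS if d not in marks_mm]
    if missing:
        raise ValueError(f"missing SIP domains: {missing}")
    return sum(domain_score(marks_mm[d]) for d in SIP_DOMAINS)


if __name__ == "__main__":
    # Illustrative marks only; not patient data from the study.
    marks = {
        "history": 80,
        "physical_examination": 70,
        "imaging_review": 90,
        "initial_impression": 60,
        "final_impression": 75,
    }
    print(sip_total(marks))  # 16 + 14 + 18 + 12 + 15 -> 75.0
```

      Under this assumption, a mark at the midpoint of every line (50 mm) gives 10 points per domain and a total SIP score of 50.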
      Importantly, there were no deviations from the standard of care in the treatment of any patient. All patients underwent an exhaustive trial of nonoperative treatment, including rest, activity modification, pharmacologic therapy, and physiotherapy. Patients were approached for enrollment only after they had been properly indicated for surgery. Any identifiable risk factors were discussed with the patient by the surgeon, and attempts were made to modify them before surgery.

      Patient-Reported Outcomes

      Patients completed a battery of 7 PRO questionnaires electronically at various time points in the preoperative and postoperative periods. Legacy hip outcome scores included HOS-Sport, HOS-ADL, mHHS, and NAHS. In addition, PROMIS-Physical Function (PF), PROMIS-Pain Interference (PI), and PROMIS-Depression (D) tools were used. The questionnaires were completed either on an electronic tablet in the office or via e-mail immediately following the office appointment. Results were collected in an electronic database (REDCap, v11.0.3; Vanderbilt University, Nashville, TN). In total, the surveys took approximately 10 to 20 minutes to complete. Baseline assessments were completed 1 to 2 weeks before surgery at the preoperative visit. Postoperatively, the questionnaires were readministered at routine 6-month and 12-month follow-up appointments. The surgical team was blinded to patient outcome scores during the study duration.

      Data Analysis

      Descriptive statistics were used to characterize the study sample. Because of the longitudinal nature of the study, data points for some outcome measures were missing at various time points. Descriptive statistics were therefore calculated both for the entire sample and for subjects with complete data. The primary, complete case analysis included subjects with complete data for all PRO measures at baseline, 6 months, and 12 months: means for all 7 outcome measures were calculated, and paired t-tests were used to assess changes from baseline. As a secondary analysis of all patients, generalized estimating equation regression models were used. This method is well suited to longitudinal data with missing observations and is often used when population-averaged effects, rather than individual changes, are of primary interest. Seven separate generalized estimating equation regression models were constructed, one for each outcome measure. The number and percentage of patients achieving the minimal clinically important difference (MCID) and patient acceptable symptomatic state (PASS) were calculated using standard thresholds and definitions previously described in the literature.
      [Mullins K., Carton P. Arthroscopic correction of sports-related femoroacetabular impingement in competitive athletes: 2-year clinical outcome and predictors for achieving minimal clinically important difference; Ishoi L., Thorborg K., Orum M.G., Kemp J.L., Reiman M.P., Holmich P. How many patients achieve an acceptable state after hip arthroscopy for femoroacetabular impingement syndrome? A cross-sectional study including PASS cutoff values for the HAGOS and iHOT-33]
      If a PRO did not have an established MCID value in the literature, one-half of the baseline standard deviation in the study sample was used as the threshold for assessing whether subjects achieved MCID (NAHS, PROMIS-PI, and PROMIS-D; Table 4). An absolute value of r between 0.4 and 0.6 was considered a moderate correlation. All data analysis was conducted in SAS, version 9.4 (SAS Institute, Cary, NC). Statistical significance was set a priori at P < .05.
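      The threshold logic above can be sketched with a few small helpers. The half-SD thresholds footnoted in Table 4 match one-half of the baseline SDs in Table 2 (16.35 / 2 = 8.175 for NAHS, 5.90 / 2 = 2.95 for PROMIS-PI, 9.73 / 2 = 4.865 for PROMIS-D). The function names are hypothetical, and three details are assumptions the article does not spell out: the sample (n - 1) standard deviation is used, "achieving" MCID means a favourable change at least equal to the threshold, and "achieving" PASS means a follow-up score at or beyond the cutoff.

```python
"""Sketch of MCID/PASS classification (hypothetical helpers).

ASSUMPTIONS (not specified in the article):
- "one-half the baseline SD" uses the sample standard deviation (n - 1);
- a patient achieves MCID when the favourable change is >= the threshold;
- a patient achieves PASS when the follow-up score is at or beyond the cutoff.
"""
from statistics import stdev


def half_sd_mcid(baseline_scores: list[float]) -> float:
    """Distribution-based MCID: one-half the SD of baseline scores."""
    return stdev(baseline_scores) / 2


def achieved_mcid(baseline: float, follow_up: float, threshold: float,
                  higher_is_better: bool = True) -> bool:
    """True if the improvement, in the instrument's favourable direction, meets the threshold."""
    # PROMIS-PI and PROMIS-D improve when scores fall, so the sign is flipped for them.
    change = follow_up - baseline if higher_is_better else baseline - follow_up
    return change >= threshold


def achieved_pass(follow_up: float, cutoff: float, higher_is_better: bool = True) -> bool:
    """True if the follow-up score reaches the PASS cutoff."""
    return follow_up >= cutoff if higher_is_better else follow_up <= cutoff


def percent(count: int, n: int) -> int:
    """Percentage rounded to a whole number (Python rounds exact .5 ties to even)."""
    return round(100 * count / n)


if __name__ == "__main__":
    # Illustrative single patient on PROMIS-PI (lower is better), threshold 2.95.
    print(achieved_mcid(59.7, 50.2, 2.95, higher_is_better=False))  # True
    # 79 of 98 patients reaching the NAHS MCID corresponds to 81%.
    print(percent(79, 98))  # 81
```

      The published cutoffs can be passed straight in, for example the mHHS MCID of 8.0 from Table 4 or the HOS-ADL PASS cutoff of 87.0 from Table 5.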

      Results

      A total of 188 patients with minimum 1-year follow-up met inclusion criteria and were enrolled during the study period. Thirty-three patients voluntarily withdrew, leaving 155 patients who completed the study. Because of the longitudinal nature of the study, 57 of these patients had missing follow-up data. The primary analysis was performed on the 98 patients (mean age 36.1 years, 68% female) who had complete data sets at all time points. A secondary longitudinal analysis was performed on all 155 patients, including those with missing data points. All patients underwent primary hip arthroscopy to treat symptomatic mixed-type FAI with associated labral, chondral, and synovial pathology. Mean traction time was 34 minutes. Table 1 shows demographic data for both samples.
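      Table 1. Patient Characteristics
      | Characteristic | Entire Sample (n = 155) | Complete Data (n = 98) |
      |---|---|---|
      | Age, y, mean (SD) | 35.38 (12.86) | 36.05 (13.22) |
      | Sex, No. (%) | | |
      | Male | 55 (35) | 31 (32) |
      | Female | 100 (65) | 67 (68) |
      | Hip, No. (%) | | |
      | Left | 66 (43) | 46 (47) |
      | Right | 89 (57) | 52 (53) |
      SD, standard deviation.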
      The results of the primary analysis of PRO scores at baseline, 6 months, and 12 months postoperatively are shown in Table 2 and Figure 2. The differences of means between time points for this same population are reported in Table 3 and Figure 3. Mean scores improved significantly from baseline to 6 months on all 7 PRO instruments (P < .05), and likewise from baseline to 12 months (P < .05). Between 6 and 12 months postoperatively, mean HOS-Sport and PROMIS-PF scores improved significantly (P < .05). The remaining outcome scores continued to improve from 6 to 12 months postoperatively, but these changes did not reach statistical significance.
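      Because the primary analysis uses the same complete-case patients at every time point, each difference of means in Table 3 is also the mean of the within-patient changes, which is the quantity a paired t-test evaluates (for example, 88.37 - 71.25 = 17.12 for HOS-ADL). A minimal standard-library sketch of the paired t statistic follows. The function name is hypothetical; it returns only the statistic and degrees of freedom, since a P value requires the t distribution (for instance through scipy.stats.ttest_rel, which is not used here). The scores in the example are illustrative, not study data.

```python
"""Sketch of a paired t-test statistic using only the standard library."""
import math
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> tuple[float, float, int]:
    """Return (mean change, t statistic, degrees of freedom) for paired scores.

    Each position i must hold the same patient's score at both time points.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    d_bar = mean(diffs)                        # equals mean(after) - mean(before)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean change
    if se == 0:
        raise ValueError("all paired changes are identical; t is undefined")
    return d_bar, d_bar / se, len(diffs) - 1


if __name__ == "__main__":
    # Illustrative scores for 4 hypothetical patients (not study data).
    baseline = [1.0, 2.0, 3.0, 4.0]
    six_months = [2.0, 4.0, 4.0, 6.0]
    change, t, df = paired_t(baseline, six_months)
    print(change, round(t, 3), df)  # 1.5 5.196 3
```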
      Table 2. Mean PRO Scores at Baseline, 6 Months, and 12 Months
      | Measure | Baseline, Mean (SD) | 6 Months, Mean (SD) | 12 Months, Mean (SD) |
      |---|---|---|---|
      | HOS-ADL | 71.25 (13.82) | 88.37 (11.17) | 89.88 (13.69) |
      | HOS-Sport | 56.98 (20.24) | 70.50 (23.25) | 78.88 (21.83) |
      | NAHS | 61.20 (16.35) | 83.42 (14.33) | 84.78 (17.42) |
      | mHHS | 58.16 (14.36) | 81.57 (16.57) | 81.85 (20.04) |
      | PROMIS-PF | 41.53 (5.96) | 48.68 (8.12) | 51.42 (10.23) |
      | PROMIS-PI | 59.71 (5.90) | 50.16 (8.71) | 48.87 (8.83) |
      | PROMIS-D | 44.62 (9.73) | 42.87 (9.33) | 42.00 (9.16) |
      HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PRO, patient-reported outcome; PROMIS-D, Patient-Reported Outcomes Measurement Information System, Depression; PROMIS-PF, Patient-Reported Outcomes Measurement Information System, Physical Function; PROMIS-PI, Patient-Reported Outcomes Measurement Information System, Pain Interference; SD, standard deviation.
      Fig 2. Mean scores on all patient-reported outcome measures at baseline and postoperatively. (HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PROMIS, Patient-Reported Outcomes Measurement Information System.)
      Table 3. Difference of Means for PROs Between Time Points (n = 98)
      | Measure | Baseline to 6 Months: Diff (P) | Baseline to 12 Months: Diff (P) | 6 Months to 12 Months: Diff (P) |
      |---|---|---|---|
      | HOS-ADL | 17.12 (<.0001) | 18.63 (<.001) | 1.51 (.1682) |
      | HOS-Sport | 13.52 (<.0001) | 21.9 (<.0001) | 8.38 (.0007) |
      | NAHS | 22.22 (<.0001) | 23.58 (<.0001) | 1.36 (.3705) |
      | mHHS | 23.41 (<.0001) | 23.69 (<.0001) | 0.28 (.866) |
      | PROMIS-PF | 7.15 (<.0001) | 9.89 (<.0001) | 2.74 (<.0001) |
      | PROMIS-PI | –9.55 (<.0001) | –10.84 (<.0001) | –1.29 (.134) |
      | PROMIS-D | –1.75 (.02) | –2.62 (<.002) | –0.87 (.1895) |
      NOTE. Diff, difference of means. P < .05 indicates a statistically significant difference.
      HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PRO, patient-reported outcome; PROMIS-D, Patient-Reported Outcomes Measurement Information System, Depression; PROMIS-PF, Patient-Reported Outcomes Measurement Information System, Physical Function; PROMIS-PI, Patient-Reported Outcomes Measurement Information System, Pain Interference.
      Fig 3. Differences in mean scores on all patient-reported outcome measures between baseline and postoperative time points. (HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PROMIS, Patient-Reported Outcomes Measurement Information System.)
      Rates of achieving MCID and PASS thresholds are shown in Tables 4 and 5, respectively. By 6 months postoperatively, the majority of patients had reached the MCID threshold for mHHS (82%), NAHS (81%), PROMIS-PI (77%), PROMIS-PF (65%), and HOS-Sport (60%). At 12 months postoperatively, the majority of patients had reached the MCID threshold for PROMIS-PI (83%), NAHS (81%), mHHS (80%), HOS-Sport (76%), and PROMIS-PF (73%). The percentage of patients meeting the MCID threshold increased from 6 to 12 months postoperatively for HOS-Sport, PROMIS-PF, and PROMIS-PI. More than 70% of patients achieved PASS by 6 months postoperatively for HOS-ADL (72%) and mHHS (71%) scores.
      Table 4. Rates of Achieving MCID Threshold Postoperatively
      | Measure | MCID Threshold | 6 Months, No. (%) | 12 Months, No. (%) |
      |---|---|---|---|
      | HOS-ADL | 9.0 | 27 (28) | 27 (28) |
      | HOS-Sport | 6.0 | 59 (60) | 74 (76) |
      | NAHS | 8.18* | 79 (81) | 79 (81) |
      | mHHS | 8.0 | 80 (82) | 79 (80) |
      | PROMIS-PF | 5.1 | 64 (65) | 72 (73) |
      | PROMIS-PI | 2.95* | 75 (77) | 81 (83) |
      | PROMIS-D | 4.87* | 28 (29) | 30 (31) |
      HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; MCID, minimum clinically important difference; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PROMIS-D, Patient-Reported Outcomes Measurement Information System, Depression; PROMIS-PF, Patient-Reported Outcomes Measurement Information System, Physical Function; PROMIS-PI, Patient-Reported Outcomes Measurement Information System, Pain Interference.
      *Based on one-half of the baseline standard deviation.
      Table 5. Rates of Achieving PASS Threshold Postoperatively
      | Measure | PASS Threshold | 6 Months, No. (%) | 12 Months, No. (%) |
      |---|---|---|---|
      | HOS-ADL | 87.0 | 71 (72) | 73 (74) |
      | HOS-Sport | 75.0 | 52 (53) | 65 (66) |
      | mHHS | 74.0 | 70 (71) | 72 (73) |
      | PROMIS-PF | 51.8 | 22 (22) | 48 (49) |
      HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; PASS, patient acceptable symptomatic state; PROMIS-PF, Patient-Reported Outcomes Measurement Information System, Physical Function.
      Correlations between SIP score and PRO scores for the novice and expert examiners at both postoperative time points are shown in Table 6. Negative Pearson correlation coefficients (r) for PROMIS-PI and PROMIS-D reflect the scoring direction of these instruments relative to the other PRO tools (i.e., lower pain interference and depression scores indicate a better outcome). Correlations were statistically significant for all measures except PROMIS-D in the expert group at both the 6- and 12-month follow-ups.
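      As a minimal sketch of the correlation step, the functions below compute Pearson's r from paired SIP and PRO scores and label its strength. Only the moderate band (0.4 ≤ |r| ≤ 0.6) comes from the Methods; treating |r| < 0.4 as weak and |r| > 0.6 as strong is an assumption, consistent with this article describing r = 0.36 as weak. The absolute value is used so that the negatively scored PROMIS-PI and PROMIS-D are judged on the same footing. Function names are hypothetical.

```python
"""Sketch: Pearson correlation between SIP and PRO scores, with a strength label."""
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def strength(r: float) -> str:
    """Label |r|; only the 0.4-0.6 'moderate' band is taken from the article."""
    magnitude = abs(r)  # sign only reflects scoring direction (e.g., PROMIS-PI)
    if magnitude < 0.4:
        return "weak"      # ASSUMPTION
    if magnitude <= 0.6:
        return "moderate"  # per the Methods
    return "strong"        # ASSUMPTION


if __name__ == "__main__":
    # Illustrative values only: higher SIP paired with lower pain interference.
    sip = [40.0, 55.0, 70.0, 85.0]
    pain = [65.0, 60.0, 52.0, 45.0]
    r = pearson_r(sip, pain)
    print(round(r, 3), strength(r))
    print(strength(-0.42), strength(0.36))  # moderate weak
```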
      Table 6. Correlations Between SIP Score and PRO Scores at 6 and 12 Months Postoperatively
      | Measure | 6 Months, Expert: r (P) | 6 Months, Novice: r (P) | 12 Months, Expert: r (P) | 12 Months, Novice: r (P) |
      |---|---|---|---|---|
      | HOS-ADL | 0.50 (<.0001) | 0.50 (<.0001) | 0.36 (.01) | 0.45 (<.0001) |
      | HOS-Sport | 0.37 (.0008) | 0.35 (.0013) | 0.41 (.0001) | 0.46 (<.0001) |
      | NAHS | 0.45 (<.0001) | 0.47 (<.0001) | 0.38 (.006) | 0.44 (<.0001) |
      | mHHS | 0.50 (<.0001) | 0.48 (<.0001) | 0.43 (<.0001) | 0.53 (<.0001) |
      | PROMIS-PF | 0.41 (<.0001) | 0.40 (.0002) | 0.41 (<.0001) | 0.44 (<.0001) |
      | PROMIS-PI | –0.32 (.0038) | –0.42 (.001) | –0.24 (<.0001) | –0.36 (.001) |
      | PROMIS-D | –0.17 (.1231) | –0.30 (.01) | –0.18 (.1012) | –0.33 (.0026) |
      NOTE. P < .05 indicates a statistically significant correlation.
      r = Pearson correlation coefficient.
      HOS-ADL, Hip Outcome Score – Activities of Daily Living; HOS-Sport, Hip Outcome Score – Sport-Specific Scale; mHHS, Modified Harris Hip Score; NAHS, Non-Arthritic Hip Score; PROMIS-D, Patient-Reported Outcomes Measurement Information System, Depression; PROMIS-PF, Patient-Reported Outcomes Measurement Information System, Physical Function; PROMIS-PI, Patient-Reported Outcomes Measurement Information System, Pain Interference; SIP, Surgeon Intuition and Prediction.
      At 6 months postoperatively, moderate correlations were seen between the expert SIP score and PRO scores for mHHS (r = 0.50), HOS-ADL (r = 0.50), NAHS (r = 0.45), and PROMIS-PF (r = 0.41). Novice SIP scores were also moderately correlated with mHHS (r = 0.48), HOS-ADL (r = 0.50), NAHS (r = 0.47), PROMIS-PF (r = 0.40), and PROMIS-PI (r = –0.42) at 6 months postoperatively. Weak correlations were seen for HOS-Sport, HOS-ADL, NAHS, PROMIS-D, and PROMIS-PI for the novice and expert examiners at various time points (Table 6). The strength of correlations remained similar between the 2 postoperative time points for the combined examiners, with a mean overall correlation of r = 0.40 at 6 months and r = 0.39 at 12 months (Table 7). By examiner, the expert's overall mean correlation across the combined outcome measures decreased from r = 0.39 at 6 months to r = 0.34 at 12 months postoperatively, whereas the novice's improved marginally from r = 0.42 to r = 0.43.
      Table 7. Overall Mean Correlation Strength (r) Across Examiners and Time Points
      | Examiner | 6 Months | 12 Months |
      |---|---|---|
      | Expert | 0.39 | 0.34 |
      | Novice | 0.42 | 0.43 |
      | Combined | 0.40 | 0.39 |
      r = Pearson correlation coefficient.
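      Every entry in Table 7 can be reproduced from Table 6 by averaging the absolute values of the 7 correlation coefficients for each examiner and time point, and of all 14 coefficients for the combined row. The article does not state this procedure explicitly; it is our reading, checked against each cell. Averaging absolute values keeps the negatively scored PROMIS-PI and PROMIS-D from cancelling out the other instruments. The names below are hypothetical.

```python
"""Reproduce Table 7 from the r values in Table 6 (mean of |r| across 7 PROs)."""

# Order: HOS-ADL, HOS-Sport, NAHS, mHHS, PROMIS-PF, PROMIS-PI, PROMIS-D (Table 6).
TABLE6_R = {
    ("expert", 6): [0.50, 0.37, 0.45, 0.50, 0.41, -0.32, -0.17],
    ("novice", 6): [0.50, 0.35, 0.47, 0.48, 0.40, -0.42, -0.30],
    ("expert", 12): [0.36, 0.41, 0.38, 0.43, 0.41, -0.24, -0.18],
    ("novice", 12): [0.45, 0.46, 0.44, 0.53, 0.44, -0.36, -0.33],
}


def mean_abs_r(values: list[float]) -> float:
    """Unweighted mean of |r| (the sign reflects scoring direction only)."""
    return sum(abs(v) for v in values) / len(values)


def table7() -> dict[tuple[str, int], float]:
    """Per-examiner and pooled mean |r| at each time point, rounded to 2 decimals."""
    rows: dict[tuple[str, int], float] = {}
    for month in (6, 12):
        for examiner in ("expert", "novice"):
            rows[(examiner, month)] = round(mean_abs_r(TABLE6_R[(examiner, month)]), 2)
        pooled = TABLE6_R[("expert", month)] + TABLE6_R[("novice", month)]
        rows[("combined", month)] = round(mean_abs_r(pooled), 2)
    return rows


if __name__ == "__main__":
    for (examiner, month), value in sorted(table7().items()):
        print(f"{examiner:>8} {month:>2} months: {value:.2f}")
```

      Because both examiners contribute 7 values, the pooled mean equals the average of the two per-examiner means.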
      Figure 4 presents the results of the secondary analysis (entire sample, n = 155), showing longitudinal changes in all PROs from baseline using generalized estimating equation regression analyses. All PROs similarly demonstrated a statistically significant improvement from baseline. Scores for HOS-ADL, HOS-Sport, NAHS, and PROMIS-PF increased significantly from baseline to 6 months and 12 months, and scores for PROMIS-D and PROMIS-PI decreased significantly over time.
      Fig 4. Longitudinal changes from baseline for all outcome measures using a generalized estimating equation regression model (N = 155).

      Discussion

      The main finding of this study was that patient-reported outcomes after hip arthroscopy showed only weak-to-moderate correlation overall with a surgeon’s prediction of those outcomes. The authors’ hypotheses were largely refuted. Although our results suggest that an experienced hip arthroscopist can reasonably predict outcomes for some patients, the correlation between predicted and actual outcomes was not as strong as hypothesized.
      In addition, our data refute the hypothesis that an expert examiner has better clinical judgment of surgical outcomes than a novice examiner. In fact, the novice examiner's predictive accuracy was maintained or marginally improved from 6 months to 12 months postoperatively, whereas the expert examiner showed a trend toward weaker correlation over time.
      Surgeons use cognitive shortcuts, called heuristics, on conscious and subconscious levels in everyday practice to predict which patients they believe may have the best or worst surgical outcomes. Surgeons estimate risk intuitively through a complex cognitive process that weighs risk factors and draws on past experiences.
      [Ishoi L., Thorborg K., Orum M.G., Kemp J.L., Reiman M.P., Holmich P. How many patients achieve an acceptable state after hip arthroscopy for femoroacetabular impingement syndrome? A cross-sectional study including PASS cutoff values for the HAGOS and iHOT-33]
      However, the exact mechanism by which surgeons arrive at a risk estimate is unknown, so the process is often regarded as a “black box” phenomenon in which the inputs and outputs are known but the internal algorithms are not well understood.
      Experienced surgeons often cite their clinical acumen and gestalt as a guide for decision-making. Yet the results of our study suggest that even an experienced surgeon’s ability to predict outcomes may be limited. The lead surgeon (B.G.), an experienced hip arthroscopist with more than 12 years of surgical experience who performs more than 400 cases per year, had no better predictive ability than a physician assistant with 10 years less experience in hip arthroscopy, and in some cases showed worse judgment. At 6 months postoperatively, the attending surgeon and physician assistant showed similar predictive accuracy; at final follow-up, however, the novice examiner's predictions correlated more strongly with patient outcomes than the expert examiner's did.
      In contrast to our findings, Woodfield et al. reported that surgeons made meaningful preoperative predictions of major complications after abdominal surgery using a 100-mm VAS similar to the one used in the present study [Woodfield J.C., Pettigrew R.A., Plank L.D., Landmann M., Van Rij A.M. Accuracy of the surgeons’ clinical prediction of perioperative complications using a visual analog scale]. They concluded that the unique contribution of a surgeon’s clinical assessment should be considered in predictive models for estimating surgical risk. Jacklin et al. also used a JA model to assess the ability of trainee surgeons to predict the likelihood that a patient undergoing a laparoscopic cholecystectomy would need to be converted to an open approach [Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk]. In that study, the mean correlation was 0.48 ± 0.14 compared with a gold standard model. By comparison, the mean overall correlation across both examiners in the present study was 0.40 at 6 months and 0.39 at 12 months.
      The results of this study also showed that most patients achieved clinically significant improvements by 6 months from the date of surgery, which is consistent with other reports in the literature.
      [Mullins K., Carton P. Arthroscopic correction of sports-related femoroacetabular impingement in competitive athletes: 2-year clinical outcome and predictors for achieving minimal clinically important difference; Ishoi L., Thorborg K., Orum M.G., Kemp J.L., Reiman M.P., Holmich P. How many patients achieve an acceptable state after hip arthroscopy for femoroacetabular impingement syndrome? A cross-sectional study including PASS cutoff values for the HAGOS and iHOT-33]
      Overall, patients in this study met MCID and PASS thresholds at rates similar to those reported in other studies of patients undergoing hip arthroscopy for FAI. Ishoi et al. (2021) found that fewer than one-half of patients (46%) undergoing hip arthroscopy for FAI had achieved PASS. Our data show a similar rate, with 49% of patients in the present study achieving the PASS threshold for PROMIS-PF, although 73% achieved the PASS threshold for mHHS. Mullins and Carton (2021) found that 86% of competitive athletes undergoing hip arthroscopy for FAI achieved MCID for mHHS at 2-year follow-up. In our study, only 73% of patients achieved MCID for mHHS. However, our sample was a mixed population, and competitive athletes have previously been shown to achieve MCID at greater rates than nonathletes (Clapp et al., 2020). It is well established that achieving MCID depends on baseline PRO scores because of a ceiling effect: patients who start near the top of a scale have less room to improve (Mullins and Carton, 2021).
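MCID is a threshold on the change from baseline, whereas PASS is a threshold on the absolute postoperative score, and the ceiling effect mentioned above follows directly from the first definition. The sketch below illustrates this for a scale capped at 100, such as mHHS. Both thresholds, all patient scores, and the helper names are hypothetical placeholders, not values from this study or the cited studies.

```python
def headroom(baseline, ceiling=100.0):
    """Maximum possible improvement before the score hits the scale ceiling."""
    return ceiling - baseline


def mcid_attainable(baseline, mcid, ceiling=100.0):
    """False when the baseline is so high that even a perfect score misses MCID."""
    return headroom(baseline, ceiling) >= mcid


def proportion_achieving_mcid(pre, post, mcid):
    """Share of patients whose change from baseline is at least the MCID."""
    hits = sum(1 for b, f in zip(pre, post) if f - b >= mcid)
    return hits / len(pre)


def proportion_achieving_pass(post, pass_threshold):
    """Share of patients whose postoperative score reaches the PASS cutoff."""
    hits = sum(1 for f in post if f >= pass_threshold)
    return hits / len(post)


MCID_HYPOTHETICAL = 8.0   # assumed change threshold, NOT the study's value
PASS_HYPOTHETICAL = 80.0  # assumed absolute cutoff, NOT the study's value

# Hypothetical baseline and postoperative scores on a 0-100 scale.
pre = [48, 55, 61, 70, 93, 95]
post = [78, 80, 75, 86, 100, 99]

print([mcid_attainable(b, MCID_HYPOTHETICAL) for b in pre])
print(proportion_achieving_mcid(pre, post, MCID_HYPOTHETICAL))
print(proportion_achieving_pass(post, PASS_HYPOTHETICAL))
```

In this toy data the two patients with the highest baselines reach PASS yet cannot reach MCID, because their remaining headroom is smaller than the assumed MCID; this is the ceiling effect in its simplest form.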

      Limitations

      This study was not without limitations. First, we used an unsophisticated prediction model to quantify surgeon intuition of patient outcomes, and we acknowledge that the SIP questionnaire is not a validated instrument. However, our scale and methodology were modeled after studies that used a JA model to quantify surgeons’ assessments of risk and complications. Woodfield et al. (2007) reported that surgeons’ risk estimates using a 100-mm VAS, although subjective, were more accurate predictors of postoperative complications than objective data. Because of this limitation, the limited correlation between SIP and PRO scores may reflect the SIP tool itself rather than the surgeon’s ability to predict differences. Future studies should validate this instrument to determine whether it can truly detect differences in outcome scores, and inter-rater reliability across surgeons and procedures should also be explored. A second limitation is the potential for performance bias if the attending surgeon were to consciously or subconsciously adjust surgical technique based on his preoperative risk estimates. The attending surgeon in this study performs more than 200 hip arthroscopies per year and limited variability by using a systematic approach to guide surgical technique. In addition, the surgeon and novice examiner were not allowed to re-review their own predictions before surgery and did not have access to patient outcome scores during the study. Third, because of the prospective, longitudinal design, we experienced patient withdrawal and loss to follow-up, which decreased our sample size. However, we believed that including only patients with complete data sets in the primary analysis was important to maintain the integrity of our statistical analysis. We also performed a secondary analysis of the larger sample, which included patients with incomplete data sets; it showed demographic and PRO scores similar to those of our primary data set. Next, we recognize that results from a single surgeon and physician assistant using this predictive model do not necessarily generalize to other surgeons and other procedures.
      Lastly, future studies may use more sophisticated artificial intelligence models to predict patient outcomes. Recently published work using machine learning models has shown accurate prediction of MCID achievement after hip arthroscopy (Kunze, Polce, Rasio, and Nho, 2021).

      Conclusions

      An experienced, high-volume hip arthroscopist had only weak-to-moderate ability to intuitively predict PRO. Surgical intuition and judgment were not superior in an expert examiner compared to a novice.

      Supplementary Data

      References

        • Baumhauer J.F. Patient-reported outcomes: Are they living up to their potential. N Engl J Med. 2017; 377: 6-9
        • Woodfield J.C., Pettigrew R.A., Plank L.D., Landmann M., Van Rij A.M. Accuracy of the surgeons’ clinical prediction of perioperative complications using a visual analog scale. World J Surg. 2007; 31: 1912-1920
        • Makhni E.C., Makhni S.M., Ramkumar P.N. Artificial intelligence for the orthopaedic surgeon: An overview of potential benefits, limitations, and clinical applications. J Am Acad Orthop Surg. 2021; 29: 235-243
        • Sevdalis N., Rosamond K. Opening the “black box” of surgeon’s risk estimation: From intuition to quantitative modeling. World J Surg. 2008; 32: 324-325
        • Jacklin R., Sevdalis N., Harries C., Darzi A., Vincent C. Judgement analysis: A method for quantitative evaluation of trainee surgeons’ judgements of surgical risk. Am J Surg. 2008; 195: 183-188
        • Kunze K.N., Polce E.M., Nwachukwu B.U., Chala J., Nho S.J. Development and internal validation of supervised machine learning algorithms for predicting clinically significant functional improvement in a mixed population of primary hip arthroscopy. Arthroscopy. 2021; 37: 1488-1497
        • Kunze K.N., Polce E.M., Rasio J., Nho S.J. Machine learning algorithms predict clinically significant improvements in satisfaction after hip arthroscopy. Arthroscopy. 2021; 37: 1143-1151
        • Kyin C., Maldonado D.R., Go C.C., Shapira J., Lall A.C., Domb B.G. Mid- to long-term outcomes of hip arthroscopy: A systematic review. Arthroscopy. 2021; 37: 1011-1025
        • Sogbein O.A., Shah A., Kay J., Memon M., Simunovic N., Belzile E., Ayeni O.R. Predictors of outcomes after hip arthroscopic surgery for femoroacetabular impingement: A systematic review. Orthop J Sports Med. 2019; 7: 1-19
        • Mullins K., Carton P. Arthroscopic correction of sports-related femoroacetabular impingement in competitive athletes: 2-year clinical outcome and predictors for achieving minimal clinically important difference. Orthop J Sports Med. 2021; 9: 1-11
        • Childs S., Canham C., Kenney R.J., Silas D.R., Adler K., Giordano B.D. Correlation of PROMIS CAT with validated hip outcome scores in patients undergoing hip arthroscopy. Arthroscopy. 2017; 33: e15
        • Kollmorgen R.C., Hutyra C.A., Green C., Lewis B., Olson S.A., Mather R.C. Relationship between PROMIS computer adaptive tests and legacy hip measures among patients presenting to a tertiary care hip preservation center. Am J Sports Med. 2019; 47: 876-894
        • Ishoi L., Thorborg K., Orum M.G., Kemp J.L., Reiman M.P., Holmich P. How many patients achieve an acceptable state after hip arthroscopy for femoroacetabular impingement syndrome? A cross-sectional study including PASS cutoff values for the HAGOS and iHOT-33. Orthop J Sports Med. 2021; 9: 1-9
        • Clapp I.M., Nwachukwu B.U., Beck E.C., Jan K., Gowd A.K., Nho S.J. Comparing outcomes of competitive athletes versus nonathletes undergoing hip arthroscopy for treatment of femoroacetabular impingement syndrome. Am J Sports Med. 2020; 48: 159-166