Lasse K HARRIS 1, Anders TROELSEN 1, Berend TERLUIN 2, Kirill GROMOV 1, Andrew PRICE 3, and Lina H INGELSRUD 1
1 Department of Orthopaedic Surgery, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark; 2 Department of General Practice, Amsterdam Public Health Research Institute, Amsterdam UMC, Amsterdam, The Netherlands; 3 Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, UK
Background and purpose — Developing meaningful thresholds for the Oxford Knee Score (OKS) advances its clinical use. We determined the minimal important change (MIC), patient acceptable symptom state (PASS), and treatment failure (TF) values as meaningful thresholds for the OKS at 3-, 12-, and 24-month follow-up in patients undergoing unicompartmental knee arthroplasty (UKA).
Patients and methods — This is a cohort study with data from patients undergoing UKA collected at a hospital in Denmark between February 2016 and September 2021. The OKS was completed preoperatively and at 3, 12, and 24 months postoperatively. Interpretation threshold values were calculated with the anchor-based adjusted predictive modeling method. Non-parametric bootstrapping was used to derive 95% confidence intervals (CI).
Results — Complete 3-, 12-, and 24-month postoperative data were obtained for 331 of 423 (78%), 340 of 479 (71%), and 235 of 338 (70%) patients, respectively; median age was 68–69 years and 58–59% were female. Adjusted OKS MIC values were 4.7 (CI 3.3–6.0), 7.1 (CI 5.2–8.6), and 5.4 (CI 3.4–7.3), adjusted OKS PASS values were 28.9 (CI 27.6–30.3), 32.7 (CI 31.5–33.9), and 31.3 (CI 29.1–33.3), and adjusted OKS TF values were 24.4 (CI 20.7–27.4), 29.3 (CI 27.3–31.1), and 28.5 (CI 26.0–30.5) at 3, 12, and 24 months postoperatively, respectively. All values increased statistically significantly from 3 to 12 months but not from 12 to 24 months.
Interpretation — The UKA-specific measurement properties and clinical thresholds for the OKS can improve the interpretation of UKA outcome and assist quality assessment in institutional and national registries.
Citation: Acta Orthopaedica 2022; 93: 634–642. DOI http://dx.doi.org/10.2340/17453674.2022.3909.
Copyright: © 2022 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for non-commercial purposes, provided proper attribution to the original work.
Submitted: 2022-04-05. Accepted: 2022-06-15. Published: 2022-07-05.
Correspondence: lasse.kindler.harris@regionh.dk
Conception and study design: LKH, AT, LHI. Collection and assembly of data: LKH. Analysis: LKH, AT, LHI. Interpretation of the data: LKH, AT, BT, KG, AP, LHI. Drafting of the manuscript: LKH, LHI. Critical revision and final approval of the article: LKH, AT, BT, KG, AP, LHI.
The authors would like to thank Dea Ravn for technical assistance and data management in the open-source programming language R, the patients for responding to the questionnaires, and the local staff at the Department of Orthopaedic Surgery for handling the data collection process on a daily basis.
Acta thanks Margareta Hedström for help with peer review of this study.
Unicompartmental knee arthroplasty (UKA) is deemed a viable alternative to total knee arthroplasty (TKA) for patients with severe knee osteoarthritis with a certain wear pattern (1). Patient-reported outcome measures (PROMs) are increasingly used to evaluate treatment effectiveness and quality of care from a patient-centered perspective (2). The Oxford Knee Score (OKS) is frequently used to assess pain and functional limitations after knee arthroplasty, on a scale ranging from 0 to 48 (worst to best) (3,4). However, meaningful interpretation of PROM data is challenging, as statistically significant improvements are not necessarily clinically meaningful (5). To help assign meaning to PROM scores, 3 interpretation threshold concepts have been suggested. The minimal important change (MIC) concept defines the smallest change score that is deemed important by the average patient (6). The patient acceptable symptom state (PASS) concept defines the score above which patients consider themselves well (7). The treatment failure (TF) concept defines the score below which patients consider their treatment to have failed (8).
Interpretation threshold values are considered context-specific, highlighting the importance of investigating possible differences across patient populations (5,9,10). No previous studies have determined MIC or TF values for the OKS in people undergoing UKA, but a PASS value of 41.5 points at 24 months postoperatively has been suggested (11). In people undergoing TKA, a MIC value of 6.9 for the OKS at 6 months' follow-up and TF values of 27 at 12 and 24 months have been presented (12,13). Although UKA is deemed a viable alternative to TKA, it remains to be investigated whether PROM scores should be interpreted alike in both patient populations, and the extent to which interpretation threshold values vary across these populations is unclear. Therefore, we determined the MIC, PASS, and TF for the OKS at 3-, 12-, and 24-month follow-up after UKA.
This is a cohort study using data from a Danish hospital's local arthroplasty registry. Between February 2016 and September 2021, all patients scheduled for UKA were asked to complete an electronic questionnaire during their preoperative visit to the hospital. Electronic follow-up questionnaires were emailed to patients at 3, 12, and 24 months postoperatively. Patients who did not complete the electronic questionnaire were sent 2 reminder emails at 2-week intervals and, ultimately, a paper version of the questionnaire by postal mail; patients without an email address were also sent the paper version. The yearly use of UKA at the hospital increased from 9% to 58% during the study period, mainly because more surgeons adopted the procedure; all surgeons followed the same indications for UKA, as recommended by Hamilton et al. (14).
The data came from patients undergoing medial UKA, which are routinely collected at the Danish hospital. The inclusion criterion was primary surgery for knee osteoarthritis, and the exclusion criterion was revision surgery. If patients were registered with bilateral medial UKAs within the study period, only the first was included.
The OKS is a 12-item questionnaire assessing pain and function, summed to a total score between 0 and 48 (worst–best) (3). Adequate validity, reliability, and responsiveness of the OKS in patients undergoing knee arthroplasty have been reported (15). Additionally, at each postoperative timepoint, patients responded to 3 anchor questions (Table 1, see Supplementary data). First, patients were asked whether they had experienced overall changes in symptoms since the knee surgery: “How are your knee problems now compared with prior to your operation?” Response options were on a 7-point scale (16). Patients answering “better, an important improvement” or “somewhat better, but enough to be an important improvement” were classified as importantly improved. Patients answering “worse, an important deterioration” or “somewhat worse, but enough to be an important deterioration” were classified as importantly deteriorated. Patients answering “about the same” or “very small improvement/deterioration, not enough to be an important improvement/deterioration” were classified as unchanged. Second, patients were asked: “Taking into account all the activities you have during your daily life, your level of pain, and also your functional impairment, do you consider that your current state is satisfactory?” (yes/no) (7). Finally, patients who did not consider their symptom state satisfactory were asked: “Would you consider your current state as being so unsatisfactory that you think the treatment has failed?” (yes/no) (8).
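The anchor classification described above can be sketched as a simple lookup. This is an illustration only: the response labels are paraphrased from the text, and the names `MIC_ANCHOR_CATEGORY`, `dichotomize_mic`, and `pass_tf_status` are hypothetical. It also assumes that the dichotomized MIC anchor contrasts importantly improved patients with all others, and that patients with a satisfactory state are coded as not having treatment failure, because the failure question was asked only of the others.

```python
# Hypothetical labels paraphrased from the 7-point anchor described in the text.
MIC_ANCHOR_CATEGORY = {
    "better, an important improvement": "improved",
    "somewhat better, but enough to be an important improvement": "improved",
    "very small improvement, not enough to be an important improvement": "unchanged",
    "about the same": "unchanged",
    "very small deterioration, not enough to be an important deterioration": "unchanged",
    "somewhat worse, but enough to be an important deterioration": "deteriorated",
    "worse, an important deterioration": "deteriorated",
}


def dichotomize_mic(response):
    """Return 1 if importantly improved, 0 otherwise (assumed dichotomization)."""
    return int(MIC_ANCHOR_CATEGORY[response] == "improved")


def pass_tf_status(satisfactory, failed=None):
    """Return (PASS anchor, TF anchor) as 0/1 indicators.

    The failure question is only asked when the state is not satisfactory,
    so a satisfied patient is coded here as not having treatment failure.
    """
    if satisfactory:
        return 1, 0
    return 0, int(bool(failed))
```

Keeping the 3-level category alongside the 0/1 indicator leaves room for the 7-level polyserial correlation analysis, which uses the full response scale.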
Patient characteristics were reported as mean (SD) or median (IQR) for continuous variables, and as frequency and percentage distribution for categorical variables. The OKS change score distributions across anchor response options were investigated using boxplots.
An anchor-based approach was used to calculate interpretation threshold values, anchoring the OKS to the anchor question responses. We used the predictive modeling method developed to estimate MIC thresholds because of its reported methodological advantages over the commonly used receiver operating characteristic (ROC) method (17). The predictive modeling method is based on a logistic regression with the dichotomized anchor response as the dependent variable and, as the independent variable, the change in OKS for the MIC or the postoperative OKS for PASS and TF. The threshold is the OKS value corresponding to a likelihood ratio of 1, i.e., the value at which the post-test odds of being importantly improved (or of having a satisfactory symptom state) equal the pre-test odds, that is, the odds in the sample as a whole (17). Both the predictive modeling method and the ROC method are biased if the proportion of patients being importantly improved or having a satisfactory symptom state differs from 50%: the threshold is overestimated if the proportion is greater than 50% and underestimated if it is smaller than 50%. Consequently, we adjusted the thresholds for unequal proportions using the following equation proposed by Terluin et al. (18):
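A minimal sketch of the unadjusted predictive modeling threshold follows, assuming a single-predictor logistic regression fitted by maximum likelihood (Newton-Raphson, pure Python). Setting the fitted log-odds b0 + b1·x equal to the log pre-test odds ln(p/(1 − p)) gives x = (ln(p/(1 − p)) − b0)/b1. The function names are hypothetical, and this is not the authors' R code.

```python
import math


def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit logit P(y = 1 | x) = b0 + b1 * x by Newton-Raphson (maximum likelihood).

    Assumes the data are not perfectly separated, so a finite MLE exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector
        h00 = h01 = h11 = 0.0  # observed information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: information^-1 @ score (explicit 2 x 2 inverse)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def threshold_pred(scores, anchor):
    """Score at which the likelihood ratio is 1 (post-test odds = pre-test odds).

    scores: OKS change (MIC) or postoperative OKS (PASS, TF); anchor: 0/1.
    """
    b0, b1 = fit_logistic(scores, anchor)
    prop = sum(anchor) / len(anchor)
    log_pretest_odds = math.log(prop / (1.0 - prop))
    return (log_pretest_odds - b0) / b1
```

For example, with change scores `[0, 2, 4, 6]` for non-improved and `[4, 6, 8, 10]` for improved patients, the groups mirror each other around 5 and exactly half are improved, so `threshold_pred` returns 5 (up to rounding).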
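$$\mathrm{MIC}_{\mathrm{adjusted}} = \mathrm{MIC}_{\mathrm{pred}} - (0.090 + 0.103 \times \mathrm{Cor}) \times \mathrm{SD}_{\mathrm{change}} \times \text{log-odds}(\mathrm{imp})$$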
In this equation, Cor is the point-biserial correlation between the postoperative OKS and the anchor, SDchange is the SD of the OKS change score, and log-odds(imp) is the natural logarithm of (proportion improved/[1 – proportion improved]). Similarly, we used the corresponding equation fitted for PASS and TF thresholds to adjust for unequal proportions of people with a satisfactory symptom state or treatment failure. We used non-parametric bootstrapping (1,000 resamples) to obtain 95% confidence intervals (CI), reported as the 0.025–0.975 quantiles of the bootstrap threshold values. Furthermore, we tested whether threshold values differed statistically significantly between the 3 follow-up timepoints by calculating the differences between the timepoints' threshold values across the 1,000 bootstrap samples and reporting the 0.025–0.975 quantiles of these differences as the 95% CI of the mean difference.
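The MIC adjustment and the percentile bootstrap can be sketched as follows. This is a hedged illustration: only the MIC coefficients (0.090 and 0.103) appear in the text, so the PASS/TF variant is not reproduced. The quantile function mimics R's default (type 7) linear interpolation, and `difference_ci` assumes that the i-th bootstrap estimates of two timepoints are paired. All function names are hypothetical.

```python
import math
import random


def mic_adjusted(mic_pred, cor, sd_change, prop_improved):
    """Terluin et al. adjustment of MICpred for a proportion improved != 50%."""
    log_odds_imp = math.log(prop_improved / (1.0 - prop_improved))
    return mic_pred - (0.090 + 0.103 * cor) * sd_change * log_odds_imp


def quantile(values, q):
    """Linear-interpolation quantile (R's default, type 7)."""
    v = sorted(values)
    h = (len(v) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (h - lo) * (v[hi] - v[lo])


def bootstrap_estimates(rows, statistic, n_boot=1000, seed=1):
    """Recompute `statistic` on n_boot resamples of rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    return [statistic([rows[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]


def percentile_ci(estimates, level=0.95):
    """CI as the (1 - level)/2 and 1 - (1 - level)/2 quantiles of the estimates."""
    alpha = (1.0 - level) / 2.0
    return quantile(estimates, alpha), quantile(estimates, 1.0 - alpha)


def difference_ci(estimates_a, estimates_b, level=0.95):
    """CI of the differences (b minus a) between paired bootstrap estimates."""
    diffs = [b - a for a, b in zip(estimates_a, estimates_b)]
    return percentile_ci(diffs, level)
```

When the proportion improved is exactly 50%, the log-odds term is 0 and `mic_adjusted` returns MICpred unchanged. When more than 50% are improved, as in this cohort, it lowers the threshold. A difference CI that excludes 0 would be read as a statistically significant change between timepoints.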
We additionally calculated interpretation threshold values using the ROC method, enabling the comparison of the predictive modeling method with this traditional method. Optimal values were identified using the Youden index (19).
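For comparison, a minimal sketch of the ROC threshold chosen with the Youden index (sensitivity + specificity − 1) is shown below. It assumes that observed scores serve as candidate cut-offs, that a score at or above the cut-off counts as positive, and that ties are broken by keeping the lowest cut-off. These conventions vary between software packages, and the function name is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1.

    A score >= cut-off is classified as positive (e.g. importantly improved).
    Candidate cut-offs are the observed scores; ties keep the lowest cut-off.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        sens = sum(s >= c for s in pos) / len(pos)
        spec = sum(s < c for s in neg) / len(neg)
        j = sens + spec - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

Unlike the predictive modeling threshold, this cut-off can only take observed score values, which is one reason its bootstrap distribution tends to be coarser.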
Correlations were calculated to assess anchor validity. Point-biserial correlation was calculated for dichotomized MIC anchors and the change in OKS, and for both PASS and TF anchors and the postoperative scores. Polyserial correlation was additionally calculated for the 7-level MIC anchor responses and change, plus preoperative and postoperative OKS.
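The point-biserial correlation used for anchor validity is the Pearson correlation between a 0/1 anchor and a continuous score. A short sketch follows (the function name is hypothetical). The polyserial correlation requires a latent-variable maximum likelihood fit and is not reproduced here.

```python
import math


def point_biserial(binary, continuous):
    """Point-biserial correlation: Pearson correlation with a 0/1 variable."""
    n = len(binary)
    mb = sum(binary) / n
    mc = sum(continuous) / n
    cov = sum((b - mb) * (c - mc) for b, c in zip(binary, continuous))
    vb = sum((b - mb) ** 2 for b in binary)
    vc = sum((c - mc) ** 2 for c in continuous)
    return cov / math.sqrt(vb * vc)
```

For instance, `point_biserial([0, 0, 1, 1], [1, 2, 3, 4])` equals 2/√5 ≈ 0.894.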
We investigated baseline dependency of MIC values by randomly splitting the OKS item set into 2 separate scales, using one scale to stratify patients into low and high baseline subgroups and the other scale to calculate MIC values (20). Splitting the OKS creates an independent second baseline measurement, which avoids the redistribution of measurement error that occurs when stratifying on the baseline score itself. Because results may vary with the exact division of the items, we repeated the random splitting of the OKS 5 times and estimated the average MIC value for each baseline subgroup, as recommended (20). Baseline dependency of PASS and TF values was investigated by calculating these values in median-split datasets. Finally, we tested whether threshold values differed statistically significantly between the baseline subgroups by performing item-split (MIC) and median-split (PASS and TF) analyses on 1,000 bootstrap samples. We calculated mean differences and reported 95% CI as the 0.025–0.975 quantiles of the differences. For all analyses, R version 4.1.2 (http://www.r-project.org/) was used.
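The item-split step can be sketched as below, for one random split. Several details are assumptions and not taken from the text: the function names are hypothetical, patients at or below the median half-A baseline sum form the low subgroup, and half-B change scores are left on their half-scale range rather than rescaled to 0–48. The MIC for each subgroup would then be computed from these change scores with the predictive modeling method (not repeated here). The whole procedure would be repeated 5 times and the subgroup MIC values averaged.

```python
import random
import statistics


def random_item_split(n_items=12, rng=None):
    """Randomly split item indices into two halves (6 + 6 for the 12-item OKS)."""
    rng = rng or random.Random()
    idx = list(range(n_items))
    rng.shuffle(idx)
    half = n_items // 2
    return sorted(idx[:half]), sorted(idx[half:])


def stratify_by_half(pre_items, half_a):
    """Median-split patients on the baseline sum of the half-A items.

    Assumption: patients at or below the median form the low subgroup.
    """
    sums = [sum(row[i] for i in half_a) for row in pre_items]
    med = statistics.median(sums)
    low = [k for k, s in enumerate(sums) if s <= med]
    high = [k for k, s in enumerate(sums) if s > med]
    return low, high


def half_change_scores(pre_items, post_items, half_b, patients):
    """Change scores on the half-B items only (not rescaled), for given patients."""
    return [sum(post_items[k][i] - pre_items[k][i] for i in half_b)
            for k in patients]
```

Because the stratifying half and the scoring half share no items, measurement error in the stratifying score is not carried into the change scores used for the MIC.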
This study was carried out in accordance with the Declaration of Helsinki. The local arthroplasty registry has been approved by the Danish Data Protection Agency (Journal number HVH-2012-048). In Denmark, register-based studies using only questionnaire data require no approval from an ethics committee. The Department of Orthopaedic Surgery at the hospital fully funded this project. No potential conflicts of interest are declared by the authors in relation to this study.
Complete data were obtained for 331 of 423 (78%), 340 of 479 (71%), and 235 of 338 (70%) patients at 3-, 12-, and 24-month follow-up, respectively (Figure 1). At surgery, patients responding at the follow-up timepoints had a median age of 68–69 years and 58–59% were female (Table 2). Patients with complete data and patients with missing data were of similar age. Patients with missing data were more often male in the 3-month group, had higher BMI in the 12-month group, and had worse preoperative OKS and lower overall self-rated health in the 12- and 24-month groups compared with patients with complete data (Table 3, see Supplementary data).
Factor | 3 months n = 331 | 12 months n = 340 | 24 months n = 235 |
Age a | 69 (61–74) | 68 (61–74) | 68 (60–74) |
Female sex | 194 (59) | 196 (58) | 138 (59) |
BMI a | 29 (26–34) | 29 (25–33) | 28 (25–32) |
ASA | |||
1 | 20 (6) | 26 (8) | 23 (10) |
2 | 257 (78) | 255 (75) | 176 (75) |
3 | 53 (16) | 58 (17) | 36 (15) |
4 | 1 (0) | 1 (0) | – |
KL grade | |||
2 | 3 (1) | 10 (3) | 13 (5) |
3 | 79 (24) | 92 (27) | 68 (29) |
4 | 249 (75) | 238 (70) | 154 (66) |
OKS a | 23 (17–28) | 24 (19–28) | 24 (19–29) |
EQ5D index a | 0.66 (0.59–0.72) | 0.72 (0.62–0.72) | 0.72 (0.63–0.72) |
EQ5D VAS a | 70 (50–80) | 70 (50–80) b | 70 (51–80) b |
a Values are median (0.025–0.975 quantile range); other values are count (%). b Missing data, n = 1. KL grade: Kellgren & Lawrence classification. OKS: Oxford Knee Score. EQ5D: EuroQol 5-Dimension. VAS: visual analog scale. |
Figure 1. Flowchart of patients enrolled. OKS, Oxford Knee Score.
At 3 months postoperatively, the overall percentage of patients reporting important improvements was 87%, while 4% reported being importantly deteriorated. 89% of patients reported important improvements at 12 and 24 months, while 4% and 5% reported being importantly deteriorated, respectively (Table 4).
Postoperative OKS change scores were generally higher for patients feeling importantly improved, in comparison with those feeling importantly deteriorated or unchanged in symptoms (Figure 2).
Figure 2. Oxford Knee Score change scores at 3, 12, and 24 months postoperatively by minimal important change anchor question response category, ranging from “better, an important improvement” to “worse, an important deterioration.” Horizontal bars present the median, the box the interquartile range (IQR), the whiskers the maximum and minimum scores within 1.5 * IQR from the box, and • represents outliers.
At 3 months postoperatively, 82% considered their symptoms satisfactory, while 4% considered their symptom state so unsatisfactory that they considered the treatment to have failed. At 12 and 24 months, the proportions of patients satisfied with their symptom level were 83% and 85%, while 8% and 9% considered the treatment to have failed, respectively (Table 5).
Postoperative OKS were generally higher for patients considering their symptom level satisfactory than for those considering the treatment to have failed or neither (Figure 3).
Figure 3. Oxford Knee Score distribution at 3, 12, and 24 months postoperatively for patients with satisfactory symptoms, considering the treatment to have failed, or neither. See Figure 2 for boxplot interpretation.
The point-biserial correlations between the dichotomized MIC anchor and the change in OKS were 0.43, 0.49, and 0.56 at 3, 12, and 24 months. Correlations between the PASS and TF anchor questions and the postoperative OKS were 0.55 and 0.33 at 3 months, 0.67 and 0.53 at 12 months, and 0.67 and 0.59 at 24 months, respectively. Polyserial correlations between MIC anchor responses and the change, preoperative, and postoperative OKS, as well as point-biserial correlations between PASS and TF anchor responses and preoperative and postoperative OKS, are presented in Table 6 (see Supplementary data).
When adjusted for the high proportion of improved patients, the OKS MIC values were 4.7 (CI 3.3–6.0) at 3 months, 7.1 (CI 5.2–8.6) at 12 months, and 5.4 (CI 3.4–7.3) at 24 months postoperatively. When adjusted for the high proportion having satisfactory symptoms, the OKS PASS values were 28.9 (CI 27.6–30.3), 32.7 (CI 31.5–33.9), and 31.3 (CI 29.1–33.3) at 3, 12, and 24 months, respectively. When adjusted for the small proportion considering their treatment to have failed, the OKS TF values were 24.4 (CI 20.7–27.4) at 3 months, 29.3 (CI 27.3–31.1) at 12 months, and 28.5 (CI 26.0–30.5) at 24 months (Table 7).
Follow-up | n | MIC value (CI) a | PASS value (CI) a | TF value (CI) a |
3 months | 331 | 4.7 (3.3–6.0) | 28.9 (27.6–30.3) | 24.4 (20.7–27.4) |
12 months | 340 | 7.1 (5.2–8.6) | 32.7 (31.5–33.9) | 29.3 (27.3–31.1) |
24 months | 235 | 5.4 (3.4–7.3) | 31.3 (29.1–33.3) | 28.5 (26.0–30.5) |
a 95% confidence intervals (CI) are the 0.025–0.975 quantiles of the 1,000 bootstrap threshold values. |
The interpretation threshold values increased statistically significantly from 3 to 12 months, but not from 12 to 24 months postoperatively (Table 8, see Supplementary data).
The interpretation threshold values were consistently higher for patients in the low baseline subgroup than in the high baseline subgroup for all postoperative timepoints (Table 9, see Supplementary data).
Interpretation threshold values calculated with the adjusted predictive modeling method were lower and the CIs were generally narrower compared with the ROC method (Table 10, see Supplementary data).
This cohort study from a Danish public hospital estimated interpretation threshold values for the OKS at 3-, 12-, and 24-month follow-up in patients undergoing UKA. Adjusted MIC values were 4.7, 7.1, and 5.4 points, adjusted PASS values were 28.9, 32.7, and 31.3 points, and adjusted TF values were 24.4, 29.3, and 28.5 points at 3, 12, and 24 months postoperatively, respectively. All values increased statistically significantly from 3 to 12 months but not from 12 to 24 months.
The adjusted OKS MIC values we found lie within the range of previously published values, although no studies have previously determined these values exclusively in patients undergoing UKA. 2 studies using the same methodological approach as ours found values of 7 and 8 points at 6 and 12 months, respectively, in patients undergoing TKA (12,16). Other TKA studies, using different anchor questions and statistical approaches, found values of 9 points at 6 months and 5 points at 12 months (21,22). These findings suggest that postoperative OKS MIC values are generally similar in patients undergoing UKA and TKA, but that the values may depend on the statistical method used (12,16,21,22). We found a small but statistically significant increase in the adjusted OKS MIC value from 3 to 12 months postoperatively, suggesting that patients' expectations of pain levels and knee function increase with time after UKA. However, the non-significant difference between the 12- and 24-month values suggests that these expectations may stabilize after 12 months.
To our knowledge, only 1 study has previously determined OKS PASS values in people undergoing UKA, at 24 months postoperatively (11). That study proposed a cut-off value of 41.5 points, which is 10.2 points higher than our 24-month finding. The large difference could be explained by the use of ROC analysis in a population where the proportion improved was very high (92.7%), possibly biasing that study's value upward (11). In contrast, our adjusted OKS PASS and TF values at 3 and 12 months postoperatively are within 3 points of the cut-offs previously proposed in a study using the same method in patients undergoing TKA (13). These findings suggest that OKS PASS and TF values are similar in patients undergoing UKA and TKA. Both adjusted OKS PASS and TF values increased from 3 to 12 months, suggesting that patients accept a higher symptom level early after surgery but require better functional status at 12 months postoperatively. Additionally, the adjusted OKS TF thresholds were 2.8 to 4.5 points below the PASS thresholds, suggesting that the range in which people neither consider their symptom level satisfactory nor consider their treatment to have failed is narrow and perhaps redundant. However, the low number of patients considering their treatment to have failed at 3 (n = 13 [4%]), 12 (n = 26 [8%]), and 24 months (n = 22 [9%]) makes these interpretations uncertain.
We demonstrated that different statistical approaches yield different interpretation threshold values. First, the predictive modeling method derived cut-offs with greater precision (i.e., narrower CIs) than the ROC method (17). Second, we demonstrated how the adjustment altered the cut-offs when the proportion of patients being importantly improved, having a satisfactory symptom state, or considering their treatment to have failed differed greatly from 50% (18). These findings align with previous studies and support preferring the predictive modeling method over the ROC method (12,13,16).
Preoperative symptom status affects the interpretation threshold values. We demonstrated baseline dependency of the threshold values at all postoperative timepoints, except for TF at 3 months. Likewise, previous studies using a similar methodological approach found baseline dependency of OKS PASS and TF values in patients undergoing TKA (13,23). For the MIC, previous results are sparse and conflicting, with different methods used to evaluate baseline dependency (16,24). We also demonstrated baseline dependency of MIC values, using a newly developed method that avoids redistributing measurement error (20). The adjusted predictive modeling cut-offs for the low baseline subgroups were 4.0 to 6.4 points lower than those for the high baseline subgroups. These findings support the notion that patients in poor health need greater improvement to consider their change important but are concurrently willing to accept an overall worse outcome than patients in better health (25). The implication of baseline dependency is that, when applying the threshold values, one should select values derived from a patient population whose preoperative status is comparable to that of the population under study.
Providing meaningful interpretation threshold values for the OKS has both scientific and clinical implications. The values can improve the interpretation of studies using the OKS as an outcome measure. Additionally, they provide arthroplasty registries collecting the OKS with a tool to monitor quality of treatment from a patient-centered perspective. From a clinical perspective, the values at 3, 12, and 24 months postoperatively may be used as reference values for what the “average” patient undergoing UKA would consider an important improvement, a satisfactory symptom state, and treatment failure. If the OKS is used in clinical practice, these interpretation thresholds could give clinicians and patients a better shared understanding of scores in the shared decision-making process. Our study suggests that OKS scores may be interpreted using similar interpretation threshold values in UKA and TKA populations.
This study has limitations. Because the data were collected at 1 public hospital in Denmark, the generalizability of the interpretation threshold values may be limited. Furthermore, between 70% and 78% of the patients receiving a UKA provided complete data, possibly introducing selection bias and further limiting generalizability; patients answering the follow-up questionnaires may be those generally satisfied with their treatment result. However, the hospital's uptake area, which covers both urban and rural geographical areas, and patient characteristics resembling those in the nationwide Danish Knee Arthroplasty Register support the representativeness of our study population in a Danish context (26). Additionally, because the adjusted predictive modeling method assumes normally distributed scores and change scores, our values could be biased: skewness in either direction may bias the MIC downward, whereas right-skewed or left-skewed scores may bias the PASS and TF downward or upward, respectively. Finally, before the suggested values are applied in other countries and cultures, they should be compared with similar data, preferably from large-scale international registries.
In conclusion, we believe the development of UKA-specific measurement properties and clinical thresholds for the OKS may guide the interpretation of UKA studies using this PROM. Additionally, all values increased from 3 to 12 months postoperatively, implying that patients have higher expectations regarding their knee pain and function in the longer term. Similar studies should investigate the external validity of these values.
Factor | Non-responders 3 months n = 92 | Responders 3 months n = 331 | p-value | Non-responders 12 months n = 139 | Responders 12 months n = 340 | p-value | Non-responders 24 months n = 103 | Responders 24 months n = 235 | p-value |
Age a | 67 (59–74) | 69 (61–74) | 0.3 | 67 (58–74) | 68 (61–74) | 0.2 | 68 (60–74) | 68 (60–74) | 0.9 |
Female sex | 41 (45) | 194 (59) | 0.01 | 72 (52) | 196 (58) | 0.3 | 53 (52) | 138 (59) | 0.3 |
BMI a | 30 (26–34) | 29 (26–34) | 0.1 | 30 (26–35) | 29 (25–33) | 0.01 | 29 (26–34) | 28 (25–32) | 0.2 |
ASA | |||||||||
1 | 3 (3) | 20 (6) | 0.03 | 12 (9) | 26 (8) | 0.4 | 7 (7) | 23 (10) | 0.1 |
2 | 62 (68) | 257 (78) | 95 (68) | 255 (75) | 71 (69) | 176 (75) | |||
3 | 27 (29) | 53 (16) | 32 (23) | 58 (17) | 25 (24) | 36 (15) | |||
4 | – | 1 (0) | – | 1 (0) | – | – | |||
KL grade | |||||||||
2 | 2 (2) | 3 (1) | 0.03 | 10 (7) | 10 (3) | 0.4 | 7 (7) | 13 (5) | 0.4 |
3 | 24 (26) | 79 (24) | 40 (29) | 92 (27) | 23 (22) | 68 (29) | |||
4 | 66 (72) | 249 (75) | 89 (64) | 238 (70) | 73 (71) | 154 (66) | |||
OKS a | 21 (16–28) b | 23 (17–28) | 0.3 | 19 (15–24) d | 24 (19–28) | 0.01 | 20 (16–26) g | 24 (19–29) | 0.01 |
EQ5D index a | 0.69 (0.57–0.77) b | 0.66 (0.59–0.72) | 0.5 | 0.66 (0.50–0.72) e | 0.72 (0.62–0.72) | 0.1 | 0.66 (0.34–0.72) h | 0.72 (0.63–0.72) | 0.09 |
EQ5D VAS a | 70 (50–80) c | 70 (50–80) | 1 | 60 (39–80) e | 70 (50–80) f | 0.05 | 50 (41–79) h | 70 (51–80) f | 0.01 |
Abbreviations: see Table 1. P-values calculated with the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. a Numbers are median (0.025–0.975 quantile range). b–h Missing data, b n = 58, c n = 59, d n = 79, e n = 78, f n = 1, g n = 58, h n = 57. |