Giuseppe RINONAPOLI, Lorenzo LUCCHETTA, Giulio ANCILLAI, Francesco MANFREDA, Paolo CECCARINI, Auro CARAFFA
Orthopedics and Traumatology Department, University of Perugia, S. Maria della Misericordia Hospital, Perugia, Italy
Background and purpose — We aimed to evaluate the diagnostic accuracy of 6 clinical tests for meniscal tears comparing them with MRI and arthroscopy in a cross-sectional study.
Methods — 255 patients (20–45 years) with knee trauma were examined by 2 orthopedic surgeons blinded to the patient’s history, MRI result, and the first clinical examination. The clinical tests (Joint Line Tenderness, McMurray, Apley, Thessaly, Ege, and Hyper-flexion) were conducted between 5 and 7 days post-injury (T1) and 4–5 weeks post-injury (T2). Diagnostic accuracy was determined based on MRI and arthroscopic findings, evaluating sensitivity, specificity, and predictive values.
Results — Arthroscopy confirmed 188 meniscal tears. The McMurray test demonstrated the most balanced performance: sensitivity fell from 91% at T1 to 80% at T2, while specificity rose from 55% to 79%, and it showed the highest positive predictive value (PPV), 92% at T2. Combining McMurray and Apley yielded the best accuracy, minimizing false positives. McMurray and Hyper-flexion were more sensitive to medial chondropathy; Thessaly, Ege, and Hyper-flexion were more influenced by anterior knee pain.
Conclusion — No single clinical test was sufficiently reliable for independent diagnosis, reinforcing the need for MRI confirmation and further refinement of clinical evaluation strategies.
Citation: Acta Orthopaedica 2025; 96: 452–458. DOI: https://doi.org/10.2340/17453674.2025.43906.
Copyright: © 2025 The Author(s). Published by MJS Publishing – Medical Journals Sweden, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Submitted: 2025-03-23. Accepted: 2025-05-16. Published: 2025-06-25.
Correspondence: DrLorenzoLucchetta@pm.me
GR, LL, and PC: wrote the manuscript; FM and AC: revised the draft and reviewed the literature; AC and GR: planned the surgical case and followed up the patients; GR and AC: planned and performed the surgery; LL and GA: conceived the manuscript and followed up the patients.
Handling co-editor: Per Henrik Randsborg
Acta thanks Srinivas B S Kambhampati, Michael Schneider, and Franky Steenbrugge for help with peer review of this manuscript.
Arthroscopic meniscus procedures have become one of the most common orthopedic procedures performed in the United States. Given an estimated meniscal injury incidence of 60 per 100,000 population, appropriate diagnostic management is essential [1,2].
Along with a carefully taken history, meniscal tests are the most reliable and cost-effective tools for diagnosing meniscal injury. The clinical diagnosis of a meniscal lesion relies heavily on the clinician’s expertise and ability to interpret patient-reported symptoms and signs. Moreover, diagnosing meniscal tears in the acute phase can be particularly challenging due to pain, swelling, and limited range of motion.
Traditionally, non-weight-bearing tests such as McMurray’s and Apley’s have been widely used, while weight-bearing alternatives, such as the Thessaly and Ege’s tests, have gained attention due to their potential to better mimic functional knee loading [3-5]. Recent literature reviews and meta-analyses have attempted to evaluate the diagnostic accuracy of these tests [4-6]; however, there remains significant debate regarding their reliability, with no single test demonstrating consistently high sensitivity and specificity [7]. Some authors continue to debate the routine use of MRI before arthroscopy, suggesting that in cases of clinically positive knee findings, avoidance of this could help reduce costs and prevent unnecessary invasive procedures [8,9].
Given these challenges, our study aims to provide an updated evaluation of 6 of the most widely used clinical meniscal tests, Joint Line Tenderness, McMurray, Apley, Thessaly, Ege, and Hyper-flexion, by analyzing their diagnostic performance in an acute and a sub-acute setting, against MRI and arthroscopy findings.
This is a retrospective cross-sectional study performed between 2020 and 2023 on all patients between 20 and 45 years old with any signs of knee trauma who were referred to the Department of Orthopedics and Traumatology, Santa Maria della Misericordia Hospital, Perugia. Orthopedic surgeons FM and GA collected each patient’s test results and included in the study only those patients with concordant test results between the 2 blinded examiners (GR, AC).
The arthroscopic procedures were performed by the 2 senior orthopedic surgeons (GR, AC). The arthroscopic findings and associated lesions diagnosed during the arthroscopies were recorded as a conclusive diagnosis. Patients with incomplete, missing, or discordant results between the 2 blinded examiners were excluded based on a predefined secondary exclusion criterion.
The study is reported according to STARD guidelines.
Inclusion criteria were any signs of knee trauma involving the meniscus or cruciate ligament regardless of the presence of a diagnostic MRI. Patients with a diagnosis of anterior cruciate ligament (ACL) tear were included in the study. The primary exclusion criteria were diagnosed osteoarthritis, rheumatic diseases, psoriatic arthritis, previous articular fractures, previous surgery, and previous septic knee arthritis.
Patients with knee pain following an acute knee injury were clinically examined by a first orthopedic surgeon (FM) and underwent magnetic resonance imaging (MRI). This first examiner gathered information regarding the patient’s history of pain and previous pathologies of the affected knee. Patients who reported at least 3 episodes of anterior knee pain over the previous 3 months, occurring in the absence of trauma, lasting more than 48 hours, and matching the characteristics of patellofemoral pain syndrome (PFS), were classified as “anterior knee pain patients” (AKP). PFS was characterized by pain behind or around the patella and crepitations, provoked by ascending or descending stairs, squatting, prolonged sitting with flexed knees, running, and cycling. PFS patients were excluded from the AKP group if patellofemoral chondropathy was found at arthroscopy.
All patients who presented with mechanical locking of the knee (n = 56) were included in a separate group (group L). Mechanically locked knees were defined as knees that demonstrate fixed flexion or have a blockage of full extension that cannot be passively straightened [10].
Patients were clinically tested in 2 time windows, between the fifth and the seventh day (T1) and between the fourth and the fifth week (T2) from injury, by 2 experienced orthopedic surgeons (GR, AC) who had 10 years of experience in knee surgery with more than 2,000 arthroscopies and were blinded to patient history, MRI results, and the first clinical examination.
Hyper-Flexion Test [11]. With the patient supine, the knee is passively mobilized in flexion. The test is positive if the patient reports pain beyond 120° of flexion.
Joint Line Tenderness (JLT) [11]. With the patient supine and the knee bent at 90°, the examiner palpates along the joint line from the medial side of the patellar ligament to the back of the knee to detect specific points of tenderness that exceed normal discomfort. The finding is then compared with the contralateral joint. The test was considered positive if the point of maximal tenderness exceeded normal discomfort and was more tender than the same anatomic location on the contralateral leg. Reported specificity was 29–77% with sensitivity of 63–76% [4,5].
McMurray Test [12]. The patient lies supine with the knee fully bent. The examiner externally rotates the tibia while extending the knee to elicit any clicking or pain along the joint line, which may indicate a medial meniscal tear. To check the lateral meniscus, the maneuver is the same but with the tibia internally rotated. The test was considered positive if the patient experienced pain in the meniscal compartment during rotation of the tibia. Reported specificity was 71–98% with sensitivity of 16–71% [4,5].
Apley Test [13]. The patient lies prone. The examiner bends the knee at 90° and then internally rotates the lower leg while applying downward pressure to test the lateral meniscus; the maneuver is then repeated on external rotation to test the medial meniscus. The test was considered positive if during the compression the patient experienced pain in the meniscal compartment. Reported specificity was 70–94% with sensitivity of 13–61% [4,5].
Thessaly Test [3]. While standing, the patient rotates the knee and body first with the knee bent at 5° and then at a 20° angle. A positive test for meniscal tear is suggested by joint line discomfort or a locking/catching sensation. Reported specificity was 65–90% with sensitivity of 64–90% [4,5].
Ege’s Test [3]. In a standing position, the patient squats with the legs at maximal external rotation to evaluate the medial meniscus and then repeats the movement with the legs at maximal internal rotation for the lateral meniscus. The test is positive for meniscal tears as indicated by pain, or a clicking sound around ≥ 90° of knee flexion for posterior horn tears, whereas for anterior horn located tears the symptoms are in earlier knee flexion. Reported specificity was 77% with sensitivity of 65% [4,5].
Indication for arthroscopic procedure was given in the following cases: (i) positivity of clinical examination (for meniscal lesion or ACL tear), confirmed by positivity of MRI, (ii) positive MRI, combined with uncertain clinical diagnosis, (iii) positive clinical examination, in association with negative MRI. Each MRI image was examined by a senior diagnostic radiologist.
Finally, orthopedic surgeons FM and GA collected the result of each patient’s test data and included in the study only those patients who presented concordant test results between the 2 blinded examiners (GR, AC) (secondary exclusion criterion). The arthroscopic procedures were performed by the 2 senior orthopedic surgeons (GR, AC). The arthroscopic findings and associated lesions diagnosed during the arthroscopies were recorded as a conclusive diagnosis.
Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (+LR), negative likelihood ratio (–LR), diagnostic odds ratio (DOR), and overall accuracy were calculated, each with its 95% confidence interval (CI).
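As an illustration of how these measures follow from a 2×2 table, the sketch below computes them from the true/false positive and negative counts reported for the McMurray test at T2. The function names are hypothetical, and the Wilson score interval is an assumption chosen for the example: the text does not state which confidence-interval method the statistical software applied.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion.

    Assumption: the source does not name its CI method; Wilson is used
    here only as a common choice for proportions.
    """
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_measures(tp, tn, fp, fn):
    """Standard diagnostic accuracy measures from 2x2 counts."""
    sens = tp / (tp + fn)          # true positives among diseased
    spec = tn / (tn + fp)          # true negatives among non-diseased
    lr_pos = sens / (1 - spec)     # positive likelihood ratio
    lr_neg = (1 - sens) / spec     # negative likelihood ratio
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,    # diagnostic odds ratio
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Counts reported for the McMurray test at T2
m = diagnostic_measures(tp=151, tn=53, fp=14, fn=37)
sens_lo, sens_hi = wilson_ci(151, 151 + 37)

for name, value in m.items():
    print(f"{name}: {value:.3f}")
print(f"sensitivity 95% CI (Wilson): {sens_lo:.3f} to {sens_hi:.3f}")
```

Interval bounds from a different method (for example, exact Clopper-Pearson) may differ slightly from the Wilson bounds printed here.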
Different subgroup analyses were conducted to evaluate whether a favorable combination of tests could reduce false positives and whether test performance differed between lateral and medial meniscus pathologies. For combinations of 2 or more tests, we considered the combination negative if even 1 test was negative; by construction, this rule tends to increase the number of true negatives while decreasing the number of false positives.
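The combination rule described here (a combination counts as negative if any single test is negative) can be sketched as follows. The data are invented for illustration and are not study data; `combine_and` and `confusion` are hypothetical helper names.

```python
def combine_and(*test_results):
    """Combined result is positive only if every individual test is positive."""
    return [all(row) for row in zip(*test_results)]


def confusion(predicted, truth):
    """Return (TP, TN, FP, FN) for boolean predictions against the reference."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    return tp, tn, fp, fn


# Hypothetical toy data: 4 knees with a tear, 4 without (NOT study data)
truth = [True, True, True, True, False, False, False, False]
test_a = [True, True, True, False, True, True, False, False]
test_b = [True, True, False, True, True, False, True, False]

combined = combine_and(test_a, test_b)

print("test A   (TP, TN, FP, FN):", confusion(test_a, truth))
print("test B   (TP, TN, FP, FN):", confusion(test_b, truth))
print("combined (TP, TN, FP, FN):", confusion(combined, truth))
```

In this toy example each single test yields 2 false positives and 1 false negative, whereas the combination yields 1 false positive and 2 false negatives. Under this rule a combined positive requires every test to be positive, so the combination can never produce more positives (true or false) than any one of its component tests.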
Lastly, the associated lesions were combined with test results using a regression model, to better understand their statistical influence on each clinical test. A P value < 0.05 was considered significant.
Jamovi software (version 2.5; https://www.jamovi.org/) was used to perform the statistical data elaborations.
The study is retrospective, and it follows the Helsinki ethical principles for appropriate medical research. Informed consent was obtained from all participants, and their rights were protected in accordance with ethical guidelines. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors declare no conflict of interest. Complete disclosure of interest forms according to ICMJE are available on the article page, doi: 10.2340/17453674.2025.43906
From the initial pool of 400 patients, 14 met the primary exclusion criteria (7 previous knee surgery, 5 diagnosed osteoarthritis, 1 rheumatoid arthritis, 1 psoriatic arthritis) and 56 patients with a mechanically locked knee were assigned to Group L. Of the remaining 330 patients, 48 were excluded because there was no indication for arthroscopy and 27 were excluded due to discordance in the test assessment between the 2 examiners, resulting in 255 patients who underwent an arthroscopic procedure (Figure).

Figure. Flowchart of patient selection.
Among the 255 patients included in the study, 191 were male and 64 were female. The mean age was 28.1 years, standard deviation (SD) 6.3 (range 20–45). Arthroscopy was performed an average of 56 days from injury. In the 255 cases examined, the clinical tests were compared with the arthroscopic findings, using arthroscopy as the gold standard for meniscal tear diagnosis.
In 188 cases meniscal tears were diagnosed and then compared with each patient’s clinical and MRI findings.
McMurray had the most balanced performance across both physicians’ assessments, with a sensitivity of 91% and specificity of 55% at T1, and of 80% and 79% respectively at T2. It also exhibited the highest positive predictive value (PPV) at 92% at T2, making it highly reliable for confirming meniscal lesions (Tables 1 and 2).
| Tests | TP | TN | FP | FN | SEN (CI) | SPE (CI) | PPV (CI) | NPV (CI) | ACC % | LR+ | LR– | Pre-t O % | Post-t O % | Post-t PROB % |
| JLT | 170 (90) | 34 (51) | 33 (49) | 18 (9.6) | 90 (85–94) | 51 (38–63) | 84 (78–88) | 65 (51–78) | 81 | 1.8 | 0.18 | 2.1 | 5.1 | 84 |
| McMurray | 151 (80) | 53 (79) | 14 (21) | 37 (20) | 80 (74–86) | 79 (67–88) | 92 (86–95) | 59 (48–69) | 80 | 3.8 | 0.25 | 2.8 | 11 | 92 |
| Apley | 160 (85) | 50 (75) | 17 (25) | 28 (15) | 85 (79–90) | 75 (62–84) | 90 (85–94) | 64 (52–74) | 82 | 3.3 | 0.20 | 2.8 | 9.4 | 90 |
| Thessaly | 172 (91) | 44 (66) | 23 (34) | 16 (8.5) | 91 (86–95) | 66 (53–77) | 88 (83–92) | 73 (60–84) | 85 | 2.6 | 0.13 | 2.8 | 7.4 | 88 |
| Ege | 168 (89) | 41 (61) | 26 (39) | 20 (11) | 89 (84–93) | 61 (48–73) | 87 (81–91) | 67 (54–78) | 83 | 2.3 | 0.17 | 2.9 | 7.3 | 88 |
| Hyper-Fl. | 148 (79) | 32 (48) | 35 (52) | 40 (21) | 79 (72–84) | 48 (36–60) | 81 (74–86) | 44 (33–57) | 71 | 1.5 | 0.44 | 2.8 | 4.2 | 81 |
| For abbreviations, see Table 1 | ||||||||||||||
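The pre-test odds, post-test odds, and post-test probability columns are linked by the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch, assuming the pre-test odds are taken as the ratio of arthroscopy-confirmed tears to knees without tears (188 to 67) and using the McMurray counts from the table; the function names are hypothetical.

```python
def pre_test_odds(n_with_tear, n_without_tear):
    """Odds of a tear before testing, from the reference-standard counts."""
    return n_with_tear / n_without_tear


def post_test(pre_odds, likelihood_ratio):
    """Return (post-test odds, post-test probability) after a test result."""
    odds = pre_odds * likelihood_ratio
    return odds, odds / (1 + odds)


# Reference standard: 188 knees with a tear, 67 without
pre = pre_test_odds(188, 67)

# McMurray: positive likelihood ratio = sensitivity / (1 - specificity)
lr_pos = (151 / 188) / (14 / 67)

odds, prob = post_test(pre, lr_pos)
print(f"pre-test odds:         {pre:.1f}")
print(f"post-test odds:        {odds:.1f}")
print(f"post-test probability: {prob:.0%}")
```

For a positive result the post-test probability computed this way coincides with the PPV, which is why the last column of the table mirrors the PPV column.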
The Thessaly test achieved a sensitivity of 98% at T1 and 91% at T2 and showed a marked improvement in specificity, from 27% at T1 to 66% at T2. Similarly, the Apley test showed a marked increase in specificity, from 33% at T1 to 75% at T2, while its sensitivity decreased from 96% to 85%. Pain on Hyper-Flexion exhibited the lowest precision, with a substantial drop in sensitivity from 98% at T1 to 79% at T2.
The best combination of tests was found to be McMurray and Apley, yielding 20 false positives out of 67 cases without lesions. Among combinations of 3 tests, McMurray–Apley–Thessaly and Apley–Thessaly–Ege showed the best outcomes with 26 and 27 false positives, respectively. The full table of test combinations is available in Supplementary Tables 4–7.
Meniscal lesions affected the medial meniscus (MM) in 56% of cases and the lateral meniscus (LM) in 24%.
When the test results at T2 were compared with the affected meniscus, the Joint Line Tenderness test had a sensitivity of 92% for the MM and a specificity of 81% for the LM. Similarly, the Apley test showed a sensitivity of 87% for the MM and a specificity of 90% for the LM. The subgroup comparisons of each test for medial-only or lateral-only meniscus injury are shown in Supplementary Tables 4 and 5.
The Thessaly, McMurray, and Apley tests showed the strongest reliability for detecting meniscal lesions, with high R² values (0.432, 0.419, and 0.415, respectively) and significant P values (< 0.001) (Table 3). Regarding the increased number of false positives, McMurray and Hyper-Flexion tests were particularly sensitive to medial chondropathy, while Thessaly, Ege, and Hyper-Flexion tests were most influenced by anterior knee pain (AKP). Moreover, the false positives, i.e., cases in which arthroscopy found no meniscal pathology, were compared with the pathologies diagnosed during arthroscopy (Supplementary Table 7).
Cartilage lesions of the medial compartment accounted for 60% and 50% of Joint Line Tenderness’s false positives and cartilage lesions of the lateral compartment accounted for 50% of Ege and Thessaly’s false positives. Additionally, a notable percentage ranging from 31% to 50% of false positives was associated with ACL tears and 20% to 45% for anterior knee pain across various tests.
Of the 56 patients in group L, 42 (75%) had a bucket-handle lesion of the medial meniscus, 7 had a flap lesion of the body of the medial meniscus, 2 had a flap lesion of the posterior horn of the lateral meniscus, and 5 had no lesions. Overall, a sensitivity and specificity of 91% was obtained.
We aimed to evaluate the diagnostic accuracy of 6 clinical tests for meniscal tears, comparing them with MRI and arthroscopy in a cross-sectional study. Several studies in the literature report the accuracy of clinical tests used to diagnose meniscal tears; however, no single test has proven to be entirely reliable [14-16].
Our findings confirmed that none of the 6 tests was sufficiently reliable on its own to prevent unnecessary arthroscopies, especially in the acute phase (T1). The Thessaly, McMurray, and Apley tests proved the most reliable for detecting meniscal lesions, as indicated by high β coefficients and significant P values (< 0.001).
The McMurray test had the most balanced performance across both T1 and T2, with improved specificity at T2 (79% vs 55% at T1) and the highest positive predictive value (PPV) of 92%, making it the most reliable for confirming meniscal tears and consistent with the ranges reported in the literature [4,5]. Ege’s test showed promising results in our population, with specificity reaching 83% and sensitivity 66%, confirming data present in the literature (specificity 77–90%, sensitivity 65–67%) [3].
The Thessaly test was highly sensitive (98% at T1 and 91% at T2) and improved notably in specificity at T2 (66% vs 27% at T1). This suggests a dual role, screening in the acute phase and more precise diagnosis in the subacute phase, and confirms its high reliability as reported in the literature [4,5].
Interestingly, we found that the accuracy of the JLT test was comparable to other tests, despite its notably low specificity (51%), as previously reported [17]. This finding is critical when interpreting potential false positives for this test.
The pain on Hyper-Flexion test exhibited the least precision, with the lowest accuracy at 70.5% and a high number of false negatives, marking it as less reliable in clinical evaluation. We nevertheless included it in the statistical evaluation because of the scarcity of published data on its sensitivity and specificity.
Regarding the medial and lateral meniscus, Krakowski et al. [18] found that the McMurray test (88%) and the Thessaly test (70%) had the highest sensitivity for the medial meniscus (MM) and the lateral meniscus (LM), respectively, whereas Ege and Apley showed high accuracy and sensitivity for the MM and the highest accuracy and specificity for the LM. More data should be combined to determine which test has the highest probability of detecting medial or lateral meniscus lesions.
Moreover, even when a combination of tests was considered negative if even 1 test was negative, we found that combining multiple tests reduced either the sensitivity or the specificity, although some authors have suggested that combining tests increases diagnostic accuracy. According to the literature, meniscal tests are usually executed in combination, but without showing an increase in diagnostic accuracy [5,16].
In this study, McMurray and Apley, when combined, yielded the most accurate results with only 20 false positives out of 67 cases without lesions. Among combinations of 3 tests, McMurray–Apley–Thessaly and Apley–Thessaly–Ege achieved the best outcomes with 26 and 27 false positives, respectively.
Combining tests increases the likelihood of confusion, but this does not mean that clinicians should rely on a single test; rather, it underscores the importance of using the most accurate tests while recognizing that each test carries a margin of error.
As regards whether associated lesions, such as ACL tears, chondropathy, or anterior knee pain, could influence the outcomes of these tests [19,20], ACL tears showed a weak association across all tests except JLT, with which they correlated significantly. Medial chondropathy showed a significant correlation with McMurray, while both lateral and medial chondropathy showed a high correlation with Hyper-Flexion. Anterior knee pain showed a significant correlation with all tests except JLT and McMurray, confirming its strong biasing influence in acute knee injuries.
These results should be considered by surgeons to better understand which associated lesions could lead to an increasing number of false positives in each performed test. However, due to the low number of cases, we could not draw any definitive conclusions.
Numerous authors have reported that the mechanically locked knee seems to be related more to cartilage lesions and that its direct relation with meniscal lesions still needs further analysis [21]. In the mechanically locked knee group, there was a high likelihood of finding a torn meniscus, with 51 of 56 cases confirmed as positive. Most of these cases (75%) were due to bucket-handle lesions; no cartilage lesions were recorded.
For both orthopedic surgeons and primary care physicians, MRI has become the most widely used noninvasive imaging modality for detecting meniscal injuries, with diagnostic accuracy rates as high as 98% and higher sensitivity and specificity in medial meniscal lesions than in lateral ones [22]. This examination could be crucial in an acute setting [23], but due to its high cost it is not always available for every patient. Needle arthroscopy (NA) could gain ground as a cost-saving technique, as the cost of MRI still weighs heavily on recommendations for the examination [24].
These cases usually require prompt surgical intervention and, given that MRI wait time could be an obstacle, bypassing it in selected cases not only expedites treatment but also reduces healthcare costs by avoiding unnecessary imaging and shortening waiting lists for surgery.
It should be noted that patient history alone can suggest meniscal tears in approximately 75% of cases [25], and Abdon et al. [19] found that a combination of patient-reported symptoms can improve the predictive value of identifying meniscal lesions to 70–80%.
We deliberately included only the concordant results between the 2 testers to mitigate variability but acknowledge that this strategy may have led to the exclusion of complex or ambiguous cases, constituting a relative limitation. The inclusion of patients only up to 45 years of age represented an intrinsic limitation as we deliberately wanted to reduce the influence of osteoarthritis and degenerative meniscal diseases.
The main limitation of the study was that no formal statistical comparisons between the tests were conducted, so we could not determine whether the differences between them were statistically significant. Future studies should include these analyses to better assess whether the observed differences in sensitivity, specificity, and predictive values reflect real diagnostic superiority or are within statistical variability.
Moreover, due to data scarcity, it was not possible to calculate precisely how much the other conditions of the knee (ACL tears, AKP, chondropathy) influence the results of each clinical test.
We showed that the McMurray test had the most balanced performance across the acute and sub-acute phases; the Hyper-Flexion test should not be given much consideration due to its low accuracy. All the tests showed no significant differences in accuracy, sensitivity, or specificity but the combination of McMurray and Apley showed the best diagnostic outcomes, while triple-test combinations such as McMurray–Apley–Thessaly reduced false positives even further. However, the influence of associated lesions on each test performance should be considered during evaluation.
MRI remains the gold standard for noninvasive diagnosis, yet its limited availability and cost necessitate alternative strategies.
Tables 4–7 are available as Supplementary data on the article homepage, doi: 10.2340/17453674.2025.43906