Machine learning and logistic regression in estimating survival in patients with high-malignant deep-seated soft tissue sarcomas: development and analysis based on a population-based retrospective cohort

Authors

  • Andrea Thorn Department of Orthopaedic Surgery, Rigshospitalet – University of Copenhagen, Copenhagen, Denmark https://orcid.org/0000-0003-3761-2791
  • Jessica A Lavery Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
  • Thomas Baad-Hansen Department of Orthopaedic Surgery, Aarhus University Hospital, Aarhus, Denmark https://orcid.org/0000-0003-4826-8412
  • Jonathan A Forsberg Department of Orthopedic Surgery, Orthopaedic Service, Oncology, Memorial Sloan Kettering Cancer Center, New York, USA
  • Michael Mørk Petersen Department of Orthopaedic Surgery, Rigshospitalet – University of Copenhagen, Copenhagen, Denmark https://orcid.org/0000-0002-2324-6420
  • Christina Enciso Holm Department of Orthopaedic Surgery, Rigshospitalet – University of Copenhagen, Copenhagen, Denmark https://orcid.org/0000-0002-5868-9125

DOI:

https://doi.org/10.2340/17453674.2026.45509

Keywords:

High malignant, Machine learning, Oncology, Sarcoma, Soft tissue tumours

Abstract

Background and purpose: Soft tissue sarcomas are a heterogeneous group of malignant tumors with a high risk of metastasis, primarily to the lungs, making accurate survival prediction an essential part of long-term planning. No machine learning (ML) survival prediction models have been developed using a modern, population-based dataset from Scandinavia. We aimed to develop ML models and compare them with logistic regression for predicting 5-year survival in patients with soft tissue sarcoma, and to identify key predictive variables.
Methods: This retrospective cohort study included patients diagnosed with deep-seated, high-grade soft tissue sarcomas of the extremities and trunk wall in Denmark from 2000 to 2016. Logistic regression was compared with 4 developed ML models, including random forest. Performance was assessed using the area under the curve (AUC), sensitivity, specificity, and calibration metrics, with a 70:30 training–test split and 5-fold cross-validation to evaluate the models.
Results: 516 patients were included, of whom 226 (44%) died within 5 years following surgery. Random forest demonstrated the best ML performance on the training set and was compared with logistic regression on the test set. Logistic regression achieved an AUC of 0.74 (95% confidence interval [CI] 0.66–0.82), outperforming random forest’s AUC of 0.65 (CI 0.56–0.74). Logistic regression also had higher sensitivity (0.65 vs 0.59) and specificity (0.72 vs 0.69), while random forest had a lower Brier score (0.38 vs 0.41).
Conclusion: Although the developed random forest ML model performed well during training, logistic regression outperformed it after internal validation. Soft tissue sarcomas located in the trunk, grade 3 tumors, and chemotherapy within 3 months of surgery demonstrated the highest negative effect on survival, consistent with current treatment protocols in which patients with high-risk disease are managed with more aggressive multimodal therapy. Further external validation and assessment of clinical utility are required before potential clinical implementation.
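
The performance measures named in the abstract (AUC, sensitivity, specificity, and the Brier score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the 0.5 classification threshold are assumptions made for illustration, and the outcome is assumed to be coded 1 for death within 5 years and 0 otherwise.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case gets a
    higher predicted risk than a randomly chosen negative case (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome
    (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def sensitivity_specificity(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Tuple[float, float]:
    """Sensitivity and specificity after classifying risk >= threshold as positive."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = died within 5 years, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print("AUC:", round(auc(y_true, y_prob), 3))
print("Brier score:", round(brier_score(y_true, y_prob), 3))
print("Sensitivity, specificity:", sensitivity_specificity(y_true, y_prob))
```

AUC measures discrimination only, whereas the Brier score also reflects how close the predicted risks are to the observed outcomes, so the two measures need not rank models in the same order.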

Published

2026-03-10

How to Cite

Thorn, A., Lavery, J. A., Baad-Hansen, T., Forsberg, J. A., Petersen, M. M., & Holm, C. E. (2026). Machine learning and logistic regression in estimating survival in patients with high-malignant deep-seated soft tissue sarcomas: development and analysis based on a population-based retrospective cohort. Acta Orthopaedica, 97, 185–193. https://doi.org/10.2340/17453674.2026.45509

Section

Publications
