Acta Chir Orthop Traumatol Cech. 2026; 93(3):133-139 | DOI: 10.55095/achot2026/027

Diagnostic Performance of Artificial Intelligence (AI) Chatbot Compared to Orthopedic Trauma Surgeons in Evaluating Indication for Surgery for Isolated Lateral Malleolar Fractures: a Retrospective Study

Maria Oulianski, Rami Mosheiff, Dana Avraham, Omer Ben Yehuda, Yoram Weil, Mahmoud Jammal
Hadassah Medical Center of the Hebrew University, Jerusalem, Israel

The indication for surgery for isolated lateral malleolar fractures (AO/OTA 44B1) is debated and in many cases relies on radiographic assessment of fracture stability. Artificial intelligence chatbots with visual analysis capabilities offer a potential tool for radiographic assessment. This study compared the diagnostic performance of a commercially available AI chatbot with that of three fellowship-trained orthopedic trauma surgeons in evaluating equivocal isolated lateral malleolar fractures.



This retrospective study included 50 patients with isolated lateral malleolar injury at the level of the syndesmosis (AO/OTA 44B1), whose radiographs were evaluated by three blinded fellowship-trained orthopedic trauma surgeons. Each rater measured standardized radiographic ankle parameters (medial clear space, tibiofibular clear space, and tibiofibular overlap) on anteroposterior and mortise views and made a surgical versus nonoperative treatment recommendation. Subsequently, the same sets of radiographs were independently evaluated by an AI chatbot (Claude, Anthropic). The surgeons' and the AI's decisions were compared with the treatment each patient actually received (operative vs. nonoperative).



All raters recommended surgery at lower rates (34.0-46.0%) than the actual operative rate (56.0%). The difference between the actual treatment and the raters' recommendations ranged from 67.3% to 86%, with the AI falling within the same range. The AI's radiographic measurements differed systematically from those of all surgeons in five of six parameters. Inter-rater agreement between the AI and the surgeons was slight, while inter-surgeon agreement was moderate (κ = 0.457-0.589). ROC analysis showed comparable AUC values (0.63-0.67) for all raters.
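The agreement and discrimination statistics reported above can be illustrated with a minimal sketch. It assumes each rater's output is reduced to a binary recommendation (1 = surgery, 0 = nonoperative); the function names and the ten-patient data set are hypothetical and are not taken from the study. Under that assumption, Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance, and the AUC of a single binary decision reduces to the mean of sensitivity and specificity. On the widely used Landis and Koch scale, κ of 0.00-0.20 is labeled slight and 0.41-0.60 moderate; whether the study used that scale or this AUC calculation is not stated in the abstract.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters giving binary (0/1) labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("raters must label the same non-empty set of cases")
    # Observed agreement: share of cases where both raters chose the same label.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal rate of recommending surgery.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def binary_auc(truth: Sequence[int], decision: Sequence[int]) -> float:
    """AUC of one binary decision: (sensitivity + specificity) / 2."""
    positives = sum(truth)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one operative and one nonoperative case")
    true_pos = sum(1 for t, d in zip(truth, decision) if t == 1 and d == 1)
    true_neg = sum(1 for t, d in zip(truth, decision) if t == 0 and d == 0)
    return (true_pos / positives + true_neg / negatives) / 2


# Hypothetical data for ten patients (illustration only, not study data).
actual_treatment = [1, 1, 1, 0, 1, 0, 1, 1, 0, 0]
surgeon_decision = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
ai_decision = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

kappa_ai_vs_surgeon = cohen_kappa(surgeon_decision, ai_decision)
auc_surgeon = binary_auc(actual_treatment, surgeon_decision)
print(f"kappa (AI vs surgeon): {kappa_ai_vs_surgeon:.3f}")
print(f"AUC (surgeon vs actual treatment): {auc_surgeon:.3f}")
```

If the raters' continuous measurements (for example, medial clear space in millimeters) were used as the score instead of the binary decision, a full ROC curve over all thresholds would be needed instead of this single-point calculation.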



The AI chatbot demonstrated diagnostic accuracy comparable to that of orthopedic trauma surgeons in directing treatment for isolated lateral malleolar fractures, despite using a systematically different measurement strategy. All raters exhibited a conservative bias compared with the actual treatment and only modest discriminatory ability, reflecting the inherent difficulty of this clinical question. These findings support a potential complementary role for AI in ankle fracture triage, while final clinical management decisions should remain in the hands of the orthopedic surgeon.

Keywords: lateral malleolar ankle fracture, artificial intelligence, surgical decision-making, radiographic assessment, orthopedic surgery, diagnostic accuracy.

Received: April 28, 2026; Revised: April 28, 2026; Accepted: May 4, 2026; Published: July 1, 2026

Oulianski M, Mosheiff R, Avraham D, Yehuda OB, Weil Y, Jammal M. Diagnostic Performance of Artificial Intelligence (AI) Chatbot Compared to Orthopedic Trauma Surgeons in Evaluating Indication for Surgery for Isolated Lateral Malleolar Fractures: a Retrospective Study. Acta Chir Orthop Traumatol Cech. 2026;93(3):133-139. doi: 10.55095/achot2026/027.
