Comparative Evaluation of Chatbot Responses on Coronary Artery Disease

Pay, Levent; Yumurtaş, Ahmet Çağdaş; Çetin, Tuğba; Çınar, Tufan; Hayıroğlu, Mert İlker

Turk Kardiyol Dern Ars. 2025; 53(1): 35-43 | DOI: 10.5543/tkda.2024.78131

Levent Pay¹, Ahmet Çağdaş Yumurtaş², Tuğba Çetin³, Tufan Çınar⁴, Mert İlker Hayıroğlu³
¹Department of Cardiology, Istanbul Haseki Training and Research Hospital, Istanbul, Türkiye
²Department of Cardiology, Kars Harakani State Hospital, Kars, Türkiye
³Department of Cardiology, Dr Siyami Ersek Thoracic and Cardiovascular Surgery Training Hospital, İstanbul, Türkiye
⁴Department of Medicine, University of Maryland Medical Center Midtown Campus, Maryland, USA

OBJECTIVE
Coronary artery disease (CAD) is the leading cause of morbidity and mortality globally. Growing interest in natural language processing chatbots (NLPCs) has driven their widespread adoption in healthcare. This study aimed to evaluate the accuracy and reproducibility of the responses that NLPCs such as ChatGPT, Gemini, and Bing provide to frequently asked questions about CAD.

METHODS
Fifty frequently asked questions about CAD were posed twice, one week apart, to ChatGPT, Gemini, and Bing. Two cardiologists independently assigned each answer to one of four categories: comprehensive/correct (1), incomplete/partially correct (2), a mix of accurate and inaccurate/misleading (3), and completely inaccurate/irrelevant (4). The accuracy and reproducibility of each NLPC's responses were then assessed.
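
The scoring scheme above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not define how reproducibility was computed, so the sketch assumes it is the percentage of questions whose two responses received the same category. The function names and the toy score lists are hypothetical and are not study data.

```python
from collections import Counter

# The four scoring categories described in the Methods.
CATEGORIES = {
    1: "comprehensive/correct",
    2: "incomplete/partially correct",
    3: "mix of accurate and inaccurate/misleading",
    4: "completely inaccurate/irrelevant",
}


def category_distribution(scores):
    """Return the percentage of responses that fall in each category."""
    counts = Counter(scores)
    total = len(scores)
    return {
        label: 100.0 * counts.get(code, 0) / total
        for code, label in CATEGORIES.items()
    }


def reproducibility(first_round, second_round):
    """Percentage of questions scored identically in both rounds.

    Assumed definition: the abstract does not state the formula used.
    """
    if len(first_round) != len(second_round):
        raise ValueError("both rounds must cover the same questions")
    same = sum(a == b for a, b in zip(first_round, second_round))
    return 100.0 * same / len(first_round)


# Toy scores for five questions (illustrative only, not study data).
round_1 = [1, 1, 2, 1, 3]
round_2 = [1, 2, 2, 1, 3]

print(category_distribution(round_1))
print(reproducibility(round_1, round_2))
```

With 50 questions per chatbot, each list would hold 50 category codes, one per question and round.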

RESULTS
ChatGPT's responses were scored as 86% comprehensive/correct and 14% incomplete/partially correct. Gemini provided 68% comprehensive/correct responses, 30% incomplete/partially correct responses, and 2% responses containing a mix of accurate and inaccurate/misleading information. Bing delivered 60% comprehensive/correct responses, 26% incomplete/partially correct responses, and 8% responses containing a mix of accurate and inaccurate/misleading information. Reproducibility scores were 88% for ChatGPT, 84% for Gemini, and 70% for Bing.

CONCLUSION
ChatGPT demonstrates significant potential to improve patient education about CAD by providing more sensitive and accurate answers than Bing and Gemini.

Keywords: Artificial intelligence, Bing Chat, ChatGPT, coronary artery disease, digital health, Gemini, natural language processing chatbots

Corresponding Author: Levent Pay
Manuscript Language: English
