BACKGROUND Coronary artery disease (CAD) is the leading cause of morbidity and mortality globally. Growing interest in natural language processing chatbots (NLPCs) has led to their increasingly widespread adoption in healthcare. The purpose of this study was to evaluate the accuracy and reproducibility of answers given by NLPCs such as ChatGPT, Gemini, and Bing to frequently asked questions about CAD.
METHODS Fifty frequently asked questions about CAD were posed twice, 1 week apart, to ChatGPT, Gemini, and Bing. Two cardiologists independently classified each answer into 1 of 4 categories: comprehensive/correct (1), incomplete/partially correct (2), a mix of accurate and inaccurate/misleading (3), and completely inaccurate/irrelevant (4). The accuracy and reproducibility of each NLPC's answers were evaluated.
RESULTS ChatGPT's answers were scored as comprehensive/correct in 86% of cases and incomplete/partially correct in 14%. Gemini provided 68% comprehensive/correct answers, 30% incomplete/partially correct answers, and 2% answers that were a mix of accurate and inaccurate/misleading information. Bing delivered 60% comprehensive/correct answers, 26% incomplete/partially correct answers, and 8% answers that were a mix of accurate and inaccurate/misleading information. Reproducibility was 88% for ChatGPT, 84% for Gemini, and 70% for Bing.
CONCLUSIONS ChatGPT has significant potential to enhance patient education about coronary artery disease, providing more sensitive and accurate answers than Bing and Gemini.