The application of AI in medicine is a rapidly growing field. One area of fervent research is the use of AI to improve breast cancer (BC) management. A recent systematic review of the effects of AI on the personalization of BC care found it effective in uncovering hidden patterns in complex omics and genetic data.

One of the most popular AI tools—ChatGPT by OpenAI—is a large language model capable of processing data, generating language, and reproducing knowledge using natural language. A recent paper provides an overview of ChatGPT’s applications in medicine, its advantages and limitations, future prospects, and ethical considerations. However, ChatGPT was not among the AI tools examined in the recent systematic review. When AI such as ChatGPT is applied to medicine, practice guidelines are utilized as training data to help generate clinical recommendations.

Researchers in Germany explored the quality of ChatGPT-generated therapy recommendations for BC treatment. They assessed clinical recommendations for 30 BC cases identified by the medical board using the ChatGPT-3.5 model. ChatGPT-3.5 was supplied with training data up to September 2021, and all prompting was performed on July 19, 2023. (The latest version of ChatGPT, GPT-4o, was released on May 13, 2024.)

Among the patient characteristics of the 30 cases: 30% were initial diagnoses with no prior treatment; 50% had already undergone either surgery or chemotherapy for BC; 13.3% were experiencing a first BC relapse; and 6.7% were having a second BC relapse. Almost all cases (93.3%) involved invasive BC, and only 3.3% were ductal carcinoma in situ. Among the invasive BC cases, the majority were hormone receptor–positive (80% estrogen receptor–positive and 63% progesterone receptor–positive); 10.7% were human epidermal growth factor receptor 2 (HER2)–positive.

The latest German BC guidelines were used to train ChatGPT. Patient-specific information provided to ChatGPT included age, histology, tumor stage, molecular markers, relevant medical history, and tumor-specific treatments that had already been performed. ChatGPT was asked to generate its output in the form of a table identifying which of the following treatment options were the most appropriate: surgery, chemotherapy, radiotherapy, HER2 therapy, and/or endocrine therapy. ChatGPT was to indicate with a yes/no response whether each treatment option should be included in the case’s treatment plan. Additional information requested of ChatGPT included medications and dosages, if applicable; alternative options; and therapy-related risks. ChatGPT’s recommendations were to be summarized in one short sentence, similar to that generated by a tumor board. Four gynecological oncologists reviewed ChatGPT’s recommendations using a structured questionnaire, rating each treatment option individually in addition to the overall recommendations.
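To illustrate the structure of such a query, the following is a minimal sketch of how the patient details and output instructions described above might be assembled into a single prompt. The field names, example values, and wording are illustrative assumptions, not the study’s actual prompt.

```python
# Hypothetical sketch of a structured prompt like the one described above.
# Field names and phrasing are illustrative assumptions, not the study's text.

TREATMENT_OPTIONS = ["surgery", "chemotherapy", "radiotherapy",
                     "HER2 therapy", "endocrine therapy"]

def build_prompt(case: dict) -> str:
    """Combine patient-specific details and output instructions into one prompt."""
    details = "; ".join(f"{key}: {value}" for key, value in case.items())
    options = ", ".join(TREATMENT_OPTIONS)
    return (
        f"Patient: {details}. "
        f"Return a table with a yes/no entry for each of: {options}. "
        "Where applicable, list medications and dosages, alternative options, "
        "and therapy-related risks. Summarize the recommendation in one short "
        "sentence, as a tumor board would."
    )

# Example case with assumed values
prompt = build_prompt({
    "age": 54,
    "histology": "invasive ductal carcinoma",
    "tumor stage": "pT2 pN1 M0",
    "molecular markers": "ER-positive, PR-positive, HER2-negative",
    "prior treatment": "none (initial diagnosis)",
})
```

A prompt assembled this way keeps the patient data in short, precise clauses, which the investigators later found improved accuracy.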

The investigators found that ChatGPT recommended surgery, chemotherapy, radiotherapy, HER2 therapy, and endocrine therapy in 90%, 70%, 90%, 10%, and 66.7% of cases, respectively. Using a predefined questionnaire, ChatGPT’s recommendations were rated as correct without limitations (1), correct with minor limitations (2), better alternative available (3), partially wrong (4), or completely wrong (5).

The authors’ overall assessments were that treatment recommendations for surgery, chemotherapy, endocrine therapy, and radiation were rated as correct with minor limitations, and that recommendations for HER2 antibody therapy were rated as correct without limitations. However, the boxplot of expert ratings of therapy recommendations shows wide variation for some treatment options, such as chemotherapy, whose ratings ranged from just over 1 to over 3 (i.e., better alternatives available). There were also differences in ratings for the therapy-alternatives class, with ratings ranging from 1 to 3 based on the supplemental data provided (the boxplot shows a rating of 1-2). ChatGPT fared significantly better when making treatment recommendations for primary cases than for relapse cases.

The investigators also found that recommendations varied in relation to the prompt used. Accuracy improved when patient data were provided in precise, short sentences. For diagnosis-related details such as TNM (tumor, nodes, metastasis) classification or HER2-receptor status, correct recommendations depended on the wording provided, which needed to match that of the cancer guideline in use.

Further, even with identical input, output varied. Errors included recommending surgical treatment for cases that were already postoperative, failure to present therapy options in chronological sequence for incorporation into a care plan, missing medication details, and incorrect suggestions.

This paper provides pharmacists with information on the opportunities and limitations of using ChatGPT to make treatment recommendations for BC patients. While the study employed an older version of ChatGPT than is currently in use, the management of BC is both an art and a science and requires the skill and expertise of pharmacists trained in the field.

The content contained in this article is for informational purposes only. The content is not intended to be a substitute for professional advice. Reliance on any information provided in this article is solely at your own risk.