{"id":2059,"date":"2025-12-07T12:56:12","date_gmt":"2025-12-07T17:56:12","guid":{"rendered":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/?p=2059"},"modified":"2025-12-07T12:56:12","modified_gmt":"2025-12-07T17:56:12","slug":"ethical-ramifications-of-ai-powered-medical-diagnoses","status":"publish","type":"post","link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/science\/ethical-ramifications-of-ai-powered-medical-diagnoses\/","title":{"rendered":"Ethical ramifications of AI-powered medical diagnoses"},"content":{"rendered":"<p><span style=\"font-weight: 400\">Incredible advancements in artificial intelligence (AI) have recently paved the way for the use of AI in healthcare settings. Implementation of AI has the potential to address worker shortages in the medical field, lead to discovery of new drugs, or improve diagnoses (Bajwa et al., 2021). A writer for the American Medical Association, Benji Feldheim, applauds AI for restoring the \u201chuman side\u201d in medicine. AI scribes, for example, ease the documentation burden doctors face, reducing burnout and improving doctors\u2019 interactions with patients as a result (Feldheim, 2025). Another example is the AI model developed by Shmatko et al. (2025), known as Delphi-2M, which is capable of accurately predicting a patient\u2019s next 20 years of disease burden (i.e., which diseases they would develop and when). Evidently, AI is a promising technology already capable of improving lives. However, there are reasons to be skeptical: these uses of AI also raise concerns about fairness and clinical safety. 
After a brief synopsis of Shmatko et al.\u2019s Delphi-2M, I evaluate the ethical ramifications of AI-powered diagnoses and related clinical tools.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Delphi-2M is an AI model trained on over 400,000 patient histories from a UK database to forecast an individual\u2019s 20-year disease trajectory. Like chatbots such as ChatGPT, Delphi-2M is a large language model (LLM), a type of AI that can recognize and reproduce patterns from large amounts of data. Just as chatbots pick up on which words are likely to appear with other words in order to form sentences, Delphi-2M learns from its vast training set of medical records to predict a patient\u2019s disease trajectory from real-world patterns. As Yonghui Wu puts it in a summary of Shmatko et al.\u2019s work, becoming a smoker may be followed by a future diagnosis of lung cancer, and these are the kinds of patterns Delphi-2M recognizes (Wu, 2025). To do this, Delphi-2M is fed \u201ctokens\u201d that link diseases or health factors to specific times in a person\u2019s life, like chickenpox at age 2 or smoking at age 41 (Figure 1). Then, Delphi-2M outputs new tokens that predict which diseases will occur in an individual\u2019s life and when, like the onset of respiratory disorders at age 71 as a result of smoking. After training, Delphi-2M was tested by predicting the medical histories of 1.9 million patients not included in the original training set. Shmatko et al. 
report that this AI predicts disease trajectories with considerable success, at least partially predicting patterns in individuals\u2019 diagnoses in 97% of cases.<\/span><\/p>\n<figure id=\"attachment_2060\" aria-describedby=\"caption-attachment-2060\" style=\"width: 548px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2060\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/delphi-figure-161x300.jpeg\" alt=\"\" width=\"548\" height=\"1022\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/delphi-figure-161x300.jpeg 161w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/delphi-figure.jpeg 751w\" sizes=\"auto, (max-width: 548px) 100vw, 548px\" \/><figcaption id=\"caption-attachment-2060\" class=\"wp-caption-text\">Figure 1. Visualization of Delphi-2M input and output (Wu, 2025).<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400\">Nonetheless, we must hold AI used to diagnose patients to a higher level of scrutiny than AI used commercially. LLMs are not perfect: they are subject to <\/span><a href=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/csci-tech\/machine-learning-and-algorithmic-bias\/\"><span style=\"font-weight: 400\">algorithmic bias<\/span><\/a><span style=\"font-weight: 400\"> and misuse, and these problems can begin with the training data, before a model is ever built. Shmatko et al. (2025), for example, address some shortcomings of the training data used for Delphi-2M. Notably, they explain that the data, drawn from a mostly white, older subset of the UK population, aren\u2019t entirely generalizable to very different demographics. Though Shmatko et al. found success testing the model against a Danish database after training it on UK patients, I\u2019m still concerned about how Delphi-2M would perform on non-European and younger demographics, or on those underrepresented in the training data. 
Facial recognition is a prime example of AI underperforming when training datasets lack diverse representation. AI designed to recognize faces has historically underperformed on individuals with feminine features or darker skin due to unrepresentative training data (Hardesty, 2018). With this in mind, it\u2019s important that training data for diagnostic AI be representative of all demographics prior to widespread implementation.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Furthermore, Cabitza et al. (2017) wrote on some of the unintended consequences of machine learning in healthcare, postulating that widespread implementation of these tools also has the potential to reduce the skill of physicians. Though these tools are convenient in the short run, Cabitza et al. raise concerns about overreliance on AI, citing studies in which physicians aided by AI were less sensitive and accurate in diagnosing patients. Mammogram readers, for instance, were 14% less sensitive in their diagnostics when presented with images marked by computer-aided detection (Povyakalo et al., 2013). Though this study focused on image-based diagnosis, it\u2019s easy to see how widespread use of Delphi-2M could lead to similar deskilling among physicians. Delphi-2M is also an exclusively text-based model, which, as Cabitza et al. detail, means that such diagnostic algorithms do not incorporate crucial contextual elements that are \u201cpsychological, relational, social, and organizational\u201d in nature. A real-world example that Cabitza et al. describe is an instance in which an AI model predicted a lower mortality risk for patients with pneumonia and asthma than for those with pneumonia but without asthma. Knowing that asthma is not a protective factor for pneumonia patients, the researchers involved found that the anomalous AI output was the result of hospital procedures that admitted pneumonia patients with asthma directly to intensive care, giving them better health outcomes. 
This missing piece of crucial information, which was difficult to represent in these prognostic models, led to an error a physician would not make. Thus, AI is limited by the information it can be trained on.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Though these new advancements in healthcare AI are promising, they have their limits. Tools like Delphi-2M spot patterns across vast clinical histories that no single clinician could feasibly track, yet the benefits depend on who is represented in the data, how predictions are explained and used, and whether safeguards are in place when they fail. Before AI is implemented in healthcare, we must demand representative training sets, validation across diverse populations, clear disclosures of uncertainty and limitations, and constant human involvement that guards against automation bias and deskilling. In short, diagnostic AI should supplement, not replace, clinical judgment, and it should be developed with privacy, equity, and patient trust at the forefront. Only then will these systems reliably improve care rather than merely appear to.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>References<\/strong><\/p>\n<p><span style=\"font-weight: 400\">Bajwa, J., Munir, U., Nori, A., &amp; Williams, B. (2021). Artificial intelligence in healthcare: transforming the practice of medicine. <\/span><i><span style=\"font-weight: 400\">Future Healthcare Journal<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">8<\/span><\/i><span style=\"font-weight: 400\">(2), e188\u2013e194. https:\/\/doi.org\/10.7861\/fhj.2021-0095<\/span><\/p>\n<p><span style=\"font-weight: 400\">Cabitza, F., Rasoini, R., &amp; Gensini, G. F. (2017). Unintended consequences of machine learning in medicine. <\/span><i><span style=\"font-weight: 400\">JAMA<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">318<\/span><\/i><span style=\"font-weight: 400\">(6), 517. 
https:\/\/doi.org\/10.1001\/jama.2017.7797<\/span><\/p>\n<p><span style=\"font-weight: 400\">Feldheim, B. (2025, June 12). AI scribes save 15,000 hours\u2014and restore the human side of medicine. <\/span><i><span style=\"font-weight: 400\">American Medical Association<\/span><\/i><span style=\"font-weight: 400\">. https:\/\/www.ama-assn.org\/practice-management\/digital-health\/ai-scribes-save-15000-hours-and-restore-human-side-medicine<\/span><\/p>\n<p><span style=\"font-weight: 400\">Hardesty, L. (2018, February 11). Study finds gender and skin-type bias in commercial artificial-intelligence systems. <\/span><i><span style=\"font-weight: 400\">MIT News<\/span><\/i><span style=\"font-weight: 400\">. https:\/\/news.mit.edu\/2018\/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212<\/span><\/p>\n<p><span style=\"font-weight: 400\">Povyakalo, A. A., Alberdi, E., Strigini, L., &amp; Ayton, P. (2013). How to discriminate between computer-aided and computer-hindered decisions. <\/span><i><span style=\"font-weight: 400\">Medical Decision Making<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">33<\/span><\/i><span style=\"font-weight: 400\">(1), 98\u2013107. https:\/\/doi.org\/10.1177\/0272989X12465490<\/span><\/p>\n<p><span style=\"font-weight: 400\">Wu, Y. (2025). AI uses medical records to accurately predict onset of disease 20 years into the future. <\/span><i><span style=\"font-weight: 400\">Nature<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">647<\/span><\/i><span style=\"font-weight: 400\">(8088), 44\u201345. https:\/\/doi.org\/10.1038\/d41586-025-02971-3<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Incredible advancements in artificial intelligence (AI) have recently paved the way for the use of AI in healthcare settings. Implementation of AI has the potential to address worker shortages in the medical field, lead to discovery of new drugs, or improve diagnoses (Bajwa et al., 2021). 
A writer for the American Medical Association, Benji Feldheim [&hellip;]<\/p>\n","protected":false},"author":716,"featured_media":2064,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[63,65,68,1],"tags":[],"class_list":["post-2059","post","type-post","status-publish","format-standard","has-post-thumbnail","category-biology","category-csci-tech","category-psych-neuro","category-science","entry"],"featured_image_src":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/delphi-logo-white-bg-565x400.png","featured_image_src_square":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/delphi-logo-white-bg.png","author_info":{"display_name":"Mauricio Cuba 
Almeida","author_link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/author\/mcubaalmeida\/"},"_links":{"self":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/2059","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/users\/716"}],"replies":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/comments?post=2059"}],"version-history":[{"count":0,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/2059\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media\/2064"}],"wp:attachment":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media?parent=2059"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/categories?post=2059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/tags?post=2059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}