When AI Alerts Override Clinical Judgment, Who's Liable?

March 25, 2026 · 6 min

Algorithmic Medicine at the Bedside: Promise, Peril, and the Persistence of Clinical Judgment

Introduction

The integration of artificial intelligence into clinical care has moved from theoretical possibility to operational reality across U.S. health systems with remarkable speed. Predictive models, ambient documentation tools, wearable biosensors, and agentic AI systems are now embedded in workflows that directly affect patient management. Yet a growing body of evidence—and a chorus of frontline voices—suggests that the deployment of these technologies has frequently outpaced their validation, with consequences that disproportionately fall on nursing staff and, ultimately, on patients. The central clinical and ethical question this moment demands is not whether AI belongs in health care, but under what conditions, with what oversight, and with whose meaningful consent it should operate.

The Bedside Reality: Alert Fatigue and Clinical Override

The tensions inherent in algorithmic medicine are perhaps most starkly illustrated in the case of Adam Hart, a nurse at St. Rose Dominican Hospital in Henderson, Nevada, with 14 years of experience. When an AI-generated sepsis flag prompted a protocol-driven order to begin intravenous fluids in an elderly patient with a dialysis catheter and compromised renal function, Hart refused. A physician who intervened ordered dopamine instead—averting what Hart believed would have been a life-threatening complication. No party acted in bad faith. The algorithm, the protocol, and the institutional hierarchy simply converged in a direction that clinical reasoning at the bedside directly contradicted.

This scenario is not anomalous. Nurses interviewed across the country describe a consistent pattern: AI-generated alerts trigger institutional protocols, and deviation from those protocols—even when grounded in sound clinical judgment—is experienced as defiance rather than expertise. The structural implication is serious. Frontline clinicians are being placed in the position of either complying with algorithmic directives that may not account for the full clinical picture or absorbing the professional and moral risk of override.

Algorithmic Performance: What the Data Actually Show

The sepsis-prediction algorithm developed by Epic has become a widely cited cautionary example. Adopted broadly across U.S. hospital systems, the tool was later found in independent external validation to be substantially less accurate than initially marketed: a 2021 study in JAMA Internal Medicine reported an area under the curve of 0.63 and found the model missed roughly two-thirds of sepsis hospitalizations while generating a heavy alert burden. Epic has since released a second version and maintains that clinical studies demonstrate outcome improvements. Nevertheless, the experience revealed a concerning pattern: an imperfect product can transition rapidly from pilot to policy before its limitations are well characterized in real-world settings.
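
The kind of retrospective check that exposed that gap can be run locally before any go-live. Below is a minimal sketch of an external validation pass in Python; the cohort is entirely synthetic, and the prevalence, score distributions, and alerting threshold are assumptions for illustration, not Epic's figures.

    # Minimal sketch of a local external validation: score a retrospective
    # cohort and measure discrimination and alert yield at the deployed
    # threshold. All data below are synthetic and illustrative.
    import numpy as np
    from sklearn.metrics import roc_auc_score, confusion_matrix

    rng = np.random.default_rng(42)

    n = 10_000
    y_true = rng.random(n) < 0.07                       # ~7% sepsis prevalence (assumed)
    scores = np.clip(rng.normal(0.40, 0.15, n) + 0.07 * y_true, 0, 1)

    print(f"AUC: {roc_auc_score(y_true, scores):.2f}")  # weak discrimination by design

    threshold = 0.60                                    # illustrative alerting cutoff
    alerts = scores >= threshold
    tn, fp, fn, tp = confusion_matrix(y_true, alerts).ravel()
    print(f"Sensitivity: {tp / (tp + fn):.0%}")         # most true cases missed
    print(f"PPV: {tp / (tp + fp):.0%}")                 # most alerts are false alarms

Numbers produced this way on a local cohort, rather than the marketed ones, are what should gate deployment.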

A similar arc played out at UC Davis Health, where the BioButton, a hexagonal silicone wearable sensor that continuously monitored vital signs including heart rate, temperature, and respiratory rate, was piloted in an oncology bone marrow transplant unit beginning in 2023. The device was introduced with substantial institutional enthusiasm and described as "transformational." In practice, nurses found the alerts difficult to interpret and frequently not actionable. Melissa Beebe, a registered nurse with 17 years at the institution, described notifications that "flagged changes in vital signs without specifics"; the device, she said, "was overdoing it but not really giving great information." After approximately one year, UC Davis Health discontinued the pilot, concluding that nurses were detecting critical deterioration faster than the device did.

The burden of false alarms extends beyond any single device. Elven Mitchell, an ICU nurse of 13 years at Kaiser Permanente Hospital in Modesto, California, estimates that roughly half of the alerts generated by a centralized monitoring team are false positives. Hospital policy nonetheless requires evaluation of each one, drawing nurses away from high-acuity patients.
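
Mitchell's estimate is less surprising than it sounds. When true deterioration events are rare, even a highly specific monitor produces an alert stream dominated by false positives, as the base-rate sketch below shows; every rate in it is an assumption chosen for illustration, not a measured value.

    # Back-of-the-envelope positive predictive value under a low base rate.
    prevalence = 0.02    # true deterioration events per monitored patient-shift (assumed)
    sensitivity = 0.90   # the monitor catches 90% of real events (assumed)
    specificity = 0.98   # only 2% of uneventful shifts still trigger an alert (assumed)

    true_alerts = prevalence * sensitivity
    false_alerts = (1 - prevalence) * (1 - specificity)
    ppv = true_alerts / (true_alerts + false_alerts)

    print(f"Alerts that are real events: {ppv:.0%}")          # ~48%
    print(f"Alerts that are false positives: {1 - ppv:.0%}")  # ~52%

Even at 98% specificity, roughly half of all alerts are false under these assumptions, which is the ratio Mitchell describes.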

Ambient documentation tools—widely adopted over the past two years to record clinician-patient interactions and auto-generate clinical notes—have similarly underdelivered on efficiency promises. Studies have reported time savings ranging from negligible to approximately 22 minutes per day. As Nigam Shah, professor of medicine at Stanford University and chief data scientist for Stanford Health Care, noted: "Everybody rushed in saying these things are magical; they're gonna save us hours. Those savings did not materialize."

Equity and Algorithmic Bias: A Population-Scale Concern

The stakes of inadequate validation extend beyond individual patient encounters. Ziad Obermeyer, Blue Cross of California Distinguished Associate Professor of Health Policy and Management at UC Berkeley's School of Public Health, has documented that some clinical algorithms used in patient care are racially biased. These tools, he notes, "are being used to screen about 100 million to 150 million people every year for these kinds of decisions." The scale makes the absence of a standardized pre-deployment validation framework—analogous to the drug approval process—a significant patient safety gap. Unlike pharmaceuticals, AI tools in health care have no single regulatory gatekeeper; the burden of validation falls largely on individual institutions, many of which are ill-equipped to conduct rigorous independent assessments.
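
Obermeyer's best-known audit worked by comparing a direct measure of illness against the algorithm's score across groups: patients who are equally sick should receive equal scores. A minimal sketch of that style of subgroup audit follows, with a hypothetical file and hypothetical column names standing in for a real scored cohort.

    # Minimal sketch of a subgroup audit: within bands of the algorithm's
    # risk score, compare a direct measure of illness burden across groups.
    # The file and column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("scored_cohort.csv")  # one row per patient:
    # risk_score (0-1), n_chronic_conditions, group

    df["score_decile"] = pd.qcut(df["risk_score"], 10, labels=False)

    audit = (
        df.groupby(["score_decile", "group"])["n_chronic_conditions"]
          .mean()
          .unstack("group")
    )
    print(audit)
    # If one group carries systematically more chronic conditions at the
    # same score decile, the model is under-ranking that group's need.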

Structural Deficits: Deployment Without Co-Design

A recurring theme across interviews with nurses and health system leaders is the failure to engage frontline clinical staff in the design and evaluation of AI tools.

Shah acknowledged that he initially staffed his data science team with physicians rather than nurses, until his institution's chief nursing officer intervened. He has since revised his view: "Ask nurses first, doctors second, and if the doctor and nurse disagree, believe the nurse, because they know what's really happening."

Nurses bring irreplaceable clinical signal to patient assessment, signal that algorithms cannot capture. As Obermeyer observed, the models analyze electronic medical records, but critical data exists outside the digital file: "How are they answering questions? How are they walking? All these subtle things that physicians and nurses see and understand about patients."

Unvalidated rollouts carry systemic consequences beyond individual harm. As one expert, who requested anonymity due to concern about professional repercussions, warned: "You are creating mistrust in a generation of clinicians and providers."

Toward a Framework for Responsible Clinical AI

Some health systems are developing institutional responses. Stanford Health Care, Mount Sinai Health System, and others have brought AI development in-house, enabling internal testing and clinician-facing validation. Mount Sinai has implemented a bottom-up submission process in which any staff member may propose an AI tool. One wound-care nurse's proposal for a pressure ulcer prediction tool achieved high adoption rates, a success leadership attributes in part to the nurse personally training her peers. Suchi Saria, John C. Malone Associate Professor of Computer Science at Johns Hopkins University and director of the Machine Learning and Healthcare Lab, argues that clinical AI must behave like a well-integrated team member: "It's not gonna work if this new team member is disruptive. People aren't gonna use it. If this new member is unintelligible, people aren't gonna use it."

The next frontier—agentic AI—raises the stakes further. Mount Sinai's cardiac catheterization lab deployed an agentic AI system called Sofiya to conduct pre-procedure patient calls. The system reportedly saved more than 200 nursing hours over five months. Yet nurses at a November 2024 New York City Council hearing testified that Sofiya's outputs still require nurse verification for accuracy. The efficiency gains, while real, have not eliminated the need for human clinical oversight.

Clinical and Policy Implications

The current landscape calls for clear-eyed action from clinical leadership and health system administrators. Institutions should require prospective, independent validation of AI tools before deployment; establish formal mechanisms for frontline clinician input during design and evaluation; and ensure that AI alerts are accompanied by interpretable, actionable rationale rather than opaque risk scores. Regulatory frameworks that bring clinical AI under oversight structures analogous to medical device approval merit serious policy consideration.
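
The last of those recommendations is worth making concrete. An interpretable alert carries its drivers and its contraindication checks alongside the score; the sketch below shows one hypothetical shape such a payload could take. The field names are illustrative, not any vendor's schema.

    # Hypothetical shape of an interpretable clinical alert, as opposed to
    # a bare risk score. Field names are illustrative, not a vendor schema.
    from dataclasses import dataclass

    @dataclass
    class ClinicalAlert:
        patient_id: str
        risk_score: float             # where many current systems stop
        top_drivers: list[str]        # which inputs moved the score
        contraindications: list[str]  # reasons the default action may be unsafe
        suggested_action: str

    alert = ClinicalAlert(
        patient_id="example-001",
        risk_score=0.81,
        top_drivers=["lactate 3.9 mmol/L, rising", "heart rate 118 bpm"],
        contraindications=["ESRD on dialysis: fluid bolus may be unsafe"],
        suggested_action="Sepsis pathway: clinician review before any fluid order",
    )
    print(alert.contraindications[0])

Had the sepsis flag in Hart's case carried a contraindication field like this one, the conflict between protocol and bedside judgment might have surfaced in the alert itself rather than at the bedside.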

The patient safety imperative is clear: until algorithmic tools can demonstrate validated, equitable, and interpretable performance in real-world clinical environments, the final arbiter of patient care must remain the clinician at the bedside—the one who can see, hear, and sense what no algorithm yet can.
