AI Scribes Capture More Symptoms—But Treat Fewer Patients

February 24, 2026 · 6 min

AI Ambient Scribes in Primary Care: A Documentation Paradox With Psychiatric Consequences

A cohort study published in JAMA Psychiatry finds that ambient AI scribes are associated with significantly richer neuropsychiatric documentation—yet paradoxically lower rates of depression-related clinical intervention. The findings raise fundamental questions about the relationship between AI-generated clinical notes and the quality of mental health care in primary care settings.

Background: AI Scribes and the Documentation Promise

Artificial intelligence–driven ambient scribes—tools that use speech recognition and large language models to automatically generate narrative clinical notes from recorded patient encounters—have achieved remarkably rapid adoption across health systems in the United States. Promoted primarily as solutions to clinician burnout and documentation burden, these tools have garnered widespread enthusiasm among physicians exhausted by the demands of the electronic health record.

To date, most investigations of ambient AI scribes have examined their effects on clinician-facing outcomes: time spent in the EHR, self-reported satisfaction, and documentation efficiency. Evidence in these domains has been mixed. One study suggested clinicians spent an average of 5 fewer minutes per visit on the EHR when using ambient scribes, while others yielded inconsistent results regarding productivity gains. Critically absent from the literature has been any systematic examination of whether AI-generated notes change how physicians actually practice medicine—particularly in domains as consequential as psychiatric care.

A new cohort study from Massachusetts General Hospital and Harvard Medical School, published in JAMA Psychiatry, addresses this gap directly. The findings are encouraging and alarming in equal measure.

Study Design: A Matched Four-Group Comparison

Castro, McCoy, Verhaak, Ramachandiran, and Perlis drew upon EHR data from two large academic health systems in eastern Massachusetts—Massachusetts General Hospital and Brigham and Women's Hospital—to examine 20,302 outpatient primary care annual visit notes. Notes were collected between February 2023 and February 2025, spanning the period of ambient AI scribe deployment across these systems.

The investigators used a matched retrospective case-control design, creating four parallel groups of approximately 5,075 visits each, matched on age, sex, self-reported race, and prior depression diagnosis: visits using an ambient AI scribe, visits using a human virtual scribe, contemporaneous unscribed visits occurring during the same period, and prior-year unscribed visits from before AI scribe deployment. Matching additionally on clinician and visit-year cohort helped isolate scribe-related effects from practice drift and temporal confounding.

To quantify psychiatric documentation, the investigators applied a HIPAA-compliant large language model (GPT-4o, hosted via Microsoft Azure) to each clinical narrative, generating estimated scores across all six National Institute of Mental Health Research Domain Criteria (RDoC) dimensions: negative valence, positive valence, cognitive systems, social processes, arousal and regulatory systems, and sensorimotor systems.
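To make the scoring pipeline concrete, here is a minimal sketch of how narrative notes could be scored against the six RDoC domains via an LLM. The prompt text, the `score` scale, and the `parse_scores` helper are illustrative assumptions, not the authors' published implementation; the actual GPT-4o call is stubbed out, since the study's prompt and deployment details are not reproduced here.

```python
# Hypothetical sketch of LLM-based RDoC scoring. The prompt wording, 0-5 scale,
# and helper names are assumptions for illustration, not the study's pipeline.
import json

RDOC_DOMAINS = [
    "negative_valence", "positive_valence", "cognitive_systems",
    "social_processes", "arousal_regulatory", "sensorimotor",
]

PROMPT_TEMPLATE = (
    "Rate the following clinical note on each NIMH RDoC domain from 0 (no "
    "documented symptoms) to 5 (severe). Return JSON with keys: "
    + ", ".join(RDOC_DOMAINS) + ".\n\nNote:\n{note}"
)

def parse_scores(llm_reply: str) -> dict:
    """Validate an LLM JSON reply into a {domain: float} score dict."""
    scores = json.loads(llm_reply)
    missing = [d for d in RDOC_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"LLM reply missing domains: {missing}")
    return {d: float(scores[d]) for d in RDOC_DOMAINS}

# Stubbed reply standing in for an actual GPT-4o call (no API access here).
stub_reply = json.dumps(
    {d: 1.0 for d in RDOC_DOMAINS} | {"negative_valence": 2.0}
)
print(parse_scores(stub_reply))
```

The validation step matters in practice: LLM outputs can omit keys or return non-numeric values, so a scoring pipeline at this scale (20,000+ notes) needs explicit schema checks before aggregation.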

Key Finding I: AI Scribes Dramatically Increase Documented Psychiatric Symptom Burden

Across all six RDoC domains, AI-scribed notes showed significantly higher symptom scores compared with every comparator group (P < .001 for all contrasts). In the negative valence domain—most directly relevant to depression—mean scores were 2.05 in AI-scribed notes versus 1.79 for human-scribed notes and 1.57 for contemporaneous unscribed notes. Arousal domain scores were 2.84 with AI scribes compared with 2.05 without a scribe. Sensorimotor scores were 2.33 versus 1.54 in the unscribed contemporaneous group.

AI-scribed notes were also substantially longer: a mean of 13,629 characters compared with 7,932 characters for contemporaneous unscribed notes and 7,489 characters for prior-year notes. Human-scribed notes were the longest at 16,252 characters.

The authors frame this finding as a potential opportunity for improving care:

"Our results are reassuring, suggesting that AI scribes in primary care have the potential to increase documentation of neuropsychiatric symptoms."

Key Finding II: More Documentation, Less Action

Despite greater documented psychiatric symptom burden, AI-scribed visits were significantly less likely to result in a psychiatric intervention. The composite outcome—defined as the presence of any depression-related ICD-10 code, new antidepressant prescription, or behavioral health referral—occurred in only 708 visits (14%) in the AI scribe group, compared with 843 (17%) in human-scribed visits, 855 (17%) in contemporaneous unscribed visits, and 805 (16%) in prior-year unscribed visits. All contrasts with the AI scribe group were statistically significant (Bonferroni-corrected P < .001).

In the multivariable logistic regression model adjusted for age, sex, race, ethnicity, insurance, education, and prior depression diagnosis, the adjusted odds ratio for any psychiatric intervention at AI-scribed versus contemporaneous unscribed visits was 0.83 (95% CI, 0.72–0.95). By contrast, no significant difference was observed between human-scribed and unscribed visits (aOR, 0.97; 95% CI, 0.85–1.11).
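For readers less familiar with odds-ratio arithmetic, the sketch below computes a crude (unadjusted) odds ratio with a Woolf 95% CI from the published intervention counts. This is illustrative arithmetic only: the study's reported aOR of 0.83 comes from a multivariable model adjusting for demographics, insurance, education, and prior depression diagnosis, so the unadjusted figure differs slightly.

```python
# Crude odds ratio with Woolf 95% CI from a 2x2 table.
# Unadjusted sketch only; the study's aOR of 0.83 is model-adjusted.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and Woolf CI for a 2x2 table: exposed (a events / b non-events)
    vs unexposed (c events / d non-events)."""
    or_est = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_est) - z * se_log)
    hi = math.exp(math.log(or_est) + z * se_log)
    return or_est, lo, hi

# AI-scribed: 708 interventions of 5,075 visits; contemporaneous unscribed:
# 855 of 5,075 (counts as reported in the study).
or_est, lo, hi = odds_ratio_ci(708, 5075 - 708, 855, 5075 - 855)
print(f"crude OR = {or_est:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

With these counts the crude OR works out to roughly 0.80 (95% CI, about 0.72–0.89), in the same direction and magnitude as the adjusted estimate of 0.83.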

Depression-related ICD-10 codes were assigned at only 9% of AI-scribed visits, compared with 12% of human-scribed and unscribed contemporaneous visits. New antidepressant prescriptions were initiated at 1% of AI-scribed visits versus 2% of prior-year unscribed visits.

The authors articulate the central tension in these findings plainly:

"In this study examining clinical documentation from more than 20,000 outpatient annual visits, including roughly 5,000 incorporating AI scribes, we found that use of these scribes was associated with greater documented levels of neuropsychiatric symptoms compared with the use of human scribes or no scribe but lesser likelihood of a depression intervention."

Mechanistic Hypotheses: The Autopilot Analogy

The authors propose a compelling—and unsettling—mechanistic hypothesis for this dissociation between documentation richness and clinical responsiveness. Drawing an analogy to aviation, they suggest that automating documentation may paradoxically reduce the cognitive engagement of the clinician:

"One explanation for this association could be that automating documentation leads clinicians to be less active in general, analogous to reduced proficiency observed in pilots after the emergence of autopilot."

This hypothesis implies that the act of documenting—when performed manually—may itself reinforce clinical attention and prompt therapeutic decision-making. When that cognitive labor is offloaded to an AI, the loop between observation and action may be disrupted, even as the note itself becomes more thorough. The effect was specific to AI scribing: human scribes showed no analogous reduction in psychiatric intervention rates, suggesting that the mechanism may relate to the nature of automated versus active documentation.

Implications for Practice and Health System Leaders

The study's findings demand careful consideration by any physician or health system administrator who has deployed or is considering deploying ambient AI scribes. The authors are deliberate in their call for further investigation:

"The rapid dissemination of AI scribes in medicine poses both an opportunity and a risk... many interventions in medicine have been adopted without clear evidence of benefit—particularly those, like scribes, that do not require formal regulatory review to establish effectiveness."

For primary care physicians, the practical implications are immediate. If AI-mediated documentation is associated with reduced attentiveness to mental health symptoms—even as those symptoms are more thoroughly recorded—then the note may increasingly diverge from the clinical encounter. A richer record does not guarantee a more responsive physician.

For health system leaders and quality improvement teams, the findings suggest a need for deliberate countermeasures: EHR-embedded decision support tools that prompt psychiatric intervention when symptom documentation exceeds a threshold, structured check-ins or peer review targeting mental health care gaps in AI-scribed practices, and prospective surveillance of quality metrics across scribed and unscribed clinics.

Limitations

The study carries important limitations. All data originate from affiliated academic health systems in eastern Massachusetts—predominantly White, English-speaking, commercially insured populations—limiting generalizability. The observational design cannot establish causation, and residual confounding by clinician-level variables (e.g., personality, burnout level, AI acceptance) could not be fully controlled. The RDoC scoring methodology, while validated in prior work, may be sensitive to documentation style rather than true symptom severity, a concern reinforced by the fact that PHQ-9 scores were nearly identical across all four groups. Future research incorporating clinician-rated measures, structured patient-reported outcomes, and longitudinal clinical outcome data will be essential.

Conclusion

This study does not indict AI ambient scribes—it complicates them. The technology appears capable of producing richer, more symptom-comprehensive clinical narratives. But richness on paper does not translate automatically into action at the bedside. For the millions of primary care patients who present annually with unaddressed depression and anxiety, the gap between documentation and intervention is not an abstraction—it is a missed diagnosis, an untreated episode, a referral that never happened. As ambient AI scribes become the default documentation modality across American primary care, the imperative is clear: physicians and health systems must monitor not just what the note says, but what it prompts them to do.
