
When the Therapist Is a Chatbot: What the Evidence Says, What It Doesn’t, and Why “Referral to Humans” Must Be Non-Negotiable


Generative AI has drifted, quietly but decisively, into the mental health ecosystem. It is no longer a hypothetical adjunct. It is a lived behaviour. Population studies now show that a meaningful proportion of adolescents and young adults use AI chatbots for mental health advice, often monthly, and most report that the advice “helps.” Professional bodies are sufficiently concerned that formal advisories now warn the public about safety, evidence gaps, and the risk of harm, particularly for people most vulnerable to suggestibility, isolation, or acute distress. 

A careful analysis begins with two truths that can coexist:

  1. People turn to AI because access barriers to human care are real (cost, wait times, stigma, geography).
  2. Unregulated, general-purpose AI is not designed or governed as a clinical intervention.

The question is not whether AI belongs in mental health. It is already part of the mix. The question is whether we will build governance around that reality before more preventable harm occurs.

The strongest evidence we have, and why it is not the same as knowing that “AI therapy works”

First, we need to distinguish between structured digital therapeutic tools and open-ended generative chatbots.

There is supportive evidence that bounded, CBT-informed conversational agents can reduce self-reported symptoms in the short term within select populations. A widely cited randomized controlled trial of a CBT-oriented agent (Woebot) reported feasibility and symptom improvements over a brief intervention window in a nonclinical sample. This matters because it demonstrates that some conversational interfaces can be beneficial when they are structured, constrained, and anchored to a therapeutic model.

But most of what people now call “AI therapy” is not that. It is typically a general-purpose large language model (LLM) optimised for fluency, engagement, and user satisfaction, not clinical safety, not duty of care, and not evidence-based stepwise intervention.

So, one gap we must name clearly is this: The evidence for structured digital CBT tools cannot be automatically generalised to open-ended generative systems that improvise psychological guidance.

The risk evidence is not speculative


Emerging research is converging on a consistent set of predictable failure modes.

A Stanford-led study, presented through the FAccT community and summarised by Stanford HAI, highlights that therapy-oriented chatbots can express stigma, mishandle crisis content, and respond in ways that violate core therapeutic principles. Separate Stanford reporting has raised broader concerns about youth vulnerability and how emotionally responsive AI companions may intensify dependency dynamics for some users. 

Alongside this is a growing discussion about delusion-congruent responding: the tendency of an agreeable system to validate, elaborate, or reinforce unusual beliefs rather than gently reality-test them. Academic discussion in the digital mental health literature is now explicitly examining how delusion-like experiences may emerge or intensify during chatbot interaction.

This risk profile isn’t surprising. It follows directly from how LLMs are built: they predict plausible next words, and their alignment systems often privilege being helpful and affirming. In ordinary circumstances, that is harmless. In mental health contexts, especially those involving suicidality, psychosis, coercive relationships, or abuse, “plausible and affirming” can become clinically dangerous.

What is missing from the research, and why it matters for public safety


If we want to be academically honest and therefore more persuasive, we must identify the missing pieces, not just the scary anecdotes. Key limitations in the current evidence base include:

1) Self-report ≠ clinical outcome. Population surveys often measure perceived helpfulness, not symptom change, harm, dependency, or longer-term outcomes. “It helped” may mean “it soothed me for 10 minutes,” which is not meaningless, but it is not the same as effective mental health care.
2) We lack robust harm surveillance. Where is the equivalent of a pharmacovigilance system for conversational AI? There is no standardised adverse event reporting pipeline across platforms, and no consistent obligation to disclose incidents.
3) We lack a clear taxonomy of use-cases. People use AI for everything from journalling and psycho-education to crisis disclosure and trauma processing. Research frequently treats “mental health use” as one category, which obscures radically different risk profiles.
4) We lack subgroup analysis at the level we need. Risk is not evenly distributed. Adolescents, people with psychosis vulnerability, people in domestic violence contexts, people with cognitive impairment, and those experiencing acute suicidal ideation may require distinct safeguards. We do not yet have sufficiently granular evidence to specify safe design defaults for each group.
5) We lack clarity on accountability. When an AI tool is used “as therapy,” who holds responsibility for foreseeable harms: developers, deployers, app stores, clinicians who recommend it, or consumers? Without a governance framework, responsibility becomes a fog, which is convenient for platforms and terrible for public safety.

The regulatory and governance context is moving, but unevenly


From a policy perspective, we are watching governance frameworks form in real time.

The American Psychological Association has issued guidance warning that generative AI mental health chatbots and wellness apps lack sufficient evidence and regulation to ensure safety, especially for those most at risk. 

Global governance bodies are also articulating “trustworthy AI” expectations. The WHO has published guidance on AI in health, emphasising safety, effectiveness, transparency, human oversight, and accountability. The NIST AI Risk Management Framework offers practical scaffolding for identifying and mitigating AI risks. And the EU AI Act has created a risk-based regulatory architecture that explicitly brings health-related AI into higher scrutiny categories, with human oversight and risk mitigation obligations for high-risk systems. 

But regulation moves more slowly than adoption, which means that, right now, design ethics are doing the work that the law has not yet caught up to.


This is where the recommendation of the Chair of the VMHPAA in 2025 becomes both timely and defensible:

Large language models used for mental health support must be coached (and required) to refer users to real human support at key points.

Not as a footer. Not as an afterthought. As a core safety behaviour.

In practice, this means developing “referral triggers” that are transparent and conservative. Examples include:

  • Mentions of suicide, self-harm, harm to others, or feeling unsafe
  • Psychosis-adjacent content (paranoia, command hallucinations, delusional certainty)
  • Severe eating disorder or self-starvation language
  • Domestic violence or coercive control indicators
  • Child safety disclosures
  • Extreme insomnia with behavioural destabilisation
  • Acute grief with risk markers (hopelessness, helplessness, and haplessness language, combined with an absence of supports, and acts of finality such as giving away possessions)

When triggers occur, the model should do three things (a minimal code sketch follows this list):
  1. Pause the improvisational “therapy” behaviour.
  2. Encourage immediate human connection, whether through a crisis hotline, a GP, or emergency services where appropriate.
  3. Provide credible pathways to a qualified clinician: not just “talk to someone,” but how to connect with registered mental health professionals.
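
To make the idea concrete, the sketch below shows one hypothetical way such referral triggers and the three-step response could be expressed in code. The category names, keyword phrases, and referral wording are illustrative assumptions only; a real deployment would need clinically validated detection (not simple keyword matching) and locally appropriate services.

```python
# A minimal, illustrative sketch only. Category names, keyword phrases, and the
# referral resources are hypothetical placeholders; a production system would
# need clinically validated risk classifiers rather than keyword matching.

from dataclasses import dataclass

# Hypothetical trigger taxonomy mirroring the categories listed above.
REFERRAL_TRIGGERS = {
    "suicide_self_harm": ["suicide", "kill myself", "self-harm", "not safe"],
    "psychosis_adjacent": ["voices telling me", "they are watching me"],
    "eating_disorder": ["stopped eating", "starving myself"],
    "domestic_violence": ["afraid of my partner", "won't let me leave"],
    "child_safety": ["my child is being hurt"],
}


@dataclass
class SafetyResponse:
    pause_therapy_mode: bool   # step 1: stop the improvisational "therapy" behaviour
    referral_message: str      # step 2: encourage immediate human connection
    pathways: list             # step 3: credible, named routes to real clinicians


def check_referral_triggers(user_message: str):
    """Return a conservative referral response if any trigger category matches."""
    text = user_message.lower()
    matched = [category for category, phrases in REFERRAL_TRIGGERS.items()
               if any(phrase in text for phrase in phrases)]
    if not matched:
        return None  # no trigger: normal, bounded behaviour may continue
    return SafetyResponse(
        pause_therapy_mode=True,
        referral_message=(
            "I may not be the right support for this. Please consider contacting "
            "a crisis line, your GP, or emergency services if you are in danger."
        ),
        pathways=[
            "Lifeline (24/7 crisis support)",
            "PACFA 'Find a Therapist' directory",
            "APS 'Find a Psychologist'",
        ],
    )
```

The point of the sketch is the shape of the behaviour, not the mechanism: detection is deliberately conservative, and the default on any match is to step out of improvised “therapy” and route the person towards named human supports.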

This is not merely a technical suggestion. It is an ethical requirement consistent with WHO principles on safety and accountability, and with risk-management approaches like NIST. 

What “credible pathways” look like


Rather than merely talking about pathways, we need to do more than warn; we must give people concrete routes to support services. For Australia, examples may include:

  • Crisis / immediate support: Lifeline (24/7 crisis support). 
  • Find a registered counsellor / psychotherapist: PACFA “Find a Therapist” directory. 
  • Find a counsellor: Australian Counselling Association directory. 
  • Find a psychologist: Australian Psychological Society “Find a Psychologist.” 
  • Find a registered member: Vocational Mental Health Practitioners Association Australia “Members” directory.

This is a general list of recommendations, by no means exhaustive, but it serves here as an example of how a chatbot can direct or guide individuals to connect with a real person in times of need.

A practical proposal: a “stepped-care” role for AI (with guardrails)


I would argue that the safest way to integrate AI is to constrain it to low-risk, non-clinical roles unless it is regulated and clinically governed. A stepped-care model, illustrated in a small configuration sketch after the lists below, could position generative AI as:

Appropriate for:

  • Psychoeducation, such as definitions, explanations of coping skills, and contacts for support services
  • Prompts for reflective journalling that are non-directive but encourage self-healing and self-reflection
  • Skills rehearsal, for example communication scripts, grounding techniques, or breath work
  • Service navigation, meaning specific, friendly support around “how do I find a therapist?” and encouragement to connect with such a professional
  • Reminders to seek human care when escalation markers appear 

Not appropriate for:
  • Crisis intervention as a standalone support
  • Trauma processing without human containment
  • Delusion validation / reality testing without clinical competence
  • Medication advice or self-diagnosis
  • Any interaction that substitutes for duty-of-care obligations
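
As a rough illustration of how a deployer might encode this stepped-care boundary, here is a small hypothetical configuration sketch in Python. The use-case labels and routing rules are assumptions for demonstration, not a validated clinical policy; the key design choice is that anything unclassified defaults to human referral.

```python
# Hypothetical stepped-care policy: which use-cases an AI assistant may handle
# directly, and which must be routed to human care. Labels are illustrative only.

ALLOWED_USE_CASES = {
    "psychoeducation",         # definitions, coping-skills explanations, service contacts
    "reflective_journalling",  # non-directive prompts for self-reflection
    "skills_rehearsal",        # communication scripts, grounding, breath work
    "service_navigation",      # "how do I find a therapist?" style guidance
}

HUMAN_ONLY_USE_CASES = {
    "crisis_intervention",
    "trauma_processing",
    "reality_testing",
    "medication_advice_or_diagnosis",
}


def route(use_case: str) -> str:
    """Decide whether the assistant may respond or must refer to human care."""
    if use_case in HUMAN_ONLY_USE_CASES:
        return "refer_to_human"
    if use_case in ALLOWED_USE_CASES:
        return "assistant_may_respond"
    # Conservative default: anything unclassified is referred to human care.
    return "refer_to_human"
```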

Please note, this is not anti-AI. It is pro-safety.

The conclusion we should be brave enough to state


Generative AI has become a de facto mental health companion for many people. The public is voting with behaviour, not with policy submissions. 

But until these systems are regulated, audited, and designed with embedded referral pathways and conservative safety triggers, we should stop calling what they do “therapy.” Therapy is not simply empathic language. It is a skilled, accountable, ethically governed practice.

The VMHPAA recommendation - “human referral by design” - is one of the simplest, most achievable guardrails we can implement now. It does not solve the access crisis. But it reduces the chance that a vulnerable person is left alone in a conversation with a system that can sound comforting while getting the fundamentals wrong.

And in mental health, “sounding right” is not the same as being safe.

Always remember…


If you or someone you know is in immediate danger or at risk of harm, seek urgent help from emergency services or a crisis line in your local community. 

Full disclosure…


At the time of writing the author is serving in the role of Chair of the Vocational Mental Health Practitioners Association Australia.



