TIO: A mental health chatbot

Sanjay

34 min read, Oct 12, 2020

Comments 6

MichaelPlant

Hello Sanjay, thanks both for writing this up and actually having a go at building something! We did discuss this a few months ago but I can't remember all the details of what we discussed.

First, is there a link to the bot so people can see it or use it? I can't see one.

Second, my main question for you (sorry if I asked this before) is: what is the retention for the app? When people ask me about mental health tech, my main worry is not whether it might work if people used it, but whether people do want to use it, given the general rule that people try apps once or twice and then give up on them. If you build something people want to keep using and can provide that service cheaply, this would very likely be highly cost-effective.

I'm not sure it's that useful to create a cost-effectiveness model based on the hypothetical scenario where people use the chatbot: the real challenge is to get people to use it. It's a bit like me pitching a business to venture capitalists saying "if this works, it'll be the next facebook", to which they would say "sure, now tell us why you think it will be the next facebook".

Third, I notice your worst-case scenario is that the effect lasts 0.5 years, but I'd expect using a chatbot to only make me feel better for a few minutes or hours, so unless people are using it many times, I'd expect the impact to be slight. Quick maths: a 1-point increase on a 0-10 happiness scale for 1 day is 1/365, or about 0.003, happiness life-years.
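The quick maths above can be checked with a short calculation. This is a minimal sketch that assumes one "happiness life-year" means a 1-point gain on the 0-10 scale sustained for a full year, which is the reading that reproduces the 0.003 figure. The function name is hypothetical and is not taken from the TIO model.

```python
DAYS_PER_YEAR = 365


def happiness_life_years(point_increase: float, duration_days: float) -> float:
    """Scale points gained multiplied by the duration expressed in years.

    Assumption: 1 happiness life-year = 1 point on the 0-10 scale for 1 year.
    """
    return point_increase * duration_days / DAYS_PER_YEAR


one_point_one_day = happiness_life_years(1, 1)
print(round(one_point_one_day, 4))  # 0.0027, i.e. roughly 0.003
```

Under the same assumption, an effect lasting only a few hours would be a fraction of this figure, which is why the number of repeat uses matters so much to the estimate.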

Sanjay

Thank you very much for taking the time to have a look at this.

(1) For links to the bot, I recommend having a look at the end of Appendix 1a, where I provide links to the bot, but also explain that people who aren't feeling low tend not to behave like real users, so it might be easier to look at one of the videos/recordings that we've made, which show some fictional conversations which are more realistic.

(2) Re retention, we have deliberately avoided measuring this, because we haven't thought through whether that would count as being creepy with users' data. We've also inherited some caution from my Samaritans experience, where we worry about "dependency" (i.e. people reusing the service so often that it almost becomes an addiction). So we have deliberately not tried to encourage reuse, nor measured how often it happens. We do however know that at least some users mention that they will bookmark the site and come back and reuse it. Given the lack of data, the model is pretty cautious in its assumptions -- only 1.5% of users are assumed to reuse the site; everyone else is assumed to use it only once. Also, those users are not assumed to have a better experience, which is also conservative.

I believe your comments about hypotheticals and "this will be the next facebook" are based on a misunderstanding. This model is not based on the "hypothetical" scenario of people using the bot, it's based on the scenario of people using the bot *in the same way the previous 10,000+ users have used the bot*. Thus far we have sourced users through a combination of free and paid-for Google ads, and, as described in Appendix 4a, the assumptions in the model are based on this past experience, adjusted for our expectations of how this will change in the future. The model gives no credit to the other ways that we might source users in the future (e.g. maybe we will aim for better retention, maybe we will source users from other referrals) -- those would be hypothetical scenarios, and since I had no data to base those off, I didn't model them.

(3) I see that there is some confusion about the model, so I've added some links in the model to appendix 4a, so that it's easier for people viewing the model to know where to look to find the explanations.

To respond to the specific points, the worst case scenario does *not* assume that the effect lasts 0.5 years. The worst case scenario assumes that the effect lasts a fraction of a day (i.e. a matter of hours) for exactly 99.9% of users. The remaining 0.1% of users are assumed to like it enough to reuse it for about a couple of weeks and then lose interest.

I very much appreciate you taking the time to have a look and provide comments. So sorry for the misunderstandings, let's hope I've now made the model clear enough that future readers are able to follow it better.

KrisMartens

Interesting idea, great to see such initiatives! My main attempt to contribute something is that I think I disagree about the way you seem to assume that this potentially would 'revolutionise the psychology evidence base'.

Questionable evidence base for underlying therapeutic approach

This bot has departed from many other mental health apps by not using CBT (CBT is commonly used in the mental health app space). Instead it’s based on the approach used by Samaritans. While Samaritans is well-established, the evidence base for the Samaritans approach is not strong, and substantially less strong than CBT. Part of my motivation was to improve the evidence base, and having seen the results thus far, I have more faith in the bot’s approach, although more work to strengthen the evidence base would be valuable

I'm not sure if it's helpful to think in terms of the evidence base of an entire approach, instead of thinking diagnosis- or process-based. I mean, we do know a bit about what works for whom, and what doesn't. One potential risk is assuming that an approach can never be harmful, which it can.

The bot aims to achieve change in the user’s emotional state by letting the user express what’s on their mind

This is one such potential mechanism: it might be harmful for processes such as worrying or ruminating. If I understand the app correctly, I don't think I would advise it for my patients with generalized anxiety disorder, or with dependent personality traits.

Some therapeutic approaches (like CBT) are closer to being uniform (although, depending on how you implement them, sometimes CBT can be more or less uniform)

Others, like Rogerian or existential therapies, are highly non-uniform -- they don’t have a clear “playbook”

But wouldn't a lot of Rogerian therapies exclude quite a few cases? Or is there at least a selection bias?

Sanjay

Thank you for your comment Kris.

I'm unclear why you are hesitant about the claim of the potential to revolutionise the psychology evidence base. I wonder if you perhaps inadvertently used a strawman of my argument by only reading the section which you quoted? This was not intended to support the claim about the bot's potential to revolutionise the psychology evidence base.

Instead, it might be more helpful to refer to Appendix 2; I include a heavily abbreviated version here:

The source for much of this section is conversations with existing professional psychiatrists/psychologists.

Currently some psychological interventions are substantially better evidenced than others.

<SNIP>

Part of the aim of this project is to address this in two ways:

(1) Providing a uniform intervention that can be assessed at scale

<SNIP>

(2) Allowing an experimental/scientific approach which could provide an evidence base for therapists

<SNIP>

Crucially, TIO is fundamentally different from other mental health apps -- it has a free-form conversational interface, similar to an actual conversation (unlike other apps which either don’t have any conversational interface at all, or have a fairly restricted/“guided” conversational capability). This means that TIO is uniquely well-positioned to achieve this goal.

To expand on item (2), the idea is that when I, as someone who speaks to people in a therapeutic capacity, choose to say one thing (as opposed to another thing) there is no granular evidence about that specific thing I said. This feels all the more salient when being trained or training others, and dissecting the specific things said in a training role play. These discussions largely operate in an evidence vacuum.

The professionals that I've spoken to thus far have not yet been able to point me to evidence as granular as this.

If you know of any such evidence, please do let me know -- it might help me to spend less time on this project, and I would also find that evidence very useful.

KrisMartens

Thanks for your reply, I hope I'm not wasting your time.

But appendix 2 also seems to imply that the evidence base for CBT is for it as an approach in its entirety. What we think works in a CBT protocol for depression is different from what we think works in a CBT protocol for panic disorder (or OCD, or ...). And there is data on the groups for which none of those protocols work.

In CBT that is mainly based on a functional analysis (or assumed processes), and that functional analysis creates the context that determines which specific things one would or wouldn't say. This also provides context for how you would define 'empathetic responses'.

(There is a paper from 1966 claiming that Rogers probably also used implicit functional analyses to 'decide' to what extent he would or wouldn't reinforce certain (mal)adaptive behaviors, just to show how old this discussion is. The bot might generate very interesting results to contribute to that discussion!)

Would you consider evidence that a specific diagnosis-aimed CBT protocol works better than a general CBT protocol for a specific group as relevant to the claim that there is evidence about which reactions (sentences) would or wouldn't work (for whom)?

So I just can't imagine revolutionizing the evidence base for psychological treatments using a 'uniform' approach (and thus without taking characteristics of the person into account), but maybe I don't get how diverse this bot is. I just interacted a bit with the test version, and it supported my hypothesis about it potentially being (a bit) harmful to certain groups of people. (*edit* you seem to anticipate this by not encouraging re-use). But still great for most people!

Sanjay

Thanks very much Kris, I'm very pleased that you're interested in this enough to write these comments.

And as you're pointing out, I didn't respond to your earlier point about talking about the evidence base for an entire approach, as opposed to (e.g.) an approach applied to a specific diagnosis.

The claim that the "evidence base for CBT" is stronger than the "evidence base for Rogerian therapy" came from psychologists/psychiatrists who were using a bit of a shorthand -- i.e. I think they really mean something like "if we look at the evidence base for CBT as applied to X for lots of values of X, compared to the evidence base for Rogerian therapy as applied to X for lots of values of X, the evidence base for the latter is more likely to have gaps for lots of values of X, and more likely to have poorer quality evidence if it's not totally missing".

It's worth noting that while the current assessment mechanism is the question described in Appendix 1f, this is, as alluded to, not the only question that could be asked, and it's also possible for the bot to incorporate other standard assessment approaches (PHQ-9, GAD-7, or whatever) and adapt accordingly.

Having said that, I'd say that this on its own doesn't feel revolutionary to me. What really does seem revolutionary is that, with the right scale, I might be able to ask: this client said XYZ to me; if I had responded with ABC or DEF, which of those would have given a better result? And I would be able to test something as granular as that with a non-tiny sample size.
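To make the ABC-versus-DEF idea concrete, here is a minimal sketch of how two candidate responses might be compared once users have been randomly assigned to one or the other. Everything in it is an assumption for illustration: the function name, the use of a permutation test, and the mood-change scores, which are made up and are not TIO data.

```python
import random
from statistics import mean


def permutation_p_value(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Repeatedly shuffles the pooled scores and counts how often a random
    split gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical self-reported mood changes after the bot replied with
# response ABC or response DEF to the same user statement XYZ.
scores_abc = [1, 2, 0, 3, 1, 2, 2, 1]
scores_def = [0, 1, 0, 1, -1, 0, 1, 0]

p_value = permutation_p_value(scores_abc, scores_def)
print(f"mean ABC = {mean(scores_abc)}, mean DEF = {mean(scores_def)}, p = {p_value}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries; with real data, the choice of outcome measure and test would need more care, especially if many response pairs are compared at once.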
