huw

364 karma · Joined · Working (0-5 years) · Sydney NSW, Australia
huw.cool

Bio

I live for a high disagree-to-upvote ratio

Comments

The Global Burden of Disease (GBD) is okay; how much to trust it depends a lot on which disease & metric you're looking at, and how aware you are of the caveats around it. Some of these:

  • A lot of the data is estimated, rather than real measurements of prevalence
    • I think most people understand this, but it's always worth a reminder
    • The GBD provides credible intervals for all major statistics and these should definitely be used!
    • This paper on the Major Depressive Disorder estimates is a good overview for a specific disease
  • The moral weights for estimating the years lived with disability for a given disease are compiled from a wide survey of the general public
    • This means they're based on people's general belief of what it would be like to have that condition, even if they haven't experienced it or know anyone who has
    • Older GBDs included expert opinion in their moral weights, but to remove biases they don't do this anymore (IMHO, the right call)
  • The estimates for prevalence are compiled differently per condition by experts in that condition
    • There is some overall standardisation, but equally, there's some wiggle room for a motivated researcher to inflate their prevalence estimates. I assume the thinking is that these biases cancel out in some overall sense.

Overall, I think the GBD is very robust and an extremely useful tool, especially for (a) making direct comparisons between countries or diseases and (b) where no direct, trustworthy, country-specific data is available. But you should be able to improve on its accuracy if you have an inside view on a particular situation. I don't think it's subject to the incentives you mention above in quite the same way.
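
For what it's worth, here's a minimal sketch of how I'd propagate those credible intervals rather than relying on point estimates, using made-up illustrative numbers (not real GBD figures; the GBD publishes 95% uncertainty intervals for each estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def lognormal_from_ui(lo, hi, size):
    """Sample a lognormal whose ~2.5th/97.5th percentiles match (lo, hi)."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * 1.96)
    return rng.lognormal(mu, sigma, size)

# Illustrative numbers only (not real GBD estimates)
prevalence = lognormal_from_ui(3_000, 5_000, N) / 100_000   # cases per person
disability_weight = lognormal_from_ui(0.10, 0.25, N)        # from the public survey

# Prevalence-based years lived with disability per person, with uncertainty
yld = prevalence * disability_weight
print(np.round(np.percentile(yld, [2.5, 50, 97.5]), 5))
```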

Thank you so much! Your criticism has helped me identify a few mistakes, and I think it can get us closer to clarity. The main difference between our models is around who counts as a 'beneficiary', or what it means to 'recruit' someone.

The main thing I want to focus on is that you're predicting a cost per beneficiary that would be nearly 50% recruitment. I don't think that passes the smell test. The main difference is that you're only counting the staff time for active participants, but even with modest dropout, we'd expect the vast majority of staff time to go to users who only complete one or two calls. That said, you're right to point out that we should factor in dropout between installation and the first guidance call; when I factor this in, unguided has 11% of the cost of guided at scale.

The rest of this comment is just my working out.

Mistakes I made

One of the mistakes I made was using different definitions of recruitment for each condition in my cost model. If that model says there are 100,000 beneficiaries, in the unguided model this means we got 100,000 installs, but in the guided model it means 100,000 participants who each had 50–180 minutes of staff time allocated to them. Obviously these kinds of participants cost different amounts to recruit.

(Two other mistakes I found: I shouldn't have multiplied the unguided recruitment cost by 2 to account for engagement differences, and I forgot to discount the office space costs for the nonexistent guides in the unguided model)

I should've been clearer about a 'beneficiary'

To rectify this, let's count a 'beneficiary' as someone who is in the targeted subgroup and completes pre-treatment. This is in line with most of the literature, which counts 'dropout' regardless of whether users complete any of the material, so long as they've done their induction. We don't want to filter this down to 'active' users, since users who drop out will still incur costs.

We have some facts from Kaya Guides:

  • They spent US$105 and got 875 initial user interactions, for a cost per interaction of about $0.12 (this isn't quite cost per install, but it's close enough; the arithmetic is sketched after this list)
    • Let's consider this a conservative upper bound on cost: Meta's targeting should get cheaper at scale and after optimisation on Kaya Guides' end. I think it could get down to the $0.02–0.10 figure pretty easily (and it may have already been there if a number of interactions didn't lead to installs)
  • 82.65% of users who completed their depression questionnaire scored above threshold
    • This should also be conservative (here, a lower bound): Kaya Guides used basic ads, but Meta is able to optimise for 'app events' (such as scoring highly on this questionnaire), so it should be even better at targeting for depression (scary!)
  • 12.34% (108) of these initially interacting users scheduled and picked up an initial call
  • Their programmes involve an initial call, a second call at 3 days (I confirmed this directly), and then 5–8 weeks of subsequent calls. All calls are 15 minutes.
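
To make the arithmetic explicit (the only assumption being that initial interactions ≈ installs):

```python
# Back-of-the-envelope from the Kaya Guides figures above
spend = 105.0                # US$
interactions = 875           # initial user interactions (≈ installs)
starters = 108               # scheduled and picked up an initial call

cost_per_interaction = spend / interactions              # ≈ $0.12
starter_rate = starters / interactions                   # ≈ 12.34%
cost_per_starter = cost_per_interaction / starter_rate   # ≈ $0.97

print(f"${cost_per_interaction:.2f} per interaction, {starter_rate:.1%} start, "
      f"${cost_per_starter:.2f} per treatment starter")
```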

Updating my model

I've updated my cost model:

  • Split office space costs between conditions
  • Removed the recruitment cost doubling in unguided
  • Broke down staff time per participant
    • 1–10 calls per participant (lognormal); this is roughly what Kaya Guides and Step-By-Step do. I distributed it as a lognormal to quickly account for dropout rates, but the model is somewhat sensitive to this parameter, and better knowledge of its distribution could change the overall calculus (a rough sketch of how these parameters combine follows this list)
    • 15–40 minutes of staff time per call (again, roughly consistent with Kaya Guides and Step-By-Step; this accounts for other time spent on that participant, like notes & chat reviews)
  • Broke down cost per treatment starter
    • Kept cost per install at $0.02–0.10, since it's consistent with real-world data
    • Discounted installs by the share who start the programme: 12.34% (~5–20%), consistent with Kaya Guides' observations
      • I decided to extrapolate this to unguided, in absence of better data
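
As a rough sketch of how these parameters combine, here's a minimal Monte Carlo version (the hourly staffing cost and the way I fit lognormals to the intervals are placeholders of my own, not values from the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def lognormal_between(lo, hi, size):
    """Lognormal whose ~2.5th/97.5th percentiles land on (lo, hi)."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * 1.96)
    return rng.lognormal(mu, sigma, size)

# Recruitment pipeline (shared by both conditions)
cost_per_install = rng.uniform(0.02, 0.10, N)           # consistent with real-world data
start_rate = lognormal_between(0.05, 0.20, N)           # share of installers who start
cost_per_starter = cost_per_install / start_rate

# Guided condition: add staff time per treatment starter
calls = np.clip(lognormal_between(1, 10, N), 1, None)   # calls per participant
minutes_per_call = rng.uniform(15, 40, N)
staff_cost_per_hour = 5.0                                # placeholder assumption
guided_cost = cost_per_starter + calls * (minutes_per_call / 60) * staff_cost_per_hour

# Unguided condition: recruitment only
unguided_cost = cost_per_starter

ratio = unguided_cost / guided_cost
print("unguided / guided cost ratio (5th, 50th, 95th pct):",
      np.round(np.percentile(ratio, [5, 50, 95]), 3))
```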

I think this fairly accounts for everything you raised. You're right that my model should've accounted for the cost of a treatment starter (~8× higher). But I don't think it's right to only count active users, since Kaya Guides spend 15 minutes of staff time on 12% of all installers, even the ones who drop out later. And as their ad targeting gets better, we'd only expect this share to increase, paradoxically widening the cost gap!

Plugging it all in, unguided has 8% (17–23%) of the cost of guided at scale.

Earlier, I also sense-checked with Kaya Guides' direct cost-per-beneficiary, which they estimate to be $3.93. If the unguided cost per beneficiary is $0.41 (as in the updated model), then the limiting proportion increases a bit to 11%.

My doubts

The fact that it took over a month for someone to find some pretty obvious flaws in my model is a concern, and the model is clearly somewhat sensitive to its parameters. However, even when I'm really pessimistic about those parameters, I can't get unguided above 20% of the cost of guided, which would still make it more cost-effective.

The bigger doubt I've had since writing this report comes from learning that Kaya Guides actually do have an unguided condition: anyone who scores 0–9 on the PHQ-9 (no/mild depression), or anyone who scores above that but explicitly doesn't want a guide, gets the ordinary programme, just without the calls. This has an astonishing 0% completion rate. I think the different subgroup, programme design, and lack of focus are mostly responsible, but it indicates that it's gonna be hard to keep users engaged. I'll chat with them some more and see if I can learn anything else.

My union is pretty conservative w/r/t social justice, because it's the one that covers tech & science (our members tend to hold left-wing opinions, but don't like stirring the pot). I don't know how we'd feel about animal welfare, but not many of us work directly in those industries.

To get closer to your point, live animal export is a big issue in Australia, and our dedicated Meat Industry Employees Union have called for a ban on it. So I think the kind of campaign you're talking about would fit right in here. Their animal welfare policy is so important to them that it's on the front page of their website. Equally, they've worked with the Greens and the Animal Justice Party (both legislatively represented) in the past, and the unions here have close ties to the Labor party (1 of 2 major parties), so political change might be uniquely achievable here—although I doubt the situation is much different in most EU countries.

Thank you—I am a big believer in the power of collective action & have organised successful union drives & pay disputes in the past. I don't have a lot to add to your breakdown; I think this is a very promising area for EA to consider for almost every cause area (ex. would love to see a similar breakdown for current/future efforts in frontier AI labs).

Just strategically, I think the most promising insider activism campaign would be to partner with an existing union in a country with strong union protections; this way, you can leverage those protections to prevent retaliation against employee activists, since they can credibly claim they were organising for the union. Frankly, I think this rules out the U.S. as a starting point: you would want to build groundswell in places where the host companies can't cut it off at the knees (the recent dismissals at Google are a strong reminder that if employees protest something the company has a stake in, they'll be fired at will with no consequences).

Furthermore, unions have a lot of existing connections & skills in developing these campaigns, and, as you've noted, regularly participate in employee activism directly or otherwise have a presence in other social movements. This comes with the trade-off of potentially alienating some employees (unions are almost exclusively left-wing and have established reputations), but I don't think there are many people (outside of the U.S.) who would be put off by a union yet would otherwise have joined an employee activist drive.


Can you give a sense of what proportion? Should we expect 'some' to mean ≤10% or something more significant?

I misinterpreted "but low if you think AI could start to automate a large fraction of jobs before 2030". Thanks for clarifying :)

I don't get it. How are consumers supposed to pay trillions of dollars if AI is going to automate a large fraction of their jobs?

[This comment is no longer endorsed by its author]

FWIW on timelines:

  • June 13, 2022: Critiques paper (link 1)
  • May 9, 2023: Language models explain language models paper (link 2)
  • November 17, 2023: Altman removal & reinstatement
  • February 15, 2024: William_S resigns
  • March 8, 2024: Altman is reinstated to the OpenAI board
  • March 12, 2024: Transformer debugger is open-sourced
  • April 2024: Cullen O'Keefe departs (via LinkedIn)
  • April 11, 2024: Leopold Aschenbrenner & Pavel Izmailov fired for leaking information
  • April 18, 2024: Users notice Daniel Kokotajlo has resigned

Without reading too much into it, there's about as much negativity about the state of EA as there is lack of confidence in its future. That suggests to me that there are a lot of people who think EA should be reformed to survive (rather than 'it'll dwindle and that's fine' or 'I'm unhappy with it but it'll be okay')?

If anything, EA now has a strong public (admittedly critical) reputation for longtermist beliefs. I wouldn't be surprised if some people have joined in order to pursue AI alignment and got confused when they found out more than half of the donations go to GHD & animal welfare.
