
Tobias Häberli

@ Pivotal Research
1736 karma · Bern, Switzerland

Comments

I’m slightly confused about how they distinguish between ‘the model reached ASL-3’ and ‘treating the model as if it had reached ASL-3’.

We are deploying Claude Opus 4 with our ASL-3 measures as a precautionary and provisional action. To be clear, we have not yet determined whether Claude Opus 4 has definitively passed the Capabilities Threshold that requires ASL-3 protections.

(Activating AI Safety Level 3 Protections)

This doesn’t change anything about your point that they changed their initial commitment, but does it mean that they didn’t (technically) break their initial commitment with this release? 

I don't think that people need to be non-speciesist and enthusiastic about neglected issues to want to donate to shrimp welfare. People might donate because they are opportunistic donors and this seems like a worthy cause, because they found Andrés trustworthy and want to donate to trustworthy projects, or because of memes (the internet is into shrimp), etc.

The best-case scenario for increasing donation volume is probably thoughtful, high-net-worth individuals getting interested in whether this is a thing, deciding that it is, and partially adjusting their donation decisions. I don't think they need to fully buy into effective altruism to do this.

I'd certainly be interested in whether this video leads to a notable uptick in donations (in both number and volume) :)

The video has 418K YouTube views – and I'd guess it will plateau somewhere between 500k and 1 million views. In a 5-minute search I couldn't find any other video seriously considering shrimp welfare with over 5k views, and I'd guess there are only 15–40 such videos with more than 1k views. So this video might have increased exposure to shrimp welfare concerns through YouTube by something like 3–15x. Seems plausible that it will lead to substantially more donations.

But I also feel sad that the ideas have largely not slipped into the public consciousness over the last 14 years.

I kinda like that we’re back (so back?) to “a new movement called effective altruism”.

An even more neglected problem: low-floating fruit. Seagrass produces fruit[1], some of which (Halophila decipiens) has been found hanging at depths of 190 feet (58 meters)[2]. This is an absurdly submerged fruit, not even reachable for giraffes. Somebody should be on this.

  1. ^

    https://en.wikipedia.org/wiki/Seagrass#Sexual_recruitment

  2. ^

I think the context of the Jack Clark quote matters:

What if we’re right about AI timelines? What if we’re wrong?
Recently, I’ve been thinking a lot about AI timelines and I find myself wanting to be more forthright as an individual about my beliefs that powerful AI systems are going to arrive soon – likely during this Presidential Administration. But I’m struggling with something – I’m worried about making short-timeline-contingent policy bets.

So far, the things I’ve advocated for are things which are useful in both short and long timeline worlds. Examples here include:

  • Building out a third-party measurement and evaluation ecosystem.
  • Encouraging governments to invest in further monitoring of the economy so they have visibility on AI-driven changes.
  • Advocating for investments in chip manufacturing, electricity generation, and so on.
  • Pushing on the importance of making deeper investments in securing frontier AI developers.

All of these actions are minimal “no regret” actions that you can do regardless of timelines. Everything I’ve mentioned here is very useful to do if powerful AI arrives in 2030 or 2035 or 2040 – it’s all helpful stuff that either builds institutional capacity to see and deal with technology-driven societal changes, or equips companies with resources to help them build and secure better technology.

But I’m increasingly worried that the “short timeline” AI community might be right – perhaps powerful systems will arrive towards the end of 2026 or in 2027. If that happens we should ask: are the above actions sufficient to deal with the changes we expect to come? The answer is: almost certainly not!

[Section that Mikhail quotes.]

Loudly talking about and perhaps demonstrating specific misuses of AI technology: If you have short timelines you might want to ‘break through’ to policymakers by dramatizing the risks you’re worried about. If you do this you can convince people that certain misuses are imminent and worthy of policymaker attention – but if these risks subsequently don’t materialize, you could seem like you’ve been Chicken Little and claimed the sky is falling when it isn’t – now you’ve desensitized people to future risks. Additionally, there’s a short- and long-timeline risk here where by talking about a specific misuse you might inspire other people in the world to pursue this misuse – this is bound up in broader issues to do with ‘information hazards’.

These are incredibly challenging questions without obvious answers. At the same time, I think people are rightly looking to people like me and the frontier labs to come up with answers here. How we get there is going to be, I believe, by being more transparent and discursive about these issues and honestly acknowledging that this stuff is really hard and we’re aware of the tradeoffs involved. We will have to tackle these issues, but I think it’ll take a larger conversation to come up with sensible answers.

In context, Jack Clark seems to be arguing that he should be considering short-timeline, 'regretful' actions more seriously.

Some simplifying assumptions:

  • £50k starting net worth
  • Only employed for the next 4 years
  • £300k salary, £150k after tax, £110k after personal consumption
  • 10% interest on your savings for 4 years
  • Around £635k at end of 2030 (see the rough sketch below)
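
Under those assumptions, here is a minimal sketch of how the compounding reaches roughly £635k. One extra assumption of mine, not stated above: each year's £110k of savings is added at the start of the year and earns the full 10% (with end-of-year contributions the total comes out closer to £584k).

```python
# Minimal compounding sketch using the simplifying assumptions above.
# My added assumption: savings are contributed at the start of each year,
# so they earn interest for the full year.

starting_net_worth = 50_000   # £50k starting net worth
annual_savings = 110_000      # £110k saved per year (after tax and consumption)
annual_return = 0.10          # 10% interest on savings
years = 4                     # employed for the next 4 years

wealth = starting_net_worth
for _ in range(years):
    wealth = (wealth + annual_savings) * (1 + annual_return)

print(f"Estimated net worth at end of 2030: £{wealth:,.0f}")  # ≈ £635,000
```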

This is only slightly more than the average net worth of UK 55-to-64-year-olds.

Overall, if this plan worked out near-perfectly, it would place you at around the 92nd percentile of wealth in the UK.

This would put you in a good, but not great, position to invest to give.

Overall, it seems to me as if you’re trying to speedrun getting incredibly wealthy in 4 years. This is generally not possible with salaried work (the assumptions above put you at around the 99th–99.5th percentile of salaries), but might be more feasible through entrepreneurship.

Some other considerations:

  • Working such a high-paying job, even in financial services, will probably not leave you time to study and practice investing. You will not be an expert in AI investing, or investing in general, by 2030, which would be a problem if you believe such expertise is necessary for you to invest to give.
  • Quite a lot of EAs will be richer than this in 2030. My rough guess is more than 500. Your position might be useful but is likely to be far from unique.
  • You might want to think through your uncertainties about how useful money will be for achieving your goals in 2030–2040. If there are no more white-collar jobs in 2030, then by 2035 the world might be very weird and confusing.
  • If there is a massive increase in overall wealth in 2030–2040 due to fast technological progress, a lot of the problems you care about will get solved by non-EAs. Charity is a luxury good for the rich; more people will be rich; and charity on average solves many more problems than it creates.
  • Technological progress itself will potentially solve a lot of the problems you care about.
  • (Also agree with Marcus’s point.)

Thanks for the comment! I might be missing something, but GPT-type chatbots are based on large language models, which play a key role in scaling toward AGI. I do think that extrapolating progress from them is valuable but also agree that tying discussions about future AI systems too closely to current models’ capabilities can be misleading.

That said, my post intentionally assumes a more limited claim: that AI will transform the world in significant ways relatively soon. This assumption seems both more likely and increasingly foreseeable. In contrast, assumptions about a world ‘incredibly radically’ transformed by superintelligence are less likely and less foreseeable. There have been lots of arguments around why you should work on AI Safety, and I agree with many of them. I’m mainly trying to reach the EAs who buy into the limited claim but currently act as if they don’t.

Regarding the example: It would likely be a mistake to focus only on current AI capabilities for education. However, it could be important to seriously evaluate scenarios like ‘AI teachers better than every human teacher soon’.

Thanks for the thoughtful comment!

Re point 1: I agree that the likelihood and expected impact of transformative AI exist on a spectrum. I didn’t mean to imply certainty about timelines, but I chose not to focus on arguing for specific timelines in this post.

Regarding the specific points: they seem plausible but are mostly based on base rates and social dynamics. I think many people’s views, especially those working on AI, have shifted from being shaped primarily by abstract arguments to being informed by observable trends in AI capabilities and investments.
