This is a linkpost for https://ailabwatch.org

I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly.

It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff.

(It's much better on desktop than mobile — don't read it on mobile.)

It's in beta—leave feedback here or comment or DM me—but I basically endorse the content and you're welcome to share and discuss it publicly.

It's unincorporated, unfunded, not affiliated with any orgs/people, and is just me.

Some clarifications and disclaimers.

How you can help:

  • Give feedback on how this project is helpful and how it could change to be much more helpful
  • Tell me what's wrong/missing; point me to sources on what labs should do or what they are doing
  • Suggest better evaluation criteria
  • Share this
  • Help me find an institutional home for the project
  • Offer expertise on a relevant topic
  • Offer to collaborate
  • Volunteer your webdev skills
  • (Pitch me on new projects or offer me a job)
  • (Want to help and aren't sure how to? Get in touch!)

I think this project is the best existing resource for several kinds of questions, but I think it could be a lot better. I'm hoping to receive advice (and ideally collaboration) on taking it in a more specific direction. Also interested in finding an institutional home. Regardless, I plan to keep it up to date. Again, I'm interested in help but not sure what help I need.

I could expand the project (more categories, more criteria per category, more labs), but I currently expect that improving the presentation is more important, and I don't know how to do that; feedback will determine what I prioritize. It will also determine whether I continue spending most of my time on this or mostly drop it.


I just made a Twitter account. I might use it to comment on stuff labs do.


Thanks to many friends for advice and encouragement. Thanks to Michael Keenan for doing most of the webdev. These people don't necessarily endorse this project.

Comments



I think this is a great idea! Is there a way to have two versions:

  1. The detailed version (with %'s, etc)
  2. And the meme-able version (which links to the detailed version)

Content like this is only as good as the number of people who see it, and while its detail would necessarily be reduced in the meme-able version, I think it is still worth doing.

The Alliance for Animals does this in the lead up to elections and it gets spread widely: https://www.allianceforanimals.org.au/nsw-election-2023

 

Yep. But in addition to being simpler, the version of this project optimized for getting attention has other differences:

  • Criteria are better justified, more widely agreeable, and less focused on x-risk
  • It's done—or at least endorsed and promoted—by a credible org
  • The scoring is done by legible experts and ideally according to a specific process

Even if I could do this, it would be effortful and costly and imperfect and there would be tradeoffs. I expect someone else will soon fill this niche pretty well.

Hi Zach! To clarify, are you basically saying you don't want to improve the project much more than where you've got it to? I think it is possible you've tripped over a highly impactful thing here!

Not necessarily. But:

  1. There are opportunity costs and other tradeoffs involved in making the project better along public-attention dimensions.
  2. The current version is bad at getting public attention; improving it and making it get 1000x public attention would still leave it with little; likely it's better to wait for a different project that's better positioned and more focused on getting public attention. And as I said, I expect such a project to appear soon.

"And as I said, I expect such a project to appear soon."

I don't know whether to read this as "Zach has some inside information that gives him high confidence it will exist" or "Zach is doing wishful thinking" or something else!

What do you consider the purpose or theory of change of this project? I assumed it was to put pressure on the AI labs to improve along these criteria, which presumably requires some level of public attention. Do you see it more as a way for AI safety people to keep tabs on the status of these labs?

The original goal involved getting attention. Weeks ago, I realized I was not on track to get attention. I launched without a sharp object-level goal but largely to get feedback to figure out whether to continue working on this project and what goals it should have.

I think getting attention would increase the impact of this project a lot and is probably pretty doable if you are able to find an institutional home for it. I agree with Yanni's sentiment that it is probably better to improve on this project than to wait for another one that is more optimized for public attention to come along (though am curious why you think the latter is better).

To my understanding, Google has better infosec than OpenAI and Anthropic. They have much more experience protecting assets.

I share this impression. Unfortunately it's hard to capture the quality of labs' security with objective criteria based on public information. (I have disclaimers about this in 4-6 different places, including the homepage.) I'm extremely interested in suggestions for criteria that would capture the ways Google's security is good.

I mean Google does basic things like use Yubikeys where other places don't even reliably do that. Unclear what a good checklist would look like, but maybe one could be created.

The broader question I'm confused about is how much to update on the local/object-level of whether the labs are doing "kind of reasonable" stuff, vs. what their overall incentives and positions in the ecosystem point them to doing.

E.g., your site puts OpenAI and Anthropic as the least-bad options based on their activities, but from an incentives/organizational perspective, their place in the ecosystem is just really bad for safety. Contrast with, e.g., being situated within a large tech company[1] where having an AI scaling lab is just one revenue source among many, or with Meta's alleged "scorched Earth" strategy, where they are trying very hard to commoditize the complement of LLMs.

  1. ^

    E.g., GDM employees have Google/Alphabet stock; most of the variance in their earnings isn't going to come from AI, at least in the short term.

Thanks for doing this! This is one of those ideas that I've heard discussed for a while but nobody was willing to go through the pain of actually making the site; kudos for doing so.

Agreed! Safer AI was supposed to launch their rankings site in April, but nothing public so far.

Very easy to read. Props on the design.

Cool idea, thanks for working on it.

According to this article, only DeepMind gave the UK AI Safety Institute (partial?) access to their model before release. This seems like a pro-social thing to do, so maybe this could be worth tracking in some way if possible.

  1. Yep, that's related to my "Give some third parties access to models to do model evals for dangerous capabilities" criterion. See here and here.
  2. As I discuss here, it seems DeepMind shared super limited access with UKAISI (only access to a system with safety training + safety filters), so don't give them too much credit.
  3. I suspect Politico is wrong and the labs never committed to give early access to UKAISI. (I know you didn't assert that they committed that.)

This is a great project idea!

OpenAI has made a hard commitment to safety by allocating 20% of its compute (~20% of budget) to the superalignment team. That is a huge commitment which isn't reflected in this.

I agree such commitments are worth noticing and I hope OpenAI and other labs make such commitments in the future. But this commitment is not huge: it's just "20% of the compute we've secured to date" (in July 2023), to be used "over the next four years." It's unclear how much compute this is, and with compute use increasing exponentially it may be quite little in 2027. Possibly you have private information but based on public information the minimum consistent with the commitment is quite little.
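As a rough, purely illustrative calculation (assuming total compute grows ~3x per year; that growth rate is my assumption, not a figure from OpenAI or the commitment): 20% of the compute secured as of mid-2023 would amount to roughly

$$\frac{0.2 \cdot C_{2023}}{3^{4} \cdot C_{2023}} = \frac{0.2}{81} \approx 0.25\%$$

of the compute available in 2027, if the whole allocation were spent in that year.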

It would be great if OpenAI or others committed 20% of their compute to safety! Even 5% would be nice.

I've heard OpenAI employees talk about the relatively high amount of compute superalignment has (complaining superalignment has too much and they, employees outside superalignment, don't have enough). In conversations with superalignment people, I noticed they talk about it as a real strategic asset ("make sure we're ready to use our compute on automated AI R&D for safety") rather than just an example of safety washing. This was something Ilya pushed for back when he was there.

Ilya is no longer on the Superalignment team?

Great to see the progress you've made on this.
