
Long TL;DR: You’re an engineer, you want to work on AI Safety, you’re not sure which org to apply to, so you’re going to apply to all of them. But - oh no - some of these orgs may actively be causing harm, and you don’t want to do that. What’s your alternative? Study AI Safety for 2 years before you apply? In this post I suggest you can collect the info you want quickly by going over a few specific posts and their most upvoted comments.

Why do I think some orgs might be [not helping] or [actively causing harm]?

Example link. (Help me out in the comments with more?)

My suggestion:

1. Open the tag of the org you want on lesswrong

How: Search for a post related to that org. You’ll see tags at the top of the post. Click the tag with the org’s name.

2. Sort by “newest first”

3. Open 2-3 posts

(Don’t read the post yet!)

4. In each post, look at the top 2-3 most upvoted comments
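
If you’d rather script this triage than click around, here’s a rough Python sketch against LessWrong’s public GraphQL endpoint. I haven’t verified the query structure: the view and field names (“tagById”, “postCommentsTop”, “baseScore”, “pageUrl”, and friends) are guesses, so treat this as a starting point and check the live schema at https://www.lesswrong.com/graphql before relying on it.

```python
# Sketch: "tag -> newest first -> top comments" triage via LessWrong's GraphQL API.
# The endpoint is real; the view names, terms, and fields below are unverified
# assumptions -- check the live schema before trusting the results.
import requests

GRAPHQL_URL = "https://www.lesswrong.com/graphql"

# Assumed view/terms for "newest posts with a given tag" (step 1 + 2 + 3).
POSTS_QUERY = """
query NewestTaggedPosts($tagId: String, $limit: Int) {
  posts(input: {terms: {view: "tagById", tagId: $tagId, sortedBy: "new", limit: $limit}}) {
    results { _id title pageUrl }
  }
}
"""

# Assumed view/terms for "most upvoted comments on a post" (step 4).
COMMENTS_QUERY = """
query TopComments($postId: String, $limit: Int) {
  comments(input: {terms: {view: "postCommentsTop", postId: $postId, limit: $limit}}) {
    results { baseScore contents { plaintextMainText } }
  }
}
"""


def run(query: str, variables: dict) -> dict:
    """POST a GraphQL query and return the decoded `data` payload."""
    resp = requests.post(GRAPHQL_URL, json={"query": query, "variables": variables})
    resp.raise_for_status()
    return resp.json()["data"]


if __name__ == "__main__":
    # "YOUR_TAG_ID" is a placeholder -- grab the real id from the tag page or the API.
    posts = run(POSTS_QUERY, {"tagId": "YOUR_TAG_ID", "limit": 3})["posts"]["results"]
    for post in posts:
        print(post["title"], post["pageUrl"])
        comments = run(COMMENTS_QUERY, {"postId": post["_id"], "limit": 3})["comments"]["results"]
        for c in comments:
            print("   [karma %s] %s" % (c["baseScore"], c["contents"]["plaintextMainText"][:120]))
```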

What I expect you’ll find sometimes

A post by the org, with comments trying to politely say “this is not safe”, heavily upvoted.

Bonus: Read the comments

Or even crazier: Read the post! [David Johnson thinks this is a must!]

Ok, less jokingly, this seems to me like a friendly way to start seeing the main arguments without having to read too much background material (unless you find, for example, a term you don’t know).

Extra crazy optimization: Interview before research

TL;DR: First apply to lots of orgs, and then, when you know which orgs are interested[1], do your[2] research on only those orgs.

Am I saying this idea for vetting AI Safety orgs is perfect?

No, I am saying it is better than the alternative of “apply to all of them (and do no research)”, assuming you resonate with my premise of “there’s a lot of variance in effectiveness of orgs” and “that matters”.

I also hope that by posting my idea, someone will comment with something even better.

  1. ^

    However you choose to define "interested". Maybe research the orgs that didn't reject your CV? Maybe only research the ones that accepted you? Your call.

  2. ^

    Consider sharing your thoughts with the org. Just remember, whoever is talking to you was chosen as someone who convinces candidates to join. They will, of course, think their org is great. Beware of reasons like "the people saying we are causing harm are wrong, but we didn't explain publicly why". The whole point is to let the community help you with this complicated question.

Comments

I don’t endorse judging AI safety organisations by LessWrong consensus alone - I think you should at least read the posts!

Thanks for the pushback!

Added this to the post

I think this is fairly bad advice - LessWrong commenters are wrong about a lot of things. I think this is an acceptable way to get a vibe for what the LessWrong bubble thinks, though. But idk, for most of these questions the hard part is figuring out which bubble to believe. Most orgs will have some groups who think they're useless, some who think they're great, and probably some who think they're net negative. Finding one bubble that believes one of these three doesn't tell you much!

Thanks for the pushback!

Do you have an alternative suggestion?

I personally interpret Neel's comment as saying this is ~not better (perhaps worse) than going in blindly. So I just wanted to highlight that a better alternative is not needed for the sake of arguing this (even if it's a good idea to have one for the sake of future AI researchers).

Do you think that going to do capabilities work at DeepMind or OpenAI is just as impactful as going to whatever the LessWrong community recommends (as presented by their comments and upvotes)?

Possibly. As we've discussed privately, I think some AI safety groups which are usually lauded are actually net negative 🙃

But I was trying to interpret Neel and not give my own opinion.

My meta-opinion is that it would be better to see what others think about working on capabilities in top labs, compared to going there without even considering the downsides. What do you think? (A)

And also that before working at "AI safety groups which are usually lauded [but] are actually net negative", it would be better to read comments of people like you. What do you think? (B)

I somewhat disagree with both statements.

(A) Sure, it'd be good to have opinions from relevant people, but on the other hand it's non-trivial to figure out who "relevant people" are, and "the general opinion on LW" is probably not the right category. I'd look more at what (1) people actually working in the field, and (2) the broad ML community, think about an org. So maybe the Alignment Forum.

(B) I can only answer on my specific views. My opinion on [MIRI] probably wouldn't really help individuals seeking to work there, since they probably know everything I know and have their own opinions. My opinions are more suitable for discussions on the general AI safety community culture.

By the way, I personally resonate with your advice on forming an inside view and am taking that path, but it doesn't fit everyone. Some people don't want all that homework; they want to get into a company and write code, and, to be clear, it is common for them to apply to all orgs whose names [they see in EA spaces] or something like that (very wide, many orgs). This is the target audience I'm trying to help.

I would probably just tell people to work in another field, rather than explicitly encouraging them to Goodhart their way into trying to have a positive impact in an area with extreme variance.

Thinking about where to work seems reasonable, listening to others' thoughts on where to work seems reasonable, this post advises both.

This post also pretty strongly suggests that LessWrong comments are the best source of others' thoughts, and I would like to see that claim made explicit and then argued for rather than slipped in. As a couple of other comments have noted, LessWrong is far from a perfect signal of the alignment space.

Thanks (also) for the pushback part!

Do you have an alternative to LessWrong comments that you'd suggest?

Seems like there is a big gap between "Study AI Safety for 2 years before you apply" and reading posts, rather than just the most up-voted comments.

Other feedback: I don't understand why you call some of your suggestions "crazy"/"crazier". Also, when you wrote "less jokingly" I had missed the joke. Perhaps your suggestions could be rewritten to be clearer without these words.

The joke is supposed to be that "reading the post" isn't actually that crazy. I see this wasn't understood (oops!). I'm going to start by trying to get the content right (since lots of people pushed back on it) and then try fixing this too

I didn't understand, could you say this in other words please?

Or did you mean it like this:

Seems like there is a big gap between ["Study AI Safety for 2 years before you apply" and reading posts], rather than [just the most up-voted comments].

?

I meant it like that, yes. Seems like there is a big gap between ["Study AI Safety for 2 years before you apply" and reading posts].

So what would you suggest? Reading a few posts about that org?

Seems worth asking in interviews "I'm concerned about advancing capabilities and shortening timelines, what actions is your organization taking to prevent that", with the caveat that you will be BSed.

Bonus: You can turn down roles explicitly because they're doing capabilities work, which if it becomes a pattern may incentivize them to change their plan.

I agree, see footnote 2.
