Congrats to Zach! I recognize this is mostly supposed to be a "quick update/celebratory post", but there's a missing mood I want to convey in this comment. Note that my thoughts mostly come from an AI Safety perspective, so they may be less relevant for folks who focus on other cause areas.
My impression is that EA is currently facing an unprecedented amount of PR backlash, as well as some solid internal criticisms from core EAs who are now distancing themselves from EA. I suspect this will likely continue into 2024. Some examples:
I strongly agree that being associated with EA in AI policy is increasingly difficult (as many articles and individuals' posts on social media can attest), in particular in Europe, DC, and the Bay Area.
I appreciate Akash's comment, and at the same time, I understand the purpose of this post is not to ask for people's opinions about what CEA's priorities should be, so I won't go into too much detail. I want to highlight that I'm really excited for Zach Robinson to lead CEA!
With my current knowledge of the situation in three different jurisdictions,...
Do you know anything about the strategic vision that Zach has for CEA? Or is this just meant to be a positive endorsement of Zach's character/judgment?
(Both are useful; just want to make sure that the distinction between them is clear).
I appreciate the comment, though I think there's a lack of specificity that makes it hard to figure out where we agree/disagree (or more generally what you believe).
If you want to engage further, here are some things I'd be excited to hear from you:
I expect that your search for a "unified resource" will be unsatisfying. I think people disagree enough on their threat models/expectations that there is no real "EA perspective".
Some things you could consider doing:
Among the questions you...
I agree there's no single unified resource. Having said that, I found Richard Ngo's "five alignment clusters" pretty helpful for bucketing different groups & arguments together. Reposting below:
...
- MIRI cluster. Think that P(doom) is very high, based on intuitions about instrumental convergence, deceptive alignment, etc. Does work that's very different from mainstream ML. Central members: Eliezer Yudkowsky, Nate Soares.
- Structural risk cluster. Think that doom is more likely than not, but not for the same reasons as the MIRI cluster. Instead, this cluster f
Thanks for this overview, Trevor. I expect it'll be helpful. I also agree with your recommendations for people to consider working at standard-setting organizations and other relevant EU offices.
One perspective that I see missing from this post is what I'll call the advocacy/comms/politics perspective. Some examples of this with the EU AI Act:
I'm excited to see the EAIF share more about their reasoning and priorities. Thank you for doing this!
I'm going to give a few quick takes– happy to chat further about any of these. TLDR: I recommend (1) getting rid of the "principles-first" phrase & (2) issuing more calls for proposals focused on the specific projects you want to see (regardless of whether or not they fit neatly into an umbrella term like "principles-first").
Personally, I still think there is a lot of uncertainty around how governments will act. There are at least some promising signs (e.g., UK AI Safety Summit) that governments could intervene to end or substantially limit the race toward AGI. Relatedly, I think there's a lot to be done in terms of communicating AI risks to the public & policymakers, drafting concrete policy proposals, and forming coalitions to get meaningful regulation through.
Some folks also have hope that internal governance (lab governance) could still be useful. I am not as opt...
I would be interested in seeing your takes about why building runway might be more cost-effective than donating.
Separately, if you decide not to go with 10% because you want to think about what is actually best for you, I suggest you give yourself a deadline. Like, suppose you currently think that donating 10% would be better than the status quo. I suggest doing something like “if I have not figured out a better solution by Jan 1 2024, I will just do the community-endorsed default of 10%.”
I think this protects against some sort of indefinite procrastination. (Obviously less relevant if you never indefinitely procrastinate on things like this, but my sense is that most people do at least sometimes).
I think it’s good for proponents of RSPs to be open about the sorts of topics I’ve written about above, so they don’t get confused with e.g. proposing RSPs as a superior alternative to regulation. This post attempts to do that on my part. And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.
Strong agree. I wish ARC and Anthropic had been more clear about this, and I would be less critical of their RSP posts if they were ...
Excited to see this team expand! A few [optional] questions:
When should someone who cares a lot about GCRs decide not to work at OP?
I agree that there are several advantages of working at Open Phil, but I also think there are some good answers to "why wouldn't someone want to work at OP?"
Culture, worldview, and relationship with labs
Many people have an (IMO fairly accurate) impression that Open Phil is conservative, biased toward inaction, prefers maintaining the status quo, and generally favors keeping positive relationships with labs.
As I've gotten more involved in AI policy, I've updated mor...
Adding this comment over from the LessWrong version. Note Evan and others have responded to it here.
Thanks for writing this, Evan! I think it's the clearest writeup of RSPs & their theory of change so far. However, I remain pretty disappointed in the RSP approach and the comms/advocacy around it.
I plan to write up more opinions about RSPs, but one I'll express for now is that I'm pretty worried that the RSP dialogue is suffering from motte-and-bailey dynamics. One of my core fears is that policymakers will walk away with a misleadingly positive impress...
Thanks! A few quick responses/questions:
I think presumably the pause would just be for that company's scaling—presumably other organizations that were still in compliance would still be fine.
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system that has strong cyberoffensive capabilities. That company has to stop but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipu...
@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off? Does the government have the ability to implement a national pause?
Aside: IMO it's pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won't develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they'd make it hard for anyone to pause for >a few months. I was surprised/di...
@tlevin I would be interested in you writing up this post, though I'd be even more interested in hearing your thoughts on the regulatory proposal Thomas is proposing.
Note that both of your points seem to be arguing against a pause, whereas my impression is that Thomas's post focuses more on implementing a national regulatory body.
(I read Thomas's post as basically saying like "eh, I know there's an AI pause debate going on, but actually this pause stuff is not as important as getting good policies. Specifically, we should have a federal agency that does li...
One thing I appreciate about both of these tests is that they seem to (at least partially) tap into something like "can you think for yourself & reason about problems in a critical way?" I think this is one of the most important skills to train, particularly in policy, where it's very easy to get carried away with narratives that seem popular or trendy or high-status.
I think the current zeitgeist has gotten a lot of folks interested in AI policy. My sense is that there's a lot of potential for good here, but there are also some pretty easy ways for thi...
I agree with Zach here (and I’m also a fan of Holly). I think it’s great to spotlight people whose applications you’re excited about, and even reasonable for the tone to be mostly positive. But I think it’s fair for people to scrutinize the exact claims you make and the evidence supporting those claims, especially if the target audience consists of potential donors.
My impression is that the crux is less about “should Holly be funded” and more about “were the claims presented precisely,” along with some broader feeling of “how careful should future posts be when advertising possible candidates.”
Congratulations on launching!
On the governance side, one question I'd be excited to see Apollo (and ARC evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off?
Of course, the actual answer will be shaped by the particular climate/culture/zeitgeist/policy window/lab factors that are impossible to fully predict in advance.
But my impression is that this question is relatively neglected, and I wouldn't be surprised if sharp newcomers were able to meaningfully improve the community's thinking on this.
Excited to see this! I'd be most excited about case studies of standards in fields where people didn't already have clear ideas about how to verify safety.
In some areas, it's pretty clear what you're supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.
One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.
Are there examples of standards in oth...
Glad to see this write-up & excited for more posts.
I think these are three areas that MATS has handled well. I'd be especially excited to hear more about areas where MATS thinks it's struggling, is uncertain, or feels like it has a lot of room to grow. Potential candidates include:
Clarification: I think we're bottlenecked by both, and I'd love to see the proposals become more concrete.
Nonetheless, I think proposals like "Get a federal agency to regulate frontier AI labs like the FDA/FAA" or even "push for an international treaty that regulates AI the way the IAEA regulates atomic energy" are "concrete enough" to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and some others on Luke's list.
I don't think any of t...
I don't actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it's bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.
I appreciate Richard stating this explicitly. I think this is (and has been) a pretty big crux in the AI governance space right now.
Some folks (like Richard) believe that we're mainly bottlenecked by good concrete proposals. Other folks believe that we have concrete proposa...
I can see a worldview in which prioritizing raising awareness is more valuable, but I don't see the case for believing "that we have concrete proposals". Or at least, I haven't seen any; could you link them, or explain what you mean by a concrete proposal?
My guess is that you're underestimating how concrete a proposal needs to be before you can actually muster political will behind it. For example, you don't just need "let's force labs to pass evals", you actually need to have solid descriptions of the evals you want them to pass.
I also think that recent e...
Lots of awesome stuff requires AGI or superintelligence. People think LLMs (or stuff LLMs invent) will lead to AGI or superintelligence.
So wouldn’t slowing down LLM progress slow down the awesome stuff?
I think more powerful (aligned) LLMs would lead to more awesome stuff, so caution on LLMs does delay other awesome stuff.
I agree with the point that "there's value that can be gained from figuring out how to apply systems at current capabilities levels" (AI summer harvest), but I wouldn't go as far as "you can almost have the best of both worlds." It seems more like "we can probably do a lot of good with existing AI, so even though there are costs of caution, those costs are worth paying, and at least we can make some progress applying AI to pressing world problems while we figure out alignment/governance." (My version isn't catchy though, oops).
I appreciate that this post acknowledges that there are costs to caution. I think it could've gone a bit further in emphasizing how these costs, while large in an absolute sense, are small relative to the risks.
The formal way to do this would be a cost-benefit analysis on longtermist grounds (perhaps with various discount rates for future lives). But I think there's also a way to do this in less formal/wonky language, without requiring any longtermist assumptions.
If you have a technology where half of experts believe there's a ~10% chance of extinction, th...
Imagine: would you board an airplane if 50% of airplane engineers who built it said there was a 10% chance that everybody on board dies?
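To put the informal version above on slightly more formal footing, here is a minimal sketch of the cost-benefit comparison gestured at earlier. All symbols are illustrative assumptions, not anything from the original comments: let $\Delta p$ be the reduction in extinction probability that caution buys, $V$ the value assigned to the long-term future (optionally discounted by a factor $d \in (0,1]$ for future lives), and $C$ the cost of the delay. Caution then passes the test whenever

$$\Delta p \cdot d \cdot V > C$$

Since $V$ is enormous on longtermist assumptions (and still very large on common-sense ones), even a modest $\Delta p$ can outweigh a large absolute $C$, which is the sense in which the costs of caution are large in absolute terms but small relative to the risks.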
In the context of the OP, the thought experiment would need to be extended.
"Would you risk a 10% chance of a deadly crash to go to [random country]" -> ~100% of people reply no.
"Would you risk a 10% of a deadly crash to go to a Utopia without material scarcity, conflict, disease?" -> One would expect a much more mixed response.
The main ethical problem is that in the scenario of global AI progress, everyone is forced to board the plane, irrespective of their preferences.
The impression I get is that lots of people are like “yeah, I’d like to see more work on this & this could be very important” but there aren’t that many people who want to work on this & have ideas.
Is there evidence that funding isn’t available for this work? My loose impression is that mainstream funders would be interested in it. I suppose it’s an area where it’s especially hard to evaluate how promising a proposal is, though.
Reasons people might not be interested in doing this work:
- Tractability
- Poor feedback loops
- Not many others in...
On the letter itself, though, I have a bunch of uncertainties around whether a six-month pause right now would actually help.
I share many of your concerns, though I think on balance I feel more enthusiastic about the six-month pause. (Note that I'm thinking about a six-month pause on frontier AI development that is enforced across the board, at least in the US; I'm more confused about a six-month pause that a few specific labs opt into).
I wonder if this relates more to an epistemic difference (e.g., the actual credence we put on the six-month pau...
I think it's great that you're releasing some posts that criticize/red-team some major AIS orgs. It's sad (though understandable) that you felt like you had to do this anonymously.
I'm going to comment a bit on the Work Culture Issues section. I've spoken to some people who work at Redwood, have worked at Redwood, or considered working at Redwood.
I think my main comment is something like: you've done a good job pointing at some problems, but I think it's pretty hard to figure out what should be done about them. To be clear, I think the post m...
Hi Akash,
Thank you for sharing your thoughts & those concrete action items - I agree it would be nice to have a set of recommendations in an ideal world.
This post took at least 50 hours (collectively) to write, and was delayed in publishing by a few days due to busy schedules. I think if we had more time, I would have shared the final version with a small set of non-redwood beta reviewers for comments which would have caught things like this (and e.g. Nunos' comment).
We plan to do this for future posts (if you're reading this and would like to give com...
I think I agree with a lot of the specific points raised here, but I notice a feeling of wariness/unease around the overall message. I had a similar reaction to Haydn's recent "If your model is going to sell, it has to be safe" piece. Let me try to unpack this:
On one hand, I do think safety is important for the commercial interests of labs. And broadly being better able to understand/control systems seems good from a commercial standpoint.
My biggest reservations can be boiled down into two points:
Some thoughts I had while reading that I expect you'd agree with:
'It's harder to maintain good epistemics and strong reasoning + reasoning transparency in large coalitions of groups who have different worldviews/goals. ("We shouldn't say X because our allies in AI ethics will think it's weird.") I don't think "X is bad for epistemics" means "we definitely shouldn't consider X", but I think it's a pretty high cost that often goes underappreciated/underacknowledged'
This is probably a real epistemic cost in my view, but it takes more than identifying a cost to establish that forming a coalition with people with different g...
Exciting! Have you considered linkposting this to LessWrong? (Some technical AIS folks and AI governance folks check LW more than EAF)
Thanks for this context. Is it reasonable to infer that you think that OpenAI would've got a roughly-equally-desirable investment if OP had not invested? (Such that the OP investment had basically no effect on acceleration?)
Yes, that’s my position. My hope is we actually slowed acceleration by participating, but I’m quite skeptical of the view that we added to it.
Personally, I see this as a misunderstanding, i.e., the idea that OP helped OpenAI come into existence and that it might not have happened otherwise.
I think some people have this misunderstanding, and I think it's useful to address it.
With that in mind, much of the time, I don't think people who are saying "do those benefits outweigh the potential harms" are assuming that the counterfactual was "no OpenAI." I think they're assuming the counterfactual is something like "OpenAI has less money, or has to take somewhat less favorable deals with investors, or has to do som...
Unless it's a hostile situation (as might happen with public cos/activist investors), I don't think it's actually that costly. At seed stage, it's just kind of normal to give board seats to major “investors”, and you want to have a good relationship with both your major investors and your board.
The attitude Sam had at the time was less "please make this grant so that we don't have to take a bad deal somewhere else, and we're willing to 'sell' you a board seat to close the deal" and more "hey would you like to join in on this? we'd love to have you. no worries if not."
Ah, thanks for the summary of the claim. Apologies for misunderstanding this in my initial skim.
I agree with Rameon's comment.
I appreciate the red-team of a commonly-discussed AI governance proposal. I'm confused about the section on race dynamics:
However, if it is true that the primary effect of the Windfall Clause is to concentrate money/power in the hands of AGI startup CEOs, this argument will likely not work. Rather, from the point of view of the CEOs, the spoils available to winning are now even larger than before, because you will get a massive slush fund for political pet projects, and the losers will not even get the consolation prize of their 401(k) doing so well.
How ar...
I think the claim is:
Can you say more about which longtermist efforts you're referring to?
I think a case can be made, but I don't think it's an easy (or clear) case.
My current impression is that Yudkowsky & Bostrom's writings about AGI inspired the creation of OpenAI/DeepMind. And I believe FTX invested a lot in Anthropic, and OP invested a little bit (in relative terms) into OpenAI. Since then, there have been capabilities advances and safety advances made by EAs, and I don't think it's particularly clear which outweighs the other.
It seems unclear to me what the sign of these effect...
Is most of the AI capabilities work here causally downstream of Superintelligence, even if Superintelligence may have been (heavily?) influenced by Yudkowsky? Both Musk and Altman recommended Superintelligence, although Altman has also directly said Yudkowsky has accelerated timelines the most:
https://twitter.com/elonmusk/status/495759307346952192?lang=en
https://blog.samaltman.com/machine-intelligence-part-1
https://twitter.com/sama/status/1621621724507938816
If things stayed in the LW/Rat/EA community, that might have been best. If Yudkowsky hadn't written ...
A lot of longtermist effort is going into AI safety at the moment. I think it's hard to make the case that something in AI safety has legibly or concretely reduced AI risk, since (a) the field is still considered quite pre-paradigmatic, (b) the risk comes from systems that are more powerful than the ones we currently have, and (c) even in less speculative fields, research often takes several years before it is shown to legibly help anyone.
But with those caveats in mind, I think:
I think it's also easy to make a case that longtermist efforts have increased x-risk from artificial intelligence, given that the money and talent that grew some of the biggest hype machines in AI (DeepMind, OpenAI) came from longtermist places.
It's possible that EA has shaved a couple of counterfactual years off the time to catastrophic AGI, compared to a world where the community wasn't working on it.
+1 on questioning/interrogating opinions, even opinions of people who are "influential leaders."
I claim people who are trying to use their careers in a valuable way should evaluate organizations/opportunities for themselves
My hope is that readers don't come away with "here is the set of opinions I am supposed to believe" but rather "ah here is a set of opinions that help me understand how some EAs are thinking about the world." Thank you for making this distinction explicit.
Disagree that these are mostly characterizing the Berkeley community (#1 and #2 see...
I appreciated the part where you asked people to evaluate organizations by themselves. But it was in the context of "there are organizations that aren't very good, but people don't want to say they are failing," which to me implies that a good way to do this is to get people "in the know" to tell you if they are the failing ones or not. It implies there is some sort of secret consensus on what is failing and what isn't, and if not for the fact that people are afraid to voice their views you could clearly know which were "failing." This could be partially t...
I'd be excited about posts that argued "I think EAs are overestimating AI x-risk, and here are some aspects of EA culture/decision-making that might be contributing to this."
I'm less excited about posts that say "X thing going on in EA is bad", where X is a specific decision that EAs made [based on their estimate of AI x-risk]. (Unless the post is explicitly about AI x-risk estimates.)
Related: Is that your true rejection?
Thanks for writing this, Eli. I haven't read WWOTF and was hoping someone would produce an analysis like this (especially comparing The Precipice to WWOTF).
I've seen a lot of people posting enthusiastically about WWOTF (often before reading it) and some of the press that it has been getting (e.g., cover of TIME). I've felt conflicted about this.
On one hand, it's great that EA ideas have the opportunity to reach more people.
On the other hand, I had a feeling (mostly based on quotes from newspaper articles summarizing the book) that WWOTF doesn't feature AI ...
Not yet, and I'm to blame. I've been focusing on a different project recently, which has demanded my full attention.
Will plan to announce the winners (and make winning entries public, unless authors indicated otherwise) at some point this month.
+1. The heuristic doesn’t always work.
(Though for an intro talk I would probably just modify the heuristic to “is this the kind of intro talk that would’ve actually excited a younger version of me?”)
Thanks for writing this, Emma! Upvoted :)
Here's one heuristic I heard at a retreat several months ago: "If you're ever running an event that you are not excited to be part of, something has gone wrong."
Obviously, it's just a heuristic, but I actually found it to be a pretty useful one. I think a lot of organizers spend time hosting events that feel more like "teaching" rather than "learning together or working on interesting unsolved problems together."
And my impression is that the groups that have fostered more of a "let's learn together and do thin...
If you're ever running an event that you are not excited to be part of, something has gone wrong
This seems way too strong to me. E.g., reasonable and effective intro talks feel like they wouldn't be much fun for me to give, yet seem likely to be high value.
Thank you for writing this, Ben. I think the examples are helpful, and I plan to read more about several of them.
With that in mind, I'm confused about how to interpret your post and how much to update on Eliezer. Specifically, I find it pretty hard to assess how much I should update (if at all) given the "cherry-picking" methodology:
...Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.
Note that this isn’t an attempt to provide a balanced overview of Yudkows
I think the effect should depend on your existing view. If you've always engaged directly with Yudkowsky's arguments and chose the ones that convinced you, there's nothing to learn. If you thought he was a unique genius, always assumed you weren't convinced of things only because he understood things you didn't, and believed him anyway, maybe it's time to dial it back. If you'd always assumed he's wrong about literally everything, it should be telling for you that the OP had to go back 15 years for good examples.
Writing this comment actually helped me understand how to respond to the OP myself.
Hey, Jay! Judging is underway, and I'm planning to announce the winners within the next month. Thanks for your patience, and sorry for missing your message.
I was keen to check out the winning entries to this contest, but I'm wondering if I missed the announcement, and I can't seem to find it anywhere. Have the entries been made public somewhere?
What are some of your favorite examples of their effectiveness?