Peter S. Park

524 · Joined Mar 2022


Brilliant and compelling writeup, Remmelt! Thank you so much for sharing it. (And thank you so much for your kind words about my post! I really appreciate it.) 

I strongly agree with you that mechanistic interpretability is very unlikely to contribute to long-term AI safety. To put it bluntly, the fact that so many talented and well-meaning people sink their time into this unpromising research direction is unfortunate. 

I think we AI safety researchers should be more open to new ideas and approaches, rather than getting stuck in the same old research directions that we know are unlikely to meaningfully help. The post "What an actually pessimistic containment strategy looks like" has potentially good ideas on this front.

Thank you so much for organizing this, Nicole! I attended and it was extremely fun.

Thank you so much for the clarification, Jay! It is extremely fair and valuable.

I don't really understand how this is supposed to be an update for those who disagreed with you. Could you elaborate on why you think this information would change people's minds?

The underlying question is: in expected value (EV) terms, does the increase in the number of AI safety plans that results from coordinating on the Internet outweigh the decrease in the plans' secrecy value? If the former effect is larger, then we should continue the status-quo strategy. If the latter effect is larger, then we should consider keeping safety plans secret (especially those whose value lies primarily in secrecy, such as safety plans relevant to monitoring).
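As a rough sketch, the comparison can be written as a simple decision rule. The symbols $B$ and $C$ are illustrative labels introduced here, not quantities anyone in the discussion has estimated: $B$ stands for the value gained from the additional safety plans that open coordination produces, and $C$ for the secrecy value lost because those plans are publicly readable.

```latex
% Illustrative symbols only (assumptions, not from the original discussion):
%   B = value gained from the extra safety plans produced by open coordination
%   C = secrecy value lost because the plans are publicly readable
\begin{aligned}
\mathbb{E}[B] > \mathbb{E}[C] &\;\Longrightarrow\; \text{continue the status-quo strategy} \\
\mathbb{E}[B] < \mathbb{E}[C] &\;\Longrightarrow\; \text{consider keeping (some) safety plans secret}
\end{aligned}
```

Under this framing, the disagreement is about the size of $\mathbb{E}[C]$, which may also differ from plan to plan (for example, it is plausibly larger for monitoring-related plans).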

The disagreeing commenters generally argued that the former effect is larger, and therefore we should continue the status-quo strategy. This is likely because they estimated the latter effect to be quite small and perhaps far in the future.

I think ChatGPT provides evidence that the latter should be a larger concern than many people's priors suggest. Even current-scale models are capable of nontrivial analysis of how specific safety plans can be exploited, and even of how specific alignment researchers' idiosyncrasies can be exploited for deceptive misalignment.

For this to be a threat, we would need an AGI that was:

  1. Misaligned
  2. Capable enough to do significant damage if it had access to our safety plans
  3. Not capable enough to do a similar amount of damage without access to our safety plans

I see the line between 2 and 3 to be very narrow. I expect almost any misaligned AI capable of doing significant damage using our plans to also be capable of doing significant damage without needing them.

I am uncertain about whether the line between 2 and 3 will be narrow. I think the argument that the line between 2 and 3 is narrow often assumes fast takeoff, but I think there is a strong empirical case that takeoff will be slow and constrained by scaling, which suggests the gap between 2 and 3 might be wider than one might think. But I think this is a scientific question that we should continue to probe and reduce our uncertainty about!

No, "Women and Effective Altruism" by Keerthana preceded "Brainstorming ways to make EA safer and more inclusive" by Richard.

I agree that in many cases, when bad-faith critics say "EA should not do X" and we stop doing X, the world would actually become worse off.

But making SBF the face of the EA movement was a really bad decision. Especially given that he was unilaterally gambling with the whole EA movement's credibility.

There are robust lessons to be learned in this saga, which will allow us EAs to course-correct and prevent future catastrophic outcomes (to our movement and in general).

I think we EAs need to increasingly prioritize speaking up about concerns like the ones Habryka mentioned.

Even when positive in-group feelings, the fear of ostracism, and uncertainty/risk aversion internally push one not to bring up these concerns, we should fight this urge, because the concerns, if true, will likely grow larger and larger until they blow up.

There is very high EV in course correction before the catastrophic failure point.

Strongly agree with all of these points.

On point 2: The EA movement urgently needs more earners-to-give, especially now. One lesson that I think is correct, however, is that we should be wary of making any one billionaire donor the face of the EA movement. The downside risk—a loss of credibility for the whole movement due to unknown information about the billionaire donor—is generally too high.

Mostly it was about Point 3. I think an unconditional norm of only accepting anonymous donations above a certain threshold would be too blunt.

I think a version of Point 3 I would agree with is to have high-contributing donor names not be publicized as a norm (with some possible exceptions). I think this captures most of the benefits of an anonymous donation, and most potential donors who might not be willing to make an anonymous donation would be willing to make a discreet, non-publicized donation.

I strongly agree with the spirit of the reforms being suggested here (although I might have some different opinions on how to implement them). We need large-scale reforms of the EA community's social norms to prevent future risks to movement-wide credibility.

  1. Strongly agree. The fact that net upvotes are the only concrete metric by which EA Forum posts and LessWrong posts are judged has indeed been suboptimal for one of EA's main goals: to reflect on and adapt our previous beliefs based on new evidence. Reforms designed to increase engagement with controversial posts would be very helpful for our pursuit of this goal. (Disclaimer: Most of my EA Forum posts would rank highly on the "controversial" scale, in that many people upvote and many people downvote them, and the top comment is usually critical and has a lot of net upvotes. I think that we EAs need to increasingly prioritize both posting and engaging with controversial arguments that run contrary to status-quo beliefs, even if it's hard! This is especially true for LessWrong, which arguably doubles as a scientific venue for AI safety research in addition to being an EA-adjacent discussion forum.)
  2. Agree, although I think EAs should be more willing to write and engage with controversial arguments non-anonymously as well.
  3. Strongly agree in spirit. While a norm of unconditionally refusing non-anonymous donations above a certain threshold might be too blunt, I do think we need better risk management around tying our EA movement's credibility to a single charismatic billionaire, or a single charismatic individual in general. Given how important our work is, we probably need better risk-management practices in general. (And we EAs already care earnestly about this! I do think this is a question not of earnest desire but of optimal implementation.) I also think that many billionaires would actually prefer to donate anonymously or less publicly, because they agree with most but not all of EA's principles. Realistically, leaving room for case-by-case decision-making seems helpful.

Thank you so much for your excellent post on the strategy of buying time, Thomas, Akash, and Olivia! I strongly agree that this strategy is necessary, neglected, and tractable.

For practical ideas on how to achieve this (and a productive debate in the comments section about the risks of low-quality outreach efforts), please see my related earlier forum post:
