This is a true, counterfactual match, and we will only receive the equivalent amount to what we can raise.
What will happen to the money counterfactually? Presumably it will be donated to other things the match funder thinks are roughly as good as GWWC?
I'm also confused by this. The use of "and" (instead of, say, "in that", "because", or "to the extent that") suggests that they've verified counterfactuality in some stronger way than just "the money won't go to us this season if you don't donate", but then they should be telling us how they know this.
Is this a problem? Seems fine to me, because the meaning is often clear, as in two of your examples, and I think it adds value in those contexts. And if it's not clear, doesn't seem like a big loss compared to a counterfactual of having none of these types of vote available.
I think that trying to get safe concrete demonstrations of risk by doing research seems well worth pursuing (I don't think you were saying it's not).
Do you have any thoughts on how people should decide between working on groups at CEA and running a group on the ground themselves?
I imagine a lot of people considering applying could be asking themselves that question, and it doesn't seem obvious to me how to decide.
Hi Isaac, this is a good question! I can elaborate more in the Q&A tomorrow but here are some thoughts:
Ultimately a lot depends on your personal fit and comparative advantage. I think people should do the things they excel at. While I do think you can have a more scalable impact on the groups team, the groups team would have very little to no impact without the organizers working on the ground!
I can share some of the reasons that led me to prefer working at CEA over working on the ground:
To be fair, I think I'm partly making wrong assumptions about what exactly you're arguing for here.
On a slightly closer read, you don't actually argue in this piece that it's as high as 90% - I assumed that because I think you've argued for that previously, and I think that's what "high" p(doom) normally means.
Relatedly, I also think that your arguments for "p(doom|AGI)" being high aren't convincing to people who don't share your intuitions, and it looks like you're relying on those (imo weak) arguments, when actually you don't need to.
I think you come across as over-confident, not alarmist, and I think it hurts how you come across quite a lot. (We've talked a bit about the object level before.) I'd agree with John's suggested approach.
Makes sense. To be clear, I think global health is very important, and I think it's a great thing to devote one's life to! I don't think it should be underestimated how big a difference you can make improving the world now, and I admire people who focus on making that happen. It just happens that I'm concerned the future might be an even higher priority thing that many people could be in a good position to address.
On your last point, if you believe that the EV from an "effective neartermism -> effective longtermism" career change is greater than a "somewhat harmful career -> effective neartermism" career change, then the downside of using a "somewhat harmful career -> effective longtermism" example is that people might think the "stopped doing harm" part is more important than the "focused on longtermism" part.
More generally, I think your "arguments for the status quo" seem right to me! I think it's great that you're thinking clearly about the considerations on both sides, and my guess is that you and I would just weight these considerations differently.
Thank you for sharing these! I'm probably going to try the first three as a result of this post.
Another thing on my mind is that we should beware surprising and suspicious convergence - it would be surprising and suspicious if the same intervention (present-focused WAW work) was best for improving animals' lives today and also happened to be best for improving animals' lives in the distant future.
I worry about people interested in animal welfare justifying maintaining their existing work when they switch their focus to longtermism, when actually it would be better if they worked on something different.
Thanks for your reply! I can see your perspective.
On your last point, about future-focused WAW interventions, I'm thinking of things that you mention in the tractability section of your post:
...Here is a list of ways we could work on this issue (directly copied from the post by saulius[9]):
“To reduce the probability of humans spreading of wildlife in a way that causes a lot of suffering, we could:
- Directly argue about caring about WAW if humans ever spread wildlife beyond Earth
- Lobby to expand the application of an existing international law that tries to protect
For the kinds of reasons you give, I think it could be good to get people to care about the suffering of wild animals (and other sentient beings) in the event that we colonise the stars.
I think that the interventions that decrease the chance of future wild animal suffering are only a subset of all WAW things you could do, though. For example, figuring out ways to make wild animals suffer less in the present would come under "WAW", but I wouldn't expect to make any difference to the more distant future. That's because if we care about wild animals, we'll fi...
If I understand correctly, you put 0.01% on artificial sentience in the future. That seems overconfident to me - why are you so certain it won't happen?
I've only skimmed this, but just want to say I think it's awesome that you're doing your own thinking trying to compare these two approaches! In my view, you don't need to be "qualified" to try to form your own view, which depends on understanding the kinds of considerations you raise. This decision matters a lot, and I'm glad you're thinking carefully about it and sharing your thoughts.
I interpreted the title of this post as a bill banning autonomous AI systems from paying people to do things! I did think it was slightly early.
Would you be eligible for the graduate visa? https://www.gov.uk/graduate-visa
If so, would that meet your needs?
(I've just realised this is close to just a rephrasing of some of the other suggestions. Could be a helpful rephrasing though.)
The Superalignment team's goal is "to build a roughly human-level automated alignment researcher".
Human-level AI systems sound capable enough to cause a global catastrophe if misaligned. So is the plan to make sure that these systems are definitely aligned (if so, how?), or to make sure that they are deployed in such a way that they would not be able to take catastrophic actions even if they wanted to (if so, what would that look like?)?
Thanks David, that's just the kind of reply I was hoping for! Those three goals do seem to me like three of the most important. It might be worth adding that context to your write-up.
I'm curious whether there's much you did specifically to achieve your third goal - inspiring people to take action based on high quality reasoning - beyond just running an event where people might talk to others who are doing that. I wouldn't expect so, but I'd be interested if there was.
Thanks for writing this up! I'd be interested if you had time to say more about what you think the main theory of change of the event was (or should have been).
Interesting results, thanks for sharing! I think getting data from people who attend events is an important source of information about what's working and what's not.
I do worry a bit about what's best for the world coming apart from what people report as being valuable to them. (This comment ended up a bit rambly, sorry.)
Two main reasons that might be the case:
Are there any lessons that GWWC has learnt that you think would be useful for EA community builders to know and remember?
If GWWC goes very well over the next five years (say 90th percentile), what would that look like?
Even if it's true that it can be hard to agree or disagree with a post as a whole, I do get the impression that people sometimes feel like they disagree with posts as a whole, and so simply downvote the post.
Also, I suspect it is possible to disagree with a post as a whole. Many posts are structured like "argument 1, argument 2, argument 3, therefore conclusion". If you disagree with the conclusion, I think it's reasonable to say that that's disagreeing with the post as a whole. If you agree with the arguments and the conclusion, then you agree with the po...
My guess is that it's an unfortunate consequence of disagree voting not being an option on top-level posts, so people are expressing their disagreement with your views by simply downvoting. (I do disagree with your views, but I think it's a reasonable discussion to have!)
Nudge to seriously consider applying for 80,000 hours personal advising if you haven't already: https://80000hours.org/speak-with-us/
My guess is they'd be able to help you think this through!
I don't help with EA Oxford any more, but I think this is a good opportunity and if you've read the whole post, that's a sign you should consider applying! I'd be v happy to frankly answer any questions you have about Oxford, EA Oxford, and the current EA Oxford team - just message me.
Ah thanks Greg! That's very helpful.
I certainly agree that the target is relatively small, in the space of all possible goals to instantiate.
But luckily we aren't picking at random: we're deliberately trying to aim for that target, which makes me much more optimistic about hitting it.
And another reason I see for optimism: yes, in some sense the AI is in some way neutral (neither aligned nor misaligned) at the start of training. Actually, I would agree that it's misaligned at the start of training, but what's missing initially are the c...
Yes, I completely agree that this is nowhere near good enough. It would make me very nervous indeed to end up in that situation.
The thing I was trying to push back against was what I thought you were claiming: that we're effectively dead if we end up in this situation.
Regarding whether they have the same practical implications, I guess I agree that if everyone had a 90% credence in catastrophe, that would be better than them having 50% credence or 10%.
Inasmuch as you're right that the major players have a 10% credence in catastrophe, we should either push to raise it or advocate for more caution given the stakes.
My worry is that they don't actually have that 10% credence, despite maybe saying they do, and that coming across as more extreme might stop them from listening.
I think you might be right that if we can make the case for 90%, we should make it. But I worry we can't.
Ah I think I see the misunderstanding.
I thought you were invoking "Murphy's Law" as a general principle that should generally be relied upon - I thought you were saying that in general, a security mindset should be used.
But I think you're saying that in the specific case of AGI misalignment, there is a particular reason to apply a security mindset, or to expect Murphy's Law to hold.
Here are three things I think you might be trying to say:
Plans that involve increasing AI input into alignment research appear to rest on the assumption that they can be grounded by a sufficiently aligned AI at the start. But how does this not just result in an infinite, error-prone, regress? Such “getting the AI to do your alignment homework” approaches are not safe ways of avoiding doom.
On this point, the initial AIs needn't be actually aligned, I think. They could for example do useful alignment work that we can use even though they are "playing the training game"; they might want to take over, but...
"Applying a security mindset" means looking for ways that something could fail. I agree that this is a useful technique for preventing any failures from happening.
But I'm not sure this is a sound principle to rely on when trying to work out how likely it is that something will go wrong. In general, Murphy's Law is not correct. It's not true that "anything that can go wrong will go wrong".
I think this is part of the reason I'm sceptical of confident predictions of catastrophe, like your 90% - it's plausible to me that things could work ou...
Isn't it possible that calling for a complete stop to AI development actually counterfactually speeds up AI development?
The scenario I'm thinking of is something like:
If you don't cross-post them individually, maybe you could e.g. monthly make one forum post linking all the new blog posts that month? I think if you never cross-post, you'll get fewer readers, and forum readers seem likely to get value from the blog posts.
Oh to be clear, I think that almost all altruistic people do not much care about the magnitude of their impact (in practice).
So I think the approach I'd suggest is to focus on altruistic people, and helping them realise that they probably do really care about the magnitude of their impact on reflection.
That's a much larger group than the people who are already magnitude-sensitive, and I think the intervention is probably more feasible at the moment than for people who have no existing interest in altruism.
I haven't thought much about strategy for cit...
[Edited to add: I see Chris Leong said something similar at the same time.]
I think there's a tricky balance to strike.
If you think that option A is much better for the world than option B, then the more open and honest you are about thinking that A is much better, the more discouraged people working on B will feel.
But if you try to be more encouraging about option B, there's a real risk that people won't realise how much better you think option A is, and will work on B. If you're correct that option A is much better, then this is a terrible outcome. In tha...
We have used a few Swapcard alternatives at previous events (Bizzabo, Whova, Grip) and sadly Swapcard was the best despite its weaknesses. I know the EAG team has talked with you about this some, Yonatan, but I'd be keen to hear if you have any updated recommendations!
I've skimmed this post - thanks so much for writing it!
Here's a quick, rushed comment.
I have several points of agreement:
I agree with the sentiment that ideally we'd accept that we have unchangeable personal needs and desires that constrain what we can do, so it might not "make sense" to feel guilty about them.
But I think the language "that's just silly" risks coming across as saying that anyone who has these feelings is being silly and should "just stop", which of course is easier said than done with feelings! And I'm worried calling feelings silly might make people feel bad about having them (see number 7 in the original post).
I think it's good to make object-level criticisms of posts, but I think it's important that we encourage rather than discourage posts that make a genuine attempt to explore unusual ideas about what we should prioritise, even if they seem badly wrong to you. That's because people can make up their own minds about the ideas in a post, and because some of these posts that you're suggesting be deleted might be importantly right.
In other words, having a community that encourages debate about the important questions seems more important to me than one that shuts down posts that seem "harmful" to the cause.
Thanks for the thoughtful response!
I think when it comes to how you would make your charity more effective at helping others, I agree it's not easy. I completely agree with your example about it being difficult to know which possible hires would be good at the job. I think you know much better than I do what is important to make 240Project go well.
But I think we can use reasoning to identify what plans are more likely to lead to good outcomes, even if we can't measure them to be sure. For example, working to address problems that are particularly large in ...
Thank you for writing and sharing this! I suppose it's being downvoted because it's anti-EA, but I enjoyed reading it and understanding your perspective.
I had three main reactions to it:
Thanks for the reply!
If I understand correctly, you think that people in EA do care about the sign of their impact, but that in practice their actions don't align with this and they might end up having a large impact of unknown sign?
That's certainly a reasonable view to hold, but given that you seem to agree that people are trying to have a positive impact, I don't see how using phrases like "expected value" or "positive impact" instead of just "impact" would help.
In your example, it seems that SBF is talking about quickly making grants that have positive expected value, and uses the phrase "expected value" three times.
I think misaligned AI values should be expected to be worse than human values, because it's not clear that misaligned AI systems would care about eg their own welfare.
Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it's not clear from a total utilitarian perspective that the outcome would be bad.
But the "values" of a misaligned AI system could be pretty arbitrary, so I don't think we should expect that.