This is a true, counterfactual match, and we will only receive the equivalent amount to what we can raise.
What will happen to the money counterfactually? Presumably it will be donated to other things the match funder thinks are roughly as good as GWWC?
I'm also confused by this. The use of "and" (instead of, say, "in that", "because", or "to the extent that") suggests that they've verified counterfactuality in some stronger way than just "the money won't go to us this season if you don't donate", but then they should be telling us how they know this.
Is this a problem? Seems fine to me, because the meaning is often clear, as in two of your examples, and I think it adds value in those contexts. And if it's not clear, doesn't seem like a big loss compared to a counterfactual of having none of these types of vote available.
I think that trying to get safe concrete demonstrations of risk by doing research seems well worth pursuing (I don't think you were saying it's not).
Do you have any thoughts on how people should decide between working on groups at CEA and running a group on the ground themselves?
I imagine a lot of people considering applying could be asking themselves that question, and it doesn't seem obvious to me how to decide.
Hi Isaac, this is a good question! I can elaborate more in the Q&A tomorrow but here are some thoughts:
Ultimately a lot depends on your personal fit and comparative advantage. I think people should do the things they excel at. While I do think you can have a more scalable impact on the groups team, the groups team would have very little to no impact without the organizers working on the ground!
I can share some of the reasons that led me to prefer working at CEA over working on the ground:
To be fair, I think I'm partly making wrong assumptions about what exactly you're arguing for here.
On a slightly closer read, you don't actually argue in this piece that it's as high as 90% - I assumed that because I think you've argued for that previously, and I think that's what "high" p(doom) normally means.
Relatedly, I also think that your arguments for "p(doom|AGI)" being high aren't convincing to people who don't share your intuitions, and it looks like you're relying on those (imo weak) arguments, when actually you don't need to.
I think you come across as over-confident, not alarmist, and I think it hurts how you come across quite a lot. (We've talked a bit about the object level before.) I'd agree with John's suggested approach.
Makes sense. To be clear, I think global health is very important, and I think it's a great thing to devote one's life to! I don't think it should be underestimated how big a difference you can make improving the world now, and I admire people who focus on making that happen. It just happens that I'm concerned the future might be an even higher priority thing that many people could be in a good position to address.
On your last point, if you believe that the EV from an "effective neartermism -> effective longtermism" career change is greater than a "somewhat harmful career -> effective neartermism" career change, then the downside of using a "somewhat harmful career -> effective longtermism" example is that people might think the "stopped doing harm" part is more important than the "focused on longtermism" part.
More generally, I think your "arguments for the status quo" seem right to me! I think it's great that you're thinking clearly about the considerations on both sides, and my guess is that you and I would just weight these considerations differently.
Thank you for sharing these! I'm probably going to try the first three as a result of this post.
Another thing on my mind is that we should beware surprising and suspicious convergence - it would be surprising and suspicious if the same intervention (present-focused WAW work) was best for improving animals' lives today and also happened to be best for improving animals' lives in the distant future.
I worry about people interested in animal welfare justifying maintaining their existing work when they switch their focus to longtermism, when actually it would be better if they worked on something different.
Thanks for your reply! I can see your perspective.
On your last point, about future-focused WAW interventions, I'm thinking of things that you mention in the tractability section of your post:
...Here is a list of ways we could work on this issue (directly copied from the post by saulius[9]):
“To reduce the probability of humans spreading of wildlife in a way that causes a lot of suffering, we could:
- Directly argue about caring about WAW if humans ever spread wildlife beyond Earth
- Lobby to expand the application of an existing international law that tries to protect
For the kinds of reasons you give, I think it could be good to get people to care about the suffering of wild animals (and other sentient beings) in the event that we colonise the stars.
I think that the interventions that decrease the chance of future wild animal suffering are only a subset of all WAW things you could do, though. For example, figuring out ways to make wild animals suffer less in the present would come under "WAW", but I wouldn't expect to make any difference to the more distant future. That's because if we care about wild animals, we'll fi...
If I understand correctly, you put 0.01% on artificial sentience in the future. That seems overconfident to me - why are you so certain it won't happen?
I've only skimmed this, but just want to say I think it's awesome that you're doing your own thinking trying to compare these two approaches! In my view, you don't need to be "qualified" to try to form your own view, which depends on understanding the kinds of considerations you raise. This decision matters a lot, and I'm glad you're thinking carefully about it and sharing your thoughts.
I interpreted the title of this post as a bill banning autonomous AI systems from paying people to do things! I did think it was slightly early.
Would you be eligible for the graduate visa? https://www.gov.uk/graduate-visa
If so, would that meet your needs?
(I've just realised this is close to just a rephrasing of some of the other suggestions. Could be a helpful rephrasing though.)
The Superalignment team's goal is "to build a roughly human-level automated alignment researcher".
Human-level AI systems sound capable enough to cause a global catastrophe if misaligned. So is the plan to make sure that these systems are definitely aligned (if so, how?), or to make sure that they are deployed in such a way that they would not be able to take catastrophic actions even if they wanted to (if so, what would that look like?)?
Thanks David, that's just the kind of reply I was hoping for! Those three goals do seem to me like three of the most important. It might be worth adding that context to your write-up.
I'm curious whether there's much you did specifically to achieve your third goal - inspiring people to take action based on high quality reasoning - beyond just running an event where people might talk to others who are doing that. I wouldn't expect so, but I'd be interested if there was.
Thanks for writing this up! I'd be interested if you had time to say more about what you think the main theory of change of the event was (or should have been).
Interesting results, thanks for sharing! I think getting data from people who attend events is an important source of information about what's working and what's not.
I do worry a bit about what's best for the world coming apart from what people report as being valuable to them. (This comment ended up a bit rambly, sorry.)
Two main reasons that might be the case:
Are there any lessons that GWWC has learnt that you think would be useful for EA community builders to know and remember?
If GWWC goes very well over the next five years (say 90th percentile), what would that look like?
Even if it's true that it can be hard to agree or disagree with a post as a whole, I do get the impression that people sometimes feel like they disagree with posts as a whole, and so simply downvote the post.
Also, I suspect it is possible to disagree with a post as a whole. Many posts are structured like "argument 1, argument 2, argument 3, therefore conclusion". If you disagree with the conclusion, I think it's reasonable to say that that's disagreeing with the post as a whole. If you agree with the arguments and the conclusion, then you agree with the po...
My guess is that it's an unfortunate consequence of disagree voting not being an option on top-level posts, so people are expressing their disagreement with your views by simply downvoting. (I do disagree with your views, but I think it's a reasonable discussion to have!)
Nudge to seriously consider applying for 80,000 hours personal advising if you haven't already: https://80000hours.org/speak-with-us/
My guess is they'd be able to help you think this through!
I don't help with EA Oxford any more, but I think this is a good opportunity and if you've read the whole post, that's a sign you should consider applying! I'd be v happy to frankly answer any questions you have about Oxford, EA Oxford, and the current EA Oxford team - just message me.
Ah thanks Greg! That's very helpful.
I certainly agree that the target is relatively small, in the space of all possible goals to instantiate.
But luckily we aren't picking at random: we're deliberately trying to aim for that target, which makes me much more optimistic about hitting it.
And another reason I see for optimism: yes, in some sense the AI is in some way neutral (neither aligned nor misaligned) at the start of training. Actually, I would agree that it's misaligned at the start of training, but what's missing initially are the c...
Yes, I completely agree that this is nowhere near good enough. It would make me very nervous indeed to end up in that situation.
The thing I was trying to push back against was what I thought you were claiming: that we're effectively dead if we end up in this situation.
Regarding whether they have the same practical implications, I guess I agree that if everyone had a 90% credence in catastrophe, that would be better than them having 50% credence or 10%.
Inasmuch as you're right that the major players have a 10% credence in catastrophe, we should either push to raise it or advocate for more caution given the stakes.
My worry is that they don't actually have that 10% credence, despite maybe saying they do, and that coming across as more extreme might stop them from listening.
I think you might be right that if we can make the case for 90%, we should make it. But I worry we can't.
Ah I think I see the misunderstanding.
I thought you were invoking "Murphy's Law" as a general principle that should generally be relied upon - I thought you were saying that in general, a security mindset should be used.
But I think you're saying that in the specific case of AGI misalignment, there is a particular reason to apply a security mindset, or to expect Murphy's Law to hold.
Here are three things I think you might be trying to say:
Plans that involve increasing AI input into alignment research appear to rest on the assumption that they can be grounded by a sufficiently aligned AI at the start. But how does this not just result in an infinite, error-prone, regress? Such “getting the AI to do your alignment homework” approaches are not safe ways of avoiding doom.
On this point, the initial AIs needn't be actually aligned, I think. They could for example do useful alignment work that we can use even though they are "playing the training game"; they might want to take over, but...
"Applying a security mindset" means looking for ways that something could fail. I agree that this is a useful technique for preventing any failures from happening.
But I'm not sure this is a sound principle to rely on when trying to work out how likely it is that something will go wrong. In general, Murphy's Law is not correct. It's not true that "anything that can go wrong will go wrong".
I think this is part of the reason I'm sceptical of confident predictions of catastrophe, like your 90% - it's plausible to me that things could work ou...
Isn't it possible that calling for a complete stop to AI development actually counterfactually speeds up AI development?
The scenario I'm thinking of is something like:
If you don't cross-post them individually, maybe you could e.g. monthly make one forum post linking all the new blog posts that month? I think if you never cross-post, you'll get fewer readers, and forum readers seem likely to get value from the blog posts.
Oh to be clear, I think that almost all altruistic people do not much care about the magnitude of their impact (in practice).
So I think the approach I'd suggest is to focus on altruistic people, and helping them realise that they probably do really care about the magnitude of their impact on reflection.
That's a much larger group than the people who are already magnitude-sensitive, and I think the intervention is probably more feasible at the moment than for people who have no existing interest in altruism.
I haven't thought much about strategy for cit...
[Edited to add: I see Chris Leong said something similar at the same time.]
I think there's a tricky balance to strike.
If you think that option A is much better for the world than option B, then the more open and honest you are about thinking that A is much better, the more discouraged people working on B will feel.
But if you try to be more encouraging about option B, there's a real risk that people won't realise how much better you think option A is, and will work on B. If you're correct that option A is much better, then this is a terrible outcome. In tha...
We have used a few Swapcard alternatives at previous events (Bizzabo, Whova, Grip) and sadly Swapcard was the best despite its weaknesses. I know the EAG team has talked with you about this some, Yonatan, but I'd be keen to hear if you have any updated recommendations!
I've skimmed this post - thanks so much for writing it!
Here's a quick, rushed comment.
I have several points of agreement:
I agree with the sentiment that ideally we'd accept that we have unchangeable personal needs and desires that constrain what we can do, so it might not "make sense" to feel guilty about them.
But I think the language "that's just silly" risks coming across as saying that anyone who has these feelings is being silly and should "just stop", which of course is easier said than done with feelings! And I'm worried calling feelings silly might make people feel bad about having them (see number 7 in the original post).
I think it's good to make object-level criticisms of posts, but I think it's important that we encourage rather than discourage posts that make a genuine attempt to explore unusual ideas about what we should prioritise, even if they seem badly wrong to you. That's because people can make up their own minds about the ideas in a post, and because some of these posts that you're suggesting be deleted might be importantly right.
In other words, having a community that encourages debate about the important questions seems more important to me than one that shuts down posts that seem "harmful" to the cause.
Thanks for the thoughtful response!
I think when it comes to how you would make your charity more effective at helping others, I agree it's not easy. I completely agree with your example about it being difficult to know which possible hires would be good at the job. I think you know much better than I do what is important to make 240Project go well.
But I think we can use reasoning to identify what plans are more likely to lead to good outcomes, even if we can't measure them to be sure. For example, working to address problems that are particularly large in ...
Thank you for writing and sharing this! I suppose it's being downvoted because it's anti-EA, but I enjoyed reading it and understanding your perspective.
I had three main reactions to it:
Thanks for the reply!
If I understand correctly, you think that people in EA do care about the sign of their impact, but that in practice their actions don't align with this and they might end up having a large impact of unknown sign?
That's certainly a reasonable view to hold, but given that you seem to agree that people are trying to have a positive impact, I don't see how using phrases like "expected value" or "positive impact" instead of just "impact" would help.
In your example, it seems that SBF is talking about quickly making grants that have positive expected value, and uses the phrase "expected value" three times.
I think misaligned AI values should be expected to be worse than human values, because it's not clear that misaligned AI systems would care about eg their own welfare.
Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it's not clear from a total utilitarian perspective that the outcome would be bad.
But the "values" of a misaligned AI system could be pretty arbitrary, so I don't think we should expect that.