Wiki Contributions


College advice

How about "College/undergraduate advice"? It's clunky, but I'd guess that's outweighed by the significant boost to the chances that someone looking for this finds it. (E.g. if I didn't already know about this tag and was looking for it, I'd have searched for "undergraduate" in the search bar, but I don't think I'd have looked for "college." Maybe that's a rare case?)

Alternatively, maybe just putting "undergraduate" into the description is good enough for making it easier to find? If so, seems good to also throw in other related terms ("undergrad", "university", "student") into the description.

Think about EA alignment like skill mastery, not cult indoctrination

I think both (1) and (2) are sufficiently mild/non-nefarious versions of "repeating talking points" that they're very different from what people might imagine when they hear "techniques associated with cult indoctrination"--different enough that the latter phrase seems misleading.

(E.g., at least to my ears, the original phrase suggests that the communication techniques you've seen involve intentional manipulation and are rare; in contrast, (1) and (2) sound to me like very commonplace forms of ineffective (rather than intentionally manipulative) communication.)

(As I mentioned, I'm sympathetic to the broader purpose of the post, and my comment is just picking on that one phrase; I agree with and appreciate your points that communication along the lines of (1) and (2) happens, that it can be an example of poor communication / of not building from where others are coming from, and that the "skill mastery" perspective could help with this.)

Think about EA alignment like skill mastery, not cult indoctrination

Thanks! Seems like a useful perspective. I'll pick on the one bit I found unintuitive:

Summary: People who try to get more people to be EA-aligned often use techniques associated with cult indoctrination, such as repeating talking points and creating closed social circles.

In the spirit of not repeating talking points, could you back up this claim, if you meant it literally? This would be big if true, so I want to flag that:

  • You state this in the summary, but as far as I can see you don't state/defend it anywhere else in the post. So people just reading the summary might overestimate the extent to which the post argues for this claim.
  • I've seen lots of relevant community building, and I more often see the opposite: people being such nerds that they can't help themselves from descending into friendly debate, people being sufficiently self-aware that they know their unintuitive/unconventional views won't convince people if they're not argued for, and people pouring many hours into running programs and events (e.g. dinners, intro fellowships, and intro-level social events) aimed at creating an open social environment.

(As an aside, people might find it interesting to briefly check out YouTube videos of actual modern cult tactics for comparison.)

AI Governance Course - Curriculum and Application

Thanks! Good to know that wasn't easy enough to find--I've now added links in several additional spots in the post, including near the top. (The link you had was also right.)

Democratising Risk - or how EA deals with critics

Other thoughts:

  • Some other comment hinted at this: another frame that I'm not sure this paper considers is that non-strong-longtermist views are in one sense very undemocratic--they drastically prioritize the interests of very privileged current generations while leaving future generations disenfranchised, or at least greatly under-represented (if we assume there'll be many future people). So characterizing a field as being undemocratic due to having longtermism over-represented sounds a little like calling the military reconstruction that followed the US civil war (when the Union installed military governments in defeated Southern states to protect the rights of African Americans) undemocratic--yes, it's undemocratic in a sense, but there's also an important sense in which the alternative is painfully undemocratic.
    • How much we buy my argument here seems fairly dependent on how much we buy (strong) longtermism. It's intuitive to me that (here and elsewhere) we won't be able to fully answer "to what extent should certain views be represented in this field?" without dealing with the object-level question "to what extent are these views right?" The paper seems to try to side-step this, which seems reasonably pragmatic but also limited in some ways.
    • I think there's a similarly plausible case for non-total-utilitarian views being in a sense undemocratic; they tend to not give everyone equal decision-making weight. So there's also a sense in which seemingly fair representation of these other views is non-democratic.
      • As a tangent, this seems closely related to how a classic criticism of utilitarianism--that it might trample on the few for the well-being of a majority--is also an old criticism of democracy (which is a little funny, since the paper both raises these worries about utilitarianism and gladly takes democracy on board, although that might be defensible).
  • One thing I appreciate about the paper is how it points out that the ethically loaded definitions of "existential risk" make the scope of the field dependent on ethical assumptions--that helped clarify my thinking on this.
Is EA compatible with technopessimism?

I think you're using a stronger assumption in your ethical theories that situations are even comparable, if you ignore when they occur

Hm, I wouldn't endorse that assumption. I avoided specifying the "when"s to communicate more quickly, but I had in mind something like your examples--agreed, the times matter.

trying to get the first thing to happen (evolve to stable society) instead of second or third is worth doing if it were the only thing we could do in 2021

Agreed, but only if we add another condition/caveat: that trying to get the first thing to happen also doesn't trade off against the probability of very good scenarios not covered by those three (which, under some assumptions, it mathematically would have to do). As an oversimplified example with made-up numbers, suppose we were facing these probabilities of possible futures:

  • 20% -- your first scenario (tech stagnation) (10 goodness points)
  • 5% -- your second scenario (mass suffering) (-1,000,000 goodness points)
  • 20% -- your third scenario (extinction) (-10 goodness points)
  • 55% -- status quo in 2021 evolving to a technologically sophisticated utopia by 2100 (1,000,000 goodness points)

And suppose the only action we could take in 2021 would change the above probabilities to the following:

  • 100% -- your first scenario (tech stagnation) (10 goodness points)
  • 0% -- your second scenario (mass suffering) (-1,000,000 goodness points)
  • 0% -- your third scenario (extinction) (-10 goodness points)
  • 0% -- status quo in 2021 evolving to a technologically sophisticated utopia by 2100 (1,000,000 goodness points)

Then the expected value of not taking the action is 500,000 goodness points, while the expected value of taking the action is 10 goodness points, so taking the action would be very bad / not worthwhile (even though the action technically falls under your description of "trying to get the first thing to happen (evolve to stable society) instead of second or third [...] if it were the only thing we could do in 2021").
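(To make the arithmetic explicit, here's a minimal sketch of the expected-value calculation, using only the made-up probabilities and goodness points from the lists above:)

```python
# Made-up "goodness points" for each scenario, per the lists above.
scenarios = {
    "tech stagnation": 10,
    "mass suffering": -1_000_000,
    "extinction": -10,
    "utopia by 2100": 1_000_000,
}

# Probabilities without vs. with taking the 2021 action.
no_action = {"tech stagnation": 0.20, "mass suffering": 0.05,
             "extinction": 0.20, "utopia by 2100": 0.55}
action = {"tech stagnation": 1.00, "mass suffering": 0.00,
          "extinction": 0.00, "utopia by 2100": 0.00}

def expected_value(probs):
    # Sum of probability-weighted goodness points over all scenarios.
    return sum(p * scenarios[name] for name, p in probs.items())

print(round(expected_value(no_action)))  # 500000
print(round(expected_value(action)))     # 10
```

So the action forecloses the 55% chance of the very good future, which is what drags its expected value down to 10 despite eliminating the bad scenarios.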

Democratising Risk - or how EA deals with critics

Thanks for sharing this! Responding to just some parts of the object-level issues raised by the paper (I only read parts closely, so I might not have the full picture)--I find several parts of this pretty confusing or unintuitive:

  • Your first recommendation in your concluding paragraph is: "EA needs to diversify funding sources by breaking up big funding bodies." But of course "EA" per se can't do this; the only actors with the legal authority to break up these bodies (other than governments, which I'd guess would be uninterested) are these funding bodies themselves, i.e. mainly OpenPhil. Given the emphasis on democratization and moral uncertainty, it sounds like your first recommendation is a firm assertion that two people with lots of money should give away most of their money to other philanthropists who don't share their values, i.e. it's a recommendation that obviously won't be implemented (after all, who'd want to give influence to others who want to use it for different ends?). So unless I've misunderstood, this looks like there might be more interest in emphasizing bold recommendations than in emphasizing recommendations that stand a chance of getting implemented. And that seems at odds with your earlier recognition, which I really appreciate--that this is not a game. Have I missed something?
  • Much of the paper seems to assume that, for moral uncertainty reasons, it's bad for the existential risk research community to be unrepresentative of the wider world, especially in its ethical views. I'm not sure this is a great response to moral uncertainty. My intuition would be that, under moral uncertainty, each worldview will do best (by its own lights) if it can disproportionately guide the aspects of the world it considers most important. This suggests that all worldviews will do best (by their own lights) if [total utilitarianism + strong longtermism + transhumanism]* retains over-representation in existential risk research (since this view cares about this niche field to an extremely unusual extent), while other ethical views retain their over-representation in the many, many other areas of the world that entirely lack these longtermists. These disproportionate influences just seem like different ethical communities specializing differently, to mutual benefit. (There's room to debate just how much these ethical views should concentrate their investments, but if the answer is not zero, then it's not the case that e.g. the field having "non-representative moral visions of the future" is a "daunting problem" for anyone.)

*I don't use your term "techno-utopian approach" because "utopian" has derogatory connotations, not to mention misleading/inaccurate connotations re: these researchers' typical levels of optimism regarding technology and the future.

Is EA compatible with technopessimism?

I'm not sure I follow. [...] I assume all ethical views prefer status quo to extinction or totalitarianism

I wonder if we might be using "net negative" differently? By "net negative" I mean "worse than non-existence," not "worse than status quo." So even though we may prefer a stable status quo to imminent extinction, we might still think the latter leaves us at roughly net zero (i.e. not net negative, or at least not significantly net negative).

I also suspect that, under many ethical views, some forms of totalitarianism would be better than non-existence (i.e. not net-negative). For example, a totalitarian world in which freedoms/individuality are extremely limited--but most people are mostly happy, and extreme suffering is very rare--seems at least a little better than non-existence, by the lights of many views about value.

(A lot of what I'm saying here is based on the assumption that, according to very scope-sensitive views of value: on a scale where -100 is "worst possible future" and 0 is "non-existence" and 100 is "best possible future," a technologically unsophisticated future would be approximately 0, because humanity would miss out on the vast majority of time and space in which we could create (dis)value. Which is why, for a technologically unsophisticated future to be better than the average technologically sophisticated future, the latter has to be net negative.)

Oh I agree, I feel like superintelligence cannot be trusted, at least the kind that's capable of global power-grabs. [...] I think it's largely because humans don't want consistent things, and cannot possibly want consistent things, short of neurosurgery.

I'm not sure if you mean that humans' preferences (a) are consistent at any given time but inconsistent over time, or (b) are inconsistent even if we hold time constant. I'd have different responses to the two.

Re: (a), I think this would require some way to aggregate different individuals' preferences (applied to the same individual at different times)--admittedly seems tricky but not hopeless?

Re: (b), I agree that alignment to inconsistent preferences is impossible. (I also doubt humans can be aligned to other humans' inconsistent preferences--if someone prefers apples to pears and they also prefer pears to apples (as an example of an inconsistent preference), I can't try to do what they want me to do when they ask for a fruit, since there isn't a consistent thing that they want me to do.) Still--I don't know, I don't feel that my preferences are that incoherent, and I think I'd be pretty happy with an AI that just tries to do what I want it to do to whatever extent I have consistent wants.
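(As a toy illustration of what "inconsistent" means here--this is hypothetical, and treats preferences far more simply than real ones--an inconsistent preference set is one whose "preferred to" relation contains a cycle, like the apples/pears case:)

```python
# Hypothetical sketch: call a preference set "inconsistent" if its
# "a is preferred to b" pairs form a cycle, so no ranking satisfies them all.
def has_cycle(prefers):
    """prefers: set of (a, b) pairs meaning 'a is preferred to b'."""
    graph = {}
    for a, b in prefers:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def dfs(node):
        # Depth-first search; hitting a node already on the current
        # path means we found a preference cycle.
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph if n not in done)

print(has_cycle({("apples", "pears"), ("pears", "apples")}))  # True
print(has_cycle({("apples", "pears"), ("pears", "kiwis")}))   # False
```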

My Overview of the AI Alignment Landscape: A Bird’s Eye View

You're right, this seems like mostly semantics. I'd guess it's most clear/useful to use "alignment" a little more narrowly--reserving it for concepts that actually involve aligning things (i.e. roughly consistently with non-AI-specific uses of the word "alignment"). But the Critch(/Dafoe?) take you bring up seems like a good argument for why AI-influenced coordination failures fall under that.

Is EA compatible with technopessimism?

This shouldn't be too hard if the default case from tech progress is extinction / totalitarianism.

Maybe, although I suspect this assumption makes it significantly harder to argue that a technologically sophisticated future is net negative in expectation (since, at least by ethical views that seem especially common in this community, extinction leads to approximately net zero (not net negative) futures, and it seems plausible to me that a totalitarian future--with all the terrible loss of potential that would involve--would still be better than non-existence, i.e. not net negative).

I don't know what a proof of "solving alignment" being impossible looks like

Just to clarify, I wouldn't demand that--I'd be looking for at least an intuitive argument that solving alignment is intractable. I agree that's still hard.

I still haven't understood what it's like inside the mind of someone who believes alignment is possible

As a tangent (since I want to focus on tractability rather than possibility, although impossibility would be more than enough to show intractability): the main reason I think that alignment (using roughly this definition of alignment) is possible is: humans can be aligned to other humans; sometimes we act in good faith to try to satisfy another's preferences. So at least some general intelligences can be aligned. And I don't see what could be so special about humans that would make this property unique to us.

Returning from the tangent, I'm also optimistic about tractability because:

  • People haven't been trying for that long, and the field is still very small
  • At least some prominent, relatively new research directions (e.g. [1], [2], [3]) seem promising

Some intuition: [...]

Yup this seems plausible, you get the bonus points :)
