All of Wei Dai's Comments + Replies

That said, I very much agree about the “weirdness” of turning to philosophical uncertainty as a solution. Surely philosophical progress (done right) is a good thing, not a moral threat.

I of course also think that philosophical progress, done right, is a good thing. However, I also think genuine philosophical progress is much harder than it looks (see Some Thoughts on Metaphilosophy for some relevant background views), and therefore am perhaps more worried than most about philosophical "progress", done wrong, being a bad thing.

The salient thing to notice is that this person wants to burn your house down.

In your example, after I notice this, I would call the police to report this person. What do you think I should do (or what does David want me to do) after noticing the political agenda of the people he mentioned? My own natural inclination is to ignore them and keep doing what I was doing before, because it seems incredibly unlikely that their agenda would succeed, given the massive array of political enemies that such an agenda has.

I was concerned that after the comment was initially downvoted to -12, it would be hidden from the front page and not enough people would see it to vote it back into positive territory. It didn't work out that way, but perhaps could have?

I want to note that within a few minutes of posting the parent comment, it received 3 downvotes totaling -14 (I think they were something like -4, -5, -5, i.e., probably all strong downvotes) with no agreement or disagreement votes, and subsequently received 5 upvotes spread over 20 hours (with no further downvotes AFAIK) that brought the net karma up to 16 as of this writing. Agreement/disagreement is currently 3/1.

This pattern of voting seems suspicious (e.g., why were all the downvotes clustered so closely in time). I reported the initial cluster of dow... (read more)

4
Nathan Young
5d
Yeah the voting on these posts feels pretty bizarre. Though I try not to worry about that. It usually comes out in the wash to something that seems right.

I think too much moral certainty doesn't necessarily cause someone to be dangerous by itself; there have to be other elements to their personality or beliefs. For example, lots of people are or were unreasonably certain about divine command theory[1], but only a minority of them caused much harm (e.g. by being involved in crusades and inquisitions). I'm not sure it has much to do with realism vs non-realism though. I can definitely imagine some anti-realist (e.g., one with strong negative utilitarian beliefs) causing a lot of damage if they were put in c... (read more)

It's entirely possible that I misinterpreted David. I asked for clarification from David in the original comment if that was the case, but he hasn't responded so far. If you want to offer your own interpretation, I'd be happy to hear it out.

1
Concerned EA Forum User
4d
Imagine someone runs up to your house with a can of gasoline and some matches. They start talking about how there are bad men living in your walls and they need to burn the place down. Now, the fact that this person wants to burn your house down doesn’t allow you to determine whether there are bad men hiding in your walls. But focusing on that epistemological point would be a distraction. The salient thing to notice is that this person wants to burn your house down.

I'm saying that you can't determine the truth about an aspect of reality (in this case, what causes group differences in IQ), when both sides of a debate over it are pushing political agendas, by looking at which political agenda is better. (I also think one side of it is not as benign as you think, but that's beside the point.)

I actually don't think this IQ debate is one that EAs should get involved in, and said as much to Ives Parr. But if people practice or advocate for what seem to me like bad epistemic norms, I feel an obligation to push back on that.

1
Concerned EA Forum User
6d
David definitely wasn’t saying that you can determine the empirical truth that way. If that’s the claim you think you were responding to, then I think you misinterpreted him in a really uncharitable and unfair way.

More specifically, you don't need to talk about what causes group differences in IQ to make a consequentialist case for genetic enhancement, since there is no direct connection between what causes existing differences and what the best interventions are. So one possible way forward is just to directly compare the cost-effectiveness of different ways of raising intelligence.

Materialism is an important trait in individuals, and plausibly could be an important difference between groups. (Certainly the history of the Jewish people attests to the fact that it has been considered important in groups!) But the horrific recent history of false hypotheses about innate Jewish behavior helps us see how scientifically empty and morally bankrupt such ideas really are.

Coincidentally, I recently came across an academic paper that proposed a partial explanation of the current East Asian fertility crisis (e.g., South Korea's fertility dec... (read more)

4
Wei Dai
5d
I want to note that within a few minutes of posting the parent comment, it received 3 downvotes totaling -14 (I think they were something like -4, -5, -5, i.e., probably all strong downvotes) with no agreement or disagreement votes, and subsequently received 5 upvotes spread over 20 hours (with no further downvotes AFAIK) that brought the net karma up to 16 as of this writing. Agreement/disagreement is currently 3/1. This pattern of voting seems suspicious (e.g., why were all the downvotes clustered so closely in time). I reported the initial cluster of downvotes to the mods in case they want to look into it, but have not heard back from them yet. Thought I'd note this publicly in case a similar thing happened or happens to anyone else.
-25
Concerned EA Forum User
6d

Note that Will does say a bit in the interview about why he doesn't view SBF's utilitarian beliefs as a major explanatory factor here (the fraud was so obviously negative EV, and the big lesson he took from the Soltes book on white-collar crime was that such crime tends to be more the result of negligence and self-deception than deliberate, explicit planning to that end).

I disagree with Will a bit here, and think that SBF's utilitarian beliefs probably did contribute significantly to what happened, but perhaps somewhat indirectly, by 1) giving him large... (read more)

We just wrote a textbook on the topic together (the print edition of utilitarianism.net)! In the preface, we briefly relate our different attitudes here: basically, I'm much more confident in the consequentialism part, but sympathetic to various departures from utilitarian (and esp. hedonistic) value theory, whereas Will gives more weight to non-consequentialist alternatives (more for reasons of peer disagreement than any intrinsic credibility, it seems), but is more confident that classical hedonistic utilitarianism is the best form of consequentialism.

I agree it'd be fun for us to explore the disagreement further sometime!

My memory of the podcast (could be wrong, only listened once!) is that Will said that, conditional on error theory being false, his credence in consequentialism is about 0.5.

I think he meant conditional on error theory being false, and also on not "some moral view we've never thought of".

Here's a quote of what Will said starting at 01:31:21: "But yeah, I tried to work through my credences once and I think I ended up in like 3% in utilitarianism or something like. I mean large factions go to, you know, people often very surprised by this, but large fact... (read more)

2
David Mathers
7d
'also on not "some moral view we've never thought of".' Oh, actually, that's right. That does change things a bit. 

If future humans were in the driver’s seat instead, but with slightly more control over the process

Why only "slightly" more control? It's surprising to see you say this without giving any reasons or linking to some arguments, as this degree of alignment difficulty seems like a very unusual position that I've never seen anyone argue for before.

2
Matthew_Barnett
8d
I'm a bit surprised you haven't seen anyone make this argument before. To be clear, I wrote the comment last night on a mobile device, and it was intended to be a brief summary of my position, which perhaps explains why I didn't link to anything or elaborate on that specific question. I'm not sure I want to outline my justifications for my view right now, but my general impression is that civilization has never had much central control over cultural values, so it's unsurprising if this situation persists into the future, including with AI. Even if we align AIs, cultural and evolutionary forces can nonetheless push our values far. Does that brief explanation provide enough of a pointer to what I'm saying for you to be ~satisfied? I know I haven't said much here; but I kind of doubt my view on this issue is so rare that you've literally never seen someone present a case for it.

The source code was available, but if someone wanted to claim compliance with the NIST standard (in order to sell their product to the federal government, for example), they had to use the pre-compiled executable version.

I guess there's a possibility that someone could verify the executable by setting up an exact duplicate of the build environment and re-compiling from source. I don't remember how much I looked into that possibility, and whether it was infeasible or just inconvenient. (Might have been the former; I seem to recall the linker randomizing some addresses in the binary.) I do know that I never documented a process to recreate the executable and nobody asked.
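For illustration, here is a minimal sketch of the "rebuild and compare" check described above, assuming the build could be made fully deterministic; the file names are hypothetical and this is not the actual process used for that module.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file names: the vendor-distributed binary vs. a binary
# rebuilt from the published source in a duplicate build environment.
distributed = sha256_of("crypto_module_distributed.dll")
rebuilt = sha256_of("crypto_module_rebuilt.dll")

if distributed == rebuilt:
    print("Bit-for-bit identical: the executable matches the source.")
else:
    print("Digests differ: either the build is non-deterministic "
          "(e.g. embedded timestamps, randomized link addresses) "
          "or the distributed binary was modified.")
```

As noted above, link-time randomization alone can make the digests differ even when nothing malicious has happened, which is exactly the kind of non-determinism that reproducible-builds tooling tries to eliminate.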

8
Lorenzo Buonanno
7d
Is this a use case for Reproducible Builds?

It’s not clear to me why human vs. AIs would make war more likely to occur than in the human vs. human case, if by assumption the main difference here is that one side is more rational.

We have more empirical evidence that we can look at when it comes to human-human wars, making it easier to have well-calibrated beliefs about chances of winning. When it comes to human-AI wars, we're more likely to have wildly irrational beliefs.

This is just one reason war could occur though. Perhaps a more likely reason is that there won't be a way to maintain the peace,... (read more)

3
Matthew_Barnett
8d
For what it's worth, I'd loosely summarize my position on this issue as being that I mainly think of AI as a general vehicle for accelerating technological and economic growth, along with accelerating things downstream of technology and growth, such as cultural change. And I'm skeptical we could ever fully "solve alignment" in the ambitious sense you seem to be imagining.

In this frame, it could be good to slow down AI if your goal is to delay large changes to the world. There are plausible scenarios in which this could make sense. Perhaps most significantly, one could be a cultural conservative and think that cultural change is generally bad in expectation, and thus more change is bad even if it yields higher aggregate prosperity sooner in time (though I'm not claiming this is your position).

Whereas, by contrast, I think cultural change can be bad, but I don't see much reason to delay it if it's inevitable. And the case against delaying AI seems even stronger here if you care about preserving (something like) the lives and values of people who currently exist, as AI offers the best chance of extending our lifespans, and "putting us in the driver's seat" more generally by allowing us to actually be there during AGI development. If future humans were in the driver's seat instead, but with slightly more control over the process, I wouldn't necessarily see that as being significantly better in expectation compared to my favored alternative, including over the very long run (according to my values).

(And as a side note, I also care about influencing human values, or what you might term "human safety", but I generally see this as orthogonal to this specific discussion.)

What are some failure modes of such an agency for Paul and others to look out for? (I shared one anecdote with him, about how a NIST standard for "crypto modules" made my open source cryptography library less secure, by having a requirement that had the side effect that the library could only be certified as standard-compliant if it was distributed in executable form, forcing people to trust me not to have inserted a backdoor into the executable binary, and then not budging when we tried to get an exception for this requirement.)

4
Guy Raveh
8d
Were you prohibited from also open sourcing it?

I've looked into the game theory of war literature a bit, and my impression is that economists are still pretty confused about war. As you mention, the simplest model predicts that rational agents should prefer negotiated settlements to war, and it seems unsettled what actually causes wars among humans. (People have proposed more complex models incorporating more elements of reality, but AFAIK there isn't a consensus as to which model gives the best explanation of why wars occur.) I think it makes sense to be aware of this literature and its ideas, but the... (read more)
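For reference, here is a minimal sketch of the "simplest model" referred to above, in the style of the standard rationalist-bargaining setup (Fearon-type), under the usual assumptions of a divisible prize and costly fighting:

```latex
% Two sides bargain over a prize normalized to 1.
% Side A wins a war with probability p; fighting costs c_A, c_B > 0.
\[
  \text{A's expected war payoff} = p - c_A, \qquad
  \text{B's expected war payoff} = (1 - p) - c_B .
\]
% Any peaceful split (x, 1-x) with
\[
  p - c_A \;\le\; x \;\le\; p + c_B
\]
% leaves both sides at least as well off as fighting, and this range is
% non-empty whenever c_A + c_B > 0. So, in this model, war requires some
% friction: private information with incentives to misrepresent,
% commitment problems, indivisible stakes, or miscalibrated beliefs
% about p (the kind of irrationality discussed in this thread).
```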

2
Matthew_Barnett
8d
This generally makes sense to me. I also think human irrationality could prompt a war with AIs. I don't disagree with the claim insofar as you're claiming that such a war is merely plausible (say >10% chance), rather than a default outcome. (Although to be clear, I don't think such a war would likely cut cleanly along human vs. AI lines.)

On the other hand, humans are currently already irrational and yet human vs. human wars are not the default (they happen frequently, but e.g. at any given time on Earth, the vast majority of humans are not in a warzone or fighting in an active war). It's not clear to me why human vs. AIs would make war more likely to occur than in the human vs. human case, if by assumption the main difference here is that one side is more rational. In other words, if we're moving from a situation of irrational parties vs. other irrational parties to irrational parties vs. rational parties, I'm not sure why we'd expect this change to make things more warlike and less peaceful as a result.

You mention one potential reason: I don't think this follows. Humans presumably also had empathy in e.g. 1500, back when war was more common, so how could it explain our current relative peace? Perhaps you mean that cultural changes caused our present time period to be relatively peaceful. But I'm not sure about that; or at least, the claim should probably be made more specific. There are many things about the environment that have changed since our relatively more warlike ancestors, and (from my current perspective) I think it's plausible that any one of them could have been the reason for our current relative peace. That is, I don't see a good reason to single out human values or empathy as the main cause in itself.

For example, humans are now a lot richer per capita, which might mean that people have "more to lose" when going to war, and thus are less likely to engage in it. We're also a more globalized culture, and our economic system relies more on long

I was curious why, given Will's own moral uncertainty (in this interview he mentioned having only 3% credence in utilitarianism), he wasn't concerned about SBF's high confidence in utilitarianism, but didn't hear the topic addressed. Maybe @William_MacAskill could comment on it here?

One guess is that apparently many young people in EA are "gung ho" on utilitarianism (mentioned by Spencer in this episode), so perhaps Will just thought that SBF isn't unusual in that regard? One lesson could be that such youthful over-enthusiasm is more dangerous than it seems,... (read more)

I feel like it's more relevant what a person actually believes than whether they think of themselves as uncertain. Moral certainty seems directly problematic (in terms of risks of recklessness and unilateral action) only when it comes together with moral realism: If you think you know the single correct moral theory, you'll consider yourself justified to override other people's moral beliefs and thwart the goals they've been working towards.

By contrast, there seems to me to be no clear link from "anti-realist moral certainty in some subjectivist axiology" ... (read more)

I don't think the "3% credence in utilitarianism" is particularly meaningful; doubting the merits of a particular philosophical framework someone uses isn't an obvious reason to be suspicious of them. Particularly not when Sam ostensibly reached similar conclusions to Will about global priorities, and MacAskill himself has obviously been profoundly influenced by utilitarian philosophers in his goals too.

But I do think there's one specific area where SBF's public philosophical statements were extremely alarming even at the time, and he was doing so whilst i... (read more)

fwiw, I wouldn't generally expect "high confidence in utilitarianism" per se to be any cause for concern. (I have high confidence in something close to utilitarianism -- in particular, I have near-zero credence in deontology -- but I can't imagine that anyone who really knows how I think about ethics would find this the least bit practically concerning.)

Note that Will does say a bit in the interview about why he doesn't view SBF's utilitarian beliefs as a major explanatory factor here (the fraud was so obviously negative EV, and the big lesson he took from... (read more)

The 3% figure for utilitarianism strikes me as a bit misleading on its own, given what else Will said. (I'm not accusing Will of intent to mislead here; he said something very precise that I, as a philosopher, entirely followed, it was just a bit complicated for lay people.) Firstly, he said a lot of the probability space was taken up by error theory, the view that there is no true morality. So to get what Will himself endorses, whether or not there is a true morality, you have to basically subtract an unknown but large amount for his credence in error th... (read more)

2
spencerg
8d
Thanks for putting that together @Wei Dai! Out of curiosity, how did you make that transcript?

Some suggestions for you to consider:

  1. Target a different (non-EA) audience.
  2. Do not say anything or cite any data that could be interpreted or misinterpreted as racist (keeping in mind that some people will be highly motivated to interpret them in this way).
  3. Tailor your message to what you can say/cite. For example, perhaps frame the cause as one of pure justice/fairness (as opposed to consequentialist altruism), e.g., it's simply unfair that some people cannot afford genetic enhancement while others can. (Added: But please think this through carefully to
... (read more)
-1
Ives Parr
1mo
Thank you. I think these are good suggestions.

Then I think for practical decision-making purposes we should apply a heavy discount to world A) — in that world, what everyone else would eventually want isn’t all that close to what I would eventually want. Moreover what me-of-tomorrow would eventually want probably isn’t all that close to what me-of-today would eventually want. So it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights. Moreover, even though these worlds possibly differ significantly, I don’t feel like from my present position

... (read more)
2
Owen Cotton-Barratt
1mo
4 is a great point, thanks. On 1-3, I definitely agree that I may prudentially prefer some possibilities to others. I've been assuming that from a consequentialist moral perspective the distribution of future outcomes still looks like the one I give in this post, but I guess it should actually look quite different. (I think what's going on is that in some sense I don't really believe in world A, so haven't explored the ramifications properly.)

Thanks, lots of interesting articles in this list that I missed despite my interest in this area.

One suggestion I have is to add some studies of failed attempts at building/reforming institutions, otherwise one might get a skewed view of the topic. (Unfortunately I don't have specific readings to suggest.)

A related topic you don't mention here (maybe due to lack of writings on it?) is whether humanity should pause AI development and have a long (or even short!) reflection about what it wants to do next, e.g. resume AI development or do something else like su... (read more)

3
Ryan Greenblatt
1mo
I think all of:

  • Many people seem to believe in something like "AI will be a big deal, but the singularity is much further off (or will never happen)".
  • People treat the singularity in far mode even if they admit belief.
  • Previously committed people (especially academics) don't shift their interests or research areas much based on events in the world, though they do rebrand their prior interests. It requires new people entering fields to actually latch onto new areas and there hasn't been enough time for this.
  • People who approach these topics from an altruistic perspective often come away with the view "probably we can mostly let the AIs/future figure this out; other topics seem more pressing and more possible to make progress on."
  • There aren't clear shovel-ready projects.

We have to make judgment calls about how to structure our reflection strategy. Making those judgment calls already gets us in the business of forming convictions. So, if we are qualified to do that (in “pre-reflection mode,” setting up our reflection procedure), why can’t we also form other convictions similarly early?

  1. I'm very confused/uncertain about many philosophical topics that seem highly relevant to morality/axiology, such as the nature of consciousness and whether there is such a thing as "measure" or "reality fluid" (and if so what is it based
... (read more)
2
Lukas_Gloor
1mo
Thank you for engaging with my post!! :)

I don't think of "convictions" as anywhere near as strong as hard-coding something. "Convictions," to me, is little more than "whatever makes someone think that they're very confident they won't change their mind." Occasionally, someone will change their minds about stuff even after they said it's highly unlikely. (If this happens too often, one has a problem with calibration, and that would be bad by the person's own lights, for obvious reasons. It seems okay/fine/to-be-expected for this to happen infrequently.)

I say "little more than [...]" rather than "is exactly [...]" because convictions are things that matter in the context of one's life goals. As such, there's a sense of importance attached to them, which will make people more concerned than usual about changing their views for reasons they wouldn't endorse (while still staying open for low-likelihood ways of changing their minds through a process they endorse!). (Compare this to: "I find it very unlikely that I'd ever come to like the taste of beetroot." If I did change my mind on this later because I joined a community where liking beetroot is seen as very cool, and I get peer-pressured into trying it a lot and trying to form positive associations with it when I eat it, and somehow this ends up working and I actually come to like it, I wouldn't consider this to be as much of a tragedy as if a similar thing happened with my moral convictions.)

Some people can't help it. I think this has a lot to do with reasoning styles. Since you're one of the people on LW/EA forum who place the most value on figuring out things related to moral uncertainty (and metaphilosophy), it seems likely that you're more towards the far end of the spectrum of reasoning styles around this. (It also seems to me that you have a point, that these issues are indeed important/underappreciated – after all, I wrote a book-length sequence on something that directly bears on these questions, but c

I put the full report here so you don't have to wait for them to email it to you.

Anyone with thoughts on what went wrong with EA's involvement in OpenAI? It's probably too late to apply any lessons to OpenAI itself, but maybe not too late elsewhere (e.g., Anthropic)?

4
Nick K.
2mo
At the risk of sounding, it's really not clear to me that anything "went wrong" - from my outside perspective, it's not like there was a clear mess-up on the part of EAs anywhere here, just a difficult situation managed to the best of people's abilities. That doesn't mean that it's not worth pondering whether there's any aspect that had been handled badly, or more broadly what one can take away from this situation (although we should beware of over-updating on single notable events). But, not knowing the counterfactuals, and absent a clear picture of what things "going right" would have looked like, it's not evident that this should be chalked up as a failing on the part of EA.
3
JWS
2mo
I think answers to this are highly downstream of object-level positions. If you think timelines are short and scaled-up versions of current architectures will lead to AGI, then 'what went wrong' is contributing to a vastly greater chance of extinction. If you don't agree with the above, then 'what went wrong' is overly dragging EA's culture and perception to be focused on AI-Safety, and causing great damage to all of EA (even non-AI-Safety parts) when the OpenAI board saga blew up in Toner and McCauley's faces.

Lessons are probably downstream of this diagnosis. My general lesson aligns with Bryan's recent post - man is EA bad about communicating what it is, and despite the OpenAI fiasco not being an attempted EA-coup motivated by Pascal's mugging longtermist concerns, it seems so many people have that as a 'cached explanation' of what went on. Feels to me like that is a big own goal and was avoidable.

Also on OpenAI, I think it's bad that people like Joshua Achiam who do good work at OpenAI seem to really dislike EA. That's a really bad sign - feels like the AI Safety community could have done more not to alienate people like him maybe.

While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk", because the person I asked such a question is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering my question. So I don't want to single them out for criticism, and I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.

Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.

My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?

  1. Being wrong or overconfident, distracting people or harming the world with bad ideas.
    1. Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writing
... (read more)
2
Ofer
3mo
There's also the unilateralist's curse: suppose someone publishes an essay about a dangerous, viral idea that they misjudge to be net-positive, after 20 other people also thought about it but judged it to be net-negative (and therefore refrained from publishing).

@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower in the difficulty scale might fool people into having a false sense of security. Concretely, today AIs are so philosophically incompetent that nobody trusts them to do philosophy (or almost nobody), but if they seemingly got better, but didn't really (or not enough relative to appearances), a lot more people might and it could be hard to convince them not to.

Thanks for the comment. I agree that what you describe is a hard part of the overall problem. I have a partial plan, which is to solve (probably using analytic methods) metaphilosophy for both analytic and non-analytic philosophy, and then use that knowledge to determine what to do next. I mean today the debate between the two philosophical traditions is pretty hopeless, since nobody even understands what people are really doing when they do analytic or non-analytic philosophy. Maybe the situation will improve automatically when metaphilosophy has been sol... (read more)

  1. Just talking more about this problem would be a start. It would attract more attention and potentially resources to the topic, and make people who are trying to solve it feel more appreciated and less lonely. I'm just constantly confused why I'm the only person who frequently talks about it in public, given how obvious and serious the problem seems to me. It was more understandable before ChatGPT put AI on everyone's radar, but now it's just totally baffling. And I appreciate you writing this comment. My posts on the topic usually get voted up, but with f
... (read more)
5
Wei Dai
3mo
@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower in the difficulty scale might fool people into having a false sense of security. Concretely, today AIs are so philosophically incompetent that nobody trusts them to do philosophy (or almost nobody), but if they seemingly got better, but didn't really (or not enough relative to appearances), a lot more people might and it could be hard to convince them not to.
Answer by Wei Dai, Jan 15, 2024
2
0
0

How should we deal with the possibility/risk of AIs inherently disfavoring all the D's that Vitalik wants to accelerate? See my Twitter thread replying to his essay for more details.

Answer by Wei Dai, Jan 11, 2024
19
7
0

The universe can probably support a lot more sentient life if we convert everything that we can into computronium (optimized computing substrate) and use it to run digital/artificial/simulated lives, instead of just colonizing the universe with biological humans. To conclude that such a future doesn't have much more potential value than your 2010 world, we would have to assign zero value to such non-biological lives, or value each of them much less than a biological human, or make other very questionable assumptions. The Newberry 2021 paper that Vasco Gril... (read more)

1
Hayven Frienby
3mo
Such lives wouldn't be human or even "lives" in any real, biological sense, and so yes, I consider them to be of low value compared to biological sentient life (humans, other animals, even aliens should they exist). These "digital persons" would be AIs, machines, with some heritage from humanity, yes, but let's be clear: they aren't us. To be human is to be biological, mortal, and Earthbound -- those three things are essential traits of Homo sapiens. If those traits aren't there, one isn't human, but something else, even if one was once human. "Digitizing" humanity (or even the entire universe, as suggested in the Newberry paper) would be destroying it, even if it is an evolution of sorts.

If there's one issue with the EA movement that I see, it's that our dreams are far too big. We are rationalists, but our ultimate vision for the future of humanity is no less esoteric than the visions of Heavens and Buddha fields written by the mystics--it is no less a fundamental shift in consciousness, identity, and mode of existence.

Am I wrong for being wary of this on a more than instrumental level (I would argue that even Yudkowsky's objections are merely instrumental, centered on x- and s-risk alone)? I mean, what would be suboptimal about a sustainable, Earthen existence for us and our descendants? Is it just the numbers (can the value of human lives necessarily be measured mathematically, much less in numbers)?

I think, as a matter of verifiable fact, that if people solve the technical problems of AI alignment, they will use AIs to maximize their own economic consumption, rather than pursue broad utilitarian goals like “maximize the amount of pleasure in the universe”.

If you extrapolate this out to after technological maturity, say 1 million years from now, what does selfish "economic consumption" look like? I tend to think that people's selfish desires will be fairly easily satiated once everyone is much much richer and the more "scalable" "moral" values woul... (read more)

Why does "mundane economic forces" cause resources to be consumed towards selfish ends?

Because most economic agents are essentially selfish. I think this is currently true, as a matter of empirical fact. People spend the vast majority of their income on themselves, their family, and friends, rather than using their resources to pursue utilitarian/altruistic ideals. 

I think the behavioral preferences of actual economic consumers, who are not mostly interested in changing their preferences via philosophical reflection, will more strongly shape the futur... (read more)

To be sure, ensuring AI development proceeds ethically is a valuable aim, but I claim this goal is *not* the same thing as “AI alignment”, in the sense of getting AIs to try to do what people want.

There was at least one early definition of "AI alignment" to mean something much broader:

The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good outcomes in the real world.

I've argued that we should keep using this broa... (read more)

7
Steven Byrnes
4mo
I’ve been using the term “Safe And Beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the overarching “go well” project, and “AGI safety” as the part where we try to make AGIs that don’t accidentally [i.e. accidentally from the human supervisors’ / programmers’ perspective] kill everyone, and (following common usage according to OP) “Alignment” for “The AGI is trying to do things that the AGI designer had intended for it to be trying to do”.

(I didn’t make up the term “Safe and Beneficial AGI”. I think I got it from Future of Life Institute. Maybe they in turn got it from somewhere else, I dunno.)

(See also: my post Safety ≠ alignment (but they’re close!))

See also a thing I wrote here:

There was at least one early definition of "AI alignment" to mean something much broader:

I agree. I have two main things to say about this point:

  • My thesis is mainly empirical. I think, as a matter of verifiable fact, that if people solve the technical problems of AI alignment, they will use AIs to maximize their own economic consumption, rather than pursue broad utilitarian goals like "maximize the amount of pleasure in the universe". My thesis is independent of whatever we choose to call "AI alignment".
  • Separately, I think the war over the semantic battle
... (read more)
Answer by Wei Dai, Dec 16, 2023
7
1
0

The Main Sources of AI Risk?

Not exactly what you're asking for, but you could use it as a reference for all of the significant risks that different people have brought up, to select which ones you want to further research and address in your response post.

Tell me more about these "luxurious AI safety retreats"? I haven't been to an AI safety workshop in several years, and wonder if something has changed. From searching the web, I found this:

photo from AI Safety Europe Retreat 2023

and this:

I was there for an AI workshop earlier this year in Spring and stayed for 2 or 3 days, so let me tell you about the 'luxury' of the 'EA castle': it's a big, empty, cold, stone box, with an awkward layout. (People kept getting lost trying to find the bathroom or a specific room.) Most of the furnishings were gone. Much of the layout you can see in Google Maps w

... (read more)
-2
Vaipan
4mo
That's one example, but it is only one; many other fellowships are very well paid, up to 3,000 euros per month (I'm thinking of SERI/CHERI/CERI).

I thought about this and wrote down some life events/decisions that probably contributed to becoming who I am today.

  • Immigrating to the US at age 10 knowing no English. Social skills deteriorated while learning language, which along with lack of cultural knowledge made it hard to make friends during teenage and college years, which gave me a lot of free time that I filled by reading fiction and non-fiction, programming, and developing intellectual interests.
  • Was heavily indoctrinated with Communist propaganda while in China, but leaving meant I then had n
... (read more)

The main thing is that the clean distinction between attackers and defenders in the theory of the offense-defense balance does not exist in practice. All attackers are also defenders and vice-versa.

I notice that this doesn't seem to apply to the scenario/conversation you started this post with. If a crazy person wants to destroy the world with an AI-created bioweapon, he's not also a defender.

Another scenario I worry about is AIs enabling value lock-in, and then value locked-in AIs/humans/groups would have an offensive advantage in manipulating other pe... (read more)

If a crazy person wants to destroy the world with an AI-created bioweapon

Or, more concretely, nuclear weapons. Leaving aside regular full-scale nuclear war (which is censored from the graph for obvious reasons), this sort of graph will never show you something like Edward Teller's "backyard bomb", or a salted bomb. (Or any of the many other nuclear weapon concepts which never got developed, or were curtailed very early in deployment like neutron bombs, for historically-contingent reasons.)

There is, as far as I am aware, no serious scientific doubt that ... (read more)

Answer by Wei Dai, Dec 09, 2023
19
3
0

Why aren't there more people like him, and what is he doing or planning to do about that?

Related question: How does one become someone like Carl Shulman (or Wei Dai, for that matter)?

It seems like you're basically saying "evolution gave us reason, which some of us used to arrive at impartiality", which doesn't seem very different from my thinking which I alluded to in my opening comment (except that I used "philosophy" instead of "reason"). Does that seem fair, or am I rounding you off too much, or otherwise missing your point?

Yes and no: "evolution gave us reason" is the same sort of coarse approximation as "evolution gave us the ability and desire to compete in status games". What we really have is a sui generis thing which can, in the right environment, approximate ideal reasoning or Machiavellian status-seeking or coalition-building or utility maximization or whatever social theory of everything you want to posit, but which most of the time is trying to split the difference. 

People support impartial benevolence because they think they have good pragmatic reasons to do s... (read more)

I agree that was too strong or oversimplified. Do you think there are other evolutionary perspectives from which impartiality is less surprising?

6
prisonpent
5mo
I don't think it's possible to give an evolutionary account of impartiality in isolation, any more than you can give one for algebraic geometry or christology or writing or common-practice tonality. The underlying capabilities (e.g. intelligence, behavioral plasticity, language) are biological, but the particular way in which they end up expressed is not. We might find a thermodynamic explanation of the origin of self-replicating molecules, but a thermodynamic explanation of the reproductive cycle of ferns isn't going to fit in a human brain. You have to move to a higher level of organization to say anything intelligible. Reason, similarly, is likely the sort of thing that admits a good evolutionary explanation, but individual instances of reasoning can only really be explained in psychological terms.

Thanks, I didn't know some of this history.

The Altman you need to distrust & assume bad faith of & need to be paranoid about stealing your power is also usually an Altman who never gave you any power in the first place! I’m still kinda baffled by it, personally.

Two explanations come to my mind:

  1. Past Sam Altman didn't trust his future self, and wanted to use the OpenAI governance structure to constrain himself.
  2. His status game / reward gradient changed (at least subjectively from his perspective). At the time it was higher status to give EA mor
... (read more)
5
trevor1
5mo
  1. I think it's reasonable to think that "constraining Sam in the future" was obviously a highly Pareto-efficient deal. EA had every reason to want Sam constrained in the future. Sam had every reason to make that trade, gaining needed power in the short term in exchange for more accountability and oversight in the future. This is clearly a sensible trade that actual good guys would make; not "Sam didn't trust his future self" but rather "Sam had every reason to agree to sell off his future autonomy in exchange for cooperation and trust in the near term".
  2. I think "the world changing around Sam and EA, rather than Sam or EA changing" is worth more nuance. I think that, over the last 5 years, the world changed to make groups of humans vastly more vulnerable than before, due to new AI capabilities facilitating general-purpose human manipulation and the world's power players investing in those capabilities. This dramatically increased the risk of outsider third parties creating or exploiting divisions in the AI safety community, to turn people against each other and use the chaos as a ladder. Given that this risk was escalating, centralizing power was clearly the correct move in response. I've been warning about this during the months before the OpenAI conflict started, in the preceding weeks (including the concept of an annual discount rate for each person, based on the risk of that person becoming cognitively compromised and weaponized against the AI safety community), and I even described the risk of one of the big tech companies hijacking Anthropic 5 days before Sam Altman was dismissed. I think it's possible that Sam or people in EA also noticed the world rapidly becoming less safe for AI safety orgs, discovering the threat from a different angle than I did.

It seems that EA tried to "play politics" with Sam Altman and OpenAI, by doing things like backing him with EA money and credibility (in exchange for a board seat) without first having high justifiable trust in him, generally refraining from publicly (or even privately, from what I can gather) criticizing Sam and OpenAI, Helen Toner apologizing to Sam/OpenAI for expressing even mild criticism in an academic paper, and springing a surprise attack or counterattack on Sam by firing him without giving any warning or chance to justify himself.

I wonder how much of t... (read more)

gwern
5mo
117
10
2
4
14
6

EDIT: this is going a bit viral, and it seems like many of the readers have missed key parts of the reporting. I wrote this as a reply to Wei Dai and a high-level summary for people who were already familiar with the details; I didn't write this for people who were unfamiliar, and I'm not going to reference every single claim in it, as I have generally referenced them in my prior comments/tweets and explained the details & inferences there. If you are unaware of aspects like 'Altman was trying to get Toner fired' or pushing out Hoffman or how Slack was... (read more)

a lot of LessWrong writing refers to 'status', but they never clearly define what it is or where the evidence and literature for it is

Two citations that come to mind are Geoffrey Miller's Virtue Signaling and Will Storr's The Status Game (maybe also Robin Hanson's book although its contents are not as fresh in my mind), but I agree that it's not very scientific or well studied (unless there's a body of literature on it that I'm unfamiliar with), which is something I'd like to see change.

Maybe it's instead a kind of non-reductionist sense of existing a

... (read more)
8
titotal
5mo
I think "status" plays some part in the answers to these, but only a fairly small one.  Why do moralities vary across different communities? Primarily because they are raised in different cultures with different prevalent beliefs. We then modify those beliefs from the baseline as we encounter new ideas and new events, and often end up seeking out other people with shared values to be friends with. But the majority of people aren't just pretending to hold those beliefs to fit in (although that does happen), the majority legitimately believe what they say.  Why do communities get extreme? Well, consult the literature on radicalisation, there are a ton of factors. A vivid or horrible event or ongoing trauma sometimes triggers an extreme response. Less radical members of groups might leave, making the average more radical, so even more moderates leave or split, until the group is just radicals.  As to why we fail to act according to their values, people generally have competing values, including self-preservation and instincts, and are not perfectly rational. Sometimes the primal urge to eat a juicy burger overcomes the calculated belief that eating meat is wrong.  These are all amateur takes, a sociologist could probably answer better. 

One, be more skeptical when someone says they are committed to impartially do the most good, and keep in mind that even if they're totally sincere, that commitment may well not hold when their local status game changes, or if their status gradient starts diverging from actual effective altruism. Two, form a more explicit and detailed model of how status considerations + philosophy + other relevant factors drive the course of EA and other social/ethical movements, test this model empirically, basically do science on this and use it to make predictions and i... (read more)

JWS
5mo14
4
1

Hey Wei, I appreciate you responding to Mo, but I found myself still confused after reading this reply. This isn't purely down to you - a lot of LessWrong writing refers to 'status', but they never clearly define what it is or where the evidence and literature for it is.[1] To me, it seems to function as this magic word that can explain anything and everything. The whole concept of 'status' as I've seen it used in LW seems incredibly susceptible to being part of 'just-so' stories.

I'm highly sceptical of this though, like I don't know what a 'status gra... (read more)

From an evolution / selfish gene's perspective, the reason I or any human has morality is so we can win (or at least not lose) our local virtue/status game. Given this, it actually seems pretty wild that anyone (or more than a handful of outliers) tries to be impartial. (I don't have a good explanation of how this came about. I guess it has something to do with philosophy, which I also don't understand the nature of.)

BTW, I wonder if EAs should take the status game view of morality more seriously, e.g., when thinking about how to expand the social movement... (read more)

9
prisonpent
5mo
If you're talking about status games at all, then not only have you mostly rounded the full selective landscape off to the organism level, you've also taken a fairly low resolution model of human sociality and held it fixed (when it's properly another part of the phenotype). Approximations like this, if not necessarily these ones in particular, are of course necessary to get anywhere in biology - but that doesn't make them any less approximate. If you want to talk about the evolution of some complex psychological trait, you need to provide a very clear account of how you're operationalizing it and explain why your model's errors (which definitely exist) aren't large enough to matter in its domain of applicability (which is definitely not everything). I don't think rationalist-folk-evopsych has done this anywhere near thoroughly enough to justify strong claims about "the" reason moral beliefs exist.

What might EAs taking the status game view more seriously look like, more concretely? I'm a bit confused since from my outside-ish perspective it seems the usual markers of high status are already all there (e.g. institutional affiliation, large funding, [speculatively] OP's CJR work, etc), so I'm not sure what doing more on the margin might look like. Alternatively I may just be misunderstanding what you have in mind. 

What is a plausible source of x-risk that is 10% per century for the rest of time? It seems pretty likely to me that not long after reaching technological maturity, future civilization would reduce x-risk per century to a much lower level, because you could build a surveillance/defense system against all known x-risks, and not have to worry about new technology coming along and surprising you.

It seems that to get a constant 10% per century risk, you'd need some kind of existential threat for which there is no defense (maybe vacuum collapse), or for which t... (read more)
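As a small worked illustration of why a permanently constant hazard rate matters so much (my own arithmetic, not a claim from the comment above):

```latex
% With a constant existential risk of r per century, the probability of
% surviving T centuries is
\[
  P(\text{survive } T \text{ centuries}) = (1 - r)^{T}.
\]
% For r = 0.10:
\[
  0.9^{10} \approx 0.35, \qquad 0.9^{100} \approx 2.7 \times 10^{-5},
\]
% so a constant 10% per-century rate makes long-run survival all but
% impossible, which is why the question of whether a technologically
% mature civilization could drive r toward zero carries so much weight.
```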

8
[anonymous]
7mo
What does technological maturity mean?

I’m confused about how it’s possible to know whether someone is making substantive progress on metaphilosophy; I’d be curious if you have pointers.

I guess it's the same as any other philosophical topic: either use your own philosophical reasoning/judgement to decide how good the person's ideas/arguments are, and/or defer to other people's judgements. The fact that there is currently no methodology for doing this that is less subjective and informal is a major reason for me to be interested in metaphilosophy, since if we solve metaphilosophy that will ho... (read more)

Any thoughts on Meta Questions about Metaphilosophy from a grant maker perspective? For example have you seen any promising grant proposals related to metaphilosophy or ensuring philosophical competence of AI / future civilization, that you rejected due to funding constraints or other reasons?

4
Linch
8mo
(Speaking for myself) It seems pretty interesting. If I understand your position correctly, I'm also worried about developing and using AGI before we're a philosophically competent civilization, though my own framing is more like "man it'd be kind of sad if we lost most of the value of the cosmos because we sent von Neumann probes before knowing what to load the probes with."

I'm confused about how it's possible to know whether someone is making substantive progress on metaphilosophy; I'd be curious if you have pointers.

As a practical matter, I don't recall any applications related to metaphilosophy coming across my desk, or voting on metaphilosophy grants that other people investigated. The closest I can think of are applicants for a few different esoteric applications of decision theory. I'll let others at the fund speak about their experiences.