The salient thing to notice is that this person wants to burn your house down.
In your example, after I notice this, I would call the police to report this person. What do you think I should do (or what does David want me to do) after noticing the political agenda of the people he mentioned? My own natural inclination is to ignore them and keep doing what I was doing before, because it seems incredibly unlikely that their agenda would succeed, given the massive array of political enemies that such an agenda has.
I was concerned that after the comment was initially downvoted to -12, it would be hidden from the front page and not enough people would see it to vote it back into positive territory. It didn't work out that way, but perhaps it could have?
I want to note that within a few minutes of posting the parent comment, it received 3 downvotes totaling -14 (I think they were something like -4, -5, -5, i.e., probably all strong downvotes) with no agreement or disagreement votes, and subsequently received 5 upvotes spread over 20 hours (with no further downvotes AFAIK) that brought the net karma up to 16 as of this writing. Agreement/disagreement is currently 3/1.
This pattern of voting seems suspicious (e.g., why were all the downvotes clustered so closely in time?). I reported the initial cluster of dow...
I think too much moral certainty doesn't necessarily cause someone to be dangerous by itself, and there have to be other elements to their personality or beliefs. For example, lots of people are or were unreasonably certain about divine command theory[1], but only a minority of them caused much harm (e.g. by being involved in crusades and inquisitions). I'm not sure it has much to do with realism vs non-realism though. I can definitely imagine some anti-realist (e.g., one with strong negative utilitarian beliefs) causing a lot of damage if they were put in c...
It's entirely possible that I misinterpreted David. I asked for clarification from David in the original comment if that was the case, but he hasn't responded so far. If you want to offer your own interpretation, I'd be happy to hear it out.
I'm saying that you can't determine the truth about an aspect of reality (in this case, what causes group differences in IQ), when both sides of a debate over it are pushing political agendas, by looking at which political agenda is better. (I also think one side of it is not as benign as you think, but that's beside the point.)
I actually don't think this IQ debate is one that EAs should get involved in, and said as much to Ives Parr. But if people practice or advocate for what seem to me like bad epistemic norms, I feel an obligation to push back on that.
More specifically, you don't need to talk about what causes group differences in IQ to make a consequentialist case for genetic enhancement, since there is no direct connection between what causes existing differences and what the best interventions are. So one possible way forward is just to directly compare the cost-effectiveness of different ways of raising intelligence.
Materialism is an important trait in individuals, and plausibly could be an important difference between groups. (Certainly the history of the Jewish people attests to the fact that it has been considered important in groups!) But the horrific recent history of false hypotheses about innate Jewish behavior helps us see how scientifically empty and morally bankrupt such ideas really are.
Coincidentally, I recently came across an academic paper that proposed a partial explanation of the current East Asian fertility crisis (e.g., South Korea's fertility dec...
Note that Will does say a bit in the interview about why he doesn't view SBF's utilitarian beliefs as a major explanatory factor here (the fraud was so obviously negative EV, and the big lesson he took from the Soltes book on white-collar crime was that such crime tends to be more the result of negligence and self-deception than deliberate, explicit planning to that end).
I disagree with Will a bit here, and think that SBF's utilitarian beliefs probably did contribute significantly to what happened, but perhaps somewhat indirectly, by 1) giving him large...
We just wrote a textbook on the topic together (the print edition of utilitarianism.net)! In the preface, we briefly relate our different attitudes here: basically, I'm much more confident in the consequentialism part, but sympathetic to various departures from utilitarian (and esp. hedonistic) value theory, whereas Will gives more weight to non-consequentialist alternatives (more for reasons of peer disagreement than any intrinsic credibility, it seems), but is more confident that classical hedonistic utilitarianism is the best form of consequentialism.
I agree it'd be fun for us to explore the disagreement further sometime!
My memory of the podcast (could be wrong, only listened once!) is that Will said that, conditional on error theory being false, his credence in consequentialism is about 0.5.
I think he meant conditional on error theory being false, and also on not "some moral view we've never thought of".
Here's a quote of what Will said starting at 01:31:21: "But yeah, I tried to work through my credences once and I think I ended up in like 3% in utilitarianism or something like. I mean large factions go to, you know, people often very surprised by this, but large fact...
If future humans were in the driver’s seat instead, but with slightly more control over the process
Why only "slightly" more control? It's surprising to see you say this without giving any reasons or linking to some arguments, as this degree of alignment difficulty seems like a very unusual position that I've never seen anyone argue for before.
The source code was available, but if someone wanted to claim compliance with the NIST standard (in order to sell their product to the federal government, for example), they had to use the pre-compiled executable version.
I guess there's a possibility that someone could verify the executable by setting up an exact duplicate of the build environment and re-compiling from source. I don't remember how much I looked into that possibility, and whether it was infeasible or just inconvenient. (Might have been the former; I seem to recall the linker randomizing some addresses in the binary.) I do know that I never documented a process to recreate the executable and nobody asked.
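To make the verification idea concrete, here is a minimal sketch of that kind of reproducible-build check; the file names and build command are hypothetical, and in practice things like embedded timestamps or the randomized link addresses mentioned above would defeat a naive byte-for-byte comparison even for honest builds:

```python
import hashlib
import subprocess

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical build step: recompile the library from the published source
# in an environment meant to duplicate the original build environment.
subprocess.run(["make", "clean", "all"], check=True)

official = sha256_of("cryptolib_official.dll")  # vendor-distributed binary (hypothetical name)
rebuilt = sha256_of("build/cryptolib.dll")      # freshly compiled binary (hypothetical path)

if official == rebuilt:
    print("Match: the distributed executable corresponds to the published source.")
else:
    print("Mismatch: cannot confirm the executable matches the source "
          "(could be tampering, or just a non-reproducible build).")
```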
It’s not clear to me why human vs. AIs would make war more likely to occur than in the human vs. human case, if by assumption the main difference here is that one side is more rational.
We have more empirical evidence that we can look at when it comes to human-human wars, making it easier to have well-calibrated beliefs about chances of winning. When it comes to human-AI wars, we're more likely to have wildly irrational beliefs.
This is just one reason war could occur though. Perhaps a more likely reason is that there won't be a way to maintain the peace,...
What are some failure modes of such an agency for Paul and others to look out for? (I shared one anecdote with him, about how a NIST standard for "crypto modules" made my open source cryptography library less secure: one of its requirements had the side effect that the library could only be certified as standard-compliant if it was distributed in executable form, forcing people to trust me not to have inserted a backdoor into the executable binary, and NIST wouldn't budge when we tried to get an exception for this requirement.)
I've looked into the game theory of war literature a bit, and my impression is that economists are still pretty confused about war. As you mention, the simplest model predicts that rational agents should prefer negotiated settlements to war, and it seems unsettled what actually causes wars among humans. (People have proposed more complex models incorporating more elements of reality, but AFAIK there isn't a consensus as to which model gives the best explanation of why wars occur.) I think it makes sense to be aware of this literature and its ideas, but the...
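For readers unfamiliar with that literature, the "simplest model" referred to above is, in spirit, something like Fearon's bargaining model of war (this is my gloss, not necessarily the exact model the comment has in mind). Suppose side $A$ would win a war over a prize normalized to 1 with probability $p$, and fighting destroys $c_A, c_B > 0$ of value for the two sides. Expected war payoffs are
$$u_A^{\text{war}} = p - c_A, \qquad u_B^{\text{war}} = (1-p) - c_B.$$
Any peaceful split giving $A$ a share $x$ with $p - c_A \le x \le p + c_B$ leaves both sides at least as well off as fighting, and this interval is non-empty whenever $c_A + c_B > 0$. So under complete information and rationality a mutually acceptable settlement always exists, and explaining actual wars requires adding something else, e.g. private information with incentives to misrepresent, commitment problems, or indivisible stakes.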
I was curious why, given Will's own moral uncertainty (in this interview he mentioned having only 3% credence in utilitarianism), he wasn't concerned about SBF's high confidence in utilitarianism, but I didn't hear the topic addressed. Maybe @William_MacAskill could comment on it here?
One guess is that apparently many young people in EA are "gung ho" on utilitarianism (mentioned by Spencer in this episode), so perhaps Will just thought that SBF isn't unusual in that regard? One lesson could be that such youthful over-enthusiasm is more dangerous than it seems,...
I feel like it's more relevant what a person actually believes than whether they think of themselves as uncertain. Moral certainty seems directly problematic (in terms of risks of recklessness and unilateral action) only when it comes together with moral realism: If you think you know the single correct moral theory, you'll consider yourself justified to override other people's moral beliefs and thwart the goals they've been working towards.
By contrast, there seems to me to be no clear link from "anti-realist moral certainty in some subjectivist axiology" ...
I don't think the "3% credence in utilitarianism" is particularly meaningful; doubting the merits of a particular philosophical framework someone uses isn't an obvious reason to be suspicious of them. Particularly not when Sam ostensibly reached similar conclusions to Will about global priorities, and MacAskill himself has obviously been profoundly influenced by utilitarian philosophers in his goals too.
But I do think there's one specific area where SBF's public philosophical statements were extremely alarming even at the time, and he was doing so whilst i...
fwiw, I wouldn't generally expect "high confidence in utilitarianism" per se to be any cause for concern. (I have high confidence in something close to utilitarianism -- in particular, I have near-zero credence in deontology -- but I can't imagine that anyone who really knows how I think about ethics would find this the least bit practically concerning.)
Note that Will does say a bit in the interview about why he doesn't view SBF's utilitarian beliefs as a major explanatory factor here (the fraud was so obviously negative EV, and the big lesson he took from...
The 3% figure for utilitarianism strikes me as a bit misleading on its own, given what else Will said. (I'm not accusing Will of intent to mislead here; he said something very precise that I, as a philosopher, entirely followed, but it was just a bit complicated for lay people.) Firstly, he said a lot of the probability space was taken up by error theory, the view that there is no true morality. So to get what Will himself endorses, whether or not there is a true morality, you have to basically subtract an unknown but large amount for his credence in error th...
Some suggestions for you to consider:
...Then I think for practical decision-making purposes we should apply a heavy discount to world A) — in that world, what everyone else would eventually want isn’t all that close to what I would eventually want. Moreover what me-of-tomorrow would eventually want probably isn’t all that close to what me-of-today would eventually want. So it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights. Moreover, even though these worlds possibly differ significantly, I don’t feel like from my present position
Thanks, lots of interesting articles in this list that I missed despite my interest in this area.
One suggestion I have is to add some studies of failed attempts at building/reforming institutions, otherwise one might get a skewed view of the topic. (Unfortunately I don't have specific readings to suggest.)
A related topic you don't mention here (maybe due to lack of writings on it?) is whether humanity should pause AI development and have a long (or even short!) reflection about what it wants to do next, e.g. resume AI development or do something else like su...
We have to make judgment calls about how to structure our reflection strategy. Making those judgment calls already gets us in the business of forming convictions. So, if we are qualified to do that (in “pre-reflection mode,” setting up our reflection procedure), why can’t we also form other convictions similarly early?
Anyone with thoughts on what went wrong with EA's involvement in OpenAI? It's probably too late to apply any lessons to OpenAI itself, but maybe not too late elsewhere (e.g., Anthropic)?
While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk", because the person I asked such a question is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering my question, so I don't want to single them out for criticism. I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.
Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.
My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?
@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower in the difficulty scale might fool people into having a false sense of security. Concretely, today AIs are so philosophically incompetent that nobody trusts them to do philosophy (or almost nobody), but if they seemingly got better, but didn't really (or not enough relative to appearances), a lot more people might trust them, and it could be hard to convince them not to.
Thanks for the comment. I agree that what you describe is a hard part of the overall problem. I have a partial plan, which is to solve (probably using analytic methods) metaphilosophy for both analytic and non-analytic philosophy, and then use that knowledge to determine what to do next. I mean today the debate between the two philosophical traditions is pretty hopeless, since nobody even understands what people are really doing when they do analytic or non-analytic philosophy. Maybe the situation will improve automatically when metaphilosophy has been sol...
How should we deal with the possibility/risk of AIs inherently disfavoring all the D's that Vitalik wants to accelerate? See my Twitter thread replying to his essay for more details.
The universe can probably support a lot more sentient life if we convert everything that we can into computronium (optimized computing substrate) and use it to run digital/artificial/simulated lives, instead of just colonizing the universe with biological humans. To conclude that such a future doesn't have much more potential value than your 2010 world, we would have to assign zero value to such non-biological lives, or value each of them much less than a biological human, or make other very questionable assumptions. The Newberry 2021 paper that Vasco Gril...
I think, as a matter of verifiable fact, that if people solve the technical problems of AI alignment, they will use AIs to maximize their own economic consumption, rather than pursue broad utilitarian goals like “maximize the amount of pleasure in the universe”.
If you extrapolate this out to after technological maturity, say 1 million years from now, what does selfish "economic consumption" look like? I tend to think that people's selfish desires will be fairly easily satiated once everyone is much much richer and the more "scalable" "moral" values woul...
Why does "mundane economic forces" cause resources to be consumed towards selfish ends?
Because most economic agents are essentially selfish. I think this is currently true, as a matter of empirical fact. People spend the vast majority of their income on themselves, their family, and friends, rather than using their resources to pursue utilitarian/altruistic ideals.
I think the behavioral preferences of actual economic consumers, who are not mostly interested in changing their preferences via philosophical reflection, will more strongly shape the futur...
To be sure, ensuring AI development proceeds ethically is a valuable aim, but I claim this goal is *not* the same thing as “AI alignment”, in the sense of getting AIs to try to do what people want.
There was at least one early definition of "AI alignment" to mean something much broader:
The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good outcomes in the real world.
I've argued that we should keep using this broa...
There was at least one early definition of "AI alignment" to mean something much broader:
I agree. I have two main things to say about this point:
Not exactly what you're asking for, but you could use it as a reference for all of the significant risks that different people have brought up, to select which ones you want to further research and address in your response post.
Tell me more about these "luxurious AI safety retreats"? I haven't been to an AI safety workshop in several years, and wonder if something has changed. From searching the web, I found this:
and this:
...I was there for an AI workshop earlier this year in Spring and stayed for 2 or 3 days, so let me tell you about the 'luxury' of the 'EA castle': it's a big, empty, cold, stone box, with an awkward layout. (People kept getting lost trying to find the bathroom or a specific room.) Most of the furnishings were gone. Much of the layout you can see in Google Maps w
I thought about this and wrote down some life events/decisions that probably contributed to becoming who I am today.
The main thing is that the clean distinction between attackers and defenders in the theory of the offense-defense balance does not exist in practice. All attackers are also defenders and vice-versa.
I notice that this doesn't seem to apply to the scenario/conversation you started this post with. If a crazy person wants to destroy the world with an AI-created bioweapon, he's not also a defender.
Another scenario I worry about is AIs enabling value lock-in, and then value locked-in AIs/humans/groups would have an offensive advantage in manipulating other pe...
If a crazy person wants to destroy the world with an AI-created bioweapon
Or, more concretely, nuclear weapons. Leaving aside regular full-scale nuclear war (which is censored from the graph for obvious reasons), this sort of graph will never show you something like Edward Teller's "backyard bomb", or a salted bomb. (Or any of the many other nuclear weapon concepts which never got developed, or were curtailed very early in deployment like neutron bombs, for historically-contingent reasons.)
There is, as far as I am aware, no serious scientific doubt that ...
Why aren't there more people like him, and what is he doing or planning to do about that?
It seems like you're basically saying "evolution gave us reason, which some of us used to arrive at impartiality", which doesn't seem very different from my thinking which I alluded to in my opening comment (except that I used "philosophy" instead of "reason"). Does that seem fair, or am I rounding you off too much, or otherwise missing your point?
Yes and no: "evolution gave us reason" is the same sort of coarse approximation as "evolution gave us the ability and desire to compete in status games". What we really have is a sui generis thing which can, in the right environment, approximate ideal reasoning or Machiavellian status-seeking or coalition-building or utility maximization or whatever social theory of everything you want to posit, but which most of the time is trying to split the difference.
People support impartial benevolence because they think they have good pragmatic reasons to do s...
I agree that was too strong or oversimplified. Do you think there are other evolutionary perspectives from which impartiality is less surprising?
Thanks, I didn't know some of this history.
The Altman you need to distrust & assume bad faith of & need to be paranoid about stealing your power is also usually an Altman who never gave you any power in the first place! I’m still kinda baffled by it, personally.
Two explanations come to my mind:
It seems that EA tried to "play politics" with Sam Altman and OpenAI, by doing things like backing him with EA money and credibility (in exchange for a board seat) without first having high justifiable trust in him, generally refraining from publicly (or even privately, from what I can gather) criticizing Sam and OpenAI, having Helen Toner apologize to Sam/OpenAI for expressing even mild criticism in an academic paper, and then springing a surprise attack or counterattack on Sam by firing him without giving any warning or chance to justify himself.
I wonder how much of t...
EDIT: this is going a bit viral, and it seems like many of the readers have missed key parts of the reporting. I wrote this as a reply to Wei Dai and a high-level summary for people who were already familiar with the details; I didn't write this for people who were unfamiliar, and I'm not going to reference every single claim in it, as I have generally referenced them in my prior comments/tweets and explained the details & inferences there. If you are unaware of aspects like 'Altman was trying to get Toner fired' or pushing out Hoffman or how Slack was...
a lot of LessWrong writing refers to 'status', but they never clearly define what it is or where the evidence and literature for it is
Two citations that come to mind are Geoffrey Miller's Virtue Signaling and Will Storr's The Status Game (maybe also Robin Hanson's book although its contents are not as fresh in my mind), but I agree that it's not very scientific or well studied (unless there's a body of literature on it that I'm unfamiliar with), which is something I'd like to see change.
...Maybe it's instead a kind of non-reductionist sense of existing a
One, be more skeptical when someone says they are committed to impartially do the most good, and keep in mind that even if they're totally sincere, that commitment may well not hold when their local status game changes, or if their status gradient starts diverging from actual effective altruism. Two, form a more explicit and detailed model of how status considerations + philosophy + other relevant factors drive the course of EA and other social/ethical movements, test this model empirically, basically do science on this and use it to make predictions and i...
Hey Wei, I appreciate you responding to Mo, but I found myself still confused after reading this reply. This isn't purely down to you - a lot of LessWrong writing refers to 'status', but they never clearly define what it is or where the evidence and literature for it is.[1] To me, it seems to function as this magic word that can explain anything and everything. The whole concept of 'status' as I've seen it used in LW seems incredibly susceptible to being part of 'just-so' stories.
I'm highly sceptical of this though, like I don't know what a 'status gra...
From an evolution / selfish gene's perspective, the reason I or any human has morality is so we can win (or at least not lose) our local virtue/status game. Given this, it actually seems pretty wild that anyone (or more than a handful of outliers) tries to be impartial. (I don't have a good explanation of how this came about. I guess it has something to do with philosophy, which I also don't understand the nature of.)
BTW, I wonder if EAs should take the status game view of morality more seriously, e.g., when thinking about how to expand the social movement...
What might EAs taking the status game view more seriously look like, more concretely? I'm a bit confused since from my outside-ish perspective it seems the usual markers of high status are already all there (e.g. institutional affiliation, large funding, [speculatively] OP's CJR work, etc), so I'm not sure what doing more on the margin might look like. Alternatively I may just be misunderstanding what you have in mind.
What is a plausible source of x-risk that is 10% per century for the rest of time? It seems pretty likely to me that not long after reaching technological maturity, future civilization would reduce x-risk per century to a much lower level, because you could build a surveillance/defense system against all known x-risks, and not have to worry about new technology coming along and surprising you.
It seems that to get a constant 10% per century risk, you'd need some kind of existential threat for which there is no defense (maybe vacuum collapse), or for which t...
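To see quantitatively why the constant-rate assumption matters: with an independent per-century extinction probability $r$, the chance of surviving $n$ centuries is $(1-r)^n$, so at $r = 0.1$ the probability of surviving 10 centuries is $0.9^{10} \approx 0.35$ and the expected number of future centuries is only about $1/r = 10$. If instead the risk falls to near zero after technological maturity, the expected lifespan of civilization, and hence the amount of value at stake, is astronomically larger.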
I’m confused about how it’s possible to know whether someone is making substantive progress on metaphilosophy; I’d be curious if you have pointers.
I guess it's the same as with any other philosophical topic: use your own philosophical reasoning/judgement to decide how good the person's ideas/arguments are, and/or defer to other people's judgements. The fact that there is currently no methodology for doing this that is less subjective and informal is a major reason for me to be interested in metaphilosophy, since if we solve metaphilosophy that will ho...
Any thoughts on Meta Questions about Metaphilosophy from a grant maker perspective? For example have you seen any promising grant proposals related to metaphilosophy or ensuring philosophical competence of AI / future civilization, that you rejected due to funding constraints or other reasons?
I of course also think that philosophical progress, done right, is a good thing. However I also think genuine philosophical progress is much harder than it looks (see Some Thoughts on Metaphilosophy for some relevant background views), and therefore am perhaps more worried than most about philosophical "progress", done wrong, being a bad thing.