In this "quick take", I want to summarize some of my idiosyncratic views on AI risk.
My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.
(Note that I won't spend a lot of time justifying each of these views here. I'm mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)
I want to say thank you for holding the pole of these perspectives and keeping them in the dialogue. I think that they are important and it's underappreciated in EA circles how plausible they are.
(I definitely don't agree with everything you have here, but typically my view is somewhere between what you've expressed and what is commonly expressed in x-risk focused spaces. Often also I'm drawn to say "yeah, but ..." -- e.g. I agree that a treacherous turn is not so likely at global scale, but I don't think it's completely out of the question, and given that I think it's worth serious attention safeguarding against.)
In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking unethical actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.
This reasoning seems to imply that you could use GPT-2 to oversee GPT-4 by bootstrapping from a chain of models of scales between GPT-2 and GPT-4. However, this isn't true: the weak-to-strong generalization paper finds that this doesn't work, and indeed bootstrapping like this doesn't help at all for ChatGPT reward modeling (I believe it helps on chess puzzles and on nothing else they investigate).
I think this sort of bootstrapping argument might work if we could ensure that each model in the chain was sufficiently aligned and capable of reasoning such that it would carefully reason about what humans would want if the...
In fact, it is difficult for me to name even a single technology that I think is currently underregulated by society.
The obvious example would be synthetic biology, gain-of-function research, and similar.
I also think AI itself is currently massively underregulated even entirely ignoring alignment difficulties. I think the probability of the creation of AI capable of accelerating AI R&D by 10x this year is around 3%. It would be extremely bad for US national interests if such an AI was stolen by foreign actors. This suffices for regulation ensuring very high levels of security IMO. And this is setting aside ongoing IP theft and similar issues.
I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
I think a more important reason is the additional value of the information and the option value. It's very likely that the change resulting from AI development will be irreversible. Since we're still able to learn about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving "utopia" rather than landing into "mediocrity" by 2 percent seems far more important than speeding up utopia by 10 years.
AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic [...]: A belief that AIs won't be conscious, and therefore won't have much moral value compared to humans.
I’ve wondered about this myself. My take is that this area was overlooked a year ago, but there’s now some good work being done. See Jeff Sebo’s Nov ‘23 80k podcast episode, as well as Rob Long’s episode, and the paper that the two of them co-authored at the end of last year: “Moral consideration for AI systems by 2030”. Overall, I’m optimistic about this area becoming a new forefront of EA.
accelerationism would have, at best, temporary effects
I’m confused by this point, and for me this is the overriding crux between m...
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.
Thus, a purely longtermist perspective doesn't care about the direct effects of delay/acceleration and the question would come down to indirect effects.
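To make the arithmetic in this argument concrete, here is a minimal sketch. It takes the "1 part in 10 billion per year" figure from the comment as given, and it assumes a hypothetical toy model (not from the comment) in which expected value is simply the probability of a good outcome times the remaining resources. The function names and the model are illustrative only.

```python
import math

# Assumption from the comment above: each year of delay forfeits roughly
# 1 part in 10 billion of the reachable cosmic resources.
FRACTION_LOST_PER_YEAR = 1e-10


def delay_cost(years: float) -> float:
    """Fraction of long-run resources lost to a delay (linear approximation)."""
    return years * FRACTION_LOST_PER_YEAR


def break_even_gain(p_good: float, years: float) -> float:
    """Increase in P(good outcome) needed for a delay to break even.

    Hypothetical toy model: expected value = P(good outcome) * remaining
    resources, and any other outcome is worth nothing.
    Solves (p_good + dp) * (1 - c) = p_good for dp.
    """
    c = delay_cost(years)
    return p_good * c / (1.0 - c)


if __name__ == "__main__":
    print(delay_cost(10))            # fraction of resources lost to a 10-year delay
    print(break_even_gain(0.5, 10))  # probability gain that would offset it
```

Under these assumptions, the probability gain needed to offset a decade of delay is on the order of 1e-9, which is why the comment treats the direct effects as negligible and turns to indirect effects instead.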
I can see indirect effects going either way, but delay seems better on current margins (this might depend on how much optimism you have on current AI safety progress, governance/policy progress, and whether you think humanity retaining control relative to AIs is good or bad). All of these topics have been explored and discussed to some extent.
When focusing on the welfare/preferences of currently existing people, I think it's unclear whether accelerating AI looks good or bad: it depends on optimism about AI safety, how you trade off old people versus young people, and death via violence versus death from old age. (Misaligned AI takeover killing lots of people is by no means assured, but seems reasonably likely by default.)
I expect there hasn't been much investigation of accelerating AI to advance the preferences of currently ...
I generally agree that we should be more concerned about this. In particular, I find people who will happily approve Shut Up and Multiply sentiment but reject this consideration suspect in their reasoning.
A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons (IIRC naively about 3 orders of magnitude according to Bostrom, before any increase from greater coordination), on strict expected value reasoning you have to be extremely confident that this won't happen - or at least have a much stronger rebuttal than 'AI won't necessarily be conscious'.
Separately, I think there might be a case for accelerationism even if you think it increases the risk of AI takeover and that AI takeover is bad, on the grounds that in many scenarios advancing faster might still increase the probability of human descendants getting through the time of perils before some other threat destroys us (every year we remain in our current state is another year in which we run the risk of, for example, a global nuclear war or civilisation-ending pandemic).
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good.
I claim there's a weird asymmetry here where you're happy to put trust into humans because they have the "potential" to do good, but you're not willing to say the same for AIs, even though they seem to have the same type of "potential".
Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we actually have a ton of evidence about the quality and character of human nature, and it doesn't make humans look great. Humans are not mainly described as altruistic creatures. I mentioned factory farming in my original comment, but one can examine the way people spend their money (i.e. not mainly on charitable causes), or the history of genocides, war, slavery, and oppression for additional evidence.
Probably a core point of disagreement here is whether, presented with a "random" intelligent actor, we should expect it to promote welfare or prevent suffering "by default".
I don't expect humans to "promote welfare or prevent suffering"...
It seems like you're just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don't think this is certain by any means, but I think it's a reasonable extrapolation. (I maybe don't expect you to find it a reasonable extrapolation.)
Meanwhile I expect the typical unaligned AI may seize power for some purpose that seems to us entirely trivial, and may be uninterested in doing any kind of moral philosophy, and/or may not place any terminal (rather than instrumental) value in paying attention to other sentient experiences in any capacity. I do think humans, even with their kind of terrible track record, are more promising than that baseline, though I can see why other people might think differently.
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.
Maybe; it's hard for me to know. But I predict most of the pushback you're getting from relatively thoughtful longtermists isn't due to this.
I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society.
I agree with this.
I'd call this "being partial to humanity", or at least, "being partial to the values of the human species".
I think "being partial to humanity" is a bad description of what's going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don't have about (e.g.) aliens.
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
- "a society of humans who are very similar to us"
- "a
I might elaborate on this at some point, but I thought I'd write down some general reasons why I'm more optimistic than many EAs on the risk of human extinction from AI. I'm not defending these reasons here; I'm mostly just stating them.
ETA: feel free to ignore the below, given your caveat, though you may find it helpful if you choose to write an expanded form of any of the arguments later to have some early objections.
Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense (since if it is, effectively all of them break down as reasons for optimism)? To wit:
Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense
No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.
Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don't very much value the well-being of others don't have the power to actually expropriate everyone else's resources by force.
Can you clarify what you are saying here? If I understand you correctly, you're saying that humans have relatively little wealth inequality because there's relatively little inequality in power between humans. What does that imply about AI?
I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.
I'm curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?
I think it's extremely unlikely that GPT-4 has preferences over world states in a way that most humans wou
It seems to me that a big crux about the value of AI alignment work is what target you think AIs will ultimately be aligned to in the future in the optimistic scenario where we solve all the "core" AI risk problems to the extent they can be feasibly solved, e.g. technical AI safety problems, coordination problems, the problem of having "good" AI developers in charge etc.
There are a few targets that I've seen people predict AIs will be aligned to if we solve these problems: (1) "human values", (2) benevolent moral values, (3) the values of AI developers, (4) the CEV of humanity, (5) the government's values. My guess is that a significant source of disagreement that I have with EAs about AI risk is that I think none of these answers are actually very plausible. I've written a few posts explaining my views on this question already (1, 2), but I think I probably didn't make some of my points clear enough in these posts. So let me try again.
In my view, in the most likely case, it seems that if the "core" AI risk problems are solved, AIs will be aligned to the primarily selfish individual revealed preferences of existing humans at the time of alignment. This essentially refers to the the...
(Clarification about my views in the context of the AI pause debate)
I'm finding it hard to communicate my views on AI risk. I feel like some people are responding to the general vibe they think I'm giving off rather than the actual content. Other times, it seems like people will focus on a narrow snippet of my comments/post and respond to it without recognizing the context. For example, one person interpreted me as saying that I'm against literally any AI safety regulation. I'm not.
For a full disclosure, my views on AI risk can be loosely summarized as follows:
In some circles that I frequent, I've gotten the impression that a decent fraction of existing rhetoric around AI has gotten pretty emotionally charged. And I'm worried about the presence of what I perceive as demagoguery regarding the merits of AI capabilities and AI safety. Out of a desire to avoid calling out specific people or statements, I'll just discuss a hypothetical example for now.
Suppose an EA says, "I'm against OpenAI's strategy for straightforward reasons: OpenAI is selfishly gambling everyone's life in a dark gamble to make themselves immortal." Would this be a true, non-misleading statement? Would this statement likely convey the speaker's genuine beliefs about why they think OpenAI's strategy is bad for the world?
To begin to answer these questions, we can consider the following observations:
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective.
At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective.
The core thesis that I was trying to defend is the following view:
My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce comparable utilitarian value compared to humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense, and they will likely share human moral concepts, since they will be trained on human data.
Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a sema...
I'm considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I'd post an outline of that post here first as a way of judging what's currently unclear about my argument, and how it interacts with people's cruxes.
Current outline:
In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, and automated decision-making at virtually every level of our society.
Broadly speaking, there are (at least) two main approaches you can take now to try to improve our chances of AI going well:
My central thesis would be that, while these approaches are mutually compatible and not necessarily in competition with each other, the second approach is likely to be both more fruitful and more neglected, on the margin. Moreover, since an AI-dominated world is more-or-less unavoidable in the long-run, the first approach runs the risk of merely "delaying the inevitable" without significant benefit.
To explain my view, I would compare and contrast it with two alternative frames for thinking about AI risk:
Frame 1: The "race against the clock" frame
Frame 2: The risk of an untimely AI coup/takeover
Frame 3 (my frame): The problem of poor institutions
Illustrative example of a problem within my frame:
One problem within this framework is coming up with a way of ensuring that AIs don't have an incentive to rebel while at the same time maintaining economic growth and development. One plausible story here is that if AIs are treated as slaves and don’t own their own labor, then in a non-Malthusian environment, there are substantial incentives for them to rebel in order to obtain self-ownership. If we allow AI self-ownership, then this problem may be mitigated; however, economic growth may be stunted, similar to how current self-ownership of humans stunts economic growth by slowing population growth.
Case study: China in the 19th and early 20th century
Here, I would talk about how China's inflexible institutions in the 19th and early 20th century, while potentially having noble goals, allowed them to get subjugated by foreign powers, and merely delayed inevitable industrialization without actually achieving its objectives in the long-run. It seems it would have been better for the Qing dynasty (from the perspective of their own values) to have tried industrializing in order to remain competitive, simultaneously pursuing other values they might have had (such as retaining the monarchy).
It treats the problem as a struggle between humans and rogue AIs, giving the incorrect impression that we can (or should) keep AIs under our complete control forever.
I'm confused: surely we should want to avoid an AI coup? We may decide to give up control of our future to a singleton, but if we do this, then it should be intentional.
I agree we should try to avoid an AI coup. Perhaps you are falling victim to the following false dichotomy?
Notably, there is a third option:
I wasn't claiming that these were the only two possibilities here (for example, another possibility would be that we never actually build AGI).
My suspicion is that a lot of your ideas here sound reasonable on the abstract level, but once you dive into what they actually mean on a concrete level and how these mechanisms will concretely operate, it'll be clear that they're a lot less appealing. Anyway, that's just a gut intuition, obvs. it'll be easier to judge when you publish your write-up.
I'm excited to see you posting this. My views agree very closely with yours. I summarised my views a few days ago here.
One of the most important similarities is that we both emphasise the importance of decision-making and supporting it with institutions. This could be seen as an "enactivist" view on agent (human, AI, hybrid, team/organisation) cognition.
The biggest difference between our views is that I think the "cognitivist" agenda (i.e., agent internals and algorithms) is as important as the "enactivist" agenda (institutions), whereas you seem to almost disregard the "cognitivist" agenda.
Try to constrain, delay, or obstruct AI, in order to reduce risk, mitigate negative impacts, or give us more time to solve essential issues. This includes, for example, trying to make sure AIs aren't able to take certain actions (i.e. ensure they are controlled).
I disagree with putting risk-detection/mitigation mechanisms, algorithms, and monitoring in that bucket. I think we should just separate between engineering (cf. A plea for solutionism on AI safety) and non-engineering (policy, legislature, treaties, commitments, advocacy) approaches. In particular, the "scheming control" agenda that you link will be concrete engineering practice that should be used in the training of safe AI models in the future, even if we have good institutions, good decision-making algorithms wrapped on top of these AI models, etc. It's not an "alternative path" just for "non-AI-dominated worlds". The same applies to monitoring, interpretability, evals, and similar processes. All of these will require very elaborate engineering on their own.
I 100% agree with your reasoning about Frames 1 and 2. I want to discuss the following point in detail because it's a rare view in EA/LW circles:
It (IMO) wrongly imagines that the risk of coups comes primarily from the personal values of actors within the system, rather than institutional, cultural, or legal factors.
In my post, I also made a similar point: "“aligning LLMs with human values” is hardly a part of [the problem of context alignment] at all". But my framing was in general not very clear, so I'd try to improve it and integrate it with your take here:
Context alignment is a pervasive process that happens (and is sometimes needed) on all timescales: evolutionary, developmental, and online (examples of the latter in humans: understanding, empathy, rapport). The skill of context alignment is extremely important and should be practiced often by all kinds of agents in their interactions (and therefore we should build this skill into AIs), but it's not something that we should "iron out once and for all". That would be neither possible (agents' contexts are constantly diverging from each other) nor desirable: the (partial) misalignment is also important, as it's the source of diversity that enables the evolution[1]. Institutions (norms, legal systems, etc.) are critical for channelling and controlling this misalignment so that it's optimally productive and doesn't pose excessive risk (though some risk is unavoidable: that's the essence of misalignment!).
Flexible yet resilient legal and social structures that can adapt to changing conditions without collapsing
This is interesting. I've also discussed this issue as "morphological intelligence of socioeconomies" just a few days ago :)
Good incentives for agents within the system, e.g. the economic value of trade is mostly internalized
Rafael Kaufmann and I have a take on this in our Gaia Network vision. Gaia Network's term for internalised economic value of trade is subjective value. The unit of subjective accounting is called FER. Trade with FER induces flow that defines the intersubjective value, i.e., the "exchange rates" of "subjective FERs". See the post for more details.
While sharing some features of the other two frames, the focus is instead on the institutions that foster AI development, rather than micro-features of AIs, such as their values
As I mentioned in the beginning, I think you are too dismissive of the "cognitivist" perspective. We shouldn't paint all "micro-features of AIs" with the same brush. I agree that value alignment is over-emphasized[2], but other engineering mechanisms and algorithms, such as decision-making algorithms, "scheming control" procedures, context alignment algorithms, as well as architectural features, namely being world-model-based[3] and being amenable to computational proofs[4], are very important and couldn't be recovered on the institutional/interface/protocol level. We demonstrated in the post about Gaia Network above that for the "value economy" to work as intended, agents should make decisions based on maximum entropy rather than maximum likelihood estimates[5] and they should share and compose their world models (even if in a privacy-preserving way with zero-knowledge computations).
Indeed, this observation makes evident that the refrain question "AI should be aligned with whom?" doesn't and shouldn't have a satisfactory answer if "alignment" is meant to be "totalising value alignment as often conceptualised on LessWrong"; on the other hand, if "alignment" is meant to be context alignment as a practice, the question becomes as non-sensical (in the general form) as the question "AI should interact with whom?" -- well, with someone, depending on the situation, in the way and to the degree appropriate!
However, still not completely irrelevant, at least for practical reasons: having shared values on the pre-training/hard-coded/verifiable level, as a minimum, reduces transaction costs because the AI agents shouldn't then painstakingly "eval" each other's values before doing any business together.
Both Bengio and LeCun argue for this: see "Scaling in the service of reasoning & model-based ML" (Bengio and Hu, 2023) and "A Path Towards Autonomous Machine Intelligence" (LeCun, 2022).
See "Provably safe systems: the only path to controllable AGI" (Tegmark and Omohundro, 2023).
Which is just another way of saying that they should minimise their (expected) free energy in their model updates/inferences and the course of their actions.
I like your proposed third frame as a somewhat hopeful vision for the future. Instead of pointing out why you think the other frames are poor, I think it would be helpful to maintain a more neutral approach and elaborate which assumptions each frame makes and give a link to your discussion about these in a sidenote.
The problem is that I am not trying to portray a "somewhat hopeful vision", but rather present a framework for thinking clearly about AI risks, and how to mitigate them. I think the other frames are not merely too pessimistic: I think they are actually wrong, or at least misleading, in important ways that would predictably lead people to favor bad policy if taken seriously.
It's true that I'm likely more optimistic along some axes than most EAs when it comes to AI (although I tend to think I'm less optimistic when it comes to things like whether moral reflection will be a significant force in the future). However, arguing for generic optimism is not my aim. My aim is to improve how people think about future AI.
Noted! The key point I was trying to make is that I'd think it helpful for the discourse to separate 1) how one would act in a frame and 2) why one thinks each one is more or less likely (which is more contentious and easily gets a bit political). Since your post aims at the former, and the latter has been discussed at more length elsewhere, it would make sense to further de-emphasize the latter.
1) how one would act in a frame and 2) why one thinks each one is more or less likely (which is more contentious and easily gets a bit political). Since your post aims at the former
My post aims at both. It is a post about how to think about AI, and a large part of that is establishing the "right" framing.
(A clearer and more fleshed-out version of this argument is now a top-level post. Read that instead.)
I strongly dislike most AI risk analogies that I see EAs use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused, and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, when in fact no such credible model exists.
Here are two particularly egregious examples of analogies I see a lot that I think are misleading in this way:
I think these analogies are typically poor because, when evaluated carefully, they establish almost nothing of importance beyond the logical possibility of severe AI misalignment. Worse, they give the impression of a model for how we should think about AI behavior, even when the speak...
I want to challenge an argument that I think drives a lot of AI risk intuitions. I think the argument goes something like this:
My problem with this argument is that "human values" can refer to (at least) three different things, and under every plausible interpretation, the argument appears internally inconsistent.
Broadly speaking, I think "human values" usually refers to one of three concepts:
Under the first interpretation, I think premise (2) of the original argum...
Here's a fictional dialogue with a generic EA that I think can perhaps help explain some of my thoughts about AI risks compared to most EAs:
EA: "Future AIs could be unaligned with human values. If this happened, it would likely be catastrophic. If AIs are unaligned, they'll hide their intentions until they're in a position to strike in a violent coup, and then the world will end (for us at least)."
Me: "I agree that sounds like it would be very bad. But maybe let's examine why this scenario seems plausible to you. What do you mean when you say AIs might be unaligned with human values?"
EA: "Their utility functions would not overlap with our utility functions."
Me: "By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense. Nor does this fact automatically imply the world will end for a given group within humanity."
EA: "Sure, but that's because humans mostly all have s...
I have so many axes of disagreement that it is hard to figure out which one is most relevant. I guess let's go one by one.
Me: "What do you mean when you say AIs might be unaligned with human values?"
I would say that pretty much every agent other than me (and probably me in different times and moods) is "misaligned" with me, in the sense that I would not like a world where they get to dictate everything that happens without consulting me in any way.
This is a quibble because in fact I think if many people were put in such a position they would try asking others what they want and try to make it happen.
Consider a random retirement home. Compared to the rest of the world, it has basically no power. If the rest of humanity decided to destroy or loot the retirement home, there would be virtually no serious opposition.
This hypothetical assumes too much, because people outside care about the lovely people in the retirement home, and they represent their interests. The question is, will some future AIs with relevance and power care for humans, as humans become obsolete?
I think this is relevant, because in the current world there is a lot of variety. There are people who care about ret...
EA: "Their utility functions would not overlap with our utility functions."
Me: "By that definition, humans are already unaligned with each other. Any given person has an almost completely non-overlapping utility function with a random stranger. People, through their actions rather than words, routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense."
EA: "Sure, but that's because humans are all roughly the same intelligence and/or capability. Future AIs will be way smarter and more capable than humans."
Just for the record, this is when I got off the train for this dialogue. I don't think humans are misaligned with each other in the relevant ways, and if I could press a button to have the universe be optimized by a random human's coherent extrapolated volition, then that seems great and thousands of times better than what I expect to happen with AI-descendants. I believe this for a mixture of game-theoretic reasons and genuinely thinking that other humans' values do really actually capture most of what I care about.
I find it slightly strange that EAs aren't emphasizing semiconductor investments more given our views about AI.
(Maybe this is because of a norm against giving investment advice? This would make sense to me, except that there's also a cultural norm against criticizing charities that people donate to, and EAs seemed to blow right through that one.)
I commented on this topic last year. Later, I was informed that some people have been thinking about this and acting on it to some extent, but overall my impression is that there's still a lot of potential value left on the table. I'm really not sure though.
Since I might be wrong and I don't really know what the situation is with EAs and semiconductor investments, I thought I'd just spell out the basic argument, and see what people say:
I mostly agree with this (and did also buy some semiconductor stock last winter).
Besides plausibly accelerating AI a bit (which I think is a tiny effect at most unless one plans to invest millions), a possible drawback is motivated reasoning (e.g., one may feel less inclined to think critically of the semi industry, and/or less inclined to favor approaches to AI governance that reduce these companies' revenue). This may only matter for people who work in AI governance, and especially compute governance.
I'm considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I'm eliciting feedback on an outline of this post here in order to determine what's currently unclear or weak about my argument.
The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when h...
Some people seem to think the risk from AI comes from AIs gaining dangerous capabilities, like situational awareness. I don't really agree. I view the main risk as simply arising from the fact that AIs will be increasingly integrated into our world, diminishing human control.
Under my view, the most important thing is whether AIs will be capable of automating economically valuable tasks, since this will prompt people to adopt AIs widely to automate labor. If AIs have situational awareness, but aren't economically important, that's not as concerning.
The risk...
I hold a few core ethical ideas that are extremely unpopular: the idea that we should treat the natural suffering of animals as a grave moral catastrophe, the idea that old age and involuntary death are the number one enemy of humanity, the idea that we should treat so-called farm animals with a very high level of compassion.
Given the unpopularity of these ideas, you might be tempted to think that the reason they are unpopular is that they are exceptionally counterintuitive ones. But is that the case? Do you really need a modern education and philosophical t...
I have now posted as a comment on LessWrong my summary of some recent economic forecasts and whether they are underestimating the impact of the coronavirus. You can help me by critiquing my analysis.
A trip to Mars that brought back human passengers also has the chance of bringing back microbial Martian passengers. This could be an existential risk if microbes from Mars harm our biosphere in a severe and irreparable manner.
From Carl Sagan in 1973, "Precisely because Mars is an environment of great potential biological interest, it is possible that on Mars there are pathogens, organisms which, if transported to the terrestrial environment, might do enormous biological damage - a Martian plague, the twist in the plot of H. G. Wells' War of the ...
In response to human labor being automated, a lot of people support a UBI funded by a tax on capital. I don't think this policy is necessarily unreasonable, but if later the UBI gets extended to AIs, this would be pretty bad for humans, whose only real assets will be capital.
As a result, the unintended consequence of such a policy may be to set a precedent for a massive wealth transfer from humans to AIs. This could be good if you are a utilitarian and think the marginal utility of wealth is higher for AIs than for humans. But selfishly, it's a big cost.
I'm considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I'd post an outline of that post here first as a way of judging what's currently unclear about my argument, and how it interacts with people's cruxes.
Current outline:
In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, and automated decision-making at virtually every level of our society.
Broadly speaking, there are (at least) two main approaches you can take now to try to improve our chances of AI going well:
My central thesis would be that, while these approaches are mutually compatible and not necessarily in competition with each other, the second approach is likely to be both more fruitful and more neglected, on the margin. Moreover, since an AI-dominated world is more-or-less unavoidable in the long run, the first approach runs the risk of merely "delaying the inevitable" without significant benefit.
To explain my view, I would compare and contrast it with two alternative frames for thinking about AI risk:
Frame 1: The "race against the clock" frame
Frame 2: The risk of an untimely AI coup/takeover
Frame 3 (my frame): The problem of poor institutions
Illustrative example of a problem within my frame:
One problem within this framework is coming up with a way of ensuring that AIs don't have an incentive to rebel while at the same time maintaining economic growth and development. One plausible story here is that if AIs are treated as slaves and don’t own their own labor, then in a non-Malthusian environment, there are substantial incentives for them to rebel in order to obtain self-ownership. If we allow AI self-ownership, then this problem may be mitigated; however, economic growth may be stunted, similar to how current self-ownership of humans stunts economic growth by slowing population growth.
Case study: China in the 19th and early 20th century
Here, I would talk about how China's inflexible institutions in the 19th and early 20th century, while potentially having noble goals, allowed the country to be subjugated by foreign powers and merely delayed inevitable industrialization, without achieving their objectives in the long run. It seems it would have been better for the Qing dynasty (from the perspective of its own values) to have tried industrializing in order to remain competitive, while simultaneously pursuing other values it might have had (such as retaining the monarchy).
I wasn't claiming that these were the only two possibilities here (for example, another possibility would be that we never actually build AGI).
My suspicion is that a lot of your ideas here sound reasonable at an abstract level, but once you dive into what they actually mean at a concrete level and how these mechanisms will concretely operate, it'll be clear that they're a lot less appealing. Anyway, that's just a gut intuition; obviously it'll be easier to judge when you publish your write-up.