I think using "unsafe" in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists.
I agree that when there's no memetic-fitness/calibration trade-off, it's always better to be calibrated. But here there is a trade-off. How should we handle it?
Very glad to see that happening; regranting solves a bunch of problems that centralized grantmaking leaves unsolved.
I mean, I agree that it has nuance, but it's still trained on a set of values that are pretty much those of current Western people, so it will probably put more or less emphasis on various values according to the weight Western people give to each of them.
I may try to write something on that in the future. I'm personally more worried about accidents and think that solving accidents leads one to solve misuse pre-AGI. Post aligned AGI, misuse becomes a major worry again.
Note that saying "this isn't my intention" doesn't prevent net negative effects of a theory of change from applying. Otherwise, doing good would be a lot easier.
I also highly recommend clarifying what exactly you're criticizing, i.e. the philosophy, the movement norms or some institutions that are core to the movement.
Finally, I usually find the criticism of people who are a) at the core of the movement and b) highly truth-seeking the most relevant for improving the movement, so I would expect that if you're trying to improve the movement, you may want to focu...
+1 for clarification. It could be neat if you could use a standard diagram to pinpoint what sort of criticism each one is.
For example, see this one from Astral Codex Ten.
I work every day from about 9:30am to 1am, with about 3h off on average and 30 min of walking, which helps me brainstorm. Technically this is ~12h*7 = 84h per week. The main reasons are that 1) I want us not to die, and 2) I think there are increasing marginal returns on working hours in a lot of situations, mostly because in a lot of domains the winner takes all even if he's only 10% better than the others, and because you accumulate more expertise/knowledge in a single person, which gives access to rarer and rarer skills.
Among that, I would say that I lose a...
"Nobody cared about" LLMs is certainly not true - I'm pretty sure the relevant people watched them closely.
What do you mean by "the relevant people"? I would love for us to talk about specifics here and operationalize what we mean. I'm pretty sure E. Macron hasn't thought deeply about AGI (i.e. has never thought for more than 1h about timelines), and I'm at 50% that if he had any deep understanding of the changes it will bring, he would already be racing. Likewise for Israel, which is a country with a strong track record of becoming a leader in technol...
Hey Misha! Thanks for the comment!
I am quite confused about what probabilities here mean, especially with prescriptive sentences like "Build the AI safety community in China" and "Beware of large-scale coordination efforts."
As I wrote in note 2, I'm claiming here that this claim is more likely to be true under these timelines than under the other timelines. But how could I make that clearer without adding too much clutter? Maybe putting note 2 under the table in italics?
...I also disagree with the "vibes" of probability assignment to a bunch of these, and the lack of clari
Haha, you probably don't realize it, but "you" is actually 4 people: Amber Dawn for the first draft of the post; me (Simeon) for the ideas, the table and the structure of the post; and me, Nicole Nohemi & Felicity Riddel for the partial rewriting of the draft to make it clearer.
So the credits are highly distributed! And thanks a lot, it's great to hear that!
I think that our disagreement comes from what we mean by "regulating and directing it."
My rough model of what usually happens in national governments (and not the EU, which is a lot more independent from its citizens than the typical national government) is that there are two scenarios:
Thanks for your comment!
A couple of remarks:
Thanks for your comment!
First, you have to keep in mind that when people talk about "AI" in industry and policymaking, they usually have mostly non-deep-learning or vision deep learning techniques in mind, simply because they mostly don't know the academic ML field but have heard that "AI" was becoming important in industry. So this sentence is little evidence that Russia (or any other country) is trying to build AGI, and I'm at ~60% that Putin wasn't thinking about AGI when he said that.
...If anyone who could play any role at all in develop
Strongly agree, upvoted.
Just a minor point on the Putin quote, as it comes up so often: he was talking to a bunch of schoolkids, encouraging them to do science and technology. He said similarly supportive things about a bunch of other technologies. I'm at >90% that he wasn't referring to AGI. He's not even that committed to AI leadership: he's taken few actions indicating serious interest in 'leading in AI'. Indeed, his Ukraine invasion has cut off most of his chip supplies and led to a huge exodus of AI/CS talent. It was just an off-the-cuff rhetorical remark.
Thanks for your comment!
That's an important point that you're bringing up.
My sense is that at the movement level, the consideration you bring up is super important. Indeed, even though I have fairly short timelines, I would like funders to hedge for long timelines (e.g. fund stuff for China AI Safety). Thus I think that big actors should have in mind their full distribution to optimize their resource allocation.
That said, despite that, I have two disagreements:
One of my friends and collaborators built this app, which is aimed at predicting the likelihood that we go extinct: https://xriskcalculator.vercel.app/
It might be useful!
It was a way of saying that if you think intelligence is perfectly correlated with being "morally good", then you're fine. But you're right that it doesn't cover all the ways you could reject the orthogonality thesis.
Even if you have a human-ish intelligence, most of the advantage of AI comes from its other features:
- You can process any type of data, orders of magnitude faster than a human, and once you know how to do a task, you deterministically know how to do it.
- You can just double the amount of GPUs and double the number of AIs. If you pair two AIs and make them interact at high speed, it's much more powerful than anything human-ish.
These are two...
I think the meta-point might be the crux of our disagreement.
I mostly agree with your inside view that other catastrophic risks struggle to be existential the way AI would, and I'm often a bit perplexed as to how quick people are to jump from 'nearly everyone dies' to 'literally everyone dies'. Similarly I'm sympathetic to the point that it's difficult to imagine particularly compelling scenarios where AI doesn't radically alter the world in some way.
But we should be immensely uncertain about the assumptions we make, and I would argue that by far the most likely fi...
Yes, that's right, but it's very different to be somewhere and affect AGI by chance, and to be somewhere because you think that it's your best way to affect AGI.
And I think that if you're optimizing for the latter, you're not very likely to end up working in nuclear weapons policy (even if there might be a few people for whom it is the best fit).
I think that this comment is way too outside viewy.
Could you mention concretely one of the "many options" that would change directionally the conclusion of the post?
The claim is "AGI will radically change X". And I tried to argue that if you cared about X and wanted to impact it, basically on the first order you could calculate your impact on it just by measuring your impact on AGI.
"The superintelligence is misaligned with our own objectives but is benign".
You could have an AI with some meta-cognition, able to figure out what's good and maximize it, in the same way EAs try to figure out what's good and maximize it with parts of their lives. This view mostly makes sense if you give some credence to moral realism.
"My personal view on your subject is that you don't have to work in AI to shape its future."
Yes, that's what I wrote in the post.
"You can also do that by bringing the discussion into the public and create awareness for the dangers."
I don't think it's a good method, and I think you should target a much more specific audience, but yes, I know what you mean.
I think that on the AGI timelines of the EA community, yes, other X-risks have a probability of causing extinction roughly indistinguishable from 0.
And conditional on AGI working out, we'll also most likely get out of the other risks.
Whereas without AGI, bio X-risks might become a thing, not in the short run but in the second half of the century.
That's right! I just think that the base rate for "civilisation collapse prevents us from ever becoming a happy intergalactic civilisation" is very low.
And multiplying any probability by 0.1 does matter, because when we're talking about AGI, we're talking about things that are >=10% likely to happen according to a lot of people (I put a higher likelihood on it than that, but Toby Ord putting it at 10% is sufficient).
So it means that even if you condition on biorisks being the same as AGI for everything else (which is the point I argue against), you still need biorisks t...
I'd be glad to stay as long as we can in the domain of aggregate probabilities and proxies for real scenarios, particularly for biorisks.
Mostly because I think that most people can't do a lot about infohazardy things so the first-order effect is just net negative.
I think that's one of the problems that explains why many people find my claim far too strong: in the EA community, very few people have a strong inside view on both advanced AI and biorisks. (I think that's more generally true for most combinations of cause areas.)
And I think that indeed, with the kind of uncertainty one must have when one is deferring, it becomes harder to make claims as strong as the one I'm making here.
Yes, I think you're right actually.
Here's a weaker claim which I think is true:
- When someone knows and has thought about an infohazard, the baseline is that they're way more likely to cause harm via it than to cause good.
- Thus, I'd recommend that anyone who's not actively thinking about ways to prevent the classes of scenarios where this infohazard would end up being very bad try to forget the infohazard and not talk about it, even to trusted individuals. Otherwise it will most likely be net negative.
I think that if you take these infohazards seriously enough, you probably shouldn't even do that, because if everyone has a 95% likelihood of keeping it secret, with 10 people in the know the chance it stays secret is about 60%.
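(That's just $0.95^{10} \approx 0.60$, assuming each of the 10 people keeps the secret independently.)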
Thanks!
Do you think that biorisks/nuclear war could plausibly cause us never to recover our values? What's the weight you give to such a scenario?
(I want to know whether the weight you put on "worse values" is due to stable totalitarianism enabled by new technologies, or due to collapse -> bad people win.)
Thanks for this information!
What's the probability we go extinct due to biorisks by 2045 according to you?
Also, I think that things which are extremely infohazardy shouldn't be weighted too strongly, because without the information being revealed they will likely remain very unlikely.
Basically, as I said in my post I'm fairly confident about most things except the MVP (minimum viable population) where I almost completely defer to Luisa Rodriguez.
Likewise, for the likelihood of irrecoverable collapse, my prior is that the likelihood is very low for the reasons I gave above, but given that I haven't explored the inside-view arguments in favor of it very much, I could quickly update upward, and I think that would be the best way for me to update positively on biorisks actually posing an X-risk in the next 30 years.
My view on t...
I think you would make a good point if we were close in terms of EV, but what matters primarily is the EV, and I expect it to dominate the uncertainty here.
I didn't do the computations, but I feel like if you have something which is OOMs more important than the others, even with very large bars of uncertainty you'd probably put >19/20 of your resources on the highest-EV thing.
In the same way, we don't give to another, less cost-effective org to hedge against AMF, even though it might have some tail chance of having a very significant positive impact on society, just because the estimates' error bars are very large.
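To gesture at the computation I didn't do, here's a minimal Monte Carlo sketch with entirely made-up parameters (the lognormal spreads and the ~100x median gap are my assumptions for illustration, not anyone's actual estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy model: the cost-effectiveness of each option is uncertain over
# orders of magnitude (1 OOM standard deviation in log10 space), but
# option A is ~2 OOMs better in median than option B.
log10_a = rng.normal(loc=2.0, scale=1.0, size=n)  # log10 cost-effectiveness of A
log10_b = rng.normal(loc=0.0, scale=1.0, size=n)  # log10 cost-effectiveness of B

print("P(A beats B):", (log10_a > log10_b).mean())                   # ~0.92
print("E[A] / E[B]:", (10**log10_a).mean() / (10**log10_b).mean())   # ~100
```

Even with very wide error bars, the option that is OOMs better in median still wins in the vast majority of draws and dominates in expectation, which is the intuition behind concentrating resources on the highest-EV option.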
Yep, good point! I just wanted to make clear that IMO a good first-order approximation of your impact on the long-term future is: "What's the causal impact of your work on AI?"
And even though being a UX designer for 80k or doing community building is not focused on AI, those roles are instrumentally very useful for AI, in particular if the person doing them has this theory of change in mind.
Yes, scenarios are a good way to put a lower bound, but if you're not able to create one single scenario, that's a bad sign in my opinion.
For AGI there are many plausible scenarios where I can reach a ~1-10% likelihood of dying. With biorisks it's impossible given my current beliefs about the MVP (minimum viable population).
"If (toy numbers here) AI risk is 2 orders of magnitude more likely to occur than biorisk, but four orders of magnitude less tractable". I think that indeed 2 or 3 OOMs of difference would be needed at least to compensate (especially given that positively shaping biorisks is not extremely positive) and as I argued above I think it's unlikely.
"They are of course not, as irrecoverable collapse , s-risks and permanent curtailing of human potential". I think that irrecoverable collapse is the biggest crux. What likelihood do you put on it? For other type...
By default, you shouldn't have a prior that biorisk is 100x more tractable than AI, though. Some (important) people think that the EA community has had a net negative impact on biorisks because of infohazards, for instance.
Also, I'll argue below that timelines matter for ITN and I'm pretty confident the risk/year is very different for the two risks (which favors AI in my model).
I think it's extremely relevant.
To be honest, I think that if someone without a technical background wanted to contribute, looking into these things would be one of the best default opportunities, because:
1) These points you mention are blindspots of the AI alignment community, because the typical member of the AI alignment community doesn't really care about all this political stuff. Especially questions about values and about "how come those who are 1000x more powerful than others magically don't start ruling the entire world with their aligned AI" ar...
I think that if the community was convinced that it was by far the most important thing, we would try harder to find projects and I'm confident there are a bunch of relevant things that can be done.
I think we're suffering from an argument-to-moderation fallacy that makes us underinvest massively in AI safety because:
1) AI Safety is hard
2) There are other causes that, when you don't think too deeply about them, seem equally important
The portfolio argument is an abstraction that hides the fact that if something is way more important than so...
"Firstly, under the standard ITN (Importance Tractability Neglectedness) framework, you only focus on importance. If there are orders of magnitude differences in, let's say, traceability (seems most important here), then longtermists maybe shouldn't work on AI."
I think this makes sense when we're in the domain of non-existential areas. I think that in practice, when you're confident about existential outcomes and don't know how to solve them yet, you probably should still focus on them, though.
"which probably leads to an overly narrow interprets of what might pos...
Just tell me a story, with probabilities, of how nuclear war or bioweapons could cause human extinction, and you'll see that when you multiply the probabilities, the result goes down to a very low number.
I repeat, but I think that you still don't have a good sense of how difficult it is to kill every human if the minimum viable population (MVP) is around 1000, as argued in the post linked above.
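To illustrate with purely made-up toy numbers (mine, not from the post or the linked article), any such story has to chain several stages, and the product shrinks fast:

$$P(\text{extinction}) \approx P(\text{full-scale war}) \times P(\text{collapse} \mid \text{war}) \times P(\text{fewer than} \sim 1000 \text{ survivors} \mid \text{collapse}) = 0.1 \times 0.1 \times 0.01 = 10^{-4}$$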
"knock-on effects"
I think that's true, but I think that to first order, not dying from AGI is the most important thing, compared with developing it in, say, 100 years.
If there were no preferences, at least 95%, and probably more like 99%. I think that this should update according to our timelines.
And just to clarify, that includes community building etc. as I mentioned.
Thanks for the comment.
I think it would be true if there were other X-risks. I just think that there is no other literal X-risk. I think that there are huge catastrophic risks. But there's still a huge difference between killing 99% of people and killing 100%.
I'd recommend reading (or skimming) this to get a better sense of how different the two are.
I think that in general the sense that it's cool to work on every risk comes precisely from the fact that very few people have thought about every risk, and thus people in AI for instance I...
"no other literal X-risk" seems too strong. There are certainly some potential ways that nuclear war or a bioweapon could cause human extinction. They're not just catastrophic risks.
In addition, catastrophic risks don't just involve massive immediate suffering. They drastically change global circumstances in a way which will have knock-on effects on whether, when, and how we build AGI.
All that said, I directionally agree with you, and I think that probably all longtermists should have a model of the effects their work has on the potentiality of aligned AGI...
I know it's not trivial to do, but if you took your AGI timelines into consideration for this type of forecast, you'd come up with very different estimates. For that reason, I'd be willing to bet on most estimates
I have the impression (coming from simulator theory (https://generative.ink/)) that Decision Transformers (DTs) have some chance (~45%) of being a much safer form of trial-and-error technique than RL. The core reason is that DTs learn to simulate a distribution of outcomes (e.g. they learn to simulate the kind of actions that lead to a reward of 10 as much as those that lead to a reward of 100), and it's only at inference time that you systematically condition on a reward of 100. So in some sense, the agent which has become very good via tr...
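For concreteness, here's a minimal sketch of the return-conditioning idea (my own toy code: a small MLP stands in for the transformer backbone, and all names and sizes are made up):

```python
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """Toy illustration of the Decision-Transformer-style idea of
    conditioning action prediction on a target return."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # The model sees (target return, state) and predicts which action
        # an agent achieving that return would take next.
        self.net = nn.Sequential(
            nn.Linear(1 + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, return_to_go: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([return_to_go, state], dim=-1))

# Training (not shown) is plain supervised learning on logged trajectories of
# all qualities: each step is labelled with the return actually obtained, so the
# model learns what reward-10 behaviour and reward-100 behaviour both look like.
# Only at inference time do we systematically ask for the high-return behaviour:
policy = ReturnConditionedPolicy(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
target_return = torch.tensor([[100.0]])       # "act like a trajectory that scored 100"
action_logits = policy(target_return, state)  # logits over the 2 actions
```

The selection of reward-100 behaviour happens only through the conditioning input at inference, rather than through an objective directly optimized during training, which is the distinction from standard RL the comment is pointing at.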
[TO POLICYMAKERS]
Trying to align very advanced AIs with what we want is a bit like trying to design a law or a measure to constrain massive companies, such as Google or Amazon, or powerful countries, such as the US or China. You know that when you put a rule in place, they will have enough resources to circumvent it. And you can try as hard as you want: if you didn't design the AI properly in the first place, you won't be able to make it do what you want.
[TO ML RESEARCHERS AND MAYBE TECH EXECUTIVES]
When you look at society's problems, you can observe that many of our structural problems come from strong optimizers.
Now, even thes...
I agree with the general underlying point.
I also think that another important issue is that reasoning on counterfactuals makes people more prone to do things that are unusual AND more prone to errors (e.g. by not taking some other effects into account).
Both combined make counterfactual reasoning without empirical data pretty perilous on average IMO.
In the case of Ali in your example above, for instance, Ali could neglect that his performance will determine the opportunities & impact he has 5y down the line, and so that being exc...