Humans are less than maximally aligned with each other (e.g. we care less about the welfare of a random stranger than about our own welfare), and humans are also less than maximally misaligned with each other (e.g. most people don’t feel a sadistic desire for random strangers to suffer). I hope that everyone can agree about both those obvious things.
That still leaves the question of where we are on the vast spectrum in between those two extremes. But I think your claim “humans are largely misaligned with each other” is not meaningful enough to argue about....
My terminology would be that (2) is “ambitious value learning” and (1) is “misaligned AI that cooperates with humans because it views cooperating-with-humans to be in its own strategic / selfish best interest”.
I strongly vote against calling (1) “aligned”. If you think we can have a good future by ensuring that it is always in the strategic / selfish best interest of AIs to be nice to humans, then I happen to disagree but it’s a perfectly reasonable position to be arguing, and if you used the word “misaligned” for those AIs (e.g. if you say “alignment is u...
May I ask, what is your position on creating artificial consciousness?
Do you see digital suffering as a risk? If so, should we be careful to avoid creating AC?
I think the word “we” is hiding a lot of complexity here—like saying “should we decommission all the world’s nuclear weapons?” Well, that sounds nice, but how exactly? If I could wave a magic wand and nobody ever builds conscious AIs, I would think seriously about it, although I don’t know what I would decide—it depends on details I think. Back in the real world, I think that we’re eventually going t...
Sorry if I missed it, but is there some part of this post where you suggest specific concrete interventions / actions that you think would be helpful?
Mark Solms thinks he understands how to make artificial consciousness (I think everything he says on the topic is wrong), and his book Hidden Spring has an interesting discussion (in chapter 12) on the “oh jeez now what” question. I mostly disagree with what he says about that too, but I find it to be an interesting case-study of someone grappling with the question.
In short, he suggests turning off the sentient machine, then registering a patent for making conscious machines, and assigning that patent to a nonprofit like maybe Future of Life Institute, and...
I am not claiming analogies have no place in AI risk discussions. I've certainly used them a number of times myself.
Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”. :)
Cf. what I called “left-column thinking” in the diagram here.
For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actu...
It is certainly far from obvious: for example, devastating as the COVID-19 pandemic was, I don’t think anyone believes that 10,000 random re-rolls of the COVID-19 pandemic would lead to at least one existential catastrophe. The COVID-19 pandemic just was not the sort of thing to pose a meaningful threat of existential catastrophe, so if natural pandemics are meant to go beyond the threat posed by the recent COVID-19 pandemic, Ord really should tell us how they do so.
This seems very misleading. We know that COVID-19 has <<5% IFR. Presumably the concer...
(Recently I've been using "AI safety" and "AI x-safety" interchangeably when I want to refer to the "overarching" project of making the AI transition go well, but I'm open to being convinced that we should come up with another term for this.)
I’ve been using the term “Safe And Beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the overarching “go well” project, and “AGI safety” as the part where we try to make AGIs that don’t accidentally [i.e. accidentally from the human supervisors’ / programmers’ perspective] kill everyone, and (following c...
This kinda overlaps with (2), but the end of 2035 is 12 years away. A lot can happen in 12 years! If we look back to 12 years ago, it was December 2011. AlexNet had not come out yet, neural nets were a backwater within AI, a neural network with 10 layers and 60M parameters was considered groundbreakingly deep and massive, the idea of using GPUs in AI was revolutionary, TensorFlow was still years away, doing even very simple image classification tasks would continue to be treated as a funny joke for several more years (literally—this comic is from 2014!), I...
That might be true in the very short term but I don’t believe it in general. For example, how many reporters were on the Ukraine beat before Russia invaded in February 2022? And how many reporters were on the Ukraine beat after Russia invaded? Probably a lot more, right?
Thanks for the comment!
I think we should imagine two scenarios, one where I see the demonic possession people as being “on my team” and the other where I see them as being “against my team”.
To elaborate, here’s yet another example: Concerned Climate Scientist Alice responding to statements by environmentalists of the Gaia / naturalness / hippy-type tradition. Alice probably thinks that a lot of their beliefs are utterly nuts. But it’s pretty plausible that she sees them as kinda “on her side” from a vibes perspective. (Hmm, actually, also imagine this is 2...
Great reply! In fact, I think that the speech you wrote for the police reformer is probably the best way to advance the police corruption cause in that situation, with one change: they should be very clear that they don't think that demons exist.
I think there is an aspect where the AI risk skeptics don't want to be too closely associated with ideas they think are wrong: because if the AI x-riskers are proven to be wrong, they don't want to go down with the ship. IE: if another AI winter hits, or an AGI is built that shows no sign of killing anyone, t...
I suggest spending a few minutes pondering what to do if crazy people (perhaps just walking by) decide to "join" the protest. Y'know, SF gonna SF.
FYI at a firm I used to work at, once there was a group protesting us out front. Management sent an email that day suggesting that people leave out a side door. So I did. I wasn't thinking too hard about it, and I don't know how many people at the firm overall did the same.
(I have no personal experience with protests, feel free to ignore.)
In your hypothetical, if Meta says “OK you win, you're right, we'll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”
I don’t think “mouldability” is a synonym of “white-boxiness”. In fact, I think they’re hardly related at all:
If you want to say "it's a black box but the box has a "gradient" output channel in addition to the "next-token-probability-distribution" output channel", then I have no objection. (See the code sketch below for what those two channels look like concretely.)
If you want to say "...and those two output channels are sufficient for safe & beneficial AGI", then you can say that too, although I happen to disagree.
If you want to say "we also have interpretability techniques on top of those, and they work well enough to ensure alignment for both current and future AIs", then I'm open-minded and interested in details.
If you want to say "...
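To make the "gradient output channel" and "next-token-probability-distribution output channel" from the first item above concrete, here's a minimal sketch. It assumes the Hugging Face transformers library and GPT-2 as a stand-in model (neither is mentioned above); the point is only that gradient access is a richer interface than sampling alone, not that it settles the white-box question.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works the same way; GPT-2 is just a small stand-in.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
out = model(**inputs)

# "Output channel" 1: the next-token probability distribution.
probs = torch.softmax(out.logits[0, -1], dim=-1)

# "Output channel" 2: gradients of a scalar loss with respect to every parameter.
target_id = tok.encode(" Paris")[0]
loss = -torch.log(probs[target_id])
loss.backward()
n_grad_entries = sum(p.grad.numel() for p in model.parameters() if p.grad is not None)
print(probs.topk(3), n_grad_entries)
```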
I was reading it as a kinda disjunctive argument. If Nora says that a pause is bad because of A and B, either of which is sufficient on its own from her perspective, then you could say "A isn't cruxy for her" (because B is sufficient) or you could say "B isn't cruxy for her" (because A is sufficient). Really, neither of those claims is accurate.
Oh well, whatever, I agree with you that the OP could have been clearer.
If you desperately wish we had more time to work on alignment, but also think a pause won’t make that happen or would have larger countervailing costs, then that would lead to an attitude like: “If only we had more time! But alas, a pause would only make things worse. Let’s talk about other ideas…” For my part, I definitely say things like that (see here).
However, Nora has sections claiming “alignment is doing pretty well” and “alignment optimism”, so I think it’s self-consistent for her to not express that kind of mood.
I have a vague impression—I forget from where and it may well be false—that Nora has read some of my AI alignment research, and that she thinks of it as not entirely pointless. If so, then when I say “pre-2020 MIRI (esp. Abram & Eliezer) deserve some share of the credit for my thinking”, then that’s meaningful, because there is in fact some nonzero credit to be given. Conversely, if you (or anyone) don’t know anything about my AI alignment research, or think it’s dumb, then you should ignore that part of my comment, it’s not offering any evidence, it w...
By contrast, AIs implemented using artificial neural networks (ANN) are white boxes in the sense that we have full read-write access to their internals. They’re just a special type of computer program, and we can analyze and manipulate computer programs however we want at essentially no cost.
Suppose you walk down a street, and unbeknownst to you, you’re walking by a dumpster that has a suitcase full of millions of dollars. There’s a sense in which you “can”, “at essentially no cost”, walk over and take the money. But you don’t know tha...
I’m pretty sure you have met people doing mechanistic interpretability, right?
Nora is Head of Interpretability at EleutherAI :)
Some examples include the now-debunked analogy from evolution, the false distinction between “inner” and “outer” alignment, and the idea that AIs will be rigid utility maximizing consequentialists (here, here, and here).
I feel like you’re trying to round these three things into a “yay versus boo” axis, and then come down on the side of “boo”. I think we can try to do better than that.
One can make certain general claims about learning algorithms that are true and for which evolution provides as good an example as any. One can also make other claims that are...
I certainly give relatively little weight to most conceptual AI research. That said, I respect that it's valuable for you and am open to trying to narrow the gap between our views here - I'm just not sure how!
To be more concrete, I'd value 1 year of current progress over 10 years of pre-2018 research (to pick a date relatively arbitrarily). I don't intend this as an attack on the earlier alignment community, I just think we're making empirical progress in a way that was pretty much impossible before we had good models available to study and I place a lot more value on this.
I think the attitude most people (including me) have is: “If we want to do technical work to reduce AI x-risk, then we should NOT be working on any technical problems that will almost definitely get solved “by default”, e.g. because they’re straightforward and lots of people are already working on them and mostly succeeding, or because there’s no way to make powerful AGI except via first solving those problems, etc.”.
Then I would rephrase your original question as: “OK, if we shouldn’t be working on those types of technical problems above … then are there ...
There’s a school of thought that academics travel much much more than optimal or healthy. See Cal Newport’s Deep Work, where he cites a claim that it’s “typical for junior faculty to travel twelve to twenty-four times a year”, and compares that to Radhika Nagpal’s blog post The Awesomest 7-Year Postdoc or: How I Learned to Stop Worrying and Love the Tenure-Track Faculty Life which says:
...I travel at most 5 times a year. This includes: all invited lectures, all NSF/Darpa investigator or panel meetings, conferences, special workshops, etc. Typically it looks s
If you publish it, a third party could make a small tweak and apply for a patent. If you patent it, a third party could make a small tweak and apply for a patent. What do you see as the difference? Or sorry if I’m misunderstanding the rules.
In theory, publishing X and patenting X are both equally valid ways to prevent other people from patenting X. Does it not work that way in practice?
Could be wrong, but I had the impression that software companies have historically amassed patents NOT because patenting X is the best way to prevent another company from patenting the exact same thing X or things very similar to X, but rather because “the best defense is a good offense”, and if I have a dubious software patent on X and you have a dubious software patent on Y then we can have a balance of terro...
I define “alignment” as “the AI is trying to do things that the AI designer had intended for the AI to be trying to do”, see here for discussion.
If you define “capabilities” as “anything that would make an AI more useful / desirable to a person or company”, then alignment research would be by definition a subset of capabilities research.
But it’s a very small subset!
Examples of things that constitute capabilities progress but not alignment progress include: faster and better and more and cheaper chips (and other related hardware like interconnects), the dev...
At the same time, I think Eliezer made a really strong (and well-argued) point that if we believe in epiphenomenalism then we have no reason to believe that our reports of consciousness have any connection to the phenomenon of consciousness. I haven't seen this point made so clearly elsewhere.
Chalmers here says something like that (“It is certainly at least strange to suggest that consciousness plays no causal role in my utterances of ‘I am conscious’. Some have suggested more strongly that this rules out any knowledge of consciousness… The oddness of epiph...
I would have liked this article much more if the title had been “The 25 researchers who have published the largest number of academic articles on existential risk”, or something like that.
The current title (“The top 25 existential risk researchers based on publication count”) seems to insinuate that this criterion is reasonable in the context of figuring out who are the “Top 25 existential risk researchers” full stop, which it’s not, for reasons pointed out in other comments.
I have some interest in cluster B personality disorders, on the theory that something(s) in human brains makes people tend to be nice to their friends and family, and whatever that thing is, it would be nice to understand it better because maybe we can put something like it into future AIs, assuming those future AIs have a sufficiently similar high-level architecture to the human brain, which I think is plausible.
And whatever that thing is, it evidently isn’t working in the normal way in cluster B personality disorder people, so maybe better understanding ...
Are we talking about in the debate, or in long-form good-faith discussion?
For the latter, it’s obviously worth talking about, and I talk about it myself plenty. Holden’s post AI Could Defeat All Of Us Combined is pretty good, and the new Lunar Society podcast interview of Carl Shulman is extremely good on this topic (the relevant part is mostly the second episode [it was such a long interview they split it into 2 parts]).
For the former, i.e. in the context of a debate, the point is not to hash out particular details and intervention points, but rather just...
Thanks!
we need good clear scenarios of how exactly step by step this happens
Hmm, depending on what you mean by “this”, I think there are some tricky communication issues that come up here, see for example this Rob Miles video.
On top of that, obviously this kind of debate format is generally terrible for communicating anything of substance and nuance.
...Melanie seemed either (a) uninformed of the key arguments (she just needs to listen to one of Yampolskiy's recent podcast interviews to get a good accessible summary), or (b) unwilling to engage with such argumen
In this post the criticizer gave the criticizee an opportunity to reply in-line in the published post—in effect, the criticizee was offered the last word. I thought that was super classy, and I’m proud to have stolen that idea on two occasions (1,2).
If anyone’s interested, the relevant part of my email was:
...…
You can leave google docs margin comments if you want, and:
- If I’m just straight-up wrong about something, or putting words in your mouth, then I’ll just correct the text before publication.
- If you leave a google docs comment that’s more like a counte
There’s probably some analogy here to ‘inner alignment’ versus ‘outer alignment’ in the AI safety literature, but I find these two terms so vague, confusing, and poorly defined that I can’t see which of them corresponds to what, exactly, in my gene/brain alignment analogy; any guidance on that would be appreciated.
The following table is my attempt to clear things up. I think there are two stories we can tell.
When you say "I don't know how you can be confident(>50%) to say that it'll surpass human", I'm not sure if you mean "...in 20 years" or "...ever". You mention 20 years in one place but not the rest of your question, so I'm not really sure what you meant.
Your question is using "flops" to mean FLOP/s in some places and FLOP in other places.
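For what it's worth, the distinction matters because FLOP/s is a rate (hardware throughput) while FLOP is a total amount of compute; here's a toy calculation, with numbers invented purely to illustrate the units:

```python
# FLOP/s is a throughput (a rate); FLOP is a total amount of compute.
# Both numbers below are made up solely to illustrate the unit distinction.
flop_per_second = 1e15               # sustained throughput of some cluster (FLOP/s)
training_seconds = 3600 * 24 * 30    # a hypothetical one-month training run
total_flop = flop_per_second * training_seconds
print(f"{total_flop:.1e} FLOP")      # ~2.6e21 FLOP
```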
Hmm. Touché. I guess another thing on my mind is the mood of the hype-conveyer. My stereotypical mental image of “hype” involves Person X being positive & excited about the product they’re hyping, whereas the imminent-doom-ers that I’ve talked to seem to have a variety of moods including distraught, pissed, etc. (Maybe some are secretly excited too? I dunno; I’m not very involved in that community.)
You’re entitled to disagree with short-timelines people (and I do too) but I don’t like the use of the word “hype” here (and “purely hype” is even worse); it seems inaccurate, and kinda an accusation of bad faith. “Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias). None of those applies to Greg here, AFAICT. Instead, you can just say “he’s wrong” etc.
“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias).
All of this seems to apply to AI-risk-worriers?
I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).
Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂
Yup! Alternatively: we’re working with silicon chips that are 10,000,000× faster than the brain, so we can get a 100× speedup even if we’re a whopping 100,000× less skillful at parallelizing brain algorithms than the brain itself.
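Spelling out the arithmetic behind that claim (the two numbers are the rough figures from the comment above, not measured quantities):

```python
# Rough figures from the comment above, not measurements.
chip_speed_advantage = 10_000_000    # silicon operation rates vs. neuron firing rates
parallelization_penalty = 100_000    # how much less skillfully we parallelize than the brain
net_speedup = chip_speed_advantage / parallelization_penalty
print(net_speedup)                   # 100.0× faster than the brain overall
```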
Hi, I’m an AGI safety researcher who studies and talks about neuroscience a whole lot. I don’t have a neuroscience degree—I’m self-taught in neuroscience, and my actual background is physics. So I can’t really speak to what happens in neuroscience PhD programs. Nevertheless, my vague impression is that the kinds of things that people learn and do and talk about in neuroscience PhD programs have very little overlap with the kinds of things that would be relevant to AI safety. Not zero, but probably very little. But I dunno, I guess it depends on what classes you take and what research group you join. ¯\_(ツ)_/¯
AGI is possible but putting a date on when we will have an AGI is just fooling ourselves.
So if someone says to you “I’m absolutely sure that there will NOT be AGI before 2035”, you would disagree, and respond that they’re being unreasonable and overconfident, correct?
I find the article odd in that it seems to be going on and on about how it's impossible to predict the date when people will invent AGI, yet the article title is "AGI isn't close", which is, umm, a prediction about when people will invent AGI, right?
If the article had said "technological forecasting is extremely hard, therefore we should just say we don't know when we'll get AGI, and we should make contingency-plans for AGI arriving tomorrow or in 10 years or in 100 years or 1000 etc.", I would have been somewhat more sympathetic.
(Although I still think nu...
I had a very bad time with RSI from 2006-7, followed by a crazy-practically-overnight-miracle-cure-happy-ending. See my recent blog post The “mind-body vicious cycle” model of RSI & back pain for details & discussion. :)
The implications for "brand value" would depend on whether people learn about "EA" as the perpetrator vs. victim. For example, I think there were charitable foundations that got screwed over by Bernie Madoff, and I imagine that their wiki articles would have also had a spike in views when that went down, but not in a bad way.
Related:
I have some discussion of this area in general and one of David Jilk’s papers in particular at my post Two paths forward: “Controlled AGI” and “Social-instinct AGI”.
In short, it seems to me that if you buy into this post, then the next step should be to figure out how human social instincts work, not just qualitatively but in enough detail to write it into AGI source code.
I claim that this is an open problem, involving things like circuits in the hypothalamus and neuropeptide receptors in the striatum. And it’s the main thing that I’m working on myself.
Add...
I think things like “If we see Sign X of misalignment from the AI, we should shut it down and retrain” comprise a small fraction of AI safety research, and I think even that small fraction consists primarily of stating extremely obvious ideas (let’s use honeypots! let’s do sandbox tests! let’s use interpretability! etc.) and exploring whether or not they would work, rather than stating non-obvious ideas. The horse has long ago left the barn on “the idea of sandbox testing and honeypots” being somewhere in an LLM’s training data!
I think a much larger fracti...
My paraphrase of the SDO argument is:
With our best-guess parameters in the Drake equation, we should be surprised that there are no aliens. But for all we know, maybe one or more of the parameters in the Drake equation is many many orders of magnitude lower than our best guess. And if that’s in fact the case, then we should not be surprised that there are no aliens!
…which seems pretty obvious, right?
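As a toy illustration of why the product structure makes this feel obvious (the Drake-equation form is standard, but every parameter value below is a placeholder I'm assuming for the example, not a figure from the SDO paper):

```python
# Toy Drake-style product: expected number of detectable civilizations.
# All parameter values are invented for illustration only.
def n_civilizations(R_star, f_p, n_e, f_l, f_i, f_c, L):
    return R_star * f_p * n_e * f_l * f_i * f_c * L

best_guess  = n_civilizations(2, 0.5, 1, 0.5, 0.1, 0.1, 1e4)    # = 50: "where is everybody?"
pessimistic = n_civilizations(2, 0.5, 1, 1e-10, 0.1, 0.1, 1e4)  # = 1e-8: no surprise we see no one
print(best_guess, pessimistic)
```

Dropping any single factor by ten orders of magnitude drops the whole product by ten orders of magnitude, which is the entire content of the move paraphrased above.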
So back to the context of AI risk. We have:
Hi, I’m an AI alignment technical researcher who mostly works independently, and I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AI alignment—since I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :) (Update: I’m all set now.)