All of Steven Byrnes's Comments + Replies

Hi, I’m an AI alignment technical researcher who mostly works independently, and I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AI alignment—since I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :) update: I’m all set now

Humans are less than maximally aligned with each other (e.g. we care less about the welfare of a random stranger than about our own welfare), and humans are also less than maximally misaligned with each other (e.g. most people don’t feel a sadistic desire for random strangers to suffer). I hope that everyone can agree about both those obvious things.

That still leaves the question of where we are on the vast spectrum in between those two extremes. But I think your claim “humans are largely misaligned with each other” is not meaningful enough to argue about.... (read more)

My terminology would be that (2) is “ambitious value learning” and (1) is “misaligned AI that cooperates with humans because it views cooperating-with-humans to be in its own strategic / selfish best interest”.

I strongly vote against calling (1) “aligned”. If you think we can have a good future by ensuring that it is always in the strategic / selfish best interest of AIs to be nice to humans, then I happen to disagree but it’s a perfectly reasonable position to be arguing, and if you used the word “misaligned” for those AIs (e.g. if you say “alignment is u... (read more)

6
Matthew_Barnett
1mo
This is a reasonable definition, but it's important to note that under this definition of alignment, humans are routinely misaligned with each other. In almost any interaction I have with strangers -- for example, when buying a meal at a restaurant -- we are performing acts for each other because of mutually beneficial trade rather than because we share each other's values. That is, humans are largely misaligned with each other. And yet the world does not devolve into a state of violence and war as a result (at least most of the time), even in the presence of large differences in power between people. This has epistemic implications for whether a world filled with AIs would similarly be peaceful, even if those AIs are misaligned by this definition.

May I ask, what is your position on creating artificial consciousness?
Do you see digital suffering as a risk? If so, should we be careful to avoid creating AC?

I think the word “we” is hiding a lot of complexity here—like saying “should we decommission all the world’s nuclear weapons?” Well, that sounds nice, but how exactly? If I could wave a magic wand and nobody ever builds conscious AIs, I would think seriously about it, although I don’t know what I would decide—it depends on details I think. Back in the real world, I think that we’re eventually going t... (read more)

Sorry if I missed it, but is there some part of this post where you suggest specific concrete interventions / actions that you think would be helpful?

1
Silica
2mo
The main goal was to argue for preventing AC. The main intervention discussed was to prevent AC through research and development monitoring, which will likely require protocols that label certain kinds of consciousness and neurophysics research as dual-use research of concern (DURC) or as components of concern. I think a close analogue is the biothreat screening projects (IBBIS, SecureDNA), but it's unclear how a similar project would be implemented for AC "threats". By suggesting a call for Artificial Consciousness Safety, I am expressing that I don't think we know any concrete actions that will definitely help, and that if the need is there (for ACS) we should pursue research to develop interventions. Just as in AI safety, no one really knows how to make AI safe. Because I think AC will not be safe, and that the benefits may not outweigh the risks, we could seriously pursue strategies that make this common knowledge, so that things like researchers unintentionally contributing to its creation don't happen. We may have a significant chance to act before it becomes well known that AC might be possible or profitable. Unlike the runaway effects of AI companies now, we can still prevent the AC economy from even starting.

Mark Solms thinks he understands how to make artificial consciousness (I think everything he says on the topic is wrong), and his book Hidden Spring has an interesting discussion (in chapter 12) on the “oh jeez now what” question. I mostly disagree with what he says about that too, but I find it to be an interesting case-study of someone grappling with the question.

In short, he suggests turning off the sentient machine, then registering a patent for making conscious machines, and assigning that patent to a nonprofit like maybe Future of Life Institute, and... (read more)

3
Silica
2mo
Thanks for this book suggestion, it does seem like an interesting case study.  I'm quite sceptical any one person could reverse engineer consciousness and I don't buy that it's good reasoning to go ahead with publication simply because someone else might. I'll have to look into Solms and return to this.  May I ask, what is your position on creating artificial consciousness? Do you see digital suffering as a risk? If so, should we be careful to avoid creating AC? 

I am not claiming analogies have no place in AI risk discussions. I've certainly used them a number of times myself. 

Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”.  :)

Cf. what I called “left-column thinking” in the diagram here.

For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actu... (read more)

It is certainly far from obvious: for example, devastating as the COVID-19 pandemic was, I don’t think anyone believes that 10,000 random re-rolls of the COVID-19 pandemic would lead to at least one existential catastrophe. The COVID-19 pandemic just was not the sort of thing to pose a meaningful threat of existential catastrophe, so if natural pandemics are meant to go beyond the threat posed by the recent COVID-19 pandemic, Ord really should tell us how they do so.

This seems very misleading. We know that COVID-19 has <<5% IFR. Presumably the concer... (read more)

6
Vasco Grilo
4mo
Thanks, Steven! Yupe, I think those are the questions to ask. My interpretation of the passage you quoted is that David is saying that Toby did not address them.

Good point. My recollection is that David acknowledges that in the series, but argues that further arguments would be needed for one to update to the super high risk claimed by Toby.

I think empirical evidence could still inform our assessment of the risk to a certain extent. For example, one can try to see how the number of lab accidents correlates with the cost of sequencing DNA, and then extrapolate the number of accidents into the future based on decreasing sequencing cost. Toby discusses some of these matters, but the inference of the 3 % bio existential risk from 2021 to 2120 still feels very ad hoc and opaque to me.

Note safety measures (e.g. vaccines and personal protective equipment) would also improve alongside capabilities, so the net effect is not obvious. I guess risk will increase, but Toby's guess for bio existential risk appears quite high.
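To make the correlate-then-extrapolate idea above concrete, here is a minimal sketch (not from the original comment; every number below is invented for illustration) of regressing lab-accident counts on log sequencing cost and projecting forward:

```python
import numpy as np

# Hypothetical illustrative data: DNA sequencing cost (USD/genome) and
# number of recorded lab accidents in the same period. Numbers are made up.
seq_cost = np.array([1e7, 5e4, 4e3, 6e2])   # falling roughly exponentially
accidents = np.array([2, 4, 7, 11])

# Fit accidents as a linear function of log10(cost): cheaper sequencing,
# more labs doing the work, more accidents.
slope, intercept = np.polyfit(np.log10(seq_cost), accidents, 1)

# Extrapolate to a projected future cost of $50/genome (again, made up).
projected_accidents = slope * np.log10(50) + intercept
print(f"Projected accidents at $50/genome: {projected_accidents:.1f}")
```

A real analysis would need far better data and model choices; the point is only that the extrapolation step described above is mechanically simple, even if the inference to existential risk is not.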

(Recently I've been using "AI safety" and "AI x-safety" interchangeably when I want to refer to the "overarching" project of making the AI transition go well, but I'm open to being convinced that we should come up with another term for this.)

I’ve been using the term “Safe And Beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the overarching “go well” project, and “AGI safety” as the part where we try to make AGIs that don’t accidentally [i.e. accidentally from the human supervisors’ / programmers’ perspective] kill everyone, and (following c... (read more)

This kinda overlaps with (2), but the end of 2035 is 12 years away. A lot can happen in 12 years! If we look back to 12 years ago, it was December 2011. AlexNet had not come out yet, neural nets were a backwater within AI, a neural network with 10 layers and 60M parameters was considered groundbreakingly deep and massive, the idea of using GPUs in AI was revolutionary, tensorflow was still years away, doing even very simple image classification tasks would continue to be treated as a funny joke for several more years (literally—this comic is from 2014!), I... (read more)

3
Yarrow B.
4mo
Great comment!

That might be true in the very short term but I don’t believe it in general. For example, how many reporters were on the Ukraine beat before Russia invaded in February 2022? And how many reporters were on the Ukraine beat after Russia invaded? Probably a lot more, right?

Thanks for the comment!

I think we should imagine two scenarios, one where I see the demonic possession people as being “on my team” and the other where I see them as being “against my team”.

To elaborate, here’s yet another example: Concerned Climate Scientist Alice responding to statements by environmentalists of the Gaia / naturalness / hippy-type tradition. Alice probably thinks that a lot of their beliefs are utterly nuts. But it’s pretty plausible that she sees them as kinda “on her side” from a vibes perspective. (Hmm, actually, also imagine this is 2... (read more)

Great reply! In fact, I think that the speech you wrote for the police reformer is probably the best way to advance the police corruption cause in that situation, with one change: they should be very clear that they don't think that demons exist. 

I think there is an aspect where the AI risk skeptics don't want to be too closely associated with ideas they think are wrong: because if the AI x-riskers are proven to be wrong, they don't want to go down with the ship. IE: if another AI winter hits, or an AGI is built that shows no sign of killing anyone, t... (read more)

  • I suggest spending a few minutes pondering what to do if crazy people (perhaps just walking by) decide to "join" the protest. Y'know, SF gonna SF.

  • FYI at a firm I used to work at, once there was a group protesting us out front. Management sent an email that day suggesting that people leave out a side door. So I did. I wasn't thinking too hard about it, and I don't know how many people at the firm overall did the same.

(I have no personal experience with protests, feel free to ignore.)

In your hypothetical, if Meta says “OK you win, you're right, we'll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”

6
Holly_Elmore
7mo
It was a difficult point to make and we ended up removing it where we could.
9
Holly_Elmore
7mo
It is a confusing point, maybe too subtle for a protest. I am learning!
4
Ben_West
7mo
This is a good point and feels persuasive, thanks!

I don’t think “mouldability” is a synonym of “white-boxiness”. In fact, I think they’re hardly related at all:

  • There can be a black box with lots of knobs on the outside that change the box’s behavior. It’s still a black box.
  • Conversely, consider an old-fashioned bimetallic strip thermostat with a broken dial. It’s not mouldable at all—it can do one and only one thing, i.e. actuate a switch at a certain fixed temperature. (Well, I guess you can use it as a doorstop!) But a bimetallic strip thermostat is still very white-boxy (after I spend 30 seconds telling you ho
... (read more)

If you want to say "it's a black box but the box has a "gradient" output channel in addition to the "next-token-probability-distribution" output channel", then I have no objection.

If you want to say "...and those two output channels are sufficient for safe & beneficial AGI", then you can say that too, although I happen to disagree.

If you want to say "we also have interpretability techniques on top of those, and they work well enough to ensure alignment for both current and future AIs", then I'm open-minded and interested in details.

If you want to say "... (read more)

3
Nora Belrose
7mo
Differentiability is a pretty big part of the white box argument. The terabyte compiled executable binary is still white box in a minimal sense but it's going to take a lot of work to mould that thing into something that does what you want. You'll have to decompile it and do a lot of static analysis, and Rice's theorem gets in the way of the kinds of stuff you can prove about it. The code might be adversarially obfuscated, although literal black box obfuscation is provably impossible. If instead of a terabyte of compiled code, you give me a trillion neural net weights, I can fine tune that network to do a lot of stuff. And if I'm worried about the base model being preserved underneath and doing nefarious things, I can generate synthetic data from the fine tuned model and train a fresh network from scratch on that (although to be fair that's pretty compute-intensive).
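As a rough illustration of the "generate synthetic data from the fine-tuned model and train a fresh network from scratch" step described above (a minimal toy sketch in PyTorch, not anything from Nora's actual workflow; the tiny network and random data are stand-ins):

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))

def train(net, x, y, steps=200):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(net(x), y)
        loss.backward()
        opt.step()
    return net

# 1. Fine-tune a "base model" (here just a toy net) on some target behavior.
base = make_net()
x_ft = torch.randn(512, 16)
y_ft = torch.randint(0, 4, (512,))
train(base, x_ft, y_ft)

# 2. Generate synthetic data by sampling inputs and recording the
#    fine-tuned model's own predictions.
x_syn = torch.randn(4096, 16)
with torch.no_grad():
    y_syn = base(x_syn).argmax(dim=-1)

# 3. Train a fresh network from scratch on the synthetic data only,
#    so nothing from the original weights is carried over.
fresh = make_net()
train(fresh, x_syn, y_syn)
```

The design choice being illustrated is that step 3 starts from freshly initialized weights, which is why any "base model preserved underneath" worry doesn't carry over, at the cost of the extra compute Nora mentions.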

I was reading it as a kinda disjunctive argument. If Nora says that a pause is bad because of A and B, either of which is sufficient on its own from her perspective, then you could say "A isn't cruxy for her" (because B is sufficient) or you could say "B isn't cruxy for her" (because A is sufficient). Really, neither of those claims is accurate.

Oh well, whatever, I agree with you that the OP could have been clearer.

7
Nora Belrose
7mo
Yep it's all meant to be disjunctive and yep it could have been clearer. FWIW this essay went through multiple major revisions and at one point I was trying to make the disjunctivity of it super clear but then that got de-prioritized relative to other stuff. In the future if/when I write about this I think I'll be able to organize things significantly better

If you desperately wish we had more time to work on alignment, but also think a pause won’t make that happen or would have larger countervailing costs, then that would lead to an attitude like: “If only we had more time! But alas, a pause would only make things worse. Let’s talk about other ideas…” For my part, I definitely say things like that (see here).

However, Nora has sections claiming “alignment is doing pretty well” and “alignment optimism”, so I think it’s self-consistent for her to not express that kind of mood.

4
Zach Stein-Perlman
7mo
Insofar as Nora discusses nuances of overhang, it would be odd and annoying for that to not actually be cruxy for her (given that she doesn't say something like "this isn't cruxy for me").

I have a vague impression—I forget from where and it may well be false—that Nora has read some of my AI alignment research, and that she thinks of it as not entirely pointless. If so, then when I say “pre-2020 MIRI (esp. Abram & Eliezer) deserve some share of the credit for my thinking”, then that’s meaningful, because there is in fact some nonzero credit to be given. Conversely, if you (or anyone) don’t know anything about my AI alignment research, or think it’s dumb, then you should ignore that part of my comment, it’s not offering any evidence, it w... (read more)

3
Tom McGrath
7mo
I've certainly heard of your work but it's far enough out of my research interests that I've never taken a particularly strong interest. Writing this in this context makes me realise I might have made a bit of a one-man echo chamber for myself... Do you mind if we leave this as 'undecided' for a while?

Regarding ELK - I think the core of the problem as I understand it is fairly clear once you begin thinking about interpretability. Understanding the relation between AI and human ontologies was part of the motivation behind my work on alphazero (as well as an interest in the natural abstractions hypothesis). Section 4 "Encoding of human conceptual knowledge" and Section 8 "Exploring activations with unsupervised methods" are the places to look. The section on challenges and limitations in concept probing I think echoes a lot of the concerns in ELK.

In terms of subsequent work on ELK, I don't think much of the work on solving ELK was particularly useful, and often reinvented existing methods (e.g. sparse probing, causal interchange interventions). If I were to try and work on it then I think the best way to do so would be to embed the core challenge in a tractable research program, for instance trying to extract new scientific knowledge from ML models like alphafold.

To move this in a more positive direction, the most fruitful/exciting conceptual work I've seen is probably (1) the natural abstractions hypothesis and (2) debate. When I think a bit about why I particularly like these, for (1) it's because it seems plausibly true, extremely useful if true, and amenable to both formal theoretical work and empirical study. For (2) it's because it's a pretty striking new idea that seems very powerful/scalable, but also can be put into practice a bit ahead of really powerful systems.
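For readers unfamiliar with the "concept probing" mentioned above, here is a minimal sketch of the general idea (synthetic stand-in data, not code from the AlphaZero work): train a simple linear classifier on a model's internal activations to test whether a human concept is linearly decodable from them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins: 'activations' would normally be hidden-layer activations from
# the network under study, and 'has_concept' human labels for the same inputs.
rng = np.random.default_rng(0)
activations = rng.normal(size=(2000, 256))
has_concept = (activations[:, :8].sum(axis=1) > 0).astype(int)  # toy "concept"

X_train, X_test, y_train, y_test = train_test_split(
    activations, has_concept, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

High held-out probe accuracy suggests the concept is (at least linearly) encoded; the challenges-and-limitations caveats mentioned above concern how easily such probes can mislead.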

By contrast, AIs implemented using artificial neural networks (ANN) are white boxes in the sense that we have full read-write access to their internals. They’re just a special type of computer program, and we can analyze and manipulate computer programs however we want at essentially no cost. 

Suppose you walk down a street, and unbeknownst to you, you’re walking by a dumpster that has a suitcase full of millions of dollars. There’s a sense in which you “can”, “at essentially no cost”, walk over and take the money. But you don’t know tha... (read more)

5
Nora Belrose
7mo
It's essentially no cost to run a gradient-based optimizer on a neural network, and I think this is sufficient for good-enough alignment. I view the interpretability work I do at Eleuther as icing on the cake, allowing us to steer models even more effectively than we already can. Yes, it's not zero cost, but it's dramatically lower cost than it would be if we had to crack open a skull and do neurosurgery. Also, if by "mechanistic interpretability" you mean "circuits", I'm honestly pretty pessimistic about the usefulness of that kind of research, and I think the really-useful stuff is lower cost than circuits-based interp.
1
Rafael Harth
7mo
Well, a computer model is "literally" transparent in the sense that you can see everything, which means the only difficulty is in understanding what it means. So the part where you spend 5 million dollars on a PET scanner doesn't exist for ANNs, and in that sense you can analyze them for "free". If the understanding part is sufficiently difficult... which it sure seems to be... then this doesn't really help, but it is a coherent conceptual difference.

I’m pretty sure you have met people doing mechanistic interpretability, right?

Nora is Head of Interpretability at EleutherAI :)

Some examples include the now-debunked analogy from evolution, the false distinction between “inner” and “outer” alignment, and the idea that AIs will be rigid utility maximizing consequentialists (here, here, and here).

I feel like you’re trying to round these three things into a “yay versus boo” axis, and then come down on the side of “boo”. I think we can try to do better than that.

One can make certain general claims about learning algorithms that are true and for which evolution provides as good an example as any. One can also make other claims that are... (read more)

1
Gerald Monroe
7mo
Steven, the issue is that without empirical data you end up with a branching tree of possible futures. And if you make some faulty assumptions early - such as assuming the amount of compute needed to host optimal AI models is small and easily stolen via hacking - you end up lost in a tree of possibilities where every one you consider is "doom". And thus you arrive at the conclusion of "pDoom is 99 percent", because you are only cognitively able to consider adjacent futures in the possibility tree. No living human can keep track of thousands of possibilities in parallel.

This is where I think Eliezer and Zvi are lost, where they simply ignore branches that would lead to different outcomes. (And vice versa, you could arrive at the opposite conclusion.) It becomes angels on the head of a pin. There is no way to make a policy decision based on this. You need to prove your beliefs with data. It's how we even got here as a species.
4
Zach Furman
7mo
It's perhaps also worth separating the claims that A) previous alignment research was significantly less helpful than today's research and B) the reason that was the case continues to hold today. I think I'd agree with some version of A, but strongly disagree with B. The reason that A seems probably true to me is that we didn't know the basic paradigm in which AGI would arise, and so previous research was forced to wander in the dark. You might also believe that today's focus on empirical research is better than yesterday's focus on theoretical research (I don't necessarily agree) or at least that theoretical research without empirical feedback is on thin ice (I agree). I think most people now think that deep learning, perhaps with some modifications, will be what leads to AGI - some even think that LLM-like systems will be sufficient. And the shift from primarily theoretical research to primarily empirical research has already happened. So what will cause today's research to be worse than future research with more capable models? You can appeal to a general principle of "unknown unknowns," but if you genuinely believe that deep learning (or LLMs) will eventually be used in future AGI, it seems hard to believe that knowledge won't transfer at all.

I certainly give relatively little weight to most conceptual AI research. That said, I respect that it's valuable for you and am open to trying to narrow the gap between our views here - I'm just not sure how!

To be more concrete, I'd value 1 year of current progress over 10 years of pre-2018 research (to pick a date relatively arbitrarily). I don't intend this as an attack on the earlier alignment community, I just think we're making empirical progress in a way that was pretty much impossible before we had good models available to study and I place a lot more value on this.

I think the attitude most people (including me) have is: “If we want to do technical work to reduce AI x-risk, then we should NOT be working on any technical problems that will almost definitely get solved “by default”, e.g. because they’re straightforward and lots of people are already working on them and mostly succeeding, or because there’s no way to make powerful AGI except via first solving those problems, etc.”.

Then I would rephrase your original question as: “OK, if we shouldn’t be working on those types of technical problems above … then are there ... (read more)

There’s a school of thought that academics travel much much more than optimal or healthy. See Cal Newport’s Deep Work, where he cites a claim that it’s “typical for junior faculty to travel twelve to twenty-four times a year”, and compares that to Radhika Nagpal’s blog post The Awesomest 7-Year Postdoc or: How I Learned to Stop Worrying and Love the Tenure-Track Faculty Life which says:

I travel at most 5 times a year. This includes: all invited lectures, all NSF/Darpa investigator or panel meetings, conferences, special workshops, etc. Typically it looks s

... (read more)

If you publish it, a third party could make a small tweak and apply for a patent. If you patent it, a third party could make a small tweak and apply for a patent. What do you see as the difference? Or sorry if I’m misunderstanding the rules.

In theory, publishing X and patenting X are both equally valid ways to prevent other people from patenting X. Does it not work that way in practice?

Could be wrong, but I had the impression that software companies have historically amassed patents NOT because patenting X is the best way to prevent another company from patenting the exact same thing X or things very similar to X, but rather because “the best defense is a good offense”, and if I have a dubious software patent on X and you have a dubious software patent on Y then we can have a balance of terro... (read more)

I define “alignment” as “the AI is trying to do things that the AI designer had intended for the AI to be trying to do”, see here for discussion.

If you define “capabilities” as “anything that would make an AI more useful / desirable to a person or company”, then alignment research would be by definition a subset of capabilities research.

But it’s a very small subset!

Examples of things that constitute capabilities progress but not alignment progress include: faster and better and more and cheaper chips (and other related hardware like interconnects), the dev... (read more)

5
[anonymous]
8mo
Thanks a lot for this! To take one of your examples - faster and better chips (or more compute generally). It seems like this does actually improve alignment on perhaps the most popular definition of alignment as intent-alignment. In terms of answering questions from prompts, GPT-4 is more in line with the intentions of the user than GPT-3 and this is mainly due to more compute. I mean this in the sense that it produces answers that are better/more in line with what users want.

I'm not sure I agree that nobody is working on the problem of which reward functions make AIs that are honest and cooperative. For instance, leading AI companies seem to me to be trying to make LLMs that are honest, and cooperative with their users (e.g. not threatening them). In fact, this seems to be a major focus of these companies. Do you think I am missing something?

At the same time, I think Eliezer made a really strong (and well-argued) point that if we believe in epiphenomenalism then we have no reason to believe that our reports of consciousness have any connection to the phenomenon of consciousness. I haven't seen this point made so clearly elsewhere

Chalmers here says something like that (“It is certainly at least strange to suggest that consciousness plays no causal role in my utterances of ‘I am conscious’. Some have suggested more strongly that this rules out any knowledge of consciousness… The oddness of epiph... (read more)

I would have liked this article much more if the title had been “The 25 researchers who have published the largest number of academic articles on existential risk”, or something like that.

The current title (“The top 25 existential risk researchers based on publication count”) seems to insinuate that this criterion is reasonable in the context of figuring out who are the “Top 25 existential risk researchers” full stop, which it’s not, for reasons pointed out in other comments.

7
FJehn
8mo
Good point. Changed the title accordingly. 

I have some interest in cluster B personality disorders, on the theory that something(s) in human brains makes people tend to be nice to their friends and family, and whatever that thing is, it would be nice to understand it better because maybe we can put something like it into future AIs, assuming those future AIs have a sufficiently similar high-level architecture to the human brain, which I think is plausible.

And whatever that thing is, it evidently isn’t working in the normal way in cluster B personality disorder people, so maybe better understanding ... (read more)

5
Geoffrey Miller
9mo
Steven - well, I think the Cluster B personality disorders (including antisocial, borderline, histrionic, and narcissistic disorders) are probably quite important to understand in AI alignment.  Antisocial personality disorder (which is closely related to the more classical notion of 'psychopathy') seems likely to characterize a lot of 'bad actors' who might (mis)use AI for trolling, crime, homicide, terrorism, etc. And, it provides a model for what we don't want AGIs to behave like.

Are we talking about in the debate, or in long-form good-faith discussion?

For the latter, it’s obviously worth talking about, and I talk about it myself plenty. Holden’s post AI Could Defeat All Of Us Combined is pretty good, and the new Lunar Society podcast interview of Carl Shulman is extremely good on this topic (the relevant part is mostly the second episode [it was such a long interview they split it into 2 parts]).

For the former, i.e. in the context of a debate, the point is not to hash out particular details and intervention points, but rather just... (read more)

9
titotal
10mo
Yeah, I think your version of the argument is the most convincing flavour. I am personally unconvinced by it in the context of x-risk (I don't think we can get to billions of AIs without making AI at least x-risk safe), but the good thing is that it works equally well as an argument for AI catastrophic risk. I don't think this is the case for arguments based on sudden nanomachine factories or whatever, where someone who realizes that the scenario is flimsy and extremely unlikely might just dismiss AI safety altogether.

I don't think the public cares that much about the difference between an AI killing 100% of humanity and an AI killing 50% of humanity, or even 1%, 0.1%. Consider the extreme lengths governments have gone through to prevent terrorist attacks that claimed at most a few thousand lives.

Thanks!

we need good clear scenarios of how exactly step by step this happens

Hmm, depending on what you mean by “this”, I think there are some tricky communication issues that come up here, see for example this Rob Miles video.

On top of that, obviously this kind of debate format is generally terrible for communicating anything of substance and nuance.

Melanie seemed either (a) uninformed of the key arguments (she just needs to listen to one of Yampolskiy's recent podcast interviews to get a good accessible summary). Or (b) refused to engage with such argumen

... (read more)
5
titotal
10mo
The objection here seems to be that if you present a specific takeover scenario, people will point out flaws in it.  But what exactly is the alternative here? Just state on faith that "the AI will find a way because it's really smart?" Do you have any idea how unconvincing that sounds?

In this post the criticizer gave the criticizee an opportunity to reply in-line in the published post—in effect, the criticizee was offered the last word. I thought that was super classy, and I’m proud to have stolen that idea on two occasions (1,2).

If anyone’s interested, the relevant part of my email was:

You can leave google docs margin comments if you want, and:

  • If I’m just straight-up wrong about something, or putting words in your mouth, then I’ll just correct the text before publication.
  • If you leave a google docs comment that’s more like a counte
... (read more)

There’s probably some analogy here to ‘inner alignment’ versus ‘outer alignment’ in the AI safety literature, but I find these two terms so vague, confusing, and poorly defined that I can’t see which of them corresponds to what, exactly, in my gene/brain alignment analogy; any guidance on that would be appreciated.

The following table is my attempt to clear things up. I think there are two stories we can tell.

... (read more)

When you say "I don't know how you can be confident(>50%) to say that it'll surpass human", I'm not sure if you mean "...in 20 years" or "...ever". You mention 20 years in one place but not the rest of your question, so I'm not really sure what you meant.

1
jackchang110
1y
I mean "ever", thanks for the question

Your question is using "flops" to mean FLOP/s in some places and FLOP in other places.

https://www.lesswrong.com/posts/XiKidK9kNvJHX9Yte/avoid-the-abbreviation-flops-use-flop-or-flop-s-instead

Hmm. Touché. I guess another thing on my mind is the mood of the hype-conveyer. My stereotypical mental image of “hype” involves Person X being positive & excited about the product they’re hyping, whereas the imminent-doom-ers that I’ve talked to seem to have a variety of moods including distraught, pissed, etc. (Maybe some are secretly excited too? I dunno; I’m not very involved in that community.)

You’re entitled to disagree with short-timelines people (and I do too) but I don’t like the use of the word “hype” here (and “purely hype” is even worse); it seems inaccurate, and kinda an accusation of bad faith. “Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias). None of those applies to Greg here, AFAICT. Instead, you can just say “he’s wrong” etc.

“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias). 

All of this seems to apply to AI-risk-worriers?

  • AI-risk-worriers are promoting a narrative that powerful AI will come soon
  • AI-risk-worriers are taken more seriously, have more job opportunities, get more status, get more of their policy proposals, etc, to the extent that this narrative is successful
  • My experience is that
... (read more)

I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).

Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂

Yup! Alternatively: we’re working with silicon chips that are 10,000,000× faster than the brain, so we can get a 100× speedup even if we’re a whopping 100,000× less skillful at parallelizing brain algorithms than the brain itself.
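Spelling out the arithmetic behind that claim, with the same round numbers (a restated sanity check, not something added in the original comment):

$$\frac{10^{7}\ \text{(chip speed advantage)}}{10^{5}\ \text{(parallelization penalty)}} = 10^{2} = 100\times\ \text{speedup}$$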

3
Geoffrey Miller
1y
Thanks for the very useful link. I hadn't read that before.  I like the intuition pump that if advanced AI systems are running at about 10 million times human cognitive speed, then one year of human history equals 10 million years of AI experience.

Hi, I’m an AGI safety researcher who studies and talks about neuroscience a whole lot. I don’t have a neuroscience degree—I’m self-taught in neuroscience, and my actual background is physics. So I can’t really speak to what happens in neuroscience PhD programs. Nevertheless, my vague impression is that the kinds of things that people learn and do and talk about in neuroscience PhD programs has very little overlap with the kinds of things that would be relevant to AI safety. Not zero, but probably very little. But I dunno, I guess it depends on what classes you take and what research group you join. ¯\_(ツ)_/¯

8
titotal
1y
Computer science is probably the most relevant degree for AI safety, but there are already lots of computer scientists working on it, and as far as I know very few neuroscientists. So it's possible that adding one additional neuroscientist could be more valuable than adding one additional CSE person. Especially if we factor in that OP may be better at/ more motivated to do neuroscience than CSE.  I could see paths of AI development where neuroscience becomes much more important than it is presently: for example, if we go the "brain emulation" route.  I think my advice for the OP would be that if they like/ are better at neuroscience more than CSE, they should go for it. 

AGI is possible but putting a date on when we will have an AGI is just fooling ourselves. 

So if someone says to you “I’m absolutely sure that there will NOT be AGI before 2035”, you would disagree, and respond that they’re being unreasonable and overconfident, correct?

I find the article odd in that it seems to be going on and on about how it's impossible to predict the date when people will invent AGI, yet the article title is "AGI isn't close", which is, umm, a prediction about when people will invent AGI, right?

If the article had said "technological forecasting is extremely hard, therefore we should just say we don't know when we'll get AGI, and we should make contingency-plans for AGI arriving tomorrow or in 10 years or in 100 years or 1000 etc.", I would have been somewhat more sympathetic.

(Although I still think nu... (read more)

0
Toni MUENDEL
1y
Nice catch. Yes the title could use some refining, but it does catch more attention. The point that I am trying to make in the essay is that AGI is possible but putting a date on when we will have an AGI is just fooling ourselves. Thanks for taking the time to comment.

I had a very bad time with RSI from 2006-7, followed by a crazy-practically-overnight-miracle-cure-happy-ending. See my recent blog post The “mind-body vicious cycle” model of RSI & back pain for details & discussion.  :)

The implications for "brand value" would depend on whether people learn about "EA" as the perpetrator vs. victim. For example, I think there were charitable foundations that got screwed over by Bernie Madoff, and I imagine that their wiki articles would have also had a spike in views when that went down, but not in a bad way.

7
RyanCarey
1y
I agree in principle, but I think EA shares some of the blame here - FTX's leadership group consisted of four EAs. It was founded for ETG reasons, with EA founders and with EA investment, by Sam, an act utilitarian, who had been a part of EA-aligned groups for >10 years, and with a foundation that included a lot of EA leadership, and whose activities consisted mostly of funding EAs.

See also Nate Soares arguing against Joe’s conjunctive breakdown of risk here, and me here.

3
Emrik
1y
Thanks a lot : ) (Honestly just posting comments on posts linking to relevant stuff you can think of is both cheap and decent value.)

I have some discussion of this area in general and one of David Jilk’s papers in particular at my post Two paths forward: “Controlled AGI” and “Social-instinct AGI”.

In short, it seems to me that if you buy into this post, then the next step should be to figure out how human social instincts work, not just qualitatively but in enough detail to write it into AGI source code.

I claim that this is an open problem, involving things like circuits in the hypothalamus and neuropeptide receptors in the striatum. And it’s the main thing that I’m working on myself.

Add... (read more)

I think things like “If we see Sign X of misalignment from the AI, we should shut it down and retrain” comprise a small fraction of AI safety research, and I think even that small fraction consists primarily of stating extremely obvious ideas (let’s use honeypots! let’s do sandbox tests! let’s use interpretability! etc.) and exploring whether or not they would work, rather than stating non-obvious ideas. The horse has long ago left the barn on “the idea of sandbox testing and honeypots” being somewhere in an LLM’s training data!

I think a much larger fracti... (read more)

6
james.lucassen
1y
Pulling this sentence out for emphasis because it seems like the crux to me.
3
Peter S. Park
1y
Thanks so much, Steven, for your detailed feedback on the post! I really appreciate it. I should have made it clear that “If we see Sign X of misalignment from the AI, we should shut it down and retrain” was an example, rather than the whole category of potentially sensitive AI safety plans.

Another category which pertains to your example is "How do we use SGD to create an AI that will be aligned when it reaches high capabilities?" But I think the threat model is also relevant for this category of AI safety plans, though perhaps with less magnitude. The reason is that an agentic AI probably instrumentally converges to the behavior of preserving whatever mysterious goal it learned during training. This means it will try to obstruct our SGD strategy, pretend that it's working, and show us all the signs we expect to see. The fundamental problem underlying both this and the previous example is that I expect we cannot predict the precise moment the AI becomes agentic and/or dangerous, and that I expect to put low credence on any one specific SGD-based plan working reliably. This means that unless we all agree to not build AGI, trial and error towards alignment is probably necessary, even though it's risky.

I think the Manhattan Project doesn't have to be geographically isolated like in the desert, although perhaps we need sufficient clustering of people. The Manhattan Project was more brought up as an example of the kind of large-scale shift in research norms we would require, assuming my concern is well-founded. I think with enough collaborative discussion, planning, and execution, it should be very possible to preserve the secrecy-based value of AI safety plans while keeping ease-of-research high for us AI safety researchers. (Although of course, the devil is in the details.)

My paraphrase of the SDO argument is:

With our best-guess parameters in the Drake equation, we should be surprised that there are no aliens. But for all we know, maybe one or more of the parameters in the Drake equation is many many orders of magnitude lower than our best guess. And if that’s in fact the case, then we should not be surprised that there are no aliens!

…which seems pretty obvious, right?

So back to the context of AI risk. We have:

  1. a framework in which risk is a conjunctive combination of factors…
  2. …in which, at several of the steps, a subset of su
... (read more)
5
Froolow
2y
I think you're using a philosophical framework I just don't recognise here - 'conjunctive' and 'disjunctive' are not ordinary vocabulary in the sort of statistical modelling I do. One possible description of statistical modelling is that you are aiming to capture relevant insights about the world in a mathematical format so you can test hypotheses about those insights. In that respect, a model is good or bad based on how well its key features reflect the real world, rather than because it takes some particular position on the conjunctive-vs-disjunctive dispute. For example I am very excited to see the results of the MTAIR project, which will use a model a little bit like the below. This isn't really 'conjunctive' or 'disjunctive' in any meaningful sense - it tries to multiply probabilities when they should be multiplied and add probabilities when they should be added. This is more like the philosophical framework I would expect modelling to be undertaken in.

I'd add that one of the novel findings of this essay is that if there are 'conjunctive' steps between 'disjunctive' steps it is likely the distribution effect I find will still apply (that is, given order-of-magnitude uncertainty). Insofar as you agree that 4-ish steps in AI Risk are legitimately conjunctive as per your comment above, we probably materially agree on the important finding of this essay (that the distribution of risk is asymmetrically weighted towards low-risk worlds) even if we disagree about the exact point estimate around which that distribution skews.

Small point of clarification - you're looking at the review table for Carlsmith (2021), which corresponds to Section 4.3.1. The correlation table I produce is for the Full Survey dataset, which corresponds to Section 4.1.1. Perhaps to highlight the difference, in the Full Survey dataset of 42 people; 5 people give exactly one probability <10%, 2 people give exactly two probabilities <10%, 2 people give exactly three probabilities <10% and 1 me
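A minimal Monte Carlo sketch of the distribution effect under discussion (illustrative only; the step probabilities and the order-of-magnitude uncertainty below are made up, not Froolow's actual model): multiply several conjunctive step probabilities, each known only to within an order of magnitude, and compare the resulting spread of total risk against the point estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical best-guess probabilities for a handful of conjunctive steps
# (made-up numbers, not anyone's actual estimates).
best_guess = np.array([0.8, 0.6, 0.5, 0.4, 0.3])
print("point estimate:", best_guess.prod())

# Order-of-magnitude uncertainty: each step's true probability is its best
# guess times a log-uniform factor between 0.1x and 10x, capped at 1.
n = 100_000
factors = 10 ** rng.uniform(-1, 1, size=(n, len(best_guess)))
total_risk = np.clip(best_guess * factors, 0, 1).prod(axis=1)

print("mean risk:  ", total_risk.mean())
print("median risk:", np.median(total_risk))
# The median lands well below the mean and the point estimate: most sampled
# worlds are lower-risk than the best guess, with a thinner tail of
# high-risk worlds pulling the average back up.
```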