Humans are less than maximally aligned with each other (e.g. we care less about the welfare of a random stranger than about our own welfare), and humans are also less than maximally misaligned with each other (e.g. most people don’t feel a sadistic desire for random strangers to suffer). I hope that everyone can agree about both those obvious things.
That still leaves the question of where we are on the vast spectrum in between those two extremes. But I think your claim “humans are largely misaligned with each other” is not meaningful enough to argue about....
My terminology would be that (2) is “ambitious value learning” and (1) is “misaligned AI that cooperates with humans because it views cooperating-with-humans to be in its own strategic / selfish best interest”.
I strongly vote against calling (1) “aligned”. If you think we can have a good future by ensuring that it is always in the strategic / selfish best interest of AIs to be nice to humans, then I happen to disagree but it’s a perfectly reasonable position to be arguing, and if you used the word “misaligned” for those AIs (e.g. if you say “alignment is u...
May I ask, what is your position on creating artificial consciousness?
Do you see digital suffering as a risk? If so, should we be careful to avoid creating AC?
I think the word “we” is hiding a lot of complexity here—like saying “should we decommission all the world’s nuclear weapons?” Well, that sounds nice, but how exactly? If I could wave a magic wand and nobody ever builds conscious AIs, I would think seriously about it, although I don’t know what I would decide—it depends on details I think. Back in the real world, I think that we’re eventually going t...
Sorry if I missed it, but is there some part of this post where you suggest specific concrete interventions / actions that you think would be helpful?
Mark Solms thinks he understands how to make artificial consciousness (I think everything he says on the topic is wrong), and his book Hidden Spring has an interesting discussion (in chapter 12) on the “oh jeez now what” question. I mostly disagree with what he says about that too, but I find it to be an interesting case-study of someone grappling with the question.
In short, he suggests turning off the sentient machine, then registering a patent for making conscious machines, and assigning that patent to a nonprofit like maybe Future of Life Institute, and...
I am not claiming analogies have no place in AI risk discussions. I've certainly used them a number of times myself.
Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”. :)
Cf. what I called “left-column thinking” in the diagram here.
For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actu...
It is certainly far from obvious: for example, devastating as the COVID-19 pandemic was, I don’t think anyone believes that 10,000 random re-rolls of the COVID-19 pandemic would lead to at least one existential catastrophe. The COVID-19 pandemic just was not the sort of thing to pose a meaningful threat of existential catastrophe, so if natural pandemics are meant to go beyond the threat posed by the recent COVID-19 pandemic, Ord really should tell us how they do so.
This seems very misleading. We know that COVID-19 has <<5% IFR. Presumably the concer...
(Recently I've been using "AI safety" and "AI x-safety" interchangeably when I want to refer to the "overarching" project of making the AI transition go well, but I'm open to being convinced that we should come up with another term for this.)
I’ve been using the term “Safe And Beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the overarching “go well” project, and “AGI safety” as the part where we try to make AGIs that don’t accidentally [i.e. accidentally from the human supervisors’ / programmers’ perspective] kill everyone, and (following c...
This kinda overlaps with (2), but the end of 2035 is 12 years away. A lot can happen in 12 years! If we look back to 12 years ago, it was December 2011. AlexNet had not come out yet, neural nets were a backwater within AI, a neural network with 10 layers and 60M parameters was considered groundbreakingly deep and massive, the idea of using GPUs in AI was revolutionary, TensorFlow was still years away, doing even very simple image classification tasks would continue to be treated as a funny joke for several more years (literally—this comic is from 2014!), I...
That might be true in the very short term but I don’t believe it in general. For example, how many reporters were on the Ukraine beat before Russia invaded in February 2022? And how many reporters were on the Ukraine beat after Russia invaded? Probably a lot more, right?
Thanks for the comment!
I think we should imagine two scenarios, one where I see the demonic possession people as being “on my team” and the other where I see them as being “against my team”.
To elaborate, here’s yet another example: Concerned Climate Scientist Alice responding to statements by environmentalists of the Gaia / naturalness / hippy-type tradition. Alice probably thinks that a lot of their beliefs are utterly nuts. But it’s pretty plausible that she sees them as kinda “on her side” from a vibes perspective. (Hmm, actually, also imagine this is 2...
Great reply! In fact, I think that the speech you wrote for the police reformer is probably the best way to advance the police corruption cause in that situation, with one change: they should be very clear that they don't think that demons exist.
I think there is an aspect where the AI risk skeptics don't want to be too closely associated with ideas they think are wrong: because if the AI x-riskers are proven to be wrong, they don't want to go down with the ship. IE: if another AI winter hits, or an AGI is built that shows no sign of killing anyone, t...
I suggest spending a few minutes pondering what to do if crazy people (perhaps just walking by) decide to "join" the protest. Y'know, SF gonna SF.
FYI at a firm I used to work at, once there was a group protesting us out front. Management sent an email that day suggesting that people leave out a side door. So I did. I wasn't thinking too hard about it, and I don't know how many people at the firm overall did the same.
(I have no personal experience with protests, feel free to ignore.)
In your hypothetical, if Meta says “OK you win, you're right, we'll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”
I don’t think “mouldability” is a synonym of “white-boxiness”. In fact, I think they’re hardly related at all:
If you want to say "it's a black box but the box has a "gradient" output channel in addition to the "next-token-probability-distribution" output channel", then I have no objection. (See the code sketch below for what those two channels look like concretely.)
If you want to say "...and those two output channels are sufficient for safe & beneficial AGI", then you can say that too, although I happen to disagree.
If you want to say "we also have interpretability techniques on top of those, and they work well enough to ensure alignment for both current and future AIs", then I'm open-minded and interested in details.
If you want to say "...
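To make the "gradient output channel" and "next-token-probability-distribution output channel" from the first item above concrete, here's a minimal sketch. It assumes the Hugging Face transformers library and GPT-2 as a stand-in model (neither is mentioned above); the point is only that gradient access is a richer interface than sampling alone, not that it settles the white-box question.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works the same way; GPT-2 is just a small stand-in.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
out = model(**inputs)

# "Output channel" 1: the next-token probability distribution.
probs = torch.softmax(out.logits[0, -1], dim=-1)

# "Output channel" 2: gradients of a scalar loss with respect to every parameter.
target_id = tok.encode(" Paris")[0]
loss = -torch.log(probs[target_id])
loss.backward()
n_grad_entries = sum(p.grad.numel() for p in model.parameters() if p.grad is not None)
print(probs.topk(3), n_grad_entries)
```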
I was reading it as a kinda disjunctive argument. If Nora says that a pause is bad because of A and B, either of which is sufficient on its own from her perspective, then you could say "A isn't cruxy for her" (because B is sufficient) or you could say "B isn't cruxy for her" (because A is sufficient). Really, neither of those claims is accurate.
Oh well, whatever, I agree with you that the OP could have been clearer.
If you desperately wish we had more time to work on alignment, but also think a pause won’t make that happen or would have larger countervailing costs, then that would lead to an attitude like: “If only we had more time! But alas, a pause would only make things worse. Let’s talk about other ideas…” For my part, I definitely say things like that (see here).
However, Nora has sections claiming “alignment is doing pretty well” and “alignment optimism”, so I think it’s self-consistent for her to not express that kind of mood.
I have a vague impression—I forget from where and it may well be false—that Nora has read some of my AI alignment research, and that she thinks of it as not entirely pointless. If so, then when I say “pre-2020 MIRI (esp. Abram & Eliezer) deserve some share of the credit for my thinking”, then that’s meaningful, because there is in fact some nonzero credit to be given. Conversely, if you (or anyone) don’t know anything about my AI alignment research, or think it’s dumb, then you should ignore that part of my comment, it’s not offering any evidence, it w...
By contrast, AIs implemented using artificial neural networks (ANN) are white boxes in the sense that we have full read-write access to their internals. They’re just a special type of computer program, and we can analyze and manipulate computer programs however we want at essentially no cost.
Suppose you walk down a street, and unbeknownst to you, you’re walking by a dumpster that has a suitcase full of millions of dollars. There’s a sense in which you “can”, “at essentially no cost”, walk over and take the money. But you don’t know tha...
I’m pretty sure you have met people doing mechanistic interpretability, right?
Nora is Head of Interpretability at EleutherAI :)
Some examples include the now-debunked analogy from evolution, the false distinction between “inner” and “outer” alignment, and the idea that AIs will be rigid utility maximizing consequentialists (here, here, and here).
I feel like you’re trying to round these three things into a “yay versus boo” axis, and then come down on the side of “boo”. I think we can try to do better than that.
One can make certain general claims about learning algorithms that are true and for which evolution provides as good an example as any. One can also make other claims that are...
I certainly give relatively little weight to most conceptual AI research. That said, I respect that it's valuable for you and am open to trying to narrow the gap between our views here - I'm just not sure how!
To be more concrete, I'd value 1 year of current progress over 10 years of pre-2018 research (to pick a date relatively arbitrarily). I don't intend this as an attack on the earlier alignment community, I just think we're making empirical progress in a way that was pretty much impossible before we had good models available to study and I place a lot more value on this.
I think the attitude most people (including me) have is: “If we want to do technical work to reduce AI x-risk, then we should NOT be working on any technical problems that will almost definitely get solved “by default”, e.g. because they’re straightforward and lots of people are already working on them and mostly succeeding, or because there’s no way to make powerful AGI except via first solving those problems, etc.”.
Then I would rephrase your original question as: “OK, if we shouldn’t be working on those types of technical problems above … then are there ...
There’s a school of thought that academics travel much much more than optimal or healthy. See Cal Newport’s Deep Work, where he cites a claim that it’s “typical for junior faculty to travel twelve to twenty-four times a year”, and compares that to Radhika Nagpal’s blog post The Awesomest 7-Year Postdoc or: How I Learned to Stop Worrying and Love the Tenure-Track Faculty Life which says:
...I travel at most 5 times a year. This includes: all invited lectures, all NSF/Darpa investigator or panel meetings, conferences, special workshops, etc. Typically it looks s
If you publish it, a third party could make a small tweak and apply for a patent. If you patent it, a third party could make a small tweak and apply for a patent. What do you see as the difference? Or sorry if I’m misunderstanding the rules.
In theory, publishing X and patenting X are both equally valid ways to prevent other people from patenting X. Does it not work that way in practice?
Could be wrong, but I had the impression that software companies have historically amassed patents NOT because patenting X is the best way to prevent another company from patenting the exact same thing X or things very similar to X, but rather because “the best defense is a good offense”, and if I have a dubious software patent on X and you have a dubious software patent on Y then we can have a balance of terro...
I define “alignment” as “the AI is trying to do things that the AI designer had intended for the AI to be trying to do”, see here for discussion.
If you define “capabilities” as “anything that would make an AI more useful / desirable to a person or company”, then alignment research would be by definition a subset of capabilities research.
But it’s a very small subset!
Examples of things that constitute capabilities progress but not alignment progress include: faster and better and more and cheaper chips (and other related hardware like interconnects), the dev...
At the same time, I think Eliezer made a really strong (and well-argued) point that if we believe in epiphenomenalism then we have no reason to believe that our reports of consciousness have any connection to the phenomenon of consciousness. I haven't seen this point made so clearly elsewhere.
Chalmers here says something like that (“It is certainly at least strange to suggest that consciousness plays no causal role in my utterances of ‘I am conscious’. Some have suggested more strongly that this rules out any knowledge of consciousness… The oddness of epiph...
I would have liked this article much more if the title had been “The 25 researchers who have published the largest number of academic articles on existential risk”, or something like that.
The current title (“The top 25 existential risk researchers based on publication count”) seems to insinuate that this criterion is reasonable in the context of figuring out who are the “Top 25 existential risk researchers” full stop, which it’s not, for reasons pointed out in other comments.
I have some interest in cluster B personality disorders, on the theory that something(s) in human brains makes people tend to be nice to their friends and family, and whatever that thing is, it would be nice to understand it better because maybe we can put something like it into future AIs, assuming those future AIs have a sufficiently similar high-level architecture to the human brain, which I think is plausible.
And whatever that thing is, it evidently isn’t working in the normal way in cluster B personality disorder people, so maybe better understanding ...
Are we talking about in the debate, or in long-form good-faith discussion?
For the latter, it’s obviously worth talking about, and I talk about it myself plenty. Holden’s post AI Could Defeat All Of Us Combined is pretty good, and the new Lunar Society podcast interview of Carl Shulman is extremely good on this topic (the relevant part is mostly the second episode [it was such a long interview they split it into 2 parts]).
For the former, i.e. in the context of a debate, the point is not to hash out particular details and intervention points, but rather just...
Thanks!
we need good clear scenarios of how exactly step by step this happens
Hmm, depending on what you mean by “this”, I think there are some tricky communication issues that come up here, see for example this Rob Miles video.
On top of that, obviously this kind of debate format is generally terrible for communicating anything of substance and nuance.
...Melanie seemed either (a) uninformed of the key arguments (she just needs to listen to one of Yampolskiy's recent podcast interviews to get a good accessible summary), or (b) unwilling to engage with such argumen
In this post the criticizer gave the criticizee an opportunity to reply in-line in the published post—in effect, the criticizee was offered the last word. I thought that was super classy, and I’m proud to have stolen that idea on two occasions (1,2).
If anyone’s interested, the relevant part of my email was:
...…
You can leave google docs margin comments if you want, and:
- If I’m just straight-up wrong about something, or putting words in your mouth, then I’ll just correct the text before publication.
- If you leave a google docs comment that’s more like a counte
There’s probably some analogy here to ‘inner alignment’ versus ‘outer alignment’ in the AI safety literature, but I find these two terms so vague, confusing, and poorly defined that I can’t see which of them corresponds to what, exactly, in my gene/brain alignment analogy; any guidance on that would be appreciated.
The following table is my attempt to clear things up. I think there are two stories we can tell.
When you say "I don't know how you can be confident(>50%) to say that it'll surpass human", I'm not sure if you mean "...in 20 years" or "...ever". You mention 20 years in one place but not the rest of your question, so I'm not really sure what you meant.
Your question is using "flops" to mean FLOP/s in some places and FLOP in other places.
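For what it's worth, the distinction matters because FLOP/s is a rate (hardware throughput) while FLOP is a total amount of compute; here's a toy calculation, with numbers invented purely to illustrate the units:

```python
# FLOP/s is a throughput (a rate); FLOP is a total amount of compute.
# Both numbers below are made up solely to illustrate the unit distinction.
flop_per_second = 1e15               # sustained throughput of some cluster (FLOP/s)
training_seconds = 3600 * 24 * 30    # a hypothetical one-month training run
total_flop = flop_per_second * training_seconds
print(f"{total_flop:.1e} FLOP")      # ~2.6e21 FLOP
```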
Hmm. Touché. I guess another thing on my mind is the mood of the hype-conveyer. My stereotypical mental image of “hype” involves Person X being positive & excited about the product they’re hyping, whereas the imminent-doom-ers that I’ve talked to seem to have a variety of moods including distraught, pissed, etc. (Maybe some are secretly excited too? I dunno; I’m not very involved in that community.)
You’re entitled to disagree with short-timelines people (and I do too) but I don’t like the use of the word “hype” here (and “purely hype” is even worse); it seems inaccurate, and kinda an accusation of bad faith. “Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias). None of those applies to Greg here, AFAICT. Instead, you can just say “he’s wrong” etc.
“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias).
All of this seems to apply to AI-risk-worriers?
I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).
Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂
Yup! Alternatively: we’re working with silicon chips that are 10,000,000× faster than the brain, so we can get a 100× speedup even if we’re a whopping 100,000× less skillful at parallelizing brain algorithms than the brain itself.
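Spelling out the arithmetic behind that claim (the two numbers are the rough figures from the comment above, not measured quantities):

```python
# Rough figures from the comment above, not measurements.
chip_speed_advantage = 10_000_000    # silicon operation rates vs. neuron firing rates
parallelization_penalty = 100_000    # how much less skillfully we parallelize than the brain
net_speedup = chip_speed_advantage / parallelization_penalty
print(net_speedup)                   # 100.0× faster than the brain overall
```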
Hi, I’m an AGI safety researcher who studies and talks about neuroscience a whole lot. I don’t have a neuroscience degree—I’m self-taught in neuroscience, and my actual background is physics. So I can’t really speak to what happens in neuroscience PhD programs. Nevertheless, my vague impression is that the kinds of things that people learn and do and talk about in neuroscience PhD programs have very little overlap with the kinds of things that would be relevant to AI safety. Not zero, but probably very little. But I dunno, I guess it depends on what classes you take and what research group you join. ¯\_(ツ)_/¯
AGI is possible but putting a date on when we will have an AGI is just fooling ourselves.
So if someone says to you “I’m absolutely sure that there will NOT be AGI before 2035”, you would disagree, and respond that they’re being unreasonable and overconfident, correct?
I find the article odd in that it seems to be going on and on about how it's impossible to predict the date when people will invent AGI, yet the article title is "AGI isn't close", which is, umm, a prediction about when people will invent AGI, right?
If the article had said "technological forecasting is extremely hard, therefore we should just say we don't know when we'll get AGI, and we should make contingency-plans for AGI arriving tomorrow or in 10 years or in 100 years or 1000 etc.", I would have been somewhat more sympathetic.
(Although I still think nu...
I had a very bad time with RSI from 2006-7, followed by a crazy-practically-overnight-miracle-cure-happy-ending. See my recent blog post The “mind-body vicious cycle” model of RSI & back pain for details & discussion. :)
The implications for "brand value" would depend on whether people learn about "EA" as the perpetrator vs. victim. For example, I think there were charitable foundations that got screwed over by Bernie Madoff, and I imagine that their wiki articles would have also had a spike in views when that went down, but not in a bad way.
Related:
I have some discussion of this area in general and one of David Jilk’s papers in particular at my post Two paths forward: “Controlled AGI” and “Social-instinct AGI”.
In short, it seems to me that if you buy into this post, then the next step should be to figure out how human social instincts work, not just qualitatively but in enough detail to write it into AGI source code.
I claim that this is an open problem, involving things like circuits in the hypothalamus and neuropeptide receptors in the striatum. And it’s the main thing that I’m working on myself.
Add...
I think things like “If we see Sign X of misalignment from the AI, we should shut it down and retrain” comprise a small fraction of AI safety research, and I think even that small fraction consists primarily of stating extremely obvious ideas (let’s use honeypots! let’s do sandbox tests! let’s use interpretability! etc.) and exploring whether or not they would work, rather than stating non-obvious ideas. The horse has long ago left the barn on “the idea of sandbox testing and honeypots” being somewhere in an LLM’s training data!
I think a much larger fracti...
My paraphrase of the SDO argument is:
With our best-guess parameters in the Drake equation, we should be surprised that there are no aliens. But for all we know, maybe one or more of the parameters in the Drake equation is many many orders of magnitude lower than our best guess. And if that’s in fact the case, then we should not be surprised that there are no aliens!
…which seems pretty obvious, right?
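As a toy illustration of why the product structure makes this feel obvious (the Drake-equation form is standard, but every parameter value below is a placeholder I'm assuming for the example, not a figure from the SDO paper):

```python
# Toy Drake-style product: expected number of detectable civilizations.
# All parameter values are invented for illustration only.
def n_civilizations(R_star, f_p, n_e, f_l, f_i, f_c, L):
    return R_star * f_p * n_e * f_l * f_i * f_c * L

best_guess  = n_civilizations(2, 0.5, 1, 0.5, 0.1, 0.1, 1e4)    # = 50: "where is everybody?"
pessimistic = n_civilizations(2, 0.5, 1, 1e-10, 0.1, 0.1, 1e4)  # = 1e-8: no surprise we see no one
print(best_guess, pessimistic)
```

Dropping any single factor by ten orders of magnitude drops the whole product by ten orders of magnitude, which is the entire content of the move paraphrased above.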
So back to the context of AI risk. We have:
Hi, I’m an AI alignment technical researcher who mostly works independently, and I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AI alignment—since I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :) (Update: I’m all set now.)