Seth Herd

I think the scaling hypothesis is false, and we'll get to AGI quite soon anyway, by other routes. The better scaling works, the faster we'll get there, but that's gravy. We have all of the components of a human-like mind today; putting them together is one route to AGI.

I think a major issue is that the people who would be best at predicting AGI usually don't want to share their rationale.

Gears-level models of the phenomenon in question are highly useful in making accurate predictions. Those with the best models are either worriers who don't want to advance timelines, or enthusiasts who want to build it first. Neither has an incentive to convince the world it's coming soon by sharing exactly how that might happen.

The exceptions are people who have really thought about how to get from AI to AGI, but are not in the leading orgs and are either uninterested in racing or want to attract funding and attention for their approach. Yann LeCun comes to mind.

Imagine trying to predict the advent of heavier-than-air flight without studying either birds or mechanical engineering. You'd get predictions like the ones we saw historically - so wild as to be worthless, except those from the people actually trying to achieve that goal.

(copied from LW comment since the discussion is happening over here)

I personally think LLMs will plateau around human level, but that they will be made agentic and self-teaching, and therefore self-aware (in sum, "sapient") and truly dangerous, by scaffolding them into language model agents or language model cognitive architectures. See Capabilities and alignment of LLM cognitive architectures for my logic in expecting that.

That would be a good outcome. We'd have agents with their own goals, capable enough to do useful and dangerous things, but probably not quite capable enough to self-exfiltrate, and probably initially under the control of relatively sane people. That would scare the pants off the world, and we'd see some real efforts to align the things. Which is uniquely doable, since they'd take top-level goals in natural language and be readily interpretable by default (with plenty of real concerns remaining, including Waluigi effects and their utterances not reliably reflecting their underlying cognition).

I think the general consensus, which I share, is that neither mind uploading nor good BCI to allow brain extensions are likely to happen before AGI. I wish I had citations ready to hand.

I haven't heard as much discussion of the biological superbrains approach. I think it's probably feasible to increase intelligence through genetic engineering, but if you took the route of altering embryos, it would probably also take too long to help with alignment before AGI happens. Altering adults would be tougher and more limited, and it would hit the same legal problems.

I think that neuromorphic AGI is a possibility, which is why some of my alignment work addresses it. I think the best and most prominent work on that topic is Steve Byrnes' Intro to Brain-Like-AGI Safety.

I think that's quite a pessimistic take. I take Altman seriously on caring about x-risk, although I'm not sure he takes it quite seriously enough. This is based on public comments to that effect around 2013, before he started running OpenAI. And Sutskever definitely seems properly concerned.

I agree that those teams aren't completely trustworthy, and in an ideal world, we should be making this decision by including everyone on earth. But with a partial pause, do you expect to have better or worse teams in the lead for achieving AGI? That was my point.

I'm not sure which is the better place to have this discussion, so I'm trying both. Copied from my comment on LessWrong.

That all makes sense. To expand a little more on some of the logic:

It seems like the outcome of a partial pause rests in part on whether that would tend to put people in the lead of the AGI race who are more or less safety-concerned.

I think it's nontrivial that we currently have three teams in the lead who all appear to honestly take the risks very seriously, and changing that might be a very bad idea.

On the other hand, the argument for alignment risks is quite strong, and we might expect more people to take the risks more seriously as those arguments diffuse. This might not happen if polarization becomes a large factor in beliefs on AGI risk. The evidence for climate change was also pretty strong, but we saw half of America believe in it less, not more, as evidence mounted. The lines of polarization would be different in this case, but I'm afraid it could happen. I outlined that case a little in AI scares and changing public beliefs.

In that case, I think a partial pause would have negative expected value, as the current leaders' advantage decayed and people who take the risks less seriously got into the lead by circumventing the pause.

This makes me highly unsure if a pause would be net-positive. Having alignment solutions won't help if they're not implemented because the taxes are too high.

The creation of compute overhang is another reason to worry about a pause. It's highly uncertain how far we are from making adequate compute for AGI affordable to individuals. Algorithms and compute will keep getting better during a pause. So will theory of AGI, along with theory of alignment.

This puts me, and I think the alignment community at large, in a very uncomfortable position of not knowing whether a realistic pause would be helpful.

It does seem clear that creating the mechanisms and political will for a pause is a good idea.

Advocating for more safety work also seems clear cut.

To this end, I think it's true that you create more political capital by successfully pushing for policy.

A pause now would create even more capital, but it's also less likely to be a win, and it could wind up creating polarization, costing capital rather than creating it. It's harder to argue for a pause now, when even most alignment folks think we're years from AGI.

So perhaps the low-hanging fruit is pushing for voluntary RSPs and government funding for safety work. These are clear improvements, and likely to be wins that create capital for a pause as we get closer to AGI.

There's a lot of uncertainty here, and that's uncomfortable. More discussion like this should help resolve that uncertainty, and thereby help clarify and unify the collective will of the safety community.

I agree with you that humans have mismatched goals among ourselves, so some amount of goal mismatch is just a fact we have to deal with. I think the ideal is that we get an AGI that makes its goal the overlap in human goals; see Empowerment is (almost) All We Need and others on preference maximization.

I also agree with your intuition that having a non-maximizer improves the odds of an AGI not seeking power or doing other dangerous things. But I think we need to go far beyond the intuition; we don't want to play odds with the future of humanity. To that end, I have more thoughts on where this will and won't happen.

I'm saying "the problem" with optimization is actually mismatched goals, not optimization/maximization itself. In more depth, and hopefully more usefully: I think unbounded goals are a major problem with optimization (not the only one, but a very big one).

If an AGI had a bounded goal like "make one billion paperclips", it wouldn't be nearly as dangerous; it might still decide to eliminate humanity to make the odds of reaching a billion as good as possible (I can't remember where I saw this important point; I think maybe Nate Soares made it). But it might instead decide that its best odds lie in making some improvements to the paperclip business, in which case it wouldn't cause problems.
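To make that concrete with a toy calculation (the plans and probabilities here are invented purely for illustration): even under a bounded goal, an agent that maximizes the probability of hitting the bound can still prefer the extreme plan whenever that plan raises its success odds at all.

```python
# Toy illustration with made-up numbers: a bounded goal
# ("make one billion paperclips, then stop") removes the incentive to
# tile the universe with paperclips, but an agent maximizing the
# *probability* of hitting the bound can still prefer drastic actions.

def expected_utility(p_success: float) -> float:
    # Bounded utility: 1 if the billion-paperclip target is met, 0 otherwise,
    # so expected utility is just the probability of success.
    return p_success * 1.0

modest_plan = expected_utility(0.95)    # improve the paperclip business
extreme_plan = expected_utility(0.999)  # first eliminate all possible interference

# The extreme plan wins on expected utility despite the bounded goal.
print(extreme_plan > modest_plan)  # True
```

Whether the agent actually picks the drastic plan then depends entirely on which plan it believes has better odds, which is the point made above.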

Mismatched goals are the problem. The logic of instrumental convergence applies to any goal, not just maximization goals.

This is a start, but just a start. Optimization/maximization isn't actually the problem. Any highly competent agent with goals that don't match ours is the problem.

A world that's 10% paperclips, with the rest composed of other stuff we don't care about, is no better than one a true optimizer would produce.

The idea "just don't optimize" has a surprising amount of support in AGI safety, including quantilizers and satisficing. But to me these seem like only a bare start on taking the points off the tiger's teeth. The tiger will still gnaw you to death if it wants to even a little.
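For readers unfamiliar with quantilizers, here is a minimal sketch of the idea (my own toy illustration, not any particular published implementation): instead of picking the single highest-utility action, sample from the top q fraction of candidate actions, which blunts extreme optimization pressure while still steering toward good outcomes.

```python
import random

def quantilize(actions, utility, q, rng=random):
    """Toy q-quantilizer: rank candidate actions by utility and sample
    uniformly from the top q fraction, instead of taking the argmax."""
    ranked = sorted(actions, key=utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))  # size of the top-q slice
    return rng.choice(ranked[:cutoff])

# With q = 0.1, the agent picks among the best 10% of candidate actions
# rather than committing to the single extreme maximizer.
action = quantilize(list(range(100)), utility=lambda a: a, q=0.1)
```

As the comment says, this only blunts the optimization; the sampled action is still drawn from the agent's own (possibly mismatched) utility ranking, so it addresses maximization, not the underlying goal mismatch.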
