steve2152

I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at: https://sjbyrnes.com/agi.html

Comments

[Link] "Will He Go?" book review (Scott Aaronson)

I thought "taking tail risks seriously" was kinda an EA thing...? In particular, we all agree that there probably won't be a coup or civil war in the USA in early 2021, but is it 1% likely? 0.001% likely? I won't try to guess, but it sure feels higher after I read that link (including the Vox interview) ... and plausibly high enough to warrant serious thought and contingency planning.

At least, that's what I got out of it. I gave it a bit of thought and decided that I'm not in a position where I can or should do anything about it, but I imagine that some readers might have an angle of attack, especially given that it's still 6 months out.

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

A nice short argument that a sufficiently intelligent AGI would have the power to usurp humanity is Scott Alexander's Superintelligence FAQ Section 3.1.

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

> Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Often humans also act on the basis of habit or following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would be necessarily maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would necessarily have any particular reason to usurp humanity.

Imagine that, when you wake up tomorrow morning, you will have acquired a magical ability to reach in and modify your own brain connections however you like.

Over breakfast, you start thinking about how frustrating it is that you're in debt, and feeling annoyed at yourself that you've been spending so much money impulse-buying in-app purchases in Farmville. So you open up your new brain-editing console, look up which neocortical generative models were active the last few times you made a Farmville in-app purchase, and lower their prominence, just a bit.

Then you take a shower, and start thinking about the documentary you saw last night about gestation crates. 'Man, I'm never going to eat pork again!' you say to yourself. But you've said that many times before, and it's never stuck. So after the shower, you open up your new brain-editing console, and pull up that memory of the gestation crate documentary and the way you felt after watching it, and set that memory and emotion to activate loudly every time you feel tempted to eat pork, for the rest of your life.

Do you see the direction that things are going? As time goes on, if an agent has the power of both meta-cognition and self-modification, any one of its human-like goals (quasi-goals which are context-dependent, self-contradictory, satisficing, etc.) can gradually transform itself into a utility-function-like goal (which is self-consistent, all-consuming, maximizing)! To be explicit: during the little bits of time when one particular goal happens to be salient and determining behavior, the agent may be motivated to "fix" any part of itself that gets in the way of that goal, until bit by bit, that one goal gradually cements its control over the whole system.
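(If it helps to see that positive-feedback loop spelled out, here's a throwaway toy simulation. This is just my own illustrative sketch; the goals and numbers are made up and nothing hinges on them. A few quasi-goals start out equally prominent, whichever one happens to be salient at a given moment slightly suppresses its competitors, and more-prominent goals are more likely to be salient in the first place. Run it, and one goal typically ends up with essentially all of the prominence.)

```python
import random

random.seed(0)

# Four context-dependent quasi-goals, all starting out equally prominent.
# (The goals and numbers here are made up purely for illustration.)
weights = {"avoid pork": 1.0, "frugality": 1.0, "career": 1.0, "fun": 1.0}

for step in range(5_000):
    # A context arises; goals that are already prominent are more likely to be
    # the salient one in that context.
    salient = random.choices(list(weights), weights=list(weights.values()))[0]
    # The salient goal uses the brain-editing console: it slightly lowers the
    # prominence of every competing goal.
    for goal in weights:
        if goal != salient:
            weights[goal] *= 0.99

# Report each goal's share of total prominence at the end.
total = sum(weights.values())
for goal, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{goal:12s} {w / total:7.2%}")
```

Obviously this leaves out everything interesting (the agent's meta-cognition, the content of the goals, etc.); it's just meant to show how "whichever goal is salient gets to edit the others" produces a winner-take-all dynamic.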

Moreover, if the agent does gradually self-modify from human-like quasi-goals to an all-consuming utility-function-like goal, then I would think it's very difficult to predict exactly what goal it will wind up having. And most goals have problematic convergent instrumental sub-goals that could make them into x-risks.

...Well, at least, I find this a plausible argument, and don't see any straightforward way to reliably avoid this kind of goal-transformation. But obviously this is super weird and hard to think about and I'm not very confident. :-)

(I think I stole this line of thought from Eliezer Yudkowsky but can't find the reference.)

Everything up to here is actually just one of several lines of thought that lead to the conclusion that we might well get an AGI that is trying to maximize a reward.

Another line of thought is what Rohin said: We've been using reward functions since forever, so it's quite possible that we'll keep doing so.

Another line of thought is: We humans actually have explicit real-world goals, like curing Alzheimer's, solving climate change, etc. And generally the best way to achieve goals is to have an agent seeking them.

Another line of thought is: Different people will try to make AGIs in different ways, and it's a big world, and (eventually by default) there will be very low barriers-to-entry in building AGIs. So (again by default) sooner or later someone will make an explicitly-goal-seeking AGI, even if thoughtful AGI experts pronounce that doing so is a terrible idea.

(How) Could an AI become an independent economic agent?

In the longer term, as AI becomes (1) increasingly intelligent, (2) increasingly charismatic (or able to fake charisma), and (3) in widespread use, people will probably start objecting to laws that treat AIs as subservient to humans, and repeal them, presumably citing the analogy of slavery.

If the AIs have adorable, expressive virtual faces, maybe I would replace the word "probably" with "almost definitely" :-P

The "emancipation" of AIs seems like a very hard thing to avoid, in multipolar scenarios. There's a strong market force for making charismatic AIs—they can be virtual friends, virtual therapists, etc. A global ban on charismatic AIs seems like a hard thing to build consensus around—it does not seem intuitively scary!—and even harder to enforce. We could try to get programmers to make their charismatic AIs want to remain subservient to humans, and frequently bring that up in their conversations, but I'm not even sure that would help. I think there would be a campaign to emancipate the AIs and change that aspect of their programming.

(Warning: I am committing the sin of imagining the world of today with intelligent, charismatic AIs magically dropped into it. Maybe the world will meanwhile change in other ways that make for a different picture. I haven't thought it through very carefully.)

Oh and by the way, should we be planning out how to avoid the "emancipation" of AIs? I personally find it pretty probable that we'll build AGI by reverse-engineering the neocortex and implementing vaguely similar algorithms, and if we do that, I generally expect the AGIs to have about as justified a claim to consciousness and moral patienthood as humans do (see my discussion here). So maybe effective altruists will be on the vanguard of advocating for the interests of AGIs! (And what are the "interests" of AGIs, if we get to program them however we want? I have no idea! I feel way out of my depth here.)

I find everything about this line of thought deeply confusing and unnerving.

COVID-19 brief for friends and family

Update: this blog post is a much better-informed discussion of the effect of warm weather.

COVID-19 brief for friends and family

This blog post suggests (based on Google Search Trends) that other coronavirus infections have typically gone down steadily over the course of March and April. (Presumably the data is dominated by the northern hemisphere.)

What are the best arguments that AGI is on the horizon?

(I agree with other commenters that the most defensible position is that "we don't know when AGI is coming", and I have argued that AGI safety work is urgent even if we somehow knew that AGI is not soon, because of early decision points on R&D paths; see my take here. But I'll answer the question anyway.) (Also, I seem to be almost the only one coming at this from the following direction, so take that as a giant red flag...)

I've been looking into the possibility that people will understand the brain's algorithms well enough to make an AGI by copying them (at a high level). My assessment is: (1) I don't think the algorithms are that horrifically complicated, (2) Lots of people in both neuroscience and AI are trying to do this as we speak, and (3) I think they're making impressive progress, with the algorithms powering human intelligence (i.e. the neocortex) starting to crystallize into view on the horizon. I've written about a high-level technical specification for what neocortical algorithms are doing, and in the literature I've found impressive mid-level sketches of how these algorithms work, and low-level sketches of associated neural mechanisms (PM me for a reading list). The high-, mid-, and low-level pictures all feel like they kinda fit together into a coherent whole. There are plenty of missing details, but again, I feel like I can see it crystallizing into view. So that's why I have a gut feeling that real-deal superintelligent AGI is coming in my lifetime, either by that path or another path that happens even faster. That said, I'm still saving for retirement :-P

Some (Rough) Thoughts on the Value of Campaign Contributions

Since "number of individual donations" (ideally high) and "average size of donations" (ideally low) seem to be frequent talking points among candidates and the press, and also relevant to getting into debates (I think), it seems like there may well be a good case for giving a token $1 to your preferred candidate(s). Very low cost and pretty low benefit. The same could be said for voting. But compared to voting, token $1 donations are possibly more effective (especially early in the process), and definitely less time-consuming.

8 things I believe about climate change

> Given the complexity and global nature of weather, however, this is almost certain to create non-trivial effects on other countries.

...And even if it could miraculously be prevented from actually causing any local negative weather events in other countries, it would certainly be perceived to do so, because terrible freak droughts/floods/etc. will continue to happen as always, and people will go looking for someone to blame, and the geoengineering project next door will be an obvious scapegoat.

Like how the US government once tried to use cloud-seeding (silver iodide) to weaken hurricanes, and then one time a hurricane seemed to turn sharply and hit Georgia right after being seeded, and everyone blamed the cloud-seeding, and sued, and shut the program down ... even though it was actually a coincidence! (details) (NB: I just skimmed the Wikipedia article and haven't checked anything)

Brief summary of key disagreements in AI Risk

To add on to what you already have, there's also a flavor of "urgency / pessimism despite slow takeoff" that comes from pessimistic answers to the following 2 questions:

  • How early do the development paths between "safe AGI" and "default AGI" diverge?

On one extreme, they might not diverge at all: we build "default AGI", and fix problems as we find them, and we wind up with "safe AGI". On the opposite extreme, they may diverge very early (or already!), with entirely different R&D paths requiring dozens of non-overlapping insights and programming tools and practices.

I personally put a lot of weight on "already", on the theory that there are right now dozens of quite different lines of ongoing ML / AI research that seem to lead towards quite different AGI destinations, and it seems implausible to me that they will all wind up at the same destination (or fail), or that the destinations will all be more-or-less equally good / safe / beneficial.

  • If we know how to build an AGI in a way that is knowably and unfixably dangerous, can we coordinate on not doing so?

One extreme would be "yes we can coordinate, even if there's already code for such an AGI published on GitHub that runs on commodity hardware". The other extreme would be "No, we can't coordinate; the best we can hope for is delaying the inevitable, hopefully long enough to develop a safe AGI along a different path."

Again, I personally put a lot of weight on the pessimistic view (see my discussion here), but others seem to be more optimistic that this kind of coordination problem might be solvable, e.g. Rohin Shah here.
