Steven Byrnes

Research Fellow @ Astera
Working (6-15 years of experience)


Hi I'm Steve Byrnes, an AGI safety researcher in Boston, MA, USA, with a particular focus on brain algorithms—see


Topic Contributions

The implications for "brand value" would depend on whether people learn about "EA" as the perpetrator vs. victim. For example, I think there were charitable foundations that got screwed over by Bernie Madoff, and I imagine that their wiki articles would have also had a spike in views when that went down, but not in a bad way.

See also Nate Soares arguing against Joe’s conjunctive breakdown of risk here, and me here.

I have some discussion of this area in general and one of David Jilk’s papers in particular at my post Two paths forward: “Controlled AGI” and “Social-instinct AGI”.

In short, it seems to me that if you buy into this post, then the next step should be to figure out how human social instincts work, not just qualitatively but in enough detail to write it into AGI source code.

I claim that this is an open problem, involving things like circuits in the hypothalamus and neuropeptide receptors in the striatum. And it’s the main thing that I’m working on myself.

There are also several very good reasons to work on the human social instincts problem, even if you don’t buy into other parts of David Jilk’s assertions here.

Additionally, figuring out human social instincts is (I claim) (at least mostly) orthogonal to work that accelerates AGI timelines, and therefore we should all be able to rally around it as a good idea.

Whether we should also try to accelerate anthropomorphic AGI timelines, e.g. by studying the learning algorithms in the neocortex, is bound to be a much more divisive question. I claim that on balance, it’s mostly a very bad idea, with certain exceptions including closed (and not-intended-to-be-published) research projects by safety/alignment-concerned people. [I’m stating this opinion without justifying it.]

I think things like “If we see Sign X of misalignment from the AI, we should shut it down and retrain” comprise a small fraction of AI safety research, and I think even that small fraction consists primarily of stating extremely obvious ideas (let’s use honeypots! let’s do sandbox tests! let’s use interpretability! etc.) and exploring whether or not they would work, rather than stating non-obvious ideas. The horse has long ago left the barn on “the idea of sandbox testing and honeypots” being somewhere in an LLM’s training data!

I think a much larger fraction of AI safety research is geared towards thinking about how to make the AI not misaligned in the first place. So if the AI is scheming against us, reading those posts won’t be very helpful to it, because those ideas have evidently already failed.

I also think you’re understating how secrecy would inhibit progress. And we need progress, if we are to succeed at the goal of knowing how to make an AI that’s not misaligned in the first place.

In fact, even in the “If we see Sign X of misalignment from the AI, we should shut it down and retrain” type of research, I would strongly vote for open-and-therefore-better research (that the AI can also see) versus closed-and-therefore-probably-worse research (that the AI can’t see). For example, really good interpretability could be robust enough that it still works even if the AI has read the same articles as the programmers, and bad interpretability won’t work even if the AI hasn’t.

So I think this article is focusing on a niche benefit of secrecy that seems very unlikely to outweigh the cost.

But meanwhile a very big and real secrecy-related problem is the kind of conventional AGI-related infohazards that safety researchers talk about all the time, i.e. people don’t want to publicly share ideas that would make AGI happen sooner. For example, lots of people disagree with Eliezer Yudkowsky about important aspects of AGI doom, and it’s not getting resolved because Eliezer is not sharing important parts of his beliefs that he sees as sensitive. Ditto with me for sure, ditto with lots of people I’ve talked to.

Would this problem be solvable with a giant closed Manhattan Project thing like you talked about? I dunno. The Manhattan Project itself had a bunch of USSR spies in it. Not exactly reassuring! OTOH I’m biased because I like living in Boston and don’t want to move to a barbed-wire-enclosed base in the desert :-P

My paraphrase of the SDO argument is:

With our best-guess parameters in the Drake equation, we should be surprised that there are no aliens. But for all we know, maybe one or more of the parameters in the Drake equation is many many orders of magnitude lower than our best guess. And if that’s in fact the case, then we should not be surprised that there are no aliens!

…which seems pretty obvious, right?

So back to the context of AI risk. We have:

  1. a framework in which risk is a conjunctive combination of factors…
  2. …in which, at several of the steps, a subset of survey respondents give rather low probabilities for that factor being present

So at each step in the conjunctive argument, we wind up with some weight on “maybe this factor is really low”. And those add up.
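
A minimal sketch of how those weights add up, using made-up numbers and assuming (purely for illustration) that the steps are independent. The function name and the 20% weights are hypothetical, not taken from the survey:

```python
def prob_some_factor_low(weights):
    """Chance that at least one step lands in its 'really low' scenario.

    weights[i] is the weight placed on "step i is really low".
    Assumes the steps are independent, which is an illustrative
    simplification, not a claim about the actual survey data.
    """
    none_low = 1.0
    for w in weights:
        none_low *= 1.0 - w  # probability that this step is NOT low
    return 1.0 - none_low


# Hypothetical: six steps, each with 20% weight on "really low".
illustrative_weights = [0.2] * 6
print(round(prob_some_factor_low(illustrative_weights), 3))
```

With more steps in the chain, or more weight on the low scenario at each step, the chance that some step comes out low keeps growing, even though no single step looks alarming on its own.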

I don’t find the correlation table (of your other comment) convincing. When I look at the review table, there seem to be obvious optimistic outliers—two of the three lowest numbers on the whole table came from the same person. And your method has those optimistic outliers punching above their weight.

(At least, you should be calculating correlations between log(probability), right? Because it’s multiplicative.)
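
To spell out the log point: the overall estimate is a product of the step probabilities, so its log is a sum of log(probability) terms, and the spread of a sum depends on the covariances between those log terms. A small sketch comparing the two kinds of correlation; the respondent numbers below are invented for illustration and are not from the review table:

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_pearson(ps, qs):
    """Pearson correlation between log(p) and log(q)."""
    return pearson([math.log(p) for p in ps], [math.log(q) for q in qs])


# Hypothetical answers from five respondents to two steps of the argument.
# The last respondent is an optimistic outlier on both steps.
step_a = [0.9, 0.8, 0.7, 0.6, 0.001]
step_b = [0.5, 0.6, 0.4, 0.7, 0.01]

print("raw correlation:", pearson(step_a, step_b))
print("log correlation:", log_pearson(step_a, step_b))
```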

Anyway, I think that AI risk is more disjunctive than conjunctive, so I really disagree with the whole setup. Recall that Joe’s conjunctive setup is:

  1. It will become possible and financially feasible to build APS systems.
  2. There will be strong incentives to build APS systems | (1). 
  3. It will be much harder to develop APS systems that would be practically PS-aligned if deployed, than to develop APS systems that would be practically PS-misaligned if deployed (even if relevant decision-makers don’t know this), but which are at least superficially attractive to deploy anyway | (1)-(2).
  4. Some deployed APS systems will be exposed to inputs where they seek power in misaligned and high-impact ways (say, collectively causing >$1 trillion 2021-dollars of damage) | (1)-(3).
  5. Some of this misaligned power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)-(4).
  6. This will constitute an existential catastrophe | (1)-(5).

Of these:

  • 1 is legitimately a conjunctive factor: If there’s no AGI, then there’s no AGI risk. (Though I understand that 1 is out of scope for this post?)
  • I don’t think 2 is a conjunctive factor. If there are not strong incentives to build APS systems, I expect people to do so anyway, sooner or later, because it’s scientifically interesting, it’s cool, it helps us better understand the human brain, etc. For example, I would argue that there are not strong incentives to do recklessly dangerous gain-of-function research, but that doesn’t seem to be stopping people. (Or if “doing this thing will marginally help somebody somewhere to get grants and tenure” counts as “strong incentives”, then that’s a very low bar!)
  • I don’t think 3 is a conjunctive factor, because even if alignment is easy in principle, there are bound to be people who want to try something different just because they’re curious what would happen, and people who have weird bad ideas, etc. etc. It’s a big world!
  • 4-5 does constitute a conjunctive factor, I think, but I would argue that avoiding 4-5 requires a conjunction of different factors, factors that get us to a very different world involving something like a singleton AI or extreme societal resilience against destructive actors, of a type that seems unlikely to me. (More on this topic in my post here.)
  • 6 is also a conjunctive factor, I think, but again avoiding 6 requires (I think) a conjunction of other factors. Like, to avoid 6 being true, we’d probably need a unipolar outcome (…I would argue…), and the AI would need to have properties that are “good” in our judgment, and the AI would probably need to be able to successfully align its successors and avoid undesired value drift over the vast times and distances.

I join you in strongly disagreeing with people who say that we should expect unprecedented GDP growth from AI which is very much like AI today but better. OTOH, at some point we'll have AI that is like a new intelligent species arriving on our planet, and then I think all bets are off.

Principal Investigator, i.e. a professor in charge of a group of grad students and/or other underlings.

I just changed the wording to "professor" or "advisor" instead of "PI".


(This whole answer is USA-specific)

This seems to me like a scary situation with essentially zero job security, but maybe I’m wrong about this?

The only real job security is to have marketable skills. Eternal perfect job security is extremely rare in the USA—I can’t think of anyone but tenured professors who have that. If you work at a startup, the startup could go under. If you work at a big firm, there could be layoffs. Etc.

The way I see it, if you successfully get a grant in Year N, then that should be strong evidence that you can successfully get a grant in Year N+1. After all, you’ll now have an extra year of highly-relevant experience, plus better connections etc. Right? (Well, unless you waste the grant money and get a bad reputation.) (Or unless the cause area funding situation gets worse in general, but that would equally be a concern as an employee at a big nonprofit too, and anyway seems unlikely for major EA cause areas in the near future.)

And if not, whatever type of job you were doing before, you can apply for that type of job again! (If you leave on good terms, you could apply for literally the same job you left.)

Should they take their grant in small amounts spaced out year-by-year instead of all in the first year?

Do your taxes with accrual accounting! One time I wound up getting 26 months of pay in one calendar year. It would have been a catastrophe with cash-basis accounting, but it was perfectly lovely thanks to accrual accounting.  :)

For tax efficiency, should grant recipients optimally incorporate themselves as an S-corporation, or a charitable foundation, or something else?

You can be self-employed automatically without filing any special paperwork. That’s the category I’m in.

IIUC, the advantages of being a charitable foundation are all on the grant-giver side, not the grant-receiver side. Namely: (1) If you’re a charitable foundation, and another nonprofit wants to give you money, it is extremely easy for them to do so. (2) If you’re a charitable foundation, and an individual wants to give you money, then they can tax-deduct it.

However, some institutions including EA Funds have jumped through whatever hoops there are such that they can give money to individuals.

If your grantor is willing to give you the money as an individual, I think there’s no reason on your end to do anything different than that.

(If you want the advantages of being a nonprofit, e.g. getting money from SFF, without filing all the paperwork to be a nonprofit, I vaguely recall that there is an institution in the EA space that will “take you in” under its umbrella. But I can’t remember which one. There are also “virtual research institutes” (Theiss, Ronin, IGDORE, maybe others), that offer the same advantage (i.e. that your grantor would be officially granting to a nonprofit), but they’ll take a cut of every grant you get. A different advantage of the “virtual research institutes”, I suspect, is their ability to handle government grants, which I imagine come with a ton of bureaucracy & paperwork.)

Certain kinds of incorporation give you liability protection, which would be relevant if your “business” is going to borrow money or where there’s a risk of getting sued. That hasn’t been applicable for me.

If you get a $50K grant, is this better or worse on net than earning $50K of traditional W-2 employment income? … How do EA freelance researchers deal with the things that are typically provided through the employer/employee relationship — things like healthcare, disability insurance, retirement savings accounts, and so forth?

If you want to know how big a grant is necessary to support your living expenses, you have to do the annoying spreadsheet where you calculate the major taxes and deductions and expenses etc.

To answer your specific questions:

  • For me, $X of grant income was considerably worse than $X of W-2 income, even leaving aside the fact that the latter often comes with employer-provided benefits. The QBI deduction helps, but not nearly enough to compensate for the employer contribution to payroll taxes etc. It’s possible that this is income-dependent, I’m just saying what it was for me.
  • Yes, I pay out-of-pocket for disability insurance, (Roth & regular) IRAs, and an Obamacare plan.
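
As a rough sketch of the payroll-tax piece of that spreadsheet only: the rates below are US federal rates as I understand them and should be treated as assumptions to check, and the function names are made up. It ignores income tax, the QBI deduction, the deduction for half of self-employment tax, the Social Security wage cap, and state taxes, so it is not a full comparison:

```python
# Assumed US federal rates; verify against current IRS guidance before use.
EMPLOYEE_FICA_RATE = 0.0765       # 6.2% Social Security + 1.45% Medicare
SELF_EMPLOYMENT_TAX_RATE = 0.153  # employee and employer halves combined
SE_EARNINGS_FACTOR = 0.9235       # share of net earnings subject to SE tax


def w2_payroll_tax(wages):
    """Employee-side payroll tax on W-2 wages (below the wage cap)."""
    return wages * EMPLOYEE_FICA_RATE


def self_employment_tax(net_earnings):
    """Self-employment tax on net self-employment earnings (below the cap)."""
    return net_earnings * SE_EARNINGS_FACTOR * SELF_EMPLOYMENT_TAX_RATE


def extra_payroll_tax(amount):
    """How much more payroll-type tax the self-employed person pays."""
    return self_employment_tax(amount) - w2_payroll_tax(amount)


amount = 50_000  # illustrative figure, matching the $50K in the question
print(f"W-2 payroll tax:     {w2_payroll_tax(amount):,.2f}")
print(f"Self-employment tax: {self_employment_tax(amount):,.2f}")
print(f"Difference:          {extra_payroll_tax(amount):,.2f}")
```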

what lies do you tell your relatives to stop them from nagging you about your unorthodox career decisions

It was fine, partly because I didn’t quit my old job until my first 1-year grant was finalized, and so far I have gotten renewal grants well before the previous grants ran out. (Sample size = 1, but still.)
