Jay Bailey🔸

Head of Technology and Standards @ Arcadia Impact
1208 karma · Joined · Working (6-15 years) · Brisbane QLD, Australia

Bio

I'm a former software engineer from Brisbane, Australia, who successfully pivoted into AI safety. I spent eighteen months at the UK AISI, and now work at Arcadia Impact.

Comments (160)

In the online gaming community, people often say a game is "dead" to mean either "Has reduced activity from a previous peak" or, if sufficiently principled about it, "Has had declining activity for a significant period of time, with no expected boost in the near future".

So, if you mentally replace "dead" with "more inactive than it used to be", I think this confusion will resolve. The standard I expect Noah is using internally is "At least as much activity as I perceived the EA Forum to have in its most active period since I joined it". (But I am not Noah, and am guessing entirely from previous usage I've seen, not from knowledge of them specifically.)

I also don't like this word being used in this way. "Dead" implies it's no longer worth engaging with, and for online games there is a very real risk of hyperstition - people don't want to invest in a dead game. "Dying" would be a better term, but even then that exaggerates it - it is possible to preserve a healthy game or a forum with a lower, but stable, level of activity for a long time. "Declining" would be the best term imo, to honestly describe what is happening, if that is indeed what's happening.

This advice is designed specifically for frontier AI companies - I definitely think there's nothing wrong with wanting to earn more in most situations. Usually I would advise against giving 100% of your money over some threshold, precisely because it removes the personal incentive to earn more money. This is generally bad, unless of course you happen to be in a position where money may incentivize you to do bad things. Then removing that incentive is a very good idea. 

I would give this same advice if someone were, say, considering leaving an animal advocacy group to join a meat producer, earn 3x the money, and lobby from the inside for better practices. Maybe this is the best move for animals, maybe not - but you should be very suspicious about this reasoning when it stands to benefit you financially. In this exact case, wanting more money is highly counterproductive to doing good, because it distorts your thinking. In most cases, this is not true, or is true to a much smaller extent.

Basically, the pattern is "If you would gain more money from joining the powerful people doing bad things, that's a bad incentive structure and you should remove that incentive structure from yourself."

Recently, I've been mulling over the question of whether it was a good idea or not to join a frontier AI company's safety team for the purposes of reducing extinction risk. One of my big cons was something like:

Jay, you think the incentives are less likely to affect you compared to most people. But most AI safety people who join frontier labs probably think this. You will be affected as well.

So I decided on a partial mitigation strategy, entirely as a precaution and not at all because I thought I needed one. I committed to myself and to several people I'm close to that if I were to join a frontier lab safety team, I would donate 100% of the surplus I would gain by taking that job instead of a less lucrative job somewhere else.

At the time, I was applying for a few jobs, one of which was at a frontier company. Approximately immediately, my System 1 became way less interested in that job. And I didn't even have an offer in hand for a specific amount of money. I don't have good reasons to care a lot about getting more money for myself - I have enough already, and I voluntarily live well below my means. This did not stop the effect from existing, and I hadn't noticed the effect before making the commitment. I still don't notice the effect on my thinking in a vacuum; I only notice it by doing a mental side-by-side comparison.

I now think anyone who is considering joining a frontier company in order to reduce extinction risk should make this same commitment, as a basic defensive measure against perverse incentives. I am sure there exist people who are entirely indifferent to money in this way - this is at least partially a skill issue on my part. But it does seem that "Thinking you are indifferent to the money" is not a reliable signal that your thinking is unaltered by it.

This is also an opportunity to say that, if I ever do join a frontier safety team, I officially give you permission to ask me if I'm meeting this commitment of mine in conversation, even if other people are around.

I disagree pretty strongly with this.

The biggest danger to EA is being cool. I mean this completely seriously. When EA becomes cool, people who don't care about EA show up to secure money and status. When EA becomes cool, that reputation needs defending, which is often corrosive to truth. EA being unpopular enough to deter status-seekers while not being so universally loathed that even mission-aligned nerds hesitate to associate with it for fear of social repercussions is exactly where we ought to be.

I think the argument of "EA could achieve more if it had a better reputation" is compelling, intuitive, and wrong. It seems like you're imagining a cool version of EA that tons of smart people want to join, but also maintains the same level of mission alignment and commitment to truth. I think this is actually impossible. 

EA's impact is a product of magnitude * direction. A better reputation increases our magnitude, but it's very easy for that direction to get much closer to zero. (And, since we take the money of committed altruists, a direction that is insufficiently positive is actually net-negative, thanks to opportunity cost.)

Thanks for the post! For those reading, I'm Jay - head of technology and standards for the Inspect Evals repo, and the reviewer of Declan's PR. I happened to spot this post without realising it was from a recent contributor! A couple of quick clarifications around the structure of Inspect Evals (it is pretty confusing):

Inspect Evals != Inspect, and isn't run by the same team. Inspect is the evals framework, Inspect Evals is a repository of evals that use that framework. Inspect Evals is run by Arcadia Impact, and we're contracted by the UK AISI to maintain it. 
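For the curious, here's roughly what that division of labour looks like - a minimal sketch of an eval built on Inspect, assuming the current inspect_ai API and using a toy hand-written sample rather than a real benchmark dataset:

```python
# Inspect (inspect_ai) supplies the Task / solver / scorer framework;
# an entry in the Inspect Evals repo mostly supplies a dataset and
# scoring choices on top of it. Toy sample for illustration only.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def toy_eval() -> Task:
    return Task(
        dataset=[Sample(input="What is 2 + 2?", target="4")],
        solver=generate(),  # just ask the model, no agent scaffold
        scorer=match(),     # check the target appears in the output
    )
```

Real entries in the repo load published benchmark data instead, but the shape is the same: the framework comes from Inspect, while the eval definition lives in Inspect Evals.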

Our developers work remotely as contractors, so moving to London isn't required. I'm in Australia at the moment. (though we're doing a restructure atm and it's uncertain how that's going to pan out, so I'm not sure about our current hiring)

I think the EA Hotel people have a point about evaluations, personally. I think if you're going to open-source evaluations, you should ask "Would I be okay if frontier AI companies trained on this / hill-climbed on this metric?" For frontier maths evals you might not want that - you have to weigh it against the increased knowledge we get about these capabilities. For moral reasoning under uncertainty, you may actively want them to do something like this.

Finally, I'm glad you liked the agent stuff - that's been a majority of where my time's gone this quarter. Appreciate the feedback, and more is always welcome :)

This is now covered for Lightcone, but MIRI is still open.

I live in Australia, and am interested in donating to the fundraising efforts of MIRI and Lightcone Infrastructure, to the tune of $2,000 USD for MIRI and $1,000 USD for Lightcone. Neither of these is tax-advantaged for me. Lightcone is tax-advantaged in the US, and MIRI is tax-advantaged in a few countries, according to their website.

Anyone want to make a trade, where I donate the money to a tax-advantaged charity in Australia that you would otherwise donate to, and you make these donations? As I understand it, anything under Effective Altruism Australia would work. Since my marginal tax rate this year is expected to be about one third, I'm open to matching up to 4.5k USD instead of 3k - after the deduction, that 4.5k will only cost me about 3k. This will not funge against my existing 10% donations to global health; it's on top of them, so you get all that sweet, sweet counterfactual impact.
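To make the arithmetic explicit, here's a rough sketch (the ~1/3 marginal rate is an estimate, and the numbers are illustrative):

```python
# Rough sketch of the swap arithmetic, assuming a ~1/3 marginal tax rate.
marginal_rate = 1 / 3

def net_cost(donation_usd: float, tax_deductible: bool) -> float:
    """Out-of-pocket cost of a donation after any tax deduction."""
    return donation_usd * (1 - marginal_rate) if tax_deductible else donation_usd

print(net_cost(4500, tax_deductible=True))   # ~3000: 4.5k via the Australian swap
print(net_cost(3000, tax_deductible=False))  # 3000: donating the 3k directly myself
```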

If he's still around, @Mitchell Laughlin🔸 can confirm that I've successfully done a match like this before across a longer timeframe and larger total amount.

My p(doom) went down slightly (From around 30% to around 25%) mainly as a result of how GPT-4 caused governments to begin taking AI seriously in a way I didn't predict. My timelines haven't changed - the only capability increase of GPT-4 that really surprised me was its multimodal nature. (Thus, governments waking up to this was a double surprise, because it clearly surprised them in a way that it didn't surprise me!)

I'm also less worried about misalignment and more worried about misuse when it comes to the next five years, due to how LLMs appear to behave. It seems that LLMs aren't particularly agentic by default, but can certainly be induced to perform agent-like behaviour - GPT-4's inability to do this well seems to be a capability issue that I expect to be resolved in a generation or two. Thus, I'm less worried about the training of GPT-N but still worried about the deployment of GPT-N. It makes me put more credence in the slow takeoff scenario.

This also makes me much more uncertain about the merits of pausing in the short term, like the next year or two. I expect that if our options were "Pause now" or "Pause after another year or two", the latter is better. In practice, I know the world doesn't work that way, and slowing down AI now likely slows down the whole timeline, which complicates things. I still think that government efforts like the UK's AISI are net-positive (I'm joining them for a reason, after all), but I think a lot of the benefit to reducing x-risk here is building a mature field around AI policy and evaluations before we need it - if we wait until I think the threat of misaligned AI is imminent, that may be too late.

This is exactly right, and the main reason I wrote this up in the first place. I wanted this to serve as a data point for people to be able to say "Okay, things have gone a little off the rails, but things aren't yet worse than they were for Jay, so we're still probably okay." Note that it is good to have a plan for when you should give up on the field, too - it should just have some resilience and allowance for failure baked in. My plan was loosely "If I can't get a job in the field, and I fail to get funded twice, I will leave the field".

Also contributing to positive selection effects is that you're more likely to see the more impressive results in the field, because they're more impressive. That gives your brain a skewed idea of what the median person in the field is doing: your brain treats "Average piece of alignment research we see" as "Average output of alignment researchers".

The counterargument to this is "Well, shouldn't we be aiming for better than median? Shouldn't these impressive pieces be our targets to reach?" I think so, yes, but I believe in incremental ambition as well - if one is below-average in the field, aiming to be median first, then good, then top-tier rather than trying to immediately be top-tier seems to me a reasonable approach.

Welcome to the Forum!

This post falls into a pretty common Internet failure mode, which is so ubiquitous outside of this forum that it's easy not to realise that any mistake has even been made - after all, everyone talks like this. Specifically, you don't seem to consider whether your argument would convince someone who genuinely believes these views. I am only going to agree with your answer to your trolley problem if I am already convinced invertebrates have no moral value...and in that case, I don't need this post to convince me that invertebrate welfare is counterproductive. There isn't any argument for why someone who does not currently agree with you should change their mind.

It is worth considering what specific reasons people who care about invertebrate welfare have, and trying to answer those views directly. This requires putting yourself in their shoes and trying to understand why they might consider invertebrates to have actual moral worth.

"So what's the problem? Why don't I just let the invertebrate-lovers go do their thing, while I do mine? The problem is that those arguing for the invertebrate cause as an issue of moral importance have brought bad arguments to the table."

This is much more promising, and I'd like to see actual discussion of what these arguments are, and why they're bad.
