(Post 6/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
Explicit backchaining is one way to do prioritisation. I sometimes forget that there are other useful heuristics, like:
(Post 5/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
(Post 4/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
I’ve spent a bit of time over the last year trying to form better judgement. Dumping some notes here on things I tried or considered trying, for future reference.
(Post 3/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
(Post 2/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
(Post 1/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)
According to me, these are some of the key uncertainties in AI governance field-building—questions which, if we had better answers, might significantly influence how field-building is done.
Things that surprised me about the results
Thanks for your comment!
I doubt that it's reasonable to draw these kinds of implications from the survey results, for a few reasons:
A broader point: I think making importance comparisons (between interventions) on the level of abstrac...
In a following post, we will explore:
- How you could orient your career toward working on security
Did you end up writing this, or have a draft of it you'd be willing to share?
In fact, one reason I am writing this comment is that I think this post itself endorses that framing to too great an extent.
Probably agree with you there
I do not think it is appropriate to describe this [the Uber crash] simply as an accident
Also agree with that. I wasn't trying to claim it is simply an accident—there are also structural causes (i.e. bad incentives). As I wrote:
Note that this could also be well-described as an "accident risk" (there was some incompetence on behalf of the engineers, along with the structural causes). [emphasis added]
If I...
Unfortunately, when someone tells you "AI is N years away because XYZ technical reasons," you may think you're updating on the technical reasons, but your brain was actually just using XYZ as excuses to defer to them.
I really like this point. I'm guilty of having done something like this loads myself.
...When someone gives you gears-level evidence, and you update on their opinion because of that, that still constitutes deferring. What you think of as gears-level evidence is nearly always disguised testimonial evidence. At least to some, usually damning, degree
Thanks for your comment! I agree that the concept of deference used in this community is somewhat unclear, and a separate comment exchange on this post further convinced me of this. It's interesting to know how the word is used in formal epistemology.
Here is the EA Forum topic entry on epistemic deference. I think it most closely resembles your (c). I agree there's the complicated question of what your priors should be, before you do any deference, which leads to the (b) / (c) distinction.
Thanks for your comment!
Asking "who do you defer to?" feels like a simplification
Agreed! I'm not going to make any changes to the survey at this stage, but I like the suggestion and if I had more time I'd try to clarify things along these lines.
I like the distinction between deference to people/groups and deference to processes.
deference to good ideas
[This is a bit of a semantic point, but seems important enough to mention] I think "deference to good ideas" wouldn't count as "deference", in the way that this community has ended up using it. As per the foru...
Cool, makes sense.
The main way to answer this seems to be getting a non-self-rated measure of research skill change.
Agreed. Asking mentors seems like the easiest thing to do here, in the first instance.
Somewhat related comment: next time, I think it would be better to ask "What percentage of the value of the fellowship came from these different components?"* instead of "What do you think were the most valuable parts of the programme?". This would give more fine-grained data, which could be decision-relevant (see the aggregation sketch after the footnote).
E.g. if it's true that most of the value of ERIs comes from networking, this would suggest that people who want to scale ERIs should do pretty different things (e.g. lots of retreats optimised for networking).
*and give them several buckets to select from, e.g. <3%, 3-10%, 10-25%, etc.
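(For concreteness, here's a minimal sketch of how I'd aggregate those bucketed responses into per-component value estimates. It's Python; the bucket labels, midpoints, component names, and example responses are all hypothetical placeholders, not anything from the actual survey.)

```python
# Minimal sketch: turn bucketed "% of value" survey responses into
# rough per-component estimates. Bucket labels, midpoints, and the
# example responses below are hypothetical placeholders.
from statistics import mean

BUCKET_MIDPOINTS = {
    "<3%": 0.015,
    "3-10%": 0.065,
    "10-25%": 0.175,
    "25-50%": 0.375,
    ">50%": 0.75,
}

# Each fellow picks one bucket per programme component.
responses = [
    {"networking": ">50%", "mentorship": "10-25%", "research skills": "3-10%"},
    {"networking": "25-50%", "mentorship": "3-10%", "research skills": "10-25%"},
]

components = sorted({c for r in responses for c in r})
for component in components:
    shares = [BUCKET_MIDPOINTS[r[component]] for r in responses if component in r]
    print(f"{component}: mean estimated share of value ~ {mean(shares):.0%}")
```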
Thanks for putting this together!
I'm surprised by the combination of the following two survey results:
Fellows' estimate of how comfortable they would be pursuing a research project remains effectively constant. Many start out very comfortable with research. A few decline.
and
Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable about the programs. (emphasis mine)
That is: on average, fellows claim they learned to do better research, but became ...
Re (1) See When Will AI Exceed Human Performance? Evidence from AI Experts (2016) and the 2022 updated version. These surveys don't ask about x-risk scenarios in detail, but do ask about the overall probability of very bad outcomes and other relevant factors.
Re (1) and (3), you might be interested in various bits of research that GovAI has done on the American public and AI researchers.
You also might want to get in touch with Noemi Dreksler, who is working on surveys at GovAI.
A potentially useful subsection for each perspective could be: evidence that should change your mind about how plausible this perspective is (including things you might observe over the coming years/decades). This would be kinda like the future-looking version of the "historical analogies" subsection.
Another random thought: a bunch of these lessons seem like the kind of things that general writing and research coaching can teach. Maybe summer fellows and similar should be provided with that? (Freeing up time for you/other people in your reference class to play to your comparative advantage.)
(Though some of these lessons are specific to EA research and so seem harder to outsource.)
Love it, thanks for the post!
"Reading 'too much' is possibly the optimal strategy if you're mainly trying to skill up (e.g., through increased domain knowledge), rather than have direct impact now. But also bear in mind that becoming more efficient at direct impact is itself a form of skilling up, and this pushes back toward 'writing early' as the better extreme."
Two thoughts on this section:
Additional (obvious) arguments for writing early: producing stuff builds career capital, and is often a better way to learn than just reading.
I want to disen...
Relevant discussion from a couple of days ago: https://astralcodexten.substack.com/p/why-not-slow-ai-progress
The question was:
Assume for the purpose of this question that HLMI* will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run?
So it doesn't presuppose some agentic form of AGI—but rather asks about the same type of technology that the median respondant gave a 50% chance of arriving within 45 years.
*HLMI was defined in the survey as:
“High-level machine intelligence” (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers.
Right, I just wanted to point out that the average AI researcher who dismisses AI x-risk doesn't do so because they think AGI is very unlikely. But I admit to often being confused about why they do dismiss AI x-risk.
The same survey asked AI researchers about the outcome they expect from AGI:
The median probability was 25% for a “good” outcome and 20% for an “extremely good” outcome. By contrast, the probability was 10% for a bad outcome and 5% for an outcome described as “Extremely Bad (e.g., human extinction).”
If I learned that there was some scientifi...
Thanks, I agree with most of these suggestions.
"Other (AI-enabled) dangerous tech" feels to me like it clearly falls under "exacerbating other x-risk factors"
I was trying to stipulate that the dangerous tech was a source of x-risk in itself, not just a risk factor (admittedly the boundary is fuzzy). The wording was "AI leads to deployment of technology that causes extinction or unrecoverable collapse" and the examples (which could have been clearer) were intended to be "a pathogen kills everyone" or "full scale nuclear war leads to unrecoverable collapse"
they basically see AGI as very unlikely
Certainly some people you talk to in the fairness/bias crowd think AGI is very unlikely, but that's definitely not a consensus view among AI researchers. E.g. see this survey of AI researchers (at top conferences in 2015, not selecting for AI safety folk), which finds that:
Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years
I'm also still a bit confused about what exactly this concept refers to. Is a 'consequentialist' basically just an 'optimiser' in the sense that Yudkowsky uses in the sequences (e.g. here), that has later been refined by posts like this one (where it's called 'selection') and this one?
In other words, roughly speaking, is a system a consequentialist to the extent that it's trying to take actions that push its environment towards a certain goal state?
Found the source. There, he says that an "explicit cognitive model and explicit forecasts" about the future are necessary for true consequentialist cognition (CC). He agrees that CC is already common among optimisers (like chess engines); the dangerous kind is consequentialism over broad domains (i.e. where everything in the world is in play as a possible means, while the chess engine only considers the set of legal moves as its domain).
"Goal-seeking" seems like the previous, less-confusing word for it, not sure why people shifted.
Another (unoriginal) way that heavy AI regulation could be counterproductive for safety: AGI alignment research probably becomes more productive as you get closer to AGI. So regulation in the jurisdictions whose actors are closest to AGI (currently the US/UK) would give those actors less time to do high-productivity alignment research before the second-place actor catches up.
And within a jurisdiction, you might think that responsible actors are the most likely to comply with regulation, differentially slowing them down.
Ways of framing EA that (extremely anecdotally*) make it seem less ick to newcomers. These are all obvious/boring; I'm mostly recording them here for my own consolidation
I'm still a bit confused: what distinction do you have in mind between inside view and independent impression (both of which have the property that they feel true to me)?
Or do you have no distinction in mind, but just think that the phrase "inside view" captures the sentiment better?
Thanks, I appreciate this post a lot!
Playing the devil's advocate for a minute, I think one main challenge to this way of presenting the case is something like "yeah, and this is exactly what you'd expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they're deployed widely enough to do so?"
And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don't manage to fix?
Thanks for writing this!
There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.
I disagree with this.
We ran a survey of prominent AI safety and governance researchers, where we asked them to estimate the probability of five different AI x-risk scenarios.
Arguably, the "terminator-like" scenarios are the "Superintelligence" scenario, and part 2 of "What failure looks like" (as you suggest...
After practising some self-love I am now noticeably less stressed about work in general. I sleep better, have more consistent energy, enjoy having conversations about work-related stuff more (so I just talk about EA and AI risk more than I used to, which was a big win on my previous margin). I think I maybe work fewer hours than I used to because before it felt like there was a bear chasing me and if I wasn't always working then it was going to eat me, whereas now that isn't the case. But my working patterns feel healthy and sustainable now; before, I was...
Also, a nitpick: I find "inside view" a more confusing and jargony way of saying "independent impressions" (okay, also jargon to some extent, but closer to plain English), and the latter also avoids the problem you point out: inside view is not the opposite of outside view in the Tetlockian sense (plus the other ambiguities with outside view that another commenter pointed out).
Nice post! I agree with ~everything here. Parts that felt particularly helpful:
Maybe this process generalises and so longtermist AI governance can learn from other communities?
In some sense, this post explains how the longtermist AI governance community is trying to go from “no one understands this issue well”, to actually improving concrete decisions that affect the issue.
It seems plausible that the process described here is pretty general (i.e. not specific to AI governance). If that’s true, then there could be opportunities for AI governance to learn from how this process has been implemented in other communities/fields and vice-versa.
Something that would improve this post but I didn’t have time for:
For each kind of work, give a sense of:
Note: "If you want to add one or more co-authors to your post, you’ll need to contact the Forum team..." is no longer the easiest way to add co-authors, so might want to be updated accordingly.
And by the way, thanks for adding this new feature!
Thanks JP!