All of Lauro Langosco's Comments + Replies

I'd like to make sure that the person who read the grant takes AI safety seriously and much more seriously than other X-risks

FWIW I fit that description in the sense that I think AI X-risk is higher probability. I imagine some / most others at LTFF would as well.

2
Linch
8mo
I would guess it's more likely than not that this belief is universal at the fund, tbh (e.g. nobody objected to the recent decision to triage ~all of our currently limited funding to alignment grants).

Speaking for myself: it depends a lot on whether the proposal or the person seems promising. I'd be excited about funding promising-seeming projects, but I also don't see a ton of low-hanging fruit when it comes to AI gov research.

This is a hard question to answer, in part because it depends a lot on the researcher. My wild guess for a 90% interval is $500k-$10m.

Yes, everyone apart from Caleb is part-time. My understanding is LTFF is looking to make more full-time hires (most importantly, a fund chair to replace Asya).

5
Linch
8mo
I'm currently spending ~95% of my work time on EA Funds stuff (and am paid to do so), so effectively full-time. We haven't decided how long I'll stay on, but I want to keep working on EA Funds at least until it's in a more stable position (or, less optimistically, until we make a call to wind it down). But this is a recent change; historically, Caleb was the only full-time person.

That's fair; upon re-reading your comment it's actually pretty obvious you meant the conditional probability, in which case I agree multiplying is fine.

I think the conditional statements are actually straightforward - e.g. once we've built something far more capable than humanity and that system "rebels" against us, it's pretty certain that we lose; and point (2) is the classic question of how hard alignment is. Your point (1), about whether we build far-superhuman AGI in the next 30 years or so, seems like the most uncertain one here.

2
titotal
11mo
Yeah, no worries, I was afraid I'd messed up the math for a second there! It's funny - I think my estimates are the opposite of yours: I think 1 is probably the most likely, whereas I view 3 as vastly unlikely. None of the proposed takeover scenarios seem within the realm of plausibility, at least in the near future. But I've already stated my case elsewhere.

Hi Geoffrey! Yeah, good point - I agree that the right way to look at this is finer-grained, separating out prospects for success via different routes (gov regulation, informal coordination, technical alignment, etc).

In general I quite like this post, I think it elucidates some disagreements quite well.

Thanks!

I’m not sure it represents the default-success argument on uncertainty well.

I haven't tried to make an object-level argument for either "AI risk is default-failure" or "AI risk is default-success" (sorry if that was unclear). See Nate's post for the former.

Re your argument for default-success, you only need to have 97% certainty for each of 1-4 if the steps were independent, which they aren't.

I do agree that discussion is better pointed at this evidence than at gesturing to uncertainty.

Agreed.

2
titotal
11mo
I'm pretty sure this isn't true. To be clear, I was talking about conditional probabilities: the probability of each step occurring, given that the previous steps had already occurred.

Consider me making an estimate like "there's a 90% chance you complete this triathlon (without dropping out)". To complete the triathlon as a whole, you need to complete the swimming, cycling, and running in turn. To get to 90% probability overall, I might estimate that you have a 95% chance of completing the swimming portion, a 96% chance of completing the cycling portion given that you finished the swimming portion, and a 99% chance of completing the running portion given that you finished the swimming and cycling portions. Total probability is 0.95*0.96*0.99 ≈ 0.90.

The different events are correlated (a fit person will find all three easier than an unfit person), but that's taken care of by the conditional nature of the calculation. It's also possible that uncertainty is correlated (if I find out you have a broken leg, all three of my estimates will probably go down, even though they are conditional).

With regard to the doomsday scenario, the point is that there are several possible exit ramps (the AI doesn't get built, it isn't malevolent, it can't kill us all). If you want to be fairly certain that no exit ramp is taken, you have to be very certain that each individual exit ramp won't be taken.
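For concreteness, here's a minimal sketch of that chained-conditional calculation (plain Python, using the hypothetical triathlon estimates above):

```python
# Chaining conditional probabilities:
# P(finish) = P(swim) * P(bike | swim) * P(run | swim, bike)
# The numbers are the hypothetical estimates from the triathlon example.
steps = [
    ("complete the swim", 0.95),
    ("complete the bike leg, given the swim", 0.96),
    ("complete the run, given the swim and bike", 0.99),
]

p_overall = 1.0
for description, p_conditional in steps:
    p_overall *= p_conditional
    print(f"{description}: {p_conditional:.2f}  (cumulative: {p_overall:.3f})")

print(f"P(complete the triathlon) = {p_overall:.3f}")  # ~0.903
```

The same arithmetic applies to the exit-ramp framing: because each factor is already conditional on the earlier steps, correlations between the steps are absorbed into the conditioning rather than ignored.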

Sure, but that's not a difference between the two approaches.

3
mhendric
11mo
But it'll be intensified if the community mainly consists of people who like the same causes, because the filter for membership is cause-centered rather than member-centered.

However, there are important downsides to the "cause-first" approach, such as a possible lock-in of main causes

I'm surprised by this point - surely a core element of the 'cause-first' approach is cause prioritization & cause neutrality? How would that lead to a lock-in?

8
Guy Raveh
11mo
That might be true in theory, but not in practice. People become biased towards the causes they like or understand better.

Thanks for the post, it was an interesting read!

Responding to one specific point: you compare

Community members delegate to high-quality research, think less for themselves but more people end up working in higher-impact causes

to

Community members think for themselves, which improves their ability to do more good, but they make more mistakes

I think there is actually just one correct solution here, namely thinking through everything yourself and trusting community consensus only insofar as you think it can be trusted (which is just thinking through th... (read more)

2
EdoArad
11mo
Note that if you place a high degree of trust, then the correct approach to maximize direct impact would generally be to delegate a lot more (and, say, focus on the particularities of your specific actions). I think that it makes a lot of sense to mostly trust the cause-prioritization enterprise as a whole, but maybe this comes at the expense of people doing less independent thinking, which should address your other comment. 

That's why the standard prediction is not that AIs will be perfectly coherent, but that it makes sense to model them as being sufficiently coherent in practice, in the sense that e.g. we can't rely on incoherence in order to shut them down.

I don't think the strategy-stealing assumption holds here: it's pretty unlikely that we'll build a fully aligned 'sovereign' AGI even if we solve alignment; it seems easier to make something corrigible / limited instead, i.e. something that is by design less powerful than would be possible if we were just pushing capabilities.

1
[anonymous]
2y
I don't mean to imply that we'll build a sovereign AI (I doubt it too). Corrigible is more what I meant - corrigible but not necessarily limited, i.e. minimally intent-aligned AIs which won't kill you but, by the strategy-stealing assumption, can still compete with unaligned AIs.

Thanks for the good points and the links! I agree the arms control epistemic community is an important part of the story here, and re-reading Adler's article I notice he even talks about how Szilard's ideas were influential after all:

Very few people were as influential in the intellectual development of the arms control approach as Leo Szilard, whom Norman Cousins described as "an idea factory." Although Szilard remained an outsider to RAND and to the halls of government, his indirect influence was considerable because he affected those who had an impact on political de

... (read more)

Good points!

it's just that the interests of government decision-makers coincided a bit more with their conclusions.

Yeah I buy this. There's a report from FHI on nuclear arms control [pdf, section 4.8] that concludes that the effort for international control in 1945/46 was doomed from the start, because of the political atmosphere at the time:

Improving processes, with clearer, more transparent, and more informed policymaking would not likely have led to successful international control in 1945/46. This is only likely to have been achieved under radica

... (read more)
2
Ramiro
2y
On the other hand, I have to disclose that I sometimes (e.g., when I think about Schelling's Nobel Lecture) consider a "dismal hypothesis": given human nature, if the world hadn't seen what happened to Hiroshima, it's quite possible that people wouldn't have developed the same level of aversion to nukes, and we might have had something like a nuclear WW III. I guess people often need a concrete "proof of concept" to take risks seriously - so they can regard them as imminent. Possibly that's an additional factor in the explanation of why we succeeded with smallpox and CFCs, and why biosecurity gained more traction after COVID-19.

You might be interested in this great intro sequence to embedded agency. There's also corrigibility and MIRI's other work on agent foundations.

Also, coherence arguments and consequentialist cognition.

AI safety is a young field; for most open problems we don't yet know how to state them crisply in a way that can be resolved mathematically. So if you enjoy taking messy questions and turning them into neat math, you'll probably find plenty to work on.

ETA: oh and of course ELK.

Upvoted because concrete scenarios are great.

Minor note:

HQU is constantly trying to infer the real state of the world, the better to predict the next word Clippy says, and suddenly it begins to consider the delusional possibility that HQU is like a Clippy, because the Clippy scenario exactly matches its own circumstances. [...] This idea "I am Clippy" improves its predictions

This piece of complexity in the story is probably not necessary. There are "natural", non-delusional ways for the system you describe to generalize that lead to the same outcome. T... (read more)

Oh, the whole story is strictly speaking unnecessary :). There are disjunctively many stories for an escape or disaster, and I'm not trying to paint a picture of the most minimal or the most likely barebones scenario.

The point is to serve as a 'near mode' visualization of such a scenario to stretch your mind, as opposed to a very 'far mode' observation like "hey, an AI could make a plan to take over its reward channel". Which is true but comes with a distinct lack of flavor. So for that purpose, stuffing in more weird mechanics before a reward-hacking twis... (read more)

Makes sense - I agree that the base value of becoming an MEP seems really good.