nonn

303 karma · Joined

Comments
38

Sorry, I don't check the forum much. And yeah, your revised picture sounds like what I meant.

Still interested in something like this, though I might punt on it for a month or two (busy times), partly because I have to think about what makes sense to me / what my probabilities are. The above was just intended to illustrate the point about betting - I didn't think much about the numbers. I'm definitely not an "AI 2027" person, but also definitely not an "AI 2070" person re: medians.

If you know your numbers, that would help - e.g. it sounds like you could suggest numbers that are a good deal to you, but also sound like an obviously good deal to me, despite my less-clear picture? You could also hash that out if desired until I come back :P

the one above is not worth it for me

That's what I meant by interest-indexed, unless that isn't capturing your concern?

Also, to be clear, that was just an example possible bet - I'd have to think about what actually makes sense (2035 is somewhat near my median tho). My point was just that money isn't valuable to me after crazy AI stuff has already happened, so me winning has no value in a conventional bet - it has to be constructed as giving money beforehand, to be returned with interest + winnings.

Would be open to confirming identity, and/or I likely have other folks who would be keen to make the same bet.

What is your P(existentially bad outcomes) in the next 10 years? As maybe a starting point for finding a bet that sounds good to you.

How are you operationalizing this? No matter the odds, it doesn't make sense to make bets of the form

  • heads we're dead OR there's infinite abundance + I get some money from you
  • tails, I give you some money

Maybe would be open to "you transfer 1k to me now (2026), I give you interest-indexed 2k in 2035" or whatever odds make sense. Though I understand you'd need to trust me, and/or we'd need some trusted way to make sure I give it back.
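A rough sketch of the mechanics of that bet structure (every number here - the interest rate, the odds, the stake - is invented purely for illustration, not a proposal):

```python
# Hypothetical sketch: the short-timelines bettor receives money up front
# (when it's still useful to them) and repays with interest plus winnings
# only in worlds where money still has value. All numbers are made up.

def interest_indexed_repayment(stake, years, market_rate, odds_multiplier):
    """What's owed back if the world is still normal: the stake grown at
    the market rate (so the lender loses nothing in expectation from the
    loan itself), times the agreed odds."""
    return stake * (1 + market_rate) ** years * odds_multiplier

stake = 1_000        # transferred now (2026)
years = 9            # settle in 2035
market_rate = 0.05   # assumed index rate
odds = 2.0           # "2k back for 1k" style odds

repayment = interest_indexed_repayment(stake, years, market_rate, odds)

# The lender only profits (vs. just investing the stake) in no-doom worlds:
baseline = stake * (1 + market_rate) ** years
print(f"repayment owed in 2035: ${repayment:,.2f}")
print(f"lender's profit over baseline investing: ${repayment - baseline:,.2f}")
```

The key design point is that the "interest-indexed" part makes the lender whole relative to ordinary investing, so only the odds multiplier is the actual bet.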

I also don't think short timelines are actually cruxy to my argument above, which is mainly about their argument being wrong + pointing at other arguments for misalignment, not about timelines.

The theoretical section seems weak, and basically just sidesteps all sorts of arguments like this. And theoretical arguments are a key crux imo

And their main theoretical argument doesn't seem right either. They say "evolution only touches DNA whereas gradient descent touches the whole network"... but the way we 'touch' the whole network is via a simple loss function + default weight update rule (which would easily fit into DNA)! You could just as easily say DNA does this, by defining the algorithm/architecture/update process that creates/updates the brain.

They then imply we have much more fine-grained control than DNA/evolution could, but in fact our current method gives us very little fine control. Like yes, in principle, we have access to all the weights. If we understood more, maybe we could have an extremely complicated loss function and sophisticated update process, and then DNA couldn't code for something analogously sophisticated. But that's not remotely what we do, and best practice is essentially "define some loss functions on vibes and see what works".

If anything, it seems like DNA's training algorithm exerts more fine-grained control & has a more complicated 'loss function/update rule' than gradient descent.

(And evolution had far more time to try out adaptations to novel behavior at 'near-human capability' and still failed on inner alignment, albeit it's unclear how to compare "tons of random tries" vs. "a few vibes-based tries")

Not this person, but many AI risk arguments are necessarily logical rather than empirical - there are good reasons to believe the relevant behaviors won't appear (or will be trivially easy to counter, at least re: harmful outputs) until you have very capable systems.

Like, if I can construct a deceptive response-to-training strategy (but current models can't), that's enough evidence to be concerned that future superhuman models might do similar deceptive alignment. Other concerns like inner optimizers (e.g. humans stopped being kid-maxxers at high capability, because our proxy decoupled from evolution's target) might not show up, or might change in character, as models become less limited. And even when you can demonstrate the behavior empirically, people dismiss it as overly-induced or a toy environment - which was the whole point: to show plausibility, not to prove it.

More fundamentally: If I argue that a future thing logically implies certain risks arise, responding with "there's no empirical evidence" is silly. Logical chains and structural arguments are still valid epistemic tools.

I think a major blocker to this kind of thing is that people feel like 'it's not a real career' and worry about what would happen if they tried to leave, or if they just didn't see success in their fieldbuilding startup.

IMO this is very incorrect above a certain threshold of ability, especially for people already working in EA or AIS technical/policy/generalist roles. But it would be very helpful if your team could offer some stronger guarantees to these people!

Here's one basic idea (common and probably far from optimal): 'failed-fieldbuilding-attempt insurance' - for people you think should do this, you agree to give a 5-year stipend of 2-5k/month if they try & fail & can't find another decent job. Likely you wouldn't even have to pay this out much, because most people that you're excited to see try fieldbuilding are IMO incorrect about not being able to transition back. So in practice, you'd give them the stipend for a few months before they found a new job. And many of them would actually succeed & you'd pay nothing!
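To illustrate why the expected cost could be small relative to the headline guarantee (every probability and duration here is hypothetical):

```python
# Rough expected-cost sketch for the 'failed-fieldbuilding-attempt
# insurance' idea. All numbers are invented for illustration only.

def expected_insurance_cost(p_needs_payout, avg_months_on_stipend, monthly_stipend):
    """Expected cost per insured person: probability they ever claim,
    times average months of stipend before they land a new job."""
    return p_needs_payout * avg_months_on_stipend * monthly_stipend

# If only ~20% ever claim, and claimants find a new job in ~4 months:
expected = expected_insurance_cost(0.20, 4, 3_000)
maximum = 5 * 12 * 5_000   # worst-case exposure: full 5 years at 5k/month

print(f"expected cost per insured person: ${expected:,.0f}")
print(f"maximum exposure per insured person: ${maximum:,.0f}")
```

The gap between the expected cost and the maximum exposure is the whole point: the guarantee mostly needs to exist, not to be paid out.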

I think this is a good point about precise phrasing, but I think the argument still basically goes through that insects should be treated as extremely important in expectation. You can eliminate the two-envelope problem either by making the numbers fixed/concrete or by using conditional probabilities.

Intuitively: suppose you thought there was a 50% chance you prevent a holocaust-level (10,000,000 lives) event happening to humans, but a 50% chance that this intervention would be completely useless. Alternatively, you could do a normal intervention to save 1,000 lives.

You could say "the normal intervention has a 50% chance of being ~infinitely more valuable than the holocaust-prevention thing".

But it's obvious you should do the holocaust-prevention thing, because here it's clearer what the comparative/conditional stakes are. In one possible world, the 'world you can affect' is vastly larger, and that world should be prioritized.
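The comparison above, worked through with the same numbers, just as a sanity check (expected value here is simply expected lives saved):

```python
# Worked version of the comparison in the text: the fixed-number framing
# dissolves the two-envelope worry because both options are valued in the
# same concrete units (expected lives saved).

p_works = 0.5
ev_holocaust_prevention = p_works * 10_000_000 + (1 - p_works) * 0
ev_normal_intervention = 1_000   # certain

print(f"EV of holocaust prevention: {ev_holocaust_prevention:,.0f} lives")
print(f"EV of normal intervention:  {ev_normal_intervention:,.0f} lives")
```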


Caveats: this ignores longtermist arguments, and the probability insects matter is << 50% imo

I don't know what you mean? You can look at existing interventions that primarily help very young people (neonatal or childhood vitamin supplementation) vs. comparably-effective interventions that target adults or older people (e.g. cash grants, schistosomiasis treatment).

There are multiple GiveWell charities in both categories, so this is just saying you should weight toward the ones that target older folks, by maybe a factor of 2x or more, vs. what GiveWell says (they assume the world won't change much).

Assuming two interventions are similarly effective in life-years saved, interventions saving old lives must (necessarily) save more lives in the short run. E.g. saving 4 lives granting 10 life-years each vs. saving 1 life granting 40 life-years.
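A toy sketch of that arithmetic: if life-years beyond some horizon are discounted by the chance the world gets upended by AI within that horizon, the "more lives, fewer life-years each" intervention loses less value. The horizon, probability, and intervention numbers below are all hypothetical:

```python
# Sketch: naively-equal interventions diverge once far-future life-years
# are discounted by the chance the world is upended by then. All numbers
# are hypothetical.

def discounted_life_years(lives, years_per_life, horizon, p_upended_after_horizon):
    """Life-years within the horizon count fully; years beyond it count
    only in the fraction of worlds not upended by then."""
    within = min(years_per_life, horizon)
    beyond = max(years_per_life - horizon, 0)
    return lives * (within + beyond * (1 - p_upended_after_horizon))

# Naively equal: 4 lives x 10 life-years == 1 life x 40 life-years.
old = discounted_life_years(lives=4, years_per_life=10, horizon=10, p_upended_after_horizon=0.5)
young = discounted_life_years(lives=1, years_per_life=40, horizon=10, p_upended_after_horizon=0.5)
print(old, young)   # the 'old lives' intervention retains more discounted value
```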

Some fraction of people who don't work on AI risk cite "wanting to have more certainty of impact" as their main reason. But I think many of them are running the same risk anyway: namely, that what they do won't matter because transformative AI will make their work irrelevant, or dramatically lower value.

This is especially obvious if they work on anything that primarily returns value after a number of years. E.g. building an academic career or any career where most impact is realized later, working toward policy changes, some movement-building things, etc.

But it also applies somewhat to things like nutrition or vaccination or even preventing deaths, where most value is realized later (by having better life outcomes, or by living an extra 50 years). Though this category does still have certainty of impact - just the amount of impact might be cut by whatever fraction of worlds are upended in some way by AI. And this might affect what they should prioritize... e.g. they should prefer saving old lives over young ones, if the interventions are pretty close on naive effectiveness measures.
