MichaelDickens

8151 karma
mdickens.me

Bio

Participation
2

I do independent research on EA topics. I write about whatever seems important, tractable, and interesting (to me).

I have a website: https://mdickens.me/ Much of the content on my website gets cross-posted to the EA Forum, but I also write about some non-EA stuff over there.

I used to work as a software developer at Affirm.

Sequences
1

Quantitative Models for Cause Selection

Comments
1051

Two quick thoughts:

  1. This is a neat idea. It's difficult to come up with safe preferences to encode in an ASI, and the concept of strong risk-aversion might help.
  2. A major obstacle (which I didn't see listed in section 8) is that currently we have no idea how to embed any set of preferences whatsoever in an ASI.
     2b. If we figure out how to encode risk-averse preferences in an ASI, then I'm not sure it makes sense to speak of it as "misaligned", because clearly we do know how to get it to pursue goals that we care about. It seems weird to expect that we won't know how to make ASI not want to tile the universe with paperclips, but we will know how to make it want to risk-aversely tile the universe with paperclips.

I think section 10 is pointing at something similar. I find it at least somewhat plausible that RL on risk aversion generalizes better than other kinds of RL. I would still be surprised if we could get risk aversion to generalize to ASI using anything resembling current techniques, but this seems like a better-than-average idea for preventing AI takeover.

Mass third world immigration threatens advanced civilization - This is the most controversial point, but I think it is intuitive. Influxes of third worlders into Europe have caused a large amount of political and social unrest that has come to a head in the past decade and a half. Statistics from Denmark show that pretty much all third world migrants are a net negative on how much they contribute versus how much is spent on them by the state, while first worlders are a net positive.

Seems like you are cherry-picking one particular statistic about state welfare expenditures and ignoring all other relevant evidence. Most importantly, you are ignoring the consensus among economists that immigration increases economic prosperity.

If you will allow me to respond to your cherry-picked evidence with my own cherry-picked evidence: The United States consists of 97% immigrants or descendants of immigrants, and it's the wealthiest country in the world.

Biggest (easily-fixable) outstanding issue is I still don't think it makes sense to model deployment by end-2026 because the IPO lockup probably won't have ended by then.

Robust alignment requires alignment-relevant intervention during pretraining

I'd say this is the wrong question. Like, I do not expect that any current alignment approach is going to work. If we do ever figure out what works, it will not look like "pretraining" or "post-training", it will be something completely different.

Although I guess you could call that "pretraining"?

Some commentary. I mostly agree with the page, but I will focus on the bits where I see room for improvement:

  • All 8 gates look correct to me, but they don't all deserve equal emphasis.
    • Gate 1 says IPOs have lock-ups. That's true but I basically don't think that matters because lock-ups are very predictable: they will announce how long it is, and that's exactly how long it will be. There's no uncertainty. The main reason it's relevant is that a lockup gives more time for AI valuations to fluctuate or collapse, but the text doesn't even mention this.
    • Gate 5 and Gate 7 seem like they're saying the same thing.
    • Gate 8 ("Bay social incentives") seems uninteresting since it's not a claim in the same category as the others. It's more like a meta-level reason why people might not think about the other 7 gates.
  • Would be cool for the BOTEC to use distributions rather than point estimates. (Squiggle is good for this, and Squigglehub even has a built-in way to have AI generate models.) IMO distributions are a lot more informative than point estimates.
  • It looks like the default estimates in the BOTEC are pulled from the sources, but it's not clear which estimates came from which sources. There should be inline citations.
  • "Anthropic valuation" variable should specifically be the valuation at the end of the 6-month lockup. Doesn't matter much for a point estimate but it would increase the variance if the variable were a probability distribution.
  • Unclear what the "Founder pledge" variable refers to. Is it the % of pledgers' wealth that they've pledged to donate? If so, the default of 80% seems really high?
  • "Employee committed pool" is defined in terms of dollars rather than as a % of company valuation, which seems weird. Shouldn't it depend on the value of the equity?
  • This model is supposed to illustrate how a lot of people are being too optimistic, but even then, I think most of the point estimates in the model are too optimistic. Consider that e.g. the median self-reported earner-to-give only donates (IIRC) 3% of their income.
  • IMO "Deployment by end-2026" should use a different date. IPO 3-6 months from now plus 6 months lockup means no money will be deployed in 2026, unless Anthropic does a fast IPO + early lockup release. Even by the end of 2027, you're talking about a 3-9 month turnaround time on lockup ending -> grants being disbursed. FTX Foundation donated $190 million (pre-clawbacks) in about 6 months, which was ridiculously fast compared to a typical foundation, and that was still a pretty small % of its long-term budget (or at least, what was believed to be its long-term budget before FTX collapsed).
  • I would delete the OpenAI Foundation bit because (1) the model has enough parameters already and (2) I doubt OpenAI Foundation will give much money to causes that look good by EA lights.
  • "Grantmaker capacity multiplier" seems nonsensical as written. Shouldn't the capacity max out at 1x? If grantmakers are a complete non-bottleneck, then the other parameters will dictate the amount disbursed; if they're a bottleneck, then the amount disbursed will be less. There's no way for grantmaker capacity to have a multiplier >1x.
    • Also this would make more sense as a dollar amount, not a multiplier. Like there's a fixed total amount that grantmakers can reasonably disburse. You could model it in a more complicated way but IMO a simple cap is the way to do it. Or maybe don't use this parameter at all. I think it's probably worth including, but keep it simple.
  • "Field absorption ceiling" is structured more sensibly than "Grantmaker capacity multiplier", but these two seem redundant because they're closely related. If orgs have more capacity to expand, grantmakers can deploy money faster by giving to those orgs. If there are more grantmakers, they can create more and bigger RFPs. etc. I would include one variable or the other, but not both.
  • "The steelman could be too pessimistic if founders or employees treat liquidity as an urgent moral obligation" – TBH the BOTEC as written seems to me like it's already pricing in that founders/employees will treat donations as urgent, e.g. it's implying that Anthropic money will be disbursed faster than FTX Foundation money, which itself was disbursed at historic speed. IMO the most likely reason why the model will end up underestimating is that Anthropic market cap ends up being like 10x higher than predicted.
  • My downward adjustments to the model aren't even the pessimistic case. The pessimistic case* is that the AI field collapses (investor funding dries up or something) and Anthropic stock is worth $0. Base rate says there's like a 50% chance that that will happen. Even optimistically, you should expect at least a 10–20% chance that Anthropic stockholders get nothing.
  • The recommendations under "How to plan if the skeptical case is live" don't really make sense. AFAIK ~zero orgs are planning as if they're guaranteed to get a huge pile of donations 1–2 years from now. I believe nonprofits mainly plan based on the money they already have on their books + short-term (<1 year) fundraising expectations. "How to plan if the skeptical case is live" is just "business as usual".
    • That section says "Funders and field builders should prioritize grantmaker capacity", but that's what to do if the skeptical case is wrong, not if the skeptical case is right.
  • "What would update this memo?" – as with the gates, no sense of prioritization is given. IMO by far the biggest uncertainty, about which we will get more information in the future, is: What will Anthropic's valuation be when the lockup ends? "Concrete donor vehicles" is also important evidence, but we won't get that until probably 6-24 months later.

*this is pessimistic for donations but I would actually prefer that this happen because it would lengthen timelines. so in a way it's the optimistic outcome
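
To make the suggestions above concrete, here is a rough sketch of a BOTEC that uses distributions instead of point estimates and treats grantmaker capacity as a simple dollar cap rather than a multiplier. It is written in Python rather than Squiggle, and every parameter name and number in it is a hypothetical placeholder, not a value from the memo or its sources. The structure (valuation at lockup end, a chance of collapse to $0, a pledged share, a follow-through rate, a cap) is just one plausible way to arrange the pieces.

```python
import math
import random
import statistics


def simulate_disbursed(
    n_trials=10_000,
    p_collapse=0.15,             # hypothetical: chance the equity is worth $0
    valuation_median=300.0,      # hypothetical: median valuation at lockup end ($B)
    valuation_sigma=0.6,         # hypothetical: log-scale spread of the valuation
    pledged_share=(0.01, 0.05),  # hypothetical: share of valuation pledged to donation
    follow_through=(0.2, 0.8),   # hypothetical: share of pledges actually donated
    capacity_cap=5.0,            # hypothetical: max $B grantmakers can disburse
    seed=0,
):
    """Return one simulated disbursed amount (in $B) per trial."""
    rng = random.Random(seed)
    mu = math.log(valuation_median)  # a lognormal's median is exp(mu)
    samples = []
    for _ in range(n_trials):
        if rng.random() < p_collapse:
            valuation = 0.0  # pessimistic case: stockholders get nothing
        else:
            valuation = rng.lognormvariate(mu, valuation_sigma)
        available = (
            valuation
            * rng.uniform(*pledged_share)
            * rng.uniform(*follow_through)
        )
        # Capacity is a cap, so it can only reduce the amount disbursed.
        samples.append(min(available, capacity_cap))
    return samples


def summarize(samples):
    """Report the 5th/95th percentiles, median, and mean of the samples."""
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "p5": cuts[0],
        "median": statistics.median(samples),
        "p95": cuts[-1],
        "mean": statistics.fmean(samples),
    }
```

Calling `summarize(simulate_disbursed())` gives a spread rather than a single number, which is the main point. Because the cap enters through `min`, it behaves as a pure bottleneck and can never push the disbursed amount above what the other parameters allow.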

Wouldn't this be true for any intelligent being, not just LLMs? Almost by definition, you're disproportionately bad at diagnosing your own mistakes, because otherwise you probably would've stopped making them.

I can't say I've ever noticed AI being disproportionately bad at understanding AI psychology. Can you say more about why you believe this?

I'd never thought of this argument, and it's obviously correct in retrospect.

Although "trying less hard" might not quite be pointing at the right thing. Reflecting on your epistemics / course of action could still be considered a form of "trying hard", so maybe it would be better to describe it as "rocketing in a particular pre-determined direction less hard".
