I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about
I think this model is kind of misleading, and that the original astronomical waste argument is still strong. It seems to me that a ton of the work in this model is being done by the assumption of constant risk, even in post-peril worlds. I think this is pretty strange. Here are some brief comments:
I like the process of proposing concrete models for things as a substrate for disagreement, and I appreciate that you wrote this. It feels much better to articulate objections like "I don't think this particular parameter should be constant in your model" than to have abstract arguments. I also like how it's now clearer that if you do believe that risk in post-peril worlds is constant, then the argument for longtermism is much weaker (although I think still quite strong because of my comments about v).
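To spell out why the constant-risk assumption does so much work, here's the kind of calculation I have in mind (treating the model as "value v per period, extinction risk r per period" is my simplification, used only to make the point concrete):

$$\mathrm{EV}_{\text{constant risk}} = \sum_{t=0}^{\infty} v\,(1-r)^{t} = \frac{v}{r}, \qquad \mathrm{EV}_{\text{post-peril}} \approx \Pr(\text{survive the perils}) \cdot \frac{v}{r'}.$$

With constant risk r = 0.1 per century, the expected future is capped at roughly 10v no matter how good post-peril worlds are; if risk instead falls to some much smaller r' once the perils are passed, the surviving branch contributes on the order of v/r', which can be astronomically larger. That structure is what the original astronomical waste argument relies on.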
I expect 10 people donating 10% of their time to be less effective than 1 person spending 100% of their time, because you don't get to reap the benefits of learning for the people donating 10%. Example: if people work for 40 years, then 10 people each donating 10% of their time gives you 10 person-years with 0 years of experience, 10 with 1 year, 10 with 2 years, and 10 with 3 years; whereas if someone does EA work full-time, you get 1 year with 0 years of experience, 1 with 1 year, 1 with 2 years, and so on. I expect 1 year with 20 years of experience to plausibly be as good/useful as 10 person-years with 3 years of experience. Caveats to the simple model:
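(Setting the caveats aside, here's a rough numeric sketch of the comparison above. The productivity-vs-experience curve below is a made-up assumption, purely for illustration; the point is only that both scenarios supply the same 40 person-years of work, but the full-timer spends far more of them at high experience levels.)

```python
# Illustrative sketch: 10 people donating 10% of a 40-year career
# vs. 1 person working full-time, under an assumed productivity curve.

def productivity(years_of_experience: float) -> float:
    # Assumed curve: output per person-year grows with experience,
    # with diminishing returns. Not a claim, just an illustration.
    return (1 + years_of_experience) ** 0.5

CAREER_YEARS = 40

# 10 people each donate 10% of their time: each accumulates ~4 years of
# EA experience over their career, so the group supplies 10 person-years
# at each experience level 0, 1, 2, 3.
part_time_output = sum(10 * productivity(exp) for exp in range(4))

# 1 person full-time: 1 person-year at each experience level 0..39.
full_time_output = sum(productivity(exp) for exp in range(CAREER_YEARS))

print(f"10 people at 10%: {part_time_output:.1f}")   # ~61
print(f"1 person at 100%: {full_time_output:.1f}")   # ~172
```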
I make a similar argument here.
Many people in EA depart from me here: they see choices that do not maximize impacts as personal mistakes. Imagine a button that, if you press it, would cause you to always take the impact-maximizing action for the rest of your life, even if it entails great personal sacrifice. Many (most?) longtermist EAs I talk to say they would press this button – and I believe them. That’s not true of me; I’m partially aligned with EA values (since impact is an important consideration for me), but not fully aligned.
I think there are people (e.g. me) who value things besides impact and would also press the button because of golden-rule-type reasoning. Many people optimize for impact to the point where it makes them less happy.
Note: I work for ARC.
I would consider someone a "pretty good fit" (whatever that means) for alignment research if they started out with a relatively technical background, e.g. an undergrad degree in math/CS, but without really having engaged with alignment before, and they were able to come up with a decent proposal after:
If someone starts from having thought about alignment a bunch, I would consider them a potentially "pretty good researcher" if they were able to come up with a decent proposal in 2-8 hours. I expect many existing (alignment) researchers to be able to come up with proposals in <1 hour.
Note that I'm saying "if (can come up with proposal in N hours), then (might be good alignment researcher)" and not saying the other implication also holds, i.e. I'm not claiming that "if (might be good alignment researcher), then (can come up with proposal in N hours)".
nit: link on "reasons" was pasted twice. For others it's https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
Also hadn't seen that paper. Thanks!
yes, thanks!