Ben Millwood🔸

4683 karma · Joined

Participation
3

  • Attended an EA Global conference
  • Attended an EAGx conference
  • Attended more than three meetings with a local EA group

Comments
533

Topic contributions
1

forgive the self-promotion, but here's a related Facebook post I made:

The law of conservation of expected evidence, E(E(X|Y)) = E(X), essentially states that you can't "expect to change your mind", in the sense that, if you already thought that your estimate of (say) some intervention's cost-effectiveness would go up by an average of Z after reading this study, then your EV should already have been Z higher before you read it. You should be balanced (in EV terms) between the possible outcomes that would be positive surprises and negative surprises, otherwise you're just not calculating your EVs correctly.

Anyway, let's take X to be global future welfare, and Y to be the consequences of some action you take. E(E(X|Y)) = E(X) means that the expected global well-being, averaged over the possible outcomes of your action, is exactly the same as the expected global well-being without conditioning on your action's outcome at all. So why did you bother doing it?
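For what it's worth, the identity here is just the law of total expectation; in the discrete case the derivation is a few lines:

```latex
% Law of total expectation (discrete case): E[E[X|Y]] = E[X]
\begin{aligned}
\mathbb{E}\big[\mathbb{E}[X \mid Y]\big]
  &= \sum_y P(Y = y)\,\mathbb{E}[X \mid Y = y]
   = \sum_y P(Y = y)\sum_x x\,P(X = x \mid Y = y) \\
  &= \sum_x x \sum_y P(X = x \mid Y = y)\,P(Y = y)
   = \sum_x x\,P(X = x)
   = \mathbb{E}[X].
\end{aligned}
```

And a quick numerical sanity check, using a made-up discrete joint distribution (the numbers and labels below are purely illustrative, not from the original post):

```python
# Check E(E(X|Y)) = E(X) for a small, arbitrary discrete joint distribution.
# X is a 0/1 "outcome" variable; Y is which of two (made-up) studies you read.

# joint[(x, y)] = P(X = x, Y = y)
joint = {
    (0, "study A"): 0.1,
    (1, "study A"): 0.3,
    (0, "study B"): 0.4,
    (1, "study B"): 0.2,
}

# E(X) straight from the joint distribution
e_x = sum(x * p for (x, _), p in joint.items())

# P(Y = y), then E(X | Y = y) for each y
p_y = {}
for (_, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

e_x_given_y = {
    y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]
    for y in p_y
}

# Average the conditional expectations over P(Y)
e_e_x_given_y = sum(e_x_given_y[y] * p_y[y] for y in p_y)

print(e_x, e_e_x_given_y)  # both 0.5 (equal up to floating-point error)
assert abs(e_x - e_e_x_given_y) < 1e-12
```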

I trust that you'll enforce this trademark against anyone who takes any actions with an unduly large impact on the world, requiring them to first apply for a license to do so.

This got me thinking:

|             | no name        | name   |
|-------------|----------------|--------|
| feedback    | anonymous form | normal |
| no feedback | shut up        | ???    |

Have you considered making a form where people can submit their names and nothing else?

Not that it's super important, but TVTropes didn't invent the phrase (nor do they claim they did), it's from Warhammer 40,000.

I downvoted this because I think this isn't independently valuable / separate enough from your existing posts to merit a new, separate post. I think it would have been better as a comment on your existing posts (and as I've said on a post by someone else about your reviews, I think we're better off consolidating the discussion in one place).

That said, I think the sentiments expressed here are pretty reasonable, and I think I would have upvoted this in comment form.

Someone on the forum said there were ballpark 70 AI safety roles in 2023

Just to note that the UK AI Security Institute employs more than 50 technical staff by itself and I forget how many non-technical staff, so this number may be due an update.

This doesn't seem right to me, because I think it's popular among those concerned with the longer-term future to expect it to be populated with emulated humans, which clearly isn't a continuation of the genetic legacy of humans. So I feel pretty confident that it's something else about humanity that people want to preserve against AI. (I'm not here to defend this particular vision of the future beyond noting that people like Holden Karnofsky have written about it, so it's not exactly niche.)

You say that expecting AI to have worse goals than humans would require studying things like what the empirically observed goals of AI systems turn out to be, and similar. Sure: so in the absence of having done those studies, we should delay our replacement until they can be done. And doing these studies is undermined by the fact that right now the state of our knowledge on how to reliably determine what an AI is thinking is pretty bad, and it will only get worse as AIs develop their abilities to strategise and lie. Solving these problems would be a major piece of what people are looking for in alignment research, and precisely the kind of thing it seems worth delaying AI progress for.

Another opportunity for me to shill my LessWrong writing posing this question: Should we exclude alignment research from LLM training datasets?

I don't have a lot of time to spend on this, but this post has inspired me to take a little time to figure out whether I can propose or implement some controls (likely: making posts visible to logged-in users only) in ForumMagnum (the software underlying the EA Forum, LW, and the Alignment Forum).

edit: https://github.com/ForumMagnum/ForumMagnum/issues/10345
