abergal

Program Associate at Open Philanthropy and chair of the Long-Term Future Fund. I spend half my time on AI and half my time on EA community-building. Any views I express on the forum are my own, not the views of my employer.

Comments

Why AI alignment could be hard with modern deep learning

Another potential reason for optimism is that we'll be able to use observations from early in systems' training runs (before models are very smart) to affect the pool of Saints / Sycophants / Schemers we end up with. I.e., we are effectively "raising" the adults we hire, so we may be able to detect whether 8-year-olds are likely to become Sycophants / Schemers as adults and discontinue or modify their training accordingly.

Open Philanthropy is seeking proposals for outreach projects

Sorry this was unclear! From the post:

There is no deadline to apply; rather, we will leave this form open indefinitely until we decide that this program isn’t worth running, or that we’ve funded enough work in this space. If that happens, we will update this post noting that we plan to close the form at least a month ahead of time.

I will bold this so it's clearer.

Open Philanthropy is seeking proposals for outreach projects

There's no set maximum; we expect to be limited by the number of applications that seem sufficiently promising, not the cost.

Taboo "Outside View"

Yeah, FWIW I haven't found any recent claims about insect comparisons particularly rigorous.

HIPR: A new EA-aligned policy newsletter

FWIW I had a similar initial reaction to Sophia's, though reading more carefully I totally agree that it's more reasonable to interpret your comment as a reaction to the newsletter rather than to the proposal. I'd maybe add an edit to your high-level comment just to make sure people don't get confused?

Ben Garfinkel's Shortform

Really appreciate the clarifications! I think I was interpreting "humanity loses control of the future" in a weirdly temporally narrow sense, where "humanity" refers to present-day humans rather than humans at any given time period, which makes it all about outcomes. I totally agree that future humans may have less freedom to choose the outcome in a way that's not a consequence of alignment issues.

I also agree that value drift hasn't historically driven long-run social change, though I kind of do think it will going forward, as humanity gains more power to shape its environment at will.

Ben Garfinkel's Shortform

Wow, I just learned that Robin Hanson has written about this, because obviously, and he agrees with you.
