




I work at Open Philanthropy, and in the last few months I took on much of our technical AI safety grantmaking. 

In November and December, Jacob sent me a list of academics he felt someone at Open Phil should reach out to and solicit proposals from. I was interested in these opportunities, but at the time I was working full-time on processing grant proposals that came in through Open Philanthropy's form for grantees affected by the FTX crash, and wasn't able to take them on. 

This work tailed off in January, and since then I've focused on a few bigger grants, some writing projects, and thinking through how I should approach further grantmaking. I think I should have reached out to at least a few of the people Jacob suggested earlier (e.g. in February). I didn't make any explicit decision to reject someone that Jacob thought was a slam dunk because I disagreed with his assessment — rather, I was slower to reach out to talk to people he thought I should fund than I could have been. 

I plan to talk to several of the leads Jacob sent my way in Q2, and (while I would plan to think through the case for these grants myself to the extent I can) I expect to end up agreeing a lot with Jacob's assessments. 

With that said, Jacob and I do have more nebulous higher-level disagreements about things like how truth-tracking academic culture tends to be and how much academic research has contributed to AI alignment so far, and in some indirect way these disagreements probably contributed to my prioritizing these reach-outs less highly than someone else might have. 


Here's the link to the various podcasting services it's on! https://www.buzzsprout.com/2160905/share 


(I work at Open Phil, speaking for myself)

FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.


I'm really sorry that you and so many others have this experience in the EA community. I don't have anything particularly helpful or insightful to say -- the way you're feeling is understandable, and it really sucks :(

I just wanted to say I'm flattered and grateful that you found some inspiration in that intro talk I gave. These days I'm working on pretty esoteric things, and can feel unmoored from the simple and powerful motivations which brought me here in the first place -- it's touching and encouraging to get some evidence that I've had a tangible impact on people.


I can give a sense of my investment, though I'm obviously an unusual case in multiple ways. I'm a coauthor on the report but I'm not an ARC researcher, and my role as a coauthor was primarily to try to make it more likely that the report would be accessible to a broader audience, which involved making sure my own "dumb questions" were answered in the report.

I kept time logs, and the whole project of coauthoring the report took me ~100 hours. By the end I had one "seed" of an ELK idea but unfortunately didn't flesh it out because other work/life things were pretty hectic. Getting to this "seed" was <30 min of investment.

I think if I had started with the report in hand, it would have taken me ~10 hours to read it carefully enough and ask enough "dumb questions" to get to the point of having the seed of an idea about as good as that idea, and then another ~10 hours to flesh it out into an algorithm + counterexample. I think the probability I'd have won the $5000 prize after that investment is ~50%, making the expected investment 40h (20 hours per attempt divided by a 50% success rate). I think there's a non-trivial but not super high chance I'd have won larger prizes, so the $/hour ratio is significantly better in expectation than $125/hr (since the ceiling for the larger prizes is so much higher).
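The back-of-the-envelope arithmetic above can be written out explicitly. This is just a sketch using the rough guesses from the comment (hours, win probability, prize size), not precise figures:

```python
# Rough expected-value estimate for attempting the smallest ELK prize.
hours_to_seed = 10       # read the report carefully, ask "dumb questions"
hours_to_flesh_out = 10  # turn the seed into an algorithm + counterexample
p_win = 0.5              # estimated chance the result wins the $5000 prize
prize = 5000             # smallest prize, in dollars

hours_per_attempt = hours_to_seed + hours_to_flesh_out
expected_hours_per_win = hours_per_attempt / p_win  # 20 / 0.5 = 40 hours
dollars_per_hour = prize / expected_hours_per_win   # 5000 / 40 = $125/hr

print(expected_hours_per_win, dollars_per_hour)  # 40.0 125.0
```

Since the larger prizes have a much higher ceiling, the true expected $/hour is somewhat above this floor.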

My background: 

  • I have a fairly technical background, though I think the right way to categorize me is as "semi-technical" or "technical-literate." I did computer science in undergrad and enjoyed it / did well, but my day-to-day work mainly involves writing. I can do simple Python scripting, and I can sometimes, slowly and painfully, do the kinds of algorithms problem sets I did quickly in undergrad.
  • Four years ago I wrote this to explain what I understood of Paul's research agenda at the time.
  • I've been thinking about AI alignment a lot over the last year, and especially have the unfair advantage of getting to talk to Paul a lot.  With that said, I didn't really know much or think much about ELK specifically (which I consider pretty self-contained) until I started writing the report, which was late Nov / early Dec.

ARC would be excited for you to send a short email to elk@alignmentresearchcenter.org with a few bullet points describing your high level ideas, if you want to get a sense for whether you're on the right track / whether fleshing them out would be likely to win a prize.


I was imagining Sycophants as an outer alignment failure, assuming the model is trained with naive RL from human feedback.


Not intended to express a significantly shorter timeline; 15-30 years was supposed to be a range of "plausible/significant probability," which the previous model also implied (it put >10% probability on 15 years and ~50% on 30 years). Sorry that wasn't clear!

(JTBC I think you could train a brain-sized model sooner than my median estimate for TAI, because you could train it on shorter horizon tasks.)


Ah yeah, that makes sense -- I agree that a lot of the reason for low commercialization is local optima, and also agree that there are lots of cool/fun applications that are left undone right now.
