Why AI alignment could be hard with modern deep learning

Not intended to be expressing a significantly shorter timeline; 15-30 years was supposed to be a range of "plausible/significant probability" which the previous model also said (probability on 15 years was >10% and probability on 30 years was 50%). Sorry that wasn't clear!

(JTBC I think you could train a brain-sized model sooner than my median estimate for TAI, because you could train it on shorter horizon tasks.)

AMA: Ajeya Cotra, researcher at Open Phil

Ah yeah, that makes sense -- I agree that a lot of the reason for low commercialization is local optima, and also agree that there are lots of cool/fun applications that are left undone right now.

AMA: Ajeya Cotra, researcher at Open Phil

To clarify, we are planning to seek more feedback from people outside the EA community on our views about TAI timelines, but we're seeing that as a separate project from this report (and may gather feedback from outside the EA community without necessarily publicizing the report more widely).

AMA: Ajeya Cotra, researcher at Open Phil

Finally, have you talked much to people outside the alignment/effective altruism communities about your report? How have reactions varied by background? Are you reluctant to publish work like this broadly? If so, why? Do you see risks of increasing awareness of these issues pushing unsafe capabilities work?


I haven't engaged much with people outside the EA and AI alignment communities, and I'd guess that very few people outside these communities have heard about the report. I don't personally feel sold that the risks of publishing this type of analysis more broadly (in terms of potentially increasing capabilities work) outweigh the benefits of helping people better understand what to expect with AI and giving us a better chance of figuring out if our views are wrong. However, some other people in the AI risk reduction community who we consulted (TBC, not my manager or Open Phil as an institution) were more concerned about this, and I respect their judgment, so I chose to publish the draft report on LessWrong and avoid doing things that could result in it being shared much more widely, especially in a "low-bandwidth" way (e.g. just the "headline graph" being shared on social media).

AMA: Ajeya Cotra, researcher at Open Phil

Thanks! I'll answer your cluster of questions about takeoff speeds and commercialization in this comment and leave another comment responding to your questions about sharing my report outside the EA community.

Broadly speaking, I do expect that transformative AI will be foreshadowed by incremental economic gains; I generally expect gradual takeoff, meaning I would bet that at some point growth will be ~10% per year before it hits 30% per year (which was the arbitrary cut-off for "transformative" used in my report). I don't think it's necessarily the case; I just think it'll probably work this way. On the outside view, that's how most technologies seem to have worked. And on the inside view, it seems like there are lots of valuable-but-not-transformative applications of existing models on the horizon, and industry giants + startups are already on the move trying to capitalize.

My views imply a roughly 10% probability that the compute to train transformative AI would be affordable in 10 years or less, which wouldn't really leave time for this kind of gradual takeoff. One reason it's a pretty low number is that it would imply sudden takeoff and I'm skeptical of that implication (though it's not the only reason -- I think there are separate reasons to be skeptical of the Lifetime Anchor and the Short Horizon Neural Network anchor, which drive short timelines in my model).

I don't expect that several generations of more powerful successors to GPT-3 will be developed before we see significant commercial applications of GPT-3; I expect commercialization of existing models and scaleup to larger models to be happening in parallel. There are already various applications online, e.g. AI Dungeon (based on GPT-3), TabNine (based on GPT-2), and this list of other apps. I don't think that evidence OpenAI was productizing GPT-3 would shift my timelines much either way, since I already expect them to be investing pretty heavily in this.

Relative to the present, I expect the machine learning industry to invest a larger share of its resources going forward into commercialization, as opposed to pure R&D: before this point a lot of the models studied in an R&D setting just weren't very useful (with the major exception of vision models underlying self-driving cars), and now they're starting to be pretty useful. But at least over the next 5-10 years I don't think that would slow down scaling / R&D much in an absolute sense, since the industry as a whole will probably grow, and there will be more resources for both scaling R&D and commercialization.

AMA: Ajeya Cotra, researcher at Open Phil

I haven't thought very deeply about this, but my first intuition is that the most compelling reason to expect to have an impact that predictably lasts longer than several hundred years without being washed out is the possibility of some sort of "lock-in" -- technology that allows values and preferences to be more stably transmitted into the very long-term future than current technology allows. For example, the ability to program space probes with instructions for creating the type of "digital life" we would morally value, with error-correcting measures to prevent drift, would count as a technology that allows for effective lock-in in my mind.

A lot of people may act as if we can't impact anything post-transformative AI because they believe technology that enables lock-in will be built very close in time after transformative AI (since TAI would likely cause R&D towards these types of tech to be greatly accelerated).

AMA: Ajeya Cotra, researcher at Open Phil

  1. I think "major insights" is potentially a somewhat loaded framing; it seems to imply that only highly conceptual considerations that change our minds about previously-accepted big picture claims count as significant progress. I think very early on, EA produced a number of arguments and considerations which felt like "major insights" in that they caused major swings in the consensus of what cause areas to prioritize at a very high level; I think that probably reflected that the question was relatively new and there was low-hanging fruit. I think we shouldn't expect future progress to take the form of "major insights" that wildly swing views about a basic, high-level question as much (although I still think that's possible).
  2. Since 2015, I think we've seen good analysis and discussion of AI timelines and takeoff speeds, discussion of specific AI risks that go beyond the classic scenario presented in Superintelligence, better characterization of multipolar and distributed AI scenarios, some interesting and more quantitative debates on giving now vs giving later and "hinge of history" vs "patient" long-termism, etc. None of these have provided definitive / authoritative answers, but they all feel useful to me as someone trying to prioritize where Open Phil dollars should go.
  3. I'm not sure how to answer this; I think taking into account the expected low-hanging fruit effect, and the relatively low investment in this research, progress has probably been pretty good, but I'm very uncertain about the degree of progress I "should have expected" on priors.
  4. I think ideally the world as a whole would be investing much more in this type of work than it is now. A lot of the bottleneck to this is that the work is not very well-scoped or broken into tractable sub-problems, which makes it hard for a large number of people to be quickly on-boarded to it.
  5. Related to the above, I'd love for the work to become better-scoped over time -- this is one thing we prioritize highly at Open Phil.

AMA: Ajeya Cotra, researcher at Open Phil

My answer to this one is going to be a pretty boring "it depends" unfortunately. I was speaking to my own experience in responding to the top level question, and since I do a pretty "generalist"-y job, improving at general reasoning is likely to be more important for me. At least when restricting to areas that seem highly promising from a long-termist perspective, I think questions of personal fit and comparative advantage will end up determining the degree to which someone should be specialized in a particular topic like machine learning or biology.

I also think that often someone who is a generalist in terms of topic areas still specializes in a certain kind of methodology, e.g. researchers at Open Phil will often do "back of the envelope calculations" (BOTECs) in several different domains, effectively "specializing" in the BOTEC skillset.

AMA: Ajeya Cotra, researcher at Open Phil

Yes, I meant that the version of long-termism we think about at Open Phil is animal-inclusive.

AMA: Ajeya Cotra, researcher at Open Phil

Personally, I don't do much explicit, dedicated practice or learning of either general reasoning skills (like forecasts) or content knowledge (like Anki decks); virtually all of my development on these axes comes from "just doing my job." However, I don't feel strongly that this is how everyone should be -- I've just found that this sort of explicit practice holds my attention less and subjectively feels like a less rewarding and efficient way to learn, so I don't invest in it much. I know lots of folks who feel differently, and do things like Anki decks, forecasting practice, or both.
