PhD Student @ NYU Alignment Research Group
83 karma · Joined Dec 2019



My deeply concerning impression is that OpenPhil (and the average funder) has timelines 2-3x longer than the median safety researcher. Daniel has his AGI training requirements set to 3e29, and I believe the 15th-85th percentiles among safety researchers would span 1e31 +/- 2 OOMs. On that view, Tom's default values are off in the tails.
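To make the arithmetic behind these figures concrete, here is a minimal sketch. It assumes the numbers are training compute in FLOP (the comment does not state the unit) and, purely for illustration, that the researcher distribution is normal in log10 space. That distributional shape is an assumption of the sketch, not a claim made above, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Figures quoted above. The unit (FLOP) is an assumption of this sketch.
DANIEL_FLOP = 3e29
MEDIAN_FLOP = 1e31
HALF_WIDTH_OOMS = 2.0  # "+/- 2 OOMs" between the 15th and 85th percentiles


def percentile_band(median_flop, half_width_ooms):
    """Return the (low, high) values implied by median +/- N orders of magnitude."""
    log_median = math.log10(median_flop)
    low = 10 ** (log_median - half_width_ooms)
    high = 10 ** (log_median + half_width_ooms)
    return low, high


def implied_percentile(value_flop, median_flop, half_width_ooms, upper_q=0.85):
    """Percentile of value_flop under the ASSUMED normal-in-log10 distribution."""
    # Choose sigma so that the upper_q quantile sits half_width_ooms above the median.
    z_upper = NormalDist().inv_cdf(upper_q)
    sigma = half_width_ooms / z_upper
    dist = NormalDist(mu=math.log10(median_flop), sigma=sigma)
    return dist.cdf(math.log10(value_flop))


low, high = percentile_band(MEDIAN_FLOP, HALF_WIDTH_OOMS)
pct = implied_percentile(DANIEL_FLOP, MEDIAN_FLOP, HALF_WIDTH_OOMS)
print(f"15th-85th percentile band: {low:.0e} to {high:.0e}")
print(f"3e29 falls at about the {pct:.0%} mark under the assumed shape")
```

Under these assumptions, 3e29 lies inside the stated band but toward its lower edge, which is consistent with the comment treating it as a short-timelines setting rather than an outlier.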

My suspicion is that funders write off this discrepancy, if they notice it, as inside-view bias, i.e. they think safety researchers self-select for scaling optimism. My admittedly very crude mental model of an OpenPhil funder makes two further mistakes in this vein: (1) Mistakenly taking the Cotra report's biological anchors weighting as a justified default setting of parameters rather than an arbitrary choice which should be updated given recent evidence. (2) Far overweighting the semi-informative priors report despite semi-informative priors abjectly failing to predict Turing-test-level AI progress. Semi-informative priors apply to large-scale engineering efforts, which for the AI domain have meant AGI and the Turing test. Insofar as funders admit that the engineering challenges involved in passing the Turing test have been solved, they should discard semi-informative priors as failing to be predictive of AI progress.

To be clear, I see my empirical claim about disagreement between the funding and safety communities as most important -- independently of my diagnosis of this disagreement. If this empirical claim is true, OpenPhil should investigate cruxes separating them from safety researchers, and at least allocate some of their budget on the hypothesis that the safety community is correct. 

In my opinion, the applications of prediction markets are much more general than these. I have a bunch of AI safety inspired markets up on Manifold and Metaculus. I'd say the main purpose of these markets is to direct future research and study. I'd phrase this use of markets as "a sub-field prioritization tool". The hope is that markets would help me integrate information such as (1) a methodology's scalability, e.g. in terms of data, compute, and generalizability; (2) research directions' rate of progress; and (3) the diffusion of a given research direction through the rest of academia and into applications.

Here are a few more markets to give a sense of what other AI research-related markets are out there: Google Chatbot, $100M open-source model, retrieval in gpt-4

Seems to me safety timeline estimation should be grounded by a cross-disciplinary, research timeline prior. Such a prior would be determined by identifying a class of research proposals similar to AI alignment in terms of how applied/conceptual/mathematical/funded/etc. they are and then collecting data on how long they took. 

I'm not familiar with meta-science work, but this would probably involve doing something like finding an NSF (or DARPA) grant category where grants were made public historically and then tracking down what became of those lines of research. Grant-based timelines are likely more analogous to individual sub-questions of AI alignment than the field as a whole; e.g. the prospects for a DARPA project might be comparable to the prospects for working out the details of debate. Converting such data into a safety timelines prior would probably involve estimating how correlated progress is on grants within subfields.

Curating such data and constructing such a prior would be useful both for informing the above estimates and for identifying factors of variation which might be intervened on, e.g. how many research teams should be funded to work on the same project in theoretical areas? This timelines prior problem seems like a good fit for a prize, where entries would look like recent progress studies reports (cf. here and here).

Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees' crux?

It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.

A few thoughts on ML/AI safety which may or may not generalize:

You should read successful candidates' SOPs to get a sense of style, level of detail, and content, cf. 1, 2, 3. Ask current EA PhDs for feedback on your statement. Probably avoid writing a statement focused on an AI safety/EA idea which is not in the ML mainstream, e.g. IDA, mesa-optimization, etc. If you have multiple research ideas, consider writing more than one (i.e. tailored) SOP and submitting the SOP which is most relevant to faculty at each university.

Look at groups' pages to get a sense of the qualification distribution for successful applicants; this is a better way to calibrate where to apply than looking at rankings IMO. This is also a good way to calibrate how much experience you're expected to have pre-PhD. My impression is that in many ML programs it is very difficult to get in directly out of undergrad if you do not have an exceptional track record, e.g. top publications or high Putnam scores.

For interviews, bringing up concrete ideas on next steps for a professor's paper is probably very helpful.

My vague impression is that financial insecurity and depression are less of a concern here than in other fields, as you can probably find job opportunities partway through if either becomes problematic. Would be interested to hear disagreement.

On-demand Software Engineering Support for Academic AI Safety Labs

AI safety work, e.g. in RL and NLP, involves both theoretical and engineering work, but academic training and infrastructure do not optimize for engineering. An independent non-profit could cover this shortcoming by providing software engineers (SWEs) as contractors, code reviewers, and mentors to academics working on AI safety. AI safety research is often well funded, but even grant-rich professors are bottlenecked by university salary rules and professor hours, which make hiring competent SWEs at market rate challenging. An FTX Foundation funded organization could get around these bottlenecks by independently vetting SWEs, offering industry-competitive salaries, and then having the hired SWEs collaborate with academic safety researchers at no cost to the lab. If successful, academic AI safety work ends up faster in terms of researcher hours and higher impact because papers are accompanied by more legible and standardized code bases -- i.e. AI safety work ends up looking more like distill. The potential impact of this proposal could be estimated by soliciting input from researchers who moved from academic labs to private AI safety organizations.

EDIT: This seems to already exist at https://alignmentfund.org/

Re: feasibility of AI alignment research, Metaculus already has Control Problem solved before AGI invented. Do you have a sense of what further questions would be valuable?

Ok, seems like this might have been more a terminological misunderstanding on my end. I think I agree with what you say here, 'What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm'.

Ok, interesting. I suspect the programmers will not be able to easily inspect the inner algorithm, because the inner/outer distinction will not be as clear cut as in the human case. The programmers may avoid sitting around by fiddling with more observable inefficiencies e.g. coming up with batch-norm v10.

Good clarification. Determining which kinds of factoring are the ones which reduce valence is more subtle than I had thought. I agree with you that the DeepMind set-up seems more analogous to neural nociception (e.g. high heat detection). My proposed set-up (Figure 5) seems significantly different from the DM/nociception case, because it factors the step where nociceptive signals affect decision making and motivation. I'll edit my post to clarify.
