Information security considerations for AI and the long term future

Thanks for the post! This seems like a clearly important and currently quite neglected area and I'd love to see more work on it.

My current hot-take is that it seems viable to make AGI research labs a sufficiently hardened target that most actors cannot exploit them. But I don't really see a path to preventing the most well-resourced state actors from at least exfiltrating source code. There are just so many paths to this: getting insiders to defect, supply chain attacks, etc. Because of this I suspect it'll be necessary to get major state actors to play ball by other mechanisms (e.g. international treaties, mutually assured destruction). I'm curious if you agree or are more optimistic on this point?

I also want to note that espionage can reduce x-risk in some cases: e.g. actors may be less tempted to cut corners on safety if they have intelligence that their competitors are still far away from transformative AI. Similarly, it could be used as an (admittedly imperfect) mechanism for monitoring compliance with treaties or more informal agreements. I do still expect better infosec to be net-positive, though.

Free-spending EA might be a big problem for optics and epistemics

Making bets on new ambitious projects doesn't seem necessarily at odds with frugality: you can still execute on them in a lean way, even though some things just really do take a big CapEx. Granted, whether Google or any major tech company really does this is debatable, but I do think they tend to at least try to instill it, even if there is some inefficiency e.g. due to principal-agent problems.

Free-spending EA might be a big problem for optics and epistemics

Thanks for writing this post. This is an area I've also sometimes felt concerned about, so it's great to see some serious discussion.

A related point that I haven't seen called out explicitly is that monetary costs are often correlated with other more significant, but less visible, costs such as staff time. While I think the substantial longtermist funding overhang really does mean we should spend more money, I think it's still very important that we scrutinize where that money is being spent. One example that I've seen crop up a few times is retreats or other events being organized at very short notice (e.g. less than two weeks). In most of these cases there's not been a clear reason why it needs to happen right now, and can't wait a month or so. There's a monetary cost to doing things last minute (e.g. more expensive flights and hotel rooms) but the biggest cost is that the event will be less effective than if the organizers and attendees had more time to plan for it.

More generally I'm concerned that too much funding can have a detrimental effect on organisational culture. It's often possible to make a problem temporarily go away just by throwing money at it. Sometimes that's the right call (focus on core competencies), but sometimes it's better to fix the structural problem before an organisation scales and it gets baked in. Anecdotally it seems like many of the world's most successful companies do try to make frugality part of their culture, e.g. it's one of Amazon's leadership principles.

In general, being inefficient at a small scale can still end up being very impactful if you work on the right problem. But I think to make a serious dent on the world's problems, we're likely going to need some mega-projects, spending billions of dollars with large headcount. Inefficiency at that scale is likely to result in project failure: oversight and incentives only get harder. So it seems critical that we continue to develop the ability in EA to execute on projects efficiently, even if in the short-term we might achieve more by neglecting that.

I do feel a bit confused about what to do in practice to address these problems, and would love to see more thinking on it. For individual decisions, I've found it helpful to figure out what my time (in some context) is worth and stick to it for time-money tradeoffs. In general I'd be suspicious if someone is always choosing to spend money when it saves time, or vice-versa. For funding decisions, these concerns are one of the reasons I lean towards keeping the bar for funding relatively high even if that means we can't immediately deploy funding. I also support vetting people carefully to avoid incentivizing people to pretend to be longtermists (or funding people who just have very bad epistemics).

I think it's important to distinguish people's expectations and the reality of what gets rewarded. Both matter: if people expect something to be unrewarding, they won't do it even if it would be appreciated; and perhaps even worse, if people expect to get rewarded for something but in fact there is limited support, they may waste time going down a dead end.

Another axis worth thinking about is what kind of rewards are given. The post prompts for social rewards, but I'm not sure why we should focus on this specifically: things like monetary compensation, work-life balance, location, etc. all matter and are determined at least in part by EA orgs' and grantmakers' decisions. Even if we focus on social rewards, does this look like gratitude from your colleagues, being invited to interesting events, having social media followers, a set of close friends you like, ...? All of these can be rewarding, but the amount of weight people put on each varies a lot. I think it helps to be precise here, as otherwise two people might disagree about how rewarding a role is, even though they agree about the facts of the matter.

Off the top of my head, here are some categories of people who I think often get rewarded too much / too little by the movement:

Overrated: AI safety researchers

I am an AI safety researcher, so I'll start with deprecating myself! To be clear, I think AI safety should be a priority, and people who are making progress here deserve resources to let them scale up their research. But it seems to sometimes be put on a pedestal I don't think it really belongs on. Biosecurity, cause prioritization, improving institutional decision making, etc. all seem within an order of magnitude of AI at least -- and people's relative fit for the area can dwarf that. I think this is one of the cases where perception is more skewed than reality: e.g. although the bar for funding AI safety research does seem a bit lower than other areas, I've generally seen promising projects / people in other areas be able to attract funding relatively easily too.

I'd also like to see more critical evaluation of people's research agendas. I see more deference than I'm comfortable with. It's a tricky balance: we don't want to strangle a research agenda at birth just because it doesn't fit our preconceptions. So I think it makes sense to give individuals a decent amount of runway to pursue novel approaches. But I think accountability can actually help make people more productive, both by motivating them and giving useful feedback. Given time constraints I think it's OK to sometimes fund or otherwise support people without having a good inside view of why their work matters, but I'd like to see people be more explicit about that in their own reasoning and communication with others so we don't get a positive feedback loop. Concretely I've fairly often wished I could fund someone without giving an implied endorsement -- not because I think their work is bad, I'm just not confident.

Overrated: Parroting popular arguments

There's often a lot of deference to the opinions of high-status figures in the EA community. I don't think this is necessarily bad per se: no one has time to look into every possible issue, so relying on expert opinion is a necessary shortcut. However, the question then arises, how are the so-called experts selected?

A worrying trend I've seen is that people who agree with the current in-vogue opinion and parrot the popular arguments often seem to be given more epistemic credit than they deserve, while those who try hard to form their own opinions, and sometimes make mistakes, are more likely to be viewed with skepticism. The tricky thing here is the "parrots" are right more often than the "independent thinkers" -- but the marginal contribution of the parrots to the debate is approximately zero.

I'm not sure how to fix this. I think one thing that can help is rewarding people for having good reasons for doing what they're working on, rather than you agreeing with their outcome per se. So, if I meet someone who is e.g. working on AI safety but does not seem to have a strong grasp of the arguments for it or why they're a good fit for it, I might encourage them to look at other options. Whereas if I meet someone working on e.g. asteroid deflection, which I'd personally guess is much less impactful, I'd be supportive of them if they had decent responses to my critique (even if I'm not convinced by the response).

Underrated: Micro-entrepreneurship

A key part of entrepreneurship is identifying an opportunity others are overlooking, and then taking initiative to exploit that opportunity. Entrepreneurs in the "Silicon Valley startup" form are adequately rewarded (although I'll note it's common for founders to face intense skepticism early on, before the idea is validated). But there are opportunities to apply this style of thinking and work at varying scales: setting up a new community event, helping an org you join run better, etc. These are often taken for granted, especially since once the idea has been executed, it may often seem trivial. But such "obvious" ideas frequently languish for many years as no one bothers to act on them.

For example, during my PhD at CHAI, I helped scale up an internship program, fundraised for and helped run a program to give cash grants to other PhD students (not myself) who were being held back by funding constraints, helped lead meetings to help integrate new PhD students, fundraised for and helped set up a compute cluster, etc. None of these were particularly hard: I believe most other people in the group could have done them. But they didn't, and I expect <50% of them would have happened if I hadn't taken initiative.

I wasn't rewarded for these particularly, and they'll do little to help me in a research career. But I actually count this as a success case -- in many orgs I wouldn't have even had the freedom to take these actions! So I'd encourage leaders of orgs to at the least try to give your individual contributors freedom to take leadership of useful projects, and where possible try to reward them for it, even if it's not incentivized by the broader ecosystem.

Underrated: Direct work outside the community

Working directly for an EA org is rewarding in many ways (social connection, prestige), although by no means all (compensation is low to middling relative to what many of the individuals could earn). But there are lots of direct paths to impact that don't involve working with EAs!

For example, if you want to improve institutional decision-making, it might make sense to spend at least some time working at the kind of large governmental institution you seek to later reform. Even the most ardent civil servant would not claim that large government bureaucracies are a particularly exciting place to work.

Similarly, I see a lot of people working on AI safety at a handful of labs that have made safety a priority: e.g. DeepMind, OpenAI, Anthropic, Redwood. This makes a decent amount of sense, but might there not be considerable value working at a company that might build powerful AI which currently has few internal experts on safety, such as Google Brain or Meta AI Research? This isn't for everyone: you ideally should have some seniority already, and need strong communication skills to get leadership and other teams excited by your work. But I expect it could be higher impact, by getting a new group of people to work on safety problems, and helping ensure that any systems those labs build are aligned.

One thing that could help here is having a strong community outside of workplaces and narrow geographical hubs. Another is evaluating people's careers more by their long-term trajectory, and not just what they're working on right now, noting that direct impact outside EA orgs will often by necessity involve some work that by our lights would be of limited impact.

EA Infrastructure Fund: May–August 2021 grant recommendations

That being said, we might increase our funding threshold if we learn that few grants have been large successes, or if more funders are entering the space.

My intuition is that more funders entering the space should lower your bar for funding, as it'd imply there's generally more money in this space going after the same set of opportunities. I'm curious what the reasoning behind this is, e.g. unilateralist curse considerations?

Democratising Risk - or how EA deals with critics

First of all, I'm sorry to hear you found the paper so emotionally draining. Having rigorous debate on foundational issues in EA is clearly of the utmost importance. For what it's worth, when I'm making grant recommendations I'd view criticizing orthodoxy (in EA or other fields) as a strong positive so long as it's well argued. While I do not wholly agree with your paper, it's clearly an important contribution, and has made me question a few implicit assumptions I was carrying around.

The most important updates I got from the paper:

  1. Put less weight on technological determinism. In particular, defining existential risk in terms of a society reaching "technological maturity" without falling prey to some catastrophe frames technological development as being largely inevitable. But I'd argue even under the "techno-utopian" view, many technological developments are not needed for "technological maturity", or at least not for a very long time. While I still tend to view development of things like advanced AI systems as hard to stop (lots of economic pressures, geographically dispersed R&D, no expert consensus on whether it's good to slow down/accelerate), I'd certainly like to see more research into how we can affect the development of new technologies, beyond just differential technological advancement.
  2. "Existential risk" is ambiguous, and so hard to study formally; we might want to replace it with more precise terms like "extinction risk" that are down-stream of some visions of existential risk. I'm not sure how decision relevant this ends up being: I think disagreement about how the world will unfold explains more of the disagreement on x-risk probabilities than definitions of x-risk do, but it does seem worth trying to pin down more precisely.
  3. "Direct" vs "indirect" x-risk is a crude categorization, as most hazards lead to risks via a variety of pathways. Taking AI: there are some very "direct" risks such as a singleton AI developing some superweapon, but also some more "indirect" risks such as an economy of automated systems gradually losing alignment with collective humanity.

My main critiques:

  1. I expect a fairly broad range of worldviews end up with similar conclusions to the "techno-utopian approach" (TUA). The key beliefs seem to be that: (a) substantially more value is present in the future than exists today; (b) we have a moral obligation to safeguard that. The TUA is a very strong version of this, where there is many orders of magnitude more value in the future (transhumanism, total utilitarianism) and moral obligation is equal in the future and present (strong longtermism). But a non-transhumanist who wants 8 billion non-modified, biological humans to continue happily living on Earth for the next 100,000 years and values future generations at 1% of current generations would for many practical purposes make the same decisions.
  2. I frequently found myself unsure if there was actually a concrete disagreement between your views and those in the x-risk community, including those you criticize, beyond a choice of framing and emphasis. I understand it can be hard to nail down a disagreement, but this did leave me a little unsatisfied. For example, I'm still unsure what it really means to "democratise research and decision-making in existential risk" (page 26). I think almost all x-risk researchers would welcome more researchers from complementary academic disciplines or philosophical bents, and conversely I expect you would not suggest that random citizen juries should start actively participating in research. One concrete question I had is what axes you'd be most excited for the x-risk research field to become more diverse on at the margin: academic discipline, age, country, ethnicity, gender, religion, philosophical views, ...?
  3. Related to the above, it frequently felt like the paper was arguing against uncharitable versions of someone else's views -- VWH is an example others have brought up. On reflection, I think there is value to this, as many people may be holding those versions of the person's views even if the individual themselves had a more nuanced perspective. But it did often make me react "but I subscribe to <view X> and don't believe <supposed consequence Y>"! One angle you could consider taking in future work is to start by explaining your most core disagreements with a particular view, and then go on to elaborate on problems with commonly held adjacent positions.
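
To make the arithmetic behind the first critique concrete, here is a rough back-of-the-envelope sketch. The 100,000-year horizon and the 1% weight on future generations come from the example above; the 25-year generation length and the assumption of a constant population (so that the 8 billion figure cancels out) are illustrative assumptions of mine, not claims from the paper.

```python
# Back-of-the-envelope sketch: how much weighted value sits in the future
# relative to the present generation, under the non-transhumanist example.

HORIZON_YEARS = 100_000          # from the example above
FUTURE_WEIGHT = 0.01             # future generations valued at 1% of the current one
GENERATION_LENGTH_YEARS = 25     # ASSUMPTION: illustrative generation length


def future_to_present_ratio(horizon_years, generation_years, future_weight):
    """Weighted value of all future generations, in units of one current generation.

    Assumes a constant population per generation, so population size cancels out.
    """
    future_generations = horizon_years / generation_years
    return future_generations * future_weight


ratio = future_to_present_ratio(HORIZON_YEARS, GENERATION_LENGTH_YEARS, FUTURE_WEIGHT)
print(f"Future is worth roughly {ratio:.0f}x the current generation")
```

Under these assumptions the future still carries about 40 times the weighted value of the present generation, which is why this much weaker view would often recommend similar priorities to the TUA. The exact multiple is sensitive to the assumed generation length and horizon.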

I'd also suggest that strong longtermism is a meaningfully different assumption to e.g. transhumanism and total utilitarianism. In particular, the case for existential or extinction risk research seems many orders of magnitude weaker under a near-termist than strong longtermist worldview. Provided you think strong longtermism is at least credible, it seems reasonable to assume it when doing x-risk research, even though you should discount the impact of such interventions based on your credence in longtermism when making a final decision on where to allocate resources. If there is a risk that seems very likely to occur (e.g. AI, bio) such that it is plausible under both near-termist and longtermist grounds then perhaps it makes sense to drop this assumption, but even then I suspect it is often easier to just run two different analyses, given the different outcome metrics of concern (e.g. % x-risk averted vs QALYs saved).

2017 Donor Lottery Report

Since writing this post, I have benefited both from 4 years of hindsight and from significantly more grantmaking experience, with just over a year at the Long-Term Future Fund. My main updates:

  • Exploit networks: I think small individual donors are often best off donating to people in their network that larger donors don't have access to. In particular I 70% believe it would have been better for me to wait 1-3 years and donate the money to opportunities as and when they came up. For example, there have been a few cases where something would help CHAI but couldn't be funded institutionally (for various bureaucratic or political reasons) -- I think we always managed to find a way to make it work, but me just having effectively discretionary funding would have made things simpler.
  • Efficient Markets in Grantmaking: When I wrote the post I tended to think the small orgs were getting overlooked by major donors, because it wasn't worth the time cost of evaluating them. There's some truth to this, but I think more often the major donors actually had good reasons against wanting to fund the orgs more.
  • Impact from Post: The post had less direct impact than I hoped, e.g. I haven't seen much analysis following on from it or heard of any major donations influenced by it, although I've not tried very hard to track this, so I may have missed it. However, it did have a pretty big indirect impact, of making me more interested in grantmaking and likely helping me get a position on the Long-Term Future Fund. Notably you can write posts about what orgs are good to donate to even if you don't have $100k to donate... so I'd encourage people to do this if they have an interest in grantmaking, or scrutinize how good the grants made by existing grantmakers are. In general I'd like to see more discussion and diversity of opinions around where to make grants.

Long-Term Future Fund: May 2021 grant recommendations

Thanks for raising this. I think we communicated the grant decision to Po-Shen in late March/early April, when the pandemic was still significantly more active in the US. I was viewing this as "last chance to trial this idea", and I think I still stand by that given what we knew at the time, although I'd be less excited by this with the benefit of hindsight (the pandemic has developed roughly along my median expectations, but that still means I put significant credence on case rates being much higher than they currently are.)

In general our grant write-ups will appear at least a few weeks after the grant decision has actually been communicated to the grantee, as CEA needs to conduct due diligence, we need to draft the write-up, have the write-up reviewed by the grantee, etc. For time-sensitive grants, the lag between making the grant and writing it up can be longer.

I'll also plug that the LTFF is one of the few funders that are able to make grants on short notice, so people with similarly ephemeral opportunities in the future should feel free to apply to us: there's an option in the application to flag it as time-sensitive.

How to PhD

Sorry for the (very) delayed reply here. I'll start with the most important point first.

But compared to working with a funder who, like you, wants to solve the problem and make the world be good, any of the other institutions mentioned including academia look extremely misaligned.

I think overall the incentives set up by EA funders are somewhat better than run-of-the-mill academic incentives, but I think the difference is smaller than you seem to believe, and I think we're a long way from cracking it. I think this is something we can get better at, but it's something that I expect will take significant infrastructure and iteration: e.g. new methods for peer review, experimenting with different granter-grantee relationships, etc.

Concretely, I think EA funders are really good (way better than most of academia or mainstream funders) at picking important problems like AI safety or biosecurity. I also think they're better at reasoning about possible theories of change (if this project succeeds, would it actually help?) and considering a variety of paths to impact (e.g. maybe a blog post can have more impact than a paper in this case, or maybe we'd even prefer to distribute some results privately).

However, I think most EA funders are actually worse at evaluating whether the research agenda is being executed well than the traditional academic structure. I help the LTFF evaluate grants, many of which are for independent research, and while I try to understand people's research agenda and how successful they've been, I think it's fair to say I spend at least an order of magnitude less time on this per applicant than someone's academic advisor.

Even worse, I have basically zero visibility into the process -- I only see the final write-up, and maybe have an interview with the person. If I see a negative result, it's really hard for me to tell if the person executed on the agenda well but the idea just didn't pan out, or if they bungled the process. Whereas I find it quite easy to form an opinion on projects I advise, as I can see the project evolve over time, and how the person responds to setbacks. Of course, we can (and do) ask for references, but if they're executing independently they may not have any, and there's always some CoI on advisors providing a reference.

Of course, when it comes to evaluating larger research orgs, funders can do a deeper dive and the stochasticity of research matters less (as it's averaged over a longer period of time). But this is just punting the problem to those who are running the org. In general I still think evaluating research output is a really hard problem.

I do think one huge benefit EA has is that people are mostly trying to "play fair", whereas in academia there is sadly more adversarial behavior (on the light side, people structuring their papers to dodge reviewer criticism; on the dark side, actual collusion in peer review or academic fraud). However, this isn't scalable, and I wouldn't want to build systems that rely on it.

In that more general comparison article, I think I may have still cautioned about academic incentives in particular. Because they seem, for lack of a better word, sneakier?

This is a fair point. I do think people kid themselves a bit about how much "academic freedom" they really have, and this can lead to people in effect internalizing the incentives more.

I've observed folks [...] behave as if they believe a research project to be directly good when I (and others) can't see the impact proposition, and the behavior feels best explained by publishing incentives.

Believing something is "directly good" when others disagree seems like a classic case of wishful thinking. There are lots of reasons why someone might be motivated to work on a project (despite it not, in fact, being "directly good"). Publication incentives are certainly a big one, and might well be the best explanation for the cases you saw. But in general I think it could also be that they just find that topic intellectually interesting, have been working on it for a while and are suffering from sunk cost fallacy, etc.
