

Like Akash, I agree with a lot of the object-level points here and disagree with some of the framing / vibes. I'm not sure I can articulate the framing concerns I have, but I do want to say I appreciate you articulating the following points:

  • Society is waking up to AI risks, and will likely push for a bunch of restrictions on AI progress
    • Sydney and the ARC Captcha example have made AI safety stuff more salient. 
    • There's opportunity for substantially more worry about AI risk to emerge after even mild warning events (e.g. AI-powered cyber events, crazier behavior emerging during evals)
  • Society's response will be dumb and inefficient in a lot of ways, but could also end up getting pointed in some good directions
  • The more an org's AI development / deployment abilities are constrained by safety considerations (whether their own concerns or other stakeholders'), the more safety looks like just another thing you need in order to deploy your powerful AI systems, so that safety work becomes a complement to capabilities work.

There are other safety problems-- often ones that are more speculative-- that the market is not incentivizing companies to solve.


My personal response would be as follows: 

  1. As Leopold presents it, the key pressure here that keeps labs in check is societal constraints on deployment, not perceived ability to make money. The hope is that society's response has the following properties:
    1. thoughtful, prominent experts are attuned to these risks and demand rigorous responses
    2. policymakers are attuned to (thoughtful) expert opinion
    3. policy levers exist that provide policymakers with oversight / leverage over labs
  2. If labs are sufficiently thoughtful, they'll notice that deploying models is in fact bad for them! Can't make profit if you're dead. *taps forehead knowingly*
    1. but in practice I agree that lots of people are motivated by the tastiness of progress, pro-progress vibes, etc., and will not notice the skulls.

Counterpoints to 1:

Good regulation of deployment is hard (though not impossible in my view). 

  • reasonable policy responses are difficult to steer towards
  • attempts at raising awareness of AI risk could lead to policymakers getting too excited about the promise of AI while ignoring the risks
  • experts will differ; policymakers might not listen to the right experts

Good regulation of development is much harder, and will eventually be necessary.

This is the really tricky one IMO. I think it requires pretty far-reaching regulations that would be difficult to get passed today, and would probably misfire a lot. But it doesn't seem impossible, and I know people are working on laying groundwork for this in various ways (e.g. pushing for labs to incorporate evals in their development process).

Sorry to hear about your experience! 

Which countries are at the top/bottom of the priority list to be funded? [And why?]

I think this is a great question, and I suspect it's somewhat under-considered. I looked into this a couple years ago as a short research project, and I've heard there hasn't been a ton more work on it since then. So my guess is that the reasoning might be somewhat ad-hoc or intuitive, but tries to take into account important factors like "size / important-seemingness of country for EA causes", talent pool for EA, and ease of movement-building (e.g. do we already have high-quality content in the relevant language). 

My guess is that:

  • There are some valuable nuances that could be included in assessments of countries, but are either left out or included inconsistently. 
    • For example, for a small-medium country like Romania it might be more useful to think of a national group as similar to a city group for the country's largest city, and Bucharest looks pretty promising to me based on a quick glance at its Wiki page -- but I wouldn't have guessed that if I hadn't thought to look it up. Whereas e.g. Singapore benefits from being a well-known world-class city. 
    • Similarly, it looks like Romania has a decent share of English-speakers (~30% or ~6 million) and they tend to be pretty fluent, but again I wouldn't have really known that beforehand. Someone making an ad-hoc assessment may not have thought to check those data sources, and might not have context on how to compare different countries (is 30% high? low?).
  • The skills / personality of group members and leaders probably make up a large part of funders' assessments, but are kinda hard to assess if they don't have a long track record. But they probably need funding to get a track record in the first place! 
    • And intuitive assessments of leaders are probably somewhat biased against people who don't come from the assessor's context (e.g. have a different accent), though I hope and assume people at least try to notice & counteract that.   

Zero-bounded vs negative-tail risks

(adapted from a comment on LessWrong)

In light of the FTX thing, maybe a particularly important heuristic is to notice cases where the worst-case is not lower-bounded at zero. Examples:

  • Buying put options (value bounded at zero) vs shorting stock (unbounded)
  • Running an ambitious startup that fails is usually just zero, but what if it's committed funding & tied its reputation to lots of important things that will now struggle? 
  • More twistily -- what if you're committing to a course of action s.t. you'll likely feel immense pressure to take negative-EV actions later on, like committing fraud in order to save your company or pushing for more AI progress so you can stay in the lead?

Not that you should definitely not do things that potentially have large-negative downsides, but you can be a lot more willing to experiment when the downside is capped at zero.
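
To make the options example concrete, here is a minimal sketch of the two payoff shapes. The function names and prices are made up for illustration, and it ignores the premium paid for the put, margin requirements, and fees.

```python
def put_value_at_expiry(strike: float, price: float) -> float:
    """Value of a long put at expiry: never below zero."""
    return max(strike - price, 0.0)


def short_stock_pnl(entry_price: float, price: float) -> float:
    """Profit or loss on one shorted share: falls without limit as price rises."""
    return entry_price - price


# Hypothetical prices, chosen only to show the shape of each payoff.
for price in [0.0, 50.0, 100.0, 200.0, 1000.0]:
    put = put_value_at_expiry(strike=100.0, price=price)
    short = short_stock_pnl(entry_price=100.0, price=price)
    print(f"price={price:>7.1f}  put value={put:>6.1f}  short P&L={short:>7.1f}")
```

As the price rises, the put's value stays at zero while the short position's loss keeps growing, which is the asymmetry the heuristic is pointing at.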

Indeed, a good norm in many circumstances is to do lots of exploration and iteration. This is how science, software development, and most research happens. Things get a lot trickier when even this stage has potential deep harms -- as in research with advanced AI. (Or, more boundedly & fixably, infohazard risks from x- and s-risk reduction research.)


In practice, people will argue about what counts as effectively zero harm, vs nonzero. Human psychology, culture, and institutions are sticky, so exploration that naively looks zero-bounded can have harm potential via locking in bad ideas or norms. I think that harm is often fairly small, but it might be both important and nontrivial to notice when it's large -- e.g., which new drugs are safe to explore for a particular person? caffeine vs SSRIs vs weed vs alcohol vs opioids... 


(Note that the "zero point" I'm talking about here is an outcome where you've added zero value to the world. I'm thinking of the opportunity cost of the time or money you invested as a separate term.)

Inside-view, some possible tangles this model could run into:

  • Some theories care about the morality of actions rather than states. But I guess you can incorporate that into 'states' if the history of your actions is included in the world-state -- it just makes things a bit harder to compute in practice, and means you need to track "which actions I've taken that might be morally meaningful-in-themselves according to some of my moral theories." (Which doesn't sound crazy, actually!)
  • The obvious one: setting boundaries on "okay" states is non-obvious, and is basically arbitrary for some moral theories. And depending on where the boundaries are set for each theory, theories could increase or decrease in influence on one's actions. How should we think about okayness boundaries? 
    • One potential desideratum is something like "honest bargaining." Imagine each moral theory as an agent that sets its "okayness level" independent of the others, and acts to maximize good from its POV. Then our formalism should lead to each agent being incentivized to report its true views. (I think this is a useful goal in practice, since I often do something like weighing considerations by taking turns inhabiting different moral views). 
      • I think this kind of thinking naturally leads to moral parliament models -- I haven't actually read the relevant FHI work, but I imagine it says a bunch of useful things, e.g. about using some equivalent of quadratic voting between theories. 
    • I think there's an unfortunate tradeoff here, where you either have arbitrary okayness levels or all the complexity of nuanced evaluations. But in practice maybe success maximization could function as the lower level heuristic (or middle level, between easier heuristics and pure act-utilitarianism) of a multi-level utilitarianism approach.

Speaking as a non-expert: This is an interesting idea, but I'm confused as to how seriously I should take it. I'd be curious to hear:

  1. Your epistemic status on this formalism. My guess is you're at "seems like a good cool idea; others should explore this more", but maybe you want to make a stronger statement, in which case I'd want to see...
  2. Examples! Either a) examples of this approach working well, especially handling weird cases that other approaches would fail at. Or, conversely, b) examples of this approach leading to unfortunate edge cases that suggest directions for further work.

I'm also curious if you've thought about the parliamentary approach to moral uncertainty, as proposed by some FHI folks. I'm guessing there are good reasons they've pushed in that direction rather than more straightforward "maxipok with p(theory is true)", which makes me think (outside-view) that there are probably some snarls one would run into here. 

Ah, sorry, I was thinking of Tesla, where Musk was an early investor and gradually took a more active role in the company.

In February 2004, the company raised $7.5 million in series A funding, including $6.5 million from Elon Musk, who had received $100 million from the sale of his interest in PayPal two years earlier. Musk became the chairman of the board of directors and the largest shareholder of Tesla.[15][16][13] J. B. Straubel joined Tesla in May 2004 as chief technical officer.[17]

A lawsuit settlement agreed to by Eberhard and Tesla in September 2009 allows all five – Eberhard, Tarpenning, Wright, Musk, and Straubel – to call themselves co-founders.

I think it's reasonable and often useful to write early-stage research in terms of one's current weak best guess, but this piece makes me worry that you're overconfident or not doing as good a job as you could of mapping out uncertainties. The most important missing point, I'd say, is effects on AI / biorisk (as Linch notes). There's also the lack of (or inconsistent treatment of) counterfactual impact of businesses, as I mention in my other comment. 

Also, a small point, but given the info you linked, calling Oracle "universally reviled" seems too strong. This kind of rhetorical flourish makes me worry that you're generally overconfident or not tracking truth as well as you could be. 


The market value of Amazon is circa $1T, meaning that it has managed to capture at least that much value, and likely produced much more consumer surplus. 

I'm confused about your assessment of Bezos, and more generally about how you assess value creation via businesses. 

My core concern here is counterfactual impact. If Bezos didn't exist, presumably another Amazon-equivalent would have come into existence, perhaps several years later. So he doesn't get full credit for Amazon existing, but rather for such an org existing for a few more years. And maybe for it being predictably better or worse than counterfactual competitors, if we can think of any predictable effects there.  
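
A rough sketch of that accounting follows. The function, its parameters, and the placeholder numbers are all hypothetical and are not estimates for Amazon or any real company. It also ignores discounting and assumes value accrues at a flat annual rate.

```python
def counterfactual_credit(annual_value: float,
                          years_accelerated: float,
                          trajectory_delta: float = 0.0) -> float:
    """Credit = value of existing earlier + predictable better/worse-ness.

    annual_value: value the company creates per year (arbitrary units)
    years_accelerated: how much later an equivalent would have appeared
    trajectory_delta: total value difference vs. the counterfactual competitor
                      (negative if the real company is predictably worse)
    """
    return annual_value * years_accelerated + trajectory_delta


# Placeholder numbers purely for illustration.
naive_credit = 100.0 * 20.0                          # full credit for 20 years of value
adjusted_credit = counterfactual_credit(100.0, 5.0)  # credit for a 5-year speed-up only
print(naive_credit, adjusted_credit)
```

Under these made-up inputs the founder gets credit for the speed-up rather than for the company's whole existence, and the `trajectory_delta` term is where any predictable quality difference would enter.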

Both points (competitor catch-up and trajectory change) also apply to the Google cofounders, though maybe there's a clearer story for their impact via e.g. Google providing more free high-quality services (like GDocs) than competitors like Yahoo likely would have, had they been in the lead. 

For companies that don't occupy a 'natural niche' but rather are idiosyncratic, it seems more reasonable to evaluate the founder's impact based on something like the company's factual value creation, and not worry about counterfactuals. Examples might be Berkshire Hathaway and some of Elon's companies, esp Neuralink and the Boring Company. (SpaceX has had a large counterfactual effect, but Elon didn't start it; not sure how to evaluate his effect on the space launch industry.) I'd be interested in a counterfactual analysis of Tesla's effect on e.g. battery cost and electric vehicle growth trend in the US / world. (My best guess is it's a small effect, but maybe it's a moderately important one.)
