All of Zach Stein-Perlman's Comments + Replies

See https://ea-internships.pory.app/board, you can filter for volunteer.

It would be helpful to mention if you have background or interest in particular cause areas.

1
Clay
1d
Thank you! This was really helpful

I'm annoyed at vague "value" questions. If you ask a specific question the puzzle dissolves. What should you do to make the world go better? Maximize world-EV, or equivalently maximize your counterfactual value (not in the maximally-naive way — take into account how "your actions" affect "others' actions"). How should we distribute a fixed amount of credit or a prize between contributors? Something more Shapley-flavored, although this isn't really the question that Shapley answers (and that question is almost never relevant, in my possibly controversial op... (read more)

My current impression is that there is no mechanism: funders will do whatever they feel like, and some investors will feel misled...

I now agree funders won't really lose out, at least.

Hmm. I am really trying to fill in holes, not be adversarial, but I mostly just don't think this works.

the funder probably recognizes some value in [] the projects the investors funded that weren't selected for retrofunding

No. If the project produces zero value, then no value for funder. If the project produces positive value, then it's retrofunded. (At least in the simple theoretical case. Maybe in practice small-value projects don't get funded. Then profit-seeking investors raise their bar: they don't just fund everything that's positive-EV, only stuff t... (read more)

2
Jason
21d
Scott Alexander has stated that: "Since most people won’t create literally zero value, and I don’t want to be overwhelmed with requests to buy certificates for tiny amounts, I’m going to set a limit that I won’t buy certificates that I value at less than half their starting price." I'm not sure exactly what "starting price" means here, but one could envision a rule like this causing a lot of grants which the retrofunder would assign some non-trivial value to nevertheless resolving to $0 value. It's impossible to optimize for all potential virtues at once, though.

I'm actually a bit of a self-professed skeptic of impact markets, but I am highly uncertain about how much to value the error-correction possibility of impact markets in mitigating the effects of initial misjudgments of large grantmakers.

One can imagine a market with conditions that might be more favorable to the impact-market idea than longtermism. Suppose we had a market in upstart global health charities that resolves in five years. By default, the major funders will give to GiveWell-recommended charities. If a startup proves more effective than that baseline after five years, its backers win extra charitable dollars to "spend" on future projects (possibly capped?) -- but the major funders get to enjoy increased returns going forward because they now have an even better place to put some of their funds. And this scheme would address a real problem -- it is tough to get funding for an upstart in global health because the sure-thing charities are already so good, yet failure to adequately form new charities in response to changing global conditions will eventually mean a lot of missed opportunities. In contrast, it is somewhat more likely that a market in longtermist interventions is a solution in search of a problem severe enough to deal with the complications and overhead costs.

I do agree that collusion / manipulation is a real concern here. By analogy, if you've looked at the history of Manif

This is ultimately up to retro funders, and they each might handle cases like this differently.

Oh man, having the central mechanism unclear makes me really uncomfortable for the investors. They might invest reasonably, thinking that the funders would use a particular process, and then the funders use a less generous process...

In my opinion, by that definition of true value which is accounting for other opportunities and limited resources, they should just pay $100 for it. If LTFF is well-calibrated, they do not pay any more (in expectation) in the impact m

... (read more)

Actually I'm confused again. Suppose:

Bob has a project idea. The project would cost $10. A funder thinks it has a 99% chance of producing $0 value and a 1% chance of producing $100 value, so its EV is $1, and that's less than its cost, so it's not funded in advance. A super savvy investor thinks the project has EV > $10 and funds it. It successfully produces $100 value.

How much is the funder supposed to give retroactively?


I feel like ex-ante-funder-beliefs are irrelevant and the right question has to be "how much would you pay for the project if you kne... (read more)

3
Rachel Weinberg
24d
(This is ultimately up to retro funders, and they each might handle cases like this differently.)

In my opinion, by that definition of true value which is accounting for other opportunities and limited resources, they should just pay $100 for it. If LTFF is well-calibrated, they do not pay any more (in expectation) in the impact market than they do with regular grantmaking, because 99% of projects like this will fail, and LTFF will pay nothing for those. So there is still the same amount of total surplus, but LTFF is only paying for the projects that actually succeeded.

I think irl, the "true value" thing you're talking about is still dependent on real wages, because it's sensitive to the other opportunities that LTFF has which are funding people's real wages. There's a different type of "true value", which is like how much the free market would pay for AI safety researchers if it could correctly account for existential risk reduction, which is an intergenerational public good. If they tried to base valuations on that, they'd pay more in the impact market than they would with grants.
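A quick toy calculation of the calibration point above, using the numbers from the example upthread (a hypothetical project costing $10 with a 1% chance of producing $100 of value):

```python
# Toy check of the calibration argument: if the retro funder pays full value only
# for projects that succeed, its expected spend equals the ex-ante expected value.
# Numbers are from the hypothetical example upthread.
p_success = 0.01        # funder's ex-ante probability that the project produces value
value_if_success = 100  # what the funder would pay for the project knowing it succeeded

expected_retro_spend = p_success * value_if_success
print(expected_retro_spend)  # 1.0 -> $1 per such project in expectation
```

Whether that $1 should be compared to the project's cost or to the funder's counterfactual grants is exactly what the rest of this thread disputes.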

Ah, hooray! This resolves my concerns, I think, if true. It's in tension with other things you say. For example, in the example here, "The Good Foundation values the project at $18,000 of impact" and funds the project for $18K. This uses the true-value method rather than the divide-by-P(success) method.

In this context "project's true value (to a funder) = $X" means "the funder is indifferent between the status quo and spending $X to make the project happen." True value depends on available funding and other available opportunities; it's a marginal analysis question.

2
Zach Stein-Perlman
24d
Actually I'm confused again. Suppose: Bob has a project idea. The project would cost $10. A funder thinks it has a 99% chance of producing $0 value and a 1% chance of producing $100 value, so its EV is $1, and that's less than its cost, so it's not funded in advance. A super savvy investor thinks the project has EV > $10 and funds it. It successfully produces $100 value. How much is the funder supposed to give retroactively? ---------------------------------------- I feel like ex-ante-funder-beliefs are irrelevant and the right question has to be "how much would you pay for the project if you knew it would succeed." But this question is necessarily about "true value" rather than covering the actual costs to the project-doer and giving them a reasonable wage. (And funders have to use the actual-costs-and-reasonable-wage stuff to fund projects for less than their "true value" and generate surplus.)

I agree this would be better — then the funders would be able to fund Alice's project for $1 rather than $10. But still, for projects that are retroactively funded, there's no surplus-according-to-the-funder's-values, right?

4
Jason
24d
That's a valid concern. The traditional form of surplus I had in mind isn't [edit: might not be; see my response to Rachel about the proper util-to-$ conversion factor] there any more. However, the funder probably recognizes some value in (1) the projects the investors funded that weren't selected for retrofunding, and (2) aligned investors likely devoting their "profits" to other good projects (if the market is set up to allow charitable reinvestment only, rather than withdrawal of profits -- I suspect this will likely be the case for tax reasons). If those gains aren't enough for the retrofunder, it could promise 100% payment up to investment price, but only partial payment of impact over the investment price -- thus splitting the surplus between itself and the investor in whatever fraction seems advisable.
5
Rachel Weinberg
24d
I think there is still surplus-according-to-the-funders-values in this impact market specifically, just as much as there is with regular grants. Retro funders were not asked to assign valuations based on their "true values" where maybe 1 year of good AI safety research is worth in the 7 figures (though what this "true value" thing would even mean I do not quite understand). Instead, they were asked to "operate on a model where they treat retrospective awards the same as prospective awards, multiplied by a probability of success." So they get the same surplus as usual, just with more complete information when deciding how much to pay.

Related, not sure: maybe it's OK if the funder retroactively gives something like cost ÷ ex-ante-P(success). What eliminates the surplus is if the funder retroactively gives ex-post-value.

Edit: no, this mechanism doesn't work. See this comment.

Yes. Rather than spending $1 on a project worth $10, the funder is spending $10 on the project — so the funder's goals aren't advanced. (Modulo that the retroactive-funding-recipients might donate their money in ways that advance the funder's goals.)

2
Zach Stein-Perlman
24d
Related, not sure: maybe it's OK if the funder retroactively gives something like cost ÷ ex-ante-P(success). What eliminates the surplus is if the funder retroactively gives ex-post-value. Edit: no, this mechanism doesn't work. See this comment.

Thanks.

So if project-doers don't sell all of their equity, do they get retroactive funding for the rest, or just moral credit for altruistic surplus? The former seems very bad to me. To illustrate:

Alice has an idea for a project that would predictably [produce $10 worth of impact / retrospectively be worth $10 to funders]. She needs $1 to fund it. Under normal funding, she'd be funded and there'd be a surplus worth $9 of funder money. In the impact market, she can decline to sell equity (e.g. by setting the price above $10 and supplying the $1 costs herself) and get $10 retroactive funding later, capturing all of the surplus.

The latter... might work, I'll think about it.

1
Lily Jordan
24d
They'd get retroactive funding for the rest, yes. When you say it seems very bad, do you mean because then LTFF (for example) has less money to spend on other things, compared to the case where they just gave the founder a (normal, non-retroactive) grant for the estimated cost of the project?

Oh wait I forgot about the details at https://manifund.org/about/impact-certificates. Specific criticism retracted until learning more; skepticism remains. What happens if a project is funded at a valuation higher than its funding-need? If Alice's project is funded for $5, where does $4 go?

3
Lily Jordan
24d
If the project is funded at a valuation of $5, it wouldn’t necessarily receive $5 – it would receive whatever percentage of $5 the investor bought equity in. So if the investor bought 80%, the project would receive $4; if the investor bought 20%, the project would receive $1. If Alice didn’t think she could put the extra dollar on top of the first four to use, then she presumably wouldn’t sell more than $4 worth of equity, or 80%, because the purpose of selling equity is to receive cash upfront to cover your immediate costs. (Almost everything about impact markets has a venture-capital equivalent – for example, if an investor valued your company at $10 million, you might sell them 10% equity for $1 million – you wouldn't actually sell them all $10 million worth if $1 million gave you enough runway.)

On Manifund itself, the UI doesn't actually provide an option to overfund a project beyond its maximum goal, but theoretically this isn't impossible. But there's not much of an incentive for a project founder to take that funding, unless they're more pessimistic about their project's valuation than investors are; otherwise, it's better for them to hold onto the equity. (And if the founder is signaling pessimism in their own valuation, then an investor might be unwise to offer to overfund in the first place.)

Does that answer your question?
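A minimal sketch of the equity arithmetic described above (amounts are hypothetical, and cash_raised is an illustrative helper, not Manifund code):

```python
# Venture-style equity math for impact certificates (illustrative only).
def cash_raised(valuation: float, fraction_sold: float) -> float:
    """Upfront cash a founder receives when selling `fraction_sold` of the
    project's impact equity at the given valuation."""
    return valuation * fraction_sold

print(cash_raised(5, 0.80))           # 4.0 -> sell 80% at a $5 valuation, keep 20%
print(cash_raised(5, 0.20))           # 1.0 -> sell 20% at a $5 valuation, keep 80%
print(cash_raised(10_000_000, 0.10))  # 1000000.0 -> the VC analogy: 10% of a $10M company
```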

Designing an impact market well is an open problem, I think. I don't think your market works well, and I think the funders were mistaken to express interest. To illustrate:

Alice has an idea for a project that would predictably [produce $10 worth of impact / retrospectively be worth $10 to funders]. She needs $1 to fund it. Under normal funding, she'd be funded and there'd be a surplus worth $9 of funder money. In the impact market, whichever investor reads and understands her project first funds it and later gets $10.

More generally, in your market, all sur... (read more)
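To make the Alice example concrete, here is a toy comparison of who ends up with the $9 of surplus under a normal grant versus a retro funder that pays full ex-post value (a sketch of the argument, not a claim about how any actual retro funder will behave):

```python
# Alice's project: costs $1, predictably worth $10 to the funder.
cost, value = 1, 10

# Normal grant: the funder pays the cost and keeps $9 of its budget for other grants.
funder_surplus_grant = value - cost          # 9

# Naive impact market: an investor fronts the $1, the retro funder later pays the
# full $10 value, so the investor pockets the difference and the funder keeps nothing.
investor_profit = value - cost               # 9
funder_surplus_market = value - value        # 0

print(funder_surplus_grant, investor_profit, funder_surplus_market)  # 9 9 0
```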

4
Linch
24d
(speaking for myself) I had an extended discussion with Scott (and to a lesser extent Rachel and Austin) about the original proposed market mechanism, which iiuc hasn't been changed much since.

I'm not particularly worried about final funders losing out here; if anything I remember being paternalistically worried that the impact "investors" don't know what they're getting into, in that they appeared to be taking on more risks without getting a risk premium. But if the investors, project founders, Manifold, etc, are happy to come to this arrangement with their eyes wide open, I'm not going to be too paternalistic about their system.

I do broadly agree with something like "done is better than perfect," and that it seemed better to get this particular impact market out the gate than to continue to debate the specific mechanism design. That said, theoretically unsound mechanisms imo have much lower value-of-information, since if they fail it'd be rather unclear whether impact markets overall don't work vs this specific setup won't.
2
Jason
24d
I think this is a valid concern, but one that can probably be corrected by proper design.

The potential problem occurs if the investors get a shot at the projects before the retrofunders do. Some projects are pretty clearly above the funder's bar in expectancy. If investors get the first crack at those projects, the resultant surplus should in theory be consumed by the investors. If you'd rather have the retrofunders picking winners, that's not a good thing.

Here, at least as far as the ACX grant program ("ACX") is concerned (unsure about other participants), the funder has already had a chance to fund the proposals (= a chance to pick out the proposals with surplus). It passed on that, which generally implies a belief that there was no surplus. If Investor Ivan does a better job predicting the ACX-assigned future impact than ACX itself, then that is at least some evidence that Investor Ivan is a better grantmaker than ACX itself. Even evaluated from a very ACX-favorable criterion standard (i.e., ACX's own judgment at time2), Ivan outperformed ACX in picking good grants.

Now, I tend to be skeptical that investors will be able to better predict the retrofunders' views at time2 than the retrofunders themselves. Thus, as long as retrofunders are given the chance to pick off clear winners ahead of time, it seems unlikely investors will break even.

The third possibility is that investors and retrofunders can buy impact certificates at the same time, bidding against each other. In that scenario, I believe the surplus might go to the best grantee candidates, which could cause problems of its own. But investors shouldn't be in an advantaged position in that scenario either; they can only "win" to the extent they can outpredict the retrofunder.

(COI note: am micrograntor with $500 budget)
1
DC
24d
I recommend asking clarifying questions to reduce confusion before confidently expressing what turn out to be, at least in part, spurious criticisms. I guarantee you it's not fun for the people announcing their cool new project to receive.
2
Zach Stein-Perlman
25d
Oh wait I forgot about the details at https://manifund.org/about/impact-certificates. Specific criticism retracted until learning more; skepticism remains. What happens if a project is funded at a valuation higher than its funding-need? If Alice's project is funded for $5, where does $4 go?
  • Ideally powerful AI will enable something like reflection rather than locking in prosaic human values or our ignorant conceptions of the good.
  • Cosmopolitan values don't come free.
  • The field of alignment is really about alignability, not making sure "the right people control it." That's a different problem.

My favorite AI governance research since this post (putting less thought into this list):

  1. Responsible Scaling Policies (METR 2023)
  2. Deployment corrections (IAPS: O'Brien et al. 2023)
  3. Open-Sourcing Highly Capable Foundation Models (GovAI: Seger et al. 2023)
  4. Do companies’ AI Safety Policies meet government best practice? (CFI: Ó hÉigeartaigh et al. 2023)
  5. AI capabilities can be significantly improved without expensive retraining (Davidson et al. 2023)

I mostly haven't really read recent research on compute governance (e.g. 1, 2) or international governance (e.g. 1, ... (read more)

I appreciate it; I'm pretty sure I have better options than finishing my Bachelor's; details are out-of-scope here but happy to chat sometime.

TLDR: AI governance; maybe adjacent stuff.

Skills & background: AI governance research; email me for info on my recent work.

Location: flexible.

LinkedIn: linkedin.com/in/zsp/.

Email: zacharysteinperlman at gmail.

Other notes: no college degree.

  1. I've left AI Impacts; I'm looking for jobs/projects in AI governance. I have plenty of runway; I'm looking for impact, not income. Let me know if you have suggestions!
    1. (Edit to clarify: I had a good experience with AI Impacts.)
  2. PSA about credentials (in particular, a bachelor's degree): they're important even for working in EA and AI safety.
    1. When I dropped out of college to work on AI safety, I thought credentials are mostly important as evidence-of-performance, for people who aren't familiar with my work, and are necessary in high-bureaucracy insti
... (read more)
6
Linch
3mo
Might be out-of-scope for this shortform, but have you considered/are you able to go back to Williams? My impression is that you did well there and (unlike many EAs) you enjoyed your experience so it'd be less costly than for many. 

You don't need EA or AI safety motives to explain the event. Later reporting suggested that it was caused by (1) Sutskever and other OpenAI executives telling the board that Altman often lied (WSJ, WaPo, New Yorker) and (2) Altman dishonestly attempting to remove Toner from the board (over the obvious pretext that her coauthored paper Decoding Intentions was too critical of OpenAI, plus allegedly falsely telling board members that McCauley wanted Toner removed) (NYT, New Yorker). As far as I know, there's ~no evidence that EA or AI safety motives were rele... (read more)

Thanks!

General curiosity. Looking at it, I'm interested in my total-hours and karma-change. I wish there was a good way to remind me of... all about how I interacted with the forum in 2022, but wrapped doesn't do that (and probably ~can't do it; probably I should just skim my posts from that year...)

Cool. Is it still possible to see my 2022 wrapped?

6
Sarah Cheng
4mo
Currently we've left it up here but we plan to remove it soon to clean up our codebase. I'm curious why you want to see last year's?

I object to your translation of actual-votes into approval-votes and RCV-votes, at least in the case of my vote. I gave almost all of my points to my top pick, almost all of the rest to my second pick, almost all of the rest to my third pick, and so forth until I was sure I had chosen something that would make top 3. But e.g. I would have approved of multiple. (Sidenote: I claim my strategy is optimal under very reasonable assumptions/approximations. You shouldn't distribute points like you're trying to build a diverse portfolio.)
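A minimal sketch of the point-allocation strategy described above (the 0.9 retention fraction and the helper name are hypothetical):

```python
# Give almost all points to the top pick, almost all of the remainder to the
# second pick, and so on -- a roughly geometric allocation.
def allocate_points(ranked_choices, total_points=100, keep_fraction=0.9):
    allocation, remaining = {}, float(total_points)
    for choice in ranked_choices:
        allocation[choice] = remaining * keep_fraction
        remaining -= allocation[choice]
    return allocation

print(allocate_points(["top", "second", "third", "fourth"]))
# {'top': 90.0, 'second': 9.0, 'third': 0.9, 'fourth': 0.09...}
```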

8
Lizka
4mo
Thanks! I agree that the approach you describe is optimal under very reasonable assumptions, but I think in practice few people used it (the median ratio between someone's top choice and their second choice was 2, the mean if you throw out one outlier was ~20; only 7 people voted for at least 2 candidates and had ratios between their top two that were at least 20). Moreover, we had some[1] voters who didn't vote the way you describe, but who did assign similar small point values to a fairly big number of projects — I think kind of throwing in some points for charities they don't favor that much, and I didn't want to overweight their votes in the way I tallied the RCV-translated (or approval-translated) scores.

Still, I agree that my translations are bad — I should at least represent scores from people who basically approximated RCV in the current voting method the way they would be counted in RCV. I might try this (and think about what translation actually makes sense — just the top 10 charities people voted for?) later, but might not prioritize doing it.

For approval voting, you could also just look at the number of voters who gave a charity any (positive) number of points; these counts are included in this post and wouldn't have changed the top 3.

1. ^ Quickly estimating: there were 90 voters who voted for at least 4 candidates whose last two votes differed by a ratio of less than 1.5. There were 27 if instead of requiring at least 4 candidates, you require that the smallest point assignment is <5. Or looking at it another way: across all ratios (across all voters) between what a given voter gave the candidate they ranked N and the candidate they ranked N+1, if we remove only the top 1 percentile of ratios (removing because a few people did use an approximation of RCV - equivalent in this case to removing ratios higher than 100:1), the mean is 2. Across all ~12K ratios, about 500 are exactly 1.

we are convinced this push towards decentralization will make the EA ecosystem more resilient and better enable our projects to pursue their own goals.

I'm surprised. Why? What was wrong with the EV sponsorship system?

(I've seen Elizabeth's and Ozzie's posts on this topic and didn't think the downsides of sponsorship were decisive. Curious which downsides were decisive for you.)

[Edit: someone offline told me probably shared legal liability is pretty costly.]

49
Jason
4mo

There's no inherent contradiction between "the current sponsorship program isn't a good fit for this sponsor and these sponsored orgs" and "fiscal sponsorship is often a good thing."

There are a number of specific reasons I could see that this wasn't a good fit:

  • EV's reputation took damage due to the FTX scandal, particularly with the Charity Commission
  • We don't know what action the CC will take, but -- either by CC mandate or "voluntary" action -- EV's board will likely have to be significantly more involved in monitoring sponsored orgs than another sponsor woul
... (read more)

My take is basically that (a) the projects have been run so independently that there was minimal benefit from being within the same legal entity, (b) organizations with very different legal risk profiles sharing a legal entity requires either excessive caution from some or excessive risk to others, and (c) board oversight is important for nonprofits and overseeing so many independent projects with their own CEOs didn't make sense for part-time volunteer board members.

(Also seconding Elizabeth's post.)

  1. Yep, AI safety people tend to oppose sharing model weights for future dangerous AI systems.
  2. But it's not certain that (operator-aligned) open-source powerful AI entails doom. To a first approximation, it entails doom iff "offense" is much more efficient than "defense," which depends on context. But absent super monitoring to make sure that others aren't making weapons/nanobots/whatever, or super efficient defenses against such attacks, I intuit that offense is heavily favored.

An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.

A possible central goal of AI governance: cause [an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world] to quickly result in rules that stop all labs from ending the world.

I don't know how we can pursue that goal.

I don't want to try to explain now, sorry.

(This shortform was intended more as starting-a-personal-list than as a manifesto.)

What's the best thing to read on "Zvi's writing on EAs confusing the map for the territory"? Or at least something good?

1
Quadratic Reciprocity
5mo
I'm not sure what would be the best thing since I don't remember there being a particular post about this. However, he talks about it in his book review for Going Infinite and I also like his post on Altruism is Incomplete. Lots of people I know find his writing confusing though and it's not like he's rigorously arguing for something. When I agree with Zvi, it's usually because I have had that belief in the back of my mind for a while and him pointing it out makes it more salient, rather than because I got convinced by a particular argument he was making. 

Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I'll elaborate on all this in a top-level post.

Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.

Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren't cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.

Or me. I'm veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.

I certainly haven't done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention... (read more)

(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don't really want to get into this now.)

Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.

Thanks.

Some people say things like "my doom-credence fluctuates between 10% and 25% day to day"; this is dutch-book-able and they'd make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.

5
AnonymousTurtle
5mo
This is dutch-book-able only if there is no bid-ask spread. A rational choice in this case would be to have a very wide bid-ask spread. E.g. when Holden Karnofsky writes that his P(doom) is between 10% and 90%, I assume he would bet for doom at 9% or less, bet against doom at 91% or more, and not bet for 0.11<p<0.89. This seems a very rational choice in a high-volatility situation where information changes extremely quickly. (As an example, IIRC the bid-ask spread in financial markets increases right before earnings are released).
4
MichaelStJules
5mo
Hmm, okay. So, for example, when they’re below 15%, you bet that it will happen at odds matching 15% against them, and when they’re above 20%, you bet that it won't happen at 20% against them. And just make sure to size the bets right so that if you lose one bet, your payoff is higher in the other, which you'd win. They "give up" the 15-20% range for free to you. Still, maybe they just mean to report the historical range or volatility of their estimates? This would be like reporting the historical volatility of a stock. They may not intend to imply, say, that they'll definitely fall below 15% at some point and above 20% at another. Plus, picking one way to average may seem unjustifiably precise to them. The average over time is one way, but another is the average over relatively unique (clusters) of states of mind, e.g. splitting weight equally between good, ~neutral and bad moods, averages over possible sets of value assignments for various parameters. There are many different reasonable choices they can make, all pretty arbitrary.
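A toy version of the Dutch book described above, against someone whose stated P(doom) swings between 10% and 25% with no new information (prices and stakes are hypothetical):

```python
# Buy a $1-if-doom contract at $0.15 on a low-credence day (they think it's worth ~$0.10,
# so selling at $0.15 looks good to them), and sell one at $0.20 on a high-credence day
# (they think it's worth ~$0.25, so buying at $0.20 looks good to them).
buy_price, sell_price = 0.15, 0.20

for doom in (True, False):
    long_payoff = 1.0 if doom else 0.0   # the contract you bought
    short_payout = 1.0 if doom else 0.0  # the contract you sold
    profit = (long_payoff - buy_price) + (sell_price - short_payout)
    print(doom, round(profit, 2))        # 0.05 either way: a guaranteed gain
```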

Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:

Meta-uncertainty:

  • Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
    • But thinking in terms of probabilities over probabilities is sometimes useful, e.g. you have a probability distribution over possible worlds/models and those worlds/models are probabilistic
  • Unstable beliefs about stuff
... (read more)
2
DC
5mo
What problems are you thinking of in particular?
4
Quadratic Reciprocity
5mo
Strong upvoted and couldn't decide whether to disagreevote or not. I agree with the points you list under meta-uncertainty and your point on naively using calibration as a proxy for forecasting ability + thinking you can bet on the end of the world by borrowing money. I disagree with your thoughts on ethics (I'm sympathetic to Zvi's writing on EAs confusing the map for the territory).
5
AnonymousTurtle
5mo
Thank you for writing this. I share many of these, but I'm very uncertain about them. Here it is:

I think this is rational, I think of probabilities in terms of bets and order books.

I think this is close to my view, and the analogy of financial markets is not irrelevant. Changing literally day-to-day seems extreme, but month-to-month seems very reasonable given the speed of everything that's happening, and it matches e.g. the volatility of NVIDIA stock price.

To me, "utilitarianism" seems pretty general, as long as you can arbitrarily define utility and you can arbitrarily choose between Negative/Rule/Act/Two-level/Total/Average/Preference/Classical utilitarianism. I really liked this section of a recent talk by Toby Ord (Starting from "It starts by observing that the three main traditions in Western philosophy each emphasize a different focal point:"). (I also don't know if axiology is the right word for what we want to express here, we might be talking past each other)

I mostly agree with you, but second order effects seem hard to evaluate and both costs and benefits are so minuscule (and potentially negative) that I find it hard to do a cost-benefit-analysis.

I agree with you, but for some it might be an instrumentally useful intentional framing. I think some use phrases like "[Personal flourishing] for its own sake, for the sake of existential risk." (see also this comment for a fun thought experiment for average utilitarians, but I don't think many believe it)

Some think the probability of extinction per century is only going up with humanity increasing capabilities, and are not convinced by arguments that we'll soon reach close-to-speed-of-light travel which will make extinction risk go down. See also e.g. Why I am probably not a longtermist (except point 1).

I find this very reasonable.

I agree, I think this makes a ton of sense for people in community building that need to work with many cause areas (e.g. CEA staff, Peter Singer), but I fear that it
5
MichaelStJules
5mo
Are you convinced the costs outweigh the benefits? It may be good for important instrumental reasons, e.g. reducing cognitive dissonance about sentience and moral weights, increasing the day-to-day salience of moral patients with limited agency or power (which could be an important share of those in the future), personal integrity or virtue, easing cooperation with animal advocates (including non-consequentialist ones), maybe health reasons.
2
MichaelStJules
5mo
(Not totally sure what you mean here.) I think the portfolio items are justified on the basis of distinct worldviews, which differ in part based on their normative commitments (e.g. theories of welfare like hedonism or preference views, moral weights, axiology, decision theory, epistemic standards, non-consequentialist commitments) across which there is no uniquely justified universal common scale. People might be doing this pretty informally or deferring, though.   I think this can make sense if you have imprecise credences or normative uncertainty (for which there isn't a uniquely justified universal common scale across views). Specifically, if you're unable to decide whether action A does net good or net harm (in expectation), because it does good for cause X and harm for cause Y, and the two causes are too hard to compare, it might make sense to offset. Portfolios can be (more) robustly positive than the individual acts. EDIT: But maybe you find this too difference-making?

(meta musing) The conjunction of the negations of a bunch of statements seems a bit doomed to get a lot of disagreement karma, sadly. Esp. if the statements being negated are "common beliefs" of people like the ones on this forum.

I agreed with some of these and disagreed with others, so I felt unable to agreevote. But I strongly appreciated the post overall so I strong-upvoted.

Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities

This is just straightforwardly correct statistics. For example, ask a true Bayesian to estimate the outcome of flipping a coin of unknown bias, and they will construct a probability distribution of coin-flip probabilities, and only reduce this to a single probability when forced to make a bet. But when not taking a bet, they should be doing update... (read more)
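A small sketch of that point: the Bayesian keeps a full (here Beta) posterior over the coin's bias for future updating, but the price they would accept on a single flip collapses to one number, the posterior mean (the prior and data below are made up):

```python
# Uniform Beta(1,1) prior over the coin's bias, then observe 7 heads and 3 tails.
alpha_prior, beta_prior = 1, 1
heads, tails = 7, 3

alpha_post = alpha_prior + heads   # Beta posterior over the bias
beta_post = beta_prior + tails

p_heads_next = alpha_post / (alpha_post + beta_post)  # posterior mean
print(p_heads_next)  # 0.666... -> the single probability you'd bet at,
                     # even though the full posterior is kept for future updates
```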

6
MichaelStJules
5mo
When I do this, it's because I'm unable or unwilling to assign a probability distribution over the probabilities, so it won't reduce to simple (precise) probabilities. Actually, in general, I think precise probabilities are epistemically unjustified (e.g. Schoenfield, 2012, section 3), but I'm willing to use more or less precise probabilities depending on the circumstances.

I'm not sure if I'd claim to have such unstable beliefs myself, but if you're trying to be very precise with very speculative, subjective and hard-to-specifically-defend probabilities, then I'd imagine they could be very unstable, and influenced by things like your mood, e.g. optimism and pessimism bias. That is, unless you commit to your credences even if you'd have formed different ones if you had started from scratch, or you make arbitrary choices in forming them that could easily have gone differently. You might weigh the same evidence or arguments differently from one day to the next. I'd guess most people would also have had at least slightly different credences on AI timelines if they had seen the same evidence or arguments in a different order, or were in a different mood when they were forming their credences or building models, or for many other different reasons. Some number or parameter choices will come down to intuition, and intuition can be unstable.

I don't think people are fluctuating predictably (dutch-book-ably). How exactly they'd change their minds or even the direction is not known to them ahead of time. (But maybe you could Dutch book people by predicting their moods and so optimism and pessimism bias?)

I have high credence in basically zero x-risk after [the time of perils / achieving technological maturity and then stabilizing / 2050]. Even if it was pretty low, "pretty low" * 10^70 ≈ 10^70. Most value comes from the worlds with extremely low longterm rate of x-risk, even if you think they're unlikely.

(I expect an effective population much much larger than 10^10 humans, but I'm not sure "population size" will be a useful concept (e.g. maybe we'll decide to wait billions of years before converting resources to value), but that's not the crux here.)

6
Vasco Grilo
5mo
Meta point. I would be curious to know why my comment was downvoted (2 karma in 4 votes without my own vote). For what it's worth, I upvoted all your comments upstream of my comment in this thread because I think they are valuable contributions to the discussion.

By "basically zero", you mean 0 in practice (e.g. for EV calculations)? I can see the above applying for some definitions of time of perils and technological maturity, but then I think they may be astronomically unlikely.

I think it is often the case that people in EA circles are sensitive to the possibility of astronomical upside (e.g. 10^70 lives), but not to astronomically low chance of achieving that upside (e.g. 10^-60 chance of achieving 0 longterm existential risk). I explain this by a natural human tendency not to attribute super low probabilities to events whose mechanics we do not understand well (e.g. surviving the time of perils), such that e.g. people would attribute similar probabilities to a cosmic endowment of 10^50 and 10^70 lives. However, these may have super different probabilities for some distributions. For example, for a Pareto distribution (a power-law), the probability density of a given value is proportional to "value"^-(alpha + 1). So, for a tail index of alpha = 1, a value of 10^70 is 10^-40 (= 10^(-2*(70 - 50))) as likely as a value of 10^50. So intuitions that the probability of 10^50 value is similar to that of 10^70 value would be completely off.

One can counter my particular example above by arguing that a power law is a priori implausible, and that we should use a more uninformative prior like a loguniform distribution. However, I feel like the choice of the prior would be somewhat arbitrary. For example, the upper bound of the prior loguniform distribution would be hard to define, and would be the major driver of the overall expected value. I think we should proceed with caution if prioritisation is hinging on decently arbitrary choices informed by almost no empirical eviden
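A one-line check of the density ratio in the comment above (tail index alpha = 1 is the comment's own assumption):

```python
alpha = 1
density_ratio = (1e70 / 1e50) ** -(alpha + 1)  # Pareto density proportional to value^-(alpha+1)
print(density_ratio)  # 1e-40: a 10^70-life outcome is 10^-40 times as likely as a 10^50 one
```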

It takes like 20 hours of focused reading to get basic context on AI risk and threat models. Once you have that, I feel like you can read everything important in x-risk-focused AI policy in 100 hours. Same for x-risk-focused AI corporate governance, AI forecasting, and macrostrategy.

[Edit: read everything important doesn't mean you have nothing left to learn; it means something like you have context to appreciate ~all papers, and you can follow ~all conversations in the field except between sub-specialists, and you have the generators of good overviews lik... (read more)

Here are some of the curricula that HAIST uses:

The HAIST website also has a resources tab with lists of technical and policy papers. 

2
Ariel Simnegar
5mo
+1 to the interest in these reading lists. Because my job is very time-consuming, I haven’t spent much time trying to understand the state of the art in AI risk. If there was a ready-made reading list I could devote 2-3 hours per week to, such that it’d take me a few months to learn the basic context of AI risk, that’d be great.

I disagree-voted because I feel like I've done much more than 100-hours of reading on AI Policy (including finishing the AI Safety Fundamentals Governance course) and still have a strong sense there's a lot I don't know, and regularly come across new work that I find insightful. Very possibly I'm prioritising reading the wrong things (and would really value a reading list!) but thought I'd share my experience as a data point. 

9
Will Aldred
5mo
For what it's worth, I found your "AI policy ideas: Reading list" and "Ideas for AI labs: Reading list" helpful,[1] and I've recommended the former to three or four people. My guess would be that these reading lists have been very helpful to a couple or a few people rather than quite helpful to lots of people, but I'd also guess that's the right thing to be aiming for given the overall landscape. I expect there's no good reason for this, and that it's simply because it's nobody's job to make such reading lists (as far as I'm aware), and the few(?) people who could make good intermediate-to-advanced level readings lists either haven't thought to do so or are too busy doing object-level work? 1. ^ Helpful in the sense of: I read or skimmed the readings in those lists that I hadn't already seen, which was maybe half of them, and I think this was probably a better use of my time than the counterfactual.

(I agree that the actual ratio isn't like 10^20. In my view this is mostly because of the long-term effects of neartermist stuff,* which the model doesn't consider, so my criticism of the model stands. Maybe I should have said "undervalue longterm-focused stuff by a factor of >10^20 relative to the component of neartermist stuff that the model considers.")

*Setting aside causing others to change prioritization, which it feels wrong for this model to consider.

Thanks. I respect that the model is flexible and that it doesn't attempt to answer all questions. But at the end of the day, the model will be used to "help assess potential research projects at Rethink Priorities" and I fear it will undervalue longterm-focused stuff by a factor of >10^20.

I believe Marcus and Peter will release something before long discussing how they actually think about prioritization decisions.

AFAICT, the model also doesn't consider far future effects of animal welfare and GHD interventions. And against relative ratios like >10^20 between x-risk and neartermist interventions, see:

  1. https://reducing-suffering.org/why-charities-dont-differ-astronomically-in-cost-effectiveness/
  2. https://longtermrisk.org/how-the-simulation-argument-dampens-future-fanaticism

I haven't engaged with this. But if I did, I think my big disagreement would be with how you deal with the value of the long-term future. My guess is your defaults dramatically underestimate the upside of technological maturity (near-lightspeed von Neumann probes, hedonium, tearing apart stars, etc.) [edit: alternate frame: underestimate accessible resources and efficiency of converting resources to value], and the model is set up in a way that makes it hard for users to fix this by substituting different parameters.

The significance of existential risk dep

... (read more)
5
Vasco Grilo
5mo
Hi Zach,

Note such astronomical values require a very low longterm existential risk. For the current human population of ~ 10^10, and current life expectancy of ~ 100 years, one would need a longterm existential risk per century of 10^-60 (= 10^-(70 - 10)) to get a net present value of 10^70 human lives. XPT's superforecasters and experts guessed a probability of human extinction by 2100 of 1 % and 6 %, so I do not think one can be confident that longterm existential risk per century will be 10^-60.

One can counter this argument by suggesting the longterm future population will also be astronomically large, instead of 10^10 as I assumed. However, for that to be the case, one needs a long time without an existential catastrophe, which again requires an astronomically low longterm existential risk.

In addition, it is unclear to me how much cause prioritization depends on the size of the future. For example, if one thinks decelerating/accelerating economic growth affects AI extinction risk, many neartermist interventions would be able to meaningfully decrease it by decelerating/accelerating economic growth. So the cost-effectiveness of such neartermist interventions and AI safety interventions would not differ by tens of orders of magnitude. Brian Tomasik makes related points in the article Michael linked below.
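A rough sketch of the arithmetic behind the first paragraph, assuming a constant per-century extinction probability r (so expected survival is roughly 1/r centuries):

```python
lives_per_century = 1e10   # ~current population living ~100-year lives
risk_per_century = 1e-60   # assumed constant longterm existential risk per century

expected_centuries = 1 / risk_per_century
expected_future_lives = lives_per_century * expected_centuries
print(expected_future_lives)  # 1e+70 -> reaching 10^70 expected lives needs r ~ 10^-60
```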

I think you're right that we don't provide a really detailed model of the far future and we underestimate* expected value as a result. It's hard to know how to model the hypothetical technologies we've thought of, let alone the technologies that we haven't. These are the kinds of things you have to take into consideration when applying the model, and we don't endorse the outputs as definitive, even once you've tailored the parameters to your own views.

That said, I do think the model has a greater flexibility than you suggest. Some of these options are hidd... (read more)

This was the press release; the actual order has now been published.

One safety-relevant part:

4.2.  Ensuring Safe and Reliable AI.  (a)  Within 90 days of the date of this order, to ensure and verify the continuous availability of safe, reliable, and effective AI in accordance with the Defense Production Act, as amended, 50 U.S.C. 4501 et seq., including for the national defense and the protection of critical infrastructure, the Secretary of Commerce shall require:

          (i)   Companies developing or

... (read more)
1
SebastianSchmidt
5mo
Great to see how concrete and serious the US is now. This basically means that models more powerful than GPT-4 have to be reported to the government. 
1
Tristan Williams
6mo
Thanks, I'll toss this in at the top now for those that are curious

Both. As you note, Scanlonian contractualism is about reasonable-rejection.

(Personally, I think it's kinda appealing to consider contractualism for deriving principles, e.g. via rational-rejection or more concretely via veil-of-ignorance. I'm much less compelled by thinking in terms of claims-to-aid. I kinda assert that deriving-principles is much more central to contractualism; I notice that https://plato.stanford.edu/entries/contractualism/ doesn't use "claim," "aid," or "assistance" in the relevant sense, but does use "principle.")

(Probably not going to engage more on this.)

6
Bob Fischer
6mo
Ah, I see. Yeah, we discuss this explicitly in Section 2. The language in the executive summary is a simplification.

I just read the summary but I want to disagree with:

Contractualism says: When your actions could benefit both an individual and a group, don't compare the individual's claim to aid to the group's claim to aid, which assumes that you can aggregate claims across individuals. Instead, compare an individual's claim to aid to the claim of every other relevant individual in the situation by pairwise comparison. If one individual's claim to aid is a lot stronger than any other's, then you should help them.

"Contractualism" is a broad family of theories, many ... (read more)

3
Bob Fischer
6mo
Fair point about it being a broad family of theories, Zach. What's the claim that you take Scanlonian contractualism not to entail? The bit about not comparing the individual's claim to aid to the group's? Or the bit about who you should help?

Yeah, I agree; I think the geometric mean is degenerate unless your probability distribution quickly approaches density-0 around 0% and 100%. This is an intuition pump for why the geometric mean is the wrong statistic.

Also if you're taking the geometric mean I think you should take it of the odds ratio (as the author does) rather than the probability; e.g. this makes probability-0 symmetric with probability-1.

[To be clear I haven't read most of the post.]

9
Jaime Sevilla
7mo
I have gripes with the methodology of the article, but I don't think highlighting the geometric mean of odds over the mean of probabilities is a major fault. The core problem is assuming independence over the predictions at each stage. The right move would have been to aggregate the total P(doom) of each forecaster using the geo mean of odds (not that I think that asking random people and aggregating their beliefs like this is particularly strong evidence).

The intuition pump that if someone assigns a zero percent chance then the geomean aggregate breaks is flawed:

  1. There is an equally compelling pump the other way around: the arithmetic mean of probabilities defers unduly to people assigning a high chance. A single dissenter among 10 experts can force the aggregate probability to be at least a tenth of their preferred value.
  2. And surely if anyone is assigning a zero percent chance to something, you can safely assume they are not taking the situation seriously and ignore them.

Ultimately, we can theorize all we want, but as a matter of fact the best performance when predicting complex events is achieved when taking the geometric mean of odds, both in terms of log loss and Brier scores. Without more compelling evidence or a very clear theoretical reason that distinguishes between the contexts, it seems weird to argue that we should treat AI risk differently.

And if you are still worried about dissenters skewing the predictions, one common strategy is to winsorize, by clipping the predictions to the 5th and 95th percentiles, for example.

No, I don’t have a take on deference in EA. I meant: post contests generally give you evidence about which posts to pay attention to, especially if they’re run by OP. I am sharing that I have reason to believe that (some of) these winners are less worth-paying-attention-to than you’d expect on priors.

(And normally this reason would be very weak because the judges engaged much more deeply than I did, but my concerns with the posts I engaged with seem unlikely to dissolve upon deeper engagement.)

Congratulations to the winners.

I haven't engaged deeply with any of the winning posts like the judges have, but I engaged shallowly with 3–4 when they were written. I thought they were methodologically doomed (‘Dissolving’ AI Risk) or constituted very weak evidence even if they were basically right (AGI and the EMH and especially Reference Class-Based Priors). (I apologize for this criticism-without-justification, but explaining details is not worth it and probably the comments on those posts do a fine job.)

Normally I wouldn't say this. But OP is high-stat... (read more)

2
zchuang
7mo
[edited: last sentence for explicitness of my point] I think this worry should be more a critique of the EA community writ large for being overly deferential than of OP holding a contest to elicit critiques of its views and then following through with that in their own admittedly subjective criteria. OP themselves note in the post that people shouldn't take this to be OP's institutional tastes.

(I agree that geometric-mean-of-odds is an irrelevant statistic and ‘Dissolving’ AI Risk's headline number should be the mean-of-probabilities, 9.7%. I think some commenters noticed that too.)

4
Ted Sanders
7mo
Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I'm realizing I don't understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom... then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right? Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
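For concreteness, here is Ted's three-world example run through the aggregation rules discussed in this thread; 0% and 100% are clipped to 0.1% and 99.9% so that odds are defined (the clipping is my assumption, not the paper's):

```python
import math

probs = [0.001, 0.50, 0.999]  # Ted's 0% / 50% / 100% worlds, lightly clipped

mean_of_probs = sum(probs) / len(probs)                    # arithmetic mean of probabilities
geo_mean_of_probs = math.prod(probs) ** (1 / len(probs))   # geometric mean of probabilities

odds = [p / (1 - p) for p in probs]
geo_mean_odds = math.prod(odds) ** (1 / len(odds))         # geometric mean of odds
geo_mean_odds_as_prob = geo_mean_odds / (1 + geo_mean_odds)

print(round(mean_of_probs, 3))          # 0.5
print(round(geo_mean_of_probs, 3))      # ~0.079, dragged toward the near-zero entry
print(round(geo_mean_odds_as_prob, 3))  # 0.5, treating 0.001 and 0.999 symmetrically
```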

Yeah, I think the real reason is we think we're safer than OpenAI (and possibly some wanting-power but that mostly doesn't explain their behavior).

See Dario's Senate testimony from two months ago:

With the fast pace of progress in mind, we can think of AI risks as falling into three buckets:

●  Short-term risks are those present in current AI systems or that imminently will be present. This includes concerns like privacy, copyright issues, bias and fairness in the model’s outputs, factual accuracy, and the potential to generate misinformation or propaganda.

●  Medium-term risks are those we will face in two to three years. In that time period, Anthropic’s projections suggest that AI systems ma

... (read more)
5
JWS
7mo
Thanks for linking Dario's testimony. I actually found this extract which was closer to answering my question:

I know this statement would have been massively pre-prepared for the hearing, but I don't feel super convinced by it:

On his point 1), such benefits have to be weighed up against the harms, both existential and not. But just as many parts of the xRisk story are speculative, so are many of the purported benefits from AI research. I guess Dario is saying 'it could' and not it will, but for me if you want to "improve efficiency throughout government" you'll need political solutions, not technical ones.

Point 2) is the 'but China' response to AI Safety. I'm not an expert in US foreign policy strategy (funny how everyone is these days), but I'd note this response only works if you view the path to increasing capability as straightforward. It also doesn't work, in my mind, if you think there's a high chance of xRisk. Just because someone else might ignite the atmosphere, doesn't mean you should too. I'd also note that Dario doesn't sound nearly as confident making this statement as he did talking about it with Dwarkesh recently.

Point 3) makes sense if you think the value of the benefits massively outweighs the harms, so that you solve the harms as you reap the benefits. But if those harms outweigh the benefits, or you incur a substantial "risk of ruin", then being at the frontier and expanding it further unilaterally makes less sense to me.

I guess I'd want the CEOs and those with power in these companies to actually be put under the scrutiny in the political sphere which they deserve. These are important and consequential issues we're talking about, and I just get the vibe that the 'kid gloves' need to come off a bit in terms of oversight and scrutiny/scepticism.