# All of ofer's Comments + Replies

Limits to Legibility

First, you increase the pressure on the "justification generator" to mask various black boxes by generating arguments supporting their conclusions.


Third, there's a risk that people get convinced based on bad arguments - because their "justification generator" generated a weak legible explanation, you managed to refute it, and they updated. The problem comes if this involves discarding the output of the neural network, which was much smarter than the reasoning they accepted.

On the other hand, if someone in EA is making decisions about high-stakes in... (read more)

Announcing Epoch: A research organization investigating the road to Transformative AI

Hey there!

Can you describe your meta process for deciding what analyses to work on and how to communicate them? Analyses about the future development of transformative AI can be extremely beneficial (including via publishing them and getting many people more informed). But getting many people more hyped about scaling up ML models, for example, can also be counterproductive. Notably, The Economist article that you linked to shows your work under the title "The blessings of scale". (I'm ... (read more)

Thinking about the ways publications can be harmful is something that I wish was practiced more widely in the world, especially in the field of AI.

That being said, I believe that in EA, and in particular in AI Safety, the pendulum has swung too far - we would benefit from discussing these issues more openly.

In particular, I think that talking about AI scaling is unlikely to goad major companies to invest much more in AI (there are already huge incentives). And I think EAs and people otherwise invested in AI Safety would benefit from having a... (read more)

Our current publication policy is:

1. Any Epoch staff member can object when we announce intention to publish a paper or blogpost.
2. We then have a discussion about it. If we conclude that there is a harm and that the harm outweighs the benefits, we refrain from publishing.
3. If no consensus is reached, we discuss the issue with some of our trusted partners and seek advice.
4. Some of our work that is not published is instead disseminated privately on a case-by-case basis.

We think this policy has a good mix of being flexible and giving space for Epoch staff to raise concerns.

Impact markets may incentivize predictably net-negative projects

First of all, what we’ve summarized as “curation” so far could really be distinguished as follows:

1. Making access for issuers invite-only, maybe keeping the whole marketplace secret (in combination with #2) until we find someone who produces cool papers/articles and who we trust and then invite them.
2. Making access for investors/retro funders invite-only, maybe keeping the whole marketplace secret (in combination with #1) until we find an impact investor or a retro funder who we trust and then invite them.
3. Read every certificate either before or shortly af
Impact markets may incentivize predictably net-negative projects

The thing I'm looking for is the comparison between the benefits and the costs; are the costs larger?

Efficient impact markets would allow anyone to create certificates for a project and then sell them for a price that corresponds to a very good prediction of their expected future value. Therefore, sufficiently efficient impact markets will probably fund some high EV projects that wouldn't otherwise be funded (because it's not easy for classical EA funders to evaluate them or even find them in the space of possible projects). If we look at that set of pr... (read more)

Impact markets may incentivize predictably net-negative projects

We would never submit our own certificates to a prize contest that we are judging, but we’d also be open to not submitting any of our impact market–related work to any other prize contests if that’s what consensus comes to.

Does this mean that you (the Impact Markets team) may sell certificates of your work to establish an impact market on that very impact market?

Impact markets may incentivize predictably net-negative projects

I do not endorse the text written by "Imagined Ofer" here. Rather than describing all the differences between that text and what I would really say, I've now published this reply to your first comment.

Impact markets may incentivize predictably net-negative projects

Web3: Seems about as bad as any web2 solution that allows people to easily back up their data.

I think that a decentralized impact market that can't be controlled or shut down seems worse. Also, a Web3 platform will make it less effortful for someone to launch a competing platform (either with or without the certificates from the original platform).

Impact markets may incentivize predictably net-negative projects

But abandoning the project of impact markets because of the downsides seems about as misguided to us as abandoning self-driving cars because of adversarial-example attacks on street signs.

I think the analogy would work better if self-driving cars did risky things that could cause a terrible accident, in order to prevent the battery from running out or to reach the destination sooner.

Attributed Impact may look complicated but we’ve just operationalized something that is intuitively obvious to most EAs – expectational consequentialism. (And moral trade and so

3 Denis Drescher 4d
First of all, what we’ve summarized as “curation” so far could really be distinguished as follows: 1. Making access for issuers invite-only, maybe keeping the whole marketplace secret (in combination with #2) until we find someone who produces cool papers/articles and who we trust and then invite them. 2. Making access for investors/retro funders invite-only, maybe keeping the whole marketplace secret (in combination with #1) until we find an impact investor or a retro funder who we trust and then invite them. 3. Read every certificate either before or shortly after it is published. (In combination with exposé certificates in case we make a mistake.) Let’s say #3 is a given. Do you think the marketplace would fulfill your safety requirements if only #1, only #2, or both were added to it? It involves explaining that. What we wrote was to argue that Attributed Impact is not as complicated as it may sound but rather quite intuitive. If you want to open a bazaar, one of your worries could be that people will use it to sell stolen goods. Currently these people sell the stolen goods online or on other bazaars, and the experience may be a bit clunky. By default these people will be happy to use your bazaar for their illegal trade because it makes life slightly easier for them. Slightly easier could mean that they get to sell a bit more quickly and create a bit more capacity for more stealing. But if you enact some security measures to keep them out, you quickly reach the point where the bazaar is less attractive than the alternatives. At that point you already have no effect anymore on how much theft there is going on in the world in aggregate. So the trick is to tune the security measures just right that they make the place less attractive than alternatives to the thieves and yet don’t impose prohibitively high costs on the legitimate sellers. My intent so far was to focus on text that is accessible online, e.g., articles, papers, some books.
Impact markets may incentivize predictably net-negative projects

(3) declaring the impact certificates not burned and allowing people some time to export their data.

That could make it easier for another team to create a new impact market that will seamlessly replace the impact market that is being shut down.

My original idea from summer 2021 was to use blockchain technology simply for technical ease of implementation (I wouldn’t have had to write any code). That would’ve made the certs random tokens among millions of others on the blockchain. But then to set up a centralized, curated marketplace for them with a smar

2 Denis Drescher 5d
Okay, but to keep the two points separate: 1. Allowing people to make backups: You’d rather make it as hard as possible to make backups, e.g., by using anti-screenscraping tools and maybe hiding some information about the ledger in the first place so people can’t easily back it up. 2. Web3: Seems about as bad as any web2 solution that allows people to easily back up their data. Is that about right?
Impact markets may incentivize predictably net-negative projects

[Limited liability] is a historically unusual policy (full liability came first), and seems to me to have basically the same downsides (people do risky things, profiting if they win and walking away if they lose), and basically the same upsides (according to the theory supporting LLCs, there's too little investment and support of novel projects).

Can you explain the "same upsides" part?

Can you say more about why you think this consideration is sufficient to be net negative? (I notice your post seems very 'do-no-harm' to me instead of 'here are the posi

2 vaniver 4d
Yeah; by default people have entangled assets which will be put at risk by starting or investing in a new project. Limiting the liability that originates from that project to just the assets held by that project means that investors and founders can do things that seem to have positive return on their own, rather than 'positive return given that you're putting all of your other assets at stake.' [Like I agree that there's issues where the social benefit of actions and the private benefits of actions don't line up, and we should try to line them up as well as we can in order to incentivize the best action. I'm just noting that the standard guess for businesses is "we should try to decrease the private risk of starting new businesses"; I could buy that it's different for the x-risk environment, where we should not try to decrease the private risk of starting new risk reduction projects, but it's not obviously the case.] Sure, I agree with this, and with the sense that the costs are large. The thing I'm looking for is the comparison between the benefits and the costs; are the costs larger? Sure, I buy that adverse selection can make things worse; my guess was that the hope was that classical EA funders would also operate thru the market. [Like, at some point your private markets become big enough that they become public markets, and I think we have solid reasons to believe a market mechanism can outperform specific experts, if there's enough profit at stake to attract substantial trading effort.]
Impact markets may incentivize predictably net-negative projects

(Not an important point [EDIT: meaning the text you are reading in these parentheses], but I don't think that a karma of 18 points is a proof for that; maybe the people who took the time to go over that post and vote are mostly amateurs who found the topic interesting. Also, as an aside, if someone one day publishes a brilliant insight about how to develop AGI much faster, taking the post down can be net-negative due to the Streisand effect).

I'm confident that almost all the alignment researche... (read more)

2 Yonatan Cale 4d
You changed my mind! I think the missing part, for me, is a public post saying "this is what I'm going to do, but I didn't start", which is what the prospective funder sees, and would let the retro funder say "hey you shouldn't have funded this plan". I think. I'll think about it
1 DonyChristie 5d
I think you're missing the part where if such a marketplace was materially changing the incentives and behavior of the Alignment Forum, people could get an impact certificate for counterbalancing externalities such as critiquing/flagging/moderating a harmful AGI capabilities post, possibly motivating them to curate more than a small moderation team could handle. That's not to say that in that equilibrium there couldn't be an even stronger force of distributionally mismatched positivity bias, e.g. upvote-brigading assuming there are some Goodhart incentives to retro fund posts in proportion to their karma, but it is at least strongly suggestive.
Impact markets may incentivize predictably net-negative projects

I expect this will reduce the price at which OpenAI is traded

But an impact market can still make OpenAI's certificates be worth $100M if, for example, investors have at least 10% credence in some future retro funder being willing to buy them for $1B (+interest). And that could be true even if everyone today believed that creating OpenAI is net-negative. See the "Mitigating the risk is hard" section in the OP for some additional reasons to be skeptical about such an approach.
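
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes risk-neutral investors and ignores interest; the function name `certificate_price` and its parameters are hypothetical and only serve this example.

```python
def certificate_price(p_future_purchase: float, purchase_amount: float) -> float:
    """Risk-neutral price of a certificate today (interest ignored).

    p_future_purchase: investors' credence that some future retro funder buys.
    purchase_amount:   what that retro funder would pay, in dollars.
    """
    if not 0.0 <= p_future_purchase <= 1.0:
        raise ValueError("credence must be between 0 and 1")
    return p_future_purchase * purchase_amount


# Numbers from the comment: 10% credence in a $1B purchase.
price = certificate_price(0.10, 1_000_000_000)
print(f"Price supported by the market: ${price:,.0f}")
```

Note that the price depends only on the credence in a future purchase, not on whether anyone today thinks the project is net-positive.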

I missed what you're replying to though. Is it the "The problem of funding net

Impact markets may incentivize predictably net-negative projects

It's just an example for how a post on the alignment forum can be net-negative and how it can be very hard to judge whether it's net-negative. For any net-negative intervention that impact markets would incentivize, if people can do it without funding then the incentive to do impressive things can also cause them to carry out the intervention. In those cases, impact markets can cause those interventions to be more likely to be carried out.

5 Yonatan Cale 5d
I hope I'm not strawmanning your claim and please call me out if I am, but, Seems like you are arguing for making it more likely to have [a risk] that, as you point out, happened, and the AF could solve with almost no cost, and they chose not to. ..right? So.. why do you think it's a big problem? Or at least.. seems like the AF disagrees about this being a problem.. no? (Please say if this is an unfair question somehow)
Impact markets may incentivize predictably net-negative projects

I think that most interventions that have a substantial chance to prevent an existential catastrophe also have a substantial chance to cause an existential catastrophe, such that it's very hard to judge whether they are net-positive or net-negative (due to complex cluelessness dynamics that are caused by many known and unknown crucial considerations).

My model of you would say either that:

1. funding those particular posts is net bad, or
2. funding those two posts in particular may be net good, but it sets a precedent that will cause there to be further counte
Impact markets may incentivize predictably net-negative projects

If someone wants to advance AI capabilities, they can already get prospective funding by opening a regular for-profit startup.

No?

Right. But without an impact market it can be impossible to profit from, say, publishing a post with a potentially transformative insight about AGI development. (See this post as a probably-harmless-version of the type of posts I'm talking about here.)

5 Yonatan Cale 7d
I acknowledge this could be bad, but (as with most of my comments here), this is not a new problem. Also today, if someone publishes such a post in the Alignment Forum: I hope they have moderation for taking it down, whether the author expects to make money from it or not. Or is your worry something like "there will be 10x more such posts and the moderation will be overloaded"?
Impact markets may incentivize predictably net-negative projects

If someone thinks a net-negative project is being traded on (or run at all), how about posting about it on the forum?

As we wrote in the post, even if everyone believes that a certain project is net-negative, its certificates may be traded for a high price due to the chance that the project will end up being beneficial. For example, consider OpenAI (I'm not making here a claim that OpenAI is net-negative, but it seems that many people in EA think it is, and for the sake of this example let's imagine that everyone in EA thinks that). It's plausible that Op... (read more)

3 Denis Drescher 5d
It could be done a bit more smoothly by (1) accepting no new issues, (2) completing all running prize rounds, and (3) declaring the impact certificates not burned and allowing people some time to export their data. (I don’t think it would be credible for the marketplace to declare the certs burned since it doesn’t own them.) My original idea from summer 2021 was to use blockchain technology simply for technical ease of implementation (I wouldn’t have had to write any code). That would’ve made the certs random tokens among millions of others on the blockchain. But then to set up a centralized, curated marketplace for them with a smart and EA curation team. We’ve moved away from that idea. Our current market is fully web2 with no bit of blockchain anywhere. Safety was a core reason for the update. (But the ease-of-implementation reasons to prefer blockchain also didn’t apply so much anymore. We have a doc somewhere with all the pros and cons.) For our favored auction mechanisms, it would be handy to be able to split transactions easily, so we have thought about (maybe, at some point) allowing users to connect a wallet to improve the user experience, but that would be only for sending and receiving payments. The certs would still be rows in a Postgres database in this hypothetical model. Sort of like how Rethink Priorities accepts crypto donations or a bit like a centralized crypto exchange (but that sounds a bit pompous). But what do you think about the original idea? I don’t think it's so different from a fully centralized solution where you allow people to export their data or at least not prevent them from copy-pasting their certs and ledgers to back them up. My greatest worries about crypto stem less from the technology itself (which, for all I know, could be made safe) but from the general spirit in the community that decentralization, democratization, ungatedness, etc. are highly desirable values to strive for. I don’t want to have to fight against the domi
7 Yonatan Cale 7d
Impact markets may incentivize predictably net-negative projects

I think that it's more likely to be the result of an effort to mitigate potential harm from future pandemics. One piece of evidence that supports this is the grant proposal, which was rejected by DARPA, that is described in this New Yorker article. The grant proposal was co-submitted by the president of the EcoHealth Alliance, a non-profit which is "dedicated to mitigating the emergence of infectious diseases", according to the article.

Impact markets may incentivize predictably net-negative projects

I find it hard to believe that any version of the lab leak theory involved all the main actors scrupulously doing what they thought was best for the world.

I don't find it hard to believe at all. Conditional on a lab leak, I'm pretty confident no one involved was consciously thinking: "if we do this experiment it can end up causing a horrible pandemic, but on the other hand we can get a lot of citations."

Dangerous experiments in virology are probably usually done in a way that involves a substantial amount of effort to prevent accidental harm. It's not o... (read more)

3 Arepo 7d
Strong disagree. A bioweapons lab working in secret on gain of function research for a somewhat belligerent despotic government, which denies everything after an accidental release is nowhere near any model I have of 'scrupulous altruism'. Ironically, the person I mentioned in my previous comment is one of the main players at Anthropic, so your second paragraph doesn't give me much comfort.
Impact markets may incentivize predictably net-negative projects

Unless ~several people in EA had an opportunity to talk to that billionaire, I don't think this is an example of the unilateralist's curse (regardless of whether it was net negative for you to talk to them).

Fair, though many EAs are probably in positions where they can talk to other billionaires (especially with >5 hours of planning), and probably chose not to do so.

Impact markets may incentivize predictably net-negative projects

Is there any real-world evidence of the unilateralist's curse being realised?

If COVID-19 is a result of a lab leak that occurred while conducting a certain type of experiment (for the purpose of preventing future pandemics), perhaps many people considered conducting/funding such experiments and almost all of them decided not to.

My sense historically is that this sort of reasoning to date has been almost entirely hypothetical

I think we should be careful with arguments that such and such existential risk factor is entirely hypothetical. Causal chains ... (read more)

3 Arepo 8d
I'm talking about the unilateralist's curse with respect to actions intended to be altruistic, not the uncontroversial claim that people sometimes do bad things. I find it hard to believe that any version of the lab leak theory involved all the main actors scrupulously doing what they thought was best for the world. I think we should be careful with arguments that existential risk discussions require lower epistemic standards. That could backfire in all sorts of ways, and leads to claims like one I heard recently from a prominent player that a claim about artificial intelligence prioritisation for which I asked for evidence is 'too important to lose to measurability bias'.
Impact markets may incentivize predictably net-negative projects

I messed up when writing that comment (see the EDIT block).

Impact markets may incentivize predictably net-negative projects

I think people following local financial incentives is always going to happen, and the point of an impact market is to structure financial incentives to be aligned with what the EA community broadly thinks is good.

It may be useful to think about it this way: Suppose an impact market is launched (without any safety mechanisms) and $10M of EA funding are pledged to be used for buying certificates as final buyers 5 years from now. No other final buyers join the market. The creation of the market causes some set of projects X to be funded and some other set... (read more)

1 ofer 8d
I messed up when writing that comment (see the EDIT block).
Impact markets may incentivize predictably net-negative projects

or setting up a system to short sell different projects.

I don't think that short selling would work. Suppose a net-negative project has a 10% chance to end up being beneficial, in which case its certificates will be worth $1M (and otherwise the certificates will end up being worth $0). Therefore, the certificates are worth $100K today in expectation. If someone shorts the certificates as if they are worth less than that, they will lose money in expectation.
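
The expected-value argument about short selling can be sketched numerically. This is a simplified model, assuming a risk-neutral short seller, no borrowing costs or interest, and only two possible outcomes; `expected_value` and `expected_short_profit` are hypothetical helper names used only for illustration.

```python
def expected_value(p_success: float, value_if_success: float,
                   value_if_failure: float = 0.0) -> float:
    """Expected final value of a certificate with two possible outcomes."""
    return p_success * value_if_success + (1.0 - p_success) * value_if_failure


def expected_short_profit(short_price: float, p_success: float,
                          value_if_success: float,
                          value_if_failure: float = 0.0) -> float:
    """Expected profit from selling short at `short_price` today
    and buying back at the certificate's final value."""
    return short_price - expected_value(p_success, value_if_success, value_if_failure)


# Numbers from the comment: 10% chance of $1M, otherwise $0.
fair_value = expected_value(0.10, 1_000_000)
print(f"Expected value today: ${fair_value:,.0f}")

# Shorting below the expected value (here at a hypothetical $60K) loses money on average.
print(f"Expected profit when shorting at $60K: ${expected_short_profit(60_000, 0.10, 1_000_000):,.0f}")
```

Under these assumptions, a short seller only breaks even in expectation when shorting at the expected value itself, so shorting cannot push the price below it, even if the project is net-negative for the world.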

Impact markets may incentivize predictably net-negative projects

Furthermore, people looking to make money are already funding net negative companies due to essentially the same problems (companies have non-negative evaluations), so shifting them towards impact markets could be good, if impact markets have better projects than existing markets on average.

Impact markets may incentivize predictably net-negative projects

Hm, naively - is this any different than the risks of net-negative projects in the for-profit startup funding markets? If not, I don't think this is a unique reason to avoid impact markets.

Impact markets can incentivize/fund net-negative projects that are not currently of interest to for-profit investors. For example, today it can be impossible for someone to make a huge amount of money by launching an aggressive outreach campaign to make people join EA, or publishing a list of "the most dangerous ongoing experiments in virology that we should advocate to sto... (read more)

I'm not sure that "uniqueness" is the right thing to look at.

Mostly, I meant: the for-profit world already incentivizes people to take high amounts of risk for financial gain. In addition, there are no special mechanisms to prevent for-profit entities from producing large net-negative harms. So asking that some special mechanism be introduced for impact-focused entities is an isolated demand for rigor.

There are mechanisms like pollution regulation, labor laws, etc., which apply to for-profit entities - but these would apply equally ... (read more)

Expected ethical value of a career in AI safety

I added an EDIT block in the first paragraph after quoting you (I've misinterpreted your sentence).

Expected ethical value of a career in AI safety

Hey there!

the AI safety research seems unlikely to have strong enough negative unexpected consequences to outweigh the positive ones in expectation.

The word "unexpected" sort of makes that sentence trivially true. If we remove it, I'm not sure the sentence is true. [EDIT: while writing this I misinterpreted the sentence as: "AI safety research seems unlikely to end up causing more harm than good"] Some of the things to consider (written quickly, plausibly contains errors, not a complete list):

• The AIS field (and the competition between AIS researchers
1 Jordan Taylor 16d
Important point. I changed to
3 aogara 16d
Also, low quality research or poor discussion can make it less likely that important decision makers will take AI safety seriously.
Unflattering reasons why I'm attracted to EA

If we shame each other for using our EA activities to make friends, find mates, raise status, make a living, or feel good about ourselves, we undermine EA.

This seems plausible. On the other hand, it may be important to be nuanced here. In the realms of anthropogenic x-risks and meta-EA, it is often very hard to judge whether a given intervention is net-positive or net-negative. Conflicts of interest can cause people to be less likely to make good decisions from an EA perspective.

Experiment in Retroactive Funding: An EA Forum Prize Contest

In the original EA Forum Prize, the ex-post EV at the time of evaluation is usually similar to the ex-ante EV assuming that the evaluation happens closely after the post was written. (In a naive impact market, the price of a certificate can be high due to the chance that 3 years from now its ex-post EV will be extremely high.)

6 Denis Drescher 1mo
So you’re saying it’s fine for them not to make the distinction because they’re so quick that it hardly matters, but that it’s important for us? That makes sense. I suppose that circles back to my earlier comment that I think that our wording is pretty clear about the ex ante nature of the riskiness, but that we can make it even more clear by inserting a few more sentences into the post that make the ex ante part very explicit.
Experiment in Retroactive Funding: An EA Forum Prize Contest

The original EA Forum Prize does not seem to have had the distribution mismatch problem; the posts were presumably evaluated based on their ex-ante EV (or something like that?).

4 Denis Drescher 1mo
I don’t know if they were, so either way it was probably also not obvious to some post authors that they’d be judged by ex ante EV, and it’s enough for one of them to only think that they’ll be judged by ex post value to run into the distribution mismatch. At least to the same extent – whatever it may be – as our contest. Expectational consequentialism seems to me like the norm, though that may be just my bubble, so I would judge both contests to be benign and net positive because I would expect most people to not want to gamble with everyone’s lives, to not think that a contest tries to encourage them to gamble with everyone’s lives, and to not want to just disguise their gamble from the prize committee.
Experiment in Retroactive Funding: An EA Forum Prize Contest

Thanks for the info!

If the shareholders of the public benefit corporation will be able to receive dividends, I think there's a conflict of interest problem with this setup. The Impact Markets team will probably need to make high-stakes decisions under great uncertainty. (E.g. should an impact market be launched? Should the impact market be decentralized? Should a certain person be invited to serve as a retro funder? How to navigate the tradeoff between explaining the safety rules thoroughly and writing more engaging posts that are more conducive to gaining... (read more)

6 Denis Drescher 1mo
I can see the appeal in the commitment to consumption. We might just do that if it inspires trust in the market. Then again it sends a weird signal if not even we want to use our own system to sustain our operation. “Dogfooding” would also allow us to experience the system from the user side and notice problems with it even when no one reports them to us. Also people are routinely trusted not to make callous decisions even if it’d be to their benefit. For example, charities are trusted to make themselves obsolete if at all possible. The existence of the Against Malaria Foundation hinges on there being malaria. Yet we trust them to do their best to eliminate malaria. Charities often receive exploratory grants to allow them to run RCTs and such. They’re still trusted to conduct a high-quality RCT and not manipulate the results even though their own jobs and the ex post value of years of their work hinge on the results. Even I personally used to run a charity that was very dear to me, but when we became convinced that the program was nonoptimal and found out that we couldn’t change the bylaws of the association to accommodate a more optimal program, we also shut it down.
Experiment in Retroactive Funding: An EA Forum Prize Contest

There’s a difficult trade-off between the high-fidelity communication of our long explainer posts and the concision that is necessary to get people to actually read a post when it comes to participating in a contest. Our explainer posts get very little engagement. To participate in the contest it’s not necessary to understand exactly how our mechanisms work, so we hope to reach more people by explaining things in simpler terms without words like “ex ante” and comparisons to constructed counterfactual world histories.

After this contest, it will still be ... (read more)

5 Denis Drescher 1mo
Hmm, I love writing high-fidelity content. Just thinking, “how can I express what I mean as clearly as I can” rather than “how can I simplify what I mean to maximize the fidelity/complexity ratio” is a lot easier for me. But a lot of smart people disagree, and point to how shallow heuristics and layered didactic approaches are essential to bridge inferential gaps under time constraints. So I would like to pose the question to anyone else reading this: If you read “Toward Impact Markets” and you read the above post, do you think we should’ve gone for the same level of fidelity above? Or not? Or something in between? Excluding whole categories of usually valuable content from contests, though, seems like a very uncommon level of caution. I’m not saying that I *know* that it’s exaggerated caution, but there have been many prize contests for content on the EA Forum, and none of them were so concerned about info hazards. Some of them have had bigger prize pools too. And in addition the EA Forum is moderated, and the moderators probably have a protocol for how to respond to info hazards. I’ve long pushed for something like the “[EA Criticism and Red Teaming](https://forum.effectivealtruism.org/posts/8hvmvrgcxJJ2pYR4X/announcing-a-contest-ea-criticism-and-red-teaming)” contest (though I usually had more specific spins on the idea in mind), I’m delighted it exists, and I think it’ll be good. But it is a lot more risky than ours. It has a greater prize pool, the most important red-teaming should focus on topics that are important to EA at the moment, so “longtermism” (i.e. “how do we survive the next 20 years”) topics like biosecurity and AI safety, and the whole notion of red-teaming is conceptually close to info hazards too. (E.g., some people claim that some others invoke “info hazard” as a way to silence epistemic threats to their power. I mostly disagree, but my point is about how close the concepts are to each other.) The original EA Forum Prize referred readers to th
Experiment in Retroactive Funding: An EA Forum Prize Contest

We do not plan to resell or consume/open the impact of the posts in the short term but reserve the right to do so in the future.

If you end up reselling impact that you've purchased with a grant from the Future Fund Regranting Program, where does the money go?

5Denis Drescher1mo
We didn’t think about this because we’re not planning this at all. But we’re in the process of forming a public benefit corporation. Our benefit statement is “Increase contributions to public and common goods by developing and deploying innovative market mechanisms.” The PBC will be the one doing the purchases, so if we ever sell the certs again the returns will flow back to the PBC account and will be used in line with the benefit statement. That’s sort of like when a grant recipient buys furniture for an office but then, a few years later, moves to a group office with existing furniture and sells their own (now redundant) furniture on eBay. Those funds then also flow back to the account of the grant recipient unless they have some nonstandard agreements around their furniture. But of course we can run this by FTX if it ever becomes an action-relevant question.
Experiment in Retroactive Funding: An EA Forum Prize Contest

Hi Dony!

In a section titled "Other less important details", after a sentence saying "It’s not a requirement to read all of the following, […]" there is the following sentence:

The certificate description justifies the value of the impact as defined by the latest version of the Attributed Impact definition (currently 0.2).

Other than that sentence, the OP does not convey that the retro funders will consider the ex-ante EV of a post (and won't attribute to the post a higher EV than that, even if the post ends up being extremely beneficial). Instead, the OP... (read more)

5Denis Drescher1mo
We’ve gone through countless iterations with this announcement post that usually took the shape of one of us drafting something, us then wondering whether it’s too complicated and will cause people to tune out and ignore the contest, and us then trying to greatly shorten and simplify it. There’s a difficult trade-off between the high-fidelity communication of our long explainer posts and the concision that is necessary to get people to actually read a post when it comes to participating in a contest. Our explainer posts get very little engagement. To participate in the contest it’s not necessary to understand exactly how our mechanisms work, so we hope to reach more people by explaining things in simpler terms without words like “ex ante” and comparisons to constructed counterfactual world histories. Like, grocery shopping would be a terrible experience if every customer had to understand all the scheduling around harvest, stocks and flows between warehouses, just-in-time delivery, pricing in of some expected number of produce that expire before they’re bought, etc. If anyone who wants to use impact markets has to spend more time up front to learn more about them than the markets are worth to them, that’d be a failure. This is exacerbated in this case where a submitter has a < 100% chance to get a reward of a few hundred dollars. That comes down to quite little money in expectation, so we’ve been trying hard to make the experience as light on the time commitment as possible while linking our full explainer posts at every turn to make sure that people cannot miss the high-fidelity version if they’re looking for it. Once we have bigger budgets, we can also ask people to engage more upfront with our processes. That said, we’ve thought a lot about the bolded key sentence “morally good, positive-sum, and non-risky.” We hope that everyone who submits will read it. By “non-risky” we mean “ex ante non-risky.” We hoped that the term captured that as it’s not common to ta
Being an individual alignment grantmaker

Okay, but if you’re not actually talking about “malicious” retro funders (a category in which I would include actions that are not typically considered malicious today, such as defecting against minority or nonhuman interests), the difference between a world with and without impact markets becomes very subtle and ambiguous in my mind.

I think it depends on the extent to which the (future) retro funders take into account the ex-ante impact, and evaluate it without an upward bias even if they already know that the project ended up being extremely beneficial.

2Denis Drescher1mo
Yes, that’ll be important!
Being an individual alignment grantmaker

I think the concern here is not about "unaligned retro funders" who consciously decide to do harmful things. It doesn't take malicious intent to misjudge whether a certain effort is ex-ante beneficial or harmful in expectation.

I wonder, though, when I play this through in my mind, I can’t quite see almost any investor investing anything but tiny amounts into a project on the promise that there might be at some point a retro funder for it.

Suppose investors were able to buy impact certificates of organizations like OpenAI, Anthropic, Conjecture, EcoHealt... (read more)

5Denis Drescher1mo
Okay, but if you’re not actually talking about “malicious” retro funders (a category in which I would include actions that are not typically considered malicious today, such as defecting against minority or nonhuman interests), the difference between a world with and without impact markets becomes very subtle and ambiguous in my mind. Like, I would guess that Anthropic and Conjecture are probably good, though I know little about them. I would guess that early OpenAI was very bad and current OpenAI is probably bad. But I feel great uncertainty over all of that. And I’m not even taking all considerations into account that I’m aware of because we still don’t have a model of how they interact. I don’t see a way in which impact markets could systematically prevent (as opposed to somewhat reduce) investment mistakes that today not even funders as sophisticated as Open Phil can predict. Currently, all these groups receive a lot of funding from the altruistic funders directly. In a world with impact markets, the money would first come from investors. Not much would change at all. In fact I see most benefits here in the incentive alignment with employees. In my models, each investor makes fewer grants than funders currently do because they specialize more and are more picky. My math doesn’t work out, doesn’t show that they can plausibly make a profit, if they’re similarly or less picky than current funders. So I could see a drop in sophistication as relatively unskilled investors enter the market. But then they’d have to improve or get filtered out within a few years as they lose their capital to more sophisticated investors. Relatively speaking, I think I’m more concerned about the problem you pointed out where retro funders get scammed by issuers who use p-hacking-inspired tricks to make their certificates seem valuable when they are not. Sophisticated retro funders can probably address that about as well as top journals can, which is already not perfect, but more nai
Being an individual alignment grantmaker

I’ll create a document for all the things we need to watch out for when it comes to attacks by issuers, investors, and funders, so we can monitor them in our experiments.

(I don't think that potential "attacks" by issuers/investors/funders are the problem here.)

But that does not solve the retro funder alignment that is part of your argument.

I don't think it's an alignment issue here. The price of a certificate tracks the maximum amount of money that any future retro funder will be willing to pay for it. So even if 100% of the current retro funders sa... (read more)

2Denis Drescher1mo
What should we actually do in response to moral uncertainty?

Probably something like striving for a Long Reflection process. (Due to complex cluelessness more generally, not just moral uncertainty.)

1Sharmake1mo
The real issue is unrealistic levels of coordination and an assumption that moral objectivism is true. While it is an operating assumption in order to do anything in EA, that doesn't mean that it's true.
Some unfun lessons I learned as a junior grantmaker

In general, what do you think of the level of conflicts of interest within EA grantmaking?

My best guess, based on public information, is that CoIs within longtermism grantmaking are being handled with less-than-ideal strictness. For example, generally speaking, if a project related to anthropogenic x-risks would not get funding without the vote of a grantmaker who is a close friend of the applicant, it seems better to not fund the project.

(For example, Anthropic raised a big Series A from grantmakers closely related to their president Daniela Amodei’

Some unfun lessons I learned as a junior grantmaker

Thank you for the info!

I understand that you recently replaced Jonas as the head of the EA Funds. In January, Jonas indicated that the EA Funds intends to publish a polished CoI policy. Is there still such an intention?

The policy that you referenced is the most up-to-date policy that we have, but I do intend to publish a polished version of the COI policy on our site at some point. I am not sure right now when I will have the capacity for this, but thank you for the nudge.

Some unfun lessons I learned as a junior grantmaker

Hi Linch, thank you for writing this!

I started off with a policy of recusing myself from even small CoIs. But these days, I mostly accord with (what I think is) the equilibrium: a) definite recusal for romantic relationships, b) very likely recusal for employment or housing relationships, c) probable recusal for close friends, d) disclosure but no self-recusal by default for other relationships.

In January, Jonas Vollmer published a beta version of the EA Funds' internal Conflict of Interest policy. Here are some excerpts from it:

Any relationship that

In general, what do you think of the level of conflicts of interest within EA grantmaking? I’m a bit of an outsider to the meta / AI safety folks located in Berkeley, but I’ve been surprised to find out the frequency of close relationships between grantmakers and grant receivers. (For example, Anthropic raised a big Series A from grantmakers closely related to their president Daniela Amodei’s husband, Holden Karnofsky!)

Do you think COIs pose a significant threat to EA’s epistemic standards? How should grantmakers navigate potential COIs? How should this be publicly communicated?

(Responses from Linch or anybody else welcome)

7calebp1mo
My impression is that Linch's description of their actions above is consistent with our current COI policy. The Fund chairs and I have some visibility over COI matters, and fund managers often flag cases when they are unsure what the policy should be, and then I or the fund Chairs can weigh in with our suggestion. Often we suggest proceeding as usual or a partial but not full recusal (e.g. the fund manager should participate in discussion but not vote on the grant themselves).
Optimizing Public Goods Funding with blockchain tech and clever incentive design (RETROX)

Suppose Alice is working on a dangerous project that involves engineering a virus for the purpose of developing new vaccines. Fortunately, the dangerous stage of the project is completed successfully (the new virus is exterminated before it has a chance to leak), and now we have new vaccines that are extremely beneficial. At this point, observing that the project had a huge positive impact, will Retrox retroactively fund the project?

2joe73991mo
That makes more sense now. Nothing inherent to the retrox platform would prevent this if the expert badgeholders agree to vote for the retroactive funding of the risky viral engineering project. The fact that severe risks had to be taken should be factored into the assignment of the votes, i.e. how value was created. Incentivizing more high-risk behavior with potentially extremely harmful impacts is undesirable. Retroactively funding a project of this nature would set a precedent for the types of projects which are funded in the future, which I think would probably not lead to a Pareto-preferred future. The expected value trade-offs would be something like: value added for humanity by financially supporting successful but risky viral engineering project vs potential harm induced by incentivizing more people to pursue high-risk endeavours into the future. I think the latter outweighs the former, hence my previous hunch.
We Ran an AI Timelines Retreat

We aimed for participants to form evidence-based views on questions such as:

[...]

• What are the most probable ways AGI could be developed?

A smart & novel answer to this question can be an information hazard, so I'd recommend consulting with relevant people before raising it in a retreat.

Optimizing Public Goods Funding with blockchain tech and clever incentive design (RETROX)

Suppose Alice is working on a risky project that has a 50% chance of ending up being extremely beneficial and 50% chance of ending up being extremely harmful. If the project ends up being extremely beneficial, will Retrox allow Alice to make a lot of money from her project?
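The incentive behind this question can be sketched with made-up numbers. This is a minimal illustration, not a description of how Retrox works: it assumes a hypothetical funder that pays only when the outcome turned out well and ignores ex-ante risk, and the benefit, harm, and payout magnitudes are invented for the example.

```python
def expected_values(p_good, benefit, harm, payout_if_good):
    """Return (social expected value, Alice's expected payout).

    All inputs are hypothetical illustration values, not Retrox parameters.
    """
    p_bad = 1.0 - p_good
    # Expected value to the world, counting the harmful branch as a loss.
    social_ev = p_good * benefit - p_bad * harm
    # An outcome-only funder pays in the good branch and nothing otherwise,
    # so the harmful branch never shows up in Alice's expected payout.
    alice_ev = p_good * payout_if_good
    return social_ev, alice_ev


social_ev, alice_ev = expected_values(
    p_good=0.5, benefit=100.0, harm=100.0, payout_if_good=10.0
)
print("social EV:", social_ev)
print("Alice's expected payout:", alice_ev)
```

Under these assumptions the project is neutral in expectation while Alice's expected payout is positive. Raising `harm` lowers the social expected value but leaves Alice's expected payout unchanged.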

1joe73991mo
What sort of projects are you envisioning? AI research labs where there is a 50/50 chance as to whether they end up caring about AI safety? Retroactive funding means that one has the ability to assess the past impact of a particular project in a particular domain and then give out grants through quadratic voting. The ability to look at a project's impact in the past would aid with setting priors for how likely something is to be harmful in the future. If a project has the potential to be incredibly harmful then this should be weighed up by the badge holders who vote and less (or no) votes should be assigned to projects, depending on the probability and severity of the potential negative impacts in the future. From a practical standpoint, the continuous stream of funds which extends well into the future can be stopped by the expert voters if the project is deemed harmful. In general - as it stands - the retrox platform does not have any built in logic which prevents any projects from being funded in the first place, but this is something which needs to be carefully considered and weighed up by those who vote on where the funds are allocated. I think this is where a lot of the "heavy lifting" is done and more careful consideration on who should be eligible to vote is perhaps required. Maybe you have some interesting ideas. Ideally you'd have an immutable and accessible record of people's qualifications, skills and past experiences which would allow one to pick out the right candidates. Another idea would be to have a consensus mechanism between the expert voters which would allow projects to be "blacklisted" or blocked from being funded at all if the risk of them causing extreme harm is considered too great.
The biggest risk of free-spending EA is not optics or motivated cognition, but grift

Grifters are optimizing only to get themselves money and power; EAs are optimizing for improving the world.

I think it is not so binary in reality. It's likely that almost no one thinks about themselves as a grifter; and almost everyone in EA is at least somewhat biased towards actions that will cause them to have more money and power (on account of being human). So, while I think this post points at an extremely important problem, I wouldn't use the grifters vs. EAs dichotomy.

5Linch1mo
I disagree with your model of human nature. I think I'd agree with you if you instead said I think it's valuable to remember that people in EA aren't perfect saints, and have natural human foibles and selfishness. But also humans are not by default either power- or money- maximizers, and in fact some selfish goals are poorly served by power-seeking (e.g. laziness)
6Lorenzo1mo
I strongly agree with this, and think it's important to keep in mind I don't think this matches my (very limited) intuition. I think that there is huge variance in how much different individuals in EA optimize for money/power/prestige. It seems to me that some people really want to "move up the career ladder" in EA orgs, and be the ones that have that precious "impact". While others really want to "do the most good", and would be genuinely happy to have others take decision-making roles if they thought it would lead to more good.
When is AI safety research harmful?

Option value considerations dictate that we continue doing AI safety research even if we’re unsure of its value because it’s much easier to stop a research program than to start one.

I think the opposite is often true. Once there are people who get compensated for doing X it can be very hard to stop X. (Especially if it's harder for impartial people, who are not experts-in-X, to evaluate X.)

2Nathan_Barnard2mo
Yeah I think that's very reasonable
Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Thanks, you're right. There's this long thread, but I'll try to explain the issues here more concisely. I think the theorems have the following limitations that were not reasonably explained in the paper (and some accompanying posts):

1. The theorems are generally not applicable for stochastic environments (despite the paper and some related posts suggesting otherwise).
2. The theorems may not be applicable if there are cycles in the state graph of the MDP (other than self-loops in terminal states); for example:
• The theorems are not applicable in states from w
4Charles He2mo
To onlookers, I want to say that:
* This isn't exactly what Ofer is complaining about, but one take on the issue, that math can be overstated, poorly socialized, misleading or overbearing, is a common critique in domains that use a lot of applied math (theoretical econ, interdisciplinary biology) that borrows from pure math, physics, etc.
* It depends on things (well, sort of your ideology, style, and academic politics TBH) but I think the critique can often be true.
* Although to be fair, this particular critique seems much more specific, and it seems like Ofer might be talking past Alex Turner and his meaning [https://www.alignmentforum.org/posts/b6jJddSvWMdZHJHh3/environmental-structure-can-cause-instrumental-convergence?commentId=nNoovBvzorfHch2GS] (but I have no actual idea of the math or the claims)
* The tone of the original post is pretty normal or moderate, and isn't "casting shade".
* but it might be consistent with issues like:
  * this person has some agenda that is unhelpful and unreasonable;
  * they are just a gadfly;
  * they don't really "get it" but know enough to fool themselves and pick at things forever.
* But these issues apply to my account too. I think the tone is pretty good to me.
Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Hey there!

And then finally there are actually some formal results where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work [...] which I'd encourage folks to check out. And basically you can show that for a large class of objectives defined relative to an environment, there's a strong reason for a system optimizing those objectives to get to the states that give them many more options.

After spending a lot of time on understanding that work, my impression is that the main theorem... (read more)

Have you explained your thoughts somewhere? It'd be more productive to hash out the disagreement rather than generically casting shade!

Being an individual alignment grantmaker

Yeah, that feels like a continuous kind of failure. Like, you can reduce the risk from 50% to 1% and then to 0.1% but you can’t get it down to 0%.

Suppose we want the certificates of a risky, net-negative project to have a price that is lower by 10x than the price they would have on a naive impact market. Very roughly, it needs to be the case that the speculators have a credence of less than 10% that at least one relevant retro funder will evaluate the ex-ante impact to be high (positive). Due to the epistemic limitations of the speculators, that conditi... (read more)
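A rough numerical sketch of the "at least one relevant retro funder" condition follows. It rests on two simplifying assumptions that are not in the comment: the certificate price scales linearly with the speculators' credence that some funder will pay, and each funder misjudges independently with the same probability. The function names and the 2% figure are hypothetical.

```python
def prob_at_least_one_funder_pays(p_single, n_funders):
    """Chance that at least one of n independent funders rates the
    ex-ante impact as high, given each does so with probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_funders


def speculative_price(naive_price, p_single, n_funders):
    """Price under the assumption that it scales linearly with the credence
    that some future funder will buy the certificate."""
    return naive_price * prob_at_least_one_funder_pays(p_single, n_funders)


# With a hypothetical 2% misjudgment chance per funder, see how quickly the
# credence grows as more potential retro funders enter the picture.
for n in (1, 5, 6, 20, 50):
    credence = prob_at_least_one_funder_pays(0.02, n)
    print(n, round(credence, 3), "below 10%" if credence < 0.10 else "above 10%")
```

Under these assumptions, even a small per-funder error rate pushes the credence past the 10% threshold once there are more than a handful of potential funders, which is why the condition is hard to satisfy over a long time horizon.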

2Denis Drescher1mo
Thanks! I’ve noted and upvoted your comment. I’ll create a document for all the things we need to watch out for when it comes to attacks by issuers, investors, and funders, so we can monitor them in our experiments. In this case I think a partial remedy is for retro funders to take the sort of active role in the steering of the market that I’ve been arguing for where they notice when projects get a lot of investments that they’re not excited about and clarify their position. But that does not solve the retro funder alignment that is part of your argument.
Chaining Retroactive Funders to Borrow Against Unlikely Utopias

Our solutions to at least remove incentives like that (but not to additionally penalize it) are in the Solutions section of the article that Ofer linked

Will those solutions work?

Do you have control over who can become a retro funder after the market is launched? To what extent will the retro funders understand or care about the complicated definition of Attributed Impact? And will they be aware of, and know how to account for, complicated considerations like: "if the certificate's description is not very specific and leaves the door open to risky, net-n... (read more)

2Denis Drescher2mo