All of NunoSempere's Comments + Replies

Cultured meat predictions were overly optimistic

There is also the interesting thing that, as far as I can tell, New Harvest, founded in 2004, basically failed, and we had to wait until the Good Food Institute came along in 2016 to push things forward.

(As a point of comparison, New Harvest claims on its frontpage to have raised ~$7.5M, presumably over the whole of its existence, whereas GFI spent $8.9M in 2019 alone.)

This could be in part because GFI got more financial support from the EA community, both from Open Phil and via ACE recommendations.

  • 2012: ACE was founded.
  • 2014: ACE did an exploratory review of New Harvest.
  • 2015: Lewis Bollard joined Open Phil in September to start its grantmaking in animal welfare. New Harvest was named a standout charity by ACE at the end of 2015.
  • 2016: GFI was founded. Open Phil made its first animal welfare grants. GFI received its first grant from Open Phil, of $1M. GFI became an ACE top charity at the end of the year.
  • 2017: Open Phil made another gran
... (read more)
Linch: Thanks, done.
The motivated reasoning critique of effective altruism

I liked the post. Some notes:

  • In my experience, the level of disagreeableness and paranoia needed to overcome motivated reasoning is very much above the level that a casual EA group can sustain.
  • A less nice way to describe "motivated reasoning" might be "load-bearing delusions". So it's not that people can simply swap their motivated reasoning for correct reasoning "just so". For instance, if someone has a need to believe that their work is useful and valuable and engages in motivated reasoning to justify this, it's not clear to me that they would at all be
... (read more)
Linch: Oh I agree. Do you think it's worth editing my post to make that clearer?
The motivated reasoning critique of effective altruism

until recently (with Open Phil publishing a few models for AI), longtermists have not been using numbers or models much

This seems to me to not be the case. For a very specific counterexample, AI Impacts has existed since 2015.

MichaelStJules: Fair. I should revise my claim to being about the likelihood of a catastrophe and the risk reduction from working on these problems (especially or only in AI; I haven't looked as much at what's going on in other x-risks work). AI Impacts looks like they were focused on timelines.
When pooling forecasts, use the geometric mean of odds

They model the situation, run the calculation and end up with 10^-12 and then say the probability is 10^-12.

Consider that if you're aggregating expert predictions, you might be generating probabilities too soon. Instead, you could, for instance, interview the subject-matter experts, make the transcript available to expert forecasters, and then aggregate the probabilities of the latter. This might produce more accurate probabilities.

When pooling forecasts, use the geometric mean of odds

came back with 50% and 0.00000000001%.

I want to push back a bit against the use of 0.00000000001% in this example. In particular, I was sort of assuming that experts are kind of calibrated, and if two human experts have that sort of disagreement:

  • Either this is the kind of scenario in which we're discussing how a fair coin will land, and one of the experts has seen the coin
  • Or something is very, very wrong

In particular, with some light selection of experts (e.g., decent Metaculus forecasters), I think you'd almost never see this kind of scenario unless someon... (read more)
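To make concrete how much the pooling method matters for a pair like this, here is a minimal sketch (the two probabilities are the ones from the example; everything else is just illustration):

```python
import numpy as np

def pool_geometric_mean_of_odds(probs):
    """Pool probabilities by taking the geometric mean of their odds."""
    probs = np.asarray(probs)
    odds = probs / (1 - probs)
    pooled_odds = np.exp(np.mean(np.log(odds)))
    return pooled_odds / (1 + pooled_odds)

probs = [0.5, 1e-13]  # 50% and 0.00000000001%

print(np.mean(probs))                      # arithmetic mean of probabilities: ~0.25
print(pool_geometric_mean_of_odds(probs))  # geometric mean of odds: ~3e-07
```

The two methods disagree by about six orders of magnitude here, which is part of why the extreme 0.00000000001% input deserves scrutiny in the first place.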

Toby_Ord: I see what you mean, though you will find that scientific experts often end up endorsing probabilities like these. They model the situation, run the calculation and end up with 10^-12 and then say the probability is 10^-12. You are right that if you knew the experts were Bayesian and calibrated and aware of all the ways the model or calculation could be flawed, and had a good dose of humility, then you could read more into such small claimed probabilities — i.e. that they must have a mass of evidence they have not yet shared. But we are very rarely in a situation like that. Averaging a selection of Metaculus forecasters may be close, but is quite a special case when you think more broadly about the question of how to aggregate expert predictions.
Misha_Yagudin: I endorse Nuño's comment re: 0.00000000001%. While it's pretty easy to agree that the probability of a stupid mistake/typo is greater than 0.00000000001%, it is sometimes hard to follow in practice. I think Yudkowsky communicates it well on a more visceral level in his Infinite Certainty [https://www.lesswrong.com/posts/ooypcn7qFzsMcy53R/infinite-certainty] essay. I got to another level of appreciation of this point after doing a calibration exercise for mental arithmetic [https://www.openphilanthropy.org/calibration] — all errors were unpredictable "oops" like misreading plus for minus or selecting the wrong answer after making correct calculations.
When pooling forecasts, use the geometric mean of odds

I guess I can potentially see us changing our minds in a year's time and deciding that arithmetic mean of probabilities is better after all, or that some other method is better than both of these.

This seems very unlikely; I'll bet your $20 against my $80 that this doesn't happen.
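Spelling out the implicit odds (my reading of the stakes; the threshold itself isn't stated above): taking the bet is favourable to me iff my probability $p$ that we do end up changing our minds satisfies

$$20\,(1 - p) - 80\,p > 0 \iff p < \frac{20}{20 + 80} = 0.2,$$

i.e. I'm implicitly claiming the probability is below 20%.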

Linch: (I have not read the post) I endorse these implicit odds, based on both theory and some intuitions from thinking about this in practice.
When pooling forecasts, use the geometric mean of odds

The bias isn't that more questions resolve positively than users expect. The bias is that users expect more questions to resolve positive than actually resolve positive.

I don't get what the difference between these is.

Simon_M: "More questions resolve positively than users expect": users expect 50 to resolve positively, but actually 60 resolve positively. "Users expect more questions to resolve positive than actually resolve positive": users expect 50 to resolve positively, but actually 40 resolve positively. I have now edited the original comment to be clearer.
Frank Feedback Given To Very Junior Researchers

Re: the Edit, I've added an additional paragraph to make that particular point slightly less biting. 

Also, thanks for the point around framing in terms of people's goals.

Frank Feedback Given To Very Junior Researchers

Thanks for the comment. Any thoughts on Linch's comment below?

Charles He: Thanks for the reply. I think your main post and Linch's comment are both very valuable, thoughtful contributions. I agree that such direct advice is undersupplied. Your experiences/suggestions should be taken seriously and are a big contribution. I don't have anything substantive to add.
Growth of prediction markets over time?

Also, for others reading the thread, the equivalent LW post got a better answer.

Growth of prediction markets over time?

prediction markets wax and wane

Sorry, I meant that "prediction market platforms" wax and wane, though the other interpretation is also correct.

NunoSempere: Also, for others reading the thread, the equivalent LW post [https://www.lesswrong.com/posts/4XXnMXfTrYXqpugwB/growth-of-prediction-markets-over-time] got a better answer.
Frank Feedback Given To Very Junior Researchers

do proof of work to show that you value their efforts

Yes, this makes some sense, thanks. 

Frank Feedback Given To Very Junior Researchers

People use feedback not just to determine what to improve at, but also as an overall assessment of whether they're doing a good job

Good point, thanks.

Frank Feedback Given To Very Junior Researchers

I added a paragraph to the "Models about the value of your project" section to make it less biting.

Growth of prediction markets over time?

This would be extremely annoying to gather, way more effort than the value of information would warrant. Nonetheless, here is a database of prediction markets which you could use to get started. 

Things that make this annoying:

  • Presidential betting markets were fairly big before, say, the 1930s
  • A large part of the volume comes from sports betting, and I imagine that you don't particularly care about it, but disambiguating by type of market might be difficult
  • Prediction market popularity waxes and wanes, so you'd have to get data for a lot of different pre
... (read more)
kokotajlod: Thanks! Gosh, it's disappointing to learn that prediction markets wax and wane in popularity over time instead of steadily exponentially increasing as I had assumed. (I'm thinking about the 'world events' kind, not the sports or stock market kind.) This makes me pessimistic that they'll ever get big enough to raise the sanity waterline.
Frank Feedback Given To Very Junior Researchers

beginning and ending the feedback with encouraging words,

So a version of this is also known as a "shit sandwich", and it's not clear to me that it is an effective pattern. In particular, it seems plausible that it only works a limited number of times before people start to notice and develop an aversion to it. I personally find it fairly irritating/annoying.

It's also not clear to me what flavor of encouragement is congruent with a situation in which e.g., getting EA jobs is particularly hard (though perhaps less so for research positions since Rethink Prio... (read more)

Harrison D: "I personally find it fairly irritating/annoying." What may be true for you may not be true for others. And all of us like to think we are capable of dispassionately analyzing and evaluating feedback, but sometimes monkey brain + emotions + deeply negative feedback = error.

I don't have research management experience in particular, but I have a lot of knowledge work (in particular software engineering) management experience.

IMO, giving insufficient positive feedback is a common, and damaging, blind spot for managers, especially those (like you and me) who expect their reports to derive most of their motivation from being intrinsically excited about their end goal. If unaddressed, it can easily lead to your reports feeling demotivated and like their work is pointless/terrible even when it's mostly good.

People use feedbac... (read more)

I think narrowly following the form can be kind of annoying, but the spirit of the idea is to do proof of work to show that you value their efforts, which can help to make it gut-level easier for the recipient to hear the criticism as constructive advice from an ally (that they want to take on board) rather than an attack from someone who doesn't like them (that they want to defend against).

This might be a cultural thing but in the UK/US/Canada, a purely negative note from a superior/mentor/advisor  (or even friendly peer) feels really really bad.

I really strongly suggest if you are a leader or mentor, to always end a message on a sincerely positive note. 

Open Thread: August 2021

Here is an update post on Metaforecast.

Shallow evaluations of longtermist organizations

To resolve that prediction, yes, I would say that that interpretation is correct. 

Contribution-Adjusted Utility Maximization Funds: An Early Proposal

CAUMFs might act as a fairly small layer on top of other services

If this were as easy as downloading a package over npm, it would seem like an obviously good idea. But overall my impression is that the overhead in terms of legal headaches and coordination required might be pretty great.

Could we do this as a DAO

The thing this might be pointing at is that writing smart contracts to manipulate money right now seems more convenient on a blockchain than ~anywhere else. Like, I'm sure that Stripe has some API, but implementing something like a... (read more)

DeepMind: Generally capable agents emerge from open-ended play

My hot take: This seems like a somewhat big deal to me. It's what I would have predicted, but that's scary, given my timelines

Might be confirmation bias. But is it?

kokotajlod: I did say it was a hot take. :D If I think of more sophisticated things to say I'll say them.
Buck's Shortform

But if you already have this coalition value function, you've already solved the coordination problem and there’s no reason to actually calculate the Shapley value! If you know how much total value would be produced if everyone worked together, in realistic situations you must also know an optimal allocation of everyone’s effort. And so everyone can just do what that optimal allocation recommended.

This seems correct


A related claim is that the Shapley value is no better than any other solution to the bargaining problem. For example, instead of allocat

... (read more)
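For readers who want the mechanics spelled out, here is a minimal sketch (toy coalition value function, not from any real situation) of how Shapley values are computed once a coalition value function is in hand, which is exactly the part the quote identifies as already containing the hard coordination work:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Shapley value of each player, given a coalition value function v: frozenset -> float."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(s | {i}) - v(s))  # weighted marginal contribution
        values[i] = total
    return values

# Toy example: two donors whose combined funding unlocks a project worth 100.
v = lambda s: 100.0 if {"A", "B"} <= s else 0.0
print(shapley_values(["A", "B"], v))  # {'A': 50.0, 'B': 50.0}
```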
A Sequence Against Strong Longtermism

I think that some of your anti-expected-value beef can be addressed by considering stochastic dominance as a backup decision theory in cases where expected value fails.

For instance, maybe I think that a donation to ALLFED in expectation leads to more lives saved than a donation to a GiveWell charity. But you could point out that the expected value is undefined, because maybe the future contains infinite amounts of both flourishing and suffering. Then donating to ALLFED can still be the superior option if I think that it's stochastically dominant.

There are p... (read more)
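A minimal sketch of the kind of check I have in mind (the distributions are made up for illustration and are not claims about ALLFED or GiveWell):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical value distributions (e.g. lives saved) for two donations.
donation_a = rng.lognormal(mean=2.0, sigma=1.0, size=100_000)
donation_b = rng.lognormal(mean=1.0, sigma=1.0, size=100_000)

def first_order_dominates(a, b):
    """Approximate first-order stochastic dominance check:
    every quantile of a is at least the corresponding quantile of b."""
    qs = np.linspace(0.001, 0.999, 999)
    return bool(np.all(np.quantile(a, qs) >= np.quantile(b, qs)))

# If a dominates b, then a is preferable under any increasing utility function,
# even when the expected values themselves are undefined or unhelpfully large.
print(first_order_dominates(donation_a, donation_b))  # True for these parameters
```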

NunoSempere's Shortform

Notes on: A Sequence Against Strong Longtermism

Summary for myself. Note: Pretty stream-of-thought.

Proving too much

  • The set of all possible futures is infinite, which somehow breaks some important assumptions longtermists are apparently making.
    • Somehow this fails to actually bother me
  • ...the methodological error of equating made up numbers with real data
    • This seems like a cheap/unjustified shot. In the world where we can calculate the expected values, it would seem fine to compare (wide, uncertain) speculative interventions with hardcore GiveWell data (note that
... (read more)
Shallow evaluations of longtermist organizations

Sure, but it was particularly salient to me in this case because the evaluation was so negative

kokotajlod: Ah, OK, that makes sense.
Shallow evaluations of longtermist organizations

In what capacity are you asking? I'd be more likely to do so if you were asking as a team member, because the organization right now looks fairly small and I would almost be evaluating individuals.

Shallow evaluations of longtermist organizations

So what I specifically meant was: It's interesting that the current leadership probably thinks that CSER is valuable (e.g., valuable enough to keep working at it, rather than directing their efforts somewhere else, and presumably valuable enough to absorb EA funding and talent). This presents a tricky updating problem, where I should probably average my own impressions from my shallow review with their (probably more informed) perspective. But in the review, I didn't do that, hence the "unmitigated inside view" label. 

kokotajlod: Hmmm, this surprises me a bit because doesn't it apply to pretty much all of your evaluations on this list? Presumably for each of them, the leadership of the org has somewhat different opinions than your independent impression, and your overall view should be an average of the two. I didn't get the impression that you were averaging your impression with those of other orgs' leadership.
What should we call the other problem of cluelessness?

I like "opaqueness" for the reason that it is gradable.

Shallow evaluations of longtermist organizations

This is a good question, and in hindsight, something I should have recorded. For the project as a whole, maybe two weeks to a month, but not of full-time work. I don't remember the times for each organization. 

Shallow evaluations of longtermist organizations

Thanks Michael. Going through your options one by one.

  1. Inform decisions about donations that are each in something like the $10–$5,000 range. Not an aim I had, but sure, why not.
  2. Inform decisions about donations/grants that are each in something like the >$50,000 range. So rather than inform those directly, inform the kind of research that you can either do or buy with money to inform that donation. $50,000 feels a little bit low for commissioning research to make a decision, though (could a $5k to $10k investment in a better version of this
... (read more)
Ozzie Gooen: +1 to both the questions and the answers. In an ideal world we'd have intense evaluations of all organizations that are specific to all possible uses, done in communication styles relevant to all people. Unfortunately this is an impossible amount of work, so we have to find some messy shortcuts that get much of the benefit at a decent cost. I'm not sure how to best focus longtermist organization evaluations to maximize gains for a diversity of types of decisions. Fortunately, I think whenever one makes an evaluation for one specific thing (funding decisions), these wind up relevant for other things (career decisions, organization decisions). My primary interest at this point is evaluations of the following:

  • How much total impact is an organization having, positive or negative?
  • How can such impact be improved?
  • How efficient is the organization (in terms of money and talent)?
  • How valuable is it to other groups or individuals to read / engage with the work of this organization? (Think Yelp or Amazon reviews.)

My guess is that such investigations will help answer a wide assortment of different questions. To echo what Nuño said, some of my interest in this specific task was in attempting a fairly general-purpose attempt. I think that increasingly substantial attempts are a pretty good bet, because a whole lot could either go wrong (this work upsets some group or includes falsities) or new ideas could be figured out (particularly by commenters, such as those on this post). In the longer term my preference isn't for QURI/Nuño to be doing the majority of public evaluations of longtermist orgs, but instead for others to do most of this work. Perhaps this could be something of a standard blog post type, and/or there could be 1-2 small organizations dedicated to it. I think it really should be done independently from other large orgs (to be less biased and more isolated), so it probably wouldn't make sense for this work to be done as part of a much bigger
Shallow evaluations of longtermist organizations

Thanks for considering ALLFED. We try to respond to inquiries quickly. We have looked back, and have not been able to locate any such inquiries. We will be finalizing our 2020 report with financial details soon.

This is most likely my fault; I think I got confused between allfed.org and allfed.info

To clarify, the cost of preparation does not include the scale up in a catastrophe

For clarity:
1. Your Guesstimate model: 3% to 50% mitigation of the impact of a 30M to 200M war, which has a probability of 0.02% to 5% per year. You also say that so far, you'v... (read more)
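For readers without access to the Guesstimate model, here is a rough Monte Carlo sketch of the structure being described (the ranges are the ones quoted above, treated as 90% confidence intervals on lognormals; the unit of the 30M to 200M figure and the way the terms combine are my assumptions, not necessarily ALLFED's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def lognormal_from_90ci(low, high, size):
    """Lognormal samples whose 5th and 95th percentiles match the given range."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, size)

p_war_per_year = lognormal_from_90ci(0.0002, 0.05, n)  # 0.02% to 5% per year
war_deaths = lognormal_from_90ci(30e6, 200e6, n)        # 30M to 200M (unit assumed)
mitigation = lognormal_from_90ci(0.03, 0.50, n)         # 3% to 50% of the impact
mitigation = np.clip(mitigation, 0, 1)                  # crude clip to keep the fraction below 1

expected_deaths_averted_per_year = p_war_per_year * war_deaths * mitigation
print(np.mean(expected_deaths_averted_per_year))
```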

Denkenberger: We tried to buy the .org domain, but unfortunately it was not for sale. There are definitely a lot of failure modes, though part of the money should go to updating institutions as staff turn over. Thanks for updating the Guesstimate. Good question. I think these are quite different because billions of dollars had been put into preparedness, at least for a pandemic. Though billions of dollars have been put into preventing a nuclear war (and reducing weapon stockpiles), we could not find anything preparing for feeding populations for a multiyear catastrophe. I think generally there are logarithmic returns [https://www.fhi.ox.ac.uk/law-of-logarithmic-returns/], which means the first amount of money spent on a problem has much greater marginal cost effectiveness.
Event-driven mission hedging and the 2020 US election

I think you'd have to think about the market equilibrium here. So for instance, if the price of capturing a tCO2e falls to $0.1/tonne, then more people will want to buy them, and the impact of a marginal tonne captured [1] might be lower. More generally, more people would be doing climate related projects, because the administration would be more welcoming of them.

In contrast, under $1/tonne, fewer people might want to buy them, and thus the marginal impact of a tonne captured might be higher. Similarly, perhaps fewer people would choose to carry out climat... (read more)

2018-2019 Long Term Future Fund Grantees: How did they do?

I would also want more bins than the ones I provide; not considering the total value is probably one of the parts I like least about this post.

2018-2019 Long Term Future Fund Grantees: How did they do?

Makes sense. In particular, noticing that grants are all particularly legible might lead you to update in the direction of a truncated distribution like the one you consider. So far, the LTFF seems like it has maybe moved a bit in the direction of more legibility, but not that much.

What are some key numbers that (almost) every EA should know?

Average yearly donation by EAs (EA survey respondents)

2018-2019 Long Term Future Fund Grantees: How did they do?

Suppose you give initial probability to all three normals. Then you sample an event, and its value is 1. Then you update against the green distribution, and in favor of the red and black distributions. The black distribution has a higher mean, but the red one has a higher standard deviation.

Linch: Thanks, I understand what you mean now!
2018-2019 Long Term Future Fund Grantees: How did they do?

Well, because a success can be caused by a process that has a high mean, but also by a process that has a lower mean and a higher standard deviation. So for example, if you learn that someone has beaten Magnus Carlsen, it could be someone in the top 10, like Caruana, or it could be someone like Ivanchuk, who has a reputation as an "unreliable genius" and is currently number 56, but who, when he has good days, has extremely good days.

Suppose you give initial probability to all three normals. Then you sample an event, and its value is 1. Then you update against the green distribution, and in favor of the red and black distributions. The black distribution has a higher mean, but the red one has a higher standard deviation.
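The colours refer to a figure in the original thread; here is a minimal sketch of the same update with made-up parameters (low mean and low spread for green, higher mean for black, low mean and high spread for red):

```python
from scipy.stats import norm

# Hypothetical stand-ins for the three distributions in the figure.
hypotheses = {
    "green": norm(loc=0.0, scale=0.2),  # low mean, low standard deviation
    "black": norm(loc=1.0, scale=0.2),  # higher mean
    "red":   norm(loc=0.0, scale=1.0),  # low mean, high standard deviation
}
prior = {name: 1 / 3 for name in hypotheses}
observation = 1.0

# Bayes: posterior is proportional to prior times the likelihood of the observation.
unnormalised = {name: prior[name] * dist.pdf(observation) for name, dist in hypotheses.items()}
total = sum(unnormalised.values())
posterior = {name: w / total for name, w in unnormalised.items()}

print(posterior)  # green collapses to ~0; most mass goes to black, some to red
```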

2018-2019 Long Term Future Fund Grantees: How did they do?

are you referring to what it seems the LTFF expected when they made the grant, what you think you would've expected at the time the grant was made, what you expect from EA/longtermist donations in general, or something else

Yes, that's tricky. The problem I have here is that different grants are in different domains and take different amounts. Ideally I'd have something like "utilons per dollar/other resources", but that's impractical. Instead, I judge a grant on its own terms: did it achieve the purpose in the grant's rationale, or something similarly valuable in case there was a change of plan?

2018-2019 Long Term Future Fund Grantees: How did they do?

Thanks! To answer the questions under the first bullet point: 

  • Individuals performed better than organizations, but there weren't that many organizations. 
  • Individuals pursuing research directions mostly did legibly well, and the ones who didn't do legibly well seem like they had less of a well-defined plan, as one might expect. 
    • But some people with less defined directions also seem like they did well. 
    • Also note that maybe I'm rating research directions which didn't succeed as less well defined. 
    • I don't actually have access to the
... (read more)
What should the norms around privacy and evaluation in the EA community be?

Did I imply that I thought it was bad for people to update in this way?

Reading it again, you didn't

2018-2019 Long Term Future Fund Grantees: How did they do?

Yes, for me, updating upwards on total success based on a lower percentage success rate seems intuitively fairly weird. I'm not saying it's wrong; it's that I have to stop and think about it/use my system 2.

In particular, you have to have a prior distribution such that more valuable opportunities have a lower success rate. But then you have to have a bag of opportunities such that the worse they do, the more you get excited.

Now, I think this happens if you have a bag with "golden tickets", "sure things", and "duds".  Then not doing well would make you ... (read more)
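A toy version of that dynamic (all numbers made up, and "duds" omitted to keep it short): compare a fund that only picks "golden tickets" against one that only picks "sure things", then update on an observed success count.

```python
from scipy.stats import binom

# Hypothetical fund types: (per-grant success probability, value of one success).
fund_types = {
    "golden tickets": (0.1, 100.0),
    "sure things":    (0.9, 1.0),
}
prior = {name: 0.5 for name in fund_types}

n_grants, successes = 10, 2  # a fairly low observed success rate

unnormalised = {
    name: prior[name] * binom.pmf(successes, n_grants, p_success)
    for name, (p_success, _) in fund_types.items()
}
total = sum(unnormalised.values())
posterior = {name: w / total for name, w in unnormalised.items()}

expected_value_per_grant = sum(
    posterior[name] * p_success * value
    for name, (p_success, value) in fund_types.items()
)
print(posterior)                 # nearly all mass on "golden tickets"
print(expected_value_per_grant)  # ~10, versus the 0.9 a pure "sure things" fund would give
```

Here the low success rate is exactly what makes the fund look more valuable, which is the counterintuitive updating I was gesturing at.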

Linch: It makes sense to update upwards on the mean, but why would you update on the standard deviation from n of 1? (I might be missing something obvious)

Thanks, that makes sense.

  • I agree with everything you say about the GovAI example (and more broadly your last paragraph).
  • I do think my system 1 seems to work a bit differently since I can imagine some situations in which I would find it intuitive to update upwards on total success based on a lower 'success rate' - though it would depend on the definition of the success rate. I can also tell some system-2 stories, but I don't think they are conclusive.
    • E.g., I worry that a large fraction of outcomes with "impact at least x" might reflect a selection process t
... (read more)
What should the norms around privacy and evaluation in the EA community be?

If the extent of your evaluation is a quick search for public info, and you don't find much, I think the responsible conclusion is "it's unclear what happened" rather than "something went wrong". I think this holds even for projects that obviously should have public outputs if they've gone well.

So to push back against this, suppose that you have four initial probabilities (legibly good, silently good, legibly bad, silently bad). Then you also have a ratio (legibly good + silently good) : (legibly bad + silently bad).

Now if you learn that the proje... (read more)
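A concrete version of that push-back (probabilities made up): start with a prior over the four states, then condition on seeing no public output.

```python
# Hypothetical prior over the four states of a grant.
prior = {
    "legibly good":  0.40,
    "silently good": 0.10,
    "legibly bad":   0.15,
    "silently bad":  0.35,
}

prior_p_good = prior["legibly good"] + prior["silently good"]

# Conditioning on "we found nothing public" leaves only the silent states.
posterior_p_good = prior["silently good"] / (prior["silently good"] + prior["silently bad"])

print(prior_p_good)      # 0.5  before looking
print(posterior_p_good)  # ~0.22 after observing silence
```

So as long as good outcomes are more likely than bad ones to leave a legible trace, silence is genuine evidence of failure, even if it is not conclusive.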

Aaron Gertler: I was too vague in my response here: By "the responsible conclusion", I mean something like "what seems like a good norm for discussing an individual project" rather than "what you should conclude in your own mind". I agree on silent success vs. silent failure and would update in the same way you would upon seeing silence from a project where I expected a legible output. If the book isn't published in my example, it seems more likely that some mundane thing went poorly (e.g. book wasn't good enough to publish) than that the author got cancer or found a higher-impact opportunity. But if I were reporting an evaluation, I would still write something more like "I couldn't find information on this, and I'm not sure what happened" than "I couldn't find information on this, and the grant probably failed".

(Of course, I'm more likely to assume and write about genuine failure based on certain factors: a bigger grant, a bigger team, a higher expectancy of a legible result, etc. If EA Funds makes a $1m grant to CFAR to share their work with the world, and CFAR's website has vanished three years later, I wouldn't be shy about evaluating that grant.)

I'm more comfortable drawing judgments about an overall grant round. If there are ten grants, and seven of them are "no info, not sure what happened", that seems like strong evidence that most of the grants didn't work out, even if I'm not past the threshold of calling any individual grant a failure. I could see writing something like: "I couldn't find information on seven of the ten grants where I expected to see results; while I'm not sure what happened in any given case, this represents much less public output than I expected, and I've updated negatively about the expected impact of the fund's average grant as a result." (Not that I'm saying an average grant necessarily should have a legible positive impact; hits-based giving is a thing. But all else being equal, more silence is a bad sign.)