Scott Alexander

2378 · Joined Aug 2021


I find this interesting, but also somewhat hard to identify any meaningful patterns. For example, one could expect red points to be clustered at the top for Manifold, indicating that more forecasts equal better performance. But we don't see that here. The comparison may be somewhat limited anyway: In the eyes of the Metaculus community prediction, all forecasts are created equal. On Manifold, however, users can invest different amounts of money. A single user can therefore in principle have an outsized influence on the overall market price if they are willing to spend enough. I'd be interested to see more on how accuracy on Manifold changes with the number of traders and overall trading volume. Who knows, maybe Manifold would be ahead if they had a similar number of forecasters to Metaculus?

Does this mean that if you controlled for the number of forecasters, you still think Metaculus would beat Manifold? If not, do you have any opinion on this question? (Sorry if I missed it.)

Thank you. I misremembered the transcription question. I now agree with all of your resolutions, with the most remaining uncertainty on translation.

Thank you for doing this! I was working on a similar project and mostly came up with the same headline finding as you: the experts seemed well-calibrated. I did decide a few of the milestones a little differently, and would like to hear why you chose the way you did so I can decide whether or not to change mine.

  • Zach Stein-Perlman from AI Impacts said that he thought "efficiently sort very large lists" and "write good Python code" were false, because the questions said it had to be done in a certain way by a certain type of neural net, and that wasn't how it was done.
  • I was planning to count "transcribe as well as humans" as false, based on . Maybe the top labs could achieve this with a year of work, but I think the question specifies they need to do as well as the best human transcriptionists, and right now they don't seem close.
  • I counted "translate as well as bilingual humans" as true based on a few quick tests of ChatGPT; I'm curious if you have some specific source for why it's false.
  • I don't think AI has won at Starcraft. The last word I've heard on this was , where AlphaStar could beat 99.8% of humans but not the absolute champions. I haven't seen any further progress on this since 2019. Again, it's possible that a year of concerted effort could change this, but that seems speculative. See also
  • I'm surprised you judged "high marks for a high school essay" as false; this seems like a central use case for ChatGPT and Bing/GPT4.
  • I was planning to judge "concisely explain game play" as true, based on, which is testing basically this skill. Also, I was able to play a partial game of chess with ChatGPT where it explained all its moves - before it started hallucinating and making moves which were impossible. Still, it seemed to have the "explanation" skill down pat! I imagine if you asked it to explain why a chess engine made a given move, it would give a pretty plausible answer.

Beyond those quibbles - I was also looking at (the dataset itself; the summary doesn't include the milestones). This new version seems like total garbage. The experts continue to predict several of the milestones are five years out, including milestones that were achieved by ChatGPT (i.e. a few months after the survey) and at least one milestone that had already clearly been achieved by the time the survey was released! Unless there's some reason to think the new crop of experts is worse than the old one, this makes me think they only did okay last time by luck/coincidence, and actually they have no idea what they're doing.

(I don't think it works to say that the period 2017-2022.5 was predictable, but the period 2022.5-2023 wasn't, because part of what the 2017 experts were right about was ChatGPT, which came out in late 2022.)

Thanks for asking. One reason we decided to start with forecasting was because we think it has comparatively low risk relative to other fields like AI or biotech.

If this goes well and we move on to a more generic round, we'll include our thoughts on this, which will probably include a commitment not to oracular-fund projects that seem like they were risky when proposed, and maybe to ban some extremely risky projects from the market entirely. I realize we didn't explicitly say that here, which is because this is a simplified test round and we think the forecasting focus makes risks pretty unlikely.

In the unlikely event that someone proposes a forecasting project < $20,000 which we think carries significant risk, we're prepared to take those steps this time too.

In 2018, I collected data about several types of sexual harassment on the SSC survey, which I will report here to help inform the discussion. I'm going to simplify by assuming that only cis women are victims and only cis men are perpetrators, even though that's bad and wrong.

Women who identified as EA were less likely than other women to report ever having been sexually harassed at work, 18% vs. 20%. They were also less likely to report being sexually harassed outside of work, 57% vs. 61%.

Men who identified as EA were less likely to admit to sexually harassing people at work (2.1% vs. 2.9%) or outside of work (16.2% vs. 16.5%).

The sample was 270 non-EA women, 99 EA women, 4940 non-EA men, and 683 EA men. None of these results were statistically significant, although all of them trended in the direction of EAs experiencing less sexual harassment. 
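One quick way to sanity-check the "not statistically significant" claim is a standard pooled two-proportion z-test on the headline numbers. This is a minimal sketch, not necessarily the test used in the original analysis: it assumes each reported percentage applies to the full subgroup (270 non-EA women, 99 EA women, 4940 non-EA men, 683 EA men), although the number of people answering each individual question may differ, and it works from the rounded percentages above. The names `two_proportion_z_test` and `COMPARISONS` are illustrative only.

```python
import math

def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    # Pooled proportion under the null hypothesis that both groups share one rate
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# (EA rate, EA n, non-EA rate, non-EA n); assumes each rate covers the whole subgroup
COMPARISONS = {
    "women, harassed at work": (0.18, 99, 0.20, 270),
    "women, harassed outside work": (0.57, 99, 0.61, 270),
    "men, admitted harassing at work": (0.021, 683, 0.029, 4940),
    "men, admitted harassing outside work": (0.162, 683, 0.165, 4940),
}

if __name__ == "__main__":
    for label, (p_ea, n_ea, p_non, n_non) in COMPARISONS.items():
        z, p = two_proportion_z_test(p_ea, n_ea, p_non, n_non)
        print(f"{label}: z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions, every comparison gives a negative z-score (EAs reporting less harassment) and a two-sided p-value above 0.05, which matches the "trended in one direction, but nothing significant" description.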

This doesn't prove that EA environments have less harassment than the average environment, since it could be that EAs are less likely to experience sexual harassment for other reasons, and whatever additional harassment they get in EA isn't enough to make up for it; the vast majority of EAs have the vast majority of their interactions in non-EA environments. I tried to partly get around this by limiting my analysis to people living in California, on the grounds that they were more likely to be plugged into EA communities and jobs. Conditional on being a woman in California, being EA did make someone more likely to experience sexual harassment, consistently, as measured in many different ways. But Californian EAs were also younger, much more bisexual, and much more polyamorous than Californian non-EAs; adjusting for sexuality and polyamory didn't remove the gap, but age was harder to adjust for and I didn't try. EAs who said they were working at charitable jobs that they had explicitly calculated were effective had lower harassment rates than the average person, but those working at charitable jobs that they hadn't explicitly calculated had higher rates. All of these subgroup analyses had very small sample sizes.

Overall I am not sure that anything can be concluded from these results either way.

I would urge everyone thinking about this question to read my original discussion of the sexual harassment survey results. It mostly focuses on professions but I think the overall conclusion is extremely relevant here too. You can also find the link to the data there in case you want to double-check my results.

Minor object-level objection: you say we should predict that crypto exchanges like FTX will fail, but I tried to calculate the risk of this in the second part of my post, and the average FTX-sized exchange fails only very rarely.

I don't think this is our main point of disagreement though. My main point of disagreement is about how actionable this is and what real effects it can have.

I think that the main way EA is "affiliated with" crypto is that it has accepted successful crypto investors' money. Of the people who have donated the most to EA, I think about 5-7 of the top ten names made their money in something crypto-related (even counting all the FTX people as one donor). Some of those people (example: Vitalik Buterin) are well-liked, honest, and haven't done anything to embarrass us. I think it would be practically bad to stop accepting their money, and morally bad (as a betrayal) to denounce them and write them out of the movement based on guilt by association. (CoI note: I have benefited from non-FTX crypto money)

I see you're not recommending that EA stop taking crypto money. But then I'm not sure what you do want, other than what's already happening:

  • You recommend EA not invest in crypto, but I don't think the movement is really doing this, and if it is I would expect that to be for normal economic reasons like diversification (and in general I expect Open Phil's investment managers to know more than us).
  • You recommend that organizations not put crypto people on their board, but I don't know of this happening except when those people have already been EAs before getting into crypto (I think SBF was on CEA's board before he got into crypto, although I could be wrong). If it was happening in other cases, I would assume it was because of standard practices around very big donors getting on the board, and not because EAs love crypto so much that they invite random crypto leaders to join company boards. If you know of examples to the contrary I would be interested in hearing them.
  • You recommend not boasting about ties to crypto insiders. I haven't seen this happen except with SBF, where I think the boasting was along the lines of "look how well this person earning to give paid off". I agree that people should do less of that in the future.

Although the point of "don't invite random crypto scammers to serve on your board and become the public face of EA for no reason" is obviously correct, I don't know of anyone actually doing this, and so I worry that the real effect of posts like this will be to slowly make crypto so toxic in this community that EA leaders feel pressured to refuse crypto donations for PR reasons, and then we lose more than half of our potential money. I'm especially worried about some kind of purity spiral, where after crypto is toxified, the next level is people arguing that Facebook has also been a pretty evil company at various points and so maybe we shouldn't accept Dustin's money either. I don't see a good Schelling fence here and would prefer not to start down that slope. I think we should avoid associating with (including taking money from) anyone who seems likely to be committing outright fraud or breaking the law, and maybe some extremely harmful industries like tobacco, but not try to more generally be the arbiters of which industries are vs. aren't socially productive.

Thanks for your thoughtful response.

I'm trying to figure out how much of a response to give, and how to balance saying what I believe vs. avoiding any chance to make people feel unwelcome, or inflicting an unpleasant politicized debate on people who don't want to read it. This comment is a bad compromise between all these things and I apologize for it, but:

I think the Kathy situation is typical of how effective altruists respond to these issues and what their failure modes are. I think "everyone knows" (in Zvi's sense of the term, where it's such strong conventional wisdom that nobody ever checks if it's true) that the typical response to rape accusations is to challenge and victim-blame survivors. And that although this may be true in some times and places, the typical response in this community is the one which, in fact, actually happened - immediate belief by anyone who didn't know the situation, and a culture of fear preventing those who did know the situation from speaking out. I think it's useful to acknowledge and push back against that culture of fear.

(this is also why I stressed the existence of the amazing Community Safety team - I think "everyone knows" that EA doesn't do anything to hold men accountable for harm, whereas in fact it tries incredibly hard to do this and I'm super impressed by everyone involved)

I acknowledge that makes it sound like we have opposing cultural goals - you want to increase the degree to which people feel comfortable pointing out that EA's culture might be harmful to women, I want to increase the degree to which people feel comfortable pushing back against claims to that effect which aren't true. I think there is some subtle complicated sense in which we might not actually have opposing cultural goals, but I agree to a first-order approximation they sure do seem different. And I realize this is an annoyingly stereotypical situation - I, as a cis man, coming into a thread like this and saying I'm worried about false accusations and chilling effects. My only two defenses are, first, that I only got this way because of specific real and harmful false accusations, that I tried to do an extreme amount of homework on them before calling them false, and that I only ever bring them up in the context of defending my decision there. And second, that I hope I'm possible to work with and feel safe around, despite my cultural goals, because I want to have a firm deontological commitment to promoting true things and opposing false things, in a way that doesn't refer to my broader cultural goals at any point.

Predictably, I disagree with this in the strongest possible terms.

If someone says false and horrible things to destroy other people's reputation, the story is "someone said false and horrible things to destroy other people's reputation". Not "in some other situation this could have been true". It might be true! But discussion around the false rumors isn't the time to talk about that.

Suppose the shoe was on the other foot, and some man (Bob) made some kind of false and horrible rumor about a woman (Alice). Maybe he says that she only got a good position in her organization by sleeping her way to the top. If this was false, the story isn't "we need to engage with the ways Bob felt harmed and make him feel valid." It's not "the Bob lied lens is harsh and unproductive". It's "we condemn these false and damaging rumors". If the headline story is anything else, I don't trust the community involved one bit, and I would be terrified to be associated with it.

I understand that sexual assault is especially scary, and that it may seem jarring to compare it to less serious accusations like Bob's. But the original post says we need to express emotions more, and I wanted to try to convey an emotional sense of how scary this position feels to me. Sexual assault is really bad and we need strong norms about it. But we've been talking a lot about consequentialism vs. deontology lately, and where each of these is vs. isn't appropriate. And I think saying "sexual assault is so bad, that for the greater good we need to focus on supporting accusations around it, even when they're false and will destroy people's lives" is exactly the bad kind of consequentialism that never works in real life. The specific reason it never works in real life is that once you're known for throwing the occasional victim under the bus for the greater good, everyone is terrified of associating with you.

Perhaps I would feel differently if I knew of examples of the EA community publicly holding men accountable for harm to women.

This is surprising to me; I know of several cases of people being banned from EA events for harm to women. When I've tried to give grants to people, I have gotten unexpected emails from EA higher-ups involved in a monitoring system, who told me that one of those people secretly had a history of harming women and that I should reconsider the grant on that basis. I have personally, at some physical risk to myself, forced a somewhat-resistant person to leave one of my events because they had a history of harm to women (this was Giego C; I think it was clear-cut enough to be okay to name a name here; I know most orgs have already banned him, and if your org hasn't then I recommend it does too - email me and I can explain why). I know of some other cases where men caused less severe harm or discomfort to women, there were very long discussions by (mostly female members of) EA leadership about whether they should be allowed to continue in their roles, and after some kind of semi-formal proceeding, with the agreement of the victim, after an apology, it was decided that they should be allowed to continue in their roles, sometimes with extra supervision. There's an entire EA Community Health Team with several employees and a mid-six-figure budget, and a substantial fraction of their job is holding men accountable for harm to women. If none of this existed, maybe I'd feel differently. But right now my experience of EA is that they try really hard to prevent harm to women, so hard that the current disagreement isn't whether to ban some man accused of harming women, but whether it was okay for me to mention that a false accusation was false.

Again in honor of the original post saying we should be more open about our emotions: I'm sorry for bringing this up. I know everyone hates having to argue about these topics. Realistically I'm writing this because I'm triggered and doing it as a compulsion, and maybe you also wrote your post because you're triggered and doing it as a compulsion, and maybe Maya wrote her post because she's triggered and doing it as a compulsion. This is a terrible topic where a lot of people have been hurt and have strong feelings, and I don't know how to avoid this kind of cycle where we all argue about horrible things in circles. But I am genuinely scared of living in a community where nobody can save good people from false accusations because some kind of mis-aimed concern about the greater good has created a culture of fear around ever speaking out. I have seen something like this happen to other communities I once loved and really don't want it to happen here. I'm open to talking further by email if you want to continue this conversation in a way that would be awkward on a public forum.

EDIT: After some time to cool down, I've removed that sentence from the comment, and somewhat edited this comment which was originally defending it. 

I do think the sentence was true. By that I mean that (this is just a guess, not something I know from specifically asking them) the main reason other people were unwilling to post the information they had, was because they were worried that someone would write a public essay saying "X doesn't believe sexual assault victims" or "EA has a culture of doubting sexual assault victims". And they all hoped someone else would go first to mention all the evidence that these particular rumors were untrue, so that that person could be the one to get flak over this for the rest of their life (which I have, so good prediction!), instead of them. I think there's a culture of fear around these kinds of issues that it's useful to bring to the foreground if we want to model them correctly.

But I think you're gesturing at a point where if I appear to be implicitly criticizing Maya for bringing that up, fewer people will bring things like that up in the future, and even if this particular episode was false, many similar ones will be true, so her bringing it up is positive expected value, so I shouldn't sound critical in any way that discourages future people from doing things like that. 

Although it's possible that the value gained by saying this true thing is higher than the value lost by potential chilling effects, I don't want to claim to have an opinion on this, because in fact I wrote that comment feeling pretty triggered and upset, without any effective value calculations at all. Given that it did get heavily upvoted, I can see a stronger argument for the chilling effect part and will edit it out.
