All of Misha_Yagudin's Comments + Replies

Two directions for research on forecasting and decision making

Is Rebecca still a fund manager, or is the LTFF page out of sync?

So it's fair to say that FFI-supers were selected and evaluated on the same data? This seems concerning. Specifically, on which questions were the top 60 selected, and on which questions were the scores below calculated? Did these sets of questions overlap?

The standardised Brier scores of FFI superforecasters (–0.36) were almost perfectly similar to that of the initial forecasts of superforecasters in GJP (–0.37). [17] Moreover, even though regular forecasters in the FFI tournament were worse at prediction than GJP forecasters overall (probably due to no

...

Paal Fredrik Skjørten Kvarberg

Yes, the 60 FFI supers were selected and evaluated on the same 150 questions (Beadle, 2022, 169-170). Beadle also identified the top 100 forecasters based on the first 25 questions, and evaluated their performance on the basis of the remaining 125 questions to see if their accuracy was stable over time, or due to luck. Similarly to the GJP studies, he found that they were consistent over time (Beadle, 2022, 128-131). I should note that I have not studied the report very thoroughly, so I may be mistaken about this. I'll have a closer look when I have the time and correct the answer above if it is wrong!
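The concern about selecting and evaluating on the same questions can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the FFI tournament: every simulated forecaster is equally unskilled (uniform random forecasts), scores are raw per-question Brier scores rather than the standardised scores in the report, and the function name `selection_gap` and its parameters are hypothetical. The 25/125 split mirrors the stability check described above.

```python
import random
from statistics import mean


def selection_gap(n_forecasters=500, n_select=25, n_holdout=125, top_k=60, seed=0):
    """Return (in_sample, held_out) mean Brier scores of the top_k forecasters.

    Assumption: all forecasters give uniform random forecasts, so any apparent
    edge among the selected group is luck, not skill.
    """
    rng = random.Random(seed)
    n_questions = n_select + n_holdout
    # Each binary question resolves yes (1.0) or no (0.0) with equal probability.
    outcomes = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_questions)]
    # brier[i][q]: squared error of forecaster i's random forecast on question q.
    brier = [[(rng.random() - o) ** 2 for o in outcomes] for _ in range(n_forecasters)]
    # Rank by mean Brier score on the selection questions only (lower is better).
    ranked = sorted(range(n_forecasters), key=lambda i: mean(brier[i][:n_select]))
    top = ranked[:top_k]
    in_sample = mean(mean(brier[i][:n_select]) for i in top)
    held_out = mean(mean(brier[i][n_select:]) for i in top)
    return in_sample, held_out


in_sample, held_out = selection_gap()
print(f"selection questions: {in_sample:.3f}, held-out questions: {held_out:.3f}")
```

Under these assumptions the selected group looks clearly better on the questions used to pick it than on the held-out questions, where it falls back to the population average. That gap is why a hold-out evaluation is informative.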

Hey, I think the fourth column was introduced somehow… You can see it by searching for "Mandel (2019)"

A Windfall Clause for CEO could worsen AI race dynamics

Thank you very much, Dane and the tech team!

More as food for thought... but maybe "broad investor base" is a bit of an exaggeration? Index funds are likely to control a significant fraction of these corporations, and it's unclear if the board members they appoint would represent ordinary people. Especially when owning an ETF != owning the actual underlying stocks.

From an old comment of mine:

Due to the rise of index funds (they "own" > 1/5 of American public companies), it seems that an alternative strategy might be trying to rise in the ranks of firms like BlackRock, Vanguard, or SSGA. It's not unprecede

...

https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts

The table here got all messed up. Could it be fixed?

Dane Magaway

This has now been fixed. Our tech team has resolved the issue by using dummy bullet points to widen the columns. Thanks for reaching out! Let me know if you run into any issues on your end.

Dane Magaway

Hi, Misha! Thanks for reaching out. We're on it and will let you know when it's sorted.

Two directions for research on forecasting and decision making

A Windfall Clause for CEO could worsen AI race dynamics

Thanks for highlighting Beadle (2022), I will add it to our review!

I wonder how FFI Superforecasters were selected? It's important to first select forecasters who are doing well and then evaluate their performance on new questions to avoid the issue of "training and testing on the same data."

Paal Fredrik Skjørten Kvarberg

Good question! There were many differences between the approaches by FFI and the GJP. One of them is that no superforecasters were selected and grouped in the FFI tournament. Here is Google's translation of a relevant passage: "In FFI's tournament, the super forecasters consist of the 60 best participants overall. FFI's tournament was not conducted one year at a time, but over three consecutive years, where many of the questions were not decided during the current year and the participants were not divided into experimental groups. It is therefore not appropriate to identify new groups of super forecasters along the way" (2022, 168). You can translate the entirety of 5.4 here for further clarification on how Beadle defines superforecasters in the FFI tournament.

Misha_Yagudin 1y 7

How much of the objection would be fixed if the Windfall Clause required the donations to be under the board's oversight?

Larks

Good question! My guess is not that much, though it depends on the details. In a traditional corporation, the board is elected by the shareholders to protect their interests. If everyone is attentive, it seems like the shareholders might start voting partly based on how the board members would influence the windfall. You could imagine political parties nominating candidates for the board that shareholders would choose between depending on their ideology as well as their expertise with regard to the object-level business of the firm. If this is the case, it seems we've basically reverted to a delegated democracy version of shareholder primacy where shareholders effectively get part of their dividend in the form of a pooled DAF vote. If directors/shareholders act with a perhaps more typical level of diligence for corporate governance, I would expect the board to provide a check on the most gross violations (e.g. spending all the money on yachts for the CEO, or funding Al Qaeda) but to give the CEO and management a lot of discretion over playpumps vs AMF or Opera vs ACLU. In practice, the boards of many tech startups seem quite weak. In some cases the founders have super-voting shares; in other cases they are simply charismatic and have boards full of their friends. You can verify this for many of the large public tech companies; I don't know as much about governance at the various LLM startups but in general I would imagine governance to be even weaker by default. In these cases I wouldn't expect much impact from board oversight.

Misha_Yagudin 1y 8

Thank you, Hauke, just contributed an upvote to the visibility of one good post, doing my part!

Alternatively, is there a way to apply field customization (like hiding community posts and up-weighting/down-weighting certain tags) to https://forum.effectivealtruism.org/allPosts?

NunoSempere

Yes, ctrl+F on "customize tags"

EA Philippines' Progress in 2022 🇵🇭

Is there a way to only show posts with ≥ 50 upvotes on the Frontpage?

Hauke Hillebrandt

Stop free-riding! Voting on new content is a public good, Misha ;P

Misha_Yagudin 1y 11

A random thought. The Philippines is famous for having a flourishing personal/executive assistant industry (e.g., https://www.athenago.com/). I guess there is a demand for assistants who are engaged in EA and know EA culture; IIRC, people who listed themselves at https://pineappleoperations.org/ were overbooked some time ago. Have you thought about that as a recommended career path?

redbermejo

Yes, though we weren't able to work on initiatives to actively advocate for this. Looking back, perhaps this is partly due to my beliefs on the matter (I don't believe that those with minimal work experience can be really useful assistants) and my focus (I work mostly with students). While I encouraged EA Philippines students to take on operations-oriented roles in general during career advising in my past role as CB, I did not focus on PA. While there is a demand for PAs/ExAs - to be a good one, you need to have excellent client management abilities ("Are you the right assistant for client X?") and enough experience with past organizational logistics work to navigate well to be three steps ahead of your client. You also need to be able to "train" your client on how to leverage your skills better to maximize their productivity (some people don't know how to use assistants). Otherwise, you won't be as helpful in your impact and may end up just being additional overhead to the EA leader you have as a client. Those new to the workforce don't generally have these skills as they mostly get developed and honed over time. I have met some assistants (some from Athena) from our Professionals fellowship who support EAs and, while I'm not privy to their actual work performance, my initial impression is that they all have some decent past work experience. As EA Philippines invests its energy in Professionals outreach, this could be something to put more time in exploring strategic initiatives that encourage this as a viable career path to pursue more intentionally. CC @Elmerei Cuevas @Alethea Faye Cendaña

Update to Samotsvety AGI timelines

Misha_Yagudin 1y 9

Thank you! We agree and [...], so hopefully, it's more informative and is not about edge cases of Turing Test passing.

We chose to use an imperfect definition and indicated to forecasters that they should interpret the definition not “as is” but “in spirit” to avoid annoying edge cases.

aogara

Fair enough. I think people conceive of AGI too monolithically, and don't sufficiently distinguish between the risk profiles of different trajectories. The difference between economic impact and x-risk is the most important, but I think it's also worth forecasting domain-specific capabilities (natural language, robotics, computer vision, etc). Gesturing towards "the concept we all agree exists but can't define" is totally fair, but I think the concept you're gesturing towards breaks down in important ways.

Update to Samotsvety AGI timelines

Misha_Yagudin 1y 14

I've preregistered a bunch of soft expectations about the next generation of LLMs and encouraged others in the group to do the same. But I don't intend to share mine on the Forum. I haven't written down my year-by-year expectations with a reasonable amount of detail yet.

Some intuitions about fellowship programs

If EA Community-Building Could Be Net-Negative, What Follows?

The person in charge of the program should be unusually productive/work long hours/etc. because otherwise, they would lack the mindset, tacit knowledge, and intuitions that go into having an environment optimized for productivity. E.g., most people undervalue their own time and the time of others and hence significantly underinvest in time-saving/convenience/etc. stuff at work.

(Sorry if mentioned above; haven't read the post.)

Joel Becker

I am uncertain whether it's important for program leads to be hard-working for the reason you describe. (I am very confident that hard-working-ness helped me personally a lot, but it doesn't feel obvious that this went through the 'understands hard-working-ness in others' channel.) Very, very strongly agree with the importance of an environment that values people's time very highly. Small changes/mindset shifts here can have outsized impact. Lots of room for improvement too. (Parts of this are covered under "basic amenities" but definitely more to add.)

If EA Community-Building Could Be Net-Negative, What Follows?

The point was that there is a non-negligible probability that EA will end up negative.

SebastianSchmidt

Yes, I agree that there's a non-negligible P that this will happen and that some events will be very harmful (heavy-tailed). Currently, however, saying that it's >10% seems too high but I could definitely change my mind. But I'm sufficiently worried about this to be skeptical of broad and low-fidelity outreach and I solicit advice from people who are generally skeptical of all forms of movement-building to be sure that we're sufficiently circumspect in what we do.

Your 2022 EA Forum Wrapped 🎁

If you think that movement building is effective in supporting the EA movement, you need to think that the EA movement is negative. I honestly can't see how you can be very confident in the latter. Screwing things up is easy; unintentionally messing up AI/LTF stuff seems easy, and given the high stakes, causing massive amounts of harm is an option (it's not an uncommon belief that FLI's Puerto Rico conferences turned out negatively, for example).

SebastianSchmidt

"If you think that movement building is effective in supporting the EA movement, you need to think that the EA movement is negative." I think you might mean something like "If you think that movement building is effective in supporting the EA movement, you need to think that the EA movement is definitely not negative"? I think it depends on how we operationalize community-building. I can definitely see how some forms of community-building are probably negative and I'd want it to be high quality and relatively targeted. What are some of the reasons why people think the Puerto Rico conference is negative?

On being compromised

Misha_Yagudin 1y 31

I read it not as a list of good actors doing bad things, but as a list of idealistic actors [at least in public perception] not living up to their own standards [standards the public ascribes to them].

Misha_Yagudin 1y 10

Looking back on my upvotes, surprisingly few great posts this year (< 10 if not ~5). Don't have a sense of how things were last year.

Good things that happened in EA this year

Misha_Yagudin 1y 8

Thanks, I wasn't aware of some of these outside my cause areas/focus/scope of concern. Very nice to see others succeeding/progressing!

Given how much is going on in EA these days (I can't keep up even with the forum), it might be good to have this as a quarterly thread/post and maybe invite others to celebrate their successes in the comments.

Do we know success rates for organizations/initiatives?

If Global Health Emergency is meant to mean public health emergency of international concern, then the base rate is roughly 45% per year = 7 / 15.5: declared 7 times, while the relevant regulation came into force in mid-2007.
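The arithmetic behind that base rate can be checked in a few lines. This is a rough sketch: the counts (7 declarations over about 15.5 years) come from the comment, while the function names and the Poisson conversion from a yearly rate to "at least one declaration in a year" are added assumptions for illustration.

```python
import math


def annual_base_rate(events: int, years: float) -> float:
    """Average number of events per year over the observation window."""
    return events / years


def prob_at_least_one(rate_per_year: float, horizon_years: float = 1.0) -> float:
    """Chance of one or more events in the horizon.

    Assumption (not from the comment): events arrive as a Poisson process.
    """
    return 1.0 - math.exp(-rate_per_year * horizon_years)


rate = annual_base_rate(events=7, years=15.5)
print(f"declarations per year: {rate:.2f}")
print(f"P(at least one in a year), Poisson assumption: {prob_at_least_one(rate):.2f}")
```

The two numbers differ because a rate of about 0.45 events per year is not the same as a 45% chance of at least one event in a given year. Which one is wanted depends on how the forecasting question is worded.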

Lizka

Great, thanks! Really appreciate this; I was really off — I think I had quickly taken my number/base rate for pandemics, and referenced a list of PHEICs I thought was for the 21st century without checking or noticing that this only starts in 2007. I might just go for this base rate, then.

AGI Timelines in Governance: Different Strategies for Different Timeframes

Consider suggesting it to https://forum.effectivealtruism.org/posts/H7xWzvwvkyywDAEkL/creating-a-database-for-base-rates

Deena Englander

Thanks! Will do so.

Misha_Yagudin 1y 7

Well, yeah, I struggle with interpreting that:

Prescriptive statements have no truth value — hence I have trouble understanding how they might be more likely to be true.
Comparing "what's more likely to be true" is also confusing as, naively, you are comparing two probabilities (your best guesses) of X being true conditional on "T" and "not T"; and one is normally very confident in their arithmetic abilities.
There are less naive ways of interpreting that would make sense, but they should be specified.
Lastly and probably most importantly, a "probability

Misha_Yagudin 1y 9

I am quite confused about what probabilities here mean, especially with prescriptive sentences like "Build the AI safety community in China" and "Beware of large-scale coordination efforts."

I also disagree with the "vibes" of probability assignment to a bunch of these, and the lack of clarity on what these probabilities entail makes it hard to verbalize these.

simeon_c

Hey Misha! Thanks for the comment! As I wrote in note 2, I'm here claiming that this claim is more likely to be true under these timelines than the other timelines. But how could I make it clearer without bothering too much? Maybe putting note 2 under the table in italic? I see, I hesitated in the trade-off (1) "put no probabilities" vs (2) "put vague probabilities" because I feel like the second gives a lot more signal on how confident I am in what I say and allows people to more fruitfully disagree but at the same time it gives a "seriousness" signal which is not good when the predictions are not actual predictions. Do you think that putting no probabilities would have been better? By "I also disagree with the vibes of probability assignment to a bunch of these", do you mean that it seems over/underconfident in a bunch of ways when you try to do a similar exercise?

Clarifications on diminishing returns and risk aversion in giving

Misha_Yagudin 1y 73

Apologies for maybe sounding harsh: but I think this is plausibly quite wrong and nonsubstantive. I am also somewhat upset that such an important topic is explored in a context where substantial personal incentives are involved.

One reason is that a post that does justice to the topic should explore possible return curves, and this post doesn't even contextualize betting with how much money EA had at the time (~$60B) / has now (~$20B) until the middle of the post where it mentions it in passing: "so effectively increase the resources going towards them by m...

Robert_Wiblin 1y 22

Hi Misha — with this post I was simply trying to clarify that I understood and agreed with critics on the basic considerations here, in the face of some understandable confusion about my views (and those of 80,000 Hours).

So saying novel things to avoid being 'nonsubstantial' was not the goal.

As for the conclusion being "plausibly quite wrong" — I agree that a plausible case can be made for both the certain $1 billion or the uncertain $15 billion, depending on your empirical beliefs. I don't consider the issue settled, the points you're making are interesti...

Ingredients for creating disruptive research teams

CEA/EV + OP + RP should engage an independent investigator to determine whether key figures in EA knew about the (likely) fraud at FTX

Interesting thread on early RAND culture: https://twitter.com/jordanschnyc/status/1593294746725756929

Misha_Yagudin 1y 57

Yes, more broadly, I think that we should think about governance more… I guess there are a bunch of low-hanging fruits we can import from the broader world, e.g., someone doing internal-to-EA investigative journalism could have unraveled risks related to FTX/Alameda leadership or just done an independent risk analysis (e.g., this forecasting question put the risk of FTX default at roughly 8%/yr; I am not sure bettors had any private information, and I think base rates alone give a probability around 10%).

Joel Becker 1y 21

Jehan gives some additional suggestions I liked here. Including rules about:

"fraternization and power relationships."
Anti-corruption.

Might not have affected things in the FTX case, but perhaps worth considering whilst the window for significant reform is wide open.

Tracking the money flows in forecasting

Is AI forecasting a waste of effort on the margin?

Great! I think you missed a few of the newer ones from https://ftxfuturefund.org/all-grants/?_area_of_interest=epistemic-institutions

Saul Munn

5mo

this link is dead, here's an archived version i found!

Misha_Yagudin 1y 10

I think the value of information is really high for the Future Fund. If p(doom) is really high (e.g., the largest prize is claimed), they might decide to almost exclusively focus on AI stuff — this would be a major organizational change that (potentially/hopefully) would help with AI risk reduction quite a bit.

Emrik

Mh, agreed. The general arguments in the post are probably overwhelmed in most cases by considerations specific to each case.

Samotsvety Nuclear Risk update October 2022

[Cause Exploration Prizes] Training experts to be forecasters

Another follow-up forecast from Swift: https://www.swiftcentre.org/what-would-be-the-consequences-of-a-nuclear-weapon-being-used-in-the-russia-ukraine-war/

Reslab Request for Information: EA hardware projects

I don't think your argument reflects much on the importance of forecasting. E.g., it might be the case that forecasting is much more important than whatever experts are doing (in absolute terms), but nonetheless, experts should do their things because no one else can substitute for them. (To be clear, this is a hypothetical against the structure of the argument.)

I think it's best to assess the value of information you can get from forecasting directly.

Hopefully, we can make forecasts credible and communicate them to sympathetic experts on such teams.

Misha_Yagudin 2y 8

Just want to flag that "hardware" is a bit misleading, as I think people often/mostly use it as shorthand for computer hardware, especially with the community's focus on AI/compute. Maybe disambiguate it straight after TL;DR or in TL;DR.

Joel Becker

Sorry about that! Changed in TL;DR to "physical engineering projects." (Note that these prototypes could plausibly use electronics etc., so it might not make sense to rule out computer hardware, although of course we want to be clear that the scope is broader.)

Prediction Markets in The Corporate Setting

‘Dissolving’ AI Risk – Parameter Uncertainty in AI Future Forecasting

I think the CFTC has no authority over play-money internal prediction markets, so that undercuts illegality a bit.

I guess one might even experiment with structuring them as real money markets, e.g., by paying winnings as "bonuses."

Misha_Yagudin 2y 6

do we actually have better-than-order-of-magnitude knowledge about all of these parameters except Containment?)

Sorta kinda, yes? For example, convincingly arguing that any conditional probability in Carlsmith decomposition is less than 10% (while not inflating others) would probably win the main prize given that "I [Nick Beckstead] am pretty sympathetic to the analysis of Joe Carlsmith here." + Nick is x3 higher than Carlsmith at the time of writing the report.

Froolow

My understanding of what everyone is producing (Carlsmith, Beckstead etc) is their point estimate / most likely probability for some proposition being true. Shifting this point estimate to below 10% would be near enough a prize, but plenty of real-world applications have highish point estimates with a lower bound uncertainty that is very low. The application where I am most familiar with this effect is clinical trials for oncology drugs; it isn't uncommon for the point estimate for a drug's effectiveness to be (say) 50% better than all other drugs on the market, but with a 95% confidence interval that covers no better at all, or even sometimes substantially worse. It seems to me to be quite a radical claim that we have better knowledge of AI Risk across nearly all parameters than we have of an oncology drug across a single parameter following a clinical trial.

My experience experimenting with a bunch of antidepressants I'd never heard of

Misha_Yagudin 2y 11

Seems like esketamine with "some effect in a day", a comparative lack of side effects, and lack of withdrawal issues might be an attractive option. I am curious why it wasn't on your list?

Samotsvety Nuclear Risk update October 2022

Misha_Yagudin 2y 8

A forecast from Swift Center: https://www.swiftcentre.org/will-russia-use-a-nuclear-weapon/

Upd: seems important to note that we have an overlap of ~2 forecasters, I think.

Misha_Yagudin

Another follow-up forecast from Swift: https://www.swiftcentre.org/what-would-be-the-consequences-of-a-nuclear-weapon-being-used-in-the-russia-ukraine-war/

Samotsvety Nuclear Risk update October 2022

Misha_Yagudin 2y 3

Hey Dan, thanks for sanity-checking! I think you and feruell are correct to be suspicious of these estimates; we laid out reasoning and probabilities for people to adjust to their taste/confidence.

I agree outliers are concerning (and find some of them implausible), but I likewise have an experience of being at 10..20% when a crowd was at ~0% (for a national election resulting in a tie) and at 20..30% when a crowd was at ~0% (for a SCOTUS case) [likewise for me being ~1% while the crowd was much higher; I also on occasion was wrong updating x20 as a res

...

Dan_Keys 2y 15

It would be interesting to know whether the forecasters with outlier numbers stand by those forecasts on reflection, and to hear their reasoning if so. In cases where outlier forecasts reflect insight, how do we capture that insight rather than brushing them aside with the noise? Checking in with those forecasters after their forecasts have been flagged as suspicious-to-others is a start.

The p(month|year) number is especially relevant, since that is not just an input into the bottom line estimate, but also has direct implications for individual planning. The plan ...

Overreacting to current events can be very costly

Misha_Yagudin 2y 35

Another important consideration that is not often mentioned (here and in our forecast) is how much more/less impact you expect to have after a full-out Russia-NATO nuclear war that destroys London.

Comparing top forecasters and domain experts

Questions on databases of AI Risk estimates

Asking forecasters about their expertise, or about their thinking patterns is not useful in terms of predicting which individuals will prove consistently accurate. Examining their behaviors, such as belief updating patterns, as well as their psychometric scores related to fluid intelligence offer more promising avenues. Arguably the most impressive performance in our study was for registered intersubjective measures, which rely on comparisons between individual and consensus estimates. Such measures proved valid as predictors of relative accuracy.

From the conclusion of this new paper https://psyarxiv.com/rm49a/

Answer by Misha_Yagudin Oct 02, 2022 2

Nicole Noemi gathers some forecasts about AI risk (a) from Metaculus, DeepMind co-founders, Eliezer Yudkowsky, Paul Christiano, and Ajeya Cotra's report on AI timelines.

h/t Nuño

Froolow

Thank you, really appreciate the information

Ingredients for creating disruptive research teams