Bio

I currently lead EA Funds.

Before that, I worked on improving epistemics in the EA community at CEA (as a contractor), as a research assistant at the Global Priorities Institute, on community building, and on global health policy.

Unless explicitly stated otherwise, opinions are my own, not my employer's.

You can give me positive and negative feedback here.

Comments

Fair - can you give some examples of questions you'd use?

I'd bet that current models with less than $100,000 of post-training enhancements achieve median human performance on this task.

Seems plausible the Metaculus judges would agree, especially since that comment is quite old.

I think the mainline plan looks more like using the best agents/models internally and releasing significantly less capable general agents/models, very capable but narrow agents/models, or AI-generated products.

Yeah, that's fair. I'm a lot more bullish on getting AI systems that satisfy the linked question's definition than ones that satisfy my own.

Did you look at the Metaculus resolution criteria? They seem extremely weak to me; I'd be interested to know which criteria you think o3 (or whatever the best OAI model is) is furthest away from.

78% agree: AGI by 2028 is more likely than not


Most of my uncertainty is from potentially not understanding the criteria. They seem extremely weak to me:

  • Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize.
  • Able to score 90% or more on a robust version of the Winograd Schema Challenge, e.g. the "Winogrande" challenge or a comparable dataset for which human performance is at 90%+.
  • Able to score in the 75th percentile (as compared to the corresponding year's human students; this was a score of 600 in 2016) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages.
  • Able to learn the classic Atari game "Montezuma's Revenge" (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play (see the closely related question).

I wouldn't be surprised if we've already passed this.

78% agree: Bioweapons are an existential risk


Note that imo almost all the x-risk from bio routes through AI, and is better thought of as an AI-risk threat model.

You might believe future GPU hours are currently underpriced (e.g. maybe we'll soon develop AI systems that can automate valuable scientific research). In such a scenario, GPU hours would become much more valuable, while standard compute credits (which iiuc are essentially just money designated for computing resources) would not increase in value. Buying the underlying asset directly might be a straightforward way to invest in GPU hours now before their value increases dramatically.
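To make the asymmetry concrete, here's a toy calculation (a sketch only; the budget, the $2/hour price, and the 5x multiplier are all made-up assumptions):

```python
# Toy comparison: dollar-denominated compute credits vs. prepaid GPU hours.
# All numbers here are hypothetical and only illustrate the asymmetry.

budget_usd = 10_000               # amount available to spend today
price_per_gpu_hour = 2.0          # assumed current market price (USD/hour)

# Option A: buy compute credits. Their value stays fixed in dollar terms.
credits_value_later = budget_usd

# Option B: pre-purchase GPU hours at today's price.
hours_bought = budget_usd / price_per_gpu_hour        # 5,000 hours

# Suppose AI-driven demand multiplies the market price of a GPU hour by 5.
price_multiplier = 5
hours_value_later = hours_bought * price_per_gpu_hour * price_multiplier

print(f"Compute credits later:   ${credits_value_later:,.0f}")   # $10,000
print(f"Prepaid GPU hours later: ${hours_value_later:,.0f}")     # $50,000
```

Under these assumptions the prepaid hours 5x in value while the credits don't move, which is the whole case for buying the underlying asset rather than the dollar-denominated claim on it.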

Maybe there are cleverer ways to bet on the price of GPU hours dramatically increasing that are conceptually simpler than Nvidia share prices increasing, idk.

I think that "moods" should be a property of the whole discourse, as opposed to specific posts. I find it a bit annoying when commenters say a specific post has a missing mood - most posts don't aim to represent the whole discourse.

I think this kind of investigation would be valuable, but I'm not sure what concrete questions you'd imagine someone answering to figure this out.
