Ozzie Gooen

Berkeley, CA, USA · Joined Dec 2014


I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.


Ambitious Altruistic Software Efforts



Some thoughts on the greater project:  

- The greater prospect of “let’s have collaborative estimates of the impacts of key longtermist projects” is something I strongly want to see, but I think it’s also *really* difficult to do well.

- This experiment went through a few early strategies. I think the results are clearly mediocre (in that estimates were all over the place, and were wildly inconsistent), but could be a good place to build much better work. 

- I see this very much as an MVP, so I’d expect it to have severe limitations. I generally prefer a process of “build a bunch of MVPs, test them out, and see what fails” to one of “spend a whole lot of time getting it right the first time.”

- The fact that estimates were inconsistent suggests that elicitation is very difficult to do well, but also that there’s a great deal of improvement to be done. So, future work is probably less tractable than expected, but more important.

- I’m still very bullish on relative evaluations, but think that they will require a lot of clever innovations to do well. 

- I think that longer-term, it would be promising to have people submit relative evaluations as long Squiggle (or similar) files. I’m unsure how these can best be displayed or organized for specific discussions.

Some thoughts on the estimations:  

- I think this is the first time most (or any) of us have really had to estimate the relative value of these kinds of longtermist projects. There’s been very little literature on this before. I think the numbers are correspondingly questionable (including my own).

- Utility elicitation, in particular for comparing one item to another when values could be negative, was really poor. I tried some naive Squiggle calculations that clearly weren’t very accurate. I’m not sure what tool would be best here; maybe there’s some custom drawing-with-mouse tool that could work, or people could figure out better quantitative function representations.
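To illustrate the difficulty: a toy Squiggle sketch (entirely made-up numbers, not from the actual experiment) of comparing a clearly-positive project to one with some chance of being net-negative. The ratio distribution behaves badly once the denominator or numerator crosses zero:

```squiggle
// Hypothetical project values, in arbitrary utility units
projectA = normal(10, 3) // comfortably positive
// projectB: 80% chance of a positive outcome, 20% chance of a harmful one
projectB = mx(normal(20, 10), normal(-30, 15), [0.8, 0.2])
// Naive relative value: heavy tails and sign flips make this hard to interpret
relativeValue = projectB / projectA
```

This is the kind of naive calculation that breaks down; the ratio is multimodal and crosses zero, so a single summary number misleads.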

- It’s very hard to evaluate these sorts of projects without much more data. Ideally, there would be a lot of data gathering. For example, if a program funds 10 people to do work, we’d ideally have a good table of all of their outputs, and comments from people in the area about how good these outputs were. A lot of evaluation work can reduce to “effective systems to gather objective and subjective information from diverse sets of sources.”

- I believe people estimated “how valuable do you think this is” instead of, “how valuable do you think a council would think this is?” The latter should be much more uncertain, and possibly much more important to readers (if done well).

- From what I remember, I think my main disagreement with other evaluators is that some had much narrower ranges than I thought were reasonable. I guess that some of this is part of a learning process.

I think much of the issue is that:
1. It took a while to ramp up to being able to do things such as the marketing campaign for WWOTF. It's not trivial to find the people and buy-in necessary. Previous EA books haven't had similar campaigns.
2. Even when you have that capacity, it's typically much more limited than we'd want.

I imagine EAs will get better at this over time. 

We might be quibbling a bit over what "really valuable" means. I agree that CEA definitely could have prioritized these higher, and likely would have if they cared about it much more. 

I think they would be happy to have evaluations done if they were very inexpensive or free, for whatever that's worth. This is much better than with many orgs, who would try to oppose evaluations even if they are free; but perhaps it is suboptimal. 

Very quick note:

I agree that more public evaluations of things like CEA programs and programs that they fund, would be really valuable. I'm sure people at CEA would agree too. In my experience at QURI, funders are pretty positive about this sort of work.

One of the biggest challenges is in finding strong people to do it. Generally, the people qualified to do strong evaluation work are also qualified to do grant funding directly, so they just go and do that. It's hard to do evaluation well, and public writeups present a bunch of extra challenges, many of which aren't very fun.

If people here have thoughts on how we can scale public evaluation, I'd be very curious. 

Yea; from my perspective, the total benefit of these posts might be higher than ever, given that there's much more activity and more decisions being made in the space now. That said, I imagine the costs are higher too (Larks's time is more valuable).

A few quick thoughts:

  1. I always appreciate well-meaning discussion and thought this brought up some good points. That said, I overall don't really agree with it.
  2. It's a lot of work to organize an event for many people. In the last year, total global attendance at EAGs (between all of them) seems to have grown by around an order of magnitude; from maybe ~500 roughly three years ago, to maybe ~6k this year? My impression is that it's been correspondingly tricky to scale the CEA team in charge of this growth. I imagine specific proposals to "Open EAGs" would look like some combination of charging a fair bit more for them and/or allowing 10k+ people at each event. This doesn't at all seem trivial. Maybe it would be easy to do a very minimalist/mediocre version of a huge event, but I imagine if that were done, people would find a lot of reasons to complain.
  3. My personal proposal is that eventually, it would be nice (assuming there are people available to do it) to try out essentially an "EA Open", with 5k-15k people. If this works, then rename "EA Open" to "EA Global", and then continue having a smaller "EA Global event", but now named something more like, "Super boring detailed EA summit." This way the senior people could still have their event, and others can have some "EA" event to go to (that's called "EA Global", if they care about that so much). 
  4. Even if a bigger conference isn't set up, I think "EA Global" might be a mediocre name for the current conference. The harm caused by the resentment of people not getting invited to it might outweigh the benefits of making it seem more accessible to some who do get in, but wouldn't have applied otherwise. The branding could likewise change to make the focus seem more professional/dedicated.
  5. Some people treat "EAG" as "A professional venue to do work meetings", and others treat it as "the place for the cool kids to be cool with each other". I'd probably prioritize the former for a few reasons.
  6. Most of the benefits of attending EAG get watered down when you increase the size. I imagine the 10k-person version would look very different. The senior people that do show up wouldn't have much time per person, and would be there for different reasons (recruitment, some very selective mentorship). The experience for most people would be "a chance to talk with many others who are vaguely interested in EA". This doesn't sound very exciting to me, but maybe it could be made to work somehow. 

I'm really happy to see these detailed suggestions & improvements, they're really useful. 

Squiggle is still an early language, there are definitely a lot of fixes for things like these to be done. 

Quick question:
> This is the primary reason I wasn't able to completely verify the Squiggle model against Causal

Any harder numbers here would be really useful, to get a better sense. I just looked at this model, which takes me a few seconds to render. (This is also too much to be done on each keystroke, similar to Squiggle.) I'd expect Squiggle to be slower than Causal (for one, Squiggle is much newer and not a startup), but I'm of course curious how much slower it is.

Reminder: Only a few days left for this! Submissions are closed on September 1st.

A few very quick points:

1. Neat to see this being modeled! It's an important set of variables, and it could use more interesting attempts.
2. Note that you can click "copy share link" in Squiggle to have a link that opens that direct model. https://www.squiggle-language.com/playground/#code=eNqtVE1vgkAQ%2FSsTTmArLkZrStImmpjGU5PaI6kZy6IbYdFlqTHW%2F96hWkMVEJty3Hlf%2B2bD1kjm8XqcRhGqjeEGGCb89vts6AsdK8PVKqUTIYUWGI5XqZjNQj7WSsiZ4RoyjaZcPQdDTNqs3YYHuLO7C9AxOJ2FJz3ZaoEWEQchYcNRJYCB5goysCd%2F2H2fzEQsMSSdvn4lgqktEtt6EuhDQmazJxWv9fwFNadZk9msmzkxu7OHnYZpgGkWUG%2FAseANtOXJXY0MFwDNU9vs1svJQIRhBhaKj4LRx2aACT8Ep9yMOb3eGWyIOVTXZtCCe2qzAEfzCgvqpFT5TzvJaeWr%2Bd16ST%2BN8yxZhsNmKQoeeTDN%2BZSLn6dx9%2BD6hNt66tRI72pts2sd5CPuC5T%2FLQ6fj7BKUWoRctM%2Bmi0nDivjjuSJmUOPq3md5bsfmA7L3Io2xzG5%2BBrcqmHFRvZNVb6zisprsQs6pV%2BDsfsCPtfA6w%3D%3D

3. Some of the key variables get to be negative, which worries me. You can use "truncateLeft()" to remove the negative values, or "SampleSet.map({|x| x < 0 ? 0 : x})" to set them to 0. (Which is likely more appropriate here).
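A quick sketch of the two approaches mentioned above, using a hypothetical variable (the specific distribution is made up for illustration):

```squiggle
// A hypothetical input that shouldn't go negative, but does under a normal
someVariable = normal(5, 10)
// Option A: truncate at 0 (drops the negative region and renormalizes)
truncated = truncateLeft(someVariable, 0)
// Option B: map negative samples to exactly 0 (keeps their probability mass at 0)
zeroed = SampleSet.map(SampleSet.fromDist(someVariable), {|x| x < 0 ? 0 : x})
```

The two give different distributions: truncation removes the negative mass, while mapping piles it up at zero, which is why the latter is likely more appropriate here.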

4. I know Nuno did some very similar modeling here, but hasn't written about it much yet.

5. Be sure to apply to the upcoming Squiggle competition! 


I think this is a solid report[1] of a really interesting area. 

I previously gave this as feedback; but wanted to bring it up here:
My hunch is that the #1 potential benefit of this is the gains to Wisdom and Intelligence. The main goal I see is for humans (especially effective altruists and those near them) to become smart enough to get us over the hurdle of the upcoming X-risks. 

It's possible that improvements to benevolence or coordination could be great as well. 

If brain-computer interfaces could become really feasible in the next 10-40 years, they seem like a really huge deal to me; much more exciting than many pharmaceuticals, education advances, and many other interesting industries. 

It would be great to see forecasts on this, for anyone interested. 

  1. ^

    This doesn't mean I've verified the facts, I'm not an expert in this field.
