All of Adam Binks's Comments + Replies

Thanks for the newsletter!

Looks like a typo:
> a version of GPT-4 released in 2023 outperformed a version of GPT-4 released in 2021

2
aogara
1mo
Thanks, fixed!

As well as Fatebook for Slack, at Sage we've made other infrastructure aimed at EAs (amongst others!):

  • Fatebook: the fastest way to make and track predictions
  • Fatebook for Chrome: Instantly make and embed predictions, in Google Docs and anywhere else on the web
  • Quantified Intuitions: Practice assigning credences to outcomes with a quick feedback loop
2
Arepo
2mo
Thanks Adam. I've edited those in.

This month's Estimation Game is about effective altruism! You can play here: quantifiedintuitions.org/estimation-game/december

Ten Fermi estimation questions to help you train your estimation skills. Play solo, or with a team - e.g. with friends, coworkers, or your EA group (see info for organisers).

It's also worth checking the archive for other estimation games you might be interested in, e.g. we've run games on AI, animal welfare + alt proteins, nuclear risk, and big picture history.

2
Jason
4mo
Tough confidence interval questions this round (at least to me).

I'm curious about B12 supplements - I currently take a multivitamin which has 50µg B12, my partner takes a multivitamin with 10µg B12. Should we be taking additional B12 tablets on top of this? (We're both vegan)

I saw in that post a recommendation for 100µg tablets, but Google says the RDA is 2.4µg. Do you know why there's this gap?

2
Abby Hoskin
7mo
Hahaha, thanks for posting!! :)

I think some subreddits do a good job of moderating to create a culture which is different from the default reddit culture, e.g. /r/askhistorians. See this post for an example, where there are a bunch of comments deleted, including one answer which didn't cite enough sources. Maybe this is what you have in mind when you refer to "moderating with an iron fist" though, which you mention might be destructive!

Seems like the challenge with reddit moderation is that users are travelling between subreddits all the time, and most have low quality/effort discussion... (read more)

We've added a new deck of questions to the calibration training app - The World, then and now.

What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.

Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!

We've also split the existing questions into decks, so you can focus on the topics you're most interested in:

Ah thank you! I've just pushed what should be a fix for this (hard to fully test as I'm in the UK).

1
Angelina Li
9mo
Thanks so much! :) FYI that the top level helper text seems fixed: But the prediction-level helper text is still not locale aware: (Again, not a big deal at all :) )

The July Estimation Game is now live: a 10 question Fermi estimation game all about big picture history! https://quantifiedintuitions.org/estimation-game/july

Question 1:

I was also wondering this - did 80k link to it in their newsletter (which has a big audience)?

Relatedly, I wonder if you can see differences in reported source by the place the survey respondent navigated to the survey from?

Thank you!

Do you look at non-anonymized user data in your analytics and tracking?

No - we don't look at non-anonymised user data in our analytics. We use Google Analytics events, so we can see e.g. a graph of how many forecasts are made each day, and this tracks the ID of each user so we can see e.g. how many users made forecasts each day (to disambiguate a small number of power-users from lots of light users). IDs are random strings of text that might look like cwudksndspdkwj. I think technically you'd call this "pseudo-anonymised" because user IDs are sto... (read more)

1
Angelina Li
9mo
By the way, very tiny bug report: The datestamps are rendering a bit weird? I see the correct date stamp for today under the date select, but the description text in italics is rendering as 'Yesterday', and the 'data-tip' value in the HTML is wrong. Obviously not a big deal, just passing it on :) I'm currently in PST time, where it is 9:39am on 2023.07.25, if it matters. (Let me know if you'd prefer to receive bug reports somewhere else?)
2
Angelina Li
9mo
Thanks for the fast response, all of this sounds very reasonable! :)

Thank you! I'm interested to hear how you find it!

often lacks the motivation to do so consistently

Very relatable! The 10 Conditions for Change framework might be helpful for thinking of ways to do it more consistently (if on reflection you really want to!). Fatebook aims to help with 1, 2, 4, 7, and 8, I think.

One way to do more prediction I'm interested in is integrating prediction into workflows. Here are some made-up examples:

  • At the start of a work project, you always forecast how long it'll take (I think this is almost always an important question, and
... (read more)

In many ways Fatebook is a successor to PredictionBook (now >11 years old!). If you've used PredictionBook in the past, you can import all your PredictionBook questions and scores to Fatebook.

In a perfect world, this would also integrate with Alfred on my mac so that it becomes extremely easy and quick to create a new private question


I'm thinking of creating a Chrome extension that will let you type /forecast Will x happen? anywhere on the internet, and it'll create and embed an interactive Fatebook question.

I'm thinking of primarily focussing on Google Docs, because I think the EA community could get a lot of mileage out of making and tracking predictions embedded in reports, strategy docs, etc. This extension would also work in messaging apps, on social media, and even here on the forum (though first-party support might be better for the forum!). 

Great, thanks!

The format could be "[question text]? [resolve date]" where the question mark serves as the indicator for the end of the question text, and the resolve date part can interpret things like "1w", "1y", "eoy", "5d"

I'm interested in adding power user shortcuts like this! 

Currently, if your question text includes a date that Fatebook can recognise, it'll prepopulate the "Resolve by" field with that date. This works for a bunch of common phrases, e.g. "in two weeks" "by next month" "by Jan 2025" "by February" "by tomorrow".

If you play around w... (read more)
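The shortcut format proposed above could be parsed along these lines - a hypothetical sketch, not Fatebook's actual parser (the function name and the shorthand codes handled are assumptions based on the suggestion):

```python
import re
from datetime import date, timedelta

def parse_shortcut(text, today=None):
    """Split '[question text]? [resolve date]' into (question, resolve-by date).

    Hypothetical: supports Nd/Nw/Ny (N days/weeks/years) and 'eoy' (end of year).
    Returns None if the text doesn't match the shortcut format.
    """
    today = today or date.today()
    match = re.fullmatch(r"(.*\?)\s*(\d+[dwy]|eoy)", text.strip())
    if not match:
        return None
    question, code = match.groups()
    if code == "eoy":
        resolve_by = date(today.year, 12, 31)
    else:
        n, unit = int(code[:-1]), code[-1]
        days_per_unit = {"d": 1, "w": 7, "y": 365}[unit]
        resolve_by = today + timedelta(days=n * days_per_unit)
    return question, resolve_by
```

For example, `parse_shortcut("Will X happen? 1w", today=date(2023, 7, 1))` would give `("Will X happen?", date(2023, 7, 8))`. The question mark serving as the delimiter, as suggested, keeps the grammar unambiguous even when the question text contains spaces.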

The June Estimation Game is animal welfare + alt proteins themed! 10 Fermi estimation questions. You can play here: quantifiedintuitions.org/estimation-game/june

Seems like academic research groups would be a better reference class than YC companies for most alignment labs.

If they're trying to build an org that scales a lot and is funded by selling products, YC companies are a good reference class; but if they're an org of researchers working somewhat independently or collaborating on hard technical problems, funded by grants, that sounds much more similar to an academic research group.

Unsure how to define success for an academic research group; any ideas? They seem to more often be exploratory and less goal-oriented.

Hmm, I'm not aware of a way to do this (but there might be one). Maybe you could generate two versions of the deck from your orgmode file, one with the Anki with Uncertainty card types and the other with plain card types?

4
Pablo
10mo
Unfortunately, the Emacs package that integrates org-mode with Anki is very poorly maintained and I'm no longer using it for that reason. Currently, my approach is to keep the normal deck but document how to use the add-on, both in the GitHub repository and in the EA Forum post announcing the release of the new version.

I'm excited to see the return of the careers guide as the core 80k resource (vs the key ideas series)! I think it's a better way to provide value to people, because a careers guide is about the individual ("how can I think about what to do with my career?") rather than about 80k ("what are the key ideas of 80k/EA?")

Nice! Thanks for the heads up Elliot - which page are you seeing a missing certificate on? Seems to be working for me

1
ElliotJDavies
1y
Seems to be working for me too now

I've added a basic calibration curve, thanks for the suggestion! 

You can find it in the app's Home tab (click on Fatebook in the left sidebar > Home tab at the top) once at least one question you've forecasted on has resolved.

Great, glad to hear it!

Aggregation choices (e.g. geo mean of odds would be nice)

Geo mean of odds is a good idea - it's probably a more sensible default. How would you feel about us using that everywhere, instead of the current arithmetic mean?
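For readers unfamiliar with the two aggregation methods being compared, here's a minimal illustrative sketch (not Fatebook's implementation) of the geometric mean of odds versus the arithmetic mean of probabilities:

```python
import math

def geo_mean_of_odds(probs):
    """Aggregate probability forecasts via the geometric mean of odds."""
    odds = [p / (1 - p) for p in probs]
    pooled_odds = math.prod(odds) ** (1 / len(odds))
    return pooled_odds / (1 + pooled_odds)  # convert pooled odds back to a probability

forecasts = [0.1, 0.1, 0.9]
arithmetic = sum(forecasts) / len(forecasts)   # ~0.37
geometric = geo_mean_of_odds(forecasts)        # lower: the outlier pulls less
```

One reason it's often preferred as a default: the geometric mean of odds is less dominated by a single extreme forecast than the arithmetic mean of probabilities, and it treats "yes" and "no" framings of the same question symmetrically.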

Brier scores for users

You can see your own absolute and relative Brier score in the app home (click Fatebook in the sidebar). If you're thinking of a team-wide leaderboard - that's on our list! Though some users said they wouldn't like this to avoid Goodharting, so I've not prioritised it so far, and will include a te... (read more)

4
Matt_Lerner
1y
I thought of some other down-the-line feature requests * Google Sheets integration (we currently already store our forecasts in a Google sheet) * Relatedly, ability to export to CSV (does this already exist and I just missed it?) * Ability to designate a particular resolver * Different formal resolution mechanisms, like a poll of users.
5
Matt_Lerner
1y
Ah, great! I think it would be nice to offer different aggregation options, though if you do offer one I agree that geo mean of odds is the best default. But I can imagine people wanting to use medians or averages, or even specifying their own aggregation functions. Especially if you are trying to encourage uptake by less technical organizations, it seems important to offer at least one option that is more legible to less numerate people.

I think you could implement a spaced repetition feature based on how many orders of magnitude you’re off, where the more OOMs you're off, the earlier it prompts you with the same question again

 

This is a great idea, so we made Anki with Uncertainty to do exactly this!

Thank you Hauke for the suggestion :D

I think we'll keep the calibration app as a pure calibration training game, where you see each question only once. Anki is already the king of spaced repetition, so adding calibration features to it seemed like a natural fit.
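The OOM-based scheduling idea quoted above could be sketched like this - a hypothetical illustration, not how Anki with Uncertainty actually schedules cards (the halving rule and base interval are assumptions):

```python
import math

def next_interval_days(answer, true_value, base_interval=7.0):
    """Hypothetical scheduler: the more orders of magnitude off, the sooner the repeat.

    Halves the repeat interval for each order of magnitude of error,
    with a floor of one day. Assumes positive estimation-style answers.
    """
    ooms_off = abs(math.log10(answer) - math.log10(true_value))
    return max(1.0, base_interval / (2 ** ooms_off))
```

So an exactly right answer repeats in 7 days, an answer one OOM off in 3.5 days, and two OOMs off in 1.75 days.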

Super interesting to see this analysis,  especially the table of current capabilities - thank you!

I have interpreted [feasible] as: one year after the forecasted date, have AI labs achieved these milestones, and disclosed this publicly?

It seems to me that this ends up being more conservative than the original "Ignore the question of whether they would choose to", which presumably makes the expert forecasts worse than they seem to be here.

For example, a task like "win angry birds" seems pretty achievable to me, just that no one... (read more)

5
PatrickL
1y
Thanks Adam :) I have a rough (i.e. considered for <15 minutes) take: if top labs one year ago had attempted these particular milestones, and had the same policies on disclosing capabilities as they currently seem to, then there's a 40-50% chance they would have achieved 2 of Angry Birds, Atari fifty, Laundry and Go low by now. But I don't put much weight on my prediction, whereas I put a lot more weight on my analysis of what has happened (though this is also somewhat subjective!). I agree though that checking what has actually happened ends up being more conservative than the original "Ignore the question of whether they would choose to", which makes the expert forecasts worse than they seem to be here. This is a weakness of this analysis! And of the resolvability of the original survey. Do you have an estimate of how many of the tasks would have been achieved by now if labs tried a year ago?

Thanks for the feedback Forslack! I'm curious whether you'd prefer to play without logging in because you don't have a Google account or because you don't want to share your email?

2
Jason
1y
Not Forslack, but if you're going to ask for permission for Google to share all that info you should have a clear privacy policy visible for what you'll do with it. Also, I don't think you have to request all that info from Google, like real name, to use a Google login.

Thanks very much for the feedback, this is really helpful!

If anyone has question suggestions, I'd really appreciate them! I think crowdsourcing questions will help us make them super varied and globally relevant. I made a suggestion form here https://forms.gle/792QQAfqTrutAH9e6

Thanks for organising! I had a great time, I'd love to see more of these events. Maybe you could circulate a Google Doc beforehand to help people brainstorm ideas, comment on each other's ideas, and indicate interest in working on ideas. You could prepopulate it with ideas you've generated as the organisers. That way when people show up they can get started faster - I think we spent the first hour or so choosing our idea.

(Btw - our BOTEC calculator's first page is at this URL.)

2
Jonny Spicer
1y
I think this is a great idea, thanks for the feedback - I completely agree we want people to be able to hit the ground running on the day. I would imagine groups are most effective when they're formed around strong coders, perhaps there's a way we can work that into the doc. One thing we're considering is an ongoing Discord server, where people could see ideas/projects/who's working on what, etc. The idea would be that the server would persist between events, and move more towards having ongoing projects as above. I think this could potentially solve some of the cold start issues, but I am also hesitant to ask people to join yet another Discord server, and it'd probably need to reach a critical mass of people in order to be valuable. Having written out this comment, I think we will likely start it and push to get it to a good size, and if not we can re-evaluate. Thanks for pointing out the bad link, I've corrected it now!
1
Stenemo
1y
Yes, I like their work! It is great that there are many complementing ways to learn these important topics. Although I have not yet found a good comprehensive playlist for those who want to learn by watching a summary of important concepts.

Interesting to think about! 

But for this kind of bargain to work, wouldn't you need confidence that the you in other worlds would uphold their end of the bargain? 

E.g., if it looks like I'm in videogame-world, it's probably pretty easy to spend lots of time playing videogames. But can I be confident that my counterpart in altruism-world will actually allocate enough of their time towards altruism?

(Note I don't know anything about Nash bargains and only read the non-maths parts of this post, so let me know if this is a basic misunderstanding!)

4
Eric Neyman
1y
Great question -- you absolutely need to take that into account! You can only bargain with people who you expect to uphold the bargain. This probably means that when you're bargaining, you should weight "you in other worlds" in proportion to how likely they are to uphold the bargain. This seems really hard to think about and probably ties in with a bunch of complicated questions around decision theory.

This is a really useful round-up, thank you!

A data-point on this - today I was looking for and couldn't find this graph. I found effectivealtruismdata.com but sadly it didn't have these graphs on it. So would be cool to have it on there, or at least link to this post from there!

Thanks Jack, great to see this!

Pulling out the relevant part as a quote for other readers:

  • On average, it took about 25 hours to organize and run a campaign (20 hours by organizers and 5 hours by HIP).
  • The events generated an average of 786 USD per hour of counterfactual donations to effective charities.
  • This makes fundraising campaigns a very cost-effective means of counterfactual impact; as a comparison, direct work that generates 1,000,000 USD of impact equivalent per year equates to around 500 USD per hour.
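The implied arithmetic behind the 500 USD/hour benchmark, assuming a roughly 2,000-hour working year (the hours figure is my assumption, not stated in the post):

```python
# Assumption: ~2,000 working hours per year (not stated in the post).
direct_work_value_per_year = 1_000_000  # USD of impact equivalent per year
hours_per_year = 2_000
direct_work_per_hour = direct_work_value_per_year / hours_per_year  # 500 USD/hour

fundraising_per_hour = 786  # USD of counterfactual donations per organiser-hour, as reported
ratio = fundraising_per_hour / direct_work_per_hour  # roughly 1.6x the direct-work benchmark
```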

Great results so far!

High Impact Professionals supported 8 EAs to run fundraising drives at their workplace in 2021, raising $240k in counterfactual dollars. On an hourly basis, organizing those events proved to be as impactful as direct work

Could you share the numbers you used to calculate this? I.e. how many hours to organise an event, counterfactual dollars per hour organising/running events, and your estimate for the value per hour of direct work?

7
Jack Lewars
2y
Hi Adam - sure - https://bit.ly/3BiJRP3 We'll also link to this in the OP.

it'd be really valuable for more EA-aligned people to goddamn write summaries at all

To get more people to write summaries for long forum posts, we could try adding it to the forum new post submission form? e.g. if the post text is over x words, a small message shows up advising you to add a summary.

Or maybe you're thinking more of other formats, like Google docs?

3
MichaelA
2y
Yeah, I've actually discussed that idea briefly with the EA Forum team and I think it'd probably be good. I'll send a link to this thread to them to give them one more data point in favor of doing this. (Though it's plausible to me that there's some reason they shouldn't do this which I'm overlooking - I'd trust their bottom-line views here more than mine.) But yeah, I'm also thinking of GDocs, blog posts posted elsewhere, and any other format, so I think we also need nudges like this post. 

Great to see this writeup, thank you!

In the runup to EAG SF I've been thinking a bit about travel funding allocation. I thought I could take this opportunity to share two problems and tentative solutions, as I imagine they hold across different conferences (including EAGx Boston).

Thing 1: Uncertainty around how much to apply for

In conversations with other people attending I've found that people are often quite uncertain and nervous when working out how much to apply for. 

One way to improve this could be to encourage applicants to follow a simple proce... (read more)

Thanks Ankush! For this first round, we keep things intentionally short, but if your project progresses to later rounds then there will be plenty of opportunities to share more details.

it is a pdf that I would love to get valued and be shared with the world and anyone who wants to hear about longtermism project

Posting your ideas here on the EA Forum could be a great way to get feedback from other people interested in longtermism!

Thanks Stuart, I'll DM you to work out the details here!

Maybe something helpful to think about is, what's your goal?

E.g. maybe:

  • You want to stay on top of new papers in AI capabilities
  • You want to feel connected to the AI safety research community
  • You want to build a network of people in AI research / AI safety research, so that in future you could ask people for advice about a career decision
  • You want to feel more motivated for your own self study in machine learning
  • You want to workshop your own ideas around AI, and get rapid feedback from researchers and thinkers

I think for some goals, Twitter is unusually helpfu... (read more)

and the answer is “randomista development, animal welfare, extreme pandemic mitigation and AI alignment”

 

Some people came up with a set of answers, enough of us agree with this set and they’ve been the same answers for long enough that they’re an important part of EA identities

I think some EAs would consider work on other areas like space governance and improving institutional decision-making highly impactful. And some might say that randomista development and animal welfare are less impactful than work on x-risks, even though the community has focussed on them for a long time.

1
[comment deleted]
2y

This is exciting! If you've got this far in your planning yet, I'd love to hear more about how the journal will be promoted and how you plan for readers to find you? Do you have any examples of "user stories" - stories about the kind of reader you'd hope to attract, how they'd find the journal, and what it might lead them to do subsequently?

It's also a nice nudge for people to read the books (I remember reading Doing Good Better in a couple of weeks because a friend/organiser had lent it to me and I didn't want to keep him waiting).

Great to see tools like this that make assumptions clear - I think it's not only useful as a calculator but as a concrete operationalisation of your model of AI risk, which is a good starting point for discussion. Thanks for creating!

Hi Tom! I think this idea of giving based on the signalling value is an interesting one.

One idea - I wonder if you could capture a lot of the signalling value while only moving a small part of your donation budget to non-xrisk causes?

How that would work: when you're talking to people about your GWWC donations, if you think they'd be more receptive to global health/animal ideas you can tell them about your giving to those charities. And then (if you think they'd be receptive) you can go on to say that ultimately you think the most pressing problems are xri... (read more)

Quick meta note to say I really enjoyed the length of this post, exploring one idea in enough detail to spark thoughts while staying highly readable. Thank you!

You might be aware of this but for others reading - there's a calculator to help you work out the value of your time.

I think it's worth doing once (and repeating when your circumstances change, e.g. new job), then just using that as a general heuristic to make time-money tradeoffs, rather than deliberating every time.

If I was an EA grantmaker, I'd want to start small by maybe hiring an educational-youtube-video personality (like John Green's "Crash Course") to make an Effective Altruism series. 

I think this is in the works! Kurzgesagt got a $2.8m grant from Open Phil.

See also A Happier World and Rational Animations.

Great post, thank you! This is useful as a guide to what to try and add in to intro fellowships, in particular:

There are a lot of real professional people in EA, and those people are influencing things in the real world – EA is by no means just a philosophy discussion club, even if your local EA club is one (and it does not have to be one forever!)

I think this is a really important realisation to have as someone doing an intro fellowship/getting into EA. My guess is that realising this makes it a lot easier to think seriously about making career choices ba... (read more)

3
Ada-Maaria Hyvärinen
2y
With EA career stories I think it is important to keep in mind that new members might not read them the same way as more engaged EAs who already know which organizations are considered cool and effective within EA. When I started attending local EA meetups I met a person who worked at OpenPhil (maybe as a contractor? I can't remember the details), but I did not find it particularly impressive because I did not know what Open Philanthropy was and assumed the "phil" stood for "philosophy".
3
Tom Gardiner
2y
I was going to suggest the last point, but you're way ahead of me! In the next couple of years, the first batch of St Andrews EAs will have fully entered the world of work/advanced study, and keeping some record of what the alumni are doing would be meaningful.  [As highlighted in the thread post, we are two EAs who know each other outside the forum.]