All of Adam Binksmith's Comments + Replies

This would be convenient! I wonder if you could have a fairly-decent first pass via a Chrome extension that hides all non-vegan items from the UI (or just greys them out).

You could probably use LLMs to do a decent first pass on whether items are vegan. It'll be obvious for many (e.g. vegetables, meat), and for non-obvious ones you could kick off a research agent that finds an up-to-date ingredient list or discussion thread. Then add the ability for users to correct classification mistakes and you'd probably be able to classify most foods quite accurately.

Then promote the Chrome extension via vegan magazines, influencers, veganuary, etc.
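The classification pipeline described above could be wired together roughly like this (a minimal sketch with hypothetical names; the LLM and research-agent calls are stubbed out, since those services are assumptions):

```python
# Obvious cases resolved by a lookup table; everything else deferred.
OBVIOUS = {
    "broccoli": "vegan",
    "carrot": "vegan",
    "chicken breast": "not_vegan",
}

# User corrections take priority over any automated guess.
user_corrections = {}

def classify(item_name, llm_guess=None):
    """Return 'vegan', 'not_vegan', or 'needs_research'."""
    name = item_name.strip().lower()
    if name in user_corrections:   # user fixes always win
        return user_corrections[name]
    if name in OBVIOUS:            # obvious cases need no LLM call
        return OBVIOUS[name]
    # Non-obvious items: fall back to an LLM guess if one is available,
    # otherwise flag for the research-agent path.
    return llm_guess or "needs_research"

# Example correction from a user report:
user_corrections["oyster sauce noodles"] = "not_vegan"
```

In practice the extension would then grey out or hide any item classified `not_vegan` in the supermarket UI, and queue `needs_research` items for the agent.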

0
Tom Cohen Ben-Arye
This is a thoughtful suggestion, and technically it’s quite feasible. The key issue, though, is effortlessness, which is the main behavioral bottleneck we’re trying to solve. A Chrome extension (or app) is, by definition, an opt-in solution. It only helps people who are already motivated enough to discover it, install it, and keep using it. In practice, that means it mainly serves committed vegans - not the much larger group of new vegans, flexitarians, or people trying to reduce animal products, where most of the impact lies.

By contrast, a native vegan filter inside the supermarket UI is default-available, works on mobile, and requires zero setup. That difference matters a lot: friction at the point of purchase is one of the strongest predictors of vegan dropout.

We agree that LLMs, crowdsourcing, and user corrections are promising for data generation. In fact, those approaches are likely part of the backend solution. But as a delivery mechanism, browser extensions don’t scale to mainstream users in the way platform-level features do.

In short: the proposed extension could be a useful prototype or data-gathering tool, but it doesn’t solve the core problem - making vegan shopping effortless for everyone, by default.
1
UriKatz
I used a meal logging app once and the database it had was incredible, though not perfect. If the item had a barcode, the app had its nutritional data. So extension, agent, even an app with a camera can all work. Of course, I live in the US.

Very useful post!

Slowness

Relative to other foundations of a similar size, I think OP moves fast; relative to startups, other AIS founders, and smaller organisations (i.e., almost all other AIS organisations), I think OP moves slowly.

I'm curious what this slowness feels like as a grantmaker. I guess you progress one grant at speed and then it goes off for review and you work on other stuff, and then ages later your first grant comes back from review, and then maybe there are a few rounds of this? Or is it more spending more time on each thing than you might... (read more)

6
cb
Thanks! Yeah, so I think the best way to think of the slowness is that there are bottlenecks to grants getting made: things need to get signed off on by senior decision-makers, and they're very capacity-constrained (hence, in part, hiring for more senior generalists), so it might take a while for people to get to any particular grant decision you want them to get to. Also, as a more junior grantmaker, you're incentivized to make it as easy as possible for these senior decision-makers to engage with your thoughts and not need follow-up information from you, which pushes towards you spending more time on grant investigations.

In terms of the options you listed, I think it's closest to "spending more time on each thing than you might prefer".

(All this being said, I do think leadership is aware of this and working on ways we can move faster, especially for low-risk grants. Recently, we've been able to make low-risk technical grants much faster and with less time invested, which I think has been an exciting development!)

I think this post's argument assumes your $100k is lost by default if you don't have a will, but on a quick GPT-5 query it looks like in the UK it goes to your spouse, parents, siblings or siblings' children, and in California something similar. Assuming you're survived by a spouse or these family members and you're happy with your assets going to them then it seems like it's not the same as cancelling a $15/mo subscription. (But plausibly still worth it, I think it just needs a bit more explanation!)

2
AgentMa🔸
Yes good point. Was thinking of this as "how many potential QALYs would be lost if your money went to your not-so-EA relatives". But yes, if you think they would spend the money wisely then this makes sense.

I enjoy [...] strategy, new projects, improving systems

 

Maybe advising other orgs would be a good fit for this? E.g. advising startups in your area

Really great presentation on the tool, I was impressed when I stumbled across this a few weeks ago!

Great thanks!

We have two outputs in mind with this project:

1. Reports on a specific thinker (e.g. Gwern) or body of work's predictions. These would probably be published individually or showing interesting comparisons, similar to the Futurists track record in Cold Takes (based on Arb's Big Three research)
2. A dashboard ranking the track records of lots of thinkers

For (2), I agree that cherry picking would be bad, and we'd want it to cover a good range.

For our initial outputs from (1) though, I'm excited about specifically picking thinkers who people would ... (read more)

1
Aidan🔸
Awesome, thanks Adam, this makes a lot of sense. I'd be excited to see reports on specific thinkers like Gwern and Yuval Noah Harari. I'd be especially excited to look at the track records of institutions, like frontier developers or governments (e.g. the UK Government or its AISI).

Very interesting!

I'd be interested to hear a bit more about what a restrained system would be able to do. 

For example, could I make two restrained AGIs, one which has the goal:

A) "create a detailed plan (plan.txt) for maximising profit"

And another which has the goal:

B) "execute the plan written in plan.txt"?

If not, I'm not clear on why "make a cure for cancer" is scope-insensitive but "write a detailed plan for [maximising goal]" is scope-sensitive

Some more test case goals to probe the definition:

C) "make a maximal success rate cure for cancer"

D) "write a detailed plan for generating exactly $10^100 USD profit for my company"

a tool to create a dashboard of publicly available forecasts on different platforms

 

You might be interested in Metaforecast (you can create custom dashboards).

Also loosely related - on AI Digest we have a timeline of AI forecasts pulling from Metaculus and Manifold.

1
EffectiveAdvocate🔸
I am aware of Metaforecast, but from what I understood, it is no longer maintained. Last time I checked, it did not work with Metaculus anymore. It is also not very easy to use, to be honest. 

AI for epistemics/forecasting is something we're considering working on at Sage - we're hiring technical members of staff. I'd be interested to chat to other people thinking about this.

Depending on the results of our experiments, we might integrate this into our forecasting platform Fatebook, or build something new, or decide not to focus on this.

[Do you have a work trial? This will be a deal breaker for many]
 

Based on your conversations with developers, do you have a rough guess at what % this is a deal breaker for?

I'm curious if this is typically specific to an in-person work trial, vs how much deal-breaking would be avoided by a remote trial, e.g. 3 days Sat-Mon.

5
Yonatan Cale
It's less of "%" and more of "who will this intimidate". Many of your top candidates will (1) currently be working somewhere, and (2) look at many EA-aligned jobs, and if many of them require a work trial then that could be a problem. (I just hired someone who was working full time, and I assume if we required a work trial then he just wouldn't be able to do it without quitting.)

Easy ways to make this better:

1. If you have flexibility (for example, whether the work trial is local or remote, or when it is, or something else), then say that in the job post. It was common for me to hear that candidates didn't even apply because of something like that which is written as a strict requirement, and then for me to hear from an employer that they didn't really care about it.
2. If your candidates will feel comfortable talking to you and telling you about things like this, then you can find a solution together - I imagine that would be great.

Also, some candidates will WANT a work trial, to see how the job actually is. I asked for a work trial in my current job.

Also, CEA does work trials. You could ask them how it goes. (But they won't hear about people who didn't even apply, I guess)

Thanks for the newsletter!

Looks like a typo:
> a version of GPT-4 released in 2023 outperformed a version of GPT-4 released in 2021

2
aog
Thanks, fixed!

As well as Fatebook for Slack, at Sage we've made other infrastructure aimed at EAs (amongst others!):

  • Fatebook: the fastest way to make and track predictions
  • Fatebook for Chrome: Instantly make and embed predictions, in Google Docs and anywhere else on the web
  • Quantified Intuitions: Practice assigning credences to outcomes with a quick feedback loop
2
Arepo
Thanks Adam. I've edited those in.

This month's Estimation Game is about effective altruism! You can play here: quantifiedintuitions.org/estimation-game/december

Ten Fermi estimation questions to help you train your estimation skills. Play solo, or with a team - e.g. with friends, coworkers, or your EA group (see info for organisers).

It's also worth checking the archive for other estimation games you might be interested in, e.g. we've run games on AI, animal welfare + alt proteins, nuclear risk, and big picture history.

2
Jason
Tough confidence interval questions this round (at least to me).

I'm curious about B12 supplements - I currently take a multivitamin which has 50µg B12, my partner takes a multivitamin with 10µg B12. Should we be taking additional B12 tablets on top of this? (We're both vegan)

I saw in that post a recommendation for 100µg tablets, but google says the RDA is 2.4µg, do you know why there's this gap?

2
Abby Babby
Hahaha, thanks for posting!! :)

I think some subreddits do a good job of moderating to create a culture which is different from the default reddit culture, e.g. /r/askhistorians. See this post for an example, where there are a bunch of comments deleted, including one answer which didn't cite enough sources. Maybe this is what you have in mind when you refer to "moderating with an iron fist" though, which you mention might be destructive!

Seems like the challenge with reddit moderation is that users are travelling between subreddits all the time, and most have low quality/effort discussion... (read more)

We've added a new deck of questions to the calibration training app - The World, then and now.

What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.

Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!

We've also split the existing questions into decks, so you can focus on the topics you're most interested in:

Ah thank you! I've just pushed what should be a fix for this (hard to fully test as I'm in the UK).

1
Angelina Li
Thanks so much! :) FYI, the top-level helper text seems fixed, but the prediction-level helper text is still not locale-aware. (Again, not a big deal at all :) )

The July Estimation Game is now live: a 10 question Fermi estimation game all about big picture history! https://quantifiedintuitions.org/estimation-game/july

Question 1:

I was also wondering this - did 80k link to it in their newsletter (which has a big audience)?

Relatedly, I wonder if you can see differences in reported source by the place the survey respondent navigated to the survey from?

Thank you!

Do you look at non-anonymized user data in your analytics and tracking?

No - we don't look at non-anonymised user data in our analytics. We use Google Analytics events, so we can see e.g. a graph of how many forecasts are made each day, and this tracks the ID of each user so we can see e.g. how many users made forecasts each day (to disambiguate a small number of power-users from lots of light users). IDs are random strings of text that might look like cwudksndspdkwj. I think technically you'd call this "pseudo-anonymised" because user IDs are sto... (read more)

1
Angelina Li
By the way, very tiny bug report: The datestamps are rendering a bit weird? I see the correct date stamp for today under the date select, but the description text in italics is rendering as 'Yesterday', and the 'data-tip' value in the HTML is wrong. Obviously not a big deal, just passing it on :) I'm currently in PST time, where it is 9:39am on 2023.07.25, if it matters. (Let me know if you'd prefer to receive bug reports somewhere else?)
1
Angelina Li
Thanks for the fast response, all of this sounds very reasonable! :)

Thank you! I'm interested to hear how you find it!

often lacks the motivation to do so consistently

Very relatable! The 10 Conditions for Change framework might be helpful for thinking of ways to do it more consistently (if on reflection you really want to!) Fatebook aims to help with 1, 2, 4, 7, and 8, I think.

One way to do more prediction I'm interested in is integrating prediction into workflows. Here are some made-up examples:

  • At the start of a work project, you always forecast how long it'll take (I think this is almost always an important question, and
... (read more)

In many ways Fatebook is a successor to PredictionBook (now >11 years old!) If you've used PredictionBook in the past, you can import all your PredictionBook questions and scores to Fatebook.

In a perfect world, this would also integrate with Alfred on my mac so that it becomes extremely easy and quick to create a new private question


I'm thinking of creating a Chrome extension that will let you type /forecast Will x happen? anywhere on the internet, and it'll create and embed an interactive Fatebook question. EDIT: we created this, the Fatebook browser extension.

I'm thinking of primarily focussing on Google Docs, because I think the EA community could get a lot of mileage out of making and tracking predictions embedded in reports, strategy docs... (read more)

Great, thanks!

The format could be "[question text]? [resolve date]" where the question mark serves as the indicator for the end of the question text, and the resolve date part can interpret things like "1w", "1y", "eoy", "5d"

I'm interested in adding power user shortcuts like this! 

Currently, if your question text includes a date that Fatebook can recognise, it'll prepopulate the "Resolve by" field with that date. This works for a bunch of common phrases, e.g. "in two weeks" "by next month" "by Jan 2025" "by February" "by tomorrow".

If you play around w... (read more)
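A shorthand parser along the lines proposed above could look something like this (an illustrative sketch only, not Fatebook's actual date parser; it ignores edge cases like leap days):

```python
from datetime import date, timedelta

def parse_resolve_shorthand(token, today=None):
    """Parse shorthand like '1w', '5d', '1y', 'eoy' into a resolve-by date."""
    today = today or date.today()
    token = token.strip().lower()
    if token == "eoy":                      # end of year
        return date(today.year, 12, 31)
    n, unit = int(token[:-1]), token[-1]
    if unit == "d":
        return today + timedelta(days=n)
    if unit == "w":
        return today + timedelta(weeks=n)
    if unit == "y":                         # same day, n years on
        return date(today.year + n, today.month, today.day)
    raise ValueError(f"unrecognised shorthand: {token!r}")
```

The question mark in "[question text]? [resolve date]" would then just be a split point: everything before it is the question, and the remainder goes through a parser like this.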

The June Estimation Game is animal welfare + alt proteins themed! 10 Fermi estimation questions. You can play here: quantifiedintuitions.org/estimation-game/june

Seems like academic research groups would be a better reference class than YC companies for most alignment labs.

If they're trying to build an org that scales a lot, and is funded by selling products, YC companies are a good reference class, but if they're an org of researchers working somewhat independently or collaborating on hard technical problems, funded by grants, that sounds much more similar to an academic research group.

Unsure how to define success for an academic research group, any ideas? They seem to more often be exploratory and less goal-oriented.

Hmm, I'm not aware of a way to do this (but there might be one). Maybe you could generate two versions of the deck from your orgmode file, one with the Anki with Uncertainty card types and the other with plain card types?

4
Pablo
Unfortunately, the Emacs package that integrates org-mode with Anki is very poorly maintained and I'm no longer using it for that reason. Currently, my approach is to keep the normal deck but document how to use the add-on, both in the GitHub repository and in the EA Forum post announcing the release of the new version.

I'm excited to see the return of the careers guide as the core 80k resource (vs the key ideas series)! I think it's a better way to provide value to people, because a careers guide is about the individual ("how can I think about what to do with my career?") rather than about 80k ("what are the key ideas of 80k/EA?")

Nice! Thanks for the heads up Elliot - which page are you seeing a missing certificate on? Seems to be working for me

1
ElliotJDavies
Seems to be working for me too now

I've added a basic calibration curve, thanks for the suggestion! 

You can find it in the app's Home tab (click on Fatebook in the left sidebar > Home tab at the top) once at least one question you've forecasted on has resolved.
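For the curious, a calibration curve is typically computed by bucketing forecasts by stated probability and comparing each bucket's average forecast with the fraction that resolved YES (a generic sketch, not Fatebook's actual implementation):

```python
def calibration_curve(forecasts, n_bins=10):
    """forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Returns (avg forecast, hit rate) per non-empty probability bucket."""
    bins = [[] for _ in range(n_bins)]
    for p, outcome in forecasts:
        i = min(int(p * n_bins), n_bins - 1)  # clamp p=1.0 into last bucket
        bins[i].append((p, outcome))
    curve = []
    for bucket in bins:
        if bucket:
            avg_p = sum(p for p, _ in bucket) / len(bucket)
            hit_rate = sum(o for _, o in bucket) / len(bucket)
            curve.append((avg_p, hit_rate))
    return curve
```

A well-calibrated forecaster's points sit close to the diagonal: things they say are 90% likely happen about 90% of the time.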

Great, glad to hear it!

Aggregation choices (e.g. geo mean of odds would be nice)

Geo mean of odds is a good idea - it's probably a more sensible default. How would you feel about us using that everywhere, instead of the current arithmetic mean?
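For reference, the geometric mean of odds converts each probability to odds, averages in log-odds space, and converts back; unlike the arithmetic mean, it pulls the aggregate towards confident outlier forecasts (a generic sketch of the standard formula, not Fatebook's code):

```python
import math

def geo_mean_of_odds(probs):
    """Aggregate probabilities via the geometric mean of their odds."""
    odds = [p / (1 - p) for p in probs]               # convert to odds
    log_mean = sum(math.log(o) for o in odds) / len(odds)
    g = math.exp(log_mean)                            # geometric mean of odds
    return g / (1 + g)                                # back to a probability

def arith_mean(probs):
    return sum(probs) / len(probs)
```

For example, aggregating a 1% and a 50% forecast gives roughly 9% under geo mean of odds, versus 25.5% under the arithmetic mean.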

Brier scores for users

You can see your own absolute and relative Brier score in the app home (click Fatebook in the sidebar). If you're thinking of a team-wide leaderboard - that's on our list! Though some users said they wouldn't like this to avoid Goodharting, so I've not prioritised it so far, and will include a te... (read more)
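The absolute Brier score mentioned here is the standard mean squared error between stated probabilities and 0/1 outcomes (a generic sketch of the textbook formula, not Fatebook's implementation):

```python
def brier_score(forecasts):
    """forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Lower is better; always guessing 0.5 scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
```

A relative score can then be derived by comparing your score on shared questions against other forecasters' scores on those same questions.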

4
Matt_Lerner
I thought of some other down-the-line feature requests:

  • Google Sheets integration (we currently already store our forecasts in a Google sheet)
  • Relatedly, ability to export to CSV (does this already exist and I just missed it?)
  • Ability to designate a particular resolver
  • Different formal resolution mechanisms, like a poll of users.
5
Matt_Lerner
Ah, great! I think it would be nice to offer different aggregation options, though if you do offer one I agree that geo mean of odds is the best default. But I can imagine people wanting to use medians or averages, or even specifying their own aggregation functions. Especially if you are trying to encourage uptake by less technical organizations, it seems important to offer at least one option that is more legible to less numerate people.

I think you could implement a spaced repetition feature based on how many orders of magnitude you’re off, where the more OOMs you're off, the earlier it prompts you with the same question again

 

This is a great idea, so we made Anki with Uncertainty to do exactly this!

Thank you Hauke for the suggestion :D

I think we'll keep the calibration app as a pure calibration training game, where you see each question only once. Anki is already the king of spaced repetition, so adding calibration features to it seemed like a natural fit.
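One simple way the OOM-based scheduling idea could work (an illustrative sketch with made-up constants, not how Anki with Uncertainty actually schedules):

```python
import math

def next_interval_days(guess, answer, base_days=7):
    """Shrink the review interval the further off the estimate was,
    halving it per order of magnitude of error, with a 1-day floor."""
    ooms_off = abs(math.log10(guess) - math.log10(answer))
    return max(1, round(base_days / (2 ** ooms_off)))
```

So an exact estimate comes back in a week, an estimate off by one OOM in about half that, and a badly wrong one the next day.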

Super interesting to see this analysis, especially the table of current capabilities - thank you!

I have interpreted [feasible] as, one year after the forecasted date, have AI labs achieved these milestones, and disclosed this publicly?

It seems to me that this ends up being more conservative than the original "Ignore the question of whether they would choose to", which presumably makes the expert forecasts worse than they seem to be here.

For example, a task like "win angry birds" seems pretty achievable to me, just that no one... (read more)

5
PatrickL
Thanks Adam :) I have a rough (i.e. considered for <15 minutes) take: if top labs one year ago had attempted these particular milestones, and had the same policies on disclosing capabilities as they currently seem to, then there's a 40-50% chance they would have achieved 2 of Angry Birds, Atari fifty, Laundry and Go low by now. But I don't put much weight on my prediction, whereas I put a lot more weight on my analysis of what has happened (though this is also somewhat subjective!).

I agree though that checking what has actually happened ends up being more conservative than the original "Ignore the question of whether they would choose to", which makes the expert forecasts worse than they seem to be here. This is a weakness of this analysis! And of the resolvability of the original survey. Do you have an estimate of how many of the tasks would have been achieved by now if labs tried a year ago?

Thanks for the feedback Forslack! I'm curious whether you'd prefer to play without logging in because you don't have a Google account or because you don't want to share your email?

2
Jason
Not Forslack, but if you're going to ask for permission for Google to share all that info you should have a clear privacy policy visible for what you'll do with it. Also, I don't think you have to request all that info from Google, like real name, to use a Google login.

Thanks very much for the feedback, this is really helpful!

If anyone has question suggestions, I'd really appreciate them! I think crowdsourcing questions will help us make them super varied and globally relevant. I made a suggestion form here https://forms.gle/792QQAfqTrutAH9e6

Thanks for organising! I had a great time, I'd love to see more of these events. Maybe you could circulate a Google Doc beforehand to help people brainstorm ideas, comment on each other's ideas, and indicate interest in working on ideas. You could prepopulate it with ideas you've generated as the organisers. That way when people show up they can get started faster - I think we spent the first hour or so choosing our idea.

(Btw - our BOTEC calculator's first page is at this URL.)

2
Jonny Spicer 🔸
I think this is a great idea, thanks for the feedback - I completely agree we want people to be able to hit the ground running on the day. I would imagine groups are most effective when they're formed around strong coders; perhaps there's a way we can work that into the doc.

One thing we're considering is an ongoing Discord server, where people could see ideas/projects/who's working on what, etc. The idea would be that the server would persist between events, and move more towards having ongoing projects as above. I think this could potentially solve some of the cold start issues, but I am also hesitant to ask people to join yet another Discord server, and it'd probably need to reach a critical mass of people in order to be valuable. Having written out this comment, I think we will likely start it and push to get it to a good size, and if not we can re-evaluate.

Thanks for pointing out the bad link, I've corrected it now!
1
Stenemo
Yes, I like their work! It is great that there are many complementing ways to learn these important topics. Although I have not yet found a good comprehensive playlist for those who want to learn by watching a summary of important concepts.

Interesting to think about! 

But for this kind of bargain to work, wouldn't you need confidence that the you in other worlds would uphold their end of the bargain? 

E.g., if it looks like I'm in videogame-world, it's probably pretty easy to spend lots of time playing videogames. But can I be confident that my counterpart in altruism-world will actually allocate enough of their time towards altruism?

(Note I don't know anything about Nash bargains and only read the non-maths parts of this post, so let me know if this is a basic misunderstanding!)

4
Eric Neyman
Great question -- you absolutely need to take that into account! You can only bargain with people who you expect to uphold the bargain. This probably means that when you're bargaining, you should weight "you in other worlds" in proportion to how likely they are to uphold the bargain. This seems really hard to think about and probably ties in with a bunch of complicated questions around decision theory.

This is a really useful round-up, thank you!

A data-point on this - today I was looking for and couldn't find this graph. I found effectivealtruismdata.com but sadly it didn't have these graphs on it. So would be cool to have it on there, or at least link to this post from there!

Thanks Jack, great to see this!

Pulling out the relevant part as a quote for other readers:

  • On average, it took about 25 hours to organize and run a campaign (20 hours by organizers and 5 hours by HIP).
  • The events generated an average of 786 USD per hour of counterfactual donations to effective charities.
  • This makes fundraising campaigns a very cost effective means of counterfactual impact; as a comparison, direct work that generates 1,000,000 USD of impact equivalent per year equates to around 500 USD per hour.

Great results so far!

High Impact Professionals supported 8 EAs to run fundraising drives at their workplace in 2021, raising $240k in counterfactual dollars. On an hourly basis, organizing those events proved to be as impactful as direct work

Could you share the numbers you used to calculate this? I.e. how many hours to organise an event, counterfactual dollars per hour organising/running events, and your estimate for the value per hour of direct work?

7
Jack Lewars
Hi Adam - sure - https://bit.ly/3BiJRP3 We'll also link to this in the OP.

it'd be really valuable for more EA-aligned people to goddamn write summaries at all

To get more people to write summaries for long forum posts, we could try adding it to the forum new post submission form? e.g. if the post text is over x words, a small message shows up advising you to add a summary.

Or maybe you're thinking more of other formats, like Google docs?

3
MichaelA🔸
Yeah, I've actually discussed that idea briefly with the EA Forum team and I think it'd probably be good. I'll send a link to this thread to them to give them one more data point in favor of doing this. (Though it's plausible to me that there's some reason they shouldn't do this which I'm overlooking - I'd trust their bottom-line views here more than mine.) But yeah, I'm also thinking of GDocs, blog posts posted elsewhere, and any other format, so I think we also need nudges like this post. 

Great to see this writeup, thank you!

In the run-up to EAG SF I've been thinking a bit about travel funding allocation. I thought I could take this opportunity to share two problems and tentative solutions, as I imagine they hold across different conferences (including EAGx Boston).

Thing 1: Uncertainty around how much to apply for

In conversations with other people attending I've found that people are often quite uncertain and nervous when working out how much to apply for. 

One way to improve this could be to encourage applicants to follow a simple proce... (read more)

Thanks Ankush! For this first round, we keep things intentionally short, but if your project progresses to later rounds then there will be plenty of opportunities to share more details.

it is a pdf that I would love to get valued and be shared with the world and anyone who wants to hear about longtermism project

Posting your ideas here on the EA Forum could be a great way to get feedback from other people interested in longtermism!

Thanks Stuart, I'll DM you to work out the details here!

Maybe something helpful to think about is, what's your goal?

E.g. maybe:

  • You want to stay on top of new papers in AI capabilities
  • You want to feel connected to the AI safety research community
  • You want to build a network of people in AI research / AI safety research, so that in future you could ask people for advice about a career decision
  • You want to feel more motivated for your own self study in machine learning
  • You want to workshop your own ideas around AI, and get rapid feedback from researchers and thinkers

I think for some goals, Twitter is unusually helpfu... (read more)
