All of Adam Binksmith's Comments + Replies

This would be convenient! I wonder if you could have a fairly-decent first pass via a Chrome extension that hides all non-vegan items from the UI (or just greys them out).

You could probably use LLMs to do a decent first pass on whether items are vegan. It'll be obvious for many (e.g. vegetables, meat), and for non-obvious ones you could kick off a research agent that finds an up-to-date ingredient list or discussion thread. Then add the ability for users to correct classification mistakes and you'd probably be able to classify most foods quite accurately.

Then promote the Chrome extension via vegan magazines, influencers, veganuary, etc.
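The classification pipeline described above could be wired together roughly like this (a minimal sketch with hypothetical names; the LLM and research-agent calls are stubbed out, since those services are assumptions):

```python
# Obvious cases resolved by a lookup table; everything else deferred.
OBVIOUS = {
    "broccoli": "vegan",
    "carrot": "vegan",
    "chicken breast": "not_vegan",
}

# User corrections take priority over any automated guess.
user_corrections = {}

def classify(item_name, llm_guess=None):
    """Return 'vegan', 'not_vegan', or 'needs_research'."""
    name = item_name.strip().lower()
    if name in user_corrections:   # user fixes always win
        return user_corrections[name]
    if name in OBVIOUS:            # obvious cases need no LLM call
        return OBVIOUS[name]
    # Non-obvious items: fall back to an LLM guess if one is available,
    # otherwise flag for the research-agent path.
    return llm_guess or "needs_research"

# Example correction from a user report:
user_corrections["oyster sauce noodles"] = "not_vegan"
```

In practice the extension would then grey out or hide any item classified `not_vegan` in the supermarket UI, and queue `needs_research` items for the agent.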

0
Tom Cohen Ben-Arye
This is a thoughtful suggestion, and technically it’s quite feasible. The key issue, though, is effortlessness, which is the main behavioral bottleneck we’re trying to solve. A Chrome extension (or app) is, by definition, an opt-in solution. It only helps people who are already motivated enough to discover it, install it, and keep using it. In practice, that means it mainly serves committed vegans - not the much larger group of new vegans, flexitarians, or people trying to reduce animal products, where most of the impact lies.

By contrast, a native vegan filter inside the supermarket UI is default-available, works on mobile, and requires zero setup. That difference matters a lot: friction at the point of purchase is one of the strongest predictors of vegan dropout.

We agree that LLMs, crowdsourcing, and user corrections are promising for data generation. In fact, those approaches are likely part of the backend solution. But as a delivery mechanism, browser extensions don’t scale to mainstream users in the way platform-level features do.

In short: the proposed extension could be a useful prototype or data-gathering tool, but it doesn’t solve the core problem - making vegan shopping effortless for everyone, by default.
1
UriKatz
I used a meal logging app once and the database it had was incredible, though not perfect. If the item had a barcode, the app had its nutritional data. So extension, agent, even an app with a camera can all work. Of course, I live in the US.

Very useful post!

Slowness

Relative to other foundations of a similar size, I think OP moves fast; relative to startups, other AIS founders, and smaller organisations (i.e., almost all other AIS organisations), I think OP moves slowly.

I'm curious what this slowness feels like as a grantmaker. I guess you progress one grant at speed and then it goes off for review and you work on other stuff, and then ages later your first grant comes back from review, and then maybe there are a few rounds of this? Or is it more spending more time on each thing than you might... (read more)

6
cb
Thanks! Yeah, so I think the best way to think of the slowness is that there are bottlenecks to grants getting made: things need to get signed off on by senior decision-makers, and they're very capacity-constrained (hence, in part, hiring for more senior generalists), so it might take a while for people to get to any particular grant decision you want them to get to. Also, as a more junior grantmaker, you're incentivized to make it as easy as possible for these senior decision-makers to engage with your thoughts and not need follow-up information from you, which pushes towards you spending more time on grant investigations.

In terms of the options you listed, I think it's closest to "spending more time on each thing than you might prefer".

(All this being said, I do think leadership is aware of this and working on ways we can move faster, especially for low-risk grants. Recently, we've been able to make low-risk technical grants much faster and with less time invested, which I think has been an exciting development!)

I think this post's argument assumes your $100k is lost by default if you don't have a will, but on a quick GPT-5 query it looks like in the UK it goes to your spouse, parents, siblings or siblings' children, and in California something similar. Assuming you're survived by a spouse or these family members and you're happy with your assets going to them then it seems like it's not the same as cancelling a $15/mo subscription. (But plausibly still worth it, I think it just needs a bit more explanation!)

2
AgentMa🔸
Yes good point. Was thinking of this as "how many potential QALYs would be lost if your money went to your not-so-EA relatives". But yes, if you think they would spend the money wisely then this makes sense.

I enjoy [...] strategy, new projects, improving systems

 

Maybe advising other orgs would be a good fit for this? E.g. advising startups in your area

Really great presentation on the tool, I was impressed when I stumbled across this a few weeks ago!

Great thanks!

We have two outputs in mind with this project:

1. Reports on a specific thinker (e.g. Gwern) or body of work's predictions. These would probably be published individually or showing interesting comparisons, similar to the Futurists track record in Cold Takes (based on Arb's Big Three research)
2. A dashboard ranking the track records of lots of thinkers

For (2), I agree that cherry picking would be bad, and we'd want it to cover a good range.

For our initial outputs from (1) though, I'm excited about specifically picking thinkers who people would ... (read more)

1
Aidan🔸
Awesome, thanks Adam, this makes a lot of sense. I'd be excited to see reports on specific thinkers like Gwern and Yuval Noah Harari. I'd be especially excited to look at the track records of institutions, like frontier developers or governments (e.g. the UK Government or its AISI).

Very interesting!

I'd be interested to hear a bit more about what a restrained system would be able to do. 

For example, could I make two restrained AGIs, one which has the goal:

A) "create a detailed plan (plan.txt) for maximising profit"

And another which has the goal:

B) "execute the plan written in plan.txt"?

If not, I'm not clear on why "make a cure for cancer" is scope-insensitive but "write a detailed plan for [maximising goal]" is scope-sensitive

Some more test case goals to probe the definition:

C) "make a maximal success rate cure for cancer"

D) "write a detailed plan for generating exactly $10^100 USD profit for my company"

a tool to create a dashboard of publicly available forecasts on different platforms

 

You might be interested in Metaforecast (you can create custom dashboards).

Also loosely related - on AI Digest we have a timeline of AI forecasts pulling from Metaculus and Manifold.

1
EffectiveAdvocate🔸
I am aware of Metaforecast, but from what I understood, it is no longer maintained. Last time I checked, it did not work with Metaculus anymore. It is also not very easy to use, to be honest. 

AI for epistemics/forecasting is something we're considering working on at Sage - we're hiring technical members of staff. I'd be interested to chat to other people thinking about this.

Depending on the results of our experiments, we might integrate this into our forecasting platform Fatebook, or build something new, or decide not to focus on this.

[Do you have a work trial? This will be a deal breaker for many]
 

Based on your conversations with developers, do you have a rough guess at what % this is a deal breaker for?

I'm curious if this is typically specific to an in-person work trial, vs how much deal-breaking would be avoided by a remote trial, e.g. 3 days Sat-Mon.

5
Yonatan Cale
It's less of "%" and more of "who will this intimidate". Many of your top candidates will (1) currently be working somewhere, and (2) look at many EA-aligned jobs, and if many of them require a work trial then that could be a problem. (I just hired someone who was working full time, and I assume if we required a work trial then he just wouldn't be able to do it without quitting.)

Easy ways to make this better:

1. If you have flexibility (for example, whether the work trial is local or remote, or when it is, or something else), then say that in the job post. It was common for me to hear that candidates didn't even apply because of something like that which is written as a strict requirement, and then for me to hear from an employer that they didn't really care about it.
2. If your candidates will feel comfortable talking to you and telling you about things like this, then you can find a solution together - I imagine that would be great.

Also, some candidates will WANT a work trial, to see how the job actually is. I asked for a work trial in my current job.

Also, CEA does work trials. You could ask them how it goes. (But they won't hear about people who didn't even apply, I guess)

Thanks for the newsletter!

Looks like a typo:
> a version of GPT-4 released in 2023 outperformed a version of GPT-4 released in 2021

2
aog
Thanks, fixed!

As well as Fatebook for Slack, at Sage we've made other infrastructure aimed at EAs (amongst others!):

  • Fatebook: the fastest way to make and track predictions
  • Fatebook for Chrome: Instantly make and embed predictions, in Google Docs and anywhere else on the web
  • Quantified Intuitions: Practice assigning credences to outcomes with a quick feedback loop
2
Arepo
Thanks Adam. I've edited those in.

This month's Estimation Game is about effective altruism! You can play here: quantifiedintuitions.org/estimation-game/december

Ten Fermi estimation questions to help you train your estimation skills. Play solo, or with a team - e.g. with friends, coworkers, or your EA group (see info for organisers).

It's also worth checking the archive for other estimation games you might be interested in, e.g. we've run games on AI, animal welfare + alt proteins, nuclear risk, and big picture history.

2
Jason
Tough confidence interval questions this round (at least to me).

I'm curious about B12 supplements - I currently take a multivitamin which has 50µg B12, my partner takes a multivitamin with 10µg B12. Should we be taking additional B12 tablets on top of this? (We're both vegan)

I saw in that post a recommendation for 100µg tablets, but google says the RDA is 2.4µg, do you know why there's this gap?

2
Abby Babby
Hahaha, thanks for posting!! :)

I think some subreddits do a good job of moderating to create a culture which is different from the default reddit culture, e.g. /r/askhistorians. See this post for an example, where there are a bunch of comments deleted, including one answer which didn't cite enough sources. Maybe this is what you have in mind when you refer to "moderating with an iron fist" though, which you mention might be destructive!

Seems like the challenge with reddit moderation is that users are travelling between subreddits all the time, and most have low quality/effort discussion... (read more)

We've added a new deck of questions to the calibration training app - The World, then and now.

What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.

Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!

We've also split the existing questions into decks, so you can focus on the topics you're most interested in:

Ah thank you! I've just pushed what should be a fix for this (hard to fully test as I'm in the UK).

1
Angelina Li
Thanks so much! :) FYI, the top-level helper text seems fixed, but the prediction-level helper text is still not locale-aware. (Again, not a big deal at all :) )

The July Estimation Game is now live: a 10 question Fermi estimation game all about big picture history! https://quantifiedintuitions.org/estimation-game/july

Question 1:

I was also wondering this - did 80k link to it in their newsletter (which has a big audience)?

Relatedly, I wonder if you can see differences in reported source by the place the survey respondent navigated to the survey from?

Thank you!

Do you look at non-anonymized user data in your analytics and tracking?

No - we don't look at non-anonymised user data in our analytics. We use Google Analytics events, so we can see e.g. a graph of how many forecasts are made each day, and this tracks the ID of each user so we can see e.g. how many users made forecasts each day (to disambiguate a small number of power-users from lots of light users). IDs are random strings of text that might look like cwudksndspdkwj. I think technically you'd call this "pseudo-anonymised" because user IDs are sto... (read more)

1
Angelina Li
By the way, very tiny bug report: The datestamps are rendering a bit weird? I see the correct date stamp for today under the date select, but the description text in italics is rendering as 'Yesterday', and the 'data-tip' value in the HTML is wrong. Obviously not a big deal, just passing it on :) I'm currently in PST time, where it is 9:39am on 2023.07.25, if it matters. (Let me know if you'd prefer to receive bug reports somewhere else?)
1
Angelina Li
Thanks for the fast response, all of this sounds very reasonable! :)

Thank you! I'm interested to hear how you find it!

often lacks the motivation to do so consistently

Very relatable! The 10 Conditions for Change framework might be helpful for thinking of ways to do it more consistently (if on reflection you really want to!) Fatebook aims to help with 1, 2, 4, 7, and 8, I think.

One way to do more prediction I'm interested in is integrating prediction into workflows. Here are some made-up examples:

  • At the start of a work project, you always forecast how long it'll take (I think this is almost always an important question, and
... (read more)

In many ways Fatebook is a successor to PredictionBook (now >11 years old!) If you've used PredictionBook in the past, you can import all your PredictionBook questions and scores to Fatebook.

In a perfect world, this would also integrate with Alfred on my mac so that it becomes extremely easy and quick to create a new private question


I'm thinking of creating a Chrome extension that will let you type /forecast Will x happen? anywhere on the internet, and it'll create and embed an interactive Fatebook question. EDIT: we created this, the Fatebook browser extension.

I'm thinking of primarily focussing on Google Docs, because I think the EA community could get a lot of mileage out of making and tracking predictions embedded in reports, strategy docs... (read more)

Great, thanks!

The format could be "[question text]? [resolve date]" where the question mark serves as the indicator for the end of the question text, and the resolve date part can interpret things like "1w", "1y", "eoy", "5d"

I'm interested in adding power user shortcuts like this! 

Currently, if your question text includes a date that Fatebook can recognise, it'll prepopulate the "Resolve by" field with that date. This works for a bunch of common phrases, e.g. "in two weeks" "by next month" "by Jan 2025" "by February" "by tomorrow".

If you play around w... (read more)
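A shorthand parser along the lines proposed above could look something like this (an illustrative sketch only, not Fatebook's actual date parser; it ignores edge cases like leap days):

```python
from datetime import date, timedelta

def parse_resolve_shorthand(token, today=None):
    """Parse shorthand like '1w', '5d', '1y', 'eoy' into a resolve-by date."""
    today = today or date.today()
    token = token.strip().lower()
    if token == "eoy":                      # end of year
        return date(today.year, 12, 31)
    n, unit = int(token[:-1]), token[-1]
    if unit == "d":
        return today + timedelta(days=n)
    if unit == "w":
        return today + timedelta(weeks=n)
    if unit == "y":                         # same day, n years on
        return date(today.year + n, today.month, today.day)
    raise ValueError(f"unrecognised shorthand: {token!r}")
```

The question mark in "[question text]? [resolve date]" would then just be a split point: everything before it is the question, and the remainder goes through a parser like this.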

The June Estimation Game is animal welfare + alt proteins themed! 10 Fermi estimation questions. You can play here: quantifiedintuitions.org/estimation-game/june

Seems like academic research groups would be a better reference class than YC companies for most alignment labs.

If they're trying to build an org that scales a lot, and is funded by selling products, YC companies are a good reference class, but if they're an org of researchers working somewhat independently or collaborating on hard technical problems, funded by grants, that sounds much more similar to an academic research group.

Unsure how to define success for an academic research group, any ideas? They seem to more often be exploratory and less goal-oriented.

Hmm, I'm not aware of a way to do this (but there might be one). Maybe you could generate two versions of the deck from your orgmode file, one with the Anki with Uncertainty card types and the other with plain card types?

4
Pablo
Unfortunately, the Emacs package that integrates org-mode with Anki is very poorly maintained and I'm no longer using it for that reason. Currently, my approach is to keep the normal deck but document how to use the add-on, both in the GitHub repository and in the EA Forum post announcing the release of the new version.

I'm excited to see the return of the careers guide as the core 80k resource (vs the key ideas series)! I think it's a better way to provide value to people, because a careers guide is about the individual ("how can I think about what to do with my career?") rather than about 80k ("what are the key ideas of 80k/EA?")

Nice! Thanks for the heads up Elliot - which page are you seeing a missing certificate on? Seems to be working for me

1
ElliotJDavies
Seems to be working for me too now

I've added a basic calibration curve, thanks for the suggestion! 

You can find it in the app's Home tab (click on Fatebook in the left sidebar > Home tab at the top) once at least one question you've forecasted on has resolved.
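For the curious, a calibration curve is typically computed by bucketing forecasts by stated probability and comparing each bucket's average forecast with the fraction that resolved YES (a generic sketch, not Fatebook's actual implementation):

```python
def calibration_curve(forecasts, n_bins=10):
    """forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Returns (avg forecast, hit rate) per non-empty probability bucket."""
    bins = [[] for _ in range(n_bins)]
    for p, outcome in forecasts:
        i = min(int(p * n_bins), n_bins - 1)  # clamp p=1.0 into last bucket
        bins[i].append((p, outcome))
    curve = []
    for bucket in bins:
        if bucket:
            avg_p = sum(p for p, _ in bucket) / len(bucket)
            hit_rate = sum(o for _, o in bucket) / len(bucket)
            curve.append((avg_p, hit_rate))
    return curve
```

A well-calibrated forecaster's points sit close to the diagonal: things they say are 90% likely happen about 90% of the time.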

Great, glad to hear it!

Aggregation choices (e.g. geo mean of odds would be nice)

Geo mean of odds is a good idea - it's probably a more sensible default. How would you feel about us using that everywhere, instead of the current arithmetic mean?
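For reference, the geometric mean of odds converts each probability to odds, averages in log-odds space, and converts back; unlike the arithmetic mean, it pulls the aggregate towards confident outlier forecasts (a generic sketch of the standard formula, not Fatebook's code):

```python
import math

def geo_mean_of_odds(probs):
    """Aggregate probabilities via the geometric mean of their odds."""
    odds = [p / (1 - p) for p in probs]               # convert to odds
    log_mean = sum(math.log(o) for o in odds) / len(odds)
    g = math.exp(log_mean)                            # geometric mean of odds
    return g / (1 + g)                                # back to a probability

def arith_mean(probs):
    return sum(probs) / len(probs)
```

For example, aggregating a 1% and a 50% forecast gives roughly 9% under geo mean of odds, versus 25.5% under the arithmetic mean.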

Brier scores for users

You can see your own absolute and relative Brier score in the app home (click Fatebook in the sidebar). If you're thinking of a team-wide leaderboard - that's on our list! Though some users said they wouldn't like this to avoid Goodharting, so I've not prioritised it so far, and will include a te... (read more)
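The absolute Brier score mentioned here is the standard mean squared error between stated probabilities and 0/1 outcomes (a generic sketch of the textbook formula, not Fatebook's implementation):

```python
def brier_score(forecasts):
    """forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Lower is better; always guessing 0.5 scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
```

A relative score can then be derived by comparing your score on shared questions against other forecasters' scores on those same questions.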

4
Matt_Lerner
I thought of some other down-the-line feature requests:

  • Google Sheets integration (we currently already store our forecasts in a Google sheet)
  • Relatedly, ability to export to CSV (does this already exist and I just missed it?)
  • Ability to designate a particular resolver
  • Different formal resolution mechanisms, like a poll of users.
5
Matt_Lerner
Ah, great! I think it would be nice to offer different aggregation options, though if you do offer one I agree that geo mean of odds is the best default. But I can imagine people wanting to use medians or averages, or even specifying their own aggregation functions. Especially if you are trying to encourage uptake by less technical organizations, it seems important to offer at least one option that is more legible to less numerate people.

I think you could implement a spaced repetition feature based on how many orders of magnitude you’re off, where the more OOMs you're off, the earlier it prompts you with the same question again

 

This is a great idea, so we made Anki with Uncertainty to do exactly this!

Thank you Hauke for the suggestion :D

I think we'll keep the calibration app as a pure calibration training game, where you see each question only once. Anki is already the king of spaced repetition, so adding calibration features to it seemed like a natural fit.
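One simple way the OOM-based scheduling idea could work (an illustrative sketch with made-up constants, not how Anki with Uncertainty actually schedules):

```python
import math

def next_interval_days(guess, answer, base_days=7):
    """Shrink the review interval the further off the estimate was,
    halving it per order of magnitude of error, with a 1-day floor."""
    ooms_off = abs(math.log10(guess) - math.log10(answer))
    return max(1, round(base_days / (2 ** ooms_off)))
```

So an exact estimate comes back in a week, an estimate off by one OOM in about half that, and a badly wrong one the next day.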

Super interesting to see this analysis, especially the table of current capabilities - thank you!

I have interpreted [feasible] as, one year after the forecasted date, have AI labs achieved these milestones, and disclosed this publicly?

It seems to me that this ends up being more conservative than the original "Ignore the question of whether they would choose to", which presumably makes the expert forecasts worse than they seem to be here.

For example, a task like "win angry birds" seems pretty achievable to me, just that no one... (read more)

5
PatrickL
Thanks Adam :) I have a rough (i.e. considered for <15 minutes) take: if top labs one year ago had attempted these particular milestones, and had the same policies on disclosing capabilities as they currently seem to, then there's a 40-50% chance they would have achieved 2 of Angry Birds, Atari fifty, Laundry and Go low by now. But I don't put much weight on my prediction, whereas I put a lot more weight on my analysis of what has happened (though this is also somewhat subjective!).

I agree though that checking what has actually happened ends up being more conservative than the original "Ignore the question of whether they would choose to", which makes the expert forecasts worse than they seem to be here. This is a weakness of this analysis! And of the resolvability of the original survey. Do you have an estimate of how many of the tasks would have been achieved by now if labs tried a year ago?

Thanks for the feedback Forslack! I'm curious whether you'd prefer to play without logging in because you don't have a Google account or because you don't want to share your email?

2
Jason
Not Forslack, but if you're going to ask for permission for Google to share all that info you should have a clear privacy policy visible for what you'll do with it. Also, I don't think you have to request all that info from Google, like real name, to use a Google login.

Thanks very much for the feedback, this is really helpful!

If anyone has question suggestions, I'd really appreciate them! I think crowdsourcing questions will help us make them super varied and globally relevant. I made a suggestion form here https://forms.gle/792QQAfqTrutAH9e6

Thanks for organising! I had a great time, I'd love to see more of these events. Maybe you could circulate a Google Doc beforehand to help people brainstorm ideas, comment on each other's ideas, and indicate interest in working on ideas. You could prepopulate it with ideas you've generated as the organisers. That way when people show up they can get started faster - I think we spent the first hour or so choosing our idea.

(Btw - our BOTEC calculator's first page is at this URL.)

2
Jonny Spicer 🔸
I think this is a great idea, thanks for the feedback - I completely agree we want people to be able to hit the ground running on the day. I would imagine groups are most effective when they're formed around strong coders; perhaps there's a way we can work that into the doc.

One thing we're considering is an ongoing Discord server, where people could see ideas/projects/who's working on what, etc. The idea would be that the server would persist between events, and move more towards having ongoing projects as above. I think this could potentially solve some of the cold start issues, but I am also hesitant to ask people to join yet another Discord server, and it'd probably need to reach a critical mass of people in order to be valuable. Having written out this comment, I think we will likely start it and push to get it to a good size, and if not we can re-evaluate.

Thanks for pointing out the bad link, I've corrected it now!
1
Stenemo
Yes, I like their work! It is great that there are many complementing ways to learn these important topics. Although I have not yet found a good comprehensive playlist for those who want to learn by watching a summary of important concepts.

Interesting to think about! 

But for this kind of bargain to work, wouldn't you need confidence that the you in other worlds would uphold their end of the bargain? 

E.g., if it looks like I'm in videogame-world, it's probably pretty easy to spend lots of time playing videogames. But can I be confident that my counterpart in altruism-world will actually allocate enough of their time towards altruism?

(Note I don't know anything about Nash bargains and only read the non-maths parts of this post, so let me know if this is a basic misunderstanding!)

4
Eric Neyman
Great question -- you absolutely need to take that into account! You can only bargain with people who you expect to uphold the bargain. This probably means that when you're bargaining, you should weight "you in other worlds" in proportion to how likely they are to uphold the bargain. This seems really hard to think about and probably ties in with a bunch of complicated questions around decision theory.

This is a really useful round-up, thank you!

A data-point on this - today I was looking for and couldn't find this graph. I found effectivealtruismdata.com but sadly it didn't have these graphs on it. So would be cool to have it on there, or at least link to this post from there!

Thanks Jack, great to see this!

Pulling out the relevant part as a quote for other readers:

  • On average, it took about 25 hours to organize and run a campaign (20 hours by organizers and 5 hours by HIP).
  • The events generated an average of 786 USD per hour of counterfactual donations to effective charities.
  • This makes fundraising campaigns a very cost effective means of counterfactual impact; as a comparison, direct work that generates 1,000,000 USD of impact equivalent per year equates to around 500 USD per hour.

Great results so far!

High Impact Professionals supported 8 EAs to run fundraising drives at their workplace in 2021, raising $240k in counterfactual dollars. On an hourly basis, organizing those events proved to be as impactful as direct work

Could you share the numbers you used to calculate this? I.e. how many hours to organise an event, counterfactual dollars per hour organising/running events, and your estimate for the value per hour of direct work?

7
Jack Lewars
Hi Adam - sure - https://bit.ly/3BiJRP3 We'll also link to this in the OP.

it'd be really valuable for more EA-aligned people to goddamn write summaries at all

To get more people to write summaries for long forum posts, we could try adding it to the forum new post submission form? e.g. if the post text is over x words, a small message shows up advising you to add a summary.

Or maybe you're thinking more of other formats, like Google docs?

3
MichaelA🔸
Yeah, I've actually discussed that idea briefly with the EA Forum team and I think it'd probably be good. I'll send a link to this thread to them to give them one more data point in favor of doing this. (Though it's plausible to me that there's some reason they shouldn't do this which I'm overlooking - I'd trust their bottom-line views here more than mine.) But yeah, I'm also thinking of GDocs, blog posts posted elsewhere, and any other format, so I think we also need nudges like this post. 

Great to see this writeup, thank you!

In the run-up to EAG SF I've been thinking a bit about travel funding allocation. I thought I could take this opportunity to share two problems and tentative solutions, as I imagine they hold across different conferences (including EAGx Boston).

Thing 1: Uncertainty around how much to apply for

In conversations with other people attending I've found that people are often quite uncertain and nervous when working out how much to apply for. 

One way to improve this could be to encourage applicants to follow a simple proce... (read more)

Thanks Ankush! For this first round, we keep things intentionally short, but if your project progresses to later rounds then there will be plenty of opportunities to share more details.

it is a pdf that I would love to get valued and be shared with the world and anyone who wants to hear about longtermism project

Posting your ideas here on the EA Forum could be a great way to get feedback from other people interested in longtermism!

Thanks Stuart, I'll DM you to work out the details here!

Maybe something helpful to think about is, what's your goal?

E.g. maybe:

  • You want to stay on top of new papers in AI capabilities
  • You want to feel connected to the AI safety research community
  • You want to build a network of people in AI research / AI safety research, so that in future you could ask people for advice about a career decision
  • You want to feel more motivated for your own self study in machine learning
  • You want to workshop your own ideas around AI, and get rapid feedback from researchers and thinkers

I think for some goals, Twitter is unusually helpfu... (read more)
