[Note that I have also cross-posted a version of this post to LessWrong. This version of the post is for both half-baked EA ideas and half-baked AI Safety ideas, whereas the LessWrong version is for half-baked AI Safety ideas specifically.]

I keep having ideas for projects I would like to see being done, but I keep not having enough time available to really think through those ideas, let alone try to implement them. Practically, the alternatives for me are to either post something half-baked, or to not post at all. I don't want to spam the group with half-thought-through posts, but I also want to post these ideas, even in their current state, in case some of them do have merit and the post inspires someone to take up those ideas.

Originally I was going to start writing up some of these ideas in my Shortform, but I figured that if I have this dilemma then likely other people do as well. So to encourage others to at least post their half-baked ideas somewhere, I am putting up this post as a place where other people can post their own ideas without worrying about making sure they formulate those ideas to the point where they'd merit their own post.

If you have several ideas, please post them in separate comments so that people can consider each of them individually. Unless of course they're closely related to each other, in which case it might be best to post them together - use your best judgment.

[This post was also inspired by a LessWrong suggestion from Zvi to create something similar to my AGI Safety FAQ / all-dumb-questions-allowed thread, but for ideas / potentially dumb solutions rather than questions. Which is why I am bundling half-baked AI Safety ideas together with the half-baked EA ideas. If people prefer and I ever do a follow-up to this post, I can split those into two different posts. Or I can keep the AI Safety part of the discussion on LessWrong and the EA ideas part here.]





In direct violation of the instruction to put ideas in distinct comments, here's a list of ideas most of which are so underbaked they're basically raw: 


  • Buy a hotel/condo/apartment building (maybe the Travelodge?) in Berkeley and turn it into an EA Hotel
  • Offer to buy EAs no-doctor-required blood tests like this one that test for common productivity-hampering issues (e.g. B12 deficiency, anemia, hypothyroidism)
  • Figure out how to put to good use some greater proportion of the approximately 1 Billion recent college grads who want to work at an "EA org" 
    • This might look like a collective of independent-ish researchers? 
  • (uncertain) I think a super high impact thing that a single high school senior could decide to do is to attend a good state honors college or non-Ivy university without a thriving EA club/scene and make it happen there. 
    • I tentatively think it would be worth, say, turning down Harvard to go to the U of Maryland Honors college and start an EA group there
      • To incentivize this, make it cool and normal to credibly say that this is what one did
    • To a first approximation, 0% of students outside of the Ivies and maybe 10 other campuses have heard of EA. The fruit is underground at this point

Animal welfare

  • (Ok this one is like 60% baked) A moderately-intensive BOTEC says that wild fish slaughter accounts for 250 million years of extreme fish suffering per year from asphyxiation or being gutted alive
    • The linked BOTEC also includes my conservative, lower bound estimate that this is morally equivalent to ~5B human deaths per year
    • Idea: idk, like do something about this. Like figure out how to make it cheap and easy to kill fish en masse more quickly and/or more humanely
    • This one might turn into an actual forum post
  • Related: can we just raise (farm) a ton of fish ourselves, but using humane practices, with donations subsidizing the cost difference relative to standard aquaculture?
    • This also might turn into a blog post 

Hot takes

  • There should be way more all-things-considered, direct comparisons between cause areas. 
    • In particular, I don't think a complete case has been made (even from a total utilitarian, longtermist perspective) that at the current funding margin, it makes sense to spend marginal dollars on longtermist-motivated projects instead of animal welfare projects.

Note (as of 6pm June 24) I may update this comment and/or break parts into their own comments as I recall other ideas I've had

I'm helping to create a central platform connecting funders, talent, ideas, and advisors. Let me know if you'd like to be involved or are interested in more info on it!

Would you mind posting a link to it?

From a Twitter thread a few days ago (lightly edited/formatted), with plenty of criticism in the replies there

Probably batshit crazy but also maybe not-terrible megaproject idea: build a nuclear reactor solely/mainly to supply safety-friendly ML orgs with unlimited-ish free electricity to train models 

Looks like GPT-3 took something like $5M to train, and this recent 80k episode really drives home that energy cost is a big limiting factor for labs and a reason why only OpenAI/DeepMind are on the cutting edge

In 2017, the smallest active U.S. nuclear reactor generated ~5M MWh. It looks like that amount of energy would sell for $600M

But it only costs about $150M to generate that amount via a nuclear plant. And larger power plants would give a delta of a couple billion per year between cost and price
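To make the arithmetic behind those figures explicit, here is a rough sketch using only the numbers quoted in the thread (~5M MWh, ~$600M market value, ~$150M generation cost). The function name `energy_margin` is just for illustration, and the inputs are the thread's ballpark estimates, not verified data.

```python
def energy_margin(energy_mwh, market_value_usd, generation_cost_usd):
    """Per-MWh price and cost, plus the annual gap between them.

    All inputs are the rough figures from the thread, not verified data.
    """
    return {
        "price_per_mwh": market_value_usd / energy_mwh,
        "cost_per_mwh": generation_cost_usd / energy_mwh,
        "annual_delta_usd": market_value_usd - generation_cost_usd,
    }


# Smallest active U.S. reactor in 2017, per the thread's estimates.
figures = energy_margin(5_000_000, 600_000_000, 150_000_000)
print(figures)
```

On these numbers the implied market price is about $120/MWh against a generation cost of about $30/MWh, so the gap for the smallest reactor would be roughly $450M per year.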

The theory of change here is that it would allow Redwood/Anthropic/other future orgs to have a decisive advantage over less safety-conscious ones. This is especially relevant if making an aligned AGI requires a lot more compute (and hence electricity) than making a Clippy

Also, I don't see why it couldn't be built in some country with more nuclear-friendly policies...and it wouldn't even have to be connected to the grid. I'm thinking of a plant connected to a Boeing plant-sized warehouse full of GPUs and nothing else (except maybe an EA hotel lol)

I legit want to know where on the spectrum from "Hi! 👋 nuclear physicist here. This is incredibly stupid, please delete your account" to "huh this warrants a longer+more rigorous EA Forum post" this seems to fall. Thanks!

I've often thought that paying automated, narrow-AI systems such as warehouse bots or factory robots a wage, even though they're not sentient or anything, would help with many of the issues ahead of us with increased general automation. As employment goes down (less tax money) and unemployment (voluntary or otherwise), and therefore social welfare spending, goes up, it creates a considerable strain. Paying automated systems a 'wage' which can then be taxed might help alleviate that. It wouldn't be a wage, obviously, more like an ongoing fee for using such systems, to be paid towards the cost of caring for humans. Bonus if that money actually goes into a big pot which helps reimburse people who suffer harm from automated systems. Might be a good stop-gap until our economy adjusts correctly, as tax revenue wouldn't dip as far.

Obviously this is MASSIVE spitball territory, not an idea I've thought about seriously because I literally don't have the time, but it could be an interesting idea. The first step would be to check whether automation is actually resulting in employment going down, because I'm not sure there's evidence of that yet.

Economists have thought a bit about automation taxes (which is essentially what you're suggesting). See, e.g., this paper.

Awesome, thanks for the link! :)

This was a top-level LW post from a few days ago aptly titled "Half-baked alignment idea: training to generalize" (that didn't get a ton of attention):

Thanks to Peter Barnett and Justis Mills for feedback on a draft of this post. It was inspired by Eliezer's Lethalities post and Zvi's response.

Central idea: can we train AI to generalize out of distribution?

I'm thinking, for example, of an algorithm like the following:

  1. Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level writing (this being one instance of the object level)
    1. Assign the system a meta-level reward based on how well it performs (without any additional training) at generalizing; in this case, that is, predicting the next word from more advanced, complex writing (perhaps using many independent tests of this task without updating/learning between each test, and allowing parameters to update only after the meta-level aggregate score is provided)
      • Note: the easy→hard generalization is not a necessary feature. Generalization could be from fiction→nonfiction writing or internet→native print text, for instance.
    2. After all these independent samples are taken, provide the AI its aggregate or average score as feedback
  2. (Maybe?) repeat all of step 1 on a whole new set of training and testing texts (e.g., using text from a different natural language like Mandarin)
    1. Repeat this step an arbitrary number of times
      • For example, using French text, then Korean, then Arabic, etc.
  3. Each time a “how well did you generalize” score is provided (which is given once per natural language in this example), the system should improve at the general task of generalizing from simple human writing to more complex human writing, (hopefully) to the point of being able to perform well at generalizing from simple Hindi (or whatever) text to advanced Hindi prediction even if it had never seen advanced Hindi text before.
  4. ^Steps 1-3 constitute the second meta-level of training an AI to generalize, but we can easily treat this process as a single training instance (e.g., rating how well the AI generalizes to Hindi advanced text after having been trained on doing this in 30 other languages) and iterate over and over again. I think this would look like:
    1. Running the analogs of steps 1-4 on generalizing from
      • (a) simple text to advanced text in many languages
      • (b) easy opponents to hard ones across many games,
      • (c) photo generation of common or general objects ("car") to rare/complex/specific ones ("interior of a 2006 Honda Accord VP"), across many classes of object
    2. And (hopefully) the system would eventually be able to generalize from simple Python code training data to advanced coding tasks even though it had never seen any coding at all before this.

And, of course, we can keep on piling layers on.
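For anyone who wants something concrete to poke at, here is a very rough toy sketch of the first meta-level of the loop above. Everything in it is an assumption made for illustration, not the post's actual proposal: the "GPT-like system" is swapped for a one-variable linear fit, each "language" is a hypothetical noisy line with its own slope, "simple" vs. "advanced" text becomes inputs in [0, 1] vs. [2, 3], the only meta-level parameter is a penalty strength on the intercept, and the RL-style update is replaced by a plain finite-difference gradient step. What it does preserve is the structure: fit on simple data, score on several advanced tests with no updates in between, then update the meta-level parameter once per language from the aggregate score.

```python
import math
import random


def make_language(rng):
    """Hypothetical 'language': a noisy line y = a*x with its own slope a."""
    a = rng.uniform(0.5, 2.0)

    def sample(lo, hi, n):
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        ys = [a * x + rng.gauss(0.0, 0.3) for x in xs]
        return xs, ys

    return sample


def fit_object_level(xs, ys, lam):
    """Object level: fit y = w*x + b by minimizing squared error + lam * b**2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * (n + lam) - sx * sx  # closed-form 2x2 normal equations
    w = (sxy * (n + lam) - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b


def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def meta_score(log_lam, simple, tests):
    """Fit on 'simple' data only, then average the loss over 'advanced' tests.

    The fitted (w, b) stay frozen across all tests: no learning in between.
    """
    w, b = fit_object_level(simple[0], simple[1], math.exp(log_lam))
    return sum(mse(w, b, xs, ys) for xs, ys in tests) / len(tests)


def meta_train(n_languages=30, meta_lr=0.5, eps=1e-3, seed=0):
    rng = random.Random(seed)
    log_lam = 0.0  # the single meta-level parameter in this toy
    history = []
    for _ in range(n_languages):
        sample = make_language(rng)
        simple = sample(0.0, 1.0, 20)                     # "grade-school" regime
        tests = [sample(2.0, 3.0, 20) for _ in range(5)]  # "advanced" regime
        # One meta-level update per language, from the aggregate score only
        # (finite-difference gradient, standing in for an RL-style update).
        up = meta_score(log_lam + eps, simple, tests)
        down = meta_score(log_lam - eps, simple, tests)
        grad = (up - down) / (2 * eps)
        log_lam = max(-10.0, min(10.0, log_lam - meta_lr * grad))
        history.append(math.exp(log_lam))
    return history


if __name__ == "__main__":
    print(meta_train()[-1])  # penalty strength after the last "language"
```

A second meta-level would wrap `meta_train` itself in another loop over task families (text, games, images) with its own separately updated parameters. Whether anything like this scales beyond a toy is exactly the open question.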

A few notes

  • I think the following is one way of phrasing what I hope might happen with this method: we are using RL to teach an ML system how to do ML in such a way that it sacrifices some in-distribution predictive power for the ability to use its “knowledge” more generally without doing anything that seems dumb to us. 
  • Of course, there are intrinsic limits to any system’s ability to generalize. The system in question can only generalize using knowledge X if X exists as information in the object-level training provided to it.
    • This limits what we should expect of the system.
      • For example, I am almost certain that even an arbitrarily smart system will not be able to generate coherent Mandarin text from English training data, because the meaning of Mandarin characters doesn’t exist as “latent knowledge” in even a perfect understanding of English.

Anyone here know Python?

My hands-on experience with ML extends to linear regression in R and not an inch more, so I'm probably not the best person to test this theory out. I've heard some LWers know a bit of Python, though.

If that's you, I'd be fascinated and thankful to see if you can implement this idea using whatever data and structure you think would work best, and would be happy to collaborate in whatever capacity I can. 


Appendix: a few brief comments (from someone with much more domain knowledge than me) and responses (from me):


Is this just the same as training it on this more complex task (but only doing one big update at the end, rather than doing lots of small updates)?

Response (which may help to clarify why I believe the idea might work)

I don't think so, because the parameters don't change/update/improve between each of those independent tests. Like GPT-3 in some sense has a "memory" of reading Romeo and Juliet, but that's only because its parameters updated as a result of seeing the text.

But also I think my conception depends on the system having "layers" of parameters corresponding to each layer of training. 

So train on simple English-->only "Simple English word generation" parameters are allowed to change...but then you tell it how well it did at generalizing out of distribution, and now only its "meta level 1 generalization" parameters are allowed to change.

Then you do the whole thing again but with German text, and its "Meta level 1 generalization" parameters are allowed to change again using SGD or whatever. If this works, it will be the reason why it can do well at advanced Hindi text without ever having read advanced Hindi.

Treat this whole process as the object level, and then it updates/improves "meta level 2 generalization" parameters.


This looks vaguely like curriculum learning, which apparently doesn't really work in LLMs (https://arxiv.org/abs/2108.02170). I think a similar experiment would be to train on simple+advanced text for English, French, Mandarin etc., but only simple Hindi, and then see if it can do complex Hindi.


I think that's a pretty different thing because there are no meta level parameters. Seems like fundamentally just a flavor of normal RL

Or do pretraining with English, French, Mandarin, and Hindi, but only do fine tuning with English, French, Mandarin, and see if it can then do the tasks it was fine tuned for in Hindi. 

My prediction: it learns to generalize a bit (the scores on the novel Hindi tasks are higher than if there was no fine tuning with the other languages) but worse than the other languages generalize. As the models are scaled up, this 'generalization gap' gets smaller.
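One way to pin that prediction down is to name the two quantities it compares. The sketch below is only an illustration: `transfer_metrics` is a hypothetical helper, and the scores are made-up placeholders rather than results from any experiment.

```python
from statistics import mean


def transfer_metrics(finetuned_scores, held_out_score, held_out_baseline):
    """Quantify the prediction for one held-out language (e.g. Hindi).

    finetuned_scores:  task scores in the languages used for fine-tuning
    held_out_score:    task score in the held-out language after fine-tuning
    held_out_baseline: task score in the held-out language with no fine-tuning
    """
    transfer_gain = held_out_score - held_out_baseline
    generalization_gap = mean(finetuned_scores.values()) - held_out_score
    return transfer_gain, generalization_gap


# Made-up placeholder scores, purely to show the shape of the comparison.
gain, gap = transfer_metrics(
    {"English": 0.80, "French": 0.75, "Mandarin": 0.70},
    held_out_score=0.60,
    held_out_baseline=0.40,
)
print(gain, gap)
```

The prediction then reads: `transfer_gain` is positive, `generalization_gap` is positive, and the gap shrinks as the model is scaled up.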

Seems like this might depend on the relative scaling of different meta level parameters (which I described above)? 

Like for example whenever you scale the # of object level params by a factor of 2, you have to scale the number of nth meta level parameters by 2^(n+1).
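As a quick illustration of what that rule would imply, here is a sketch that applies it repeatedly. The function name and the starting counts are hypothetical; the only thing taken from the comment is the rule itself (each doubling of object-level parameters multiplies the nth meta-level's count by 2^(n+1)), which is floated there as a guess rather than an established scaling law.

```python
def scaled_param_counts(base_counts, doublings):
    """Apply the conjectured rule `doublings` times.

    base_counts[0] is the object-level parameter count and base_counts[n]
    is the nth meta level. Each doubling multiplies level n by 2**(n + 1),
    so the object level (n = 0) is multiplied by 2, as expected.
    """
    return [
        count * (2 ** (n + 1)) ** doublings
        for n, count in enumerate(base_counts)
    ]


# Hypothetical starting sizes: 100 object-level, 10 meta-1, 1 meta-2 parameters.
print(scaled_param_counts([100, 10, 1], 1))  # [200, 40, 8]
print(scaled_param_counts([100, 10, 1], 2))  # [400, 160, 64]
```

Under this rule the higher meta levels quickly come to dominate the parameter budget, which is one reason the relative scaling might matter.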
