This seems interesting. Are there ways you think these ideas could be incorporated into LLM training pipelines, or experiments we could run to test the advantages and potential limits vs. RLHF/conventional alignment strategies? Also, do you think using developmental constraints followed by techniques like RLHF could be more effective than either alone?
I'd like to see more rigorous engagement with big questions like where value comes from, what makes a good future, how we know, and how this affects cause prioritization. I think it's generally assumed that "consciousness is where value comes from, so maximize it in some way." Yet some of the people who have studied consciousness most closely from a phenomenological perspective (e.g. Zen masters, Tibetan lamas, and other contemplatives) don't seem to think that, let alone want to scale it to cosmic levels. Why? Is third-person philosophical analysis alone missing someth...
For 2, what's "easiest to build and maintain" is determined by human efforts to build new technologies, cultural norms, and forms of governance.
For 11, there isn't necessarily a clear consensus on what "exceptional" means or how to measure it, and ideas about what it is are often not reliably predictive. Furthermore, organizations are extremely risk-averse in hiring, and there are understandable reasons for this: they're thinking about how best to fill a specific role with someone on whom they will take a costly bet. But this is rather different than thinkin...
Yes, what you are scaling matters just as much as the fact that you are scaling. So now developers are scaling RL post-training and pretraining with higher-quality synthetic data pipelines. If the point is just that training on average internet text provides diminishing returns in many real-world use cases, then that seems defensible; that certainly doesn't seem to be the main recipe any company is using to push the frontier right now. But it seems like people often mistake this for something stronger, like "all training is now facing insurm...
Shouldn't we be able to point to some objective benchmark if GPT-4.5 was really off trend? It got 10x the SWE-Bench score of GPT-4. That seems like solid evidence that additional pretraining continued to produce the same magnitude of improvements as previous scaleups. If there were now even more efficient ways than that to improve capabilities, like RL post-training on smaller o-series models, why would you expect OpenAI not to focus their efforts there instead? RL was producing gains and hadn't been scaled as much as self-supervised pretraining, so it was...
It's very difficult to do this with benchmarks, because as the models improve, benchmarks come and go. Things that used to be so hard that models couldn't do better than chance quickly become saturated, and we look for the next thing, then the one after that, and so on. For me, the fact that GPT-4 -> GPT-4.5 seemed to involve climbing about half of one benchmark was slower progress than I expected (and the leaks from OpenAI suggest they had similar views to me). When GPT-3.5 was replaced by GPT-4, people were losing their minds about it — both internally and o...
I think this is an interesting vision for reinvigorating things, and I do sometimes feel that "principles first" has been conflated with just "classic EA causes."
To me, "PR speak" =/= clear effective communication. I think the lack of a clear, coherent message is most of what bothers people, especially during and after a crisis. Without that, it's hard to talk to different people and meet them where they're at. It's not clear to me what the takeaways were or if anyone learned anything.
I feel like "figuring out how to choose leaders and build institution...
OK, thanks for the details. Off the top of my head I can think of multiple people interested in AI safety who probably fit these (though I think the descriptions could still be more concretely operationalized) and fall into categories such as: founders/cofounders, several years' experience in operations analytics and management, several years' experience in consulting, and multiple years' experience in events and community building/management. Some want to stay in Europe, some have families, but overall I don't recall them being super constrained.
I've actually been working on how to get better AI safety plans, so I would be keen to chat with anyone who is interested in this. I think the best plan so far (covering the alignment side) is probably Google's [2504.01849] An Approach to Technical AGI Safety and Security. On the governance side, one of the most detailed is probably AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions.
I've been thinking about coup risks more lately so would actually be pretty keen to collaborate or give feedback on any early stuff. There isn't much work on this (for example, none at RAND as far as I can tell).
I think EAs have frequently suffered from a lack of expertise, which causes pain in areas like politics. Almost every EA and AI safety person was way off on the magnitude of change a Trump win would create - gutting USAID easily dwarfs all of EA global health by orders of magnitude. Basically no one took this seriously as a possibility, or at...
"Basically no one took this seriously as a possibility, or at least I do not know of anyone."
I alluded to this over a year ago in this comment, which might count in your book as taking it seriously. But to be honest, where we are on Day 100 of this administration is not the territory I expected us to be in until at least the second year.
I think these people do exist (those who appreciated the risks a second term presented), and I'll count myself as one of them. I think we are just less visible because we push this concern a lot less in the EA disc...
This makes me wonder if there could be good setups for evaluating AI systems as groups. You could have separate agent swarms in different sandboxes competing on metrics of safety and performance. The one that does better gets amplified. The agents may then have some incentive to enforce positive social norms for their group against things like sandbagging or deception. When deployed they might have not only individual IDs but group or clan IDs that tie them to each other and continue this dynamic.
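As a rough illustration of the kind of setup I mean (everything here is a made-up placeholder, not an existing eval framework), a group-level scoring loop might look something like this:

```python
from dataclasses import dataclass

# Hypothetical sketch: "Swarm", the metrics, and the weights are placeholders.

@dataclass
class Swarm:
    clan_id: str
    members: list[str]      # agent IDs sharing the clan tag
    performance: float      # task success rate in its sandbox, 0..1
    safety: float           # e.g. 1 - rate of flagged deception/sandbagging, 0..1

def group_score(swarm: Swarm, safety_weight: float = 0.6) -> float:
    # Score the clan as a whole, so each agent has a stake in the others' behavior.
    return safety_weight * swarm.safety + (1 - safety_weight) * swarm.performance

def amplify(swarms: list[Swarm]) -> Swarm:
    # The better-scoring clan is the one that gets copied/deployed more widely.
    return max(swarms, key=group_score)

swarms = [
    Swarm("clan-a", ["a1", "a2", "a3"], performance=0.82, safety=0.70),
    Swarm("clan-b", ["b1", "b2", "b3"], performance=0.74, safety=0.93),
]
print("amplified:", amplify(swarms).clan_id)  # clan-b wins under this weighting
```

The clan ID deployed with each agent would then carry that shared incentive forward.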
Maybe there is some mechanism where membership gets sh...
That's a really broad question, though. If you asked something like "which system unlocked the most real-world value in coding?", people would probably point to the jump to a more recent model like o3-mini or Gemini 2.5.
You could similarly argue that the jump from infant to toddler is much more profound in terms of general capabilities than the jump from college student to PhD, but the latter is more relevant for unlocking new research tasks that can be done.
So it seems like you're saying there are at least two conditions: 1) someone with enough resources would have to want to release a frontier model with open weights, maybe Meta or a very large coalition of the open-source community if distributed training continues to scale, and 2) it would need at least enough dangerous-capability mitigations (like unlearning and tamper-resistant weights, or cloud inference monitoring), or be far enough behind the frontier that governments don't try to stop it. Does that seem right? What do you think is the likely price range for AGI?
This is a thoughtful post, so it's unfortunate it hasn't gotten much engagement here. Do you have cruxes around the extent to which centralization is favorable or feasible? It seems like small models that could be run on a phone or laptop (~50GB) are becoming quite capable, and decentralized training runs work for 10-billion-parameter models, which are close to that size range. I don't know its exact size, but Gemini Flash 2.0 seems much better than I would have expected a model of that size to be in 2024.
Interesting. People probably aren't at peak productivity, or even working at all, for some part of those hours, so you could probably cut the hours by about a quarter. This narrows the gap between what GPT2030 can achieve in a day and what all humans can achieve together.
Assuming 9 billion people work 8 hours each, that's ~8.22 million years of work in a day. But given slowdowns in productivity throughout the day, we might want to round that down to ~6 million years.
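Quick back-of-the-envelope check of those numbers (assuming a year of 24 × 365 = 8,760 hours; the 0.75 factor is just the "cut by about a quarter" productivity adjustment above):

```python
workers = 9_000_000_000
hours_per_worker_per_day = 8
hours_per_year = 24 * 365  # 8,760 calendar hours

total_hours_per_day = workers * hours_per_worker_per_day   # 7.2e10 person-hours
work_years_per_day = total_hours_per_day / hours_per_year  # ~8.22 million years

productivity_factor = 0.75  # assume roughly a quarter of hours aren't productive
adjusted_years = work_years_per_day * productivity_factor  # ~6.16 million years

print(f"~{work_years_per_day/1e6:.2f}M years/day, ~{adjusted_years/1e6:.2f}M adjusted")
```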
Additionally, GPT2030 might be more effective than even the best human workers at their peak hours. If it's ...
Interesting post - I particularly appreciated the part about the impact of Szilard's silence not really affecting Germany's technological development. This was recently mentioned in Leopold Aschenbrenner's manifesto as an analogy for why secrecy is important, but I guess it wasn't that simple. I wonder how many other analogies are in there and elsewhere that don't quite hold. Could be a useful analysis if anyone has the background or is interested.
I think it's good to critically interrogate this kind of analysis. I don't want to discourage that. But as someone who publicly expressed skepticism about Flynn's chances, I think there are several differences that mean it warrants closer consideration. The polls are much closer for this race, Biden is well known and experienced at winning campaigns, and the differences between the candidates in this race seem much larger. Based on that it at least seems a lot more reasonable to think Biden could win and that it will be a close race worth spending some effort on.
Very interesting.
1. Did you notice an effect of how large/ambitious the ballot initiative was? I remember previous research suggesting consecutive piecemeal initiatives were more successful at creating larger change than singular large ballot initiatives.
2. Do you know how much the results vary by state?
3. How different do ballot initiatives need to be for the huge first-advocacy effect to take place? Does this work as long as the policies are not identical, is it more of a cause-specific function, or is it something in between? Does it have a smooth gradient, or is it discontinuous after some tipping point?
That's a good point, although: 1) if people leave a company to go to one that prioritizes AI safety, then there are fewer workers at all the other companies who feel as strongly, so a union is less likely to improve safety there; 2) it's common for workers to take action to improve their own safety conditions, and much less common for them to take action on issues that don't directly affect their work, such as air pollution or carbon pollution; and 3) if safety-inclined people become tagged as wanting to generally slow the company down, then hiring teams will likely start filtering out many of the most safety-minded people.
I've thought about this before and talked to a couple of people in labs about it. I'm pretty uncertain whether it would actually be positive. It seems possible that most ML researchers and engineers want AI development to go as fast as, or faster than, leadership does, whether because they're excited about working on cutting-edge technologies, about changing the world, or for equity reasons. I remember some articles about people leaving Google for companies like OpenAI because they thought Google was too slow and cautious and had lost its "move fast and break things" ethos.
Really appreciate this post. Recently I've felt less certain about whether slowing down AI is feasible or helpful in the near future.
I think how productive current alignment and related research is right now is a key crux for me. If it's actually quite valuable, maybe having more time would seem better.
It does seem easier to centralize now, while there are fewer labs and less entrenched ways of doing things, though it's possible that exponentially rising costs could lead to centralization through market dynamics anyway. Then again, maybe that would be short-lived, and some later breakthrough would change the cost of training dramatically.
Actually, they are the same type of error. EA prides itself on using evidence and reason rather than taking the assessments of others at face value. So the idea that others failed to rely on experts who could have obtained better evidence and reasoning to vet FTX is less compelling to me as an after-the-fact justification for EA as a whole not doing so either. I think probably no one really thought much about the possibility, and looking for this kind of social proof helps us feel less bad.
Yeah, I do sometimes wonder if perhaps there's a reason we find it difficult to resolve this kind of inquiry.
Yes, I think they're generally pretty wary of saying much exactly since it's sort of beyond conceptual comprehension. Something probably beyond our ideas of existence and nonexistence.
Glad to hear that! You're welcome :)
On the Flynn campaign: I don't know if it was "a catastrophe," but I think it is maybe an example of overconfidence and naivete. As someone who has worked on campaigns and follows politics, I thought the campaign had a pretty low chance of success because of the fundamentals (and asked about it at the time), and that other races would have been better to donate to (either state house races to build the bench, or congressional candidates with better odds like Maxwell Frost, a local activist who ran for the open seat previously held by Val Demings, listed pandemic pr...
I think the main obstacle is tractability: there doesn't seem to be any known methodology that could be applied to resolve this question in a definitive way, and it's not clear how we could even attempt to find such a method. Whereas projects related to areas such as preventing pandemics and making sure AI isn't misused or poorly designed seem 1) incredibly important and 2) tractable: it looks like we're making some progress and have found, and can continue to find, directions for further progress (better PPE, pathogen screening, new vaccines, interpretability, agent founda...
I've definitely seen well-meaning people mess up interactions without realizing it in my area (non-EA related). This seems like a really important point and your experience seems very relevant given all the recent talk about boards and governance. Would love to hear more of your thoughts either here or privately.
Seems interesting, I'll def check it out sometime
Jokes aside, this is a cool idea. I wonder if reading it yourself, varying the footage, or even adapting the concepts into something new would help make it more attractive to watch, though of course these would all increase the time investment. I can't say it's my jam, but I'd be curious to see how these do on TikTok, since they seem to be a fairly prevalent genre/content style there.
Seems important to check whether the people actually hired fit those experience requirements or have more experience. If the roles are very competitive, then the experience of actual hires could be much higher.