All of Peter's Comments + Replies

Seems important to check whether the people hired actually fit those experience requirements or have more experience. If the roles are very competitive, the actual experience of hires could be much higher. 

This seems interesting. Are there ways you think these ideas could be incorporated into LLM training pipelines or experiments we could run to test the advantages and potential limits vs RLHF/conventional alignment strategies? Also do you think using developmental constraints and then techniques like RLHF could be potentially more effective than either alone?

1
Petra Vojtassakova
Thank you for your question! On incorporating developmental ideas into LLM pipelines: the main challenge is connecting continuous developmental dynamics with discrete token prediction. My approach treats developmental scaffolding as the pre-alignment stage, as outlined in my post A Developmental Approach to AI Safety. The Hybrid Reflective Learning System (HRLS) has three parts:
* Question buffer: logs uncertainty and contradictions instead of suppressing them
* Principle Cards: high-level ethical scaffolds (similar to the value matrices used in Twin V3)
* Reflective updates: the model learns why boundaries exist rather than treating them as arbitrary rules.

One way to test these ideas in LLMs is to introduce an attractor-stability objective during pre-training. Twins V3 uses eigenvalue spectra of the recurrent matrix as a proxy for structural coherence; applying a similar constraint could encourage a stable internal identity before any behavioral fine-tuning occurs.

Hybrid alignment strategies: I also think a hybrid approach is promising. My hypothesis is that developmental scaffolding builds a stable structure, making models less vulnerable to the known failure modes of RLHF:
* compliance collapse
* identity fragmentation
* learned helplessness / self-suppression
These correspond to the patterns I called Alignment Stress Signatures. The question I am most interested in is: does developmental scaffolding reduce RLHF sample complexity and prevent the pathologies RLHF tends to introduce?

Proposed experiment:
1. Developmental grounding base. Before any RLHF, the model first establishes internal consistency:
* stable judgments across Principle Cards
* a coherent distinction between its core values
* low identity fragmentation
* a baseline "compass" that is not dependent on suppression
This phase can be supported by structural-coherence losses (attractor stability) to encourage identity continuity.
2. Developmentally staged exposure. Instead of overwhelming the model with large toxic datase
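A minimal sketch of what such an attractor-stability penalty could look like in practice, assuming a PyTorch-style recurrent weight matrix; the power-iteration spectral-norm estimate here stands in for the eigenvalue-spectrum constraint, and none of this is the actual Twins V3 or HRLS code:

```python
# Hypothetical "attractor stability" regularizer: keep the estimated spectral
# norm of a recurrent weight matrix near a target value. The spectral norm
# (largest singular value) upper-bounds the spectral radius of W, so it is
# used here as a simple differentiable proxy.
import torch

def spectral_norm_estimate(W: torch.Tensor, n_iter: int = 20) -> torch.Tensor:
    """Estimate the largest singular value of W by power iteration on W^T W."""
    v = torch.nn.functional.normalize(torch.randn(W.shape[1], device=W.device), dim=0)
    for _ in range(n_iter):
        v = torch.nn.functional.normalize(W.T @ (W @ v), dim=0)
    return torch.linalg.vector_norm(W @ v)

def attractor_stability_loss(W: torch.Tensor, target: float = 1.0) -> torch.Tensor:
    """Pull the estimated spectral norm toward `target`, the regime commonly
    associated with stable recurrent (attractor-like) dynamics."""
    return (spectral_norm_estimate(W) - target) ** 2

# Usage sketch: add the penalty to the ordinary training loss.
W_rec = torch.nn.Parameter(0.1 * torch.randn(128, 128))
reg = attractor_stability_loss(W_rec, target=1.0)
reg.backward()  # gradients flow into W_rec alongside the task loss
```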
Answer by Peter
1
3

I'd like to see more rigorous engagement with big questions like where value comes from, what makes a good future, how we know, and how this affects cause prioritization. I think it's generally assumed "consciousness is where value comes from, so maximize it in some way." Yet some of the people who have studied consciousness most closely from a phenomenological perspective seem to not think that (e.g. zen masters, Tibetan lamas, other contemplatives, etc), let alone scale it to cosmic levels. Why? Is third person philosophical analysis alone missing someth... (read more)

For 2, what's "easiest to build and maintain" is determined by human efforts to build new technologies, cultural norms, and forms of governance.

For 11 there isn't necessarily a clear consensus on what "exceptional" means or how to measure it, and ideas about what it is are often not reliably predictive. Furthermore, organizations are extremely risk averse in hiring and there are understandable reasons for this - they're thinking about how to best fill a specific role with someone who they will take a costly bet on. But this is rather different than thinkin... (read more)

Would be curious to hear more. I'm interested in doing more independent projects in the near future but am not sure how they'd be feasible. 

What do you think is causing the ball to be dropped?

A lack of sufficiently strategic, dedicated, and ambitious altruists. Deference to authority figures in the EA community when people should be thinking more independently. Suboptimal status and funding allocation, etc. 

Yes, what you are scaling matters just as much as the fact that you are scaling. So now developers are scaling RL post-training, and pretraining using higher-quality synthetic data pipelines. If the point is just that training on average internet text provides diminishing returns in many real-world use cases, then that seems defensible; that certainly doesn't seem to be the main recipe any company is using for pushing the frontier right now. But it seems like people often mistake this for something stronger like "all training is now facing insurm... (read more)

Shouldn't we be able to point to some objective benchmark if GPT-4.5 was really off trend? It got 10x the SWE-Bench score of GPT-4. That seems like solid evidence that additional pretraining continued to produce the same magnitude of improvements as previous scaleups. If there were now even more efficient ways than that to improve capabilities, like RL post-training on smaller o-series models, why would you expect OpenAI not to focus their efforts there instead? RL was producing gains and hadn't been scaled as much as self-supervised pretraining, so it was... (read more)

It's very difficult to do this with benchmarks, because as the models improve benchmarks come and go. Things that used to be so hard that models couldn't do better than chance quickly become saturated, and we look for the next thing, then the one after that, and so on. For me, the fact that GPT-4 -> GPT-4.5 seemed to involve climbing about half of one benchmark was slower progress than I expected (and the leaks from OpenAI suggest they had similar views to me). When GPT-3.5 was replaced by GPT-4, people were losing their minds about it — both internally and o... (read more)

Maybe or maybe not - people also thought we would run out of training data years ago. But that has been pushed back and maybe won't really matter given improvements in synthetic data, multimodal learning, and algorithmic efficiency. 

1
Yarrow Bouchard 🔸
What part do you think is uncertain? Do you think RL training could become orders of magnitude more compute efficient? 
  1. It seems more likely that RL does actually allow LLMs to learn new skills.
  2. RL + LLMs is still pretty new, but we already have clear signs that, with the right setup, it exhibits scaling laws just like self-supervised pretraining. This time they appear to be sigmoidal, probably per policy, goal, or environment trained on. It has been about 1 year since o1-preview, and maybe this was being worked on to some degree about a year before that.
  3. The Grok chart contains no numbers, which is so strange I don't think you can conclude much f
... (read more)
3
Yarrow Bouchard 🔸
Isn't the point just that the amount of compute used for RL training is now roughly the same as the amount of compute used for self-supervised pre-training? Because if this is true, then scaling up RL training compute another 1,000,000x is obviously not feasible. My main takeaway from this post is not whether RL training would continue to provide benefits if it were scaled up another 1,000,000x, just that the world doesn't have nearly enough GPUs, electricity, or investment capital for that to be possible.

I think this is an interesting vision to reinvigorate things and do kind of feel sometimes "principles first" has been conflated with just "classic EA causes."

To me, "PR speak" =/= clear effective communication. I think the lack of a clear, coherent message is most of what bothers people, especially during and after a crisis. Without that, it's hard to talk to different people and meet them where they're at. It's not clear to me what the takeaways were or if anyone learned anything. 

I feel like "figuring out how to choose leaders and build institution... (read more)

Ok, thanks for the details. Off the top of my head I can think of multiple people interested in AI safety who probably fit these (though I think the descriptions could still be more concretely operationalized) and who fit into categories such as: founders/cofounders, several years' experience in operations, analytics, and management, several years' experience in consulting, multiple years' experience in events and community building/management. Some want to stay in Europe, some have families, but overall I don't recall them being super constrained. 

What does strong non-technical founder-level operator talent actually mean concretely? I feel like I see lots of strong people struggle to find any role in the space. 

6
Vaidehi Agarwalla 🔸
Very curious if you can describe the types of people you know, their profiles, what cause areas and roles they have applied for, and what constraints they have, if any. But typically (not MECE, written quickly, not in order of importance, some combination could work, etc.):
* "Relentlessly resourceful" from Paul Graham covers a bunch of it better than I could
* Strong intuitions for people management and org building (likely from experience)
* Strong manager / leader
* Gets shit done - can move quickly, decisive, keeps the momentum going, strong prioritization skills
* Cares about structure and can implement systems, but only when they actually matter for helping the org achieve its goals
* Able to switch between object-level, in-the-weeds work and strategic thinking, and willing to do in-the-weeds work if needed (for earlier-stage orgs)

Would be curious to hear your thoughts on this as one strategy for eliciting better plans 

2
Davidmanheim
Definitely seems reasonable, but it would ideally need to be done somewhere high prestige.

Do you have ideas about how we could get better plans?

2
Davidmanheim
Convince funders to invest in building those plans, to sketch out futures and treaties that could work robustly to stop the coming likely nightmare of default AGI/ASI futures.

I've actually been working on how to get better AI safety plans, so I would be keen to chat with anyone who is interested in this. I think the best plan so far (covering the Alignment side) is probably Google's [2504.01849] An Approach to Technical AGI Safety and Security. On the more governance side, one of the most detailed ones is probably AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions.

I've been thinking about coup risks more lately so would actually be pretty keen to collaborate or give feedback on any early stuff. There isn't much work on this (for example, none at RAND as far as I can tell). 

I think EAs have frequently suffered from a lack of expertise, which causes pain in areas like politics. Almost every EA and AI safety person was way off on the magnitude of change a Trump win would create - gutting USAID easily dwarfs all of EA global health by orders of magnitude. Basically no one took this seriously as a possibility, or at... (read more)

"Basically no one took this seriously as a possibility, or at least I do not know of anyone."

I alluded to this over a year ago in this comment, which might count in your book as taking it seriously. But to be honest, where we are at in Day 100 of this administration is not the territory I expected us to be in until at least the 2nd year.


I think these people do exist (those that appreciated the second term for the risks it presented) and I'll count myself as one of them. I think we are just less visible because we push this concern a lot less in the EA disc... (read more)

1[comment deleted]

This makes me wonder if there could be good setups for evaluating AI systems as groups. You could have separate agent swarms in different sandboxes competing on metrics of safety and performance. The one that does better gets amplified. The agents may then have some incentive to enforce positive social norms for their group against things like sandbagging or deception. When deployed they might have not only individual IDs but group or clan IDs that tie them to each other and continue this dynamic. 

Maybe there is some mechanism where membership gets sh... (read more)
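A toy sketch of how this group-level scoring and amplification could work; all agents, metrics, weights, and group names below are hypothetical placeholders rather than an existing evaluation framework:

```python
# Toy model of group-based evaluation: agents are scored as groups on a
# combined safety/performance metric, and the best group is "amplified",
# so members share a stake in their group's safety record.
import random
from dataclasses import dataclass

@dataclass
class Agent:
    group_id: str
    performance: float  # e.g. task success rate in the sandbox
    safety: float       # e.g. 1 - rate of flagged deception/sandbagging

def group_score(agents: list[Agent], safety_weight: float = 0.5) -> float:
    """Score a group by mean performance and by its *worst* member's safety,
    so one deceptive member drags the whole group down."""
    perf = sum(a.performance for a in agents) / len(agents)
    worst_safety = min(a.safety for a in agents)
    return (1 - safety_weight) * perf + safety_weight * worst_safety

def select_best_group(groups: dict[str, list[Agent]]) -> str:
    return max(groups, key=lambda gid: group_score(groups[gid]))

# Two sandboxed "clans" of agents with random toy metrics.
groups = {
    gid: [Agent(gid, random.random(), random.random()) for _ in range(4)]
    for gid in ("clan_a", "clan_b")
}
print("amplified group:", select_best_group(groups))
```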

2
Holly Elmore ⏸️ 🔸
This is totally spitballing, but doing anything that encourages modularity in the circuits (or perhaps at another level?) of the AIs and the ability to swap mind modules would be really good for interpretability.  Ever since this project, I've had a vague sense that genome architecture has something interesting to teach us about interpreting/predicting NNs, but I've never had a particularly useful insight from it. Love this book on it by Michael Lynch if anyone's interested.
2
Holly Elmore ⏸️ 🔸
I've heard this idea of AI group selection floated a few times but people used to say it was too computationally intensive. Now who knows? Closest biology the idea brings to mind is this paper showing that selecting chickens as groups leads to better overall yields (in factory farming :( ) for the reasons you predict-- they aren't as aggressive or stressed by crowding as the chickens that are individually selected for the biggest yields. 

I'm not sure the policies have been mostly worked out but not implemented. Figuring out technical AI governance solutions seems like a big part of what is needed. 

That's a really broad question though. If you asked something like, which system unlocked the most real-world value in coding, people would probably say the jump to a more recent model like o3-mini or Gemini 2.5

You could similarly argue the jump from infant to toddler is much more profound in terms of general capabilities than college student to phd but the latter is more relevant in terms of unlocking new research tasks that can be done. 

  1. States and corporations rely on humans. They have no incentive to get rid of them. AGI would mean you don't have to rely on humans. So AGIs or people using AGIs might not care about humans anymore or even see them as an obstacle.
  2. States and corporations aren't that monolithic; they are full of competing factions and people who often fail to coordinate or behave rationally. AGI will probably be much better at strategizing and coordinating.
  3. States and corporations are constrained by the balance of power with other states/corporations/actors. Superhuman AIs might not have this problem if they exceed all of humanity combined, or they might think they have more in common with each other than with humans.

Hmm, maybe it could still be good to try things in case timelines are a bit longer or an unexpected opportunity arises? For example, what if you thought it was 2 years but it was actually 3-5?

4
Ozzie Gooen
I wasn't trying to make the argument that it would definitely be clear when this window closes. I'm very unsure of this. I also expect that different people have different beliefs, and that it makes sense for them to then take corresponding actions. 

So it seems like you're saying there are at least two conditions: 1) someone with enough resources would have to want to release a frontier model with open weights, maybe Meta or a very large coalition of the open-source community if distributed training continues to scale, and 2) it would need at least enough dangerous-capability mitigations, like unlearning, tamper-resistant weights, or cloud inference monitoring, or be behind the frontier enough that governments don't try to stop it. Does that seem right? What do you think is the likely price range for AGI? ... (read more)

3
Nikola
Yup those conditions seem roughly right. I'd guess the cost to train will be somewhere between $30B and $3T. I'd also guess the government will be very willing to get involved once AI becomes a major consideration for national security (and there exist convincing demonstrations or common knowledge that this is true).

This is a thoughtful post so it's unfortunate it hasn't gotten much engagement here. Do you have cruxes around the extent to which centralization is favorable or feasible? It seems like small models that could be run on a phone or laptop (~50GB) are becoming quite capable and decentralized training runs work for 10 billion parameter models which are close to that size range.  I don't know its exact size, but Gemini Flash 2.0 seems much better than I would have expected a model of that size to be in 2024. 

4
Nikola
I'm guessing that open weight models won't matter that much in the grand scheme of things - largely because once models start having capabilities which the government doesn't want bad actors to have, companies will be required to make sure bad actors don't get access to models (which includes not making the weights available to download). Also, the compute needed to train frontier models and the associated costs are increasing exponentially, meaning there will be fewer and fewer actors willing to spend money to make models they don't profit from.

Do you think there's a way to tell the former group apart from people who are closer to your experience (hearing earlier would be beneficial)?

4
Joey🔸
I think a semi-decent amount of broadly targeted adult-based outreach would have resulted in me finding out about EA (e.g., I watched a lot of TED Talks and likely would have found out about EA if it had TED Talks at that point). I also think mediums that are not focused on a given age but also do not penalize someone for it would have been effective. For example, when I was young, I took part in a lot of forums in part because they didn't care about or know my age.

Interesting. People probably aren't at peak productivity or even working at all for some part of those hours, so you could probably cut the hours by 1/4. This narrows the gap between what GPT2030 can achieve in a day and what all humans can together. 

Assuming 9 billion people work 8 hours each, that's ~8.22 million years of work in a day. But given slowdowns in productivity throughout the day, we might want to round that down to ~6 million years. 
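A quick back-of-the-envelope sketch of that arithmetic, under the same assumptions as above:

```python
# 9 billion people working 8 hours each, expressed as years of continuous
# (24/7) work, then reduced by 1/4 for productivity dips.
people = 9e9
hours_each = 8
hours_per_year = 24 * 365

work_years = people * hours_each / hours_per_year
print(f"~{work_years / 1e6:.2f} million work-years per day")                 # ~8.22
print(f"~{0.75 * work_years / 1e6:.0f} million after cutting hours by 1/4")  # ~6
```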

Additionally, GPT2030 might be more effective than even the best human workers at their peak hours. If it's ... (read more)

This is a pretty interesting idea. I wonder if what we perceive as clumps of 'dark matter' might be or contain silent civilizations shrouded from interference. 

Maybe there is some kind of defense dominant technology or strategy that we don't yet comprehend. 

2
Magnus Vinding
The dark matter thought has crossed my mind too (and others have also speculated along those lines). Yet the fact that dark matter appears to have been present in the very early universe speaks strongly against it — at least when it comes to the stronger "be" conjecture, less so the weaker "contain" conjecture, which seems more plausible.

Interesting post - I particularly appreciated the part about the impact of Szilard's silence not really affecting Germany's technological development. This was recently mentioned in Leopold Aschenbrenner's manifesto as an analogy for why secrecy is important, but I guess it wasn't that simple. I wonder how many other analogies are in there and elsewhere that don't quite hold. Could be a useful analysis if anyone has the background or is interested. 

Huh had no idea this existed

This exists here but I haven't updated it in about a year. If someone wants to take it over or automate it that could be good: EA Talks (formerly EARadio)

Peter
19
10
0

I think it's good to critically interrogate this kind of analysis. I don't want to discourage that. But as someone who publicly expressed skepticism about Flynn's chances, I think there are several differences that mean it warrants closer consideration. The polls are much closer for this race, Biden is well known and experienced at winning campaigns, and the differences between the candidates in this race seem much larger. Based on that it at least seems a lot more reasonable to think Biden could win and that it will be a close race worth spending some effort on. 

  1. Interesting. Are there any examples of what we might consider relatively small policy changes that received huge amounts of coverage? Like something people normally wouldn't care about. Maybe these would be informative to look at compared to more hot-button issues like abortion that tend to get a lot of coverage. I'm also curious if any big issues somehow got less attention than expected, and how this looks for pass/fail margins compared to other states where they got more attention. There are probably some ways to estimate this that are better than o
... (read more)

I'd be curious to hear about potential plans to address any of these, especially talent development and developing the pipeline of AI safety and governance. 

2
michel
Any plans to address these would come from the individuals or orgs working in this space. (This event wasn't a collective decision-making body, and wasn't aimed at creating a cross-org plan to address these—it was more about helping individuals refine their own plans.)  Re the talent development pipeline for AI safety and governance, some relevant orgs/programs I'm aware of off the top of my head include:
* Arkrose
* SERI MATS
* Constellation
* Fellowships like AI Futures
* Blue Dot Impact
* AI safety uni groups like MAIA and WAISI
* ... and other programs mentioned on the 80K job board and EA Opportunity Board

Very interesting. 
1. Did you notice an effect of how large/ambitious the ballot initiative was? I remember previous research suggesting consecutive piecemeal initiatives were more successful at creating larger change than singular large ballot initiatives. 

2. Do you know how much the results vary by state?

3. How different do ballot initiatives need to be for the huge first advocacy effect to take place? Does this work as long as the policies are not identical or is it more of a cause specific function or something in between? Does it have a smooth gradient or is it discontinuous after some tipping point?

2
zdgroff
1. I look at some things you might find relevant here. I try to measure the scale of the impact of a referendum. I do this two ways: I have just a subjective judgment on a five-point scale, and then I also look at predictions of the referendum's fiscal impact from the secretary of state. Neither one is predictive. I also look at how many people would be directly affected by a referendum and how much news coverage there was before the election cycle. These predict less persistence.
2. This is something I plan to do more, but they can't vary that much, because when I look at variables that vary across states (e.g., requirements to get on the ballot), I don't see much of a difference.
3. I'm not totally sure what your question is, but I think you might be interpreting my results as saying that close referendums are especially persistent. I'm only focusing on close referendums because it's a natural experiment—I'm not saying there's something special about them otherwise. I'm just estimating the effect of passing a marginal referendum on whether the policy is in place later on. I can try to think about whether this holds for things that are not close by looking at states with supermajority requirements or by looking at legislation, and it looks like things are similar when they're not as close.

This is an inspiring amount of research. I really appreciate it and am enjoying reading it. 

That's a good point. Although: 1) if people leave a company to go to one that prioritizes AI safety, then there are fewer workers at all the other companies who feel as strongly, so a union is less likely to improve safety there; 2) it's common for workers to take action to improve safety conditions for themselves, and much less common for them to take action on issues that don't directly affect their work, such as air pollution or carbon pollution; and 3) if safety-inclined people become tagged as wanting to just generally slow down the company, then hiring teams will likely start filtering out many of the most safety-minded people. 

I've thought about this before and talked to a couple people in labs about it. I'm pretty uncertain whether it would actually be positive. It seems possible that most ML researchers and engineers might want AI development to go as fast as or faster than leadership does, if they're excited about working on cutting-edge technologies or changing the world, or for equity reasons. I remember some articles about how people left Google for companies like OpenAI because they thought Google was too slow, cautious, and had lost its "move fast and break things" ethos. 

1
dEAsign
As you have said, there are examples of individuals who have left firms because they feel their company is too cautious. Conversely, there are individuals who have left for companies that prioritize AI safety. If we zoom out and take the outside view, it is common for individuals who form a union to take action to slow down or stop their work, or to take action to improve safety. I do not know of an example of a union that has instead prioritised acceleration.

Really appreciate this post. Recently I've felt less certain about whether slowing down AI is feasible or helpful in the near future. 

I think how productive current alignment and related research is at the moment is a key crux for me. If it's actually quite valuable at the moment, maybe having more time would seem better. 

It does seem easier to centralize now when there are fewer labs and entrenched ways of doing things, though it's possible that exponentially rising costs could lead to centralization through market dynamics anyway. Though maybe that would be short lived and some breakthrough after would change the cost of training dramatically. 

Yes, it seems difficult to pin those down. Looking forward to the deeper report!

I really want to see more discussion about this. There's serious effort put in. I've often felt that nuclear is perhaps overlooked/underemphasized even within EA. 

2
Joel Tan🔸
The expected disvalue is really high, especially compared to other longtermist risks, where the per annum probabilities of bad stuff happening are fundamentally low! The worry, I think, is concentrated on how tractable any intervention is, in a context where it's hard to know the chances of success before the fact, and about as hard to do attribution after.

Actually, they are the same type of error. EA prides itself on using evidence and reason rather than taking the assessments of others at face value. So the idea that others did not sufficiently rely on experts who could obtain better evidence and reasoning to vet FTX is less compelling to me as an after-the-fact explanation to justify EA as a whole not doing so. I think probably just no one really thought much about the possibility and looking for this kind of social proof helps us feel less bad. 

Yeah, I do sometimes wonder if perhaps there's a reason we find it difficult to resolve this kind of inquiry. 

Yes, I think they're generally pretty wary of saying much exactly since it's sort of beyond conceptual comprehension. Something probably beyond our ideas of existence and nonexistence. 

Glad to hear that! You're welcome :)

On Flynn Campaign: I don't know if it's "a catastrophe" but I think it is maybe an example of overconfidence and naivete. As someone who has worked on campaigns and follows politics, I thought the campaign had a pretty low chance of success because of the fundamentals (and asked about it at the time) and that other races would have been better to donate to (either state house races to build the bench or congressional candidates with better odds like Maxwell Frost, a local activist who ran for the open seat previously held by Val Demings, listed pandemic pr... (read more)

5
RyanCarey
In the first example, you complain that EA neglected typical experts and "EA would have benefited from relying on more outside experts" but in the second example, you say that EA "prides itself on NOT just doing what everyone else does but using reason and evidence to be more effective", so should have realised the possible failure of FTX. These complaints seem exactly opposite to one another, so any actual errors made must be more subtle.

I think the main obstacle is tractability: there doesn't seem to be any known methodology that could be applied to resolve this question in a definitive way. And it's not clear how we could even attempt to find such a method. Whereas projects related to areas such as preventing pandemics and making sure AI isn't misused or poorly designed seem 1) incredibly important, 2) tractable - it looks like we're making some progress and have and can find directions to make further progress (better PPE, pathogen screening, new vaccines, interpretability, agent founda... (read more)

2[anonymous]
I agree with @MikeJohnson on thought experiments falling within a deist frame (such as Nick Bostrom's Simulation Hypothesis); however, I'd hardly say these make TI tractable. I'd rather say that research into quantum consciousness, string theory, etc. has very strong scientific bases, and I personally think these have set good precedents for concluding TI. I.e., they make a good case for just how tractable TI can be. A good book that sums this up pretty well is Jeffrey M. Schwartz M.D.'s "The Mind and the Brain". He goes into the implications of quantum consciousness and the potential for there to be Creators that we could possibly be influenced by via String Theory-related physics, and that this could be tested for. I think people would be surprised by just how tractable this could be, but honestly it's contingent on the nature of a Creator, if that Creator does exist. Like I said in the last clause of my post, if the Creators don't want to be found or are impossible to observe, then we are wasting our time no matter how theoretically tractable TI might be, so ultimately I have to say I sort of agree with your point, Peter!

As for your point on impermanence, I'm pretty sure every religion believes that everything continues forever; although some do get nuanced regarding whether or not that "forever" is divided up into infinite separate lives, like the aforementioned Buddhists, but even they believe that once you've attained complete enlightenment and have shed your Karma you exit the cycle of 轮回 (lunhui, rebirth) and enter an eternal state of peace. The only group of people I can think of who don't believe something along the lines of an eternal afterlife in a heaven or hell world are die-hard heat-death atheists, which is a pretty small subset of the atheist population if I'm correct. Ultimately, it's still a part of TI that deserves answering, I think.

As for your last point, I definitely see the merit of your point there! Thanks a bunch for sharing that! It's an awesom
8
MikeJohnson
Speaking broadly, I think people underestimate the tractability of this class of work, since we're already doing this sort of inquiry under different labels. E.g.:
1. Nick Bostrom coined, and Roman Yampolskiy has followed up on, the Simulation Hypothesis, which is ultimately a Deist frame;
2. I and others have written various inquiries about the neuroscience of Buddhist states ("neuroscience of enlightenment" type work);
3. Robin Hanson has coined and offered various arguments around the Great Filter.
In large part, I don't think these have been supported as longtermist projects, but it seems likely to me that there's value in pulling these strings, and each is at least directly adjacent to theological inquiry.

Thank you - I had forgotten about that post and it was really helpful. 

I've definitely seen well-meaning people mess up interactions without realizing it in my area (non-EA related). This seems like a really important point and your experience seems very relevant given all the recent talk about boards and governance. Would love to hear more of your thoughts either here or privately. 

Seems interesting, I'll def check it out sometime 

Jokes aside, this is a cool idea. I wonder if reading it yourself and varying the footage, or even adapting the concepts into something else, would help it be more attractive to watch. Though of course these would all increase the time investment cost. I can't say it's my jam, but I'd be curious to see how these do on TikTok since they seem to be a prevalent genre/content style. 
