Quick takes

In its RSP, Anthropic committed to defining ASL-4 by the time they reach ASL-3. With Claude 4 released today, they have reached ASL-3, and they haven't yet defined ASL-4. It turns out they have quietly walked back on the commitment. The change happened less than two months ago and, to my knowledge, was not announced on LW or in other visible places, unlike other important changes to the RSP. It's also not in the changelog on their website; in the description of the relevant update, they say they added a new commitment but don't mention removing this one. Anthropic's behavior is not at all the behavior of a responsible AI company. Trained a new model that reaches ASL-3 before you can define ASL-4? No problem: update the RSP so that you no longer have to, and basically don't tell anyone. (Did anyone not working for Anthropic even know the change had happened?) When their commitments go against their commercial interests, we can't trust them to keep those commitments. You should not work at Anthropic on AI capabilities.
Speaking from what I've personally seen, though it's reasonable to assume it generalizes: there is a significant pool of burned-out knowledge workers, and one of the major causes is a lack of value alignment, i.e. working for companies that only care about profits. I think this cohort would be a good target for a campaign:
* Effective giving can provide meaning for the money they make.
* Dedicating some time to taking on voluntary challenges can help with burnout (if it stems from meaninglessness).
Who said EA was dying? I have 1,400 contacts on my EAG London spreadsheet! Yeah, I know it's a bit of a lame data point and this is more of a tweet than a forum post, but hey... 😘
Would a safety-focused breakdown of the EU AI Act be useful to you?

The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/

What I'm proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers. It would include:
* Provisions related to transparency, human oversight, and systemic risks
* Notes on how technical safety tools (e.g. interpretability, scalable oversight, evals) might interface with conformity assessments, and the compliance exemptions available for research work
* Commentary on loopholes or compliance dynamics that could shape industry behavior
* What the Act doesn't currently address from a frontier-risk or misalignment perspective

Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.

If this sounds useful, I'd love to hear what you'd want to see included, or what use cases would make it most actionable. And if you think this is a bad idea, no worries. Just please don't downvote me into oblivion, I just got to decent karma :). Thanks in advance for the feedback!
Potential Megaproject: 'The Cooperation Project' (or the like)

This is a very loose idea, based on observations like these:
* We have ongoing geopolitical tensions (e.g. China-US, China-Taiwan, Russia-Ukraine) and a lot of resources and attention spent on them.
* We have (increasing?) risks from emerging technology that potentially threaten everyone. It's difficult to estimate the risk levels, but there seems to be an emerging consensus that we are on a reckless path, even from perspectives concerned purely with individual or national self-interest.

The project would essentially seek to make a clear case for broad cooperation toward avoiding widely agreed-upon bad outcomes from emerging technologies, outcomes that are in nobody's interest. The work could, among other things, consist in reaching out to key diplomats as well as doing high-visibility public outreach that emphasizes cooperation as key to addressing risks from emerging technologies.

Reasons it might be worth pursuing:
* The degree of cooperation between major powers, especially with respect to tech development, is plausibly a critical factor in how well the future will go. Even marginal improvements might be significant.
* A strong self-interested case can seemingly be made for increasing cooperation, but a problem might be its relatively low salience, as well as primitive status and pride psychology preventing this case from being acted on.
* Even if the case is fairly compelling to people, other motivations might nevertheless feel more compelling and motivating; slight pushes in how salient certain considerations are, both in the minds of the public and of leaders, could potentially tip the scales in terms of which paths end up being pursued.
* The broader goal seems quite commonsensical and like something few people would outright oppose (though see the counter-considerations below).
* The work might act as a lever or catalyst of sorts: one can make compelling arguments regarding specific tec