AI safety researcher
On a global scale I agree. My point is more that, given the salary standards in the industry, Eliezer isn't necessarily out of line in drawing $600k, and it's probably not much more than he could earn elsewhere; therefore the financial incentive is fairly weak compared to that of Mechanize or other AI capabilities companies.
Being really good at your job is a good way to achieve impact in general, because your "impact above replacement" is what counts. If a replacement-level employee who is barely worth hiring has productivity 100, and the average productivity is 150, then the average employee contributes 50 impact above replacement. If you do your job 1.67x better than average (productivity 250), you contribute 150 impact above replacement, which is triple the average.
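A quick sketch of that arithmetic, using the illustrative numbers from the example above (the baseline and productivity values are just the ones chosen for the example, not real data):

```python
# Impact above replacement: only productivity beyond the level of a
# barely-worth-hiring employee counts toward your marginal impact.
REPLACEMENT = 100  # productivity of a barely-worth-hiring employee

def impact_above_replacement(productivity, replacement=REPLACEMENT):
    """Marginal impact relative to the replacement-level baseline."""
    return productivity - replacement

average_impact = impact_above_replacement(150)  # average employee: 150 - 100 = 50
star_impact = impact_above_replacement(250)     # 1.67x average:    250 - 100 = 150

print(star_impact / average_impact)  # 3.0 -- triple the average impact
```

The point the arithmetic makes: a 1.67x multiplier on raw productivity becomes a 3x multiplier on impact, because the replacement-level baseline is subtracted from both sides.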
I strongly disagree with a couple of claims:
MIRI's business model relies on the opposite narrative. MIRI pays Eliezer Yudkowsky $600,000 a year. It pays Nate Soares $235,000 a year. If they suddenly said that the risk of human extinction from AGI or superintelligence is extremely low, in all likelihood that money would dry up and Yudkowsky and Soares would be out of a job.
[...] The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn't really transferable to anything else.
However, here are things I agree with:
If the Mechanize co-founders wanted to focus on safety rather than capabilities, they could.
the Mechanize co-founders decided to start the company after forming their views on AI safety.
The Yudkowsky/Soares/MIRI argument about AI alignment is specifically that an AGI's goals and motivations are highly likely to be completely alien from human goals and motivations in a way that's highly existentially dangerous.
See the GPT-5 report. "Working lower bound" is maybe too strong; it may be more accurate to describe it as an initial guess at a warning threshold for rogue replication and 10x uplift (if we can even measure time horizons that long). I don't know the exact reasoning behind 40 hours, but one fact is that humans can't really start viable companies using plans that take only ~a week of work. IMO, if AIs could do the equivalent with only a 40-human-hour time horizon while continuously evading detection, they would need to leverage their own advantages and to have made up for many of their current disadvantages relative to humans (like being bad at adversarial and multi-agent settings).
What scale is the METR benchmark on? I see a line saying "Scores are normalized such that 100% represents a 50% success rate on tasks requiring 8 human-expert hours," but is the 0% point on the scale 0 hours?
METR does not think that 8 human hours is sufficient autonomy for takeover; in fact 40 hours is our working lower bound.
What if we decide that the Amazon rainforest has a negative WAW sign? Would you be in favor of completely replacing it with a parking lot, if doing so could be done without undue suffering of the animals that already exist there?
Definitely not completely replacing it, because biodiversity has diminishing returns to land. If we pave the whole Amazon we'll probably drive entire taxonomic families extinct (not to mention we'd probably cause ecological crises elsewhere, disrupt ecosystem services, etc.), whereas on the margin we'd only drive extinct the species endemic to the deforested regions.
If the research on WAW comes out super negative, I could imagine it being OK to replace half the Amazon with higher-welfare ecosystems now and work on replacing the rest when some crazy AI tech allows all changes to be fully reversible. But the moral parliament would probably still not be happy about this: e.g., killing is probably bad, and there is no feasible way to destroy half the Amazon in the near term without killing most of the animals in it.
My guess is something like: many organizations have quarterly caps on the number of false claims published. Their employees often want to make false claims, but toward the end of the quarter they're at the cap, so they delay the post to the first day of the next quarter, when space is available.
Okay, but why only April 1? Well, on Jan 1 everyone is on holiday, and on July 1 everyone is out enjoying the good weather. Oct 1 coincides with national holidays in populous countries like China and Nigeria, and in the US people are hung over from fiscal New Year's Eve. So we only really see the effect on April 1.
I would strongly predict that a spike in false claims also happens on July 1 in places with bad weather then. Unfortunately, most places are in the Northern Hemisphere, where July is warm, and Australia has good weather all year, so I think this is only testable when it snows in New Zealand.