Strong advocate of just having a normal job and giving to effective charities.
Doctor in Australia giving 10% forever
Not familiar with this paper, so I resorted to using Claude to give myself a summary. The misalignment "blind audit game" seems a bit sus.
"Three auditing teams played this game under conditions mimicking those of AI developers conducting real pre-deployment alignment audits, including access to model weights and training data"
They had access to the training documents? That doesn't seem like detecting misalignment; that's just searching through files to find one with malicious instructions.
"A fourth team was given affordances that a lab-external auditors would have—namely, only API access to sample from the model—and failed to discover its hidden objective despite it being possible in principle to do so."
🤔 when they don't have the training documents they can't detect the misalignment?
If they're claiming that finding a file with malicious instructions counts as detecting misalignment, then this would update me further against AI Safety research.
Edit: Actually it looks like the teams were using some clever autoencoder method rather than just a search through documents. In any case, this all seems pretty artificial. The method might detect misalignment when it is due to a few malicious documents mixed in with the training data and where the malicious behaviour is easy to detect. This feels like killing cancer cells in a petri dish: easy, and it doesn't tell you much.
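For context on what that autoencoder method is: sparse autoencoders (SAEs) are trained to reconstruct a model's internal activations through an overcomplete, sparsity-penalised hidden layer, so individual hidden units tend to line up with human-interpretable features that an auditor can then inspect. Here's a minimal sketch in PyTorch; the class name, dimensions, and coefficient are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over a model's internal activations."""

    def __init__(self, d_model: int = 512, d_hidden: int = 4096):
        super().__init__()
        # Overcomplete dictionary: many more features than activation dims.
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(acts, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features
    # to zero, so the few that fire tend to be interpretable.
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()

# Usage sketch: acts would come from a hidden layer of the audited model.
sae = SparseAutoencoder()
acts = torch.randn(64, 512)  # stand-in batch of activations
recon, features = sae(acts)
loss = sae_loss(acts, recon, features)
```

An auditor would then look for features that fire on the planted objective and trace where they activate. Whether that scales beyond "a few malicious documents with easy-to-detect behaviour" is exactly the petri-dish worry above.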
That’s a fair point. At either extreme of outcomes, “ASI kills us all” or “ASI quickly uplifts everyone out of poverty”, almost all decisions/actions we make today are pretty meaningless.
But if the next few decades fall somewhere between those two extremes, which I think they probably will, the impact of improving people’s lives remains substantial.
(NOTE: Coming at this from a place of: a. ignorance of what the AI Safety community actually does and b. not wanting to take the ego hit of admitting that I have been wrong about my long-held skepticism of AI Safety)
I think it was and is fair to be skeptical of the shift to AI Safety in EA on the basis that it's not that tractable, and that there's no clear evidence that the AI Safety movement has had a positive effect on the trajectory of AI.
I think the AI Safety community will be tempted to think they've normalised in the zeitgeist ideas about superintelligent AIs and the philosophical questions and risks that arise from them, but 2001: A Space Odyssey came out in 1968, Terminator in 1984, The Matrix in 1999, etc. The ideas of superintelligent AIs and the existential risks they pose are diffused through modern culture, and it's possible that the Pope and the UN would have made the same statements about them, given the recent progress of LLMs, regardless of the AI Safety movement.
Are there many ideas in If Anyone Builds It, Everyone Dies that weren't broadly covered in Terminator, The Matrix, 2001: A Space Odyssey, Dune, etc.?
I haven't seen strong evidence for the direct work of the AI Safety movement reducing existential risks from AI:
Interpretability research seems far from being able to understand more than a few components at a time. And the companies making AI would likely have been incentivised to do this work regardless of the AI Safety movement, because customers don't want a black box.
From the outside it seems there's a good argument that the AI situation would have evolved pretty similarly regardless of EA/AI Safety input.
From that position, it's easy to believe that if EA had just stuck to Earning To Give and malaria nets and decaging chickens then the impact would have been greater, both directly and because the movement might not have lost as much momentum when AI Safety alienated people.
I agree that the depth of the evidence conversations doesn't lend itself to amateur discussion on the forum, and for that reason I feel there's not much I have to add to the GHD discussions here.
I don't think it's fair to say it's not prioritised among the orgs. My understanding is that Coefficient Giving still gives huge amounts to GiveWell charities and grants.
“direct altruistic focus strategically so as to be of positive utility”
Vague and evasive. Say what you mean. If you want to keep poor people poor until some new technology comes out, you should say that. If you don’t think further development will ever be justified, you should say that (so that your contention can be discarded as absurd and impractical).
“From the sumatriptan RCT: 3% were pain-free at 10 minutes after placebo.”
This is an irrational comparison. You’re comparing your best-case-scenario anecdote to the results of an RCT.
It’s possible that one of those 3% of people would have an anecdote for sumatriptan as convincing as yours: rapid resolution of their headache, despite having received placebo. That anecdote would not be representative.
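To make the base-rate point concrete, here's the arithmetic; the arm size is a hypothetical for illustration, not the actual trial's enrolment.

```python
# If 3% of placebo recipients are pain-free at 10 minutes, even a
# modest trial arm yields several compelling "it worked instantly"
# stories from placebo alone.
placebo_pain_free_rate = 0.03
n_placebo_arm = 300  # hypothetical arm size, for illustration only
expected_convincing_anecdotes = placebo_pain_free_rate * n_placebo_arm
print(expected_convincing_anecdotes)  # ~9 placebo responders
```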
I’m not saying you’re wrong about psychedelics and cluster headache. I desperately hope you’re right and there is an easy fix. But anecdote leads people astray constantly, and we have to treat it with high suspicion.
Thanks Ben. I actually suggested both in my original comment:
(a) that there is a market incentive for the companies to do this themselves, so did the AI Safety movement really move the dial on this?
and
(b) that I'm skeptical of the value of interpretability research (based only on not having seen anything impressive come from it, though I'm very ignorant of the field).