Off-and-on projects in epistemic public goods, AI alignment (mostly interested in multipolar scenarios, cooperative AI, ARCHES, etc.), and community building. I've probably done my best work as a research engineer on a cryptography team; I'm pretty bad at product engineering.
"EV is measure times value" is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.
Like, in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?
Yeah, I think "ASI implies an extreme case of lock-in" is a major tendency in the literature (especially sequences-era), but 1. people disagree about whether "alignment" refers to something that outsmarts even this implication, and then they disagree about the relative tractability and plausibility of the different alignment visions, and 2. this is very much a separate set of steps that provides room for disagreement among people who broadly accept Eliezer-like threat models (doomcoin stuff).
I don't want to zero in on actually-existing Eliezer (at whichever time step); I'm more interested in something like a threat-model class or cluster around the lack of fire alarms, capabilities we can't distinguish from magic, things of that nature.
Super great post. I've been thinking about posting a nuance in (what I think of as) the Eliezer class of threat models but haven't gotten around to it. (Warning: negative valence, as I will recall the moment I first felt visceral sadness at the alignment problem.)
Rob Bensinger tweeted something like "if we stick the landing on this, I'm going to lose an unrecoverable amount of Bayes points", and for two years now I've had a massively different way of thinking about the deployment of advanced systems, because I find something like a "law of mad science" very plausible.
The high-level takeaway is that (in this class of threat models) we can "survive takeoff" (though I hate that framing) and accumulate lots of evidence that the doomcoin landed on heads (really feeling like we're in the early stages of a glorious transhuman future, or a more modest FALGSC), for hundreds of years. And then someone pushes a typo in a yaml file to the server, and we die.
There seems to be very little framing of "a mostly Eliezer-like 'flipping the doomcoin' scenario, where forecasters thus far have only concerned themselves with the date of the first flip, but from then on the doomcoin is flipped on New Year's Eve at midnight every year until it comes up tails and we die". In other words, if we are obligated to shift the weight of the doomcoin now, before the first flip, then we are at least as obligated to apply at least constant vigilance, forevermore, and there's a stronger case to be made for demanding strictly increasing vigilance (pulling the weight of the doomcoin further and further every year). (This realization was my visceral-sadness moment, in 2021 on Discord; before that I was thinking about threat models as something like a fun and challenging video game RNG.)
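(Toy numbers, mine rather than anything from the post, just to make the vigilance point concrete: if every annual flip carries a constant probability p of coming up tails, the chance of surviving n flips is (1-p)^n, which goes to zero as n grows; even a modest p = 0.01 leaves only about 0.99^200 ≈ 0.13 odds of making it 200 years. Getting long-run survival probability bounded away from zero requires the per-year risk p_k to shrink fast enough that the sum of the p_k converges, which is the formal face of the strictly-increasing-vigilance demand.)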
I think the Oxford folks have some literature on "existential security", which I just don't buy or expect at all. It seems deeply unlikely to me that there will be tricks we can pull, after the first time the doomcoin lands on heads, to keep it from flipping again. I think the "pivotal act" literature from MIRI tries to discuss this, by thinking about ways we can get some freebie years thrown in there (New Year's Eve parties with no doomcoin flip), which is better than nothing. But this constant/increasing-vigilance factor, the repeated flips of the doomcoin, seems like a niche informal inside view among people who've been hanging out longer than a couple of years.
Picking on Eliezer as a public intellectual for a second: insofar as my model of him is accurate (that his confidence that we die is more of an "eventually" thing, and that he has very little relation to Conjecture, who in many worlds will just take a hit to their Brier score in 2028, a hit Eliezer will be shielded from because he doesn't commit to dates), I would have liked to see him retweet the Bensinger comment and warn us about all the ways in which we could observe wildly transformative AI not kill everyone, declare victory, and then a few hundred years later push a bad yaml file to the server and die.
(All of this modulo my feeling that "doomcoin" is an annoying and thought-destroying way of characterizing the distribution over how you expect things to go well and poorly, probably at the same time, but that's its own jar of paperclips.)
I mean, in a sense, a venue that hosts Torres is definitionally trashy due to https://markfuentes1.substack.com/p/emile-p-torress-history-of-dishonesty, except insofar as they haven't seen or don't believe this Fuentes person.
I totally screwed up by not actually reading this post until today, even though I spoke a bunch with Xuan in meatspace about it and related topics.
I want to highlight it as a positive example of how I think epistemic diversity claims should be made! It's literally like "I found a specific thing in your blind spots that I expect to provide value in these particular ways, to help you accomplish your high-level goals", which is great. Making a positive case for "contracts are the primitive units we want to study in alignment, not preferences" really hit hard, and it gave the author license for having earlier described their unease with the literature's blind spots.
Yes. I think of this as "do things that don't scale" applied to acts of kindness.
Ambitiously impartial massive levers/wins are still the right thing to want, but the daily path to them might be more intricate than, say, your behavior during a fast-forward in the Click universe.
Back to the PG analogy: I think EAs rather too often do the equivalent of saying "I will ascend from scrappy garage band to Lex Luthor in like a year, by doing things similar to what Lex Luthor is doing now", when in reality you can't start a startup by acting the way a 2023 FAANG acts. The playbooks actually have nothing in common, even if the FAANGs were all garage bands at one point in time. I'm glad EA cultivates ambition and everything, but YC probably cultivates ambition more effectively than EA does.