All posts

Week of Sunday, 28 April 2024

Frontpage Posts

Personal Blogposts

Quick takes

William_S · 14d
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to manage a team of four people that worked on trying to understand language model features in context, leading to the release of an open-source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
tlevin · 17d
I think some of the AI safety policy community has over-indexed on the visual model of the "Overton Window" and under-indexed on alternatives like the "ratchet effect," "poisoning the well," "clown attacks," and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable.

I'm not familiar with much systematic empirical evidence on either side, but it seems to me that the more effective actors in the DC establishment are, overall, much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies. But it seems possible that at least some versions of "Overton Window-moving" strategies, as executed in practice, do more harm than good: the negative effects of associating their "side" with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who lean heavily on signals of credibility and consensus when quickly evaluating policy options, may outweigh the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.

In theory, the Overton Window model is just a description of which ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea "outside the window" and this actually makes the window narrower. But I think the visual imagery of "windows" struggles to accommodate this (when was the last time you tried to open a window and accidentally closed it instead?), and as a result, people who rely on this model are more likely to underrate these kinds of consequences.

I would be interested in empirical evidence on this question (ideally actual studies from the psych, political science, sociology, econ, etc. literatures, rather than specific case studies, due to reference-class-tennis-type issues).
Trump recently said in an interview (https://time.com/6972973/biden-trump-bird-flu-covid/) that he would seek to disband the White House office for pandemic preparedness. Given that he usually doesn't give specifics on his policy positions, this seems like something he is particularly interested in. I know politics is discouraged on the EA Forum, but I thought I would post this to say: EA should really be preparing for a Trump presidency. He's up in the polls and, IMO, has a >50% chance of winning the election. Right now politicians seem relatively receptive to EA ideas; this may change under a Trump administration.
MathiasKB · 18d
Excerpt from the most recent update from the ALERT team:

"Highly pathogenic avian influenza (HPAI) H5N1: What a week! The news, data, and analyses are coming in fast and furious. Overall, ALERT team members feel that the risk of an H5N1 pandemic emerging over the coming decade is increasing. Team members estimate that the chance that the WHO will declare a Public Health Emergency of International Concern (PHEIC) within 1 year from now because of an H5N1 virus, in whole or in part, is 0.9% (range 0.5%-1.3%). The team sees the chance going up substantially over the next decade, with the 5-year chance at 13% (range 10%-15%) and the 10-year chance increasing to 25% (range 20%-30%)."

Their estimated 10-year risk is a lot higher than I would have anticipated.
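One way to see why the 10-year figure stands out is to convert each cumulative estimate into the constant annual probability it would imply. The sketch below is a minimal illustration under two loud assumptions that are not part of the ALERT team's stated method: all three figures refer to the same event (a PHEIC declared because of H5N1), and every year carries the same, independent probability. The function names are hypothetical, and the quoted ranges are ignored in favor of the point estimates.

```python
# Convert cumulative risk estimates into implied constant annual probabilities.
# Assumption (not the ALERT team's model): every year has the same, independent
# probability p of the event, so P(event within n years) = 1 - (1 - p) ** n.


def implied_annual_rate(cumulative_prob: float, years: int) -> float:
    """Constant annual probability that reproduces `cumulative_prob` over `years`."""
    return 1 - (1 - cumulative_prob) ** (1 / years)


def cumulative_from_annual(annual_prob: float, years: int) -> float:
    """Probability of at least one event in `years` if each year has `annual_prob`."""
    return 1 - (1 - annual_prob) ** years


# Point estimates quoted in the ALERT excerpt (ranges ignored).
ALERT_ESTIMATES = {1: 0.009, 5: 0.13, 10: 0.25}

if __name__ == "__main__":
    for years, prob in ALERT_ESTIMATES.items():
        rate = implied_annual_rate(prob, years)
        print(f"{years:>2}-year estimate {prob:.1%} -> {rate:.2%} per year")

    # What the 1-year figure alone would predict if it stayed flat for a decade.
    flat = cumulative_from_annual(ALERT_ESTIMATES[1], 10)
    print(f"1-year rate held flat for 10 years -> {flat:.1%}")

    # Chance of the event in years 6-10, given it did not happen in years 1-5.
    later = 1 - (1 - ALERT_ESTIMATES[10]) / (1 - ALERT_ESTIMATES[5])
    print(f"Implied chance in years 6-10 given none in years 1-5 -> {later:.1%}")
```

Under these assumptions, the 5- and 10-year figures each imply roughly 2.7% to 2.8% per year, about three times the 0.9% one-year figure, while holding 0.9% flat for a decade gives only about 8.6%. The implied chance for years 6-10 (about 13.8%) is close to the 13% for years 1-5, so most of the gap comes from the team expecting the annual risk to rise after the first year, not from the second five years being unusually risky.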
Not sure how best to post these two thoughts, so I might as well combine them.

In an ideal world, SBF should have been sentenced to thousands of years in prison. This is partially due to the enormous harm done to both FTX depositors and EA, but mainly for basic deterrence reasons: a risk-neutral person will not mind 25 years in prison if the ex ante upside was becoming a trillionaire.

However, I also think many lessons from SBF's personal statements (e.g., his interview on 80k) are still as valid as ever. Just off the top of my head:

* Startup-to-give as a high-EV career path. Entrepreneurship is why we have OP and SFF! Perhaps also the importance of keeping as much equity as possible, although in the process one should not lie to investors or employees more than is standard.
* Ambition and working really hard as success multipliers in entrepreneurship.
* A career decision algorithm that includes doing a BOTEC and rejecting options that are 10x worse than others.
* It is probably okay to work in an industry that is slightly bad for the world if you do lots of good by donating. [1] (But fraud is still bad, of course.)

Just because SBF stole billions of dollars does not mean he has fewer virtuous personality traits than the average person. He hits at least as many multipliers as the average reader of this forum. But importantly, maximization is perilous; some particular qualities like integrity and good decision-making are absolutely essential, and if you lack them, your impact could be multiplied by minus 20.

[1] The unregulated nature of crypto may have allowed the FTX fraud, but things like the zero-sum, zero-NPV nature of many cryptoassets, or its negative climate impacts, seem unrelated. Many industries are about this bad for the world, like HFT or some kinds of social media. I do not think people who criticized FTX on these grounds score many points.

However, perhaps it was (weak) evidence towards FTX being willing to do harm in general for a perceived greater good, which is maybe plausible, especially if Ben Delo also did market manipulation or otherwise acted immorally. Also note that in the interview, SBF didn't claim his donations offset a negative direct impact; he said the impact was likely positive, which seems dubious.
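The "reject options that are 10x worse than others" rule from the list of lessons in the quick take above can be sketched as a simple filter over back-of-the-envelope estimates. This is a hypothetical illustration, not a procedure described in the interview: the function name, option names, and values are made up, and the ratio test assumes positive estimates, so options with non-positive estimated value (like the "minus 20" case) are dropped before the comparison.

```python
def shortlist(estimates: dict[str, float], ratio: float = 10.0) -> list[str]:
    """Keep options that are less than `ratio` times worse than the best option.

    Hypothetical sketch: options with non-positive BOTEC values are dropped
    outright, because a ratio comparison is meaningless for them.
    """
    positive = {name: value for name, value in estimates.items() if value > 0}
    if not positive:
        return []
    best = max(positive.values())
    # Reject an option once it is `ratio` times (or more) worse than the best.
    return [name for name, value in positive.items() if value * ratio > best]


# Hypothetical options with made-up BOTEC values (arbitrary units of impact).
options = {"option_a": 100.0, "option_b": 30.0, "option_c": 5.0, "option_d": -20.0}
print(shortlist(options))  # ['option_a', 'option_b']
```

A hard cutoff like this is deliberately crude: it only prunes options that are clearly dominated, leaving the final choice among the survivors to more careful comparison.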