Working (0-5 years experience)
213Norfolk, VA 23517, USAJoined Mar 2022


I'm an artist, writer, and human being.

To be a little more precise: I make video games, edit Wikipedia, and write here and on LessWrong!


This has created a potentially dangerous mismatch in public perception between what the more serious AI safety researchers think they're doing (e.g. reducing X risk from AGI), and what the public thinks AI safety is doing (e.g. developing methods to automate partisan censorship, to embed woke values into AI systems, and to create new methods for mass-customized propaganda).

This is the crux of the problem, yes. I don’t think this is because of a “conservative vs liberal” political rift though; the left is just as frustrated by, say, censorship of sex education or queer topics as the right may be upset by censorship of “non-woke” discussion—what matters is that the particular triggers for people on what is appropriate or not to censor are extremely varied, both across populations and across time. I don’t think it’s necessary to bring politics into this as an explanatory factor (though it may of course exacerbate existing tension).

Yes, the consequences are probably less severe in this context, which is why I wouldn't consider this a particularly strong argument. Imo, it's more important to understand this line of thinking for the purpose of modeling outsider's reactions to potential censorship, as this seems to be how people irl are responding to OpenAI, et al's policy decisions.

I would also like to emphasize again that sometimes regulation is necessary, and I am not against it on principle, though I do believe it should be used with caution; this post is critiquing the details of how we are implementing censorship in large models, not so much its use in the first place.

There's nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea.

The argument in that section was not actually  an object-level one, but rather an argument from history and folk deontological philosophy (in the sense that "censorship is bad" is a useful, if not perfect, heuristic used in most modern Western societies). Nonetheless, here's a few  reasons why what you mentioned could be a bad idea: Goodhart's law, the Scunthorpe Problem, and the general tendency for unintended side effects. We can't directly measure "diversity" or assign an exact "violence level" to a piece of text or media (at least not without a lot more context which we may not always have), so instead any automated censorship program is forced to use proxies for toxicity instead.To give a real-world and slightly silly example, TikTok's content filters have led to almost all transcriptions of curse words and sensitive topics to be replaced with some similar-sounding but unrelated words, which in turn has spawned a new form of internet "algospeak." (I highly recommend reading the linked article if you have the time) This was never the intention of the censors, but people adopted to optimize for the proxy by changing their dialect instead of their content actually becoming any less toxic. On a darker note, this also had a really bad side effect where videos about vital-but-sensitive topics such as sex education, pandemic preparedness, war coverage, etc. became much harder to find and understand (to outsiders) as a result. Instead of increasing diversity, well-meaning censorship can lead to further breakdowns in communication surprisingly often.

Came across this post today—I assume the bounty has been long-closed by now?

Thanks, I think I somehow missed some of those!

Thanks for the clarification! I might try to do something on the Orthogonality thesis if I get the chance, since I think that tends to be glossed over in a lot of popular introductions.

Answer by YitzAug 05, 20223

My perspective on the issue is that by accepting the wager, you are likely to become far less effective at achieving your terminal goals, (since even if you can discount higher-probability wagers, there will eventually be a lower-probability one that you won’t be able to think your way out of and thus have to entertain on principle), and become vulnerable to adversarial attacks, leading to actions which in the vast majority of possible universes are losing moves. If your epistemics require that you spend all your money on projects that will, for all intents and purposes do nothing (and which if universally followed would lead to a clearly dystopian world where only muggers get money), then I’d wager that the epistemics are the problem. Rationalists, and EAs, should play to win, and not fall prey to obvious basilisks of our own making.

Question—is $20,000 awarded to every entry which qualifies under the rules, or is there one winner selected among the pool of all who submit an entry?

This is really exciting! I’m glad there are so many talented people on the case, and hope the good news will only grow from here :)

I strongly agree with you on points one and two, though I’m not super confident on three. For me the biggest takeaway is we should be putting more effort into attempts to instill “false” beliefs which are safety-promoting and self-stable.

Load More