Did any of the boosters of real-money prediction markets correctly predict that prediction market platforms would be quickly dominated by thinly disguised sports gambling?
(I mean this question literally and earnestly, not as a snide takedown of prediction markets or their proponents.)
Below are (lightly edited) excerpts from a draft research report at Forethought I wrote about AI (super)persuasion. I decided this section didn't make sense to include in an "intro to superpersuasion" article [1], but I think it's an interesting and potentially important subquestion that other people might find valuable to model as well.
Could superpersuasion be relevant to AIs?
That is, could some of our superpersuasion worries also apply to AIs persuading other AIs, or humans trying to use AIs to persuade AIs?
I’ll answer this with a firm maybe!
On targeted persuasion
Reasons you might think this is not a real worry:
1. Right now, if you want to get an AI to do something and you have control over its inputs, jailbreaks are a much more effective way to manipulate it than human-style manipulation and persuasion.
2. AIs think pretty differently from us, and their internals are structured very differently.
3. By the time we reach superintelligence, AIs will have a pretty good sense of how to counteract these worries, or we’re probably pretty effed anyway.
But to dampen those objections, we just need a conjunction of a) our solutions to jailbreaks don't work on "normal" persuasion (this seems reasonable enough to me; jailbreak defense is likely to be a combination of fairly specific techniques like adversarial training and dedicated classifiers), b) AIs are persuaded, very loosely speaking, by the same things we're persuaded by (seems right to me; in the literature they even exhibit similar cognitive biases), and c) we develop AI superpersuasion before general superintelligence. A plausible enough conjunction!
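To make the "plausible enough conjunction" point concrete: even if each condition is only moderately likely, the joint probability can stay non-trivial. A minimal sketch, where the individual probabilities are purely illustrative placeholders (not estimates from the report) and independence is assumed as a simplification:

```python
# Purely illustrative probabilities for the three conditions; not the author's estimates.
p_a = 0.7  # (a) jailbreak defenses don't also cover "normal" persuasion
p_b = 0.6  # (b) AIs are persuaded by roughly the same things humans are
p_c = 0.5  # (c) AI superpersuasion arrives before general superintelligence

# Treating the conditions as independent (a simplification),
# the probability of the full conjunction is the product.
p_conjunction = p_a * p_b * p_c
print(f"P(a and b and c) ~ {p_conjunction:.2f}")  # → 0.21
```

The point of the sketch is just that a conjunction of three individually plausible conditions doesn't automatically shrink to negligibility; under these placeholder numbers the worry retains roughly a one-in-five chance of applying.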
On memetic search
On the memetic search side, AIs probably aren’t going to be caught up in the same memetic fervors as us, and are likely more immune in general. On the other hand, there are a few distinct reasons to be more worried:
1. Foundation model AIs are much more similar to each other than we are to each other (at least today).
The UK is set to pass a law that bans the sale of tobacco to anyone born after 2008. Once the king signs it into law, the UK will become the second country in the world to introduce a generational smoking ban, after the Maldives did so last November. (New Zealand also considered such a ban a few years ago, but did not go through with it.)
I wrote a quick post about why I think people committed to working on ASI+animals should be making sure we don't spread wild animal suffering throughout the universe.
Full post here: https://naiveconsequentialism.substack.com/p/dont-green-the-universe
People have pretty different background expectations about what the most relevant or worrying kind of AI misalignment/takeover/... scenario would look like. This also corresponds to different views on when they expect signs of it to be visible (such that not seeing those signs, or seeing something else, would update them). Among other issues, I think this confuses discussion around whether (e.g.) "alignment is easy" or how we should be updating.[1]
My brain likes pictures, so I've found it useful to tag different views and discussions via the following diagrams (these are pretty "raw"/not-distilled):
1)
2) And a second one, roughly:
how safe systems at a given capability level appear vs. where they actually are on the path or spectrum to the kind of safety we care about[2].
(This also has some more notes on how people might relate differently to the same results / evidence.)
These are very messy sketches! I'm sharing because I made a hacky commitment to post short things and in case it's useful for someone (or in case a comment helps clarify things for me, which I'd definitely appreciate). There's some chance that I'll clean these up and update them later.
1. ^
Related: "conflationary alliances" (see also a post with a version of this dynamic about "charity" on the Forum)
2. ^
Again, a huge part of the problem/confusion seems to be that this is a very underdetermined term; see footnote above. Also feels sort of related to things I wrote about here
(although I'm guessing it's also partly because this is a basically unedited sketch; I first drew it because a similar image had come to mind in a variety of contexts, and I wanted a version I could adapt as needed - i.e., it was meant to be flexible. If I were making a v2, I'd probably want to commit more, though.)
Read the Better Futures series here, and discuss it here, all week.