
"In times of change, learners inherit the earth, while the learned find themselves beautifully equipped to deal with a world that no longer exists"

—Eric Hoffer

 

Summary:

We are already in the timeline where the research and manipulation of the human thought process is widespread. SOTA psychological research systems require massive amounts of human behavior data, which in turn requires massive numbers of unsuspecting test subjects (users), in order to automate the process of analyzing and exploiting human targets. This must therefore happen covertly, and both the US and China have strong track records of doing things like this. The outcome is a strong attractor state, since anyone with enough data can do it, and it naturally follows that powerful organizations would deny others access, e.g. via data poisoning. Most people are already being persuaded that this is harmless, even though it is obviously, ludicrously dangerous. We are therefore probably already in a hazardously transformative world, and must take standard precautions immediately.

 

This should not distract people from AI safety; it matters because the AI safety community must survive. The problem connects to the AI safety community in the following way:

State survival and war power ==> already depend on information warfare capabilities.

Information warfare capabilities ==> already depend on SOTA psychological research systems.

SOTA psychological research systems ==> already improve and scale mainly from AI capabilities research, with diminishing returns on everything else.[1]

AI capabilities research ==> already under siege from the AI safety community.

Therefore, the reason why this might be such a big concern is:

State survival and war power ==> their toes potentially already being stepped on by the AI safety community?

Although people with access to SOTA psychological research systems are probably very good at intimidation and bluffing, the AI safety community still needs to get a better handle on the situation if we are in the bad timeline; and the math indicates that we are already well past that point.

 

The Fundamental Problem

If there were intelligent aliens, made of bundles of tentacles or crystals or plants that think incredibly slowly, their minds would also have discoverable exploits/zero days, because any mind that evolved naturally would probably be like the human brain, a kludge of spaghetti code that is operating outside of its intended environment.

They would probably not even begin to scratch the surface of finding and labeling those exploits until, like human civilization today, they began surrounding thousands or millions of their kind with sensors that record behavior for several hours a day and find webs of correlations.

In the case of humans, the use of social media as a controlled environment for automated AI-powered experimentation appears to be what created that critical mass of human behavior data. Current 2020s capabilities for psychological research and manipulation vastly exceed the 20th century academic psychology paradigm. 

The 20th-century academic psychology paradigm still dominates our cultural impression of what it means to research the human mind; but when the effectiveness of psychological research and manipulation starts increasing by an order of magnitude every 4 years, it is time to stop mentally living in a world that was stabilized by the fact that manipulation attempts generally failed.

The capabilities of social media to steer human outcomes are not advancing in isolation; they parallel a broad acceleration in the understanding and exploitation of the human mind, which is itself a byproduct of accelerating AI capabilities research.

By comparing people to other people and predicting traits and future behavior, multi-armed bandit algorithms can predict whether a specific research experiment or manipulation strategy is worth the risk of undertaking in the first place, resulting in large numbers of success cases and a low detection rate (as detection would likely yield a highly measurable response, particularly with substantial sensor exposure).
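To make the shape of that logic concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the arm names, detection probabilities, and penalty weight are invented, and nothing here is from a real platform. Thompson sampling picks the most promising experiment, and abstains entirely when no arm beats the risk penalty.

```python
# Illustrative sketch of a risk-gated multi-armed bandit; all names
# and numbers are hypothetical.
import random

class ExperimentArm:
    def __init__(self, name, detection_risk):
        self.name = name
        self.detection_risk = detection_risk  # predicted P(target notices)
        self.successes, self.failures = 1, 1  # Beta(1, 1) prior

    def sample_success_rate(self):
        # Thompson sampling: draw a plausible success rate from the posterior.
        return random.betavariate(self.successes, self.failures)

    def record(self, success):
        # Update the posterior after observing the experiment's outcome.
        if success:
            self.successes += 1
        else:
            self.failures += 1

def choose_experiment(arms, detection_penalty=10.0):
    # Expected utility = sampled success rate minus the detection penalty.
    # Returning None means no experiment is worth the risk on this target.
    best, best_score = None, 0.0
    for arm in arms:
        score = arm.sample_success_rate() - detection_penalty * arm.detection_risk
        if score > best_score:
            best, best_score = arm, score
    return best

arms = [ExperimentArm("reorder_feed", 0.01), ExperimentArm("inject_topic", 0.05)]
chosen = choose_experiment(arms)
print("run:", chosen.name if chosen else "nothing -- risk too high")
```

The abstention branch is the point of the paragraph above: strategies that might be noticed simply never get run, which is what keeps the detection rate low.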

When you have sample sizes of billions of hours of human behavior and sensor data, millisecond differences in reactions from different kinds of people (e.g. facial microexpressions, millisecond differences when scrolling past posts covering different concepts, heart rate changes after encountering different concepts, eye-tracking differences after the eyes pass over specific concepts, touchscreen data, etc.) transform from imperceptible noise into the foundation of webs of correlations mapping the human mind.

Unfortunately, the movement of scrolling past a piece of information on a social media news feed with a mouse wheel or touchscreen generates at least one curve, since the finger accelerates and decelerates each time. Trillions of those curves are output each day by billions of people. These curves are ready-made linear algebra, the perfect shape to plug into ML.
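As a toy illustration of why these curves plug so directly into ML (the gesture below is simulated and the function name is hypothetical; a real pipeline would be far more elaborate): differentiate the position trace into a velocity curve, then resample it onto a fixed grid so every gesture, fast or slow, becomes a feature vector of the same length.

```python
# Toy featurization of a scroll gesture; data is simulated.
import numpy as np

def scroll_curve_features(timestamps_ms, positions_px, n_points=16):
    t = np.asarray(timestamps_ms, dtype=float)
    x = np.asarray(positions_px, dtype=float)
    velocity = np.gradient(x, t)  # px per ms; captures accelerate/decelerate
    grid = np.linspace(t[0], t[-1], n_points)
    return np.interp(grid, t, velocity)  # fixed-length vector for any model

# One simulated 300 ms flick past a post: smooth acceleration, then braking.
ts = np.arange(0, 301, 20)
xs = 1000 * (1 - np.cos(np.pi * ts / 300)) / 2
print(scroll_curve_features(ts, xs))  # shape (16,), ready for ML
```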

Social media's individualized targeting uses deep learning and massive samples of human behavioral data to procedurally generate an experience that fits the human mind like a glove, in ways we don't fully understand, but which allow hackers incredible leeway to optimize for steering people's thinking or behavior, insofar as those directions are measurable. This is something that AI can easily automate. I originally thought that going deeper into human thought reading/interpretability was impossible, but I was wrong; the human race is already well past that point as well, due to causal inference.
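Schematically, and purely as an assumed linear stand-in for what would really be a deep recommender (the embeddings below are random placeholders), the targeting loop amounts to ranking candidate posts by fit to a learned user embedding, plus a bonus term for whatever measurable direction the optimizer is steering toward:

```python
# Schematic linear stand-in for a deep recommender; all embeddings
# are random placeholders, not learned values.
import numpy as np

rng = np.random.default_rng(0)
user_embedding = rng.normal(size=8)          # stands in for learned user state
post_embeddings = rng.normal(size=(100, 8))  # candidate posts
steer_direction = rng.normal(size=8)         # direction some metric rewards

def build_feed(posts, user, steer, steer_weight=0.3, top_k=10):
    engagement = posts @ user                # "fits the mind like a glove" term
    steering = posts @ steer                 # measurable-direction term
    return np.argsort(-(engagement + steer_weight * steering))[:top_k]

print(build_feed(post_embeddings, user_embedding, steer_direction))
```

The design point is the steer_weight knob: the feed still looks personalized, because it mostly is, while a small weighted term quietly biases it in a chosen direction.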

Most people see social media influence as something that happens to people lower on the food chain, but this is no longer true. There is at least one technique, Clown Attacks, that generally works on all humans who aren't already aware of it, regardless of intelligence or commitment to truthseeking; and that particular technique became discoverable with systems vastly weaker than the ones that exist today. I don't know what manipulation strategies the current systems can find, but I can predict with great confidence that they're well beyond the clown attack.

First it started working on 60% of people, and I didn’t speak up, because my mind wasn’t as predictable as people in that 60%. Then, it started working on 90% of people, and I didn’t speak up, because my mind wasn’t as predictable as the people in that 90%. Then it started working on me. And by then, it was already too late, because it was already working on me.

 

They would notice and pursue these capabilities

If the big 5 tech companies (Facebook, Google, Microsoft, Amazon, and Apple) have noticed ways to steer people towards buying specific products, to instill a wide variety of compulsions against quitting the platform, and to prevent or counteract other platforms running multi-armed bandit algorithms that automatically discover strategies (e.g. combinations of posts) to plunder their users, then you can naturally assume that they've noticed their capabilities to steer people in a wide variety of other directions as well.

The problem is that major governments and militaries are overwhelmingly incentivized and well-positioned to exploit those capabilities for offensive and defensive information warfare; if American companies abstain from manipulation capabilities while Chinese companies naturally don't, then American intelligence agencies will worry about a gap in information warfare capabilities and push the issue.

Government and military agencies are bottlenecked by competence, which is difficult to measure due to a lack of transparency at higher levels and high employee turnover at lower levels; but revolving-door employment easily allows them to source flexible talent from the talent pools of the big 5 tech companies. This practice is endangered by information warfare itself, further driving interest in information warfare superiority.

Access to tech company talent pools also determines the capability of intelligence agencies to use the OS exploits and chip firmware exploits needed to access the sensors in the devices of almost any American, not just the majority of Americans who leave various sensor permissions on. This allows even greater access to the psychological research needed to compromise critical elites such as the AI safety community.

The capabilities of these systems to run deep Bayesian analysis on targets dramatically improve with more sensor data, particularly video of the face and eyes; this is not necessary to run interpretability on the human brain, but it increases these capabilities by yet more orders of magnitude (particularly for lie detection and conversation-topic aversion).
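For intuition, here is a deliberately tiny sketch of the Bayesian core of such analysis. The cue likelihoods below are invented numbers, not measurements: a belief about a hidden binary trait is updated from a stream of noisy sensor cues, each individually weak.

```python
# Tiny sketch of repeated Bayesian updating on a hidden binary trait;
# all likelihood values are invented for illustration.
def update(prior, p_cue_if_averse, p_cue_if_not):
    # Bayes' rule for a binary hypothesis.
    joint_averse = prior * p_cue_if_averse
    joint_not = (1 - prior) * p_cue_if_not
    return joint_averse / (joint_averse + joint_not)

belief = 0.10  # prior P(target is averse to some conversation topic)
cues = [
    (0.60, 0.30),  # scrolls faster past the topic
    (0.50, 0.25),  # gaze breaks away early
    (0.40, 0.20),  # heart rate ticks up on mention
]
for p_averse, p_not in cues:
    belief = update(belief, p_averse, p_not)
print(f"posterior P(averse) = {belief:.2f}")  # ~0.47 after three weak cues
```

Each cue alone proves nothing; the compounding is what more sensor channels buy.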

 

This can't go on

There isn't much point in having a utility function in the first place if hackers can change it at any time. There might be parts that are resistant to change, but it's easy to overestimate yourself on this, especially if the effectiveness of SOTA influence systems has been increasing by an order of magnitude every 4 years. Your brain has internal conflicts, and they have human internal-causality interpretability systems. You get constant access to the surface; they get occasional access to the deep.

The multi-armed bandit algorithm will keep trying until it finds something that works. The human brain is a kludge of spaghetti code, so there’s probably something somewhere. 

The human brain has exploits, and the capability and cost for social media platforms to use massive amounts of human behavior data to find complex social engineering techniques is a profoundly technical matter; you can't get a handle on this with intuition or pre-2010s historical precedent.

Thus, you should assume that your utility function and values are at risk of being hacked at an unknown time, and should therefore assign them a discount rate to account for that risk over the course of several years.
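One way to make that concrete, under a toy model that is my own framing rather than anything stated above: suppose a constant per-year hazard h that your values get compromised. Then the expected worth of a stream of value shrinks geometrically, which behaves like an annual discount rate of roughly h:

```latex
% Assumed toy model: constant per-year hazard h of compromise.
\[
  \mathbb{E}[V] \;=\; \sum_{t=0}^{\infty} v_t \,(1-h)^t,
  \qquad
  r \;=\; \frac{h}{1-h} \;\approx\; h \ \text{ for small } h.
\]
% Believing the discount rate is "something like zero" is therefore
% believing the yearly hazard of compromise is itself near zero.
```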

Slow takeoff over the next 10 years alone guarantees that, in reality, this discount rate is too high for people in the AI safety community to go on believing that it is something like zero.

I think that approaching zero is a reasonable target, but not with the current state of affairs, where people don't even bother to cover up their webcams, have important and sensitive conversations about the fate of the earth in rooms with smartphones, sleep in the same room (or even the same bed) as a smartphone, and use social media for nearly an hour a day (scrolling past nearly a thousand posts).

The discount rate in this environment cannot be considered "reasonably" close to zero when the attack surface is this massive and the world is changing this quickly. We are gradually becoming intractably compromised by slow takeoff before the party even starts.

If people have anything they value at all, and the AI safety community almost certainly does, then the current AI safety paradigm of zero effort is wildly inappropriate; it is basically total submission to invisible hackers. The human brain has low-hanging exploits and the structures of power have already been set in motion, so we must take the standard precautions immediately.

The AI safety community must survive the storm.

  1. ^

    e.g. Facebook could hire more psychologists to label data and correlations, but that carries a greater risk of one of them breaking their NDA and trying to leak Facebook's manipulation capabilities to the press, as Snowden and Fishback did. Stronger AI means more ways to make users label their own data.
