IvanVendrov

Comments

Aligning Recommender Systems as Cause Area

Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative. I've come around to this view in the last year. In retrospect I should have labelled my uncertainty here better and featured this evidence less prominently in the article, since, as you noticed, it's not really a crux of the cause prioritization analysis. I will update the post with this in mind.

"If there isn't a clear problem you're going to have huge sign uncertainty on the impact of any given change"

This is closer to a crux. I think there are a number of concrete changes, like optimizing for the user's deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there's no "problem" attributable to recommender systems per se. They would be positive both in direct effects and in flow-through effects, namely learning what kinds of human-AI interaction protocols lead to good outcomes.

From your Alignment Forum comment,

The core feature of AI alignment is that the AI system deliberately and intentionally does things, and creates plans in new situations that you hadn't seen before, which is not the case with recommender systems.

This seems like the real crux. I'm not sure exactly how you define "deliberately and intentionally", but recommenders trained with RL (a small but increasing fraction) are definitely capable of generating and executing complex, novel sequences of actions towards an objective. Moreover, they are deployed in a dynamic world and so routinely encounter new situations (unlike the toy environments more commonly used for AI alignment research).

A Viral License for AI Safety

This is a helpful counterpoint. From big tech companies' perspective, I think that GPL (and especially AGPL) is close to the worst-case scenario, since it destroys the ability to have proprietary software and can pose an existential risk to the company by empowering its competitors. Most of the specific clauses we discuss are not nearly so dangerous: at most they impose some small overhead on using or releasing the code. Corrigibility is the only clause that I can see being comparably dangerous: depending on the mechanism used to create future versions of the license, companies may feel they are giving too much control over their future to a third party.

Aligning Recommender Systems as Cause Area

Agreed, that's an important distinction. I just assumed that if you make an aligned system, it will become trusted by users, but that's not at all obvious.

Aligning Recommender Systems as Cause Area

My mental model of why Facebook doesn't have "turn off inflammatory political news" and similar switches is that 99% of their users never toggle any such switches, so the feature won't affect any of the metrics they track, so no engineer or product manager has an incentive to add it. Why won't users toggle the switches? Part of it is laziness, but mostly I think users don't trust that the system will faithfully give them what they want based on a single short description like "inflammatory political news". What if they miss out on an important national story? What if a close friend shares a story with them and they don't see it? What if their favorite comedian gets classified as inflammatory and filtered out?

As additional evidence that we're more bottlenecked by research than by incentives, consider Twitter's call for research to measure the "health" of Twitter conversations, and Facebook's decision to demote news content. I believe that if you gave most companies a robust and well-validated metric (analogous to differential privacy) for alignment with user value, they would start optimizing for it even at the cost of some short-term growth or revenue.

The monopoly point is interesting. I don't think existing recommender systems are well modelled as monopolies; they certainly behave as if they are in a life-and-death struggle with each other, probably because their fundamental product is "ways to occupy your time" and that market is extremely competitive. But a monopoly might actually be better because it wouldn't have the current race to the bottom in pursuit of monetisable eyeballs.

Aligning Recommender Systems as Cause Area

The first two links are identical; was that your intention?

Thanks for the catch - fixed.

Aligning Recommender Systems as Cause Area

Definitely the latter. Though I would frame it more optimistically as "better alignment of recommender systems seems important, there are a lot of plausible solutions out there, let's prioritize them and try out the few most promising ones". Actually doing that prioritization was out of scope for this post, but it is definitely something we want to do, and we are looking for collaborators on it.

Aligning Recommender Systems as Cause Area

To my mind they are fully complementary: Iterated Amplification is a general scheme for AI alignment, whereas this post describes an application area where we could use and learn more about various alignment schemes. I personally think using amplification for aligning recommender systems is very much worth trying. It would have great direct positive effects if it worked, and the experiment would shed light on the viability of the scheme as a whole.