All of IvanVendrov's Comments + Replies

Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative; I've come around to this view in the last year. In retrospect I should have labelled my uncertainty better and featured it less prominently in the article, since it's not really a crux of the cause prioritization analysis, as you noticed. Will update the post with this in mind.

"If there isn't a clear problem you're going to have huge sign uncertainty on the impact of any given change"

This is closer to a crux. I think there are...

3
Rohin Shah
3y
Some illustrative hypotheticals of how these could go poorly:

* To optimize for deliberative retrospective judgment, you collect thousands of examples of such judgments, the most that is financially feasible. You train a reward model based on these examples and use that as your RL reward signal. Unfortunately this wasn't enough data and your reward model places high reward on very negative things it hasn't seen training data on (e.g. perhaps it strongly recommends posts encouraging people to commit suicide if they want to, because it thinks encouraging people to do things they want is good).
* Same situation, except the problem is that the examples you collected weren't representative of everyone who uses the recommender system, and so now the recommender system is nearly unusable for such people (e.g. the recommender system pushes away from "mindless fun", hurting the people who wanted mindless fun).
* Same situation, except people are really bad at deliberative retrospective judgments. E.g. they take out everything that was "unvirtuous fun", and due to the lack of fun people stop using the thing altogether. (Whether this is good or bad depends on whether the technology is net positive or net negative, but I tend to think this would be bad. Anyone I know who isn't hyper-focused on productivity, i.e. most of the people in the world, seems to either like or be neutral about these technologies.)
* You create a natural language interface. People use it to search for evidence that the outgroup is terrible (not deliberately; they think "wow, X is so bad, they do Y, I bet I could find tons of examples of that" and then they do, never seeking evidence in the other direction). Polarization increases dramatically, much more so than with the previous recommendation algorithm.
* You expose the internals of recommender systems. Lots of people find gender biases and so on and PR is terrible. Company is forced to ditch their recommender system and instead have nothing (since
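
A minimal sketch of the reward-model pipeline the first bullet describes, to make the failure mode concrete. Everything here is hypothetical: random vectors stand in for item embeddings, random scalars stand in for deliberative retrospective judgments, and a toy MLP stands in for a production reward model (assumes PyTorch).

```python
# Toy sketch: fit a reward model to a few thousand human judgments,
# then use it to score candidate recommendations as an RL-style reward.
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Thousands of examples ... the most that is financially feasible".
n_labelled, feat_dim = 2000, 32
item_features = torch.randn(n_labelled, feat_dim)  # stand-in for item embeddings
judgments = torch.randn(n_labelled)                # stand-in for human judgment scores

reward_model = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fit the reward model to the collected judgments.
for _ in range(200):
    pred = reward_model(item_features).squeeze(-1)
    loss = nn.functional.mse_loss(pred, judgments)
    opt.zero_grad(); loss.backward(); opt.step()

# Use the learned model as the reward signal: score and rank candidates.
candidates = torch.randn(500, feat_dim)
with torch.no_grad():
    in_dist_rewards = reward_model(candidates).squeeze(-1)
top_items = in_dist_rewards.topk(10).indices  # what the policy gets pushed toward

# The failure mode: far off-distribution items can still receive high
# predicted reward, because nothing in the data constrains the model there.
off_dist = 10 * torch.randn(500, feat_dim)
with torch.no_grad():
    print("max in-distribution reward: ", in_dist_rewards.max().item())
    print("max off-distribution reward:", reward_model(off_dist).squeeze(-1).max().item())
```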

This is a helpful counterpoint. From big tech companies' perspective, I think the GPL (and especially the AGPL) is close to the worst-case scenario, since it destroys the ability to have proprietary software and can pose an existential risk to the company by empowering its competitors. Most of the specific clauses we discuss are not nearly so dangerous - at most they impose some small overhead on using or releasing the code. Corrigibility is the only clause I can see being comparably dangerous: depending on the mechanism used to create future versions of the license, companies may feel they are giving too much control over their future to a third party.

3
Gavin
3y
I think I generalised too quickly in my comment; I saw "virality" and "any later version" and assumed the worst. But of course we can take into account AGPL backfiring when we design this licence! One nice side effect of even a toothless AI Safety Licence: it puts a reminder about safety at the top of every repo. Sure, no one reads licences (and people often ignore health and safety rules when they get in the way, even at their own risk). But maybe it makes things a bit more tangible, the way LICENSE.md gives law a foothold in the minds of devs.

Agreed that's an important distinction. I just assumed that if you make an aligned system, it will become trusted by users, but that's not at all obvious.

My mental model of why Facebook doesn't have "turn off inflammatory political news" and similar switches is that 99% of their users never toggle any such switches, so the feature won't affect any of the metrics they track, so no engineer or product manager has an incentive to add it. Why won't users toggle the switches? Part of it is laziness; but mostly I think users don't trust that the system will faithfully give them what they want based on a single short description like "inflammatory political news" - what if they miss out on an important national...
2
William_S
5y
I appreciate the point that they are competing for time (I was only thinking of monopolies over content). If the reason the feature isn't used is that users don't "trust that the system will give what they want given a single short description", then part of the research agenda for aligned recommender systems is not just producing systems that are aligned, but systems whose users have a greater degree of justified trust that they are aligned (placing more emphasis on the user's experience of interacting with the system). Some of this research could potentially take place with existing classification-based filters.
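
One hypothetical shape such an experiment could take, using nothing beyond an existing classification-based filter: the user gets an explicit toggle, and the system keeps an audit log of what it hid so the user can check that nothing important was filtered out - which is where the justified trust would have to come from. The keyword scorer below is only a stand-in for whatever classifier a platform already has; all names are illustrative.

```python
# Hypothetical sketch: a user-controlled filter backed by a text classifier,
# plus an audit log the user can inspect to build (or withhold) trust.
from dataclasses import dataclass, field

def inflammatory_score(text: str) -> float:
    """Stand-in classifier: crude keyword score in [0, 1]."""
    keywords = ("outrage", "disgrace", "destroyed", "slams")
    hits = sum(word in text.lower() for word in keywords)
    return min(1.0, hits / 2)

@dataclass
class FilteredFeed:
    enabled: bool = False                        # the user's toggle
    threshold: float = 0.5
    removed: list = field(default_factory=list)  # audit log the user can review

    def apply(self, posts):
        if not self.enabled:
            return posts
        kept = []
        for post in posts:
            if inflammatory_score(post) >= self.threshold:
                self.removed.append(post)        # hidden, but recorded
            else:
                kept.append(post)
        return kept

feed = FilteredFeed(enabled=True)
shown = feed.apply([
    "Candidate slams rival in night of outrage and disgrace",
    "Photos from the weekend hike",
])
print(shown)         # only the second post is shown
print(feed.removed)  # the user can verify nothing important was missed
```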

The first two links are identical; was that your intention?

Thanks for the catch - fixed.

Definitely the latter. Though I would frame it more optimistically as "better alignment of recommender systems seems important, there are a lot of plausible solutions out there, let's prioritize them and try out the few most promising ones". Actually doing that prioritization was out of scope for this post but definitely something we want to do - and are looking for collaborators on.

To my mind they are fully complementary: Iterated Amplification is a general scheme for AI alignment, whereas this post describes an application area where we could use and learn more about various alignment schemes. I personally think using amplification for aligning recommender systems is very much worth trying. It would have great direct positive effects if it worked, and the experiment would shed light on the viability of the scheme as a whole.

6
Milan_Griffes
5y
Thanks. I guess I'm fuzzy on what your actual research proposal is. Are you proposing to implement an Iterated Amplification approach on existing recommender systems? Or are you more agnostic about specific implementations? ("Hey, better alignment of recommender systems seems important, but we don't yet know what to do about that specifically.")