All of Charlie Steiner's Comments + Replies

I think the intersection with recommender algorithms - both in terms of making them, and in terms of efforts to empower people in the face of them - is interesting.

Suppose you have an interface that interacts with a human user by recommending actions (often with a moral component) in reaction to prompting (voice input seems emotionally powerful here), and that builds up a model of the user over time (perhaps by collecting data about the user, much like every other app). How do you build this to empower the user rather than just reinforce their most predictable tendencies? And how do you avoid top-down bias pushed onto the user by the company / org making the app?

Thanks for your thorough response, and yeah, I'm broadly on board with all that. I think learning from detailed text behind decisions, not just the single-bit decision itself, is a great idea that can leverage a lot of recent work.

I don't think that using modern ML to create a model of legal text is directly promising from an alignment standpoint, but by holding out some of your dataset (e.g. a random sample, or all decisions about a specific topic, or all decisions later than 2021), you can test the generalization properties of the model, and more importa... (read more)
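For concreteness, here's a minimal sketch of the kind of held-out evaluation I have in mind (the `topic` and `year` fields and the 2021 cutoff are just placeholders for whatever metadata the legal dataset actually has):

```python
import random

def split_holdout(cases, mode="random", seed=0):
    """Split a list of case dicts into train/test sets to probe generalization.

    mode="random": hold out a random 10% sample.
    mode="topic":  hold out every case tagged with a chosen topic.
    mode="time":   hold out every case decided after a cutoff year.
    """
    rng = random.Random(seed)
    if mode == "random":
        shuffled = cases[:]
        rng.shuffle(shuffled)
        cut = int(0.9 * len(shuffled))
        return shuffled[:cut], shuffled[cut:]
    if mode == "topic":
        held_topic = "tax"  # placeholder topic
        train = [c for c in cases if c["topic"] != held_topic]
        test = [c for c in cases if c["topic"] == held_topic]
        return train, test
    if mode == "time":
        cutoff = 2021
        train = [c for c in cases if c["year"] <= cutoff]
        test = [c for c in cases if c["year"] > cutoff]
        return train, test
    raise ValueError(f"unknown mode: {mode}")

# Example: train on pre-2021 decisions and evaluate on later ones, to see
# whether the model's picture of legal reasoning transfers forward in time.
# train_cases, test_cases = split_holdout(all_cases, mode="time")
```

The time-based split seems like the most informative one here, since it asks whether the model's picture of legal reasoning transfers to decisions it has never seen.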

1 · johnjnay · 2y
Interesting. I will think more about the sandwiching approach between non-legal experts and legal experts.

Presumably you're aware of various Dylan Hadfield-Menell papers, e.g. https://dl.acm.org/doi/10.1145/3514094.3534130 and https://dl.acm.org/doi/10.1145/3306618.3314258

And of course Xuan's talk ( https://www.lesswrong.com/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems )

But, to be perfectly honest... I think there's part of this proposal that has merit, and part of this proposal that might sound good to many people but is actually bad.

First, the bad: The notion th... (read more)

4 · johnjnay · 2y
Hi Charlie, thank you for your comment. I cite many of Dylan's papers in the longer form version of this post: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031
I will check out Xuan's talk. Thanks for sharing that.

Instead of:

I could expand the statement to cover the larger project of what we are working on:

One of the primary goals of this research agenda is to teach AI to follow the spirit of the law in a human-recognizable way. This entails leveraging existing human capabilities for the "law-making" / "contract-drafting" part (how do we use the theory and practice of law about how to tell agents what to do?), and conducting research on building AI capabilities for the interpretation part (how do our machine learning processes use data and processes from the theory and practice of law about how agents interpret those directives / contracts?).

Reinforcement learning with human attorney feedback (there are more than 1.3 million lawyers in the US) via natural language interactions with AI models is potentially a powerful process to teach (through training, or fine-tuning, or extraction of templates for in-context prompting of large language models) statutory interpretation, argumentation, and case-based reasoning, which can then be applied more broadly for AI alignment. Models could be trained to assist human attorney evaluators, which theoretically, in partnership with the humans, could allow the combined human-AI evaluation team to have capabilities that surpass the legal understanding of the expert humans alone.

The Foundation Models in use today, e.g., GPT-3, have, effectively, conducted a form of behavioral cloning on a large portion of the Internet to leverage billions of human actions (through natural language expressions). It may be possible to, similarly, leverage billions of human legal data points to build Law Foundation Models through large-scale language model self-supervision on pre-processed legal text data.

Aspects of legal s
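As a purely illustrative sketch of the preference-modeling piece of such a pipeline (nothing here describes a system we have built; the embedding dimension and the pre-computed embeddings are placeholder assumptions), attorney feedback could be used to train a pairwise reward model roughly like this:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a candidate legal interpretation from its text embedding."""
    def __init__(self, embed_dim=768):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, embedding):              # embedding: (batch, embed_dim)
        return self.score(embedding).squeeze(-1)  # scalar score per candidate

def preference_loss(model, preferred_emb, rejected_emb):
    """Bradley-Terry style objective: the attorney-preferred interpretation
    should receive a higher score than the rejected one."""
    better = model(preferred_emb)
    worse = model(rejected_emb)
    return -torch.nn.functional.logsigmoid(better - worse).mean()

# Hypothetical usage, with pre-computed embeddings (shape: batch x 768) of two
# candidate interpretations of the same statute, where attorneys preferred the first:
# loss = preference_loss(reward_model, emb_preferred, emb_rejected)
# loss.backward(); optimizer.step()
```

The resulting reward model is what would then guide fine-tuning of the larger language model toward the interpretations attorneys endorse.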

One thing that confused me was the assumption at various points that the oracle is going to pay out the entire surplus generated. That'll get the most projects done, but it will have bad results because you'll have spent the entire surplus on charity yachts.

The oracle should be paying out what it takes to get projects done. I don't mean that in a labor-theory-of-value sense; I mean that if you are having trouble attracting projects, payouts should go up, and if you have lots of competition for funding, payouts should go down.
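Concretely, a toy version of the adjustment rule I mean, where the tunable knob is the fraction of estimated surplus paid out (the target application count and step size are made-up numbers, not recommendations):

```python
def adjust_payout_fraction(fraction, num_applicants, target=20, step=0.05):
    """Nudge the fraction of a project's surplus paid out (between 0 and 1).

    Too few applicants  -> the market is undershooting, raise payouts.
    Too many applicants -> funding is over-subscribed, lower payouts.
    """
    if num_applicants < target:
        fraction = min(1.0, fraction + step)
    elif num_applicants > target:
        fraction = max(0.0, fraction - step)
    return fraction

# Example: start by paying out half the estimated surplus and adapt each round.
# The fraction rises while applicants are scarce and falls once funding is
# over-subscribed.
payout_fraction = 0.5
for applicants_this_round in [8, 12, 25, 40, 31]:
    payout_fraction = adjust_payout_fraction(payout_fraction, applicants_this_round)
    print(round(payout_fraction, 2))
```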

This is actually a lot like a monopsony ... (read more)

4 · RyanCarey · 2y
I agree it's a problem for the entire surplus to go to the seller. But that problem isn't impactful people getting rich. It's that if the certs are too expensive there'll be too few buyers to clear the market. So I agree that payouts should probably be tuneable. If you want the actual impact of a project to still be known, then you could have the "percent of impact purchased" be the tuneable parameter, if buyers aren't too sensitive to it.

I was already strongly considering moving to Boston, so this makes me feel lucky :)

Neat! Sadly I can't interact with the grants.futureoflife.org webpage yet because my "join the community" application is still sitting around.

1 · ggilgallon · 2y
Thanks for letting us know. We're looking into it!

I think moderated video calls are my favorite format, as boring as that is. I.e. you have a speaker and also a moderator who picks people to ask questions, cuts people off or prompts them to keep talking depending on their judgment, etc.

Another thing I like, if it seems like people are interested in talking about multiple different things after the main talk / Q&A / discussion, is splitting up the discussion into multiple rooms by topic. I think Discord is a good application for this. Zoom is pretty bad at this but can be cajoled into having the right functionality if you make everyone a co-host. I think Microsoft Teams is fine but other people have problems, and other people think GatherTown is fine but I have problems.

I'm curious about your takes on the value-inverted versions of the repugnant and very-repugnant conclusions. It's easy to "make sense" of a preference (e.g. for positive experiences) by deciding not to care about it after all, but doing that doesn't actually resolve the weirdness in our feelings about aggregation.

Once you let go of trying to reduce people to a 1-dimensional value first and then aggregate them second, as you seem to be advocating here in ss. 3/4, I don't see why we should try to hold onto simple rules like "minimize this one simple thing." ... (read more)
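To spell out the structure I'm gesturing at (my notation, not anything from the post): the "reduce then aggregate" approach assumes the overall ranking factors as

```latex
W(p_1, \dots, p_n) \;=\; F\bigl(v(p_1), \dots, v(p_n)\bigr), \qquad v : \text{person} \to \mathbb{R}
```

where $v$ first collapses each person to a single real number and $F$ (a sum, a minimum, etc.) then aggregates those numbers. Dropping that factorization means letting $W$ rank whole populations directly, at which point "minimize this one simple thing" stops being the only natural shape for a rule.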

2 · Teo Ajantaival · 2y
Thanks! I’m not sure what exactly they are. If either of them means to “replace a few extremely miserable lives with many, almost perfectly untroubled ones”, then it does not sound repugnant to me. But maybe you meant something else. (Perhaps see also these comments about adding slightly less miserable people to hell to reduce the most extreme suffering therein, which seems, to me at least, to result in an overall more preferable population when repeated multiple times.)

Did you mean

1. the subjective preference, of the “lives worth living” themselves, to have positive experiences, or
2. the preference of an outside observer, who is looking at the population comparison diagrams, to count those lives as having isolated positive value?

If 1, then I would note that e.g. the antifrustrationist and tranquilist accounts would care about that subjective preference, as they would see it as a kind of dissatisfaction with the preferrer’s current situation. Yet when we are looking at only causally isolated lives, these views, like all minimalist views, would say that there is no need to create dissatisfied (or even perfectly fulfilled) beings for their own sake in the first place. (In other words, creating and fulfilling a need is only a roundabout way to not having the need in the first place, unless we also consider the positive roles of this process for other needs, which we arguably should do in the practical world.)

If 2, then I’d be eager to understand what seems to be missing with the previous “need-based” account.

(I agree that the above points are unrelated to how to aggregate e.g. small needs vs. extreme needs. But in a world with extreme pains, I might e.g. deprioritize any amount of isolated small pains, i.e. small pains that do not increase the risk of extreme pains nor constitute a distraction or opportunity cost for alleviating extreme pains. Perhaps one could intuitively think of this as making “the expected amount of extreme pains” the common curren

Academics choose to work on things when they're doable, important, interesting, publishable, and fundable. Importance and interestingness seem to be the least bottlenecked parts of that list.

The root of the problem is difficulty in evaluating the quality of work. There's no public benchmark for AI safety that people really believe in (nor do I think there can be yet - AI safety is still pre-paradigmatic), so evaluating the quality of work actually requires trusted experts sitting down and thinking hard about a paper - much harder than... (read more)

That's a good point. I'm a little worried that coarse-grained metrics like "% unemployment" or "average productivity of labor vs. capital" could fail to track AI progress if AI increases the productivity of labor. But we could pick specific tasks like making a pencil, etc. and ask "how many hours of human labor did it take to make a pencil this year?" This might be hard for diverse task categories like writing a new piece of software though.
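As a toy illustration of that task-specific metric (all numbers invented):

```python
# Hypothetical yearly estimates of human labor-hours needed per unit of a
# fixed, well-specified task (e.g. "produce one pencil end to end").
labor_hours_per_pencil = {2020: 0.050, 2021: 0.048, 2022: 0.041, 2023: 0.030}

def automation_progress(series):
    """Year-over-year fractional reduction in human labor per unit."""
    years = sorted(series)
    return {
        year: 1 - series[year] / series[prev]
        for prev, year in zip(years, years[1:])
    }

print(automation_progress(labor_hours_per_pencil))
# roughly {2021: 0.04, 2022: 0.15, 2023: 0.27}; bigger numbers mean faster
# automation of this particular task.
```

Something like this year-over-year reduction would track automation of a specific task even if overall employment or productivity numbers stay flat.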

What would a plausible capabilities timeline look like, such that we could mark off progress against it?

Rather than replacing jobs in order of the IQ of humans that typically end up doing them (the naive anthropocentric view of "robots getting smarter"), what actually seems to be happening is that AI and robotics develop capabilities for only part of a job at a time, but they do it cheap and fast, and so there's an incentive for companies/professions to restructure to take advantage of AI. The progression of jobs eliminated is therefore going to be weird and ... (read more)

3 · Stefan_Schubert · 3y
I think it's useful to talk about job displacement as well, even if it's partial rather than full. We've talked about job displacement due to automation (most of which is unrelated to AI) for centuries, and it seems useful to me. It doesn't assume that machines (e.g. AI) are solving tasks in the same way as humans would do; only that they reduce the need for human labour. Though I guess it depends on what you want to do - for some purposes, it may be more useful to look at AI capabilities regarding more specific tasks.

Scalability, or cost?

When I think of failure to scale, I don't just think of something with high cost (e.g. transmutation of lead to gold), but something that resists economies of scale.

Level 1 resistance is cost-disease-prone activities that haven't increased efficiency in step with most of our economy, education being a great example. Individual tutors would greatly improve results for students, but we can't do it. We can't do it because it's too expensive. And it's too expensive because there's no economy of scale for tutors - they're not like solar pa... (read more)

1 · PeterSlattery · 3y
Thanks, this was particularly useful for me!
1 · Harrison Durland · 3y
(+2, I really like the breakdown of different effects. I haven't really tried critically analyzing it for issues, but I definitely feel like it helped carve out/prop up some initial ideas)