Thanks for your thorough response, and yeah, I'm broadly on board with all that. I think learning from detailed text behind decisions, not just the single-bit decision itself, is a great idea that can leverage a lot of recent work.
I don't think that using modern ML to create a model of legal text is directly promising from an alignment standpoint, but by holding out some of your dataset (e.g. a random sample, or all decisions about a specific topic, or all decisions later than 2021), you can test the generalization properties of the model, and more importa...
Presumably you're aware of various Dylan Hadfield-Menell papers, e.g. https://dl.acm.org/doi/10.1145/3514094.3534130 , https://dl.acm.org/doi/10.1145/3306618.3314258 , https://dl.acm.org/doi/10.1145/3514094.3534130
And of course Xuan's talk ( https://www.lesswrong.com/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems )
But, to be perfectly honest... I think there's part of this proposal that has merit, and part of this proposal that might sound good to many people but is actually bad.
First, the bad: The notion th...
One thing that confused me was the assumption at various points that the oracle is going to pay out the entire surplus generated. That'll get the most projects done, but it will have bad results because you'll have spent the entire surplus on charity yachts.
The oracle should be paying out what it takes to get projects done. Not in the sense of labor theory of value, I mean that if you are having trouble attracting projects, payouts should go up, and if you have lots of competition for funding, payouts should go down.
This is actually a lot like a monopsony ...
Neat! Sadly I can't interact with the grants.futureoflife.org webpage yet because my "join the community" application is still sitting around.
I think moderated video calls are my favorite format, as boring as that is. I.e. you have a speaker and also a moderator who picks people to ask questions, cuts people off or prompts them to keep talking depending on their judgment, etc.
Another thing I like, if it seems like people are interested in talking about multiple different things after the main talk / QA / discussion, is splitting up the discussion into multiple rooms by topic. I think Discord is a good application for this. Zoom is pretty bad at this but can be cajoled into having the right functionality if you make everyone a co-host, I think Microsoft Teams is fine but other people have problems, and other people think GatherTown is fine but I have problems.
I'm curious about your takes on the value-inverted versions of the repugnant and very-repugnant conclusions. It's easy to "make sense" of a preference (e.g. for positive experiences) by deciding not to care about it after all, but doing that doesn't actually resolve the weirdness in our feelings about aggregation.
Once you let go of trying to reduce people to a 1-dimensional value first and then aggregate them second, as you seem to be advocating here in ss. 3/4, I don't see why we should try to hold onto simple rules like "minimize this one simple thing." ...
Academics choose to work on things when they're doable, important, interesting, publishable, and fundable. Importance and interestingness seem to be the least bottlenecked parts of that list.
The root of the problem is difficulty in evaluating the quality of work. There's no public benchmark for AI safety that people really believe in (nor do I think there can be, yet - talk about AI safety is still a pre-paradigmatic problem), so evaluating the quality of work actually requires trusted experts sitting down and thinking hard about a paper - much harder than...
That's a good point. I'm a little worried that coarse-grained metrics like "% unemployment" or "average productivity of labor vs. capital" could fail to track AI progress if AI increases the productivity of labor. But we could pick specific tasks like making a pencil, etc. and ask "how many hours of human labor did it take to make a pencil this year?" This might be hard for diverse task categories like writing a new piece of software though.
What would a plausible capabilities timeline look like, such that we could mark off progress against it?
Rather than replacing jobs in order of the IQ of humans that typically end up doing them (the naive anthropocentric view of "robots getting smarter"), what actually seems to be happening is that AI and robotics develop capabilities for only part of a job at a time, but they do it cheap and fast, and so there's an incentive for companies/professions to restructure to take advantage of AI. Progressions of jobs eliminated is therefore going to be weird and ...
Scalability, or cost?
When I think of failure to scale, I don't just think of something with high cost (e.g. transmutation of lead to gold), but something that resists economies of scale.
Level 1 resistance is cost-disease-prone activities that haven't increased efficiency in step with most of our economy, education being a great example. Individual tutors would greatly increase results for students, but we can't do it. We can't do it because it's too expensive. And it's too expensive because there's no economy of scale for tutors - they're not like solar pa...
I think the intersection with recommender algorithms - both in terms of making them, and in terms of efforts to empower people in the face of them - is interesting.
Suppose you have an interface that interacts with a human user by recommending actions (often with a moral component) in reaction to prompting (voice input seems emotionally powerful here), and that builds up a model of the user over time (or even by collecting data about the user much like every other app). How do you build this to empower the user rather than just reinforcing their most predictable tendencies? How to avoid top-down bias pushed onto the user by the company / org making the app?