All of TedSanders's Comments + Replies

Excellent post.

I want to highlight something that I missed on the first read but nagged me on the second read.

You define transformative AGI as:

1. Gross world product (GWP) exceeds 130% of its previous yearly peak value
2. World primary energy consumption exceeds 130% of its previous yearly peak value
3. Fewer than one billion biological humans remain alive on Earth

You predict when transformative AGI will arrive by building a model that predicts when we'll have enough compute to train an AGI.

But I feel like there's a giant missing link - what are the odds tha...

7
Matthew_Barnett
9mo
Thanks for the comment. I think you're right that my post neglected to discuss these considerations. On the other hand, my bottom-line probability distribution at the end of the post deliberately has a long tail to take into account delays such as high cost, regulation, fine-tuning, safety evaluation, and so on. For these reasons, I don't think I'm being too aggressive.

Regarding the point about high cost in particular: it seems unlikely to me that TAI will have a prohibitively high inference cost. As you know, Joseph Carlsmith estimated the computation of the human brain, with a central estimate of 10^15 FLOP/s. That is orders of magnitude more compute than LLMs use today, and yet at current hardware prices it would still cost roughly as much as prevailing human wages to run. In addition, there are more considerations that push me towards TAI being cheap:

1. A large fraction of our economy can be automated without physical robots. The relevant brain anchor for intellectual tasks is arguably the cerebral cortex rather than the full human brain. According to Wikipedia, "There are between 14 and 16 billion neurons in the human cerebral cortex." It's not clear to me how many synapses the cerebral cortex has, but if the synapse-to-neuron ratio is consistent throughout the brain, then the inference cost of the cerebral cortex is plausibly about 1/5th the inference cost of the whole brain.
2. The human brain is plausibly undertrained relative to its size, due to evolutionary constraints that push hard against delaying maturity in animals. As a consequence, ML models with brain-level efficiency can probably match human performance at a much smaller size (and thus, a lower inference cost). I currently expect this consideration to mean that the human brain is 2-10x larger than "necessary".
3. The Chinchilla scaling laws suggest that inference costs should grow at about half the rate of training costs. This is essentially the dual of the argument I gave in the post about data not being a major b...
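A rough back-of-envelope sketch of the cost arithmetic above, in Python. The 10^15 FLOP/s brain figure and the ~1/5 cortex fraction come from the comment; the accelerator throughput, rental price, and wage are placeholder assumptions for illustration only.

```python
# Back-of-envelope sketch of the inference-cost argument above. Brain FLOP/s
# is Carlsmith's central estimate; the accelerator throughput, rental price,
# and wage below are assumed placeholder values, not claims from the comment.

BRAIN_FLOPS = 1e15            # Carlsmith central estimate, FLOP/s
GPU_FLOPS = 1e15              # assumed: one modern accelerator, FLOP/s
GPU_DOLLARS_PER_HOUR = 2.50   # assumed: cloud rental price
WAGE_DOLLARS_PER_HOUR = 20.0  # assumed: prevailing human wage

# Cost of running one brain-equivalent of compute for an hour
brain_cost = BRAIN_FLOPS / GPU_FLOPS * GPU_DOLLARS_PER_HOUR
print(f"brain-equivalent inference: ~${brain_cost:.2f}/hr "
      f"vs ~${WAGE_DOLLARS_PER_HOUR:.2f}/hr human wage")

# Consideration 1: anchor on the cerebral cortex instead of the whole brain.
# ~15e9 of ~86e9 neurons are cortical; with a uniform synapse-to-neuron
# ratio, inference cost shrinks by the same fraction (roughly 1/5 to 1/6).
cortex_fraction = 15e9 / 86e9
print(f"cortex-only anchor: ~${brain_cost * cortex_fraction:.2f}/hr")

# Consideration 3: under Chinchilla scaling, the optimal parameter count N
# grows roughly as sqrt(training compute C), so per-token inference cost
# grows at about half the (log) rate of training cost.
for c_mult in (10, 100):
    print(f"{c_mult}x training compute -> ~{c_mult ** 0.5:.1f}x inference cost")
```

The printed figures move one-for-one with the assumed inputs; the point is only the structure of the argument, not the exact dollar values.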
5
Lorenzo Buonanno
1y
It was announced two days ago: https://forum.effectivealtruism.org/posts/NZz3Das7jFdCBN9zH/announcing-the-open-philanthropy-ai-worldviews-contest

Absolutely. A few comments:

  • Stated preference (uplifting documentaries) and revealed preference (reality TV crime shows) are different
  • Asking people for their preference is quite difficult - only a small fraction of Netflix users give star ratings or thumb ratings. In general, users like using software to achieve their immediate goals. It's tough to get them to invest time and skill into making it better in the future. For most people, each app is a tiny, tiny slice of their day and they don't want to do work to optimize anything. Customization and user contr...

I work at Netflix on the recommender. It's interesting to read this abstract article about something that's very concrete for me.

For example, the article asks, "The key question any model of the problem needs to answer is - why aren't recommender systems already aligned?"

Despite working on a recommender system, I genuinely don't know what this means. How does one go about measuring how much a recommender is aligned with user interests? Like, I guarantee 100% that people would rather have the recommendations given by Netflix ...

1
aviv
2y
I just added a comment above which aims to provide a potential answer to this question: that you can use "approaches like those I describe here (end of the article; building on this, which uses mini-publics)". This may not directly give you something to measure, but it may elicit the values needed to define an objective function. You provide the example of a very low bar (that people prefer Netflix's recommendations to random ones); the goal here would be to scope out what a much higher bar might look like.
2
tamgent
3y
Thanks for raising this. I appreciate that specification is hard, but I think there's a broader lens on 'user interests', with more acknowledgement of the behavioural side. What users want in one moment isn't always the same as what they might endorse in a less slippery behavioural setting, or upon reflection. You might say this is a human problem, not a technical one. True, but we can design systems that help us optimize for our long-term goals, and that is a different task from optimizing for what we click on in a given moment. Sure, it's much harder to specify, but I think the user research can be done.

Thinking about the user more holistically could open up new innovations too. Imagine a person has watched several videos in a row about weight loss; rather than keeping them on the couch longer, the recommender learns to respond with good nudges: it prompts them to get up and go for a run, reminds them of their personal goals for the day (because it has such integrations), messages their running buddy, closes itself (and has nice configurable settings with good defaults), or advertises joining a local running group (right now the local running group could not afford the advert, but in a world where recommenders weight ad quality to somehow include the long-term preferences of the user, that might be different).

I understand the frustration about measurement; the task is harder than just optimising for views and clicks (not just technically, but also in aligning it with the company's bottom line). However, I do think little steps towards better specification can help, and I'd love to read future user research on this at Netflix.

I'm not sure users definitely prefer the existing recommendations to random ones - I've actually been trying to turn off YouTube recommendations because they make me spend more time on YouTube than I want. Meanwhile, other recommendation systems send me news that is worse on average than the rest of the news I consume (from different channels). So in some cases at least, we could use a very minimal standard: a system is aligned if the user is better off because the recommendation system exists at all.

This is a pretty blunt metric, and proba...
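A minimal sketch of how this blunt standard could be measured, assuming a randomized holdout group is feasible and some survey-based well-being proxy exists; all names and numbers below are hypothetical.

```python
# Hypothetical sketch of the minimal standard above: is the user measurably
# better off with the recommender than with a random holdout? All names,
# proxies, and numbers here are illustrative assumptions.
import random
import statistics

random.seed(0)

# Placeholder well-being proxy per user (e.g. a survey score), one list per
# arm of a randomized experiment. A real proxy would need careful validation.
with_recs = [random.gauss(0.62, 0.20) for _ in range(5000)]  # recommender on
holdout = [random.gauss(0.60, 0.20) for _ in range(5000)]    # random/no recs

uplift = statistics.mean(with_recs) - statistics.mean(holdout)
se = (statistics.variance(with_recs) / len(with_recs)
      + statistics.variance(holdout) / len(holdout)) ** 0.5

# The blunt criterion: the system clears the bar only if the uplift is
# reliably positive, i.e. users are better off because it exists at all.
print(f"uplift = {uplift:.3f} +/- {1.96 * se:.3f} (95% CI)")
print("clears the minimal bar" if uplift - 1.96 * se > 0 else "inconclusive")
```

The hard part, as the comment notes, is choosing a proxy that tracks "better off" rather than engagement.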

7
Max_Daniel
4y
Thanks for sharing your perspective. I find it really helpful to hear reactions from practitioners.