We just published an interview: Nathan Labenz on recent AI breakthroughs and navigating the growing rift between AI safety and accelerationist camps. Listen on Spotify or click through for other audio options, the transcript, and related links. Below are the episode summary and some key excerpts.

Episode summary

There’s really no risk of a self-driving car taking over the world or doing anything… It’s not going to get totally out of our control. It can only do one thing. It’s an engineered system with a very specific purpose, right? It’s not going to start doing science one day by surprise. So I think that’s all very good. We should embrace that type of technology.

And I try to be an example of holding that belief and championing that at the same time as saying, hey, something that can do science and pursue long-range goals of arbitrary specification, that is like a whole different kind of animal.

- Nathan Labenz

Back in December, we released an episode where Rob Wiblin interviewed Nathan Labenz — AI entrepreneur and host of The Cognitive Revolution podcast — on his takes on the pace of development of AGI and the OpenAI leadership drama, based on his experience red teaming an early version of GPT-4 and the conversations with OpenAI staff and board members that followed.

In today’s episode, their conversation continues, with Nathan diving deeper into:

  • What AI now actually can and can’t do — across language and visual models, medicine, scientific research, self-driving cars, robotics, weapons — and what the next big breakthrough might be.
  • Why most people, including most listeners, probably don’t know and can’t keep up with the new capabilities and wild results coming out across so many AI applications — and what we should do about that.
  • How we need to learn to talk about AI more productively — particularly addressing the growing chasm between those concerned about AI risks and those who want to see progress accelerate, which may be counterproductive for everyone.
  • Where Nathan agrees with and departs from the views of ‘AI scaling accelerationists.’
  • The chances that anti-regulation rhetoric from some AI entrepreneurs backfires.
  • How governments could (and already do) abuse AI tools like facial recognition, and how militarisation of AI is progressing.
  • Preparing for coming societal impacts and potential disruption from AI.
  • Practical ways that curious listeners can try to stay abreast of everything that’s going on.
  • And plenty more.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Transcriptions: Katy Moore

Highlights

AI discourse

Rob Wiblin: It seems to me, and I think to quite a lot of people, that the online conversation about AI, and AI safety, and pausing AI versus not, has gotten a bit worse over the last couple of months: the conversation has gotten more aggressive, people who I think know less have become more vocal, people have been pushed a bit more into ideological corners. It’s kind of like you now know what everyone is going to say, maybe before they’ve had much to say about it yet. Whereas a year ago, even six months ago, it felt a lot more open: people were toying with ideas a lot more, it was less aggressive, people were more open-minded.

Nathan Labenz: That is my perception, unfortunately. And I guess my simple explanation for it would be that it’s starting to get real, and there’s starting to be actual government interest. And when you start to see these congressional hearings, and then you start to see voluntary White House commitments, and then you see an executive order — which is largely just a few reporting requirements, but is still kind of the beginning — then anything around politics and government is generally so polarised and ideological that maybe people are starting to just fall back into those frames. That’s my theory. I don’t have a great theory, or I’m not super confident in that theory.

There are definitely some thought leaders that are particularly aggressive in terms of pushing an agenda right now. I mean, I’m not breaking any news to say Marc Andreessen has put out some pretty aggressive rhetoric just within the last month or two. The Techno-Optimist Manifesto, where I’m like, I agree with you on like 80%, maybe even 90% of this. We’ve covered the self-driving cars, and there’s plenty of other things where I think, man, it’s a real bummer that we don’t have more nuclear power. And I’m very inclined to agree on most things.

Rob Wiblin: Shame we can’t build apartments.

Nathan Labenz: Yeah, for god’s sake. But I don’t think he’s done the discourse any favours by framing the debate the way he did: he used the term “the enemy” and just listed out a bunch of people that he perceives to be the enemy. And that really sucks.

The kind of classic thought experiment here is like, if aliens came to Earth, we would hopefully all by default think that we were in it together, and we would want to understand them first and what their intentions are, and whether they would be friendly to us or hostile to us or whatever — and really need to understand that before deciding what to do. Unfortunately, it feels like that’s kind of the situation that we’re in. The aliens are of our own creation, but they are these sort of strange things that are not very well understood yet. We don’t really know why they do what they do, although we are making a lot of progress on that.

I don’t think it’s helping anybody for technology leaders to be giving out their lists of enemies. I don’t really think anybody needs to be giving out lists of enemies. It would be so tragicomic if you imagine actual aliens showing up, and then imagine people calling each other names and deciding who’s an enemy of whom before we’ve even figured out what the aliens are here for.

And so I feel like we’re kind of behaving really badly, honestly, to be dividing into camps before we’ve even got a clear picture of what we’re dealing with. Exactly why it’s happening is just crazy to me. I think there have been a few quite negative contributions, but it also does just seem to be where society is at right now. You know, we saw the same thing with vaccines, right? I mean, I’m not like a super vaccine expert, but safe to say that that discourse was also unhealthy, right?

Rob Wiblin: I could find certain areas for improvement.

Nathan Labenz: Yeah. Here we had a deadly disease and then we had life-saving medicine. And I think it’s totally appropriate to ask some questions about that life-saving medicine, and its safety and possible side effects — the “just asking questions” defence I’m actually kind of sympathetic to. But the discourse, safe to say it was pretty deranged.

And here we are again, where it seems like there’s really no obvious reason for people to be so polarised about this, but it is happening and I don’t know that there’s all that much that can be done about it. I think my best hope for the moment is just that the extreme techno-optimist, techno-libertarian, don’t-tread-on-me, right-to-bear-AI faction is potentially just self-discrediting. I really don’t think that’s the right way forward, and if anything, I think they may end up being harmful to their own goals, just like the OpenAI board was perhaps harmful to its own goals.

When you have a leading billionaire chief of a major VC fund saying such extreme things, it really does invite the government to come back and be like, “Oh, really? That’s what you think? That’s what you’re going to do if we don’t put any controls on you? Well, then guess what? You’re getting them.” It doesn’t seem like good strategy. It may be a good strategy for deal flow, if your goal is to attract other uber-ambitious founder types — if you just want, like, Travis Kalanick to choose your firm for his next venture, and you want that type of person to take your money, then maybe it’s good for that. But if you actually are trying to convince the policymakers that regulation is not needed, then I don’t think you’re on the path to being effective there. So it’s very strange. It’s very hard to figure out.

Self-driving cars

Nathan Labenz: I think I have a somewhat contrarian take on this, because it does still seem like the predominant view is that it’s going to be a while still, and obviously Cruise has recently had a lot of problems due to one incident, plus perhaps a cover-up of that incident. It’s not entirely clear exactly what happened there.

But I’m a little confused by this, because yes, the leading makers — and that would be Tesla, Waymo, and Cruise — have put out numbers that say pretty clearly that they are safer than human drivers. And they can measure this in a bunch of different ways; it can be kind of complicated exactly what you compare to and under what conditions. The AI doesn’t have to drive in extreme conditions, because it can just turn off.


And this is why I think China will probably beat us in the self-driving car race, if not the AI race overall: I think they’ll go around and just change the environment, right? And say, “If we have trees blocking stop signs, or we have stop signs that are ambiguous, or we have whatever these sorts of environmental problems, then we should fix them; we should clean up the environment so it works well.” And we just have seemingly no will here, certainly in the United States, to do that sort of thing.

So I’m bummed by that. And I really try to carry that flag proudly too, because, you know — and this is a problem in society at large; it’s not just an AI problem — people get invested in terms of their identity on different sides of issues, and everybody seems to polarise and go to their coalition on questions which aren’t obviously related. So I try to emphasise the places where I think just sane first-principles thinking kind of breaks those norms. And one, I think, is self-driving cars: really good, I would love to see those accelerated, I would love to have one.

It would be more useful to me if Tesla actually made it more autonomous. Probably the biggest reason I haven’t bought one is that it still really requires you to pay close attention. And I’m a competent driver, but we have a couple of members of our family who are not great drivers, and this would be a real benefit to their safety. But one of the problems is it requires you to monitor it so closely, and if you lapse or don’t monitor it in just the way that they want, it gives you a strike, and after a few strikes, they just kick you off the self-driving program.

So unfortunately, I think the drivers who would actually benefit most from this would probably end up getting kicked out of the program, and then it would have been pointless to have bought one in the first place. So I would endorse giving more autonomy to the car, and I think that would make people in my own family safer. But we’re just not there.

And I hold that belief at the same time as all these kind of more cautious beliefs that I have around super general systems. And the reasons for that are I think pretty obvious, really, but for some reason don’t seem to carry the day. The main one is that driving cars is already very dangerous. A lot of people die from it, and it’s already very random and it’s not fair. It’s already not just. So if you could make it less dangerous, make it more safe overall, even if there continues to be some unfairness and some injustice and some literal harms to people, that seems to be good.

And there’s really no risk of a self-driving car taking over the world or doing anything… It’s not going to get totally out of our control. It can only do one thing. It’s an engineered system with a very specific purpose, right? It’s not going to start doing science one day by surprise. So I think that’s all very good. We should embrace that type of technology. And I try to be an example of holding that belief and championing that at the same time as saying, hey, something that can do science and pursue long-range goals of arbitrary specification, that is like a whole different kind of animal.

Robotics

Nathan Labenz: One very particular thing I wanted to shout out too, because this is one of the few examples where GPT-4 has genuinely outperformed human experts, is from a paper called “Eureka” — I think a very appropriate title — from Jim Fan’s group at NVIDIA. What they did is use GPT-4 to write the reward functions, which are then used to train a robotic hand. So one of the tasks that they were able to get a robotic hand to do is twirl a pencil in the hand. This is something that I’m not very good at doing, but it’s this sort of thing, wobbling it around the fingers.

What’s hard about this is multiple things, of course, but one thing that’s particularly hard if you’re going to try to use reinforcement learning to teach a robot to do this is that you have to have a reward function that tells the system how well it’s doing. So these systems learn by just kind of fumbling around, and then getting a reward, and then updating so as to do more of the things that get the high reward and less of the things that get the low reward. But in the initial fumbling around, it’s kind of hard to tell: was that good? Was that bad? You’re nowhere close.

They call this the “sparse reward problem,” or at least that’s one way that it’s talked about: if you are so far from doing anything good that you can’t get any meaningful reward, then you get no signal, and you have nothing to learn from. So how do you get over that initial hump? Well, humans write custom reward functions for particular tasks. We know, we think we know, we have a sense of what good looks like. So if we can write a reward function to observe what you do and tell you how good it is, then our knowledge encoded through that reward function can hopefully be used as the basis for getting you going early on.

It turns out that GPT-4 is significantly better than humans at writing these reward functions for these various robot hand tasks, including twirling the pencil — significantly so, according to that paper. And this is striking to me, because when you think about writing reward functions, that’s by definition expert work, right? There aren’t, like, any amateur reward function writers out there. This is the kind of thing that the average person doesn’t even know what it is, can’t do it at all, and is just going to give you a totally blank stare at the whole subject. So you’re into expert territory from the beginning.

And to have GPT-4 exceed what the human experts can do just suggests that… It’s very rare. I have not seen many of these, but this is one where I would say, there is GPT-4 doing something that, would you say that’s beyond its training data? Probably. Somewhat at least. Would you say it is an insight?

Rob Wiblin: Seems insight-adjacent.

Nathan Labenz: Yeah, I would say so. I mean, it’s not obviously not an insight. So I had used this term of eureka moments, and I had said for the longest time, no eureka moments. I’m now having to say precious few eureka moments, because I at least feel like I have one example, and notably the paper is called “Eureka.” So that’s definitely one to check out if you want to see what I would consider one of the frontier examples of GPT-4 outperforming human experts.
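To make the reward-shaping idea above concrete, here is a minimal, hypothetical sketch of the kind of densely shaped reward function a human engineer (or GPT-4, in an Eureka-style setup) might write for a pen-spinning task. It is not code from the Eureka paper: the state keys, weights, and thresholds are illustrative assumptions only.

```python
# Illustrative sketch only: a hand-written, densely shaped reward function of the
# kind discussed above. In an Eureka-style pipeline, GPT-4 would be prompted with
# the environment's observation code and asked to generate (and iteratively refine)
# functions like this one. All state keys, weights, and thresholds are hypothetical.
import numpy as np

def pen_spin_reward(state: dict, target_axis=np.array([0.0, 0.0, 1.0])) -> float:
    """Dense reward for keeping a pen spinning about a target axis while held.

    `state` is assumed to contain:
      - "pen_angular_velocity": (3,) angular velocity of the pen in rad/s
      - "pen_axis": (3,) unit vector along the pen's long axis
      - "pen_pos", "palm_pos": (3,) positions of the pen and the palm
    """
    # Reward spin about the desired axis (projection of angular velocity),
    # saturated with tanh so faster is not always better.
    spin_rate = float(np.dot(state["pen_angular_velocity"], target_axis))
    spin_reward = np.tanh(spin_rate)

    # Reward keeping the pen's long axis aligned with the target axis (in [-1, 1]).
    alignment_reward = float(np.dot(state["pen_axis"], target_axis))

    # Penalise dropping the pen: only penalise distance beyond roughly 10 cm.
    drop_distance = float(np.linalg.norm(state["pen_pos"] - state["palm_pos"]))
    drop_penalty = -5.0 * max(0.0, drop_distance - 0.10)

    # Weighted sum. Without shaping terms like these, early random exploration
    # would get essentially zero signal -- the "sparse reward problem".
    return float(1.0 * spin_reward + 0.5 * alignment_reward + drop_penalty)

if __name__ == "__main__":
    # Toy check with a made-up state: pen spinning at 2 rad/s, slightly tilted, in hand.
    example_state = {
        "pen_angular_velocity": np.array([0.0, 0.0, 2.0]),
        "pen_axis": np.array([0.1, 0.0, 0.995]),
        "pen_pos": np.array([0.02, 0.0, 0.05]),
        "palm_pos": np.array([0.0, 0.0, 0.0]),
    }
    print(round(pen_spin_reward(example_state), 3))
```

Each term encodes a judgement about what progress looks like, which is exactly what gives the learner a usable signal before it ever succeeds at the full task.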

Medicine

Nathan Labenz: Again, this is just exploding. It has not been long since Med-PaLM 2 was announced by Google, and this was a multimodal model that is able to take in not just text, but also images, also genetic data, histology images — different kinds of images like x-rays, but also tissue slides — and answer questions using all these inputs. And to basically do it at roughly human level: on eight out of nine dimensions on which it was evaluated, its answers were preferred by human doctors to those of human doctors. Mostly the difference there was pretty narrow, so it would also be pretty fair to say it was like a tie across the board if you wanted to just round it. But in the actual blow-by-blow on the nine dimensions, it did win on eight of the nine. So that’s medical question answering with multimodal inputs — that’s a pretty big deal.

Rob Wiblin: Isn’t this just going to be an insanely useful product? Imagine how much all doctors earn across the world, answering people’s questions, looking at samples of things, getting test results. You can automate that, it sounds like. Maybe I’m missing that there’s going to be all kinds of legal issues and application issues, but it’s just incredible.

Nathan Labenz: Yeah. I think one likely scenario, which might be as good as we could hope for there, would be that human doctors prescribe: the fallback position would be, yeah, get all your questions answered, but when it comes to actual treatment, a human is going to have to review and sign off on it. That could make sense. I’m not even sure that’s necessarily the best, but there’s certainly a defence of it.

So that’s Med-PaLM 2. That has not been released. It is, according to Google, in kind of early testing with trusted partners — which I assume means health systems or whatever. People used to say, “Why doesn’t Google buy a hospital system?” At this point, they really might ought to, because just implementing this holistically through an entire… There’s obviously a lot of layers in a hospital system. That could make a tonne of sense.

And GPT-4, especially with Vision now, is there too. It hasn’t been out for very long, but a paper announced in just the last couple of weeks has a couple of notable details too. They basically say: we evaluated GPT-4V (V for Vision) on challenging medical image cases across 69 clinicopathological conferences — so a wide range of different things — and it outperformed human respondents overall and across difficulty levels, skin tones, and all different image types except radiology, where it matched humans. So again, just extreme breadth is one of the huge strengths of these systems.

And that skin tones thing really jumped out at me, because that has been one of the big questions and challenges around these sorts of things. Like maybe it’s doing OK on these benchmarks, maybe it’s doing OK on these cherry-picked examples, but there’s a lot of diversity in the world. What about people who look different? What about people who are different in any number of ways? We’re starting to see those thresholds crossed as well. So yeah, the AI doctor is not far off, it seems.

Then, in terms of biomedicine, there’s AlphaFold, and the more recent expansion of AlphaFold is also just incredibly game changing. There are now drugs in development that were identified in part through AlphaFold.

Kids and artificial friends

Nathan Labenz: I’ve done one episode only so far with the CEO of Replika, the virtual friend company, and I came out of that with very mixed feelings. On the one hand, she started that company before language models, and she served a population — and continues to, I think, largely serve a population — that has real challenges, right? Many of them, anyway. Such that people are forming very real attachments to things that are very simplistic.

And I kind of took away from that, man, people have real holes in their hearts. If something as simple as Replika in 2022 can be something that you love, then you are kind of starved for real connection. And that was kind of sad. But I also felt like the world is rough for sure for a lot of people, and if this is helpful to these people, then more power to them. But then the flip side is that it’s now getting really good. So it’s no longer something that’s just good enough to soothe people who are suffering in some way, but is probably getting to the point where it’s going to be good enough to begin to really compete with normal relationships for otherwise normal people. And that, too, could be really weird.

For parents, I would say ChatGPT is great, and I do love how ChatGPT, even just in the name, always kind of presents in this robotic way and doesn’t try to be your friend. It will be polite to you, but it doesn’t want to hang out with you.

Rob Wiblin: “Hey, Rob. How are you? How was your day?”

Nathan Labenz: It’s not bidding for your attention, right? It’s just there to try to be helpful, and that’s that. But the Replika will send you notifications: “Hey, it’s been a while. Let’s chat.” And as those continue to get better, I would definitely say to parents: get your kids ChatGPT, but watch out for virtual friends. Because I think they now definitely can be engrossing enough that… You know, maybe I’ll end up looking back on this and being like, “I was old-fashioned at the time,” but virtual friends are, I think, something to be developed with extreme care. And if you’re a profit-maximising app that’s just trying to drive your engagement numbers — just like early social media, right? — you’re going to end up in a pretty unhealthy place, from the user’s standpoint.

I think social media has come a long way, and to Facebook or Meta’s credit, they’ve done a lot of things to study wellbeing, and they specifically don’t give angry reactions weight in the feed. And that was a principled decision that apparently went all the way up to Zuckerberg: “Look, we do get more engagement from things that are getting angry reactions.” And he was like, “No, we’re not weighting. We don’t want more anger. Angry reactions we will not reward with more engagement.” OK, boom: that’s policy. But they’ve still got a lot to sort out.

And in the virtual friend category, I just imagine it taking quite a while before a virtual friend from a VC-backed app that’s under pressure to grow finds its way to a form factor that would actually be healthy for your kids. So I would hold off on that if I were a parent — and if I could exercise that much control over my kids, which I know is not always a given.

Nowcasting vs forecasting

Rob Wiblin: Yeah, it’s an interesting question: Is it more valuable to forecast where things will be in the future, or to spend an extra hour understanding where we stand right now?

On the forecasting the future side, one mistake that I perceive some people as making is just looking at what’s possible now and saying, “I’m not really that worried about the things that GPT-4 can do. It seems like at best it’s capable of misdemeanours, or it’s capable of speeding up some bad things that would happen anyway. So, not much to see here. I’m not going to stress about this whole AI thing.” That seems like a big mistake to me, inasmuch as the person’s not looking at all of the trajectory of where we might be in a couple of years’ time. You know, it’s worth paying attention to the present, but also worth projecting forward where we might be in future.

On the other hand, the future is where we will live. But sadly, predicting how it will be is challenging. So if you try to ask, “What will language models be capable of in 2027?” you’re kind of guessing. We all have to guess; it’s informed speculation at best.

Whereas if you focus on what they’re capable of doing now, you can at least get a very concrete answer to that. So if the suggestions that you’re making or the opinions that you have are inconsistent with what is already the case, with examples that you could just find if you went looking for them, then you could potentially fix the mistakes you’re making very quickly, in a way that merely speculating about how things might be in the future is not going to do.

And I guess especially just given how many new capabilities are coming online all the time, how many new applications people are developing, and how much space there is to explore what capabilities these enormous, very general models already have that we haven’t even noticed, there’s clearly just a lot of juice that one can get out of that. If someone’s saying, “I’m not worried because I don’t think these models will be capable of independently pursuing tasks,” and then you can show them an example of a model at least beginning to independently pursue tasks, even if in a somewhat clumsy way, then that might be enough to get them to rethink the opinion that they have.

Nathan Labenz: Yeah, one quick comment on just predicting the future: I’m all for that kind of work as well, and I do find a lot of it pretty compelling. So I don’t mean to suggest that my focus on the present is to the exclusion of, or in conflict with, understanding the future. If anything, hopefully better understanding of the present informs our understanding of the future.

And one thing that you said really is my biggest motivation, which is just that I think in some sense, the future is now — in that people have such a lack of understanding of what currently exists that what they think is the future is actually here. So if we could close the gap in understanding, so that people did have a genuinely accurate understanding of what is happening now, I think they would have a healthier respect for, and even a little fear of, what the future might hold. So it’s kind of like: I think the present is compelling enough to get people’s attention. You should also project into the future, especially if you’re a decision maker in this space. But if you’re just trying to get people to kind of wake up and pay attention, then I think the present is enough.
