Great post (especially for a first one, kudos)!
One recent piece of evidence that updated me further towards "many smart people are quite confused about the problem and in particular anthropomorphize current AI systems a lot" was Lex Fridman's conversation with Eliezer (e.g. at 1:02:36):
(I spent a couple of hours thinking about this and discussing it with a friend, and want to try to write down some tentative conclusions.)
I think the mere fact that Meta was able to combine strategic thinking and dialog to let an AI achieve its goals in a collaborative and competitive environment with humans should give us some pause. On the technical level, I think the biggest contributions are two things: an RL-trained strategy model that explicitly models the other human agents and strikes a balance between optimal play and “human play”; and an interface between that module and the language model via encoding the intent.
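To make the two-module structure concrete, here is a minimal sketch of the interface between the strategy planner and the intent-conditioned language model. All names and data shapes here are my own illustrative assumptions, not from the Cicero paper; the planner and generator are stubs standing in for the real RL planner and the real dialogue model.

```python
# Hypothetical sketch of the planner -> intent -> dialogue pipeline.
# The Intent object is the "encoding" that bridges the two modules.
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Planned moves for oneself plus predicted moves for other players."""
    own_moves: list = field(default_factory=list)
    predicted_moves: dict = field(default_factory=dict)


def plan_intent(game_state: dict) -> Intent:
    """Stub for the RL-trained strategy model, which in the real system
    balances optimal play against a model of likely human play."""
    # Toy fixed output; a real planner would search over the game state.
    return Intent(
        own_moves=["A PAR - BUR"],
        predicted_moves={"England": ["F LON - ENG"]},
    )


def generate_message(intent: Intent, recipient: str) -> str:
    """Stub for the language model conditioned on the encoded intent."""
    moves = ", ".join(intent.own_moves)
    return f"To {recipient}: I plan {moves}; can we coordinate?"


intent = plan_intent({"phase": "S1901M"})
message = generate_message(intent, "England")
print(message)  # → To England: I plan A PAR - BUR; can we coordinate?
```

The point of the sketch is only the separation of concerns: the strategy module never produces text, and the dialogue module never plans moves; the intent encoding is the single channel between them.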
I think the Cicero algorithm is bad news for AI safety. It demonstrates yet another worrying capability of AI, and Meta's open publishing norm is speeding up this type of work and making it harder to control.
Of course, the usual rebuttal to concerns about this sort of paper is still valid: This is just a game, which makes two important things easier:
One of the most interesting questions is where things will go from here. I think adding deception to the model should allow it to play even better and I’d give it 50% that somebody will make that work within the next year. Beyond that, now that the scaffolding is established, I expect superhuman level agents to be developed in the next year as well. (I’m not quite sure how “superhuman” you can get in Diplomacy – maybe the current model is already there – but maybe the agent could win any tournament it enters.)
Beyond that, I’m curious when and where we will see some of the techniques established in the paper in real-world applications. I think the bottleneck here will be finding scenarios that can be modeled with RL and benefit from talking with humans. Such scenarios seem difficult to identify: when talking with humans is required, this usually indicates a messy environment where actions and goals aren’t clearly defined. A positive example I can think of is a language coach. The goal could be to optimize test scores; the action would be picking from a set of exercises. This alone could already work well, but if you add in human psychology and the fact that e.g. motivation can be a key driver in learning, then dialog becomes important as well.
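The language-coach framing above can be sketched as a simple bandit problem. Everything here is hypothetical (exercise names, reward model, the epsilon-greedy choice); it only illustrates how "pick exercises to maximize score gains" could be modeled, with dialogue about motivation left as a separate layer on top.

```python
# Toy epsilon-greedy bandit for the hypothetical language-coach setting:
# actions are exercises, rewards are observed test-score gains.
import random

exercises = ["vocab_drill", "listening", "grammar_quiz"]
counts = {e: 0 for e in exercises}
value = {e: 0.0 for e in exercises}  # running mean of score gains per exercise


def pick_exercise(epsilon: float = 0.1) -> str:
    """Mostly exploit the best-known exercise, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(exercises)
    return max(exercises, key=lambda e: value[e])


def update(exercise: str, score_gain: float) -> None:
    """Incrementally update the running mean reward for an exercise."""
    counts[exercise] += 1
    value[exercise] += (score_gain - value[exercise]) / counts[exercise]


# Simulated learner: made-up mean score gains per exercise. In a real
# system the reward would come from actual tests, and a dialogue model
# could handle motivation and check-ins between sessions.
true_gain = {"vocab_drill": 0.2, "listening": 0.5, "grammar_quiz": 0.3}
random.seed(0)
for _ in range(100):
    e = pick_exercise()
    update(e, random.gauss(true_gain[e], 0.1))

best = max(exercises, key=lambda e: value[e])
```

Note that this toy version has none of the messiness that makes the real problem hard: a real coach would face delayed, noisy rewards and a learner whose motivation itself changes with the interaction, which is exactly where the dialogue component would matter.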
Similarly to this: On the application for the Technical Staff role you write that there will be live coding in TF or PyTorch. Does this part of the application apply to all candidates? I'd imagine that there's lots of software and research engineering work that doesn't require (deep) familiarity with an ML framework.
Thank you for the post. I really enjoyed the writing and I think it's a valuable perspective that could easily get lost on the forum!
I totally understand this motivation and I'm currently doing the same.
I'm a little worried that it's hard to do this with integrity though. Maybe if you are careful with what you say (e.g. "Cheapest way to save a life" rather than "Most effective way to do good") you can get away without lying, but if you really believe the arguments in the talk it still starts to feel like dangerous territory to me.
Thanks for putting this together! I sympathize a lot with the difficulty of getting people's time/attention in the workplace and the desire to make material crisp and relevant. I'm always worried, though, that a little disclaimer of "this isn't the whole story" won't be enough to prevent people from assuming they've understood EA and then potentially going around and spreading false ideas. (Canonical reference is probably The Fidelity Model Of Spreading Ideas.)
One idea that has come up (I think at the LinkedIn group and in some presentation that we've done at Google) is to instead do some branding in the direction of "Effective Giving" explicitly. This already narrows down the scope and somewhat protects the EA "brand".
Thanks so much for this series! We're thinking about formalizing the EA at Google efforts further in 2022 and this is a really helpful resource!
Current general strategy: I'm hoping that I can identify opportunities that are missed by "bigger funders" (whether due to differing values, funding sizes, or other concerns). This year I think I found two opportunities where donations are plausibly better than giving to EA Funds. I probably spent around 20 hours on this decision and am not fully comfortable with it. Most of the time was spent in "mere discussions" rather than me sitting down and trying to compute cost-effectiveness in a spreadsheet. I'm unsure about investing to give, but I am investing a lot of my savings and may well give that money away later anyway.
That link is broken for me.
I just finished reading the book. Thank you, Magnus, for putting this together! I thought I'd share my quick take on it here:
This book seems very important. I endorsed a suffering-focused view before, but Magnus does a great job of collecting many relevant facts and arguments. The book is exceptionally well researched and Magnus tries hard to anticipate counter-arguments and takes them seriously. The book is also well structured and easy to follow despite being very dense.
I found the first 3 chapters a little weak/long. I think this is primarily because:
I then got stuck for a while on some of his graphic descriptions of extreme suffering in chapter 4 (which are tough, but important, I think). From chapter 5 onwards the book really picked up IMO. In fact, chapter 5 itself ("A Moral Realist Case for Minimizing Extreme Suffering") might be the most important one of the first section.
For me personally the book has increased my conviction to make the reduction of suffering a foremost priority and given me some new ways to think about how we should try and accomplish this -- let's see how I manage to turn that into action.