

On Deference and Yudkowsky's AI Risk Estimates

The above seems voluminous and I believe this is the written output with the goal of defending a person.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me. Your point?

Yeah, no, it's the exact opposite.

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said. (I strongly encourage people to go and read it, not just to see what's before and after the part he screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)

Karnofsky started off disagreeing that there was any problem at all in 2007 when he was introduced to MIRI via EA, and merely thought there were some interesting points. Interesting, but certainly not worth sending any money to MIRI or looking for better alternative ways to invest in AI safety. These ideas kept developing, and Karnofsky kept having to engage, steadily moving from 'there is no problem' to intermediate points like 'but we can make tool AIs and not agent AIs' (a period in his evolution I remember well because I wrote criticisms of it), which he eventually abandons. You forgot to screenshot the part where Karnofsky writes that he assumed 'the experts' had lots of great arguments against AI risk and the Yudkowsky paradigm and that was why they just didn't bother talking about it, and then moved to SF and discovered 'oh no', that not only did those not exist, the experts hadn't even begun to think about it. Karnofsky also agrees with many of the points I make about Bostrom's book & intellectual pedigree ("When I'd skimmed Superintelligence (prior to its release), I'd felt that its message was very similar to - though more clearly and carefully stated than - the arguments MIRI had been making without much success." just below where you cut off). And so here we are today, where Karnofsky has not just overseen donations of millions of dollars to MIRI and AI safety NGOs or the recruitment of MIRI staffers like ex-MIRI CEO Muehlhauser, but it remains a major area for OpenPhil (and philanthropies imitating it like FTX). It all leads back to Eliezer. As Karnofsky concludes:

One of the biggest changes is the one discussed above, regarding potential risks from advanced AI. I went from seeing this as a strange obsession of the community to a case of genuine early insight and impact. I felt the community had identified a potentially enormously important cause and played a major role in this cause's coming to be taken more seriously. This development became - in my view - a genuine and major candidate for a "hit", and an example of an idea initially seeming "wacky" and later coming to seem prescient.

Of course, it is far from a settled case: many questions remain about whether this cause is indeed important and whether today's preparations will look worthwhile in retrospect. But my estimate of the cause's likely importance - and, I believe, conventional wisdom among AI researchers in academia and industry - has changed noticeably.

That is, Karnofsky explicitly attributes the widespread changes I am describing to the causal impact of the AI risk community around MIRI & Yudkowsky. He doesn't say it happened regardless or despite them, or that it was already fairly common and unoriginal, or that it was reinvented elsewhere, or that Yudkowsky delayed it on net.

I'm really sure even a median thought leader would have better convinced the person who wrote this.

Hard to be convincing when you don't exist.

On Deference and Yudkowsky's AI Risk Estimates

Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):

  • calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored; he's said (consistently since at least SL4 that I've observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I think this is true: they do look extremely dangerous by default, and we still do not have adequate solutions to problems like "how do we talk about human values in a way which doesn't hardwire them dangerously into a reward function which can't be changed?" This is something actively researched now in RL & AI safety, and which continues to lack any solution you could call even 'decent'. (If you have ever been surprised by any result from causal influence diagrams, then you have inadvertently demonstrated the value of this.) More broadly, we still do not have any good proof or approach that we can feasibly engineer any of that with prosaic alignment approaches, which tend towards the 'patch bugs as you find them' or 'make systems so complex you can't immediately think of how they fail' approach to security that we already knew back then was a miserable failure. Eliezer hasn't been shown to be wrong here.

  • I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views. (Take a look at his OB posts on AI the past few years. Hanson is not exactly running victory laps, either on DL, foom, or ems. It would be too harsh to compare him to Gary Marcus... but I've seen at least one person do so anyway.) I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that extremely simple generic architectures written down in a few dozen lines of code, with large capability differences between very similar lines of code, solving many problems in many fields and subsuming entire subfields as simply another minor variant, with large generalizing models (as opposed to the very strong small-models-unique-to-each-individual-problem-solved-case-by-case-by-subject-experts which Hanson & Drexler strongly advocated and which was the ML mainstream at the time) powered by OOMs more compute, steadily increasing in agency, is a short description of Yudkowsky's views on what the runup will look like and how DL now works.

  • "his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field."

    Yet, the number who take it seriously since Eliezer started advocating it in the 1990s is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added). This is missing the forest for a few trees; if you are going to argue that a bit of regression to the mean in extreme beliefs should be taken as some evidence against Eliezer, then you must also count the initial extremity of the beliefs leading to these NGOs doing AI safety & people at them doing AI safety at all as much evidence for Eliezer.† (What a perverse instance of Simpson's paradox.)

    There's also the caveat mentioned there that the reduction may simply be because they have moved up other scenarios like the part 2 scenario where it's not a singleton hard takeoff but a multipolar scenario (a distinction of great comfort, I'm sure), which is a scenario which over the past few years is certainly looking more probable due to how DL scaling and arms races work. (In particular, we've seen some fast followups - because the algorithms are so simple that once you hear the idea described at all, you know most of it.) I didn't take the survey & don't work at the listed NGOs, but I would point out that if I had gone pro sometime in the past decade & taken it, under your interpretation of this statistic, you would conclude "Gwern now thinks Eliezer was wrong". Something to think about, especially if you want to consider observations like "this statistic claims most people are moving away from Eliezer's views, even though when I look at discussions of scaling, research trends, and what startups/NGOs are being founded, it sure looks like the opposite..."

* Flare has been, like Roko's Basilisk, one of those things where the afterlife of it has been vastly greater than the thing itself ever was, and where it gets employed in mutually contradictory ways by critics

† I find it difficult to convey what incredibly hot garbage AI researcher opinions in the '90s were about these topics. And I don't mean the casual projections that AGI would take until 2500 AD or whatever, I mean basics like the orthogonality thesis and instrumental drives. Like 'transhumanism', these are terms used in inverse proportion to how much people need them. Even on SL4, which was the fringiest of the fringe in AI alarmism, you had plenty of people reading and saying, "no, there's no problem here at all, any AI will just automatically be friendly and safe, human moral values aren't fragile or need to be learned, they're just, like, a law of physics and any evolving system will embody our values". If you ever wonder how old people in AI like Kurzweil or Schmidhuber can be so gungho about the prospect of AGI happening and replacing (ie. killing) humanity and why they have zero interest in AI safety/alignment, it's because they think that this is a good thing and our mind-children will just automatically be like us but better and this is evolution. ("Say, doth the dull soil / Quarrel with the proud forests it hath fed, / And feedeth still, more comely than itself?"...) If your response to reading this is, "gwern, do you have a cite for all of that? because no real person could possibly believe such a both deeply naive and also colossally evil strawman", well, perhaps that will convey some sense of the intellectual distance traveled.

DeepMind’s generalist AI, Gato: A non-technical explainer

this would lead to catastrophic forgetting

It's unclear that this is true: "Effect of scale on catastrophic forgetting in neural networks". (The response on Twitter from catastrophic forgetting researchers to the news that their field might be a fake field of research, as easily solved by scale as, say, text style transfer, and that continual learning may just be another blessing of scale, was along the lines of "but using large models is cheating!" That is the sort of response which makes me more, not less, confident in a new research direction. New AI forecasting drinking game: whenever a noted researcher dismisses the prospect of scaling creating AGI as "boring", drop your Metaculus forecast by 1 week.)

When you want the agent to learn a new task, I believe you have to retrain the whole thing from scratch on all tasks, which could be quite expensive.

No, you can finetune the model as-is. You can also stave off catastrophic forgetting by simply mixing in the old data. After all, it's an off-policy approach using logged/offline data, so you can have as much of the old data available as you want - hard drive space is cheap.
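The data-mixing point can be sketched in a few lines (hypothetical function and dataset names; the 25% rehearsal ratio and batch size are arbitrary choices, not anything Gato actually uses):

```python
import random

def make_finetune_batches(old_data, new_data, batch_size=8, old_frac=0.25, seed=0):
    """Yield finetuning minibatches for a new task while 'rehearsing' logged
    data from old tasks, to stave off catastrophic forgetting. Old/new
    examples are sampled independently per batch, for simplicity."""
    rng = random.Random(seed)
    n_old = int(batch_size * old_frac)   # e.g. 2 of every 8 examples are old
    n_new = batch_size - n_old
    for _ in range(len(new_data) // n_new):
        batch = rng.sample(old_data, n_old) + rng.sample(new_data, n_new)
        rng.shuffle(batch)
        yield batch

old = [("old_task", i) for i in range(100)]   # cheap to keep: offline logs
new = [("new_task", i) for i in range(40)]
batches = list(make_finetune_batches(old, new))
```

Since the training data is logged/offline anyway, the old-task examples cost nothing but disk space, and the mixture ratio is a knob you can turn rather than a retrain-from-scratch decision.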

It seems the 'generalist agent' is not better than the specialized agents in terms of performance, generally.

An "aside from that, Mrs Lincoln, how was the play" sort of observation. GPT-1 was SOTA using zero-shot at pretty much nothing, and GPT-2 often wasn't better than specialized approaches either. The question is not whether the current, exact, small incarnation is SOTA at everything and is an all-singing-all-dancing silver bullet which will bring about the Singularity tomorrow and if it doesn't, we should go all "Gato: A Disappointing Paper" and kick it to the curb. The question is whether it scales and has easily-overcome problems. That's the beauty of scaling laws, they drag us out of the myopic muck of "yeah but it doesn't set SOTA on everything right this second, so I can't be bothered to care or have an opinion" in giving us lines on charts to extrapolate out to the (perhaps not very distant at all) future where they will become SOTA and enjoy broad transfer and sample-efficient learning and all that jazz, just as their unimodal forebears did.

So I can see an argument here that this points towards a future that is more like comprehensive AI services rather than a future where research is focused on building monolithic "AGIs"

I think this is strong evidence for monolithic AGIs, that at such a small scale, the problems of transfer and the past failures at multi-task learning have already largely vanished and we are already debating whether the glass is half-empty while it looks like it has good scaling using a simple super-general and efficiently-implementable Decision Transformer-esque architecture. I mean, do you think Adept is looking at Gato and going "oh no, our plans to train very large Transformers on every kind of software interaction in the world to create single general agents which can learn useful tasks almost instantly, for all niches, including the vast majority which would never be worth handcrafting specialized agents for - they're doomed, Gato proves it. Look, this tiny model a hundredth the magnitude of what we intend to use, trained on thousands of times less and less diverse data, it is so puny that it trains perfectly stably but is not better than the specialized agents and has ambiguous transfer! What a devastating blow! Guess we'll return all that VC money, this is an obvious dead end." That seems... unlikely.

DeepMind’s generalist AI, Gato: A non-technical explainer

There are limits, however: scaling alone would not allow Gato to exceed expert performance on diverse tasks, since it is trained to imitate the experts rather than to explore new behaviors and perform in novel ways.

Imitation can exceed experts or demonstrations: note that Gato reaches >=100%† expert performance on something like a third of tasks (Figure 5), and does look like it exceeds the 2 robot experts in Figure 10 & some in Figure 17. This is a common mistake about imitation learning and prompt engineering or Decision Transformer/Trajectory Transformer specifically.

An imitation-learning agent can surpass experts in a number of ways: first, experts (especially humans) may simply have 'trembling hands' and make errors occasionally at random; a trained agent which has mastered their policy can simply execute that policy perfectly, never having a brain fart; second, demonstrations can come from experts with different strengths and weaknesses, like a player which is good at the opening but fails in the endgame and vice versa, and by 'stitching together' experts, an agent can have the best of both worlds - why imitate the low-reward behaviors when you observe better high reward ones? Likewise for episodes: keep the good, throw out the bad, distill for a superior product. Self-distillation and self-ensembling are also relevant to note.
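A toy sketch of the 'keep the good, throw out the bad' step (hypothetical episode format; a real pipeline would filter whole state/action trajectories by return before behavior cloning, not these bare summaries):

```python
def filter_top_episodes(episodes, keep_frac=0.5):
    """Retain only the highest-return episodes before imitation, so the
    cloned policy imitates a better mixture than any single demonstrator."""
    ranked = sorted(episodes, key=lambda ep: ep["return"], reverse=True)
    k = max(1, int(len(ranked) * keep_frac))
    return ranked[:k]

# Two imperfect 'experts': each has good and bad episodes. Filtering keeps
# the best episodes of BOTH, stitching together their complementary strengths.
episodes = [
    {"expert": "A", "return": 10.0},
    {"expert": "A", "return": 2.0},
    {"expert": "B", "return": 9.0},
    {"expert": "B", "return": 1.0},
]
kept = filter_top_episodes(episodes, keep_frac=0.5)
```

The surviving set draws from both experts' high-return episodes, which is exactly why the imitator's expected performance can exceed either demonstrator's average.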

More broadly, if we aren't super-picky about it being exactly Gato*, a Decision Transformer is a generative model of the environment, and so can be used straightforwardly for exploration or planning, exploiting the knowledge from all observed states & rewards, even demonstrations from randomized agents, to obtain better results up to the limit of its model of the environment (eg a chess-playing agent can plan for arbitrarily long to improve its next move, but if it hasn't yet observed a castling or promotion, there's going to be limits to how high its Elo strength can go). And it can then retrain on the planning, like MuZero, or self-distillation in DRL and for GPT-3.

More specifically, a Decision Transformer is used with a prompt: just as you can get better or worse code completions out of GPT-3 by prompting it with "an expert wrote this thoroughly-tested and documented code:" or "A amteur wrote sum codez and its liek this ok", or just as you can prompt a CLIP or DALL-E model with "trending on artstation | ultra high-res | most beautiful image possible", to make it try to extrapolate in its latent space to images never in the training dataset, you can 'just ask it for performance' by prompting it with a high 'reward' to sample its estimate of the most optimal trajectory, or even ask it to get 'more than' X reward. It will generalize over the states and observed rewards and implicitly infer pessimal or optimal performance as best as it can, and the smarter (bigger) it is, the better it will do this. Obvious implications for transfer or finetuning as the model gets bigger and can bring to bear more powerful priors and abilities like meta-learning (which we don't see here because Gato is so small and they don't test it in ways which would expose such capabilities in dramatic ways but we know from larger models how surprising they can be and how they can perform in novel ways...).
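As a toy illustration of the 'just ask it for performance' idea (this is not Gato or a real Decision Transformer, just a frequency-count stand-in for conditioning action generation on a target return):

```python
from collections import Counter, defaultdict

class ToyReturnConditionedPolicy:
    """Toy Decision-Transformer-style return conditioning: model action
    frequencies separately for low- and high-return episodes, then 'just ask
    for performance' by conditioning on a high target return at test time."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(Counter)  # return bin -> action counts

    def fit(self, episodes):
        for ret, action in episodes:
            bin_ = "high" if ret >= self.threshold else "low"
            self.counts[bin_][action] += 1
        return self

    def act(self, target_return):
        bin_ = "high" if target_return >= self.threshold else "low"
        # Greedy decoding: most frequent action under the conditioned bin.
        return self.counts[bin_].most_common(1)[0][0]

# Low-return episodes mostly took 'left'; high-return ones took 'right'.
# Conditioning on a high return recovers the good behavior, even though
# the training data is a mixture of good and bad demonstrations.
data = [(0.0, "left")] * 8 + [(1.0, "right")] * 8 + [(0.0, "right")] * 2
policy = ToyReturnConditionedPolicy(threshold=0.5).fit(data)
```

The real thing conditions a sequence model on a return-to-go token rather than a crude bin, but the mechanism is the same: the model learns the joint distribution over returns and behavior, and the prompt selects which slice of it to sample from.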

DL scaling sure is interesting.

* I am not quite sure if Gato is a DT or not, because if I understood the description, they explicitly train only on expert actions with observation context - but usually you'd train a causal Transformer packed so it also predicts all of the tokens of state/action/state/action.../state in the context window, the prefixes 1:n, because this is a huge performance win, and this is common enough that it usually isn't mentioned, so even if they don't explicitly say so, I think it'd wind up being a DT anyway. Unless they didn't include the reward at all? (Rereading, I notice they filter the expert data to the highest-reward %. This is something that ought to be necessary only if the model is either very undersized so it's too stupid to learn both good & bad behavior, or if it is not conditioning on the reward so you need to force it to implicitly condition on 'an expert wrote this', as it were, by deleting all the bad demonstrations.) Which would be a waste, but also easily changed for future agents.

† Regrettably, not broken out as a table or specific numbers provided anywhere so I'm not sure how much was >100%.

Effectiveness is a Conjunction of Multipliers

Well, you know what the stereotype is about women in Silicon Valley high tech companies & their sock needs... (Incidentally, when I wrote a sock-themed essay, which was really not about socks, I was surprised how many strong opinions on sock brands people had, and how expensive socks could be.)

If you don't like the example 'buy socks', perhaps one can replace it with real-world examples like spending all one's free time knitting sweaters for penguins. (With the rise of Ravelry and other things, knitting is more popular than it has been in a long time.)

What's the best machine learning newsletter? How do you keep up to date?

It’s hard to imagine a newsletter that could have picked out that paper at the time as among the most important of the hundreds included. For comparison, I think probably that at the time, there was much more hype and discussion of Hinton and students’ capsule nets (also had a NIPS 2017 paper).

People at the time thought it was a big deal: even the ones who were not saying it would be "radically new" or "spicy" or "this is going to be a big deal" or a "paradigm shift" were still at least asking if it might be (out of all the hundreds of things they could have been asking about but weren't).

Incidentally, I don't know if I count, but "Attention Is All You Need" was in my June 2017 newsletter & end-of-year best-of list (and capsule nets were not - I didn't like them, and still don't). So, I don't find it hard to imagine a newsletter doing it because I did it myself.

What's the best machine learning newsletter? How do you keep up to date?
Answer by gwern, Mar 26, 2022

I subscribe to Import AI, Rohin Shah's Alignment newsletter (mostly via the LW/AF), ChinAI (weekly), Ruder's NLP (probably dead), Creative AI (annual), State of AI (annual), Larks (annual), miscellaneous blogs & subreddits (/r/machinelearning/, /r/mlscaling, /r/reinforcementlearning, /r/thisisthewayitwillbe/, being the main ones), and the 2 AKs on Twitter (Arxiv ~daily). If you need even more ML than that, well, you'd better set up an Arxiv RSS feed and drink from the firehose.

Simple comparison polling to create utility functions

I dunno if it's that hard. Comparisons are an old and very well-developed area of statistics, if only for use in tournaments, and you can find a ton of papers and code for pairwise comparisons. I have some & an R utility in a similar spirit on my Resorter page. Compared (ahem) to many problems, it's pretty easy to get started with some Elo or Bradley-Terry-esque system and then work on nailing down your ordinal rankings into more cardinal stuff. This is something where the hard part is the UX/UI and tailoring to use-cases, and too much attention to the statistics may be wankery.
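For instance, a crude Elo fit over pairwise votes takes only a few lines (a stand-in for a proper Bradley-Terry fit, ignoring all the UX questions; the K-factor and round count are arbitrary):

```python
def elo_ratings(comparisons, k=32, rounds=50, base=1000.0):
    """Fit rough Elo scores from (winner, loser) pairs by repeatedly
    applying the standard Elo update; enough to recover an ordinal ranking
    from comparison data, which can then be refined toward cardinal values."""
    ratings = {}
    for _ in range(rounds):
        for winner, loser in comparisons:
            rw = ratings.setdefault(winner, base)
            rl = ratings.setdefault(loser, base)
            expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
            ratings[winner] = rw + k * (1.0 - expected_w)
            ratings[loser] = rl - k * (1.0 - expected_w)
    return ratings

# 'A' beats 'B', 'B' beats 'C': the fitted ordering comes out A > B > C.
votes = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B")]
r = elo_ratings(votes)
```

From there, the interesting work is eliciting the comparisons cheaply and converting the ratings into a usable utility scale, which is a UX problem more than a statistics problem.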

Get In The Van

I humbly request a photo of Buck et al in a van with the caption "Get in loser we're going saving".

EA Forum feature suggestion thread

Absolutely. But you know you are relying on obscurity and relatively modest cost there, and you keep that in mind when you comment. Which is fine. Whereas if you thought that it was secure and breaking it came at a high cost (though it was in fact ~5 seconds of effort away), you might make comments you would not otherwise. Which is less fine.
