Also see paper and results compilation video!

Today, we published "Open-Ended Learning Leads to Generally Capable Agents," a preprint detailing our first steps to train an agent capable of playing many different games without needing human interaction data. ... The result is an agent with the ability to succeed at a wide spectrum of tasks — from simple object-finding problems to complex games like hide and seek and capture the flag, which were not encountered during training. We find the agent exhibits general, heuristic behaviours such as experimentation, behaviours that are widely applicable to many tasks rather than specialised to an individual task.

...

The neural network architecture we use provides an attention mechanism over the agent’s internal recurrent state — helping guide the agent’s attention with estimates of subgoals unique to the game the agent is playing. We’ve found this goal-attentive agent (GOAT) learns more generally capable policies.
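To make "attention over the agent's internal recurrent state" more concrete, here is a minimal sketch of goal-conditioned attention. It is not the architecture from the paper: the function name `goal_attention`, the splitting of the recurrent state into "slots", and the use of scaled dot-product scoring are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def goal_attention(goal_query, state_slots):
    """Hypothetical helper: attend over slots of a recurrent state.

    goal_query  -- embedding of the current (sub)goal, a list of floats
    state_slots -- recurrent state split into equal-length vectors (an assumption)
    Returns the attention weights and the weighted sum of the slots.
    """
    scale = math.sqrt(len(goal_query))
    # Scaled dot-product score between the goal and each slot.
    scores = [sum(q * s for q, s in zip(goal_query, slot)) / scale
              for slot in state_slots]
    weights = softmax(scores)
    dim = len(state_slots[0])
    attended = [sum(w * slot[i] for w, slot in zip(weights, state_slots))
                for i in range(dim)]
    return weights, attended

# Toy example: the first slot is most aligned with the goal, the third least.
weights, attended = goal_attention(
    [1.0, 0.0],
    [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]],
)
```

In this toy setup the slot most aligned with the goal embedding receives the largest weight, which is the sense in which a goal estimate can "guide" what the agent reads from its own state.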

...

Playing roughly 700,000 unique games in 4,000 unique worlds within XLand, each agent in the final generation experienced 200 billion training steps as a result of 3.4 million unique tasks. At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human. And the results we’re seeing clearly exhibit general, zero-shot behaviour across the task space — with the frontier of normalised score percentiles continually improving.

Looking qualitatively at our agents, we often see general, heuristic behaviours emerge — rather than highly optimised, specific behaviours for individual tasks. Instead of agents knowing exactly the “best thing” to do in a new situation, we see evidence of agents experimenting and changing the state of the world until they’ve achieved a rewarding state. We also see agents rely on the use of other tools, including objects to occlude visibility, to create ramps, and to retrieve other objects. Because the environment is multiplayer, we can examine the progression of agent behaviours while training on held-out social dilemmas, such as in a game of “chicken”. As training progresses, our agents appear to exhibit more cooperative behaviour when playing with a copy of themselves. Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently.

My hot take: This seems like a somewhat big deal to me. It's what I would have predicted, but that's scary, given my timelines. I haven't read the paper itself yet but I look forward to seeing more numbers and scaling trends and attempting to extrapolate... When I do I'll leave a comment with my thoughts.

EDIT: My warm take: The details in the paper back up the claims it makes in the title and abstract. This is the GPT-1 of agent/goal-directed AGI; it is the proof of concept. Two more papers down the line (and a few OOMs more compute), and we'll have the agent/goal-directed AGI equivalent of GPT-3. Scary stuff.

Comments (10)



You probably want the link at the top of this post to go directly to the DeepMind blog post instead of through the LessWrong redirect URL. I.e. the current link is:

https://www.lesswrong.com/out?url=https%3A%2F%2Fdeepmind.com%2Fblog%2Farticle%2Fgenerally-capable-agents-emerge-from-open-ended-play

When it probably should be:

https://deepmind.com/blog/article/generally-capable-agents-emerge-from-open-ended-play

Oops, sorry, thanks!

Is there already a handy way to compare computation costs that went into training? E.g. compared to GPT3, AlphaZero, etc.?

I would love to know! If anyone finds out how many PF-DAYs or operations or whatever were used to train this stuff, I'd love to hear it. (Alternatively: How much money was spent on the compute, or the hardware.)
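For anyone who wants to put numbers on a common scale once they turn up, here is a small sketch for converting between total operations and petaflop/s-days (1 PF-day is 10^15 operations per second sustained for one day, about 8.64e19 operations). The helper names are made up for illustration. The 3.14e23 figure is the total training compute commonly quoted from the GPT-3 paper; no figure for the XLand agents is assumed here.

```python
SECONDS_PER_DAY = 24 * 60 * 60
# One petaflop/s-day: 1e15 operations per second for a full day.
OPS_PER_PF_DAY = 1e15 * SECONDS_PER_DAY  # about 8.64e19 operations

def ops_to_pf_days(total_ops):
    """Convert a total operation count to petaflop/s-days."""
    return total_ops / OPS_PER_PF_DAY

def pf_days_to_ops(pf_days):
    """Convert petaflop/s-days back to a total operation count."""
    return pf_days * OPS_PER_PF_DAY

# GPT-3 (175B), using the commonly quoted ~3.14e23 total training operations.
gpt3_pf_days = ops_to_pf_days(3.14e23)
print(f"GPT-3: roughly {gpt3_pf_days:.0f} PF-days")
```

A dollar figure would additionally need an assumed price per PF-day, which depends heavily on hardware and utilisation, so it is left out.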

For what it's worth, I've mostly not been interested in AI safety/alignment (and am still mostly not), but this also seems like a pretty big deal to me. I haven't actually read the details, but this is basically not "narrow" AI anymore, right?

I guess the expressions "narrow" and "general" are a bit unfortunate, since I don't really want to call this either. I would want to reserve the term AGI for AI that can do at least this, but can also reason generally and abstractly, and excels at one-shot learning. (There are specific networks designed for one-shot learning, like Siamese networks. Actually, why aren't similar networks used more often, even as subnetworks?)

My take is that indeed, we now have AGI -- but it's really shitty AGI, not even close to human-level. (GPT-3 was another example of this; pretty general, but not human-level.) It seems that we now have the know-how to train a system that combines all the abilities and knowledge of GPT-3 with all the abilities and knowledge of these game-playing agents. Such a system would qualify as AGI, but not human-level AGI. The question is how long it'll take, and how much money (to make it bigger, train for longer) to get to human-level or something dangerously powerful at least.

It seems like this could extend naturally to cooperative inverse reinforcement learning. Basically, the real world is a new game the AI has to play, and humans decide the reward subjectively (rather than with some explicit rule). The AI has developed some general competence beforehand by playing games, but it has to learn the new rules of the real world, which are not explicit.

My hot take: This seems like a somewhat big deal to me. It's what I would have predicted, but that's scary, given my timelines

Might be confirmation bias. But is it.

I did say it was a hot take. :D If I think of more sophisticated things to say I'll say them. 

 

AGI confirmed? 😬
