It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.

For once, when we wonder ‘how did they do that, what was the big breakthrough that made this work’ the Cognition AI people are doing not only the safe but also the smart thing and they are not talking.

Here is Claude-3-Opus's summary:

The Risks and Implications of AI Software Engineers

Devin, an AI system developed by Cognition AI, demonstrates remarkable capabilities in writing complex code and completing software engineering tasks autonomously. This breakthrough in AI technology raises significant questions about the future of software development and the potential risks associated with such powerful AI agents.

Key points:

  1. Devin's ability to complete [13.8% of] real-world coding tasks on Upwork without human intervention is a quantum leap in AI capabilities.

  2. The use of AI systems like Devin could lead to a rapid accumulation of technical debt and poorly maintained code if not properly managed.

  3. Ensuring the safe use of Devin and similar AI agents is a major challenge, as they require access to sensitive data and the ability to execute arbitrary code.

  4. The full automation of software engineering by AI could lead to recursive self-improvement (RSI) and potentially catastrophic consequences.

  5. AI agents with the ability to plan, overcome obstacles, and seek resources to achieve their goals may pose existential risks if not properly aligned with human values.

The development of AI systems like Devin highlights the urgent need for proactive measures to ensure the safe and responsible deployment of advanced AI technologies.

Personal take

I was really hoping that current architectures could not support fully autonomous agents, and that such agents were still a few years away. I'm very concerned about this development, and afraid that the usual policy cycle is falling further behind AI progress.

Comments (9)



If anyone has good suggestions of what I could email to relevant MEPs (just Zvi's post?) that would be net-positive (e.g. low risk of bad regulation), I'd be happy to hear them.

Ping Joep at PauseAI? He's a big fan of emailing representatives and has some advice. Here's a recording of a talk he gave hosted by ERO in Amsterdam the other night - I think it contains some pointers towards the end. 

Thanks, will do!

This article is quite interesting, and I look forward to seeing how developments unfold.

However it goes off the deep end halfway through:

Um. I. Uh. I do not think you have thought about the implications of ‘solve cold fusion’ being a thing that one can do at a computer terminal?

"solve cold fusion" is not going to be solved at a computer terminal. "cold fusion" is probably impossible. Ab initio simulations are inherently limited, and require gargantuan computational resources for accurate results, along with widespread experimentation. As a physicist, I am sick to death of fantasy nonsense like this being injected into AI risk speculation. 

This is not a fair critique of the post; he's responding to a hypothetical discussed on Twitter.

As a software engineer, Devin seems very overhyped.

Rather than being a new set of capabilities, I think it’s a repackaging of current capabilities into a new UI.

The AI code assistant space is already very crowded. If this company came out and said they were making another code assistant, no one would have invested in them because there are already great code assistants on the market. Claiming that their product was an “AI software engineer” was the ONLY way for them to get funding and attention.

Also, some of the claims they’ve made involve smoke and mirrors. They claim “it passes the top tech company coding interviews”. It can do that because it’s trained directly on the solutions to the Leetcode questions that top tech companies give. Google search could pass the top tech company interviews by that standard.

People seem to vastly overestimate how much of software development consists of simple coding tasks. Only 20% of software development is writing code, and maybe 5% is the kind of simple code work that Devin was doing in the demos. Generative AI seems to have fundamental problems with reasoning, counting, and precision that I suspect will hold it back from being good at software engineering for a while longer.

I hope you are correct. As an outsider, I find it very hard to judge without standardized, non-gameable benchmarks for agents.
