It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.

For once, when we wonder ‘how did they do that, what was the big breakthrough that made this work?’, the Cognition AI people are doing not only the safe but also the smart thing: they are not talking.

Here is Claude-3-Opus's summary:

The Risks and Implications of AI Software Engineers

Devin, an AI system developed by Cognition AI, demonstrates remarkable capabilities in writing complex code and completing software engineering tasks autonomously. This breakthrough in AI technology raises significant questions about the future of software development and the potential risks associated with such powerful AI agents.

Key points:

  1. Devin's ability to complete [13.8% of] real-world coding tasks on Upwork without human intervention is a quantum leap in AI capabilities.

  2. The use of AI systems like Devin could lead to a rapid accumulation of technical debt and poorly maintained code if not properly managed.

  3. Ensuring the safe use of Devin and similar AI agents is a major challenge, as they require access to sensitive data and the ability to execute arbitrary code.

  4. The full automation of software engineering by AI could lead to recursive self-improvement (RSI) and potentially catastrophic consequences.

  5. AI agents with the ability to plan, overcome obstacles, and seek resources to achieve their goals may pose existential risks if not properly aligned with human values.

The development of AI systems like Devin highlights the urgent need for proactive measures to ensure the safe and responsible deployment of advanced AI technologies.

Personal take: I was really hoping that current architectures could not support fully autonomous agents, and that such agents were still a few years away. I'm very concerned about this development, and afraid that the usual policy cycle is falling further behind AI progress.

Comments (9)

If anyone has good suggestions of what I could email to relevant MEPs (just Zvi's post?) that would be net-positive (e.g. low risk of bad regulation), I'd be happy to hear them.

Ping Joep at PauseAI? He's a big fan of emailing representatives and has some advice. Here's a recording of a talk he gave hosted by ERO in Amsterdam the other night - I think it contains some pointers towards the end. 

Thanks, will do!

This article is quite interesting; I look forward to seeing how developments unfold.

However, it goes off the deep end halfway through:

Um. I. Uh. I do not think you have thought about the implications of ‘solve cold fusion’ being a thing that one can do at a computer terminal?

"solve cold fusion" is not going to be solved at a computer terminal. "cold fusion" is probably impossible. Ab initio simulations are inherently limited, and require gargantuan computational resources for accurate results, along with widespread experimentation. As a physicist, I am sick to death of fantasy nonsense like this being injected into AI risk speculation. 

This is not a fair critique of the post; he's responding to a hypothetical discussed on Twitter.

As a software engineer, Devin seems very overhyped.

Rather than being a new set of capabilities, I think it’s a repackaging of current capabilities into a new UI.

The AI code assistant space is already very crowded. If this company came out and said they were making another code assistant, no one would have invested in them because there are already great code assistants on the market. Claiming that their product was an “AI software engineer” was the ONLY way for them to get funding and attention.

Also, some of the claims they’ve made involve smoke and mirrors. They claim “it passes the top tech company coding interviews”. It can do that because it’s trained directly on the solutions to the Leetcode questions that top tech companies give. Google search could pass the top tech company interviews by that standard.

People seem to vastly overestimate how much of software development consists of simple coding tasks. Only 20% of software development is writing code, and maybe 5% is the kind of simple code work that Devin was doing in the demos. Generative AI seems to have fundamental problems with reasoning, counting, and precision that I suspect will hold it back from being good at software engineering for a while longer.

I hope you are correct! As an outsider, I find it very hard to judge without standardized non-gameable benchmarks for agents.
