Is there an argument that it is impossible?
There is actually an impossibility argument. Even if you could robustly specify goals in AGI, there is another convergent phenomenon that would cause misaligned effects and eventually remove the goal structures.
You can find an intuitive summary here: https://www.lesswrong.com/posts/jFkEhqpsCRbKgLZrd/what-if-alignment-is-not-enough
Actually, it looks like there is a thirteenth lawsuit that was filed outside the US.
A class-action privacy lawsuit filed in Israel back in April 2023.
Wondering if this is still ongoing: https://www.einpresswire.com/article/630376275/first-class-action-lawsuit-against-openai-the-district-court-in-israel-approved-suing-openai-in-a-class-action-lawsuit
I agree that this implies those people are more inclined to spend the time to consider options. At least they like listening to other people give interesting opinions about the topic.
But we’re all just humans, interacting socially in a community. I think it’s good to stay humble about that.
If we’re not, then we make ourselves unable to identify and deal with any information cascades, social proof, and/or peer group pressures that tend to form in communities.
Three reasons come to mind for why OpenPhil has not funded us.
They're not quite doing a brand partnership.
But 80k has featured various safety researchers working at AGI labs over the years. E.g. see OpenAI.
So it's more like 80k has created free promotional content, and given their stamp of approval for working at AGI labs (of course 'if you weigh up your options, and think it through rationally' like your friends).
Hi Conor,
Thank you.
I’m glad to see that you already linked to clarifications before. And that you graciously took the feedback and removed the prompt engineer role. I feel grateful for your openness here.
It makes me feel less like I’m hitting a brick wall. We can have more of a conversation.
~ ~ ~
The rest is addressed to people on the team, and not to you in particular:
There are grounded reasons why 80k’s approaches to recommending work at AGI labs – with the hope of steering their trajectory – have supported AI corporations to scale. While disabling effort...
If some employees actually have the guts to whistleblow on current engineering malpractices…
There are plenty of concrete practices you can whistleblow on that would be effective in getting society to turn against these companies:
If labs do engage in behavior that is flagrantly reckless, employees can act as whistleblowers.
This is the crux for me.
If some employees actually have the guts to whistleblow on current engineering malpractices, I have some hope left that having AI safety researchers at these labs still turns out “net good”.
If this doesn’t happen, then they can keep having conversations about x-risks with their colleagues, but I don’t quite see when they will put up resistance to dangerous tech scaling. If not now, when?
Internal politics might change
We’ve seen in ...
Another problem with the NIST approach is an overemphasis on solving for identified risks, rather than on the precautionary principle (just don’t use scaled tech that could destabilise society at scale), or on preventing, and ensuring legal liability for, designs that cause situationalised harms.
Safety-washing of AI is harmful as it gives people an out, a chance to repeat the line "well at least they are allegedly doing some safety stuff", which is a convenient distraction from the fact that AI labs are knowingly developing a technology that can cause human extinction. This distraction causes otherwise safety-conscious people to invest in or work in an industry that they would reconsider if they had access to all the information.
Very much agreed.
It is an extreme claim to make in that context, IMO.
I think Benjamin made it with the intention of being nuanced. But the nuance in that article is rather one-sided.
If anything, the nuance should be on the side of identifying any ways you might accidentally support the development of dangerous auto-scaling technologies.
First, do no harm.
Do you think it would be better if no one who worked at OpenAI / Anthropic / Deepmind worked on safety?
It depends on what you mean by 'work on safety'.
Standard practice for designing machine products to be safe in other established industries is to first narrowly scope the machinery's uses, the context of use, and the user group.
If employees worked at OpenAI / Anthropic / Deepmind on narrowing their operational scopes, all power to them! That would certainly help. It seems that leadership, who aim to design unscoped automated machinery...
Consider Bridges v South Wales Police, where the court found in favour of Bridges on some elements not because the AI system was biased, but because a Data Protection Impact Assessment (DPIA) had not been carried out. Put simply, SWP hadn’t made sure it wasn’t biased. A DPIA is a foundation-level document in almost any compliance procedure.
This is an interesting anecdote.
It reminds me of how US medical companies having to go through the FDA's premarket approval process for software designed for prespecified uses holds them back from launching...
Note that we are focussing here on decisions at the individual level.
There are limitations to that.
See my LessWrong comment.
I don't think control is likely to scale to arbitrarily powerful systems. But it may not need to... which sets us up well for the following phases.
Under the concept of 'control', I am including the capacity of the AI system to control its own components' effects.
I am talking about the fundamental workings of control. I.e. control theory and cybernetics.
That is, general enough that the results are applicable to any following phases as well.
Anders Sandberg has been digging lately into fundamental controllability limits.
Could be interesting to talk with Anders.
I did read that compilation of advice, and responded to it in an email (16 May 2023):
"Dear [a],
People will drop in and look at job profiles without reading your other materials on the website. I'd suggest just writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions itself.
Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.
Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as 'career steps'.
Consider that your model of this situation may be wrong (account for model error).
Thanks, I appreciate the paraphrase. Yes, that is a great summary.
I'm more optimistic e.g. that control turns out to be useful, or that there are hacky alignment techniques which work long enough to get through to the automation of crucial safety research
I hear this all the time, but I also notice that people saying it have not investigated the fundamental limits to controllability that you would encounter with any control system.
As a philosopher, would you not want to have a more generalisable and robust argument that this is actually going to work ...
Further, I think that there are a bunch of arguments for the value of safety work within labs (e.g. access to sota models; building institutional capacity and learning; cultural outreach) which seem to me to be significant and you're not engaging with.
Let's dig into the arguments you mentioned then.
I think 1 and 3 seem like arguments that reduce the desirability of these roles but it's hard to see how they can make them net-negative.
Yes. Specifically, under claim 1, positive value can only asymptotically approach 0 (ignoring opportunity costs).
80,000 Hours handpicks jobs at AGI labs.
Some of those jobs don't even focus on safety – instead they look like policy lobbying roles or engineering support roles.
Nine months ago, I wrote my concerns to 80k staff:
Hi [x, y, z]
I noticed the job board lists positions at OpenAI and AnthropicAI under the AI Safety category:
Not sure whom to contact, so I wanted to share these concerns with each of you:
...
- Capability races
- OpenAI's push for scaling the size and applications of transformer-network-based models has led Google and others to copy and compete wi
Hi Remmelt,
Thanks for sharing your concerns, both with us privately and here on the forum. These are tricky issues and we expect people to disagree about how to weigh all the considerations — so it’s really good to have open conversations about them.
Ultimately, we disagree with you that it's net harmful to do technical safety research at AGI labs. In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
We argue for this p...
The ways I tried to pre-empt it failed.
Ie.
Looking back: I should have just held off until I managed to write one explainer (this one) that folks in my circles did not find extremely unintuitive.
Yep, that is what I was referring to.
Good that you raised this concern.
It does seem like you're likely to be more careful in the future.
Yes, I am more selective now in what I put out on the forums.
In part, because I am having more one-on-one calls with (established) researchers.
I find there is much more space to clarify and paraphrase that way.
On the forums, certain write-ups seem to draw dismissive comments.
Some combination of:
(a) is not written by a friend or big-name researcher.
(b) requires some new counterintuitive re...
I'm worried about these having negative effects, making the AI safety people seem crazy, uninformed, or careless.
If you look at the projects, notice that each is carefully scoped.
To stake out a middle ground here:
Thomas' comment was not ad hominem. But I personally think it is somewhat problematic.
Arepo's counterresponse indicates why.
The reason being that alternative hypotheses exist that you would need to test against:
That's an interesting point. I wonder if this would also be the case if EVF (hypothetically) immediately earmarked proceeds from selling Wytham as donations to other organisations.
All of this of course is ignoring how grantmaking works in practice.
Maybe I'm being cynical, but I'd give >30% that funders have declined to fund AI Safety Camp in its current form for some good reason. Has anyone written the case against?
To keep communication open, here is Oliver Habryka’s LessWrong comment.
I also believe that even if alignment is possible, we need more time to solve it.
The “Do Not Build Uncontrollable AI” area is meant for anyone who has this concern to join.
The purpose of this area is to contribute to restricting corporations from recklessly scaling the training and uses of ML models.
I want the area to be open for contributors who think that:
For transparency, we organisers paid $10K to Arb to do the impact evaluation, using separate funding we were able to source.
The impact assessment was commissioned by AISC, not independent.
This is a valid concern. I have worried about conflicts of interest.
I really wanted the evaluators at Arb to do neutral research, without us organisers getting in the way. Linda and I both emphasised this at an orienting call they invited us to.
From Arb’s side, Gavin deliberately stood back and appointed Sam Holton, who has no connections with AI Safety Camp, as the main evaluator. Misha did participate in early editions of the camp, though.
All in all, this is enough to take the report with a grain of salt. It is worth picking apart the analysis and looking for any unsound premises.
Glad you raised these concerns!
I suggest people dig for evidence themselves as to whether the program is working.
The first four points you raised seem to rely on prestige or social proof. While those can be good indicators of merit, they are also gameable.
Ie.
If there is one thing you can take away from Linda and me, it is t...
I kept responding to Paul’s arguments in private conversations, to the point that I decided to share my comments here.
Labs scaling models results in more investment in producing more GPU chips with more flops (see Sam Altman’s play for the UAE chip factory) and less latency between them (see the EA start-up Fathom Radiant, which started out offering fibre-optic-connected supercomputers to OpenAI and has now probably shifted to Anthropic).
The increasing levels of model combinatorial complexity and outside signal co...
This is an incisive description, Geoff. I couldn't put it better.
I'm confused what the two crosses are doing on your comment.
Maybe the people who disagreed can clarify.
Respect for this comment.
In the original conception of the unilateralist’s curse, the problem arose from epistemically diverse actors/groups having different assessments of how risky an action was.
The mistake was that the people with the rosiest assessment of an action’s risk would take the action by themselves – in disregard of others’ assessments.
What I want more people in AI Safety to be aware of is that there are many other communities out there who think that what “AGI” labs are doing is super harmful and destabilising.
We’re not the one community conce...
Helpful comment from you, Lucius, in the sheet:
"I think our first follow-up grant was 125k USD. Should be on the LTFF website somewhere. There were subsequent grants also related to the AISC project though. And Apollo Research's interpretability agenda also has some relationship with ideas I developed at AISC."
--> I updated the sheet.
Thanks, we’ll give it a go. Linda is working on sending something in for the “Request for proposals for projects to grow our capacity for reducing global catastrophic risks”.
Note, though, that AISC does not really fit OpenPhil’s grant programs, because we are not affiliated with a university and because we don’t select heavily on our own conceptions of who are “highly promising young people”.
It turns out there are six AI Safety Camp alumni working at Apollo, including the two co-founders.
I've still got to go through alumni's LinkedIn profiles to update our records of post-camp positions.
It's on my to-do list.
This is good to know! I’m glad that the experience helped you get involved in AI Safety work.
Could you search for the LTFF grant here and provide me the link? I must have missed it in my searches.
(Also, it looks like I missed two of the four alumni working at Apollo. Will update!)
I appreciate you sharing this. I’ll add it to our list of anecdotes.
Also welcoming people sharing any setbacks or negative experiences they had. We want to know if people have sucky experiences so we can find ways to make it not sucky next time. Hoping to have a more comprehensive sense ...
I hadn’t made the connection between the GMO protests and the AI protests.
This reads as a well-researched piece.
The analysis makes sense to me – with the exception of seeing efforts to restrict facial recognition, the Kill Cloud, etc., as orthogonal. I would also focus more on preventing increasing AI harms and Big Tech power consolidation, which most AI-concerned communities agree on.
I spent time digging into Uganda Community Farm’s plans last year, and ended up becoming a regular donor. From reading the write-ups and later asking Anthony about the sorghum training and grain-processing plant projects, I understood Anthony to be thoughtful and strategic about actually helping relieve poverty in the Kamuli & Buyende region.
Here are short explainers worth reading:
UCF focusses on training farmers and giving them the materials and tools needed to build up t...
Strong upvote for a community member taking the time to evaluate an intervention presented by an "outsider," act on that evaluation, and share it with others. This adds a lot of value!