(crossposted from LessWrong)
I created a simple Google Doc for anyone interested in joining/creating a new org to put down their name, contact information, what research they're interested in pursuing, and what skills they currently have. Over time, I think a network can be fostered, where relevant people start forming their own research agendas and then begin building their own orgs/getting funding. https://docs.google.com/document/d/1MdECuhLLq5_lffC45uO17bhI3gqe3OzCqO_59BMMbKE/edit?usp=sharing
Out of the four major AI companies, three seem to be actively trying to build God-level AGI as fast as possible. And none of them are Meta. To paraphrase Connor Leahy, watch the hands, not the mouth. Three of them talk about safety concerns but actively pursue a reckless agenda. One of them dismisses safety concerns, but it lags behind the others and is not currently moving at breakneck speed. I think the general anti-Meta narrative in EA exists because the three other AI companies have used EAs for their own benefit (poaching talent, resources, etc.). I do not think Meta has yet warranted being a target.
I'm curious what you think of this, and if it impedes what you're describing being effective or not: https://arxiv.org/abs/2309.05463
The following is a conversation between myself in 2022 and a newer version of myself from earlier this year.
On AI Governance and Public Policy
2022 Me: I think we will have to tread extremely lightly with this or, if possible, avoid it completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs a high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Alerting people to the dangers of nuclear war is easier: nukes have been detonated, the visual image of incineration is easy to imagine and can be described in detail, and they or their parents have likely lived through nuclear drills in school. This is closer to trying to explain to someone the dangers of nuclear war before Hiroshima, before the Manhattan Project, and before even TNT was developed. They have to conceptualize what an explosion even is, not simply imagine an explosion at greater scale. Most people will simply not have the time or the will to try to grasp this problem, so this runs the risk of having people call for action on a problem they do not understand, which will likely lead to dismissal by AI researchers, and possibly to short-sighted policies that don’t actually tackle the problem, or even make it worse by carrying the guise of accomplishment. To make matters worse, there is the risk of polarization. Almost any concern with political implications that has gained widespread public attention runs a high risk of becoming polarized. We are still dealing with the ramifications of well-intentioned but misguided early advocates in the climate change movement two decades ago, who planted the seeds for making climate policy part of one’s political identity. This could be even more detrimental than a merely uninformed electorate, as it might push people who had no previous opinion on AI to advocate strongly in favor of capabilities acceleration and to be staunchly against any form of safety policy. Even if executed with the utmost caution, this does not stop other players from using their own power or influence to hijack the movement and lead it astray.
2023 Me: Ah, Me’22, the things you don’t know! Many of Me’22’s concerns I think are still valid, but we’re experiencing what chess players might call a “forced move”. People are starting to become alarmed regardless of what we say or do, so steering that in a direction we want is necessary. The fire alarm is being pulled regardless, and if we don’t try to show some leadership in that regard, we risk less informed voices and blanket solutions winning out. The good news is that “serious” people are going on “serious” platforms and actually talking about x-risk. Other good news is that, from current polls, people are very receptive to concerns over x-risk, and it has not currently fallen along divisive lines (roughly the same percentage of those concerned appears across various demographics). This is still a difficult minefield to navigate. Polarization could still happen, especially with an election year in the US looming. I’ve also been talking to a lot of young people who feel frustrated at not having anything actionable to do, and if those in AI Safety don’t show leadership, we might risk (and indeed are already risking) many frustrated youth taking political and social action into their own hands. We need to be aware that EA/LW might have an Ivory Tower problem, and that, even though a pragmatic, strategic, and careful course of action might be better, this might make many feel “shut out” and lead them to steer their own course. Finding a way to make those outside EA/LW/AIS feel included, with steps to help guide and inform them, might be critical to avoiding movement hijacking.
On Capabilities vs. Alignment Research
2022 Me: While I strongly agree that not increasing capabilities is a high priority right now, I also question whether we risk creating a state of inertia. Within safety research, there are very few domains that carry no risk of advancing capabilities. And while capabilities progress every day, we might fail to keep up the pace of safety progress simply because every action risks some increase in capabilities. Rather than a “do no harm” principle, I think counterfactuals need to be examined in these situations, where we must consider whether there is a greater risk if we *don’t* do research in a certain domain.
2023 Me: Oh, oh, oh! I think Me’22 was actually ahead of the curve on this one. This might still be controversial, but I think many got the “capabilities space” wrong. Many AIS-inspired approaches that could increase capabilities are for systems that would be safer, more interpretable, and easier to monitor by default. By not working on such systems, we instead got the much more inscrutable, dangerous models by default, because the more dangerous models are easier to build. To quote the vape commercials, “safer != safe”, but I still quit smoking in favor of e-cigarettes because safer is still at least safer. This is probably a moot point now, though, since I think it’s likely too late to create an entirely new paradigm in AI architectures. Hopefully Me’24 will be happy to tell me we found a new, 100% safe and effective paradigm that everyone’s hopping on. Or maybe he’ll invent it.
Yann talked at the beginning about how their difference in perspective led to different approaches (open source vs. control/pause). I think a debate about that would probably have been much more productive. I wish someone had asked Melanie which policy proposals motivated by x-risk would run counter to policies addressing the 'short-term' risks she spoke of, since her main complaint seemed to be that x-risk was "taking the oxygen out of the room"; I don't know concretely which x-risk concerns would actually hurt work on short-term risks.
In terms of public perception, which is important, I think Yann and Bengio came across as more likable (which matters), while Max and Melanie interrupted other speakers several times and seemed unnecessarily antagonistic toward the others' viewpoints. I love Max, and I think he did the best overall at articulating his viewpoints, but I imagine that put some viewers off.
I don't think anyone can win a bidding war against OpenAI right now, because they've established themselves as the current "top dog". Even if some other company could pay more, researchers would probably still choose to work at OpenAI, just because it's OpenAI. But not everyone can work at OpenAI, so that still gives us a lot of opportunity. I don't think this would be much of a problem as long as the metrics for success are set. As mentioned above, gains in interpretability are something that can be demonstrated, and at that point it doesn't matter who achieves them or why. Other areas of alignment are harder to set metrics for, but there are still a good number of unsolved sub-problems whose solutions would be demonstrable. Set the metrics for success, and then you don't have to worry about value drift.
So I think there's a huge difference between other EA causes and AIS. You can probably accomplish a good number of other EA objectives without making these roles high status, but I still think trying to raise their status might be useful. It's a way of signaling what's important in society and what's valued. If I knew a way to make working in pandemic preparedness as high status as playing in the NBA, I probably would.
That being said, AI is a different beast. Places like San Francisco are filled with people working 70+ hours a week, hungry to get ahead in some way in AI. I'd love to tap into that hunger, with the metric for success being alignment. It would need to have actual metrics for success, though, like provably solving certain aspects of the problem or making a huge discovery in interpretability. If someone can achieve demonstrable, huge gains in this, I don't really care what their personal motivations are.
Should the US start mass-producing hazmat suits, so that, in the event of an engineered pandemic, the spread of the disease could be prevented while critical infrastructure and the delivery of basic necessities are maintained?