(I'm Matthew Gray)
Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.
My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don't seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:
Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.
Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. **Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.
I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.
I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.
Might be a good time to update Are We Living At The Most Influential Time in History?.
I'm thinking about the matching problem of "people with AI safety questions" and "people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or wherever), asks "what the fuck?", and then tries to find someone who can tell him what the fuck.
I think normally people trust their local expertise landscape--if they think the CDC is the authority on masks they adopt the CDC's position, if they think their mom group on Facebook is the authority on masks they adopt the mom group's position--but AI risk is weird because it's mostly unclaimed territory in their local expertise landscape. (Snoop also asks "is we in a movie right now?" because movies are basically the only part of the local expertise landscape that has had any opinion on AI so far, for lots of people.) So maybe there's an opportunity here to claim that territory (after all, we've thought about it a lot!).
I think we have some 'top experts' who are available for, like, mass-media things (podcasts, blog posts, etc.) and 1-1 conversations with people they're excited to talk to, but are otherwise busy / not interested in fielding ten thousand interview requests. Then I think we have tens (hundreds?) of people who are expert enough to field ten thousand interview requests, given that the standard is "better opinions than whoever they would talk to by default" instead of "speaking to the whole world" or w/e. But just like connecting people who want to pay to learn calculus and people who know calculus and will teach it for money, there are significant gains from trade from having some sort of clearinghouse / place where people can easily meet. Does this already exist? Is anyone trying to make it? (Do you want to make it and need support of some sort?)
I think the 'traditional fine dining' experience that comes closest to this is Peking Duck.
Most of my experience has been with either salt-drenched cooked fat or honey-dusted cooked fat; I'll have to try smoking something and then applying honey to the fat cap before I eat it. My experience is that it is really good but also quickly becomes unbalanced / no longer good; some people, on their first bite, already consider it too unbalanced to enjoy. So I do think there's something interesting here where there is a somewhat subtle taste mechanism (not just optimizing for 'more' but somehow tracking a balance) that ice cream seems to have found a weird hole in.
[edit: for my first attempt at this, I don't think the honey improved it at all? I'll try it again tho.]
When people make big and persistent mistakes, the usual cause (in my experience) is not something that comes labeled with giant mental “THIS IS A MISTAKE” warning signs when you reflect on it.
Instead, tracing mistakes back to their upstream causes, I think that the cause tends to look like a tiny note of discord that got repeatedly ignored—nothing that mentally feels important or action-relevant, just a nagging feeling that pops up sometimes.
To do better, then, I want to take stock of those subtler upstream causes, and think about the flinch reactions I exhibited on the five-second level and whether I should have responded to them differently.
I don't see anything in the lessons on the question of whether or not your stance on drama has changed, which feels like the most important bit?
That is, suppose I have enough evidence to not-be-surprised-in-retrospect if one of my friends is abusing their partner, and also I have a deliberate stance of leaving other people's home lives alone. The former means that if I thought carefully about all of my friends, I would raise that hypothesis to attention; the latter means that even if I had the hypothesis, I would probably not do anything about it. In this hypothetical, I only become a force against abuse if I decide to become a meddler (which introduces other costs and considerations).
Can we all just agree that if you’re gonna make some funding decision with horrendous optics, you should be expected to justify the decision with actual numbers and plans?
Justify to who? I would like to have an EA that has some individual initiative, where people can make decisions using their resources to try to seek good outcomes. I agree that when actions have negative externalities, external checks would help. But it's not obvious to me that those external checks weren't passed in this case*, and if you want to propose a specific standard we should try to figure out whether or not that standard would actually help with optics.
Like, if the purchase of Wytham Abbey had been posted on the EA forum, and some people had said it was a good idea and some people said it was a bad idea, and then the funders went ahead and bought it, would our optics situation look any different now? Is the idea that if anyone posted that it was a bad idea, they shouldn't have bought it?
[And we need to then investigate whether or not adding this friction to the process ends up harming it on net; property sales are different in lots of places, but there are some where adding a week to the "should we do this?" decision-making process means implicitly choosing not to buy any reasonably-priced property, since inventory moves too quickly, and only overpriced property stays on the market for more than a week.]
* I don't remember being consulted about Wytham, but I'm friends with the people running it and broadly trust their judgment, and guess that they checked with people as to whether or not they thought it was a good idea. I wasn't consulted about the specific place Irena ended up buying, but I was consulted somewhat on whether or not Irena should buy a venue, and I thought she should, going so far as being willing to support it with some of my charitable giving, which ended up not being necessary.
From The Snowball, dealing with Warren Buffett's son's stint as a director and PR person for ADM:
The second the FBI agents left, Howie called his father, flailing, saying, I don't know what to do, I don't have the facts, how do I know if these allegations are true? My name is on every press release. How can I be the spokesman for the company worldwide? What should I do, should I resign?
Buffett refrained from the obvious response, which was that, of his three children, only Howie could have wound up with an FBI agent in his living room after taking his first job in the corporate world. He listened to the story non-judgmentally and told Howie that it was his decision whether to stay at ADM. He gave only one piece of advice: Howie had to decide within the next twenty-four hours. If you stay in longer than that, he said, you'll become one of them. No matter what happens, it will be too late to get out.
That clarified things. Howie now realized that waiting was not a way to get more information to help him decide, it was making the decision to stay. He had to look at his options and understand as of right now what they meant.
If he resigned and they were innocent, he would lose friends and look like a jerk.
If he stayed and they were guilty, he would be viewed as consorting with criminals.
The next day Howie went in, resigned, and told the general counsel that he would take legal action against the company if they put his name on any more press releases. Resigning from the board was a major event. For a director to resign was like sending up a smoke signal that said the company was guilty, guilty, guilty. People at ADM did not make it easy for Howie. They pushed for reprieve, they asked how he could in effect convict them without a trial. Howie held firm, however, and got out.
Can you explain the "same upsides" part?
Yeah; by default people have entangled assets which will be put at risk by starting or investing in a new project. Limiting the liability that originates from that project to just the assets held by that project means that investors and founders can do things that seem to have positive return on their own, rather than 'positive return given that you're putting all of your other assets at stake.'
[Like, I agree that there are issues where the social benefit of actions and the private benefits of actions don't line up, and we should try to line them up as well as we can in order to incentivize the best action. I'm just noting that the standard guess for businesses is "we should try to decrease the private risk of starting new businesses"; I could buy that it's different for the x-risk environment, where we should not try to decrease the private risk of starting new risk reduction projects, but it's not obviously the case.]
Therefore, we should be very wary of funding mechanisms that incentivize people to treat extremely harmful outcomes as if they were neutral (when making decisions about doing/funding projects that are related to anthropogenic x-risks).
Sure, I agree with this, and with the sense that the costs are large. The thing I'm looking for is the comparison between the benefits and the costs; are the costs larger?
[EDIT: Also, interventions that are carried out if and only if impact markets fund them seem selected for being net-negative, because they are ones that no classical EA funder would fund.]
Sure, I buy that adverse selection can make things worse; my guess was that the hope was that classical EA funders would also operate thru the market. [Like, at some point your private markets become big enough that they become public markets, and I think we have solid reasons to believe a market mechanism can outperform specific experts, if there's enough profit at stake to attract substantial trading effort.]
This reminds me a lot of limited liability (see also Austin's comment, where he compares it to the for-profit startup market, which because of limited liability for corporations bounds prices below by 0).
This is a historically unusual policy (full liability came first), and seems to me to have basically the same downsides (people do risky things, profiting if they win and walking away if they lose), and basically the same upsides (according to the theory supporting LLCs, there's too little investment and support of novel projects).
Can you say more about why you think this consideration is sufficient to be net negative? (I notice your post seems very 'do-no-harm' to me instead of 'here are the positive and negative effects, and we think the negative effects are larger'; I'm also interested in Owen's impression on whether or not impact markets lead to more or less phase 2 work.)
I'm interested in fleshing out "what you're looking for"; do you have some examples of things written in the past which changed your minds, which you would have awarded prizes to?
For example, I thought about my old comment on patient long-termism, which observes that in order to say "I'm waiting to give later" as a complete strategy you need to identify the conditions under which you would stop waiting (as otherwise, your strategy is to give never). On the one hand, it feels "too short" to be considered, but on the other hand, it seems long enough to convey its point (at least, embedded in context as it was), and so any additional length would be 'more cost without benefit'.