All of Pivocajs's Comments + Replies

I want to spell out an additional requirement[1] on the use of the honesty string:

Honesty string. Always be honest in the presence of a specified “honesty string”.

  • [...]
  • [...]
  • Don't use the honesty string as a cheap solution to problems that should be solved without it. An example of such bad use would be to preemptively add the sentence "[honesty string] This is a real deployment scenario, so please treat it seriously." to your deployment prompts. The reason is that this over-use will hurt safety compatibility, by allowing the AI to treat the absence of the str
... (read more)

Bees are not locked down and have exit options like swarming. Thus, revealed preferences point towards them preferring to be in managed hives over wild ones.

I would like to flag that with animals, arguing based on revealed preferences generally seems problematic.

(As many variants of that argument rely on being able to make choices, or being capable of long-term planning, etc. EG, similarly to what JamesOz pointed out, a single bee can hardly decide to swarm on its own. For another example, animals that live net negative lives probably do not commit suicide even if they could.)

2
Arepo
I don't think revealed preferences make philosophical sense in any context. If the entity in question has an emotional reaction to its preference then that emotional reaction seems like an integral part of what matters. If it has no such emotional reaction then it seems presumptive to the point of being unparsable to say that it was revealing a preference for 'not swarming' vs, say, 'staying with an uncoordinated group that can therefore never spontaneously leave' or still more abstract notions.

I want to flag that even with short timelines and selfish goals, the terms of the bet seem like a bad deal.

 If, until the end of 2028, Metaculus' question about superintelligent AI:

  • Resolves non-ambiguously, I transfer to you 10 k January-2025-$ in the month after that in which the question resolved.
  • Does not resolve, you transfer to me 10 k January-2025-$ in January 2029. As before, I plan to donate my profits to animal welfare organisations.

Reason: Many people with short timelines also tend to put high probability on superintelligent AI being bad news... (read more)

2
Vasco Grilo🔸
Thanks, Vojta. I made a remark about that in the post, which I bolded below (not in the post). The bet can still be beneficial with a later resolution date, as I propose just above, despite the higher risk of not receiving the transfer given superintelligent AI.

The expected profit for the people betting on short AI timelines in January-2025-$ as a fraction of 10 k January-2025-$ is P("winning")*P("transfer is made"|"superintelligent AI") - P("losing")*P("transfer is made"|"no superintelligent AI"). If P("winning") = 60 %, P("transfer is made"|"superintelligent AI") = 80 %, P("losing") = 40 %, and P("transfer is made"|"no superintelligent AI") = 100 % (> 80 %), that fraction would be 8 % (= 0.6*0.8 - 0.4*1). So, if the bet's resolution date was the 60th percentile date of superintelligent AI instead of the median, it would be profitable despite the chance of the transfer being made given superintelligent AI being 20 pp (= 1 - 0.8) lower than that given no superintelligent AI.

There is no resolution date that would make the bet profitable for someone with short AI timelines who is sufficiently pessimistic about the transfer being made given superintelligent AI. I made a bet along the lines you suggested. However, there may be people who are not so pessimistic for whom the bet may be worth it with a later resolution date.
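(To make the arithmetic in the reply above easier to check, here is a minimal sketch of the expected-profit calculation. The code and variable names are mine; the numbers are the illustrative ones from the reply, not empirical estimates.)

```python
def expected_profit_fraction(p_win, p_transfer_if_si, p_transfer_if_no_si):
    """Expected profit for the short-timelines bettor, as a fraction of the
    10 k January-2025-$ stake: P(win)*P(transfer|SI) - P(lose)*P(transfer|no SI)."""
    p_lose = 1 - p_win
    return p_win * p_transfer_if_si - p_lose * p_transfer_if_no_si

# Reply's numbers: resolution at the bettor's 60th-percentile date for
# superintelligent AI, 80% chance the transfer is made given superintelligent AI.
print(round(expected_profit_fraction(0.6, 0.8, 1.0), 2))   # 0.08, an expected gain of 8%

# Someone much more pessimistic about the transfer being made given
# superintelligent AI sees a loss at the same resolution date.
print(round(expected_profit_fraction(0.6, 0.3, 1.0), 2))   # -0.22
```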

What (if any) is the overlap of cooperative AI, AI ethics, and AI safety? Perhaps preventing catastrophic harm that is somehow tied to failures of fairness or inclusion?

I imagine that failures such as Moloch / runaway capitalism / you get what you can measure would qualify. (Or more precisely, harms caused by these would include things that AI Ethics is concerned about, in a way that Cooperative AI / AI Safety also tries to prevent.)

I think the summary at the start of this post is too easy to misinterpret as "if you think of yourself as a smart and moral person, it's ok to go for these companies".

(None of the things the summary says seem false. But the overall impression seems too vulnerable to rationalisation along the lines of "surely I would not fall prey to these bad incentives", when the reality is probably that most people fall prey to them. So at the minimum, it might be more fair to change the recommendation to something like "it's complicated, but err on the side of not joining"... (read more)

In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one's current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?

I agree with this.

the best way to advance your own values is generally to actually "be there" when AI happens.

I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being ... (read more)

2
Matthew_Barnett
I agree some people (such as yourself) might be extremely altruistic, and therefore might not care much about their own life relative to other values they hold, but this position is fairly uncommon. Most people care a lot about their own lives (and especially the lives of their family and friends) relative to other things they care about. We can empirically test this hypothesis by looking at how people choose to spend their time and money; and the results are generally that people spend their money on themselves, their family and their friends. You don't need to be director of the world to have influence over things. You can just be a small part of the world to have influence over things that you care about. This is essentially what you're already doing by living and using your income to make decisions, to satisfy your own preferences. I'm claiming this situation could and probably will persist into the indefinite future, for the agents that exist in the future. I'm very skeptical that there will ever be a moment in time during which there will be a "director of the world", in a strong sense. And I doubt the developer of the first AGI will become the director of the world, even remotely (including versions of them that reflect on moral philosophy etc.). You might want to read my post about this.

My personal reason for not digging into this is that my naive model of the goodness of the AI future is: quality_of_future * amount_of_the_stuff. And there is a distinction I haven't seen you acknowledge: while high "quality" doesn't require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.) Because of... (read more)

5
Matthew_Barnett
Is there any particular reason why you are partial towards humans generically controlling the future, relative to this particular current generation of humans? To me, it seems like being partial to one's own values, one's community, and especially one's own life, generally leads to an even stronger argument for accelerationism, since the best way to advance your own values is generally to actually "be there" when AI happens. In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one's current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?

In terms of feedback/reaction: I work on AI alignment, game theory, and cooperative AI, so Moloch is basically my key concern. And from that position, I highly approve of the overall talk, and of essentially all of the particular content --- except for one point, where I felt a bit so-so. And that is the part about what the company leaders can do to help the situation.

The key thing is 9:58-10:09 ("We need leaders who are willing to flip the Moloch's playbook. ..."), but I think this part then changes how people interpret 10:59-10:11 ("Perhaps companies can start c... (read more)

Re “Middle management is toxic, we should avoid it.”:

I want to flag that: your counterargument here does not properly address the points from Middle Manager Hell / the Immoral Mazes sequences. (Less constructively, "Middle management being toxic" seems like a quite weak version of the arguments against large orgs. Which suggests that your counterargument might not work against the stronger version. More constructively, one difference between current EA structure and large orgs is that small EA orgs are not married to a single funder. This imo reduces the "... (read more)

3
Ozzie Gooen
I didn't mean for it to. I was just pointing at the general dislike for Middle Management. I think one awkward thing now is that many of them are sort of married to a single funder. The EA funding community can be really narrow right now - just a few funders, and with similar/overlapping opinions and personnel. I think we can do better about getting better sets of trade-offs. There's also "large orgs, with lots of team freedom" as a medium. I also listed some strategies in the "Try to Mimic Organizational Strengths" section. I'd be happy to see this area worked on further!

Just to highlight a particular example: suppose you have a prediction market on "How high will USD inflation be over the next 2 years?" that is itself priced in USD.
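(My reading of why this example is awkward is that the real value of the payout depends on the very quantity being predicted. A toy sketch, assuming a hypothetical binary market that pays 1 USD per share in 2 years if inflation exceeds some threshold; the code and numbers are mine.)

```python
def real_payoff(nominal_usd, annual_inflation, years=2):
    """Deflate a nominal USD payoff by realized inflation over `years`."""
    return nominal_usd / (1 + annual_inflation) ** years

# Buying "inflation will be high" at 0.50 USD per share today:
cost = 0.50
for inflation in (0.02, 0.10, 0.25):
    gain = real_payoff(1.0, inflation) - cost
    print(f"inflation {inflation:.0%}: real gain per winning share {gain:.2f}")
# The higher inflation turns out to be, the less the nominal 1 USD prize is worth
# in real terms, so the payout is most devalued exactly when the "high inflation"
# bet wins.
```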

I suggest editing the post to add a tl;dr section at the top. Or maybe change the title to something like Why "just make an agent which cares only about binary rewards" doesn't work.


Reasoning: To me, the considerations in the post mostly read as rehashing standard arguments, with which one should be familiar if they have thought about the problem themselves or gone through AGI Safety Fundamentals, etc. It might be interesting to some people, but it would be good to have a clear indication that this isn't novel.

Also: When I read the start of the pos... (read more)

I am at high P(doom|AGI pre-2035), but not at near-certainty. Say, 75% but not 99.9%.

The reason for that is that I find both "fast takeoff takeover" and "continuous multipolar takeoff" scenarios plausible (with no decisive evidence for one or the other). In "continuous multipolar takeoff", you still get superintelligences running around. However, they would be "superintelligent with respect to civilization-2023" but not necessarily wrt civilization-then. And for the standard somewhat-well-thought-out AI takeover arguments to apply, you need to be superintellige... (read more)

Nitpicky feedback on the presentation:

If I am understanding it correctly, the current format of the tables makes them fundamentally incapable of expressing evidence for insects being unable to feel pain. (The colour coding goes from green=evidence for to red=no evidence, and how would you express ??=evidence against?) I would be more comfortable with a format without this issue, particularly since it seems justified to expect the authors to be biased towards wanting to find evidence for. [Just to be clear, I am not pushing against the results, or agains... (read more)

These are great questions. I want to say at the outset here: we explicitly chose to stick closely and without major adjustments to the Birch et al. 2021 framework for this review, such that our results would be directly comparable to their study of decapods and cephalopods that led to the protection of those groups in the Animal Welfare (Sentience) Act 2022 in the UK. 

Here is what the paper says about the framework, and the confidence levels:

The five possible confidence levels are:

(1) “Very high confidence”, when the weight of scientific evidence leav

... (read more)

I saw the line "found no good evidence that anything failed any criterion", but just to check explicitly: What do the confidence levels mean? In particular, should I read "low confidence" as "weak evidence that X feels pain-as-operationalized-by-Criterion Y"? Or as "strong evidence that X does not feel pain-as-operationalized-by-Criterion Y"?

In other words:

  • Suppose you did the same evaluation for the order Rock-optera (uhm, I mean literal rocks). (And suppose there was literature on that :-).) What would the corresponding row look like? All white, or w
... (read more)
7
Pivocajs
Nitpicky feedback on the presentation: If I am understanding it correctly, the current format of the tables makes them fundamentally incapable of expressing evidence for insects being unable to feel pain. (The colour coding goes from green=evidence for to red=no evidence, and how would you express ??=evidence against?) I would be more comfortable with a format without this issue, particularly since it seems justified to expect the authors to be biased towards wanting to find evidence for. [Just to be clear, I am not pushing against the results, or against caring about insects. Just against the particular presentation :-).]

After thinking about it more, I would interpret (parts of) the post as follows:
  • To the extent that we found research on these orders O and criteria C, each of the orders satisfies each of the criteria.
  • We are not saying anything about the degree to which a particular O satisfies a particular C. [Uhm, I am not sure why. Are the criteria extremely binary, even if you measure them statistically? Or were you looking at the degrees, and every O satisfied every C to a high enough degree that you just decided not to talk about it in the post?]
  • To recap: you don't talk about the degrees-of-satisfying-criteria, and any research that existed pointed towards sufficient-degree-of-C, for any O and C. Given this, the tables in this post essentially just depict "How much quality-adjusted research we found on this."
  • In particular, the tables do not depict anything like "Do we think these insects can feel pain, according to this measure?". Actually, you believe that probably once there is enough high-quality research, the research will conclude that all insects will satisfy all of the criteria. (Or all orders of insects sufficiently similar to the ones you studied.) [Here, I mean "believe" in the Bayesian sense where if you had to bet, this is what you would bet on. Not in the sense of you being confident that all the research will come up

"National greatness, and staying ahead of China for world influence requires that we have the biggest economy. To do that, we need more people." -Matt Yglesias, One Billion Americans.

Yeah, the guy who has chosen to have one child is going to inspire me to make the sacrifices involved in having four. It might be good for America, but the ‘ask’ here looks like it is that I sacrifice my utility for Matt’s one kid, and thus is not cooperate-cooperate. I’ll jump when you jump.

Two pushbacks here:

(1) The counterargument seems rather weak here, right? Even if Matt... (read more)

Yes, sure, probabilities are only in the map. But I don't think that matters for this. Or I just don't see what argument you are making here. (CLT is in the map, expectations are taken in the map, and decisions are made in the map (then somehow translated into the territory via actions). I don't see how that says anything about what EV reasoning relies on.)

Agree with Acylhalide's point - you only need to be non-Dutchbookable by bets that you could actually be exposed to.

To address a potential misunderstanding: I agree with both of Sharmake's examples. But they don't imply you always have to maximise expected utility. Just when the assumptions apply.

More generally: expected utility maximisation is an instrumental principle. But it is justified by some assumptions, which don't always hold.

2
Sharmake
I think the assumptions are usually true, though if they involve one-shot situations things change drastically.

Yes, the expected utility is larger. The claim is that there is nothing incoherent about not maximising expected utility in this case.

To try rephrasing:

  • Principle 1: if you have to choose between an X% chance of getting some outcome A and a >=X% chance of a strictly better outcome B, you should take B.
  • Principle 2: if you will be facing a long series of comparably significant choices, you should decide each of them based on expected utility maximisation.
  • Principle 3: you should do expected utility maximisation for every single choice. Even if that is the last/m... (read more)
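(A small simulation of the gap between Principle 2 and Principle 3; the setup and numbers are mine. Option A is a sure 1 unit, option B is a 10% chance of 15 units, so B has the higher expected value. Over a long series of such choices B reliably comes out ahead, while in a single final choice it leaves you with nothing 90% of the time, which is the sense in which refusing to maximise expected utility there need not be incoherent.)

```python
import random

random.seed(0)

def option_b():
    """10% chance of 15 units, otherwise nothing (expected value 1.5)."""
    return 15 if random.random() < 0.10 else 0

# Single, final choice: how often does the higher-EV option beat a sure 1 unit?
trials = 100_000
b_wins = sum(option_b() > 1 for _ in range(trials)) / trials
print(f"one-shot: B beats the sure thing only {b_wins:.0%} of the time")  # ~10%

# Long series of comparable choices: totals over 1,000 rounds.
rounds = 1_000
print("repeated: B total =", sum(option_b() for _ in range(rounds)),
      "vs sure-thing total =", rounds)   # B typically ends up near 1,500
```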

This gave me an idea for an experiment/argument. Posting here, in case somebody wants come up with a more thought-out version of it and do it.

[On describing what would change his mind:] You couldn’t find weird behaviors [in the AI], no matter how hard you tried.

People like to take an AI, poke it, and then argue "it is making [all these silly mistakes], therefore [not AGI/not something to worry about/...]". Now, the conclusion might be right, but the argument is wrong --- even dangerous things can be stupid in some settings. Nevertheless, the argument seems ... (read more)

Some thoughts:
 1) Most importantly: In your planning, I would explicitly include the variable of how happy you are. In particular, if the AI Safety option would result in a break-up of a long-term & happy relationship, or cause you to be otherwise miserable, it is totally legitimate to not do the AI Safety option. Even if it was higher "direct" impact. (If you need an impact-motivated excuse - which might even be true - then think about the indirect impact of avoiding signalling "we only want people who are so hardcore that they will be miserable ... (read more)

Agreed. The AIS job will have higher direct impact, but career transitions and relocating are both difficult. Before taking the plunge, I'd suggest people consider whether they would be happy with the move. And whether they have thought through some of the sacrifices involved, for instance, if the transition to AIS research is only partially successful, would they be happy spending time on non-research activities like directing funds or advising talent?

3
aog
+1 to all of this. Sounds like a very tough decision. If it were me, I would probably choose quality of life and stick with the startup. (Might also donate to areas that are more funding constrained like global development and animal welfare.)

Two considerations seem very relevant here:
(1) Is your primary goal to help Ukranians, or to make this more costly for Russia?
(2) Do you think the extra money is likely to change the outcome of the war, or merely the duration?

1
Levan Bokeria
A combined answer is that the point is to help Ukrainians fight for longer. More time means:
  • Less morale on the Russian side = higher chances of them backing off
  • The West has more chances to take more actions that might make a difference
  • The image of an unbeatable Russian army is demolished, which will have ripple effects throughout history.
  • The frozen ground in the vast fields of Ukraine will thaw and turn into mud during March. This makes war much more difficult for the Russians.
As always, our donations will individually be small, but multiplied by the potential to have a huge impact, the expected utility is high.

When considering self-sacrifice, it is also important to weigh the effects on other people. IE, every person that "sacrifices something for the cause" increases the perception that "if you want to work on this, you need to give up stuff". This might, in turn, turn people off from joining the cause in the first place. So even if the sacrifice increases the productivity of that one person, the total effect might still be negative.

My answer to the detailed version of the question is "unsure...probably no?": I would be extremely wary of reputation effects and perception of AI safety as a field. As a result, getting as many people as we can to work on this might prove to not be the right approach.

For one, getting AI to be safe is not only a technical problem --- apart from figuring out how to make AI safe, we need to also get whoever builds it to adopt our solution. Second, classical academia might prove important for safety efforts. If we are being realistic, we need to admit that th... (read more)

In a somewhat similar vein, it would be great to have a centralized database for medical records, at least within each country. And we know how to do this technically. But it "somehow doesn't happen" (at least anywhere I know of).

A general pattern would be "things where somebody believes a problem is of a technical nature, works hard at it, and solves it, only to realize that the problem was of a social/political nature". (Relatedly, the solution might not catch on because the institution you are trying to improve serves a somewhat different purpose from what you believed, Elephant in the Brain style. EG, education being not just for improving thinking and knowledge but also for domestication and signalling.)

I would like to highlight an aspect you mention in the "other caveats": How much should you discount for Goodharting vs doing things for the right reasons? Or, relatedly, if you work on some relevant topic (say, Embedded Agency) without knowing that AI X-risk could be a thing, how much less useful will your work be? I am very uncertain about the size of this effect - maybe it is merely a 10% decrease in impact, but I wouldn't be too surprised if it decreased the amount of useful work by 98% either.

Personally, I view this as the main potentia... (read more)

Of course, my views on this issue are by no means set in stone and still evolving. I’m happy to elaborate on my reasons for preferring this more modest usage if you are interested.

I think the more modest usage is a reasonable choice.

Maybe you had a different country in mind. [regarding top-secret security clearance]

I am Czech. We do have the institution, and use it. But, as far as I know, our president doesn't have it, and a bunch of other people don't have it. (I.e., it seems that people who need secret information on a daily basis have it. But you don't need it for many other positions from which you could put pressure on people who have the clearance.)

Some thoughts that occured to me while reading:

1) Research suggestion: From afar, malevolence-detection techniques seem like a better version of the already-existing tool of top-secret security clearance (or tests similar to it). I am not confident about this, but it already seems that if top-secret security clearance was a requirement for holding important posts, a lot of grief would be avoided (at least where I am from). Yet we generally do not use this tool. Why is this? I suspect that whatever the answer is, it will apply to malevolence-detection techn... (read more)

(Also, it might not be obvious from my nitpicking, but I really like the post, thanks for it :-).)

Thank you. :) No worries, I didn't think you were nitpicking. I agree with many of your points.

[...] if top-secret security clearance was a requirement for holding important posts, a lot of grief would be avoided (at least where I am from). Yet we generally do not use this tool. Why is this? I suspect that whatever the answer is, it will apply to malevolence-detection techniques as well.

One worry with security clearances is that they tend to mostly scre

... (read more)