All of Jim Buhler's Comments + Replies

Interesting, thanks for sharing your thoughts on the process and stuff! (And happy to see the post published!) :)

Interesting, makes sense! Thanks for the clarification and for your thoughts on this! :)

 If I want to prove that technological progress generally correlates with methods that involve more suffering, yes! Agreed.

But while the post suggests that this is a possibility, its main point is that suffering itself is not inefficient, such that there is no reason to expect progress and methods that involve less suffering to correlate by default (a much weaker claim).

This makes me realize that the crux is perhaps the part below more than the claim we discuss above.



While I tentatively think the most efficient solutions to problems don

... (read more)

I do not mean to argue that the future will be net negative. (I even make this disclaimer twice in the post, haha.) :)

I simply argue that the "convergence between efficiency and methods that involve less suffering" argument in favor of assuming it'll be positive is unsupported.

There are many other arguments/considerations to take into account to assess the sign of the future.

Ben_West (8mo):
Ah yeah sorry, what I said wasn't precise; I mean that it is not enough to show that there exists one instance of suffering being instrumentally useful; you have to show that this is true in general. (Unless I misunderstood your post?)

Thanks!

Are you thinking about this primarily in terms of actions that autonomous advanced AI systems will take for the sake of optimisation?

Hmm... not sure. I feel like my claims are very weak and true even in future worlds without autonomous advanced AIs.


"One large driver of humanity's moral circle expansion/moral improvement has been technological progress which has reduced resource competition and allowed groups to expand concern for others' suffering without undermining themselves".

Agreed but this is more similar to argument (A) fleshed out in this foo... (read more)

Thanks Vasco! Perhaps a nitpick, but suffering still doesn't seem to be the limiting factor per se here. If farmed animals were philosophical zombies (i.e., were not sentient but still had the exact same needs), that wouldn't change the fact that one needs to keep them in conditions that are okay enough to be able to make a profit out of them. The limiting factor is their physical needs, not their suffering itself. Do you agree?

I think the distinction is important because it suggests that suffering itself appears as a limiting factor only insofar as it is strong evidence of physical needs that are not met. And while both strongly correlate in the present, I argue that we should expect this to change.

Vasco Grilo (8mo):
Thanks for clarifying! Yes, I agree.

Interesting, thanks Ben! I definitely agree that this is the crux. 

I'm sympathetic to the claim that "this algorithm would be less efficient than quicksort" and that this claim is generalizable.[1] However, if true, I think it only implies that suffering is -- by default -- inefficient as a motivation for an algorithm.

Right after making my crux claim, I reference some of Tobias Baumann's (2022a, 2022b) work which gives some examples of how significant amounts of suffering may be instrumentally useful/required in cases such as scientific exp... (read more)

Ben_West (8mo):
Yeah, I find some of Baumann's examples plausible, but in order for the future to be net negative we don't just need some examples, we need the majority of computation to be suffering.[1] I don't think Baumann is trying to argue for that in the linked pieces (or if they are, I don't find it terribly compelling); I would be interested in more research looking into this.

1. ^ Or maybe the vast majority to be suffering. See e.g. this comment from Paul Christiano about how altruists may have outsized impact in the future.
Vasco Grilo (8mo):
I think it would be helpful if you provided some of those examples in the post.

Thanks, Maxime! This is indeed a relevant consideration I thought a tiny bit about, and Michael St. Jules also brought that up in a comment on my draft.

First of all, it is important to note that UCC affects the neglectedness -- and potentially also the probability -- of "late s-risks" only (i.e., those that happen far enough in the future for the UCC selection to actually have time to occur). So let's consider only these late s-risks.

We might want to differentiate between three different cases:
1. Extreme UCC (where suffering is not just ignored b... (read more)

Thanks for the comment!

Right now, in rich countries, we seem to live in an unusual period that Robin Hanson (2009) calls "the Dream Time". You can survive valuing pretty much whatever you want, which is why there isn't much selection pressure on values. This likely won't go on forever, especially if humanity starts colonizing space.

(Re religion. This is anecdotal, but since you brought up this example: in the past, I think religious people would have been much less successful at spreading their values if they were more concerned about the suffering of the peop... (read more)

Thanks Will! :)

I think I haven't really thought about this possibility.

I know nothing about how things like false vacuum decay work (thankfully, I guess), about how tractable it is, and about how the minds of the agents trying to trigger it would operate. And my immediate impression is that these things matter a lot to whether my responses to the first two "obvious objections" sort of apply here as well and to whether "decay-conducive values" might be competitive.

However, I think we can at least confidently say that -- at least in the intra-civ s... (read more)

Thanks for giving arguments pointing the other way! I'm not sure #1 is relevant to our context here, but #2 is definitely worth considering. In the second post of the present sequence, I argue that something like #2 probably doesn't pan out, and we discuss an interesting counter-argument in this comment thread.

Thanks Miranda! :) 

I personally think the strongest argument for reducing malevolence is its relevance for s-risks (see section Robustness: Highly beneficial even if we fail at alignment), since I believe s-risks are much more neglected than they should be.

And the strongest counter-considerations for me would be  

  • Uncertainty regarding the value of the future. I'm generally much more excited about making the future go better rather than "bigger" (reducing X-risk does the latter), so the more reducing malevolence does the latter more than the forme
... (read more)

Right so assuming no early value lock-in and the values of the AGI being (at least somewhat) controlled/influenced by its creators, I imagine these creators to have values that are grabby to varying extents, and these values are competing against one another in the big tournament that is cultural evolution.

For simplicity, say there are only two types of creators: the pure grabbers (who value grabbing (quasi-)intrinsically) and the safe grabbers (who are in favor of grabbing only if it is done in a "safe" way, whatever that means).

Since we're assuming there... (read more)

Thanks a lot for this comment! I linked to it in a footnote. I really like this breakdown of different types of relevant evolutionary dynamics. :)

Will Aldred (1y):
:)

Thanks for the comment! :) You're assuming that the AGI's values will be pretty much locked-in forever once it is deployed such that the evolution of values will stop, right? Assuming this, I agree. But I can also imagine worlds where the AGI is made very corrigible (such that the overseers stay in control of the AGI's values) and where intra-civ value evolution continues/accelerates. I'd be curious if you see reasons to think these worlds are unlikely.

kokotajlod (1y):
Not sure I'm assuming that. Maybe. The way I'd put it is, selection pressure towards grabby values seems to require lots of diverse agents competing over a lengthy period, with the more successful ones reproducing more / acquiring more influence / etc. Currently we have this with humans competing for influence over AGI development, but it's overall fairly weak pressure. What sorts of things are you imagining happening that would strengthen the pressure? Can you elaborate on the sort of scenario you have in mind?

If you had to remake this 3D sim of the expansion of grabby aliens based on your beliefs, what would look different, exactly? (Sorry, I know you already answer this indirectly throughout the post, at least partially.)

Do you have any reading to suggest on that topic? I'd  be curious to understand that position more :)

Insightful! Thanks for taking the time to write these.

failing to act in perfect accord with the moral truth does not mean you're not influenced by it at all. Humans fail your conditions 4-7 and yet are occasionally influenced by moral facts in ways that matter.

Agreed and I didn't mean to argue against that so thanks for clarifying! Note however that the more you expect the moral truth to be fragile/complex, the further from it you should expect agents' actions to be.

you expect intense selection within civilizations, such that their members behave so as to

... (read more)
yefreitor (1y):
Sure, but that doesn't tell you much about what happens afterwards. If the initial colonists' values are locked in ~forever, we should probably expect value drift to be weak in general, which means frontier selection effects have a lot less variation to work with.  At the extreme lower limit with no drift at all, most agents within a mature civilization are about as expansionist as the most expansionist of the initial colonists - but no more so. And this might not be all that much in the grand scheme of things.  At the other end, where most of the space of possible values gets explored, maybe you do get a shockwave of superintelligent sociopaths racing outwards at relativistic speeds - but you also get a vast interior that favors (relatively speaking) long-term survival and material efficiency.

Very interesting, Wei! Thanks a lot for the comment and the links. 

TL;DR of my response: Your argument assumes that the first two conditions I list are met by default, which is I think a strong assumption (Part 1). Assuming that is the case, however, your point suggests there might be a selection effect favoring agents that act in accordance with the moral truth, which might be stronger than the selection effect I depict for values that are more expansion-conducive than the moral truth. This is something I haven't seriously considered and this made me... (read more)

yefreitor (1y):
The orthogonality thesis could be (and I think almost certainly is) false with respect to some agent-generating processes (e.g., natural selection) and true with respect to others (e.g. Q-learning).

Unfortunately we are unable to sponsor visas, so applicants must be eligible to work in the US.

Isn't it possible to simply contract (rather than employ) those who have or can get an ESTA, such that there's no need for a visa?

Derik K (1y):
We're definitely not experts on this. It looks like contracting is not allowed, but "independent research" is. So citizens of VWP countries or those obtaining a B1 visa should be eligible to participate, but we would be unable to compensate them beyond incidental expenses (travel, food, and housing). I've updated the website to reflect this.

As far as I know, there are no estimates (at least not public ones). But as Stan pointed out, Tobias Baumann has raised some very relevant considerations in different posts/podcasts.

Fwiw, researchers at the Center on Long-Term Risk think AGI conflict is the most concerning s-risk (see Clifton 2019), although it may be hard to comprehend all the details of why they think that if you just read their posts and don't talk to them.

Thanks Oscar!

predicting future (hopefully wiser and better-informed) values for moral antirealists

Any reason to believe moral realists would be less interested in this empirical work? You seem to assume the goal is to update our values based on those of future people. While this can be a motivation (it is among the motivations of Danaher 2021), we might also worry -- independently of whether we are moral realists or antirealists -- that the expected future evolution of values doesn't point towards something wiser and better-informed (since that's not what evolut... (read more)

Oscar Delaney (1y):
Ah OK, yes that seems right. I think the main context in which I have previously considered the values of future people is in trying to frontrun moral progress and get closer to the truth (if it exists) sooner than others, so that is where my mind most naturally went. But yes, if, for instance, we were more in a Moloch-style world where value was slowly disappearing in favour of ruthless efficiency, then indeed that is good to know before it has happened, so we can try to stop it.

Yeah, so Danaher (2021) coined the term axiological futurism, but research on this topic existed long before that. For instance, I find those two pieces particularly insightful:

They explore how compassionate values might be selected against because of evolutionary pressures, and be replaced by values more competitive for, e.g., space colonization races. In The Age of Em, Robin Hanson forecasts wh... (read more)

Very interesting post, thanks for writing this!

1. Simulations are not the most efficient way for A and B to reach their agreement. Rather, writing out arguments or formal proofs about each other is much more computationally efficient, because nested arguments naturally avoid stack overflows in a way that nested simulations do not. In short, each of A and B can write out an argument about each other that self-validates without an infinite recursion. There are several ways to do this, such as using Löb's Theorem-like constructions (as in this

... (read more)

I really like the section S-risk reduction is separate from alignment work! I've been surprised by the extent to which people dismiss s-risks on the pretext that "alignment work will solve them anyway" (which is both insufficient and untrue as you pointed out).

I guess some of the technical work to reduce s-risks (e.g., preventing the "accidental" emergence of conflict-seeking preferences) can be considered a very specific kind of AI intent alignment (that only a few cooperative AI people are working on afaik) where we want to avoid worst-case scenarios. ... (read more)

(emphasis is mine)

For something to constitute an “s-risk” under this definition, the suffering involved not only has to be astronomical in scope (e.g., “more suffering than has existed on Earth so far”),[5] but also significant compared to other sources of expected future suffering. This last bit ensures that “s-risks,” assuming sufficient tractability, are always a top priority for suffering-focused longtermists. 

Nitpick but you also need to assume sufficient likelihood, right? One might very well be a suffering-focused longtermist and thin... (read more)

Lukas_Gloor (1y):
Good point, thanks! That's probably the best way to think of it, yeah. I think the definition isn't rigorous enough to withstand lots of scrutiny. Still, in my view, it serves as a useful "pointer." You could argue that the definition implicitly tracks probabilities because in order to assess whether some source of expected suffering constitutes an s-risk, we have to check how it matches up against all other sources of expected suffering in terms of "probability times magnitude and severity." But this just moves the issue to "what's our current expectation over future suffering." It's definitely reasonable for people to have widely different views on this, so it makes sense to have that discussion independent of the specific assumptions behind that s-risk definition.

Interesting! Thanks for writing this. Seems like a helpful summary of ideas related to s-risks from AI.

Another important normative reason for dedicating some attention to s-risks is that the future (conditional on humanity's survival) is underappreciatedly likely to be negative  -- or at least not very positive -- from whatever plausible moral perspective, e.g., classical utilitarianism (see DiGiovanni 2021; Anthis 2022).

While this does not speak in favor of prioritizing s-risks per se, it obviously speaks against prioritizing X-risks which seem to be... (read more)

"[U]nderappreciatedly likely to be negative [...] from whatever plausible moral perspective" could mean many things. I maybe agree with the spirit behind this claim, but I want to flag that, personally, I think it's <10% likely that, if the wisest minds of the EA community researched and discussed this question for a full year, they'd conclude that the future is net negative in expectation for symmetric or nearly-symmetric classical utilitarianism. At the same time, I expect the median future to not be great (partly because I already think the current w... (read more)

Insightful! Thanks for writing this.

> Perhaps it will be possible to design AGI systems with goals that are cleanly separated from the rest of their cognition (e.g. as an explicit utility function), such that learning new facts and heuristics doesn’t change the systems’ values.

In that case, value lock-in is the default (unless corrigibility/uncertainty is somehow part of what the AGI values), such that there's no need for the "stable institution" you keep mentioning, right?

> But the one example of general intelligence we have — humans — instead ... (read more)

Lukas Finnveden (1y):
If AGI systems had goals that were cleanly separated from the rest of their cognition, such that they could learn and self-improve without risking any value drift (as long as the values-file wasn't modified), then there's a straightforward argument that you could stabilise and preserve that system's goals by just storing the values-file with enough redundancy and digital error correction. So this would make section 6 mostly irrelevant. But I think most other sections remain relevant, insofar as people weren't already convinced that being able to build stable AGI systems would enable world-wide lock-in. I was mostly imagining this scenario as I was writing, so when relevant, examples/terminology/arguments will be tailored for that, yeah.

Oh interesting! Ok so I guess there are two possibilities.

1) Either by "superrationalists", you mean something stronger than "agents taking acausal dependences into account in PD-like situations", which I thought was roughly Caspar's definition in his paper. And then, I'd be even more confused.

2) Or you really think that taking acausal dependences into account is, by itself, sufficient to create a significant correlation between two decision-algorithms. In that case, how do you explain that I would defect against you and exploit you in a one-shot PD (very sorry,... (read more)

Dawn Drescher (2y):
I think it’s closer to 2, and the clearer term to use is probably “superrational cooperator,” but I suppose that’s probably meant by “superrationalist”? Unclear. But “superrational cooperator” is clearer about (1) knowing about superrationality and (2) wanting to reap the gains from trade from superrationality. Condition 2 can be false because people use CDT or because they have very local or easily satisfied values and don’t care about distant or additional stuff. So just as in all the thought experiments where EDT gets richer than CDT, your own behavior is the only evidence you have about what others are likely to predict about you. The multiverse part probably smooths that out a bit, so your own behavior gives you evidence of increasing or decreasing gains from trade as the fraction of agents in the multiverse that you think cooperate with you increases or decreases. I think it would be “hard” to try to occupy that Goldilocks zone where you maximize the number of agents who wrongly believe that you’ll cooperate while you’re really defecting, because you’d have to simultaneously believe that you’re the sort of agent that cooperates despite actually defecting, which should give you evidence that you’re wrong about what reference class you’re likely to be put in. There may be agents like that out there, but even if that’s the case, they won’t have control over it. The way this will probably be factored in is that superrational cooperators will expect a slightly lower cooperation incidence to agents in reference classes of agents that are empirically very likely to cooperate while not being physically forced to cooperate because being in that reference class makes defection more profitable up to the point where it actually changes the assumptions others are likely to make about the reference class that have enabled the effect in the first place. That could mean that for any given reference class of agent who are able to defect, cooperation “densities” over 99% or s

Thanks for the reply! :)

By "copies", I meant "agents which action-correlate with you" (i.e., those which will cooperate if you cooperate), not "agents sharing your values". Sorry for the confusion.

Do you think all agents thinking superrationally action-correlate? This seems like a very strong claim to me. My impression is that the agents with a decision-algorithm similar enough to mine to (significantly) action-correlate with me are a very small subset of all superrationalists. As your post suggests, even your past-self doesn't fully action-corr... (read more)
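
To make the correlation question concrete, here is a minimal sketch (the payoff values and correlation probabilities are illustrative assumptions, not taken from the thread) of the evidential expected-utility calculation that determines when acausal correlation is strong enough to make cooperating the better choice in a one-shot PD:

```python
# Minimal sketch: evidential expected utility in a one-shot Prisoner's Dilemma.
# Payoffs and correlation probabilities below are illustrative assumptions.

# Standard PD payoffs from my perspective: T > R > P > S
T, R, P, S = 5.0, 3.0, 1.0, 0.0

def evidential_eu(p_coop_if_i_coop, p_coop_if_i_defect):
    """Expected payoffs of cooperating vs. defecting when I treat my own choice
    as evidence about the other agent's choice (EDT-style reasoning)."""
    eu_cooperate = p_coop_if_i_coop * R + (1 - p_coop_if_i_coop) * S
    eu_defect = p_coop_if_i_defect * T + (1 - p_coop_if_i_defect) * P
    return eu_cooperate, eu_defect

# Near-copies (strong action-correlation): cooperating wins.
print(evidential_eu(0.95, 0.05))  # -> (2.85, 1.2)

# Agents who merely both take acausal dependences into account (weak correlation):
# defecting still wins, which is the worry raised in the comment above.
print(evidential_eu(0.55, 0.45))  # -> (1.65, 2.8)
```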

Dawn Drescher (2y):
  Yes, but by implication not assumption. (Also no, not perfectly at least, because we’ll all always have some empirical uncertainty.) Superrationalists want to compromise with each other (if they have the right aggregative-consequentialist mindset), so they try to infer what everyone else wants (in some immediate, pre-superrationality sense), calculate the compromise that follows from that, determine what actions that compromise implies for the context in which they find themselves (resources and whatnot), and then act accordingly. These final acts can be very different depending on their contexts, but the compromise goals from which they follow correlate to the extent to which they were able to correctly infer what everyone wants (including bargaining solutions etc.). Yes. Hmm, it’s been a couple years since I read the paper, so not sure how that is meant… But I suppose either the decision algorithm is similar (1) because it goes through the superrationality step, or the decision algorithm has to be a bit similar (2) in order for people to consider superrationality in the first place. You need to subscribe to non-causal DTs or maybe have indexical uncertainty of some sort. It might be something that religious people and EAs come up with but that seems weird to most other people. (I think Calvinists have these EDT leanings, so maybe they’d embrace superrationality too? No idea.) I think superrationality breaks down in many earth-bound cases because too many people here would consider it weird, like the whole CDT crowd probably, unless they are aware of their indexical uncertainty, but that’s also still considered a bit weird.

Caspar Oesterheld’s work on Evidential Cooperation in Large Worlds (ECL) shows that some fairly weak assumptions about the shape of the universe are enough to arrive at the conclusion that there is one optimal system of ethics: the compromise between all the preferences of all agents who cooperate with each other acausally. That would solve ethics for all practical purposes. It would therefore have enormous effects on a wide variety of fields because of how foundational ethics is.

ECL recommends that agents maximize a compromise utility function averaging t... (read more)
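
To make the "compromise utility function" slightly more concrete, here is a minimal sketch of one common way to formalize it (the weights $w_i$ and the normalization factors $c_i$ are illustrative assumptions; Oesterheld's paper discusses how they might be chosen, e.g., via variance normalization):

$$U_{\text{compromise}} = \sum_i w_i \, \frac{U_i}{c_i}, \qquad w_i \ge 0,$$

where $U_i$ is the utility function of the $i$-th superrational cooperator, $c_i$ rescales it so that different agents' utilities are comparable, and $w_i$ is that agent's weight in the compromise. Each cooperator then acts, in their own part of the world, so as to maximize this shared $U_{\text{compromise}}$ rather than their individual $U_i$.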

Dawn Drescher (2y):
Thanks for the comment! I think that’s a misunderstanding because trading with copies of oneself wouldn’t do anything since you already want the same thing. The compromise between you would be the same as what you want individually. But with ECL you instead employ the concept of “superrationality,” which Douglas Hofstadter, Gary Drescher, and others have already looked into in isolation. You have now learned of superrationality, and others out there have perhaps also figured it out (or will in the future). Superrationality is now the thing that you have in common and that allows you to coordinate your decisions without communicating. That coordination relies a lot on Schelling points, on extrapolation from the things that we see around us, from general considerations when it comes to what sorts of agents will consider superrationality to be worth their while (some brands of consequentialists surely), etc. I’ve mentioned some real-world examples of ECL for coordinating within and between communities like EA in this article.

For EA group retreats, is it better to apply for the CEA event support you introduced, or for  CEA's support group funding?

OllieBase (2y):
I think CEA's support group funding for group retreats, just so your group's support is all with the groups team and not spread across CEA. If you're a city / national group supported by CEA, you should contact my colleague Rob Gledhill directly instead :)

I haven't received anything on my side. I think a confirmation by email would be nice, yes. Otherwise, I'll send the application a second time just in case.

Thanks for writing this Jamie!

Concerning the "SHOULD WE FOCUS ON MORAL CIRCLE EXPANSION?"  question, I think something like the following sub-question is also relevant: Will MCE lead to a "near miss" of the values we want to spread? 

Magnus Vinding (2018) argues that someone who cares about a given sentient being is absolutely not guaranteed to want what we think is best for this sentient being. While he argues from a suffering-focused perspective, the problem is still the same under any ethical framework.
For instance, future people who ... (read more)

I  completely agree with 3 and it's indeed worth clarifying. Even ignoring this, the possibility of humans being more compassionate than pro-life grabby aliens might actually be an argument against human-driven space colonization, since compassion -- especially when combined with scope sensitivity -- might increase agential s-risks related to potential catastrophic cooperation failure between AIs (see e.g., Baumann and Harris 2021, 46:24), which are the most worrying s-risks according to Jesse Clifton's preface of CLR's agenda. A space filled with lif... (read more)

Anthony DiGiovanni (3y):
Sorry, I wrote that point lazily because that whole list was supposed to be rather speculative. It should be "Singletons about non-life-maximizing values could also be convergent." I think that if some technologically advanced species doesn't go extinct, the same sorts of forces that allow some human institutions to persist for millennia (religions are the best example, I guess) combined with goal-preserving AIs would make the emergence of a singleton fairly likely - not very confident in this, though, and I think #2 is the weakest argument. Bostrom's "The Future of Human Evolution" touches on similar points.

Interesting! Thank you for  writing this up. :) 

It does seem plausible that, by evolutionary forces, biological nonhumans would care about the proliferation of sentient life about as much as humans do, with all the risks of great suffering that entails.

What about the grabby aliens, more specifically? Do they not, in expectation, care about proliferation (even) more than humans do?

All else being equal, it seems -- at least to me -- that civilizations with very strong pro-life values (i.e., that think that perpetuating life is good and necessary, ... (read more)

Anthony DiGiovanni (3y):
That sounds reasonable to me, and I'm also surprised I haven't seen that argument elsewhere. The most plausible counterarguments off the top of my head are: 1) Maybe evolution just can't produce beings with that strong of a proximal objective of life-maximization, so the emergence of values that aren't proximally about life-maximization (as with humans) is convergent. 2) Singletons about non-life-maximizing values are also convergent, perhaps because intelligence produces optimization power so it's easier for such values to gain sway even though they aren't life-maximizing. 3) Even if your conclusion is correct, this might not speak in favor of human space colonization anyway for the reason Michael St. Jules mentions in another comment, that more suffering would result from fighting those aliens.

Thank you for writing this.

  • According to a survey of quantitative predictions, disappointing futures appear roughly as likely as existential catastrophes. [More]

It looks like Bostrom and Ord included risks of disappointing futures in their estimates of x-risks, which might make this conclusion a bit skewed, don't you think?

Michael's definition of risks of disappointing futures doesn't include s-risks though, right? 

a disappointing future is when humans do not go extinct and civilization does not collapse or fall into a dystopia, but civilization[1] nonetheless never realizes its potential.

I guess we get something like "risks of a negative (or nearly negative) future" when adding up the two types.

Kaj_Sotala (3y):
Depends on exactly which definition of s-risks you're using; one of the milder definitions is just "a future in which a lot of suffering exists", such as humanity settling most of the galaxy but each of those worlds having about as much suffering as the Earth has today. Which is arguably not a dystopian outcome or necessarily terrible in terms of how much suffering there is relative to happiness, but still an outcome in which there is an astronomically large absolute amount of suffering.

Great piece, thanks!

Since you devoted a subsection to moral circle expansion as a way of reducing s-risks, I guess you consider that its beneficial effects outweigh the backfire risks you mention (at least if MCE is done "in the right way"). CRS' 2020 End-of-Year Fundraiser post also induces optimism regarding the impact of increasing moral consideration for artificial minds (the only remaining doubts seem to be about when and how to do it).

I wonder how confident we should be about this (the positive impact of MCE on reducing s-risks) at this point. Have yo... (read more)

Tobias_Baumann (3y):
Thanks for the comment, this is raising a very important point.  I am indeed fairly optimistic that thoughtful forms of MCE are positive regarding s-risks, although this qualifier of "in the right way" should be taken very seriously - I'm much less sure whether, say, funding PETA is positive. I also prefer to think in terms of how MCE could be made robustly positive, and distinguishing between different possible forms of it, rather than trying to make a generalised statement for or against MCE. This is, however, not a very strongly held view (despite having thought a lot about it), in light of great uncertainty and also some degree of peer disagreement (other researchers being less sanguine about MCE). 

Thanks for writing this! :)

Another potential outcome that comes to mind regarding such projects is a self-fulfilling prophecy effect (provided the predictions are not secret). I have no idea how much of a (positive/negative) impact it would have, though.

David_Althaus (3y):
Thanks. :) That's true though this is also an issue for other forecasting platforms—perhaps even more so for prediction markets where you could potentially earn millions by making your prediction come true. From what I can tell, this doesn't seem to be a problem for other forecasting platforms, probably because most forecasted events are very difficult to affect by small groups of individuals. One exception that comes to mind is match fixing. However, our proposal might be more vulnerable to this problem because there will (ideally) be many more forecasted events, so some of them might be easier to affect by a few individuals wishing to make their forecasts come true.