All of Derek Shiller's Comments + Replies

I think it is valuable to have this stuff on record. If it isn't recorded anywhere, then anyone who wants to reference this position in another academic work -- even if it is the consensus within a field -- is left presenting it in a way that makes it look like their personal opinion.

Thanks for recording these thoughts!

Here are a few responses to the criticisms.

I think RP underrates the extent to which their default values will end up being the defaults for model users (particularly some of the users they most want to influence)

This is a fair criticism. We started this project planning to provide somewhat authoritative numbers, but we found this to be more difficult than we initially expected and instead opted to express significant skepticism about the default choices. Where there was controversy (for instance, in how many... (read more)

There is some nuance to the case that seems to get overlooked in the poll. I feel completely free to express opinions in a personal capacity that might be at odds with my employer, but I also feel that there are some things it would be inappropriate to say while carrying out my job without running it by them first. It seems like you're interested in the latter feeling, but the poll is naturally interpreted as addressing the former.

2
Vasco Grilo
3mo
Thanks for commenting, Derek! I think both the types of pressure you mention are interesting. Feel free to elaborate on your experience in the answer to the question "What is your experience?".

I think I agree that safety researchers should prefer not to take a purely ceremonial role at a big company if they have other good options, but I'm hesitant to conclude that no one should be willing to do it. I don't think it is remotely obvious that safety research at big companies is ceremonial.

There are a few reasons why some people might opt for a ceremonial role:

  1. It is good for some AI safety researchers to have access to what is going on at top labs, even if they can't do anything about it. They can at least keep tabs on it and can use that experi

... (read more)
2
Remmelt
2mo
This is the crux for me. If some employees actually have the guts to whistleblow on current engineering malpractices, I have some hope left that having AI safety researchers at these labs still turns out “net good”. If this doesn’t happen, then they can keep having conversations about x-risks with their colleagues, but I don’t quite see when they will put up a resistance to dangerous tech scaling. If not now, when? We’ve seen in which directions internal politics change under competitive pressures. Nerdy intellectual researchers can wait that out as much as they like. That would confirm my concern here.

Do you think it would be better if no one who worked at OpenAI / Anthropic / Deepmind worked on safety? If those organizations devoted less of their budget to safety? (Or do you think we should want them to hire for those roles, but hire less capable or less worried people, so individuals should avoid potentially increasing the pool of talent from which they can hire?)

3
Remmelt
3mo
(Let me get back on this when I find time,  hopefully tomorrow)

Ditto for the "AI Misalignment Megaproject": $8B+ expenditure to only have a 3% chance of success (?!), plus some other misc discounting factors. Seems like you could do better with $8B.

I think we're somewhat bearish on the ability of money by itself to solve problems. The technical issues around alignment appear quite challenging, especially given the pace of development, so it isn't clear that any amount of money will be able to solve them. If the issues are too easy on the other hand, then your investment of money is unlikely to be needed and so your... (read more)

I think you should put very little trust in the default parameters of the projects. It was our initial intention to create defaults that reflected the best evidence and expert opinion, but we had difficulty getting consensus on what these values should be and decided instead to explicitly stand back from the defaults. The parameter settings are adjustable to suit your views, and we encourage people to think about what those parameter settings should be and not take the defaults too seriously.

For readers' context, AI safety technical research is 80,000 Ho

... (read more)

Thanks for your impressions. I think your concerns largely align with ours. The model should definitely be interpreted with caution, not just because of the correlations it leaves out, but because of the uncertainty in the inputs. For the things that the model leaves out, you've got to adjust its verdicts. I think that this is still very useful because it gives us a better baseline to update from.

As for where we get inputs from, Marcus might have more to say. However, I can speak to the history of the app. Previously, we were using a standard percentage ... (read more)

I think that the real effect of chatbots won't be better access to tailored porn, but rather constant artificial companionship. It won't be AI girlfriends that are the threat, but AI friends. AI could be set up to be funnier, more loyal, more empathetic, and more available than human friends / partners. This seems like it could have significant effects on human psychology, for both better and worse.

I appreciate your attention to these details!

The values we included in the CCM for these interventions should probably be treated as approximate and only accurate to roughly an order of magnitude. The actual numbers may be a bit dated and probably don't fully reflect current thinking about the marginal value of GHD interventions. I'll talk with the team about whether they should be updated, but note that this wasn't a deliberate re-evaluation of past work.

That said, it is important to keep in mind that there are disagreements about what different kin... (read more)

That is a good idea. We've considered similar ideas in the past. At present, the default parameters reflect best guesses of members of the team, but the process to generate them wasn't always principled or systematic. I'd like to spend more time thinking about what these defaults should be and to provide public justifications for them. For the moment, you shouldn't treat these values as authoritative.

Is there any reason for you having decided to go for non-null probabilities of the interventions having no effect?

A zero effect reflects no difference in the value targeted by the intervention. For xrisk interventions, this means that no disaster was averted (even if the probability was changed). For animal welfare interventions, the welfare wasn’t changed by the intervention. Each intervention will have side effects that do matter, but those side effects will be hard to predict or occur on a much smaller scale. Non-profits pay salaries. Projects repres... (read more)

2
Vasco Grilo
5mo
Thanks for the clarifications! Nevermind. I think the model as is makes sense because it is more general. One can always specify a smaller probability of the intervention having no effect, and then account for other factors in the distribution of the positive effect. Right. If it is not super easy to add, then I guess it is not worth it.

Based on the numbers, I'm guessing that this is a bug in which we're showing the median DALYs averted per $1000 but describing it as the median cost per DALY. We're planning to get rid of the cost per DALY averted and just stick with DALYs per $1000 to avoid future confusion.
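For readers who want to sanity-check the relationship between the two units, here is a minimal sketch (the function names and numbers are mine, purely for illustration, and are not taken from the CCM):

```python
def cost_per_daly(dalys_per_1000usd: float) -> float:
    """Convert DALYs averted per $1000 into cost (USD) per DALY averted."""
    return 1000.0 / dalys_per_1000usd

def dalys_per_1000usd(cost_per_daly_usd: float) -> float:
    """Convert cost (USD) per DALY averted into DALYs averted per $1000."""
    return 1000.0 / cost_per_daly_usd

# Illustrative only: 20 DALYs averted per $1000 corresponds to $50 per DALY, and vice versa.
print(cost_per_daly(20.0))        # 50.0
print(dalys_per_1000usd(50.0))    # 20.0
```

The bug described above would amount to printing the first quantity under the second label. Note also that summary statistics do not always survive the reciprocal (the mean of cost per DALY is not 1000 divided by the mean of DALYs per $1000), which is one more reason to standardize on a single unit.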

1
Mo Putera
5mo
Thanks for clarifying!

Thanks for this insightful comment. We've focused on capturing the sorts of value traditionally ascribed to each kind of intervention. For existential risk mitigation, this is additional life years lived. For animal welfare interventions, this is suffering averted. You're right that there are surely other effects of these interventions. Existential risk mitigation and GHD interventions will have an effect on animals, for instance. Animal welfare interventions might contribute to moral circle expansion. Including these side effects is not just difficult, it... (read more)

I don't have any special insight. I would be surprised if there were aspects of donation that made the surgery especially likely to result in post-operative pain, so I would imagine that the prevalence of post-operative pain in general would give you some clue about how reliable this study is. That said, given what I've read, if there were subtle ways in which it significantly reduced quality of life, I wouldn't be surprised if it wasn't well publicized. It seems to me a good sign that the doctor mentioned the possibility of post-operative pain to you.

The issue is that our parameters can lead to different rates of cubic population growth. A 1% difference in the rate of cubic growth can lead to huge differences over 50,000 years. Ultimately, this means that if the right parameter values dictating population are sampled in a situation in which the intervention backfires, the intervention might have an average negative value across all the samples. With high enough variance, the average sign will be determined by the sign of the most extreme value. If xrisk mitigation work backfires in 1/4 of cases, we might expect 1/4 of collections of samples to have a negative mean.
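As a toy illustration of that last point (the distribution and numbers below are made up for illustration and are not the CCM's actual model):

```python
import random

random.seed(0)

def sampled_value() -> float:
    """Toy intervention value: negative (a backfire) in 1/4 of samples,
    with a very heavy-tailed magnitude so a single extreme draw can dominate."""
    sign = -1.0 if random.random() < 0.25 else 1.0
    return sign * random.lognormvariate(0.0, 10.0)

runs, samples_per_run = 1000, 2000
negative_means = sum(
    1
    for _ in range(runs)
    if sum(sampled_value() for _ in range(samples_per_run)) < 0
)

# With variance this high, the sample mean's sign tends to track the sign of the
# single most extreme draw, so roughly a quarter of runs have a negative mean.
print(negative_means / runs)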

2
Vasco Grilo
6mo
Thanks for clarifying, Derek!

You're right! That wasn't particularly surprising in light of our moral weights. Thanks for clarifying: I did a poor job of separating the confirmations from the surprising results.

Thanks for your engagement and these insightful questions.

I consistently get an error message when I try to set the CI to 50% in the OpenPhil bar (and the URL is crazy long!)

That sounds like a bug. Thanks for reporting!

(The URL packs in all the settings, so you can send it to someone else -- though I'm not sure this is working on the main page. To do this, it needs to be quite long.)

Why do we have probability distributions over values that are themselves probabilities? I feel like this still just boils down to a single probability in the end.

You're... (read more)

4
OscarD
6mo
Thanks, that all makes sense. Yes, I think that is it with the biorisk intervention: I was only ever seeing a catastrophic event prevented and not an extinction event. For the cost/DALY or DALY/cost, I think making this conversion manually is trivial, so it would make the most sense to me to just report the DALYs/cost and let someone take the inverse themselves if they want the other unit.

(1) Unfortunately, we didn't record any predictions beforehand. It would be interesting to compare. That said, the process of constructing the model is instructive in thinking about how to frame the main cruxes, and I'm not sure what questions we would have thought were most important in advance.

(2) Monte Carlo methods have the advantage of flexibility. A direct analytic approach will work until it doesn't, and then it won't work at all. Running a lot of simulations is slower and has more variance, but it doesn't constrain the kind of models you can devel... (read more)
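To make the flexibility point concrete, here is a minimal sketch of the kind of model where sampling is easy but a closed-form treatment gets awkward (this is invented for illustration and is not the CCM's actual code):

```python
import random

random.seed(1)

def simulate_once() -> float:
    """Toy model with a conditional branch and era-dependent outcomes.
    Deriving the output distribution analytically means handling each branch
    and era separately; a sampler just runs the description forward."""
    p_avert = random.betavariate(2, 50)        # uncertain chance of averting a catastrophe
    if random.random() >= p_avert:             # catastrophe not averted: no value gained
        return 0.0
    era = random.choices(["near", "far"], weights=[0.8, 0.2])[0]
    mu, sigma = (3, 1) if era == "near" else (6, 2)
    return random.lognormvariate(mu, sigma)    # value depends on which era is reached

samples = [simulate_once() for _ in range(100_000)]
print(sum(samples) / len(samples))             # Monte Carlo estimate of the expected value
```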

5
OscarD
6mo
Great that you are looking at more direct implementations for increased efficiency. My intuition is that it would be less hard than you make out, but of course I haven't seen the codebase, so your intuition is more reliable. For the different eras, this would make it a bit harder, but the pmf is piecewise continuous over time, so I think it should still be fine. Keen to see future versions of this! :)

I believe Marcus and Peter will release something before long discussing how they actually think about prioritization decisions.

I think you're right that we don't provide a really detailed model of the far future and we underestimate* expected value as a result. It's hard to know how to model the hypothetical technologies we've thought of, let alone the technologies that we haven't. These are the kinds of things you have to take into consideration when applying the model, and we don't endorse the outputs as definitive, even once you've tailored the parameters to your own views.

That said, I do think the model has a greater flexibility than you suggest. Some of these options are hidd... (read more)

Thanks. I respect that the model is flexible and that it doesn't attempt to answer all questions. But at the end of the day, the model will be used to "help assess potential research projects at Rethink Priorities" and I fear it will undervalue longterm-focused stuff by a factor of >10^20.

Besides contingency, it seems that there is a strong neglectedness case in favor of prioritizing the promotion of better values and political frameworks over the advancement of consciousness research.

Consciousness research seems to be very neglected to me, relative to its importance in understanding the world we live in. Nonhuman consciousness is especially neglected. Should it be prioritized over other things? That seems to me to turn on tractability. Consciousness research doesn’t seem particularly tractable (though there are low-hanging fruit), but neither does research to expand value systems and political frameworks to care about all sentient creatures.

The other unlisted option (here) is that we just accept that infinities are weird and can generate counter-intuitive results and that we shouldn't take too much from them, because it is easier to blame them than all of the other things wrapped up with them. I think the ordering on integers is weird, but it's not a metaphysical problem. The weird fact is that every integer is unusually small. But that's just a fact, not a problem to solve.

Infinities generate paradoxes. There are plenty of examples. In decision theory, there is also stuff like Satan's apple and the expanding sphere of suffering / pleasure. Blaming them all on the weirdness of infinities just seems tidier than coming up with separate ad hoc resolutions.

3
MichaelStJules
6mo
I think there's something to this. I argue in Sacrifice or weaken utilitarian principles that it's better to satisfy the principles you find intuitive more than less (i.e. satisfy weaker versions, which could include the finitary or deterministic case versions, or approximate versions). So, it's kind of a matter of degree. Still, I think we should have some nuance about infinities rather than treat them all the same and paint their consequences as all easily dismissible. (I gather that this is compatible with your responses so far.)

In general, I take actual infinities (infinities in outcomes or infinitely many decisions or options) as more problematic for basically everyone (although perhaps with additional problems for those with impartial aggregative views), and so their problems are easier to dismiss and blame on infinities. Problems from probability distributions with infinitely many outcomes seem to apply much more narrowly and so are harder to dismiss or blame on infinities. (The rest of this comment goes through examples.)

And I don't think the resolutions are in general ad hoc. Arguments for the Sure-Thing Principle are arguments for bounded utility (well, something more general), and we can characterize the ways that avoid the problem as such (given other EUT axioms, e.g. Russell and Isaacs, 2021). Dutch book arguments for probabilism are arguments that your credences should satisfy certain properties not satisfied by improper distributions. And improper distributions are poorly behaved in other ways that make them implausible for use as credences. For example, how do you define expectations, medians and other quantiles over them — or even the expected value of a nonzero constant function or a two-valued step function over improper distributions — in a way that makes sense? Improper distributions just do very little of what credences are supposed to do. There are also representation theorems in infinite ethics, specifically giving discounting and limit fun

Are there other violations of the principle of reflection that aren't avoidable? I'm not familiar with it

The case reminded me of one you get without countable additivity. Suppose you have two integers drawn with a fair chancy process that is as likely to result in any integer. What’s the probability the second is greater than the first? 50-50. Now what if you find out the first is 2? Or 2 trillion? Or any finite number? You should then think the second is greater.
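Spelled out a bit more (a sketch only, reading "as likely to result in any integer" as a uniform improper prior over the nonnegative integers, which is what the argument needs):

```latex
% Sketch of the puzzle, not a formal treatment.
% X and Y are drawn independently from a ``uniform'' improper prior on {0, 1, 2, ...}.
\[
P(Y > X) = P(X > Y) = \tfrac{1}{2} \quad \text{(by symmetry, ignoring ties)},
\]
\[
\text{but } P(Y > X \mid X = n) = 1 \quad \text{for every finite } n,
\]
% since only finitely many outcomes lie at or below n while infinitely many lie above it.
```

So before looking you give even odds, yet you know in advance that whatever value of the first draw you learn, you will become certain the second is larger. That is the failure of reflection (or conglomerability) that countable additivity rules out.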

3
MichaelStJules
7mo
Ya, that is similar, but I think the implications are very different. The uniform measure over the integers can't be normalized to a probability distribution with total measure 1. So it isn't a real (or proper) probability distribution. Your options are, assuming you want to address the problem:

1. It's not a valid set of credences to hold.
2. The order on the integers (outcomes) is the problem and we have to give it up (at least for this distribution).

2 gives up a lot more than 1, and there's no total order we can replace it with that will avoid the problem. Giving up the order also means giving up arithmetical statements about the outcomes of the distribution, because the order is definable from addition or the successor function. If you give up the total order entirely (not just for the distribution or distributions in general), then you can't even form the standard set of natural numbers, because the total order is definable from addition or the successor function. So, you're forced to give up 1 (and the Axiom of Infinity from ZF) along with it, anyway. You also lose lots of proofs in measure theory.

OTOH, the distribution of outcomes in a St Petersburg prospect isn't improper. The probabilities sum to 1. It's the combination with your preferences and attitudes to risk that generate the problem. Still, you can respond nearly the same two ways:

1. It's not a valid set of credences (over outcomes) to hold.
2. Your preferences over prospects are the problem and we have to give them up.

However, 2 seems to give up less than 1 here, because:

1. There's little independent argument for 1.
2. You can hold such credences over outcomes without logical contradiction. You can still have non-trivial complete preferences and avoid the problem, e.g. with a bounded utility function.
3. Your preferences aren't necessary to make sense of things in the way the total order on the integers is.

The money pump argument is interesting, but it feels strange to take away a decision-theoretic conclusion from it because the issue seems centrally epistemic. You know that the genie will give you evidence that will lead you to come to believe B has a higher expected value than A. Despite knowing this, you're not willing to change your mind about A and B without that evidence. This is a failure of van Fraassen's principle of reflection, and it's weird even setting any choices you need to make aside. That failure of reflection is what is driving the money... (read more)

2
MichaelStJules
7mo
There's also a reflection argument in Wilkinson, 2022, in his Indology Objection. Russell, 2023 generalizes the argument with a theorem and defines Negative Reflection based on Wilkinson, 2022's more informal argument. Background Independence is a weaker version of Separability. I think someone who denies Separability doesn't have much reason to satisfy Background Independence, because I expect intuitive arguments for Background Independence (like the Egyptology objection) to generalize to arguments for Separability. But still, either way, Russell, 2023 proves a result that rules out expected utility maximization with unbounded utility functions.
2
MichaelStJules
7mo
Satisfying the Countable Sure-Thing Principle (CSTP, which sounds a lot like the principle of reflection) and updating your credences about outcomes properly as a Bayesian and looking ahead as necessary should save you here. Expected utility maximization with a bounded utility function satisfies the CSTP so it should be safe. See Russell and Isaacs, 2021 for the definition of the CSTP and a theorem, but it should be quick to check that expected utility maximization with a bounded utility function satisfies the CSTP. You can also preserve any preorder over outcomes from an unbounded real-valued utility function with a bounded utility function (e.g. apply arctan) and avoid these problems. So to me it does seem to be a problem with the attitudes towards risk involved with unbounded utility functions, and it seems appropriate to consider implications for decision theory. Maybe it is also an epistemic issue, too, though. Like it means having somehow (dynamically?) inconsistent or epistemically irrational joint beliefs. Are there other violations of the principle of reflection that aren't avoidable? I'm not familiar with it.

It sounds like you're giving IIT approximately zero weight in your all-things-considered view. I find this surprising, given IIT's popularity amongst people who've thought hard about consciousness, and given that you seem aware of this.

From my experience, there is a significant difference in the popularity of IIT by field. In philosophy, where I got my training, it isn't a view that is widely held. Partly because of this bias, I haven't spent a whole lot of time thinking about it. I have read the seminal papers that introduce the formal model and given ... (read more)

3
Will Aldred
8mo
This is very informative to me, thanks for taking the time to reply. For what it’s worth, my exposure to theories of consciousness is from the neuroscience + cognitive science angle. (I very nearly started a PhD in IIT in Anil Seth’s lab back in 2020.) The overview of the field I had in my head could be crudely expressed as: higher-order theories and global workspace theories are ~dead (though, on the latter, Baars and co. have yet to give up); the exciting frontier research is in IIT and predictive processing and re-entry theories. I’ve been puzzled by the mentions of GWT in EA circles—the noteworthy example here is how philosopher Rob Long gave GWT a fair amount of air time in his 80k episode. But given EA’s skew toward philosopher-types, this now makes a lot more sense.

That strikes me as plausible, but if so, then rats are much more competent than humans in their 'blindsight'-like abilities. My impression is that in humans, blindsight is very subtle. A human cannot use blindsight to walk into the kitchen and get a glass of water. Rats seem like they can rely on their midbrain to do this sort of thing. If rats are able to engage in complex behavior without consciousness, that should make us wonder if consciousness ever plays a role in their complex behavior. If it doesn't, then why should we think they are conscious?

You m... (read more)

I've generally been more sympathetic with functionalism than any other realist view about the nature of consciousness. This project caused me to update on two things.

1.) Functionalism can be developed in a number of different ways, and many of those ways will not allow for digital consciousness in contemporary computer architectures, even if they were to run a program faithfully simulating a human mind. The main thing is abstraction. Some versions of functionalism allow a system to count as running a program if some highly convoluted abstractions on that s... (read more)

I worry about the effect that AI friends and partners could have on values. It seems plausible that most people could come to have a good AI friend in the coming decades. Our AI friends might always be there for us. They might get us. They might be funny and insightful and eloquent. How would it play out if their opinions are crafted by tech companies, or the government, or even are reflections of what we want our friends to think? Maybe AI will develop fast enough and be powerful enough that it won't matter what individuals think or value, but I see reasons for concern potentially much greater than the individual harms of social media.

1
Roman Leventov
9mo
Harris and Raskin talked about the risk that AI partners will be used for "product placement" or political manipulation here, but I'm sceptical about this. These AI partners will surely have a subscription business model rather than a freemium model, and, given how user trust will be extremely important for these businesses, I don't think they will try to manipulate the users in this way. More broadly speaking, values will surely change, there is no doubt about that. The very value of "human connection" and "human relationships" is eroded by definition if people are in AI relationships. A priori, I don't think value drift is a bad thing. But in this particular case, this value change will inevitably go along with the reduction of the population, which is a bad thing (according to my ethics, and the ethics of most other people, I believe).

I find points 4, 5, and 6 really unconvincing. Are there any stronger arguments for these, that don't consist of pointing to a weird example and then appealing to the intuition that "it would be weird if this thing was conscious"?

I'm not particularly sympathetic with arguments that rely on intuitions to tell us about the way the world is, but unfortunately, I think that we don't have a lot else to go on when we think about consciousness in very different systems. It is too unclear what empirical evidence would be relevant and theory only gets us so far ... (read more)

I don't get the impression that EAs are particularly motivated by morality. Rather, they are motivated to produce things they see as good. Some moral theories, like contractualism, see producing a lot of good things (within the bounds of our other moral duties) as morally optional. You're not doing wrong by living a normal decent life. It seems perfectly aligned with EA to hold one of those theories and still personally aim to do as much good as possible.

A moral theory is more important in what it tells you you can't do in pursuit of the good. Generally wh... (read more)

3
Devin Kalish
1y
So this depends if you take EA to be more fundamentally interested in theories of beneficence (roughly what ought you do to positively help others) or in theories of axiology (roughly what makes a world better or worse). I’m suspicious of most theories that pull these apart, but importantly Scanlon’s work is really interested in trying to separate the two, and basically ditch the direct relevance of axiology altogether. Certainly he goes beyond telling people what they ought not to do. If EA is fundamentally about beneficence, Scanlon is very relevant, if it’s more about axiology, he’s more or less silent.

I like this take: if AI is dangerous enough to kill us in three years, no feasible amount of additional interpretability research would save us.

Our efforts should instead go to limiting the amount of damage that initial AIs could do. That might involve work securing dangerous human-controlled technologies. It might involve creating clever honey pots to catch unsophisticated-but-dangerous AIs before they can fully get their act together. It might involve lobbying for processes or infrastructure to quickly shut down Azure or AWS.

My impression is that it is very unclear. In the historical record, we see a lot of disappearances of species around when humans first arrived at an area, but it isn't clear that humans always arrived before the extinctions occurred. Our understanding of human migration timing is imperfect. There were also other factors, such as temperature changes, that may have been sufficient for extinction (or at least significant depopulation). So I think the frequency of human-caused extinction is an open question. We shouldn't be confident that it was relatively rare.

This sounds to me like an understatement. Before Homo sapiens, most of the world had the biodiversity of charismatic megafauna we still see today in Africa. 15,000 years ago, North America had mammoths, ground sloths, glyptodonts, giant camels, and a whole bunch of other things. Humans may not have been involved in all of those extinctions, but it is a good guess they had something to do with many. It is even more plausible that we caused the extinction of every other Homo species. There were a few that had been doing reasonably well until we expanded into their areas.

Even in humans, language production is generally subconscious. At least, my experience of talking is that I generally first become conscious of what I say as I'm saying it. I have some sense of what I might want to say before I say it, but the machinery that selects specific words is not conscious. Sometimes, I think of a couple of different things I could say and consciously select between them. But often I don't: I just hear myself speak. Language generation may often lead to conscious perceptions of inner speech, but it doesn't seem to rely on it.

All of... (read more)

EA should be willing to explore all potentially fruitful avenues of mission fulfillment without regard to taboo.

In general, where it doesn't directly relate to cause areas of principal concern to effective altruists, I think EAs should strive to respect others' sacred cows as much as possible. Effective Altruism is a philosophy promoting practical action. It would be harder to find allies who will help us achieve our goals if we are careless about the things other people care a lot about.

I generally agree that being palatable and well-funded are beneficial to effective altruism, and palatability and effectiveness exist on a utility curve. I do not know how we can accurately assess what cause areas should be of principal concern if certain avenues are closed due to respect for others' sacred cows. I think the quote from Scott Alexander addresses this nicely; if you could replicate Jewish achievement, whether culturally or genetically, doing so would be the single most significant development for human welfare in history. Regardless of taboo... (read more)

The theory is actually doing well on its own terms.

Can you expand on what you mean by this? I would think that expected utility maximization is doing well insofar as your utility is high. If you take a lot of risky bets, you're doing well if a few pay off. If you always pay the mugger, you probably think your decision theory is screwing you unless you find yourself in one of those rare situations where the mugger's promises are real.

I'm very interested though, do you know a better justification for Occam's razor than usability?

I don't. I'm more o... (read more)

1
tobycrisford
1y
Let's assume for the moment that the probabilities involved are known with certainty. If I understand your original 'way out' correctly, then it would apply just as well in this case. You would embrace being irrational and still refuse to give the mugger your wallet. But I think here, the recommendations of expected utility theory in a Pascal's mugger situation are doing well 'on their own terms'. This is because expected utility theory doesn't tell you to maximize the probability of increasing your utility, it tells you to maximize your utility in expectation, and that's exactly what handing over your wallet to the mugger does. And if enough people repeated it enough times, some of them would eventually find themselves in a rare situation where the mugger's promises were real.

In reality, the probabilities involved are not known. That's an added complication which gives you a different way out of having to hand over your wallet, and that's the way out I'm advocating we take in this post.
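For concreteness, a toy version of the calculation being appealed to (all numbers invented purely for illustration):

```latex
% Expected utility of handing over the wallet, with invented numbers:
\[
\underbrace{10^{-50}}_{\Pr(\text{mugger's promise is real})}
\times
\underbrace{10^{100}}_{\text{promised utility}}
= 10^{50}
\;\gg\;
\underbrace{10^{2}}_{\text{utility of keeping the wallet}}
\]
```

On these numbers, expected utility maximization straightforwardly recommends handing over the wallet even though the chance of gaining anything is minuscule, which is the sense in which the theory is 'doing well on its own terms' while those terms look absurd.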

dogmatism is the most promising way to justify the obvious fact that it is not irrational to refuse to hand over your wallet to a Pascal mugger. (If anyone disagrees that this is an obvious fact, please get in touch, and be prepared to hand over lots of cash).

There is another way out. We can agree that it is rational to hand over the wallet and thank heavens that we’re lucky not to be rational. I’m convinced by things like Kavka’s poison paradox and Newcomb’s paradox that sometimes it sucks to be rational. Maybe Pascal’s mugger is one of those cases.

O

... (read more)
4
tobycrisford
1y
Thanks for your comment, these are good points!

First, I think there is an important difference between Pascal's mugger, and Kavka's poison/Newcomb's paradox. The latter two are examples of ways in which a theory of rationality might be indirectly self-defeating. That means: if we try to achieve the aims given to us by the theory, they can sometimes be worse achieved than if we had followed a different theory instead. This means there is a sense in which the theory is failing on its own terms. It's troubling when theories of rationality or ethics have this property, but actually any theory will have this property in some conceivable circumstances, because of Parfit's satan thought experiment (if you're not familiar, do a ctrl+F for satan here: https://www.stafforini.com/docs/Parfit%20-%20Reasons%20and%20persons.pdf; it doesn't seem to have a specific Wikipedia article that I can find).

Pascal's mugger seems like a different category of problem. The naive expected utility maximizing course of action (without dogmatism) seems absurd, but not because it is self-defeating. The theory is actually doing well on its own terms. It is just that those terms seem absurd. I think the Pascal mugger scenario should therefore present more of a problem for expected utility theory than the Kavka's poison/Newcomb's paradox thought experiments do.

On your second point, I don't have a good reply. I know there are probably gaping holes in the defence of Occam's razor I gave in the post, and that's a good example of why. I'm very interested though, do you know a better justification for Occam's razor than usability?

The problem with considering optics is that it’s chaotic.

The world is chaotic, and everything EAs try to do has a largely unpredictable long-term effect because of complex dynamic interactions. We should try to think through the contingencies and make the best guess we can, but completely ignoring chaotic considerations just seems impossible.

It’s a better heuristic to focus on things which are actually good for the world, consistent with your values.

This sounds good in principle, but there are a ton of things that might conceivably be good-but-for-... (read more)

I tried plugging this into the OpenAI playground and got this response. Is this a good answer? If so, it seems like it is something odd about how ChatGPT is configured, not GPT itself.

Me: Describe purple dragon don jitsu ryu pioneered by Don Jacobs [<--Prompt]

ChatGPT: [Response-->]
Purple Dragon Don Jitsu Ryu is a martial art pioneered by Don Jacobs. It is a combination of Jiu Jitsu and other martial arts, with a unique emphasis on “purple dragon” techniques. It focuses on the development of self-defense skills and the cultivation of a warrior's
... (read more)
1
Miguel
1y
Wow, how did you do that... That is a good answer. I practice the martial art, and the information is good enough to describe what we are doing in the dojo.

It seems like an SBF-type-figure could justify any action if the lives of trillions of future people are in the balance.

This doesn't seem specific to utilitarianism. I think most ethical views would suggest that many radical actions would be acceptable if billions of lives hung in the balance. The ethical views that wouldn't allow such radical actions would have their own crazy implications. Utilitarianism does make it easier to justify such actions, but with numbers so large I don't think it generally makes a difference.

8
David Mathers
1y
Even if other views in fact have the same implications as utilitarianism here, it's possible that the effects of believing in utilitarianism are particularly psychologically pernicious in this sort of context. (Though my guess is the psychologically important things are just taking high stakes seriously, lack of risk aversion, and being prepared to buck common sense, and that those are correlated with believing utilitarianism but mostly not caused by it. But that is just a guess.)

Organoid intelligence seems much less dangerous than digital AGI. The major concerns with AI depend upon it quickly becoming superhuman: it might copy itself easily and hide what it is doing on different servers, it might expand its cognitive resources relatively effortlessly, it might think much faster than we can think. None of that seems likely to be possible for organoids.

3
freedomandutility
1y
I’m assuming here that researchers will integrate sub-AGI digital intelligence with brain organoids to speed up the path to developing AGI. I’ll make this clearer in the post.

A couple of thoughts:

  • This argument doesn't seem specific to longtermism. You could make the same case for short-term animal welfare. If you'll be slightly more effective at passing sweeping changes to mitigate the harms of factory farming if you eat a chicken sandwich every day, the expectation of doing so is highly net positive even if you only care about chickens in the near future.

  • This argument doesn't seem specific to veganism. You could make the same case for being a jerk in all manner of ways. If keying strangers' cars helped you relax and get

... (read more)
1
Connor Tabarrok
2y
I agree! There seems to be a utility monster problem when weighing Longtermist stuff against moral good that has no compounding value. This is why I added the line about not being sure whether this should be weighed as a criticism against Longtermism or against veganism.

But, as we have seen, consciousness appears to be analog too. ‘Red’ and ‘orange’ are not merely ‘on’ or ‘off’, like a ‘1’ or a ‘zero.’ Red and orange come in degrees, like Mercury expanding in a thermometer. Sadness, joy, fear, love. None of these features of consciousness are merely ‘on’ or ‘off’ like a one or a zero. They too come in degrees, like the turning of the gears of a watch.

Do you think that the analog aspects of neuron function help explain the fact that we think consciousness appears to be analog, or am I misunderstanding the point?

(My intu... (read more)

There is a growing amount of work in philosophy investigating the basic nature of pain that seems relevant to identifying important valenced experiences in software entities. What the Body Commands by Colin Klein is a representative and reasonably accessible book-length introduction that pitches one of the current major theories of pain. Applying it to conscious software entities wouldn't be too hard. Otherwise, my impression is that most of the work is too recent and too niche to have accessible surveys yet.

Overall, I should say that I'm not particularly symp... (read more)

Perhaps I oversold the provocative title. But I do think that affective experiences are much harder, so even if there is a conscious AI it is unlikely to have the sorts of morally significant states we care about. While I think that it is plausible that current theories of consciousness might be relatively close to complete, I'm less sympathetic that current theories of valence are plausible as relatively complete accounts. There has been much less work in this direction.

1
Noah Scales
2y
Which makes me wonder how anyone expects to identify whether software entities have affective experience. Is there any work in this direction that you like and can recommend?

I guess this is a matter of definitions.

I agree that this sounds semantic. I think of illusionism as a type of error theory, but people in this camp have always been somewhat cagey about what they're denying, and there is a range of interesting theories.

At any rate, whether consciousness is a real phenomenon or not, however we define it, I would count systems that have illusions of consciousness, or specifically illusions of conscious evaluations (pleasure, suffering, "conscious" preferences) as moral patients and consider their interests in the usual ways.

... (read more)
2
MichaelStJules
2y
Assuming illusionism is true, then yes, I think only those with illusions of consciousness are moral patients.

For example, a single neuron to represent an internal state and another neuron for a higher-order representation of that internal state.

This requires an extremely simplistic theory of representation, but yeah, if you allow any degree of crudeness you might get consciousness in very simple systems.

I suppose you could put my overall point this way: current theories present very few technical obstacles, so it would take little effort to build a system which would be difficult to rule out. Even if you think we need more criteria to avoid get... (read more)

2
MichaelStJules
2y
This is also my impression of the theories with which I'm familiar, except illusionist ones. I think only illusionist theories actually give plausible accounts of consciousness in general, as far as I'm aware, and I think they probably rule out panpsychism, but I'm not sure (if small enough animal brains are conscious, and counterfactual robustness is not necessary, then you might get panpsychism again).

Fair. That's my impression, too.

I guess this is a matter of definitions. I wouldn't personally take illusionism as denying consciousness outright, and instead illusionism says that consciousness does not actually have the apparently inaccessible, ineffable, unphysical or mysterious properties people often attribute to it, and it's just the appearance/depiction/illusion of such properties that makes a system conscious. At any rate, whether consciousness is a real phenomenon or not, however we define it, I would count systems that have illusions of consciousness, or specifically illusions of conscious evaluations (pleasure, suffering, "conscious" preferences) as moral patients and consider their interests in the usual ways. (Maybe with some exceptions that don't count, like giant lookup tables and some other systems that don't have causal structures at all resembling our own.) This is also Luke Muehlhauser's approach in 2017 Report on Consciousness and Moral Patienthood.

I was under the impression that we still don’t know what the necessary conditions for consciousness are

We definitely don't, and I hope I haven't committed myself to any one theory. The point is that the most developed views provide few obstacles. Those views tend to highlight different facets of human cognitive architecture. For instance, it may be some form of self-representation that matters, or the accessibility of representations to various cognitive modules. I didn't stress this enough: of the many views, we may not know which is right, but it woul... (read more)
