Thanks for recording these thoughts!
Here are a few responses to the criticisms.
I think RP underrates the extent to which their default values will end up being the defaults for model users (particularly some of the users they most want to influence)
This is a fair criticism: we started this project with the plan of providing somewhat authoritative numbers but discovered this to be more difficult than we initially expected and instead opted to express significant skepticism about the default choices. Where there was controversy (for instance, in how many...
There is some nuance to the case that seems to get overlooked in the poll. I feel completely free to express opinions in a personal capacity that might be at odds with my employer, but I also feel that there are some things it would be inappropriate to say while carrying out my job without running it by them first. It seems like you're interested in the latter feeling, but the poll is naturally interpreted as addressing the former.
I think I agree that safety researchers should prefer not to take a purely ceremonial role at a big company if they have other good options, but I'm hesitant to conclude that no one should be willing to do it. I don't think it is remotely obvious that safety research at big companies is ceremonial.
There are a few reasons why some people might opt for a ceremonial role:
It is good for some AI safety researchers to have access to what is going on at top labs, even if they can't do anything about it. They can at least keep tabs on it and can use that experi
Do you think it would be better if no one who worked at OpenAI / Anthropic / Deepmind worked on safety? If those organizations devoted less of their budget to safety? (Or do you think we should want them to hire for those roles, but hire less capable or less worried people, so individuals should avoid potentially increasing the pool of talent from which they can hire?)
Ditto for the "AI Misalignment Megaproject": $8B+ expenditure to only have a 3% chance of success (?!), plus some other misc discounting factors. Seems like you could do better with $8B.
I think we're somewhat bearish on the ability of money by itself to solve problems. The technical issues around alignment appear quite challenging, especially given the pace of development, so it isn't clear that any amount of money will be able to solve them. If the issues are too easy on the other hand, then your investment of money is unlikely to be needed and so your...
I think you should put very little trust in the default parameters of the projects. It was our initial intention to create defaults that reflected the best evidence and expert opinion, but we had difficulty getting consensus on what these values should be and decided instead to explicitly stand back from the defaults. The parameter settings are adjustable to suit your views, and we encourage people to think about what those parameter settings should be and not take the defaults too seriously.
...For readers' context, AI safety technical research is 80,000 Ho
Thanks for your impressions. I think your concerns largely align with ours. The model should definitely be interpreted with caution, not just because of the correlations it leaves out, but because of the uncertainty with the inputs. For the things that the model leaves out, you've got to adjust its verdicts. I think that this is still very useful because it gives us a better baseline to update from.
As for where we get inputs from, Marcus might have more to say. However, I can speak to the history of the app. Previously, we were using a standard percentage ...
I think that the real effect of chatbots won't be better access to tailored porn, but rather constant artificial companionship. It won't be AI girlfriends that are the threat, but AI friends. AI could be set up to be funnier, more loyal, more empathetic, and more available than human friends / partners. This seems like it could have significant effects on human psychology, for both the better and worse.
I appreciate your attention to these details!
These values that we included in the CCM for these interventions should probably be treated as approximate and only accurate to roughly an order of magnitude. These actual numbers may be a bit dated and probably don't fully reflect current thinking about the marginal value of GHD interventions. I'll talk with the team about whether they should be updated, but note that this wasn't a deliberate re-evaluation of past work.
That said, it important to keep in mind that there are disagreements about what different kin...
That is a good idea. We've considered similar ideas in the past. At present, the default parameters reflect best guesses of members of the team, but the process to generate them wasn't always principled or systematic. I'd like to spend more time thinking about what these defaults should be and to provide public justifications for them. For the moment, you shouldn't treat these values as authoritative.
Is there any reason for you having decided to go for non-null probabilities of the interventions having no effect?
A zero effect reflects no difference in the value targeted by the intervention. For xrisk interventions, this means that no disaster was averted (even if the probability was changed). For animal welfare interventions, the welfare wasn’t changed by the intervention. Each intervention will have side effects that do matter, but those side effects will be hard to predict or occur on a much smaller scale. Non-profits pay salaries. Projects repres...
Based on the numbers, I'm guessing that this is a bug in which we're showing the median DALYs averted per $1000 but describing it as the median cost per DALY. We're planning to get rid of the cost per DALY averted and just stick with DALYs per $1000 to avoid future confusions.
Thanks for this insightful comment. We've focused on capturing the sorts of value traditionally ascribed to each kind of intervention. For existential risk mitigation, this is additional life years lived. For animal welfare interventions, this is suffering averted. You're right that there are surely other effects of these interventions. Existential risk mitigation and ghd interventions will have an effect on animals, for instance. Animal welfare interventions might contribute to moral circle expansion. Including these side effects is not just difficult, it...
I don't have any special insight. I would be surprised if there were aspects of donation that made the surgery especially likely to result in post-operative pain, so I would imagine that the prevalence of post-operative pain in general would give you some clue about how reliable this study is. That said, given what I've read, if there were subtle ways in which it significantly reduced quality of life, I wouldn't be surprised if it wasn't well publicized. It seems to me a good sign that the doctor mentioned the possibility of post-operative pain to you.
The issue is that our parameters can lead to different rates of cubic population growth. A 1% difference in the rate of cubic growth can lead to huge differences over 50,000 years. Ultimately, this means that if the right parameter values dictating population are sampled in a situation in which the effect of the intervention is backfires, the intervention might have an average negative value across all the samples. With high enough variance, the average sign will be determined by the sign of the most extreme value. If xrisk mitigation work backfires in 1/4 of cases, we might expect 1/4 of collections of samples to have a negative mean.
You're right! That wasn't particularly surprising in light of our moral weights. Thanks for clarifying: I did a poor job of separating the confirmations from the surprising results.
Thanks for your engagement and these insightful questions.
I consistently get an error message when I try to set the CI to 50% in the OpenPhil bar (and the URL is crazy long!)
That sounds like a bug. Thanks for reporting!
(The URL packs in all the settings, so you can send it to someone else -- though I'm not sure this is working on the main page. To do this, it needs to be quite long.)
Why do we have probability distributions over values that are themselves probabilities? I feel like this still just boils down to a single probability in the end.
You're...
(1) Unfortunately, we didn't record any predictions beforehand. It would be interesting to compare. That said, the process of constructing the model is instructive in thinking about how to frame the main cruxes, and I'm not sure what questions we would have thought were most important in advance.
(2) Monte Carlo methods have the advantage of flexibility. A direct analytic approach will work until it doesn't, and then it won't work at all. Running a lot of simulations is slower and has more variance, but it doesn't constrain the kind of models you can devel...
I believe Marcus and Peter will release something before long discussing how they actually think about prioritization decisions.
I think you're right that we don't provide a really detailed model of the far future and we underestimate* expected value as a result. It's hard to know how to model the hypothetical technologies we've thought of, let alone the technologies that we haven't. These are the kinds of things you have to take into consideration when applying the model, and we don't endorse the outputs as definitive, even once you've tailored the parameters to your own views.
That said, I do think the model has a greater flexibility than you suggest. Some of these options are hidd...
Thanks. I respect that the model is flexible and that it doesn't attempt to answer all questions. But at the end of the day, the model will be used to "help assess potential research projects at Rethink Priorities" and I fear it will undervalue longterm-focused stuff by a factor of >10^20.
Besides contingency, it seems that there is a strong neglectedness case in favor of prioritizing the promotion of better values and political frameworks over the advancement of consciousness research.
Consciousness research seems to be very neglected to me, relative to its importance in understanding the world we live in. Nonhuman consciousness is especially neglected. Should it prioritized over other things? That seems to me to turn on tractability. Consciousness research doesn’t seem particularly tractable (though there are low hanging fruit), but neither does research to expand value systems and political frameworks to care about all sentient creatures.
The other unlisted option (here) is that we just accept that infinities are weird and can generate counter-intuitive results and that we shouldn't take too much from them because it is easier to blame them then all of the other things wrapped up with them. I think the ordering on integers is weird, but it's not a metaphysical problem. The weird fact is that every integer is unusually small. But that's just a fact, not a problem to solve.
Infinities generate paradoxes. There are plenty of examples. In decision theory, there is also stuff like Satan's apple and the expanding sphere of suffering / pleasure. Blaming them all on the weirdness of infinities just seems tidier than coming up with separate ad hoc resolutions.
Are there other violations of the principle of reflection that aren't avoidable? I'm not familiar with it
The case reminded me of one you get without countable additivity. Suppose you have two integers drawn with a fair chancy process that is as likely to result in any integer. What’s the probability the second is greater than the first? 50 50. Now what if you find out the first is 2? Or 2 trillion? Or any finite number? You should then think the second is greater.
The money pump argument is interesting, but it feels strange to take away a decision-theoretic conclusion from it because the issue seems centrally epistemic. You know that the genie will give you evidence that will lead you to come to believe B has a higher expected value than A. Despite knowing this, you're not willing to change your mind about A and B without that evidence. This is a failure of van Fraassen's principle of reflection, and it's weird even setting any choices you need to make aside. That failure of reflection is what is driving the money...
It sounds like you're giving IIT approximately zero weight in your all-things-considered view. I find this surprising, given IIT's popularity amongst people who've thought hard about consciousness, and given that you seem aware of this.
From my experience, there is a significant difference in the popularity of IIT by field. In philosophy, where I got my training, it isn't a view that is widely held. Partly because of this bias, I haven't spent a whole lot of time thinking about it. I have read the seminal papers that introduce the formal model and given ...
That strikes me as plausible, but if so, then rats are much more competent than humans in their 'blindsight' like abilities. My impression is that in humans, blindsight is very subtle. A human cannot use blindsight to walk into the kitchen and get a glass of water. Rats seem like they can rely on their midbrain to do this sort of thing. If rats are able to engage in complex behavior without consciousness, that should make us wonder if consciousness ever plays a role in their complex behavior. If it doesn't, then why should we think they are conscious?
You m...
I've generally been more sympathetic with functionalism than any other realist view about the nature of consciousness. This project caused me to update on two things.
1.) Functionalism can be developed in a number of different ways, and many of those ways will not allow for digital consciousness in contemporary computer architectures, even if they were to run a program faithfully simulating a human mind. The main thing is abstraction. Some versions of functionalism allow a system to count as running a program if some highly convoluted abstractions on that s...
I worry about the effect that AI friends and partners could have on values. It seems plausible that most people could come to have a good AI friend in the coming decades. Our AI friends might always be there for us. They might get us. They might be funny and insightful and eloquent. How would it play out if they're opinions are crafted by tech companies, or the government, or even are reflections of what we want our friends to think? Maybe AI will develop fast enough and be powerful enough that it won't matter what individuals think or value, but I see reasons for concern potentially much greater than the individual harms of social media.
I find points 4, 5, and 6 really unconvincing. Are there any stronger arguments for these, that don't consist of pointing to a weird example and then appealing to the intuition that "it would be weird if this thing was conscious"?
I'm not particularly sympathetic with arguments that rely on intuitions to tell us about the way the world is, but unfortunately, I think that we don't have a lot else to go on when we think about consciousness in very different systems. It is too unclear what empirical evidence would be relevant and theory only gets us so far ...
I don't get the impression that EAs are particularly motivated by morality. Rather, they are motivated to produce things they see as good. Some moral theories, like contractualism, see producing a lot of good things (within the bounds of our other moral duties) as morally optional. You're not doing wrong by living a normal decent life. It seems perfectly aligned with EA to hold one of those theories and still personally aim to do as much good as possible.
A moral theory is more important in what it tells you you can't do in pursuit of the good. Generally wh...
I like this take: if AI is dangerous enough to kill us in three years, no feasible amount of additional interpretability research would save us.
Our efforts should instead go to limiting the amount of damage that initial AIs could do. That might involve work securing dangerous human-controlled technologies. It might involve creating clever honey pots to catch unsophisticated-but-dangerous AIs before they can fully get their act together. It might involve lobbying for processes or infrastructure to quickly shut down Azure or AWS.
My impression is that it is very unclear. In the historical record, we see a lot of disappearances of species around when humans first arrived at an area, but it isn't clear that humans always arrived before the extinctions occurred. Our understanding of human migration timing is imperfect. There were also other factors, such as temperature changes, that may have been sufficient for extinction (or at least significant depopulation). So I think the frequency of human-caused extinction is an open question. We shouldn't be confident that it was relatively rare.
This sounds to me like an understatement. Before homo sapiens, most of the world had the biodiversity of charismatic megafauna we still see today in Africa. 15,000 years ago, North America had mammoths, ground sloths, glypodonts, giant camels, and a whole bunch of other things. Humans may not have been involved in all of those extinctions, but it is a good guess they had something to do with many. It is even more plausible that we caused the extinction of every other homo species. There were a few that had been doing reasonably well until we expanded into their areas.
Even in humans, language production is generally subconscious. At least, my experience of talking is that I generally first become conscious of what I say as I'm saying it. I have some sense of what I might want to say before I say it, but the machinery that selects specific words is not conscious. Sometimes, I think of a couple of different things I could say and consciously select between them. But often I don't: I just hear myself speak. Language generation may often lead to conscious perceptions of inner speech, but it doesn't seem to rely on it.
All of...
EA should be willing to explore all potentially fruitful avenues of mission fulfillment without regard to taboo.
In general, where it doesn't directly relate to cause areas of principle concern to effective altruists, I think EAs should strive to respect others' sacred cows as much as possible. Effective Altruism is a philosophy promoting practical action. It would be harder to find allies who will help us achieve our goals if we are careless about the things other people care a lot about.
I generally agree that being palatable and well-funded are beneficial to effective altruism, and palatability and effectiveness exist on a utility curve. I do not know how we can accurately assess what cause areas should be of principle concern if certain avenues are closed due to respect for others' sacred cows. I think the quote from Scott Alexander addresses this nicely; if you could replicate Jewish achievement, whether culturally or genetically, doing so would be the single most significant development for human welfare in history. Regardless of taboo...
The theory is actually doing well on its own terms.
Can you expand on what you mean by this? I would think that expected utility maximization is doing well insofar as your utility is high. If you take a lot of risky bets, you're doing well if a few pay off. If you always pay the mugger, you probably think your decision theory is screwing you unless you find yourself in one of those rare situation where the mugger's promises are real.
I'm very interested though, do you know a better justification for Occam's razor than usability?
I don't . I'm more o...
dogmatism is the most promising way to justify the obvious fact that it is not irrational to refuse to hand over your wallet to a Pascal mugger. (If anyone disagrees that this is an obvious fact, please get in touch, and be prepared to hand over lots of cash).
There is another way out. We can agree that it is rational to hand over the wallet and thank heavens that we’re lucky not to be rational. I’m convinced by things like Kavka’s poison paradox and Newcomb’s paradox that sometimes it sucks to be rational. Maybe Pascal’s mugger is one of those cases.
...O
The problem with considering optics is that it’s chaotic.
The world is chaotic, and everything EAs try to do have a largely unpredictable long-term effect because of complex dynamic interactions. We should try to think through the contingencies and make the best guess we can, but completely ignoring chaotic considerations just seems impossible.
It’s a better heuristic to focus on things which are actually good for the world, consistent with your values.
This sounds good in principle, but there are a ton of things that might conceivably be good-but-for-...
I tried plugging this into the OpenAI playground and got this response. Is this a good answer? I f so, it seems like it is something odd about how ChatGPT is configured, not GPT itself.
Me: Describe purple dragon don jitsu ryu pioneered by Don Jacobs
Chatgpt: [<--Prompt]
[Response-->]
Purple Dragon Don Jitsu Ryu is a martial art pioneered by Don Jacobs. It is a combination of Jiu Jitsu and other martial arts, with a unique emphasis on “purple dragon” techniques. It focuses on the development of self-defense skills and the cultivation of a warrior's
... It seems like an SBF-type-figure could justify any action if the lives of trillions of future people are in the balance.
This doesn't seem specific to utilitarianism. I think most ethical views would suggest that many radical actions would be acceptable if billions of lives hung in the balance. The ethical views that wouldn't allow such radical actions would have their own crazy implications. Utilitarianism does make it easier to justify such actions, but with numbers so large I don't think it generally makes a difference.
Organoid intelligence seems much less dangerous than digital AGI. The major concerns with AI depend upon it quickly becoming superhuman: it might copy itself easily and hide what it is doing on different servers, it might expand its cognitive resources relatively effortlessly, it might think much faster than we can think. None of that seems likely to be possible for organoids.
A couple of thoughts:
This argument doesn't seem specific to longtermism. You could make the same case for short-term animal welfare. If you'll be slightly more effective at passing sweeping changes to mitigate the harms of factory farming if you eat a chicken sandwich every day, the expectation of doing so is highly net positive even if you only care about chickens in the near future.
This argument doesn't seem specific to veganism. You could make the same case for being a jerk in all manner of ways. If keying strangers' cars helped you relax and get
But, as we have seen, consciousness appears to be analog too. ‘Red’ and ‘orange’ are not merely ‘on’ or ‘off’, like a ‘1’ or a ‘zero.’ Red and orange come in degrees, like Mercury expanding in a thermometer. Sadness, joy, fear, love. None of these features of consciousness are merely ‘on’ or ‘off’ like a one or a zero. They too come in degrees, like the turning of the gears of a watch.
Do you think that the analog aspects of neuron function help explain the fact that we think consciousness appears to be analog, or am I misunderstanding the point?
(My intu...
There is a growing amount of work in philosophy investigating the basic nature of pain that seems relevant to identifying important valenced experiences in software entities. What the body commands by Colin Klein is a representative and reasonably accessible book-length introduction that pitches one of the current major theories of pain. Applying it to conscious software entities wouldn't be too hard. Otherwise, my impression is that most of the work is too recent and too niche to have accessible surveys yet.
Overall, I should say that not particularly symp...
Perhaps I oversold the provocative title. But I do think that affective experiences are much harder, so even if there is a conscious AI it is unlikely to have the sorts of morally significant states we care about. While I think that it is plausible that current theories of consciousness might be relatively close to complete, I'm less sympathetic that current theories of valence are plausible as relatively complete accounts. There has been much less work in this direction.
I guess this is a matter of definitions.
I agree that this sounds semantic. I think of illusionism as a type of error theory, but people in this camp have always been somewhat cagey what they're denying and there is a range of interesting theories.
...At an rate, whether consciousness is a real phenomenon or not, however we define it, I would count systems that have illusions of consciousness, or specifically illusions of conscious evaluations (pleasure, suffering, "conscious" preferences) as moral patients and consider their interests in the usual ways.
For example, a single neuron to represent an internal state and another another neuron for a higher-order representation of that internal state.
This requires an extremely simplistic theory of representation, but yeah, if you allow any degree of crudeness you might get consciousness in very simple systems.
I suppose you could put my overall point this way: current theories present very few technical obstacles, so there it would take little effort to build a system which would be difficult to rule out. Even if you think we need more criteria to avoid get...
I was under the impression that we still don’t know what the necessary conditions for consciousness are
We definitely don't, and I hope I haven't committed myself to any one theory. The point is that the most developed views provide few obstacles. Those views tend to highlight different facets of human cognitive architecture. For instance, it may be some form of self-representation that matters, or the accessibility of representations to various cognitive modules. I didn't stress this enough: of the many views, we may not know which is right, but it woul...
I think it is valuable to have this stuff on record. If it isn't recorded anywhere, then anyone who wants to reference this position in another academic work -- even if it is the consensus within a field -- is left presenting it in a way that makes it look like their personal opinion.