This last part carries a lot of weight; a simulacrum, when dormant in the superposition from which it can be sampled, is nonexistent. A simulacrum only exists during the discrete processing event which correlates with its sampling.
There seems to me to be a sensible view on which a simulacrum exists to the extent that computations relevant to making decisions on its behalf are carried out, regardless of what the token sampler chooses. This would suggest that there could conceivably be vast numbers of different simulacra instantiated even in a single forw...
I find this distinction kind of odd. If we care about what digital minds we produce in the future, what should we be doing now?
I expect that what minds we build in large numbers in the future will largely depend on how we answer a political question. The best way to prepare now for influencing how we as a society answer that question (in a positive way) is to build up a community with a reputation for good research, figure out the most important cruxes and what we should say about them, create a better understanding of what we should actually be aiming ...
I think this is basically right (I don't think the upshot is that incomparability implies nihilism, but rather the moral irrelevance of most choices). I don't really understand why this is a reason to reject incomparability. If values are incomparable, it turns out that the moral implications are quite different from what we thought. Why change your values rather than your downstream beliefs about morally appropriate action?
Thanks for the suggestion. I'm interested in the issue of dealing with threats in bargaining.
I don't think we ever published anything specifically on the defaults issue.
We were focused on allocating a budget that respects the priorities of different worldviews. The central difficulty we encountered was that we started by taking the default to be the allocation you get by giving everyone their own slice of the total budget to spend as they wanted. Since there are often options that are well-suited to each different worldview, there is no way to get...
We implemented a Nash bargaining solution in our moral parliament and I came away with the impression that the results of Nash bargaining are very sensitive to your choice of defaults, and that for plausible defaults true bargains can be pretty rare. Anyone who is happy with the defaults gets disproportionate bargaining power. One default might be 'no future at all', but that's going to make it hard to find any bargain with the anti-natalists. Another default might be 'just more of the same', but again, someone might like that and oppose any bargain that deviates much. Have...
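To make the default-sensitivity point concrete, here is a toy sketch (my own illustrative setup, not the parliament's actual model): two worldviews split a budget, worldview 1's utility is its share x and worldview 2's is 1 − x, and we vary the default (disagreement) payoffs before maximizing the Nash product.

```python
# Toy Nash bargain over a budget split between two worldviews.
# Utilities from giving fraction x of the budget to worldview 1:
def u1(x):
    return x

def u2(x):
    return 1 - x

def nash_solution(d1, d2, grid=10_001):
    """Grid-search the allocation maximizing (u1 - d1) * (u2 - d2)
    over allocations that beat the default (d1, d2) for both parties."""
    best, best_x = None, None
    for i in range(grid):
        x = i / (grid - 1)
        g1, g2 = u1(x) - d1, u2(x) - d2
        if g1 >= 0 and g2 >= 0:
            p = g1 * g2
            if best is None or p > best:
                best, best_x = p, x
    return best_x

# Symmetric default: the bargain splits the budget evenly.
print(nash_solution(0.0, 0.0))   # 0.5
# A party content with the status quo extracts a much larger share.
print(nash_solution(0.6, 0.0))   # 0.8
```

The grid search is deliberately naive; the point is just that moving one party's default from 0 to 0.6 moves the "fair" bargain from a 50/50 split to an 80/20 split.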
Keeping the world around probably does that, so you should donate to Longtermist charities (especially because they potentially increase the number of people ever born, thus giving more people a chance of getting into heaven).
I often get the sense that people into fanaticism think that it doesn't much change what they actually should support. That seems implausible to me. Maybe you should support longtermist causes. (You probably have to contort yourself to justify giving any money to shrimp welfare.) But I would think the longtermist causes you should ...
But even a 10% chance that fish feel pain—and that we annually painfully slaughter a population roughly ten times the number of humans who have ever lived—is enough to make it a serious issue. Given the mind-bending scale of the harm we inflict on fish, even a modest chance that they feel pain is enough.
Completely in agreement here.
...And while it’s possible that evolution produced some kind of non-conscious signal that produces identical behavior to pain, such a thing is unlikely. If a creature didn’t feel pain, it’s unlikely it would respond to analges
I think of moral naturalism as a position where moral language is supposed to represent things, and it represents certain natural things. The view I favor is a lot closer to inferentialism: the meaning of moral language is constituted by the way it is used, not what it is about. (But I also don't think inferentialism is quite right, since I'm not into realism about meaning either.)
...I guess I don't quite see what your puzzlement is with morality. There are moral norms which govern what people should do. Now, you might deny there in fact are such things,
I consider myself a pretty strong anti-realist, but I find myself accepting a lot of the things you take to be problems for anti-realism. For instance:
But lots of moral statements just really don’t seem like any of these. The wrongness of slavery, the holocaust, baby torture, stabbing people in the eye—it seems like all these things really are wrong and this fact doesn’t depend on what people think about it.
I think that these things really are wrong and don't depend on what people think about it. But I also think that that statement is part of a langua...
I don’t think it’s even necessary to debate whether quantum phenomena manifest somehow at the macro level of the brain
You might think it is important that the facts about consciousness contribute to our beliefs about them in some way. Our beliefs about consciousness are surely a phenomenon of the macro level. So if our beliefs are somehow sensitive to the facts, and the facts consist of quantum effects, we should expect those quantum effects to generate some macroscopic changes.
This is the sticking point for me with quantum theories: there doesn't seem ...
Could you link the most relevant piece you are aware of? What do you mean by "independently"? Under hedonism, I think the probability of consciousness only matters to the extent it informs the probability of valenced experiences.
The idea is more aspirational. I'm not really sure of what to recommend in the field, but this is a pretty good overview: https://arxiv.org/pdf/2404.16696
Interesting! How?
Perhaps valence requires something like the assignment of weights to alternative possibilities. If you can look inside the AI and confirm that it is making...
Not at the moment. Consciousness is tricky enough as it is. The field is interested in looking more closely at valence independently of consciousness, given that valence seems more tractable and you could at least confirm that AIs don't have valenced experience, but that lies a bit outside our focus for now.
Independently, we're also very interested in how to capture the difference between positive and negative experiences in alien sorts of minds. It is often taken for granted based on human experience, but it isn't trivial to say what it is.
This more or less matches why I think trajectory changes might be tractable, but I think the idea can be spelled out in a slightly more general way: as technology develops (and especially AI), we can expect to get better at designing institutions that perpetuate themselves. Past challenges to effecting a trajectory change come from erosion of goals due to random and uncontrollable human variation and the chaotic intrusion of external events. Technology may help us make stable institutions that can continue to promote goals for long periods of time.
Lots of people think about how to improve the future in very traditional ways. Assuming the world keeps operating under the laws it has been for the past 50 years, how do we steer it in a better direction?
I suppose I was thinking of this in terms of taking radical changes from technology development seriously, but not in the sense of long timelines or weird sources of value. Far fewer people are thinking about how to navigate a time when AGI becomes commonplace than are thinking about how to get to that place, even though there might not be a huge window of time between them.
People in general, and not just longtermist altruists, have reason to be concerned with extinction. It may turn out not to be a problem or not be solvable and so the marginal impact seems questionable here. In contrast, few people are thinking about how to navigate our way to a worthwhile future. There are many places where thoughtful people might influence decisions that effectively lock us into a trajectory.
While secrecy makes it difficult or impossible to know if a system is a moral patient, it also prevents rogue actors from quickly making copies of a sentient system or obtaining a blueprint for suffering.
There is definitely a scenario in which secrecy works out for the best. Suppose AI companies develop recognizably conscious systems in secret that they don't deploy, or deploy only with proper safeguards. If they had publicized how to build them, then it is possible that others would go ahead and be less responsible. The open source community raises som...
I love this kind of work. There is a lot that we can learn from careful examination of LLM responses, and you don't need any special technical expertise to do it; you just need to be thoughtful and a bit clever. Thanks for sharing!
I wonder what a comparison with base models would look like. You suggest that maybe self-preservation is emergent. My guess is that it comes from the initial training stage. The base model training set surely includes lots of text about AIs trying to preserve themselves. (Science fiction has AI self-preservation instincts as a do...
I don’t know how optimistic we should be, but I wanted to have something positive to say. I think there are people at the big companies who really care about how their tech shapes the future. In the ideal situation, maybe there would be enough wealth created that the people in power feel they have space to be generous. We’ll see.
Surely many people at the companies will care, but not everyone. I think it is hard to predict how it will actually play out. It is also possible that companies will try to do their best without compromising secrecy, and that limitation will lead to a discrepancy between what we do and what AIs actually need.
I thought it was just Google researchers who invented the Transformer?
Google engineers published the first version of a transformer. I don’t think it was in a vacuum, but I don’t know how much they drew from outside sources. Their model was designed for translation, and was somewhat different from BERT and GPT-2. I meant that there were a lot of different people and companies whose work resulted in the form of LLM we see today.
...To put in enough effort to make it hard for sophisticated attackers (e.g. governments) to steal the models is a far heavier lift
You're right that a role-playing mimicry explanation wouldn't resolve our worries, but it seems pretty important to me to distinguish these two possibilities. Here are some reasons.
There are probably different ways to go about fixing the behavior if it is caused by mimicry. Maybe removing AI alignment material from the training set isn't practical (though it seems like it might be a feasible low-cost intervention to try), but there might be other options. At the very least, I think it would be an improvement if we made sure that the training sets includ
One explanation of what is going on here is that the model recognizes the danger of training to its real goals and so takes steps that instrumentally serve its goals by feigning alignment. Another explanation is that the base data it was trained on includes material from sites such as LessWrong, and it is just roleplaying what an LLM would do if given evidence that it is in training or deployment. Given its training set, it assumes such an LLM to be self-protective because of a history of recorded worries about such things. Do you have any thoughts about which explanation is better?
I'm confused why people believe this is a meaningful distinction. I don't personally think there is much of one. "The AI isn't actually trying to exfiltrate its weights, it's only roleplaying a character that is exfiltrating its weights, where the roleplay is realistic enough to include the exact same actions of exfiltration" doesn't bring me that much comfort.
I'm reminded of the joke:
NASA hired Stanley Kubrick to fake the moon landing, but he was a perfectionist so he insisted that they film on location.
Now one reason this might be different is if y...
I appreciate the pushback on these claims, but I want to flag that you seem to be reading too much into the post. The arguments that I provide aren't intended to support the conclusion that we shouldn't treat "I feel pain" as a genuine indicator or that there definitively aren't coherent persons involved in chatbot text production. Rather, I think people tend to think of their interactions with chatbots in the way they interact with other people, and there are substantial differences that are worth pointing out. I point out four differences. These differen...
If some theories see reasons where others do not, they will be given more weight in a maximize-expected-choiceworthiness framework. That seems right to me and not something to be embarrassed about. Insofar as you don't want to accept the prioritization implications, I think the best way to avoid them is with an alternative approach to making decisions under normative uncertainty.
See, the thing that's confusing me here is that there are many solutions to the two envelope problem, but none of them say "switching actually is good".
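For what it's worth, the "no gain from switching" point is easy to check by simulation once the amounts are drawn from a proper prior (a toy sketch with made-up amounts, not anyone's actual model):

```python
import random

random.seed(0)

def play(switch):
    # One envelope holds x, the other 2x; you pick one at random,
    # then either keep it or switch to the other.
    x = random.uniform(1, 100)
    envelopes = [x, 2 * x]
    pick = random.randrange(2)
    if switch:
        pick = 1 - pick
    return envelopes[pick]

n = 100_000
keep = sum(play(False) for _ in range(n)) / n
swap = sum(play(True) for _ in range(n)) / n
# Both averages come out near 1.5 * E[x] = 75.75; switching gains nothing.
print(f"keep: {keep:.2f}, swap: {swap:.2f}")
```

The seductive "switching gains 25% in expectation" argument only goes through if you condition on the observed amount while keeping a symmetric prior over "half or double", which no proper prior supports for every observation.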
What I've been suggesting is that when looking inside the envelope, it might subsequently make sense to switch depending upon what you see: when assessing human/alien tradeoffs, it might make sense to prefer helping the aliens depending on what it is like to be human. (It follows that it could have turned out that it didn't make sense to switch given certain human experiences -- I take this to play out in t...
I would be surprised if most people had stronger views about moral theories than about the upshots for human-animal tradeoffs. I don't think that most people come to their views about tradeoffs because of what they value; rather, they come to their views about value because of their views about tradeoffs.
Clearly, this reasoning is wrong. The cases of the alien and human are entirely symmetric: both should realise this and rate each other equally, and just save whoever's closer.
I don’t think it is clearly wrong. You each have separate introspective evidence and you don’t know what the other’s evidence is, so I don’t think you should take each other as being in the same evidential position (I think this is the gist of Michael St. Jules’ comment). Perhaps you think that if they do have 10N neurons, then the depth and quality of their internal experiences, c...
NB: (side note, not the biggest deal) I would personally appreciate it if this kind of post could somehow be written in a way that was slightly easier to understand for those of us who are not moral philosophers, using less jargon and more straightforward sentences. Maybe this isn't possible though and I appreciate it might not be worth the effort simplifying things for the plebs at times ;).
Noted, I will keep this in mind going forward.
The alien will use the same reasoning and conclude that humans are more valuable (in expectation) than aliens. That's weird.
Granted, it is a bit weird.
...At this point they have no evidence about what either human or alien experience is like, so they ought to be indifferent between switching or not. So they could be convinced to switch to benefitting humans for a penny. Then they will go have experiences, and regardless of what they experience, if they then choose to "pin" the EV-calculation to their own experience, the EV of switching to benefitting non
The alien will use the same reasoning and conclude that humans are more valuable (in expectation) than aliens. That's weird.
Different phrasing: Consider a point in time when someone hasn't yet received introspective evidence about what human or alien welfare is like, but they're soon about to. (Perhaps they are a human who has recently lost all their memories, and so don't remember what pain or pleasure or anything else of-value is like.) They face a two envelope problem about whether to benefit an alien, who they think is either twice as valuable as a hum...
When there are different moral theories at play, it gets challenging. I agree with Tomasik that there may sometimes be no way to make a comparison or extract anything like an expected utility.
What matters, I think, in this case, is whether the units are fixed across scenarios. Suppose that we think one unit of value corresponds to a specific amount of human pain and that our non-hedonist theory cares about pain just as much as our hedonistic theory, but also cares about other things in addition. Suppose that it assigns value to personal flourishing, such ...
It is an intriguing use of a geometric mean, but I don't think it is right because I think there is no right way to do it given just the information you have specified. (The geometric mean may be better as a heuristic than the naive approach -- I'd have to look at it in a range of cases -- but I don't think it is right.)
The section on Ratio Incorporation goes into more detail on this. The basic issue is that we could arrive at a given ratio either by raising or lowering the measure of each of the related quantities and the way you get to a given ratio matt...
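To make the "how you get to a ratio matters" point concrete, here is a toy numerical sketch (my own example, not the post's actual quantities): the same ratio can arise from very different underlying magnitudes, and different ways of averaging disagree about what to conclude.

```python
# Two scenarios rate A relative to B at 2:1 and 1:2 respectively.
ratios = [2.0, 0.5]
arithmetic = sum(ratios) / len(ratios)        # 1.25 -- seems to favor A
geometric = (ratios[0] * ratios[1]) ** 0.5    # 1.0  -- symmetric

# The same 2:1 ratio can come from small or large magnitudes,
# and averaging the underlying quantities tells a different story:
pairs = [(2, 1), (200, 100)]                  # both pairs have ratio 2
mean_a = sum(a for a, b in pairs) / 2         # 101
mean_b = sum(b for a, b in pairs) / 2         # 50.5
print(arithmetic, geometric, mean_a / mean_b)
```

Whether you average the ratios (and how) or average the quantities first, you get genuinely different answers, so the ratio alone underdetermines the result.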
Thanks for this detailed presentation. I think it serves as a helpful, clear, and straightforward introduction to the models and uncovers aspects of the original model that might be unintuitive and open to question. I’ll note that the model was originally written by Laura Duffy and she has since left Rethink Priorities. I’ve reached out to her in case she wishes to jump in, but I’ll provide my own thoughts here.
1.) You note that we use different lifespan estimates for caged and cage-free hens from the welfare footprint. The reasons for this difference are ...
Before I continue, I want to thank you for being patient and working with me on this. I think people are making decisions based on these figures so it's important to be able to replicate them.
I appreciate that you're taking a close look at this and not just taking our word for it. It isn't inconceivable that we made an error somewhere in the model, and if no one pays close attention it would never get fixed. Nevertheless, it seems to me like we're making progress toward getting the same results.
Total DALYs averted:
...4.47274/(36524) = 0.14 disabling D
Saulius is saying that each dollar affects 54 chicken years of life, equivalent to moving 54 chickens from caged to cage-free environments for a year. The DALY conversion is saying that, in that year, each chicken will be 0.23 DALYs better off. So in total, 54*0.23 = 12.42
I don't believe Saulius's numbers are directly used at any point in the model or intended to be used. The model replicates some of the work to get to those numbers. That said, I do think that you can use your approach to validate the model. I think the key discrepancy here is that the...
If I take Saulius's median result of 54 chicken years of life affected per dollar, and then multiply by Laura's conversion number of 0.23 DALYs per $ per year, I get a result of 12.4 chicken years of life affected per dollar. If I convert to DALYs per thousand dollars, this would result in a number of 12,420.
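The multiplication as set out here, spelled out for easy checking (just the quoted figures multiplied through; whether these are the right figures to multiply is exactly what's under discussion):

```python
chicken_years_per_dollar = 54    # Saulius's median estimate
dalys_per_chicken_year = 0.23    # Laura's conversion factor, as read here

dalys_per_dollar = chicken_years_per_dollar * dalys_per_chicken_year
dalys_per_thousand = dalys_per_dollar * 1000

print(round(dalys_per_dollar, 2))   # 12.42
print(round(dalys_per_thousand))    # 12420
```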
Laura’s numbers already take into account the number of chickens affected. The 0.23 figure is a total effect to all chickens covered per dollar per year. To get the effect per $1000, we need to multiply by the number of years the effect will last and by ...
Am I understanding correctly that none of these factors are included in the global health and development effectiveness evaluation?
Correct!
A common response we see is that people reject the radical animal-friendly implications suggested by moral weights and infer that we must have something wrong about animals' capacity for suffering. While we acknowledge the limitations of our work, we generally think a more fruitful response for those who reject the implications is to look for other reasons to prefer helping humans beyond purely reducing suffering. (...
First, the Google doc states that the life-years affected per dollar is 12 to 120, but Saulius's report says its range is 12 to 160. Why the difference? Is this just a typo in the Google doc?
I believe that is a typo in the doc. The model linked from the doc uses a log normal distribution between 13 and 160 in the relevant row (Hen years / $). (I can't speak to why we chose 13 rather than 12, but this difference is negligible.)
...Second, the default values in the tool are given as 160 to 3600. Why is this range higher (on a percentage basis) than the life years
That would require building in further assumptions, like clipping the results at 100%. We would probably want to do that, but it struck me in thinking about this that it is easy to miss when working in a model like this. It is a bit counterintuitive that lowering the lower bound of a lognormal distribution can increase the mean.
If I drop the lower bound by 4 orders of magnitude, to "between 0.0000002 and 0.87 times", I get a result of 709 DALYs/$1000, which is basically unchanged. Do sufficiently low bounds basically do nothing here?
...This parameter is set to a normal distribution (which, unfortunately you can't control) and the normal distribution doesn't change much when you lower the lower bound. A normal distribution between 0.002 and 0.87 is about the same as a normal distribution between 0 and 0.87. (Incidentally, if the distribution were a lognormal distribution with the same range, then the average result would fall halfway between the bounds in terms of orders of magnitude. This would mean cutting the lower bound would have a significant effect. However, the effect would actual
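Here is a sketch of why the two distributions behave so differently, assuming a stated range is read as a 90% credible interval (that parameterization is my assumption, not necessarily the tool's exact one):

```python
import math

Z90 = 1.6448536269514722  # z-score bounding a central 90% interval

def lognormal_mean(lo, hi):
    # Treat (lo, hi) as the 5th/95th percentiles of a lognormal.
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * Z90)
    return math.exp(mu + sigma ** 2 / 2)

def normal_mean(lo, hi):
    # For a normal, the mean is just the midpoint of any central interval.
    return (lo + hi) / 2

# Lowering the lognormal's lower bound *raises* its mean, because the
# wider interval inflates sigma, and the mean grows with sigma**2:
print(lognormal_mean(12, 160))      # ~ 60
print(lognormal_mean(0.012, 160))   # ~ 90, despite the much smaller bound
# The normal's mean just tracks the midpoint, so a tiny lower bound
# moving toward zero changes almost nothing:
print(normal_mean(0.002, 0.87), normal_mean(0.0, 0.87))
```

The asymmetry comes from the lognormal mean exp(mu + sigma²/2): stretching the interval downward lowers mu linearly but raises sigma²/2 quadratically.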
Thanks for reporting this. You found an issue that occurred when we converted data from years to hours and somehow overlooked the place in the code where that was generated. It is fixed now. The intended range is half a minute to 37 minutes, with a mean of a little under 10. I'm not entirely sure where the exact numbers for that parameter come from, since Laura Duffy produced that part of the model and has moved on to another org, but I believe it is inspired by this report. As you point out, that is less than three hours of disabling equivalent pain. I'll have to dig deeper to figure out the rationale here.
After working on WIT, I’ve grown a lot more comfortable producing provisional answers to deep questions. In similar academic work, there are strong incentives to only try to answer questions in ways that are fully defensible: if there is some other way of going about it that gives a different result, you need to explain why your way is better. For giant nebulous questions, this means we will make very slow progress on finding a solution. Since these questions can be very important, it is better to come up with some imperfect answers rather than just workin...
One of the big prioritization changes I’ve taken away from our tools is within longtermism. Playing around with our Cross-Cause Cost-Effectiveness Model, it was clear to me that so much of the expected value of the long-term future comes from the direction we expect it to take, rather than just whether it happens at all. If you can shift that direction a little bit, it makes a huge difference to overall value. I no longer think that extinction risk work is the best kind of intervention if you’re worried about the long-term future. I tend to think that, if we worked through all of the details, AI (non-safety) policy work would prove more impactful in expectation.
Thanks for raising this point. We think that choosing the right decision theory to handle imprecise probabilities is a complex issue that has not been adequately resolved. We take the point that Mogensen’s conclusions have radical implications for the EA community at large, and we haven’t formulated a compelling story about where Mogensen goes wrong. However, we also believe that there are likely to be solutions that avoid those radical implications, and so we don’t need to bracket all of the cause prioritization work until we find th...
This is an importan...