If "human-compatible" means anything non-speciesist, then I agree that it is an unfortunate phrase, since it is misleading. I also think it is misleading to call idealized preferences "human values," since humans don't actually hold those preferences, as you correctly point out.
You write that
"Which ethical system is correct?" isn't written in the stars or in Plato's heaven; it seems like if the answer is encoded anywhere in the universe, it must be encoded in our brains (or in logical constructs out of brains).
Let X be the claim that you deny in this quote. If X is taken literally, then it is a straw man, since no one believes it. If X is metaphorical, then it is very unclear what it is supposed to mean, or whether it means anything. The claim that "ethics is encoded somewhere in the universe" is also unclear. My best attempt to ascribe meaning to it is as follows: "there is some entity in the universe which constitutes all of ethics," but this claim seems false. The most basic ethical principles are, I believe, in some ways like logical principles. The validity of the argument "p and q, therefore p" is not constituted by any feature of the universe. To see this, imagine an alternative universe which differs from the real one in basically any way you like. It is governed by different laws of nature, contains different lifeforms (or perhaps no life at all), has a different cosmological history, etc. If this universe had been real, then "p and q, therefore p" would still be valid. Basic ethical principles, like the claim that suffering is bad, seem just like this. If human preferences (or other features of the universe) were different, then suffering would still be bad.
Russell's assumption that "The machine’s only objective is to maximize the realization of human preferences" seems to assume some controversial and (in my judgement) highly implausible moral views. In particular, it is speciesist, for why should only human preferences be maximized? Why not animal or machine preferences?
One might respond that Russell is giving advice to humans, and humans should maximize human preferences, since we should all maximize our own preferences. Thus, he isn't assuming that there is anything morally special about humans, and his position is therefore not speciesist. I respond that maximizing my own preferences and maximizing human preferences are very different objectives, since there are many humans other than myself. This defence therefore rests on a mischaracterization of Russell's assumption (at least as you outlined it). Furthermore, the assumption that we should maximize our own preferences seems arbitrary and unsupported anyway.
You write that "There are some mechanics that can be deployed to achieve [an AI following the guidelines]. These include game theory, utilitarian ethics, and an understanding of human psychology."
I doubt that a utilitarian ethic is useful for maximizing human preferences, since utilitarianism is impartial in the sense that it takes everyone's wellbeing into account, human or otherwise. I also doubt that it supports the maximization of the agent's own preferences, where "the agent" is assumed to be an individual human, since human preferences have non-utilitarian features. The precise nature of these features depends on what exactly you mean by "preference," so let me illustrate the point with some sensible-sounding definitions of "preference".
(A) An agent is said to prefer x over y, iff he would choose the certain outcome x over the certain outcome y, when given the option.
This makes it tautological that agents maximize their preferences when the necessary factual information is available. However, people often behave in non-utilitarian ways even if they possess all the relevant factual information. They may, e.g., spend their money on luxuries instead of donations, or they may support factory farming by buying its products.
(B) An agent is said to prefer x over y, iff he has an urge/craving towards doing x instead of doing y. To put it in other words, the agent would have to muster some strength of will, if he is to avoid doing x instead of y.
People's cravings/urges can often lead them in non-utilitarian directions (think, e.g., of a drug addict who would be better off if he could muster the will to quit the drugs).
(C) An agent is said to prefer x over y, iff the feelings/emotions/passions that motivate him towards x are more intense than those which motivate him towards y. The intensity is here assumed to be some consciously felt feature of the feelings.
Warm glow giving is, by definition, motivated by our feelings/emotions. However, it usually has fairly little impact upon aggregate happiness, so utilitarianism doesn't recommend it.
(D) An agent is said to prefer x over y, iff he values x more than y.
This definition prompts the question "what does 'valuing' refer to?". One possible answer is to define "valuing" like (C), but (C) has already been dealt with. Another option is the following.
(E) An agent values x more than y, iff he believes it to be more valuable.
This would make preference-maximization compatible with utilitarianism, insofar as the agent believes in utilitarianism and lacks beliefs that contradict it. However, it would also be compatible with any other moral theory whatsoever, so long as we make the analogous assumptions on behalf of that theory.
It seems worth adding two more comments about (E). First, unlike (A), (B) and (C), it introduces a rationale for maximizing one's preferences. We cannot act on an unknown truth, but only on what we believe to be true. Thus, we must act on our moral beliefs, rather than some unknown moral truth.
Second, (E) seems like a bad analysis of "preference," for although moral views have some preference-like features (specifically, they can motivate behavior), they also have some features that are more belief-like than preference-like. They can, e.g., serve as premises or conclusions in arguments, one can have credences in them, and they can be the subject matter of questions.
I think this would be way easier to understand with an equation or two. Let $w$ be overall lifetime wellbeing, let $w_t$ be age-specific wellbeing at time $t$, let $L$ be lifetime, and let us denote averages over lifetime by an overbar. If so, it seems like the "normalized age-specific welfare" is $w_{t,\mathrm{norm}} = w_t/\bar{w}$. It is not clear what "this normalized welfare expectancy" refers to, since it can mean either $w_{t,\mathrm{norm}}$ or $w_{\mathrm{norm}} = \sum_t w_{t,\mathrm{norm}}$ (I assume here that overall wellbeing is the sum of age-specific wellbeing). Thus, the RWE is calculated as follows:
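Spelled out under the definitions above, and assuming that "divided by its life expectancy" means division by $L$, the two readings would presumably give the following candidate formulas (a reconstruction from the quoted description, not a quotation from the post):

```latex
% Reading 1: "normalized welfare expectancy" is the age-specific quantity w_{t,norm}
\mathrm{RWE}_t = \frac{w_{t,\mathrm{norm}}}{L} = \frac{w_t}{\bar{w}\,L}

% Reading 2: "normalized welfare expectancy" is the lifetime sum w_norm
\mathrm{RWE} = \frac{w_{\mathrm{norm}}}{L} = \frac{1}{L}\sum_t \frac{w_t}{\bar{w}}
```

If $\bar{w}$ is additionally read as the mean of $w_t$ over $L$ discrete periods, so that $\bar{w} = w/L$ with $w \neq 0$, then the sum in the second formula equals $L$, and that RWE would come out as 1 for every population.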
I find both of these formulas to be rather strange, and devoid of a rationale. Have I misunderstood you?
You write that:
(A) "We are profoundly uncertain about whether most animals' lives are dominated by pleasure or suffering, or even how to go about weighing these up. Therefore, it may be prudent to concentrate on a measure of "relative welfare expectancy" (RWE), representing the normalized welfare expectancy of a population divided by its life expectancy."
But you also write that:
(B) "A plausible working hypothesis, however, is that the average welfare experienced by an animal of a given age is proportional to their probability of surviving that period of life."
Unfortunately, these views seem inconsistent. (A) suggests that we should avoid making assumptions about whether increasing wild animal lifetimes is good or bad for the animals, while (B) tells us to assume that welfare at a given age depends upon survivorship. However, high survivorship corresponds to long lifetimes, so these are effectively assumptions about the same thing.
You might defend your position by saying that welfare at each age is very small in expectation, so the expected value of increasing animal lifetimes, while holding welfare at each age constant, is negligible. However, this argument makes a significant assumption about which probability distribution over welfare at each age would be rational. Thus, it doesn't square well with your motivation for ignoring lifetimes.
Suppose I want to give a counterargument to one of GPI's research papers. Where can I post such a response?
More precisely, I want to argue that the reasoning in "Moral Uncertainty About Population Axiology" is not compatible with the most plausible ways of normalizing different axiologies, such as variance normalization.
What are these other questions about optimal institution design, which you consider more important than voting systems?