All of jacobpfau's Comments + Replies

A mesa-optimization perspective on AI valence and moral patienthood

Ok, seems like this might have been more of a terminological misunderstanding on my end. I think I agree with what you say here: 'What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm'.

A mesa-optimization perspective on AI valence and moral patienthood

Ok, interesting. I suspect the programmers will not be able to easily inspect the inner algorithm, because the inner/outer distinction will not be as clear-cut as in the human case. The programmers may avoid sitting around by fiddling with more observable inefficiencies, e.g. coming up with batch-norm v10.

steve2152 (1mo): Oh, you said "evolution-type optimization", so I figured you were thinking of the case where the inner/outer distinction is clear cut. If you don't think the inner/outer distinction will be clear cut, then I'd question whether you actually disagree with the post :) See the section defining what I'm arguing against [https://www.lesswrong.com/posts/pz7Mxyr7Ac43tWMaC/against-evolution-as-an-analogy-for-how-humans-will-create#Defining__The_Evolution_Analogy_for_AGI_Development___Three_ingredients], in particular the "inner as AGI" discussion.
A mesa-optimization perspective on AI valence and moral patienthood

Good clarification. Determining which kinds of factoring reduce valence is more subtle than I had thought. I agree with you that the DeepMind set-up seems more analogous to neural nociception (e.g. high-heat detection). My proposed set-up (Figure 5) seems significantly different from the DM/nociception case, because it factors out the step where nociceptive signals affect decision making and motivation. I'll edit my post to clarify.

A mesa-optimization perspective on AI valence and moral patienthood

Your new setup seems less likely to have morally relevant valence. Essentially, the more the setup factors out valence-relevant computation (e.g. by separating out a module, or by accessing an oracle as in your example), the less likely it is for valenced processing to happen within the agent.

Just to be explicit here, I'm assuming estimates of goal achievement are valence-relevant. How generally this is true is not clear to me.

ofer (1mo): I think the analogy to humans suggests otherwise. Suppose a human feels pain in their hand due to touching something hot. We can regard all the relevant mechanisms in their body outside the brain—those that cause the brain to receive the relevant signal—as mechanisms that have been "factored out from the brain". And yet those mechanisms are involved in morally relevant pain. In contrast, suppose a human touches a radioactive material until they realize it's dangerous. Here there are no relevant mechanisms that have been "factored out from the brain" (the brain needs to use ~general reasoning); and there is no morally relevant pain in this scenario. Though generally if "factoring out stuff" means that smaller/less-capable neural networks are used, then maybe it can reduce morally relevant valence risks.
A mesa-optimization perspective on AI valence and moral patienthood

Thanks for the link. I'll have to do a thorough read-through of your post in the future. From scanning it, I do disagree with much of it; many of those points of disagreement were laid out by previous commenters. One point I didn't see brought up: IIRC the biological anchors paper suggests we will have enough compute to do evolution-type optimization before the end of the century. So even if we grant your claim that learning-to-learn is much harder to directly optimize for, I think it's still a feasible path to AGI. Or perhaps you think evolution-like optimization takes more compute than the biological anchors paper claims?

steve2152 (1mo): Nah, I'm pretty sure the difference there is "Steve thinks that Jacob is way overestimating the difficulty of humans building AGI-capable learning algorithms by writing source code", rather than "Steve thinks that Jacob is way underestimating the difficulty of computationally recapitulating the process of human brain evolution". For example, for the situation that you're talking about (I called it "Case 2" in my post) I wrote "It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently for the outer algorithm to edit the inner algorithm, one excruciatingly-slow step at a time. I think the programmers would inspect the results of each episode, generate hypotheses for how to improve the algorithm, run small tests, etc." If the programmers did just sit around for years not looking at the intermediate training results, yes I expect the project would still succeed sooner or later. I just very strongly expect that they wouldn't sit around doing nothing.
A mesa-optimization perspective on AI valence and moral patienthood

Certainly valenced processing could emerge outside of this mesa-optimization context. I agree that for "hand-crafted" (i.e. no base-optimizer) systems this terminology isn't helpful. To make sure I understand your point, let me try to describe such a scenario in more detail: imagine a human programmer who is working with a bunch of DL modules, interpretability tools, and programming heuristics which feed into these modules in different ways -- in a sense, the opposite end of the spectrum from monolithic language models. This person might program s... (read more)

steve2152 (1mo): GPT-3 is of that form, but AlphaGo/MuZero isn't (I would argue). I'm not sure how to settle whether your statement about "most contemporary progress" is right or wrong. I guess we could count how many papers use model-free RL vs model-based RL, or something? Well anyway, given that I haven't done anything like that, I wouldn't feel comfortable making any confident statement here. Of course you may know more than me! :-) If we forget about "contemporary progress" and focus on "path to AGI", I have a post arguing against what (I think) you're implying at Against evolution as an analogy for how humans will create AGI [https://www.lesswrong.com/posts/pz7Mxyr7Ac43tWMaC/against-evolution-as-an-analogy-for-how-humans-will-create], for what it's worth. Yeah I dunno, I have some general thoughts about what valence looks like in the vertebrate brain (e.g. this [https://www.lesswrong.com/posts/iMM6dvHzco6jBMFMX/value-loading-in-the-human-brain-a-worked-example] is related, and this [https://www.lesswrong.com/posts/TtBik82RQLBCG3h8j/emotional-valence-vs-rl-reward-a-video-game-analogy]) but I'm still fuzzy in places and am not ready to offer any nice buttoned-up theory. "Valence in arbitrary algorithms" is obviously even harder by far. :-)
A mesa-optimization perspective on AI valence and moral patienthood

Your interpretation is a good summary!

Re comment 1: Yes, sorry, this was just meant to point at a potential parallel, not to work out the parallel in detail. I think it'd be valuable to work out the potential parallel between the DM agent's predicate predictor module (Fig. 12, p. 14) and my factored-noxiousness-object-detector idea. I just took a brief look at the paper to refresh my memory, but if I'm understanding it correctly, this module predicts which parts of the state prevent goal realization.
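
To make the kind of factoring I have in mind concrete, here is a minimal toy sketch of a separate head that predicts which reward-specification predicates currently hold, so that the rest of the agent can consume those predictions rather than deriving them internally. This is not the DeepMind module itself; the PyTorch framing and every name and shape here are assumptions made purely for illustration.

```python
# Toy sketch of a factored "predicate predictor" head (illustrative only).
import torch
import torch.nn as nn

class PredicatePredictor(nn.Module):
    def __init__(self, state_dim: int, n_predicates: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_predicates),  # one logit per predicate in the reward specification
        )

    def forward(self, state_embedding: torch.Tensor) -> torch.Tensor:
        # Probability that each predicate is currently satisfied (or blocked, depending on labeling).
        return torch.sigmoid(self.net(state_embedding))

# The policy can condition on these factored predictions instead of re-deriving
# "what blocks the goal" inside its own forward pass.
predictor = PredicatePredictor(state_dim=128, n_predicates=8)
probs = predictor(torch.randn(1, 128))  # shape: (1, 8)
```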

Re comment 2: Yes, this should read "(p... (read more)

ofer (1mo): I guess what I don't understand is how the "predicate predictor" thing can make it so that the setup is less likely to yield models that support morally relevant valence (if you indeed think that). Suppose the environment is modified such that the observation that the agent gets in each time step includes the value of every predicate in the reward specification. That would make the "predicate predictor" useless (I think; just from a quick look at the paper). Would that new setup be more likely than the original to yield models that have morally relevant valence?
Prepare for Counterfactual Donation Matching on Giving Tuesday, Dec. 1, 2020

Ah great, I have pledged. Is this new this year? Or maybe I didn't fill out the pledge last year; I don't remember.

Gina_Stuessy (1y): Hey Jacob, not new this year. The EA GT team has done an email list at least the past 2 years, but I bet all 3 past years they were involved in the Facebook match. This year we also have an option for pledgers to receive text reminders (U.S. phone #s only).
Prepare for Counterfactual Donation Matching on Giving Tuesday, Dec. 1, 2020

Would it make sense for the Giving Tuesday organization to send out an annual reminder email? I have re-categorized all of my EA newsletters, and so they don't go to my main inbox. Maybe most people have calendar events, or the like, set up. For people who almost forgot about Giving Tuesday (like me), though, a reminder email could be useful!

AviNorowitz (1y): Hi Jacob. If you complete our sign-up form or our pledge form, then you'll be added to our mailing list and should receive reminders in future years.
* Sign-up form: https://eagiv.org/signup [https://docs.google.com/forms/d/e/1FAIpQLSehY848Ya1F8GFA3GOs5E2xjY1Xa8HB8oKWXpItjMa654qX5Q/viewform]
* Pledge form: https://eagiv.org/pledge [https://docs.google.com/forms/d/e/1FAIpQLSdRYhgnvQVlUcTvCZ2p2zM4Cve7sfED598u5mmTAbzvrng85g/viewform]
You may also want to add a filter to direct emails from contact@eagivingtuesday.org into your primary inbox.
Timeline Utilitarianism

The question of how to aggregate over time may even have important consequences for population ethics paradoxes. You might be interested in reading Vanessa Kosoy's theory here, in which she sums an individual's utility over time with an increasing penalty over life-span. Although I'm not clear on the justification for these choices, the consequences may be appealing to many: Vanessa herself emphasizes the implications for evaluating astronomical waste and factory farming.
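
As a purely illustrative sketch of the kind of aggregation at stake (my own toy rendering, not Kosoy's actual formulation), one could weight each moment's utility by a factor that shrinks as lived time accumulates:

$$U \;=\; \sum_{t=0}^{T} w(t)\, u_t, \qquad \text{with } w(t) \text{ decreasing in } t,$$

so that, holding per-moment utility fixed, each additional year of life adds less value than the one before. How to choose w(t), and whether to use a sum at all, is exactly the kind of question aggregation over time forces us to answer.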

Some learnings I had from forecasting in 2020

Agreed, I've been trying to help out a bit with Matt Barnett's new question here. The feedback period is still open, so chime in if you have ideas!

FWIW, I suspect most Metaculites are accustomed to paying attention to how a question's operationalization deviates from its intent. Personally, I find the Montezuma's Revenge criterion quite important; without it, the question would be far from capturing AGI.

My intent in bringing up this question was more to ask how Linch thinks about the reliability of long-term predictions with no obvious frequentist-friendly trac... (read more)

Some learnings I had from forecasting in 2020

Sure, at an individual level deference usually makes for better predictions, but at a community level deference-as-the-norm can dilute the weight of those who are informed and predict differently from the median. Excessive numbers of deferential predictions also obfuscate how reliable the median prediction is, and thus make it harder for others to do an informed update on the median.
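
Here is a toy numerical illustration of the dilution point (the numbers are entirely made up):

```python
# Toy illustration of how deference-as-the-norm can dilute informed updates to a community median.
import statistics

informed = [0.2, 0.3, 0.4, 0.6, 0.7]         # forecasters with independent views
median_before = statistics.median(informed)   # 0.4

# Deferential forecasters simply copy the current community median.
deferential = [median_before] * 20

# Suppose the informed forecasters all update upward on new evidence.
updated_informed = [p + 0.2 for p in informed]
print(statistics.median(updated_informed))                # 0.6, the signal
print(statistics.median(updated_informed + deferential))  # 0.4, the signal is swamped
```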

As you say, it's better if people contribute information where their relative value-add is greatest, so I'd say it's reasonable for people to have a 2:1 ratio of questions on ... (read more)

Some learnings I had from forecasting in 2020

Do your opinion updates extend from individual forecasts to aggregated ones? In particular, how reliable do you think the Metaculus median AGI timeline is?

On the one hand, my opinion of Metaculus predictions worsened as I saw how the 'recent predictions' feature showed people piling in on the median on some questions I watch. On the other hand, my opinion improved as I found out that performance doesn't seem to fall as a function of 'resolve minus closing' time (see https://twitter.com/tenthkrige/status/1296401128469471235). Are there observations which have swayed your opinion in similar ways?
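
Concretely, the check I have in mind looks something like the following sketch (the data and column names are hypothetical, not an actual Metaculus export):

```python
# Sketch: does accuracy degrade as the gap between close time and resolve time grows?
import pandas as pd

df = pd.DataFrame({
    "p_at_close":   [0.8, 0.3, 0.9, 0.6, 0.2, 0.7],   # community forecast when the question closed
    "outcome":      [1,   0,   1,   0,   0,   1],     # resolved value (1 = yes)
    "horizon_days": [30,  30,  365, 365, 1000, 1000], # resolve date minus close date
})
df["brier"] = (df["p_at_close"] - df["outcome"]) ** 2
print(df.groupby("horizon_days")["brier"].mean())  # a roughly flat curve suggests no decay with horizon
```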

Linch (1y): I think the best individual forecasters are on average better than the aggregate Metaculus forecasts at the moment they make the prediction. Especially if they spent a while on the prediction. I'm less sure if you account for prediction lag (The Metaculus and community predictions are usually better at incorporating new information), and my assessment for that will depend on a bunch of details. I think as noted by matthew.vandermerwe, the Metaculus question operationalization for "AGI" is very different from what our community typically uses. I don't have a strong opinion on whether a random AI Safety person will do better on that operationalization. For something closer to what EAs care about, I'm pretty suspicious of the current forecasts given for existential risk/GCR estimates (for example in the Ragnarok series [https://www.metaculus.com/questions/?search=cat:series--ragnarok]), and generally do not think existential risk researchers should strongly defer to them (though I suspect the forecasts/comments are good enough that it's generally worth most xrisk researchers studying the relevant questions to read).
matthew.vandermerwe (1y): With regards to the AGI timeline [https://www.metaculus.com/questions/3479/when-will-the-first-artificial-general-intelligence-system-be-devised-tested-and-publicly-known-of/], it's important to note that Metaculus' resolution criteria are quite different from a 'standard' interpretation of what would constitute AGI[1] (or human-level AI[2], superintelligence[3], transformative AI, etc.). It's also unclear what proportion of forecasters have read this fine print (interested to hear others' views on this), which further complicates interpretation.
1. OpenAI Charter [https://openai.com/charter/]
2. expert survey [https://arxiv.org/abs/1705.08807]
3. Bostrom [https://www.nickbostrom.com/superintelligence.html]
Pablo (1y): Can you say more about this? I ask because this behavior seems consistent with an attitude of epistemic deference towards the community prediction when individual predictors perceive it to be superior to what they can themselves predict given their time and ability constraints.
AMA: Tobias Baumann, Center for Reducing Suffering

What kinds of evidence and experience could induce you to update for/against the importance of severe suffering?

Do you believe that exposure to or experience of severe suffering would cause the average EA to focus more heavily on it?

Edit: Moving the question "Thinking counterfactually, what evidence and experiences caused you to have the views you do on severe suffering?" down here because it looks like other commenters already asked another version of it.

Tobias_Baumann (1y): I would guess that actually experiencing certain possible conscious states, in particular severe suffering or very intense bliss, could significantly change my views, although I am not sure if I would endorse this as “reflection” or if it might lead to bias. It seems plausible (but I am not aware of strong evidence) that experience of severe suffering generally causes people to focus more on it. However, I myself have fortunately never experienced severe suffering, so that would be a data point to the contrary.
What FHI’s Research Scholars Programme is like: views from scholars

Out of the rejection pool, are there any avoidable failure modes that come to mind -- i.e. mistakes made by otherwise qualified applicants which caused rejection? For example, in a previous EA-org application I found out that I ought to have included more detail regarding potential roadblocks to my proposed research project. This seemed like a valuable point in retrospect, but somewhat unexpected given my experience with research proposals outside of EA.

EDIT: (Thanks to Rose for answering this question individually and agreeing to let me share her ans... (read more)

My Meta-Ethics and Possible Implications for EA

Thanks for the lively discussion! We've covered a lot of ground, so I plan to try to condense what was said into a follow-up blog post making similar points as the OP but taking into account all of your clarifications.

I’m not sure how broadly you’re construing ‘meta-reactions’, i.e. would this include basically any moral view which a person might reach based on the ordinary operation of their intuitions and reason and would all of these be placed on an equal footing?

'Meta-reactions' are the subset of our universalizable preferences which express prefer

... (read more)
My Meta-Ethics and Possible Implications for EA

[From a previous DM comment]

For moral talk to be capable of serving this practical purpose we just need some degree of people being inclined to respond to the same kinds of things or to be persuaded to share the same attitudes. But this doesn’t require any particularly strong, near-universal consensus or consensus on a particular single thing being morally good/bad. [...] This seems compatible with very, very widespread disagreement in fact: it might be that people are disposed to think that some varying combinations of “fraternity, blood revenge, family

... (read more)
David_Moss (1y): I'm afraid now the working week has begun again I'm not going to have so much time to continue responding, but thanks for the discussion. I'm thinking of the various things which fall under the Purity/Disgust [https://en.wikipedia.org/wiki/Moral_foundations_theory#The_five_foundations] (or Sanctity/Degradation) foundation in Haidt's Moral Foundations Theory. This includes a lot of things related to not eating or otherwise exposing yourself to things which elicit disgust, as well as a lot of sexual morality. Rereading the law books of the Bible gives a lot of examples. The sheer prevalence of these concerns in ancient morality, especially as opposed to modern concerns like promoting positive feeling, is also quite telling IMO. For more on the distinctive role of disgust in morality see here [https://books.google.co.uk/books?hl=en&lr=&id=orL9AgAAQBAJ&oi=fnd&pg=PA111&dq=moral+purity+contamination&ots=apHcj8xkJR&sig=mjwMMZkuDJAaGVYN2X7vShLBux4&redir_esc=y#v=onepage&q=moral%20purity%20contamination&f=false] or here [https://www.researchgate.net/publication/241646937_On_Disgust_and_Moral_Judgment]. I'm not sure how broadly you're construing 'meta-reactions', i.e. would this include basically any moral view which a person might reach based on the ordinary operation of their intuitions and reason and would all of these be placed on an equal footing? If so then I'm inclined to agree, but then I don't think this account implies anything much at the practical level (e.g. how we should think about animals, population ethics etc.). I may agree with this if, per my previous comment, SMB is construed very broadly i.e. to mean roughly emphasising or making salient shared moral views (of any kind) to each other and persuading people to adopt new moral views. (See Wittgenstein on conversion [https://emergenceofrelativism.weebly.com/uploads/7/6/7/1/76714317/00_kusch_wittgensteins_on_certainty_and_relativism.pdf] for discussion of the latter). I think this may be misconstruing…
My Meta-Ethics and Possible Implications for EA

Thanks for the long reply. I feel like our conversation becomes more meaningful as it goes on.

Thanks for clarifying. This doesn't change my response though since I don't think there's a particularly notable convergence in emotional reactions to observing others in pain which would serve to make valenced emotional reactions a particularly central part of the meaning of moral terms. For example, it seems to me like children (and adults) often think that seeing others in pain is funny (c.f. punch and judy shows or lots of other comedy), fun to inflict and o

... (read more)
David_Moss (1y): Let me note that I agree (and think it's uncontroversial) that people often have extreme emotional reactions (including moral reactions) to seeing things like people blown to bits in front of them. So this doesn't seem like a crux in our disagreement (I think everyone, whatever their metaethical position, endorses this point). OK, so we also agree that people may have a host of innate emotional reactions to things (including, but not limited to valenced emotions). I think I responded to this point directly in the last paragraph of my reply. In brief: if no-one could ever be brought to share any moral views, this would indeed vitiate a large part (though not all) of the function of moral language. But this doesn't mean "that the meaning of the moral terms depends on or involves consensus about the rightness or wrongness of specific moral things." All that is required is "some degree of people being inclined to respond to the same kinds of things or to be persuaded to share the same attitudes. But this doesn't require any particularly strong, near-universal consensus or consensus on a particular single thing being morally good/bad." To approach this from another angle: suppose people are somewhat capable of being persuaded to share others' views and maybe even, in fact, do tend to share some moral views (which I think is obviously actually true), although they may radically disagree to some extent. Now suppose that the meaning of moral language is just something like what I sketched out above (i.e. I disapprove of people who x, I disapprove of those who don't disapprove of those who x etc.).* In this scenario it seems completely possible for moral language to function even though the meaning of moral terms themselves is (ex hypothesi) not tied up in any way with agreement that certain specific things are morally good/bad. *As I argued above, I also think that such a language could easily be learned without consensus on certain things being good or bad. Hmm, it so…
My Meta-Ethics and Possible Implications for EA

I don't think there's a particularly noteworthy consensus about it being bad for other people to be in pain

Sorry, I should've been clearer about what I'm referring to. When you say "People routinely seem to think" and "People sometimes try to argue", I suspect we're talking past each other. I am not concerned with such learned behaviors, but rather with our innate, neurologically shared emotional response to seeing someone suffering. If you see someone dismembered, it must be viscerally unpleasant. If you see someone strike your mother as a toddler, it

... (read more)
David_Moss (1y): Apologies in advance for the long reply. Thanks for clarifying. This doesn't change my response though, since I don't think there's a particularly notable convergence in emotional reactions to observing others in pain which would serve to make valenced emotional reactions a particularly central part of the meaning of moral terms. For example, it seems to me like children (and adults) often think that seeing others in pain is funny (c.f. Punch and Judy shows or lots of other comedy), fun to inflict, and often well-deserved. And that's just among modern WEIRD [https://www.pnas.org/content/115/45/11401] children, who tend to be more Harm-focused than non-WEIRD people. Plenty of other things seem equally if not more central to morality (though I am not arguing that these are central, or part of the meaning of moral terms). For example, I think there's a good case that people (and primates [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3690609/] for that matter) have innate moral reactions to (un)fairness: if a child is given some ice cream and is happy but then their sibling is given slightly more ice cream and is happy, they will react with moral outrage and will often demand either levelling down their sibling (at a cost to their pleasure) or even just directly inflicting suffering on their sibling. Indeed, children and primates (as well as adults) often prefer that no-one get anything than that an unjust allocation be made, which seems to count somewhat against any simple account of pleasant experience. I think innate reactions to do with obedience/disobedience and deference to authority, loyalty/betrayal, honesty/dishonesty etc. are equally central to morality and equally if not more prominent in the cases through which we actually learn morality. So it seems a bunch of other innate reactions may be central to morality and often morally mandate others' suffering, so it doesn't seem likely to me that the very meaning of moral terms can be distinctively tied to the goodness…
My Meta-Ethics and Possible Implications for EA

I had been accepted to study for a PhD on the implications of Wittgensteinian meta-philosophy for ethics.

Well, I for one would've liked to have read the thesis! Wonderful; I suppose then most of my background talk was redundant. When it comes to mathematics, I found the arguments in Kripke's 'Wittgenstein on Rules and Private Language' quite convincing. I would love to see someone do an in-depth translation applying everything Kripke says about arithmetic to total utilitarianism. I think this would be quite useful, and perhaps work well with my ideas h

... (read more)
David_Moss (1y):
> When it comes to mathematics, I found the arguments in Kripke's 'Wittgenstein on Rules and Private Language' quite convincing. I would love to see someone do an in depth translation applying everything Kripke says about arithmetic to total utilitarianism. I think this would be quite useful, and perhaps work well with my ideas here.
That makes sense. I personally think that "Kripkenstein's" views are quite different from Wittgenstein's own views on mathematics. It seems there's a bit of a disanalogy between the case of simple addition and the case of moral language. In the case of addition we observe widespread consensus (no-one feels any inclination to start using quus for whatever reason). Conversely it seems to me that moral discourse is characterised by widespread disagreement i.e. we can sensibly disagree about whether it's right or wrong to torture, whether it's right or wrong for a wrongdoer to suffer, whether it's good to experience pleasure if it's unjustly earned and so on. This suggests to me that moral terms aren't defined by reference to certain concrete things we agree are good.
> Yes, I agree that what I've been doing looks a lot like language policing, so let me clarify. Rather than claiming talk of population ethics etc. is invalid or incoherent, it would be more accurate to say I see it as apparently baseless and that I do not fully understand the connection with our other uses of moral language... insofar as they expect me to follow along with this extension (indeed insofar as they expect their conclusions about population ethics to have force for non-population-ethicists) they must explain how their extension of moral language follows from our shared ostensive basis for moral language and our shared inductive biases. My arguments have attempted to show that our shared ostensive basis for moral language does not straight-forwardly support talk of population ethics, because such talk does not share the same basis in negatively/positively valence…
My Meta-Ethics and Possible Implications for EA

Thanks for the clarification, this certainly helps us get more concrete.

We don't need people to agree even slightly about whether chocolate/durian are tasty or yucky to learn the meanings of terms.

I agree that I was exaggerating my case. In durian-type-food-only worlds, we would merely no longer expect 'X is tasty' to convey information to the listener about whether she/he should eat it. This difference does the work in the analogy with morality. Moral language is distinct from the expression of other preferences in that we expect morality-based talk to be

... (read more)
David_Moss (1y):
JP: > I believe that we have much greater overlap in our emotional reaction to experiencing certain events e.g. being hit, and we have much greater overlap in our emotional reaction to witnessing certain painful events e.g. seeing someone lose their child to an explosion.
I agree individuals tend to share an aversion to themselves being in pain. I don't think there's a particularly noteworthy consensus about it being bad for other people to be in pain or that it's good for other people to have more pleasure. People routinely seem to think that it's good for others to suffer and be indifferent about others experiencing more pleasure. People sometimes try to argue that people really only want people to suffer in order to reduce suffering, for example, but this doesn't strike me as particularly plausible or as how people characterise their own views when asked. So valenced experience doesn't strike me as having a particularly central place in ordinary moral psychology IMO.
> I'm not clear on how it is distinct from desire and other preferences? If we did not have shared aversions to pain, and a shared aversion to seeing someone in pain, then moral language would no longer be distinguishable from talk of desire. I suspect you again disagree here, so perhaps you could clarify how, on your account, we learn to distinguish moral injunctions from personal preference based injunctions?
Sure, I just think that moral language differs from desire-talk in various ways unrelated to the specific objects under discussion, i.e. they express different attitudes and perform different functions. For example, if I say "I desire that you give me $10" merely communicates that I would like you to give me $10, there's no implication that you would be apt for disapproval if you didn't. But if I say "It is morally right that you give me $10" this communicates that you would be wrong not to give me $10 and would be apt for disapproval if you did not. (I'm not committed to this particular ana…
My Meta-Ethics and Possible Implications for EA

Here's another way of explaining where I'm coming from. The meaning of our words is set by ostensive definition plus our inductive biases. E.g. when defining red and purple, we agree upon some prototypical cases of red and purple, perhaps by pointing at red and saying 'red'. Then, upon seeing maroon for the first time, we call it red because our brains process maroon in a similar way to how they process red. (Incidentally, the first part -- pointing at red -- is also only meaningful because we share inductive biases around pointing and object boundaries.) Of co

... (read more)
David_Moss (1y):
> I privilege uses of moral language as applied to experiences and in particular pain/pleasure because these are the central cases over which there is agreement, and from which the other uses of moral language flow... I do agree that injunctions may perhaps be the first use we learn of 'bad', but the use of 'bad' as part of moral language necessarily connects with its use in referring to pain and pleasure, otherwise it would be indistinguishable from expressions of desire/threats on the part of the speaker.
OK, on a concrete level, I think we just clearly disagree about how central references to pleasure and pain are in moral language or how necessary they are. I don't think they are particularly central, or even that there is much more consensus about the moral badness of pain/goodness of pleasure than about other issues (e.g. stealing others' property, lying, loyalty/betrayal). It also sounds like you think that for us to learn the meaning of moral language there needs to be broad consensus about the goodness/badness of specific things (e.g. pleasure/pain). I don't think this is so. Take the tastiness example: we don't need people to agree even slightly about whether chocolate/durian are tasty or yucky to learn the meanings of terms. We can observe that when people say chocolate/durian is tasty they go "mmm", display characteristic facial expressions, eat more of it and seek to acquire more in the future, whereas when they say chocolate/durian is yucky they say "eugh", display other characteristic facial expressions, stop eating it and show disinterest in acquiring more in the future. We don't need any agreement at all, as far as I can tell, about which specific things are tasty or yucky to learn the meaning of the terms. Likewise with moral language, I don't think we broadly need widespread agreement about whether specific things are good/bad to learn that if someone says something is "bad" this means they don't want us to do it, they disapprove of it and…
My Meta-Ethics and Possible Implications for EA

Thank you for following up, and sorry that I haven't been able to respond as succinctly or clearly as I would've liked. I hope to write a follow-up post which more clearly describes the flow of ideas from those contained in my comments to the original blog post, as your comments have helped me see where my background assumptions are likely to differ from others'.

I see now that it would be better to take a step back to explain at a higher level where I'm coming from. My line of reasoning follows from the ideas of the later Wittgenstein: many words have meanin

... (read more)
David_Moss (1y): Thanks for your reply. I'm actually very sympathetic to Wittgenstein's account of language: before I decided to move to an area with higher potential impact, I had been accepted to study for a PhD on the implications of Wittgensteinian meta-philosophy for ethics. (I wouldn't use the term metaphilosophy in this context of course, since I was largely focused on the view expressed in PI 119 that "…we may not advance any kind of theory. There must not be anything hypothetical in our considerations. We must do away with all explanation, and description alone must take its place.") All that said, it seems we disagree in quite a few places. DM: JP: I don't think our use of language is limited to the kinds of cases through which we initially learn the use of particular terms. For example, we learn the use of numbers through exceptionally simple cases "If I have one banana and then another banana, I have two bananas" and then later get trained in things like multiplication etc., but then we clearly go on to use mathematical language in much more complex and creative ways, which include extending the language in radical ways. It would be a mistake to conclude that we can't do these things because they go beyond the uses we initially learn and note that Wittgenstein doesn't say this either in his later work in the philosophy of mathematics. I agree it's a common Wittgensteinian move to say that our use of language breaks down when we extend it inappropriately past ordinary usage - but if you look at Wittgenstein's treatment of mathematics it certainly does not tell mathematicians to stop doing the very complex mathematical speculation which is far removed from the ways in which we are initially trained in mathematics. Indeed, I think it's anti-Wittgensteinian to attempt to interfere with or police the way people ordinarily use language in this way. Of course, the Wittgensteinian can call into question certain ways of thinking (e.g. that our ordinary mathematical practice im…
My Meta-Ethics and Possible Implications for EA

Yes, thanks for clarifying. I believe that it is necessarily harder to make correct judgements in the domain of population ethics. My stronger claim is that any such judgements, even if correct, only carry force as mediated through our 'call to universality' meta-emotion. Hence, even if we have the right population axiology, it likely should not override our more mundane moral intuitions.

My Meta-Ethics and Possible Implications for EA

Thanks for bringing up these points, I should've been more careful with these distinctions.

The learned meaning of moral language refers to our recollection of, and reaction to, experiences. These reactions include approval, preferences, and beliefs. I suspect that of these, approval is learned first. I imagine a parent harshly pronouncing 'Bad!' after a toddler gets singed wandering too close to a fire. Preferences enter the picture when we try to extend our use of moral language beyond the simple cases learned as a child. When we try to compare two things that are a

... (read more)
David_Moss (1y): Thanks for the reply. I guess I'm still confused about what specific attitudes you see as involved in moral judgments, whether approval, preferences, beliefs or some more complex combination of these etc. It sounds like you see the genealogy of moral terms as involving a melange of all of these, which seems to leave the door quite open as to what moral terms actually mean. It does sound though, from your reply, that you do think that moral language exclusively concerns experiences (and our evaluations of experiences). If so, that doesn't seem right to me. For one, it seems that the vast majority of people (outside of welfarist EA circles) don't exclusively or even primarily make moral judgements or utterances which are about the goodness or badness of experiences (even indirectly). It also doesn't seem to me like the kind of simple moral utterances which ex hypothesi train people in the use of moral language at an early age primarily concern experiences and their badness (or preferences for that matter). It seems equally if not more plausible to speculate that such utterances typically involve injunctions (with the threat of punishment and so on). Thanks for addressing this. This still isn't quite clear to me i.e. what exactly is meant by 'how would you react as person W who observes X and Y'? What conditions of W observing X and Y are required? For example, does it only specifically refer to how I would react if I were directly observing an act of torture in the room or does it permit broader 'observations', i.e. I can observe that there is such-and-such level of inequality in the distribution of income in a society. The more restrictive definitions don't seem adequate to me to capture how we actually use moral language, but the more permissive ones, which are more adequate, don't seem to suffice to rule out me making judgements about the repugnant conclusion and so on. I agree that answers to population ethics aren't directly entailed by the definition of moral…
What questions would you like to see forecasts on from the Metaculus community?

I think the 'Diets of EAs' question could be a decent proxy for the prominence of animal welfare within EA. There are similar questions on Metaculus for the general US population: https://www.metaculus.com/questions/?order_by=-activity&search=vegetarian

I don't see the ethics question as all that useful, since I think most of population ethics presupposes some form of consequentialism.

alexrjl (1y): It looks like a different part of the survey asked about cause prioritisation directly, which seems like it could be closer to what you wanted, my current plan (5 questions) for how to use the survey is here. [https://docs.google.com/document/d/1MgbdH3HNEmirKZ4ElGXVFPxyURmo4ohrtHArWASL6wk/edit]
What questions would you like to see forecasts on from the Metaculus community?

Somewhat unrelated, but I'll leave this thought here anyway: maybe EA Metaculus users could benefit from posting question drafts as short-form posts on the EA Forum.

alexrjl (1y): I'm kind of hoping that this thread ends up serving that purpose. There's also a thread [https://www.metaculus.com/questions/956/discussion-topic-what-are-some-suggestions-for-questions-to-launch/] on metaculus where people can post ideas, the difference there is nobody's promising to write them up, and they aren't necessarily EA ideas, but I thought it was worth mentioning. (I do have some thoughts on the top level answer here, but don't have time to write them now, will do soon)
What questions would you like to see forecasts on from the Metaculus community?

Thanks for doing this, great idea! I think Metaculus could provide some valuable insight into how society's/EA's/philosophy's values might drift or converge over the coming decades.

For instance, I'm curious about where population ethics will be in 10-25 years. Something like: 'In 2030, will the consensus within effective altruism be that "Total utilitarianism is closer to describing our best moral theories than average utilitarianism and person-affecting views"?'

Having your insight on how to operationalize this would be useful, since I'm not very happy with

... (read more)
MichaelA (1y): I'd also be interested in forecasts on these topics. It seems to me that there'd be a risk of self-fulfilling prophecies. That is, we'd hope that what'd happen is:
1. a bunch of forecasters predict what the EA community would end up believing after a great deal of thought, debate, analysis, etc.
2. then we can update ourselves closer to believing that thing already, which could help us get to better decisions faster.
...But what might instead happen is:
1. a relatively small group of forecasters makes relatively unfounded forecasts
2. then the EA community - which is relatively small, unusually connected to Metaculus, and unusually interested in forecasts - updates overly strongly on those forecasts, thus believing something that they wouldn't otherwise have believed and don't have good reasons to believe
(Perhaps this is like a time-travelling information cascade [https://en.wikipedia.org/wiki/Information_cascade]?) I'm not saying the latter scenario is more likely than the former, nor that this means we shouldn't solicit these forecasts. But the latter scenario seems likely enough to perhaps be an argument against soliciting these forecasts, and to at least be worth warning readers about clearly and repeatedly if these forecasts are indeed solicited. Also, this might be especially bad if EAs start noticing that community beliefs are indeed moving towards the forecasted future beliefs, and don't account sufficiently well for the possibility that this is just a self-fulfilling prophecy, and thus increase the weight they assign to these forecasts. (There could perhaps be a feedback loop.) I imagine there's always some possibility that forecasts will influence reality in a way that makes the forecasts more or less likely to come true than they would've been otherwise. But this seems more-than-usually-likely when forecasting EA community beliefs (compared to e.g. forecasting geopolitical events).
alexrjl (1y): The best operationalisation here I can see is asking that we are able to attach a few questions of this form to the 2030 EA survey, then asking users to predict what the results will be. If we can get some sort of pre-commitment from whoever runs the survey to include the answers, even better. One thing to think about (and maybe for people to weigh in on here) is that as you get further out in time there's less and less evidence that forecasting performs well. It's worth considering a 2025 date for these sorts of questions too for that reason.
jacobpfau (1y): Somewhat unrelated, but I'll leave this thought here anyway: maybe EA Metaculus users could benefit from posting question drafts as short-form posts on the EA Forum.
AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

You discuss at one point in the podcast the claim that as AI systems take on larger and larger real-world problems, the challenge of defining the reward function will become more and more important. For example, for cleaning, the simple number-of-dust-particles objective is inadequate because we care about many other things (e.g. keeping the house tidy) and many side constraints (e.g. avoiding damaging household objects). This isn't quite an argument for AI alignment solving itself, but it is an argument that the attention and resources poured into AI alignment

... (read more)
EA Forum feature suggestion thread

Perhaps include a short-form subsection under the Forum Favorites section? It seems to me that most short-form posts have very low visibility.

If the forum admins have traffic statistics, they should be able to get a better sense of the visibility issue than I can. In particular, I suspect the short-form section receives a fraction of the traffic of the frontpage, but this should be verified empirically.

Moral Anti-Realism Sequence #3: Against Irreducible Normativity

I enjoyed reading this post! I like Wittgensteinian arguments, and applying them to ethics, so hurrah for this. There was also some lively discussion of it on the EA corner chat.

Another possibly misleading motivation for irreducible normativity may be linguistic. It seems to me plausible that anyone who uses the word 'agony' in the standard sense is committing her/himself to agony being undesirable. This is not an argument for irreducible normativity, but it may give you a feeling that there is some intrinsic connection underlying the set of self-evident cas

... (read more)
antimonyanthony (1y): Could you please clarify this? As someone who is mainly convinced of irreducible normativity by the self-evident badness of agony - in particular, considering the intuition that someone in agony has reason to end it even if they don't consciously "desire" that end - I don't think this can be dissolved as a linguistic confusion. It's true that for all practical purposes humans seem not to desire their own pain/suffering. But in my discussions with some antirealists they have argued that if a paperclip maximizer, for example, doesn't want not to suffer (by hypothesis all it wants is to maximize paperclips), then such a being doesn't have a reason to avoid suffering. That to me seems patently unbelievable. Apologies if I've misunderstood your point!
jacobpfau's Shortform

Yes, I recently asked a Metaculus mod about this, and they said they're hoping to bring back the ai.metaculus sub-domain eventually. For now, I'm submitting everything to the main Metaculus domain.

jacobpfau's Shortform

Medium-term AI forecasting with Metaculus

I'm working on a collection of metaculus.com questions intended to generate AI-domain-specific forecasting insights. These questions are intended to resolve in the 1-15 year range, and my hope is that if they're sufficiently independent, we'll get a range of positive and negative resolutions which will inform future forecasts.

I've already gotten a couple of them live, and am hoping for feedback on the rest:

1. When will AI outperform humans on argument reasoning tasks?

2. When will multi-modal ML outperform uni-moda

... (read more)
Lukas_Gloor (1y): You might be familiar with https://ai.metaculus.com/questions/. It went dormant unfortunately.
Taking advantage of climate change concerns to channel donations to EA-recommended organizations at low marginal cost (proposal and call for more research)

True, it seems like SolarAid's own estimate these days suggests around $5 per tonne. I can't find a more recent external review, unfortunately.

Taking advantage of climate change concerns to channel donations to EA-recommended organizations at low marginal cost (proposal and call for more research)

This is a great point!

I'm somewhat hesitant about the CATF recommendation, though. After a brief skim of the Founder's Pledge report, it looks like they broke down CATF's efforts into three projects which have worked/are working out well. If we assume that Founder's Pledge reviewed a number of public advocacy/lobbying groups, there's likely to have been a multiple-testing issue. In that light, the retrospective CO2e-per-dollar estimate may not be predictive of their future CO2e/$ ratio. That said, so long as the majority of their funds go into t... (read more)
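
To illustrate why I worry about this, here is a toy simulation (made-up numbers, not Founder's Pledge's actual data or process) of the selection effect:

```python
# Toy simulation of the multiple-testing worry: if you evaluate many groups and
# highlight the one with the best measured cost-effectiveness, the winner's
# measured value tends to overstate its true value.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_groups = 10_000, 20
true_effect = rng.normal(1.0, 0.3, size=(n_trials, n_groups))          # each group's true effectiveness
measured = true_effect + rng.normal(0.0, 0.5, size=true_effect.shape)  # noisy retrospective estimates
winner = measured.argmax(axis=1)                                       # the group a review would highlight
rows = np.arange(n_trials)
print(measured[rows, winner].mean())     # inflated: what the retrospective estimate reports
print(true_effect[rows, winner].mean())  # lower: what to actually expect going forward
```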

ianps (2y): Thanks, I was not aware of SolarAid (or read about it long ago and forgot), and I particularly like the associated health and economic benefits. But it's hard to make recommendations based on an evaluation from 2013 without at least a confirmatory follow-up. Regarding the low costs to offset: indeed, I got incredulous looks and comments about the cost to offset the carbon emissions from the entire company for a whole year as being too low to be accurate... I would say that there was even willingness to spend more (i.e., offset more than what we calculated) since the costs were so cheap. I would need a good argument for why the company should do that, but maybe I can find some for next year's calculation.