All of Jessica_Taylor's Comments + Replies

Another modeling issue is that each individual variable is log-normal rather than normal/uniform. This means that while the probability of success is "0.01 to 0.1", suggesting 5.5% (the midpoint) as the "average", the actual computed mean is about 4%. This doesn't make a big difference on its own, but it matters when multiplying together lots of numbers. I'm not sure that converting log-normal to uniform would in general lead to better estimates, but it's important to flag.
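
To make the log-normal point concrete, here is a minimal sketch (my own illustration, not part of the original comment) that reads "0.01 to 0.1" as a 90% credible interval on a log-normal and recovers roughly the 4% mean versus the 5.5% midpoint:

```python
import math

# Read "0.01 to 0.1" as the 5th and 95th percentiles of a log-normal.
low, high = 0.01, 0.1
z95 = 1.6448536269514722  # z-score of the 95th percentile of a standard normal

mu = (math.log(low) + math.log(high)) / 2             # log-space midpoint
sigma = (math.log(high) - math.log(low)) / (2 * z95)

print(f"median   = {math.exp(mu):.3f}")                 # geometric midpoint, ~0.032
print(f"mean     = {math.exp(mu + sigma**2 / 2):.3f}")  # log-normal mean, ~0.040
print(f"midpoint = {(low + high) / 2:.3f}")             # arithmetic midpoint, 0.055
```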

7
Ozzie Gooen
2y
Quick point that I'm fairly suspicious of uniform distributions for such uncertainties. I'd agree that our format of a 90% CI can be deceptive, especially when people aren't used to it. I imagine it would eventually be really neat to have probability distribution support right in the EA Forum. Until then, I'm curious if there are better ways to write the statistical summaries of many variables. To me, "0.01 to 0.1" doesn't suggest that 5.5% is the "mean", but I could definitely appreciate that others would think that.

Why did you take the mean $/QALY instead of mean QALY/$ (which expected value analysis would suggest)? When I do that I get $5000/QALY as the mean.
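
To illustrate why the direction of the ratio matters, here is a rough sketch with a made-up log-normal (not the post's actual model): the mean of $/QALY is not the reciprocal of the mean of QALY/$, and for heavy-tailed distributions the two can differ by a large factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: cost-effectiveness as a single log-normal in QALYs per dollar.
# Parameters are illustrative only.
sigma = 1.5
qaly_per_dollar = rng.lognormal(mean=np.log(3e-6), sigma=sigma, size=1_000_000)
dollar_per_qaly = 1 / qaly_per_dollar

print(f"mean($/QALY)     = {dollar_per_qaly.mean():,.0f}")
print(f"1 / mean(QALY/$) = {1 / qaly_per_dollar.mean():,.0f}")
# Analytically, for a log-normal these differ by a factor of exp(sigma^2).
print(f"observed ratio   = {dollar_per_qaly.mean() * qaly_per_dollar.mean():.1f}")
print(f"exp(sigma^2)     = {np.exp(sigma**2):.1f}")
```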

 

2
NunoSempere
2y
The post I mentioned in the comments below is now up here: Probability distributions of Cost-Effectiveness can be misleading
4
NunoSempere
2y
I've now edited the post to reference mean(QALYs)/mean($). You can find this by searching (ctrl+f) for "EDIT 22/06/2022" and under the graph charts. Note that I've used mean($)/mean(QALYs) ($8k) rather than 1/mean(QALYs/$) ($5k), because that seems to me to be more the quantity of interest, but I'm not hugely certain of this.
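
For readers comparing the two figures, here's a toy sketch (stand-in distributions, not the post's model) of the three candidate summaries side by side:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in log-normal draws for total cost (in $) and total benefit (in QALYs);
# parameters are illustrative only.
cost = rng.lognormal(mean=np.log(5e6), sigma=0.5, size=1_000_000)
qalys = rng.lognormal(mean=np.log(2e3), sigma=1.0, size=1_000_000)

print(f"mean($)/mean(QALYs) = {cost.mean() / qalys.mean():,.0f} $/QALY")
print(f"1/mean(QALYs/$)     = {1 / (qalys / cost).mean():,.0f} $/QALY")
print(f"mean($/QALY)        = {(cost / qalys).mean():,.0f} $/QALY")
```

The three generally disagree whenever the distributions are skewed, so it's worth being explicit about which one a headline number refers to.
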
3
NunoSempere
2y
Because I didn't think about it, and I liked having numbers which were more interpretable; e.g., 3.4*10^-6 QALYs/$ is to me less interpretable than $290k/QALY, and same with 7.7 * 10^-4 vs $1300/QALY. Another poster reached out and mentioned he was writing a post about this particular mistake, so I thought I'd leave the example up.

I agree that:

  1. clarifying "what should people who gain a huge amount of power through AI do with Earth, existing social structures, and the universe?" seems like a good question to get agreement on for coordination reasons
  2. we should be looking for tractable ways of answering this question

I think:

a) consciousness research will fail to clarify ethics enough to answer enough of (1) to achieve coordination (since I think human preferences on the relevant timescales are way more complicated than consciousness, conditioned on consciousness being simp... (read more)

However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc) between P&C- which I think we can't have, even theoretically.

I'm still not sure why you strongly think there's _no_ principled way; it seems hard to prove a negative. I mentioned that we could make progress on logical counterfactuals; there's also the approach Chalmers talks about here. (I buy that there's reason to suspect there's no principled way if you're not impressed by any proposal so far).

And whenever we have multi

... (read more)
0
MikeJohnson
7y
What you're saying seems very reasonable; I don't think we differ on any facts, but we do have some divergent intuitions on implications. I suspect this question-- whether it's possible to offer a computational description of moral value that could cleanly 'compile' to physics-- would have non-trivial yet also fairly modest implications for most of MIRI's current work. I would expect the significance of this question to go up over time, both in terms of direct work MIRI expects to do, and in terms of MIRI's ability to strategically collaborate with other organizations. I.e., when things shift from "let's build alignable AGI" to "let's align the AGI", it would be very good to have some of this metaphysical fog cleared away so that people could get on the same ethical page, and see that they are in fact on the same page.

I suspect this still runs into the same problem-- in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.

It seems like you're saying here that there won't be clean rules for determining logical counterfactuals? I agree this might be the case but it doesn't seem clear to me. Logical counterfactuals seem pretty confusing and there seems to be a lot of room for better theories about them.

This is an impo

... (read more)
1
MikeJohnson
7y
Right, and I would argue that logical counterfactuals (in the way we've mentioned them in this thread) will necessarily be intractably confusing, because they're impossible to do cleanly. I say this because in the "P & C" example above, we need a frame-invariant way to interpret a change in C in terms of P. However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc) between P&C- which I think we can't have, even theoretically. (Unless we define both physics and computation through something like constructor theory... at which point we're not really talking about Turing machines as we know them-- we'd be talking about physics by another name.)

This is a big part of the reason why I'm a big advocate of trying to define moral value in physical terms: if we start with physics, then we know our conclusions will 'compile' to physics. If instead we start with the notion that 'some computations have more moral value than others', we're stuck with the problem-- intractable problem, I argue-- that we don't have a frame-invariant way to precisely identify what computations are happening in any physical system (and likewise, which aren't happening). I.e., statements about computations will never cleanly compile to physical terms. And whenever we have multiple incompatible interpretations, we necessarily get inconsistencies, and we can prove anything is true (i.e., we can prove any arbitrary physical system is superior to any other). Does that argument make sense?

... that said, it would seem very valuable to make a survey of possible levels of abstraction at which one could attempt to define moral value, and their positives & negatives. Totally agreed!

Thanks for your comments too; I'm finding them helpful for understanding other possible positions on ethics.

With the right mapping, we could argue that we could treat that physical system as simulating the brain of a sleepy cat. However, given another mapping, we could treat that physical system as simulating the suffering of five holocausts. Very worryingly, we have no principled way to choose between these interpretive mappings.

OK, how about a rule like this:

Physical system P embeds computation C if and only if P has different behavior counterfactu

... (read more)
1
MikeJohnson
7y
I suspect this still runs into the same problem-- in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.

This is an important question: if there exists no clean quarks↔computations mapping, is it (a) a relatively trivial problem, or (b) a really enormous problem? I'd say the answer to this depends on how we talk about computations. I.e., if we say "the ethically-relevant stuff happens at the computational level" -- e.g., we shouldn't compute certain strings-- then I think it grows to be a large problem. This grows particularly large if we're discussing how to optimize the universe! :)

Let me push back a little here- imagine we live in the early 1800s, and Faraday was attempting to formalize electromagnetism. We had plenty of intuitive rules of thumb for how electromagnetism worked, but no consistent, overarching theory. I'm sure lots of people shook their heads and said things like, "these things are just God's will, there's no pattern to be found." However, it turns out that there was something unifying to be found, and tolerance of inconsistencies & nebulosity would have been counter-productive.

Today, we have intuitive rules of thumb for how we think consciousness & ethics work, but similarly no consistent, overarching theory. Are consciousness & moral value like electromagnetism-- things that we can discover knowledge about? Or are they like elan vital-- reifications of clusters of phenomena that don't always cluster cleanly? I think the jury's still out here, but the key with electromagnetism was that Faraday was able to generate novel, falsifiable predictions with his theory. I'm not claiming to be Faraday, but I think if we can generate novel, falsifiable predictions with work on consciousness & valence (I offer some in Section XI, and observations that could be adapted to make falsifiable predictions in Se

(more comments)

Thus, we would need to be open to the possibility that certain interventions could cause a change in a system’s physical substrate (which generates its qualia) without causing a change in its computational level (which generates its qualia reports)

It seems like this means that empirical tests (e.g. neuroscience stuff) aren't going to help test aspects of the theory that are about divergence between computational pseudo-qualia (the things people report on) and actual qualia. If I squint a lot I could see "anthropic evidence" be... (read more)

0
MikeJohnson
7y
Yes, the epistemological challenges with distinguishing between ground-truth qualia and qualia reports are worrying. However, I don't think they're completely intractable, because there is a causal chain (from Appendix C):

Our brain's physical microstates (perfectly correlated with qualia) --> The logical states of our brain's self-model (systematically correlated with our brain's physical microstates) --> Our reports about our qualia (systematically correlated with our brain's model of its internal state)

... but there could be substantial blindspots, especially in contexts where there was no adaptive benefit to having accurate systematic correlations.

some more object-level comments on PQ itself:

We can say that a high-level phenomenon is strongly emergent with respect to a low-level domain when the high-level phenomenon arises from the low-level domain, but truths concerning that phenomenon are not deducible even in principle from truths in the low-level domain.

Suppose we have a Python program running on a computer. Truths about the Python program are, in some sense, reducible to physics; however, the reduction itself requires resolving philosophical questions. I don't know if this means the Pytho... (read more)

2
MikeJohnson
7y
Awesome, I do like your steelman. More thoughts later, but just wanted to share one notion before sleep:

With regard to computationalism, I think you've nailed it. Downward causation seems pretty obviously wrong (and I don't know of any computationalists that personally endorse it).

Totally agreed, and I like this example.

Right- but I would go even further. Namely, given any non-trivial physical system, there exist multiple equally valid interpretations of what's going on at the computational level. The example I give in PQ is: let's say I shake a bag of popcorn. With the right mapping, we could argue that we could treat that physical system as simulating the brain of a sleepy cat. However, given another mapping, we could treat that physical system as simulating the suffering of five holocausts. Very worryingly, we have no principled way to choose between these interpretive mappings. Am I causing suffering by shaking that bag of popcorn?

And I think all computation is like this, if we look closely- there exists no frame-invariant way to map between computation and physical systems in a principled way... just useful mappings, and non-useful mappings (and 'useful' is very frame-dependent). This introduces an inconsistency into computationalism, and has some weird implications: I suspect that, given any computational definition of moral value, there would be a way to prove any arbitrary physical system morally superior to any other arbitrary physical system. I.e., you could prove both that A>B, and B>A.

... I may be getting something wrong here. But it seems like the lack of a clean quarks↔bits mapping ultimately turns out to be a big deal, and is a big reason why I advocate not trying to define moral value in terms of Turing machines & bitstrings. Instead, I tend to think of ethics as "how should we arrange the [quarks|negentropy] in our light-cone?" -- ultimately we live in a world of quarks, so ethics is a question of quarks (or strings, or whatnot). Howeve
3
Jessica_Taylor
7y
(more comments)

It seems like this means that empirical tests (e.g. neuroscience stuff) aren't going to help test aspects of the theory that are about divergence between computational pseudo-qualia (the things people report on) and actual qualia. If I squint a lot I could see "anthropic evidence" being used to distinguish between pseudo-qualia and qualia, but it seems like nothing else would work.

I'm also not sure why we would expect pseudo-qualia to have any correlation with actual qualia? I guess you could make an anthropic argument (we're viewing the world from the perspective of actual qualia, and our sensations seem to match the pseudo-qualia). That would give someone the suspicion that there's some causal story for why they would be synchronized, without directly providing such a causal story.

(For the record I think anthropic reasoning is usually confused and should be replaced with decision-theoretic reasoning (e.g. see this discussion), but this seems like a topic for another day.)

Thanks for the response; I've found this discussion useful for clarifying and updating my views.

However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it's tractable, based on work already done.)

Suppose we live under the wr... (read more)

0
MikeJohnson
7y
I generally agree with this-- getting it right eventually is the most important thing; getting it wrong for 100 years could be horrific, but not an x-risk. I do worry some that "trusted reflection process" is a sufficiently high-level abstraction so as to be difficult to critique.

Interesting piece by Christiano, thanks!

I would also point to a remark I made above: doing this sort of ethical clarification now (if indeed it's tractable) will pay dividends in aiding coordination between organizations such as MIRI, DeepMind, etc. Or rather, not clarifying goals, consciousness, moral value, etc., seems likely to increase the risks of racing to be the first to develop AGI, secrecy & distrust between organizations, and such. A lot does depend on tractability.

I expect:

  1. We would lose a great deal of value by optimizing the universe according to current moral uncertainty, without the opportunity to reflect and become less uncertain over time.

  2. There's a great deal of reflection necessary to figure out what actions moral theory X recommends, e.g. to figure out which minds exist or what implicit promises people have made to each other. I don't see this reflection as distinct from reflection about moral uncertainty; if we're going to defer to a reflection process anyway for making decisions, we might as well let that reflection process decide on issues of moral theory.

Some thoughts:

IMO the most plausible non-CEV proposals are

  1. Act-based agents, which defer to humans to a large extent. The goal is to keep humans in control of the future.
  2. Task AI, which is used to accomplish concrete objectives in the world. The idea would be to use this to accomplish goals people would want accomplished using AI (including reducing existential risk), while leaving the future moral trajectory in the hands of humans.

Both proposals end up deferring to humans to decide the long-run trajectory of humanity. IMO, this isn't a coincidence; ... (read more)

3
MikeJohnson
7y
Hi Jessica,

Thanks for the thoughtful note. I do want to be very clear that I'm not criticizing MIRI's work on CEV, which I do like very much! It seems like the best intuition pump & Schelling Point in its area, and I think it has potential to be more. My core offering in this space (where I expect most of the value to be) is Principia Qualia- it's more up-to-date and comprehensive than the blog post you're referencing. I pose some hypotheticals in the blog post, but it isn't intended to stand alone as a substantive work (whereas PQ is). But I had some thoughts in response to your response on valence + AI safety:

  1. First, I agree that leaving our future moral trajectory in the hands of humans is a great thing. I'm definitely not advocating anything else.
  2. But I would push back on whether our current ethical theories are very good- i.e., good enough to see us through any future AGI transition without needlessly risking substantial amounts of value. To give one example: currently, some people make the claim that animals such as cows are much more capable of suffering than humans, because they don't have much intellect to blunt their raw, emotional feeling. Other people make the claim that cows are much less capable of suffering than humans, because they don't have the 'bootstrapping strange loop' mind architecture enabled by language, and necessary for consciousness. Worryingly, both of these arguments seem plausible, with no good way to pick between them. Now, I don't think cows are in a strange quantum superposition of both suffering and not suffering— I think there's a fact of the matter, though we clearly don't know it.

This example may have moral implications, but little relevance to existential risk. However, when we start talking about mind simulations and 'thought crime', WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly
0
kbog
7y
What about moral uncertainty as an alternative to CEV? (https://nebula.wsimg.com/1cc278bf0e7470c060032c9624508149?AccessKeyId=07941C4BD630A320288F&disposition=0&alloworigin=1)

I share Open Phil’s view on the probability of transformative AI in the next 20 years. The relevant signposts would be answers to questions like “how are current algorithms doing on tasks requiring various capabilities”, “how much did this performance depend on task-specific tweaking on the part of programmers”, “how much is performance projected to improve due to increasing hardware”, and “do many credible AI researchers think that we are close to transformative AI”.

In designing the new ML-focused agenda, we imagined a concrete hypothetical (which isn’t ... (read more)

I’ll start by stating that, while I have some intuitions about how the paper will be received, I don’t have much experience making crisp forecasts, and so I might be miscalibrated. That said:

  1. In my experience it’s pretty common for ML researchers who are more interested in theory and general intelligence to find Solomonoff induction and AIXI to be useful theories. I think “Logical Induction” will be generally well-received among such people. Let’s say 70% chance that at least 40% of ML researchers who think AIXI is a useful theory, and who spend a coupl
... (read more)

I agree with Nate that there isn’t much public on this yet. The AAMLS agenda is predicated on a relatively pessimistic scenario: perhaps we won’t have much time before AGI (and therefore not much time for alignment research), and the technology AI systems are based on won’t be much more principled than modern-day deep learning systems. I’m somewhat optimistic that it’s possible to achieve good outcomes in some pessimistic scenarios like this one.

1
Sursee
7y
Totally agreed with Jessica Taylor! She said it all very well!

I think that the ML-related topics we spend the most effort on (such as those in the ML agenda) are currently quite neglected. See my other comment for more on how our research approach is different from that of most AI researchers.

It’s still plausible that some of the ML-related topics we research would be researched anyway (perhaps significantly later). This is a legitimate consideration that is, in my view, outweighed by other considerations (such as the fact that less total safety research will be done if AGI comes soon, making such timelines more... (read more)

The ideal MIRI researcher is someone who’s able to think about thorny philosophical problems and break off parts of them to formalize mathematically. In the case of logical uncertainty, researchers started by thinking about the initially vague problem of reasoning well about uncertain mathematical statements, turned some of these thoughts into formal desiderata and algorithms (producing intermediate possibility and impossibility results), and eventually found a way to satisfy many of these desiderata at once. We’d like to do a lot more of this kind of wo... (read more)

Here's a third resolution. Consider a utility function that is a weighted sum of:

  1. how close a region's population level is to the "ideal" population level for that region (i.e. not underpopulated or overpopulated)
  2. average utility of individuals in this region (not observer-moments in this region)

AMF is replacing lots of lives that are short (therefore low-utility) with fewer lives that are long (therefore higher utility), without affecting population level much. The effect of this could be summarized as "35 DALYs", as in "we i... (read more)
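
A minimal sketch of the weighted-sum utility function described above (my own construction; the closeness measure and weights are assumptions, not something specified in the comment):

```python
from dataclasses import dataclass

@dataclass
class Region:
    population: float        # current population level
    ideal_population: float  # "ideal" population level for this region
    avg_utility: float       # average utility per individual (not per observer-moment)

def region_utility(r: Region, w_pop: float = 1.0, w_avg: float = 1.0) -> float:
    # (1) closeness of the population level to the ideal level (0 when exactly ideal)
    closeness = -abs(r.population - r.ideal_population) / r.ideal_population
    # (2) average utility of individuals in the region
    return w_pop * closeness + w_avg * r.avg_utility

# Toy version of the AMF point: many short, low-utility lives are replaced by
# fewer long, higher-utility lives, with the population level roughly unchanged.
before = Region(population=1e6, ideal_population=1e6, avg_utility=0.4)
after = Region(population=1e6, ideal_population=1e6, avg_utility=0.6)
print(region_utility(after) - region_utility(before))  # positive: the intervention helps
```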