All of AlexMennen's Comments + Replies

A Paradox for Tiny Probabilities and Enormous Values

Timidity seems unobjectionable to me, and the arguments against it in section 3 seem unconvincing.

3.1: Marginal utility in number of lives already dropping off very steeply by 1000 seems implausible, but if we replace 1000 with a sufficiently large number, an agent with a bounded utility function would deny that these prospects keep getting better for the same (rational, imo) reasons they would eventually stop taking the devil's deals to get more years of happy life with high probability.

3.2: It seems perfectly reasonable to me to selectively create valuab... (read more)
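The bounded-utility response in 3.1 can be made concrete with a toy model (the utility function and all numbers here are illustrative assumptions, not from the comment):

```python
# Toy model: each "devil's deal" multiplies your years of happy life by 10,
# but with probability 0.01 you lose everything. An agent with the bounded
# utility function u(x) = x / (x + C) accepts early deals and eventually
# declines. C = 1000 is an arbitrary choice for illustration.

def u(x, C=1000.0):
    return x / (x + C)  # increasing, bounded above by 1

def accepts_deal(years, p_win=0.99, multiplier=10):
    # Expected utility of taking the deal vs. keeping what you have.
    return p_win * u(multiplier * years) > u(years)

print(accepts_deal(100))    # far from the bound, the deal is worth taking
print(accepts_deal(10**6))  # near the bound, it is not
```

The same structure applies to the lives case: once the utility function has nearly saturated, a small probability of losing everything outweighs a large multiplicative gain.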

Principled extremizing of aggregated forecasts

I am more hesitant to recommend the more complex extremization method where we use the historical baseline resolution log-odds

It's the other way around for me. Historical baseline may be somewhat arbitrary and unreliable, but so is 1:1 odds. If the motivation for extremizing is that different forecasters have access to independent sources of information to move them away from a common prior, but that common prior is far from 1:1 odds, then extremizing away from 1:1 odds shouldn't work very well, and historical baseline seems closer to a common prior than 1... (read more)
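To make the comparison concrete, here is a hedged sketch (the function names and baseline value are illustrative assumptions) of extremizing an average of log odds away from a baseline prior rather than away from 1:1 odds:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def extremize(probs, factor, baseline=0.5):
    """Pool forecasts by averaging log odds, then push the result away
    from a baseline prior by `factor`. baseline=0.5 recovers ordinary
    extremizing away from 1:1 odds."""
    pooled = sum(logit(p) for p in probs) / len(probs)
    b = logit(baseline)
    return expit(b + factor * (pooled - b))

forecasts = [0.7, 0.8]
print(extremize(forecasts, 2.0))                # away from 1:1 odds: pushed up
print(extremize(forecasts, 2.0, baseline=0.9))  # away from a 0.9 base rate: pushed down
```

Note that the choice of baseline changes the direction of the adjustment: relative to a 0.9 historical base rate, forecasts of 0.7 and 0.8 are evidence *against* the event, so extremizing moves the pooled probability down rather than up.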

Jaime Sevilla · 8mo

Thanks for chipping in Alex! Agreed! To give some nuance to my recommendation: the reason I am hesitant is mainly the lack of academic precedent (as far as I know). Note that the data backs this up! Using "pseudo-historical" odds is noticeably better than using 1:1 odds. See the appendix for more details. I'd be interested in seeing the results of such experiments using Metaculus data! This one is trippy, I like it!
When pooling forecasts, use the geometric mean of odds

In fact I am quite puzzled by the fact that neither the average of probabilities nor the average of log odds seem to satisfy the basic invariance property of respecting annualized probabilities.

I think I can make sense of this. If you believe there's some underlying exponential distribution on when some event will occur, but you don't know the annual probability, then an exponential distribution is not a good model for your beliefs about when the event will occur, because a weighted average of exponential distributions with different annual probabilities i... (read more)
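A minimal numerical illustration of this point (the two hypothesized annual probabilities are made up for the example): with credence split between two annual probabilities, the resulting belief distribution is not memoryless, so no single annualized probability describes it:

```python
# Assumption (not in the comment): equal credence on two hypotheses about
# the annual probability of an event, 0.1 vs 0.5.
p_annual = [0.1, 0.5]
weights = [0.5, 0.5]

def p_no_event(years):
    # Probability the event has not occurred after `years`, under the mixture.
    return sum(w * (1 - p) ** years for w, p in zip(weights, p_annual))

one_year = p_no_event(1)         # 0.7
two_years = p_no_event(2)        # 0.53
print(two_years, one_year ** 2)  # 0.53 vs 0.49 -- the mixture is not memoryless
```

Surviving the first year is evidence for the low-hazard hypothesis, so the mixture's effective annual probability falls over time, which is exactly why a single pooled "annual probability" cannot be invariant under annualization.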

When pooling forecasts, use the geometric mean of odds

I do agree that when new evidence comes in about the experts  we should change how we weight them. But when we are pooling the probabilities we aren't receiving any extra evidence about the experts (?).

Right, the evidence about the experts come from the new evidence that's being updated on, not the pooling procedure. Suppose we're pooling expert judgments, and we initially consider them all equally credible, so we use a symmetric pooling method. Then some evidence comes in. Our experts update on the evidence, and we also update on how credible each ex... (read more)

When pooling forecasts, use the geometric mean of odds

I wrote a post arguing for the opposite thesis, and was pointed here. A few comments about your arguments that I didn't address in my post:

Regarding the empirical evidence supporting averaging log odds, note that averaging log odds will always give more extreme pooled probabilities than averaging probabilities does, and in the contexts in which this empirical evidence was collected, the experts were systematically underconfident, so that extremizing the results could make them better calibrated. This easily explains why average log odds outperformed averag... (read more)
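The claim that averaging log odds is more extreme than averaging probabilities can be checked directly with a small sketch (the numbers are illustrative, not from the comment):

```python
import math

def mean_prob(ps):
    return sum(ps) / len(ps)

def geo_mean_odds(ps):
    # Averaging log odds = geometric mean of odds, converted back to a probability.
    log_odds = sum(math.log(p / (1 - p)) for p in ps) / len(ps)
    return 1 / (1 + math.exp(-log_odds))

high = [0.9, 0.99]
low = [0.1, 0.01]
print(mean_prob(high), geo_mean_odds(high))  # 0.945 vs ~0.968: further from 0.5
print(mean_prob(low), geo_mean_odds(low))    # 0.055 vs ~0.032: further from 0.5
```

In both directions the log-odds pool lands further from 1:1 than the probability average, which is why, on data where experts are systematically underconfident, it can look better calibrated for reasons unrelated to how it aggregates information.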

Jaime Sevilla · 9mo
Thank you for your thoughtful reply. I think you raise interesting points, which move my confidence in my conclusions down. Here are some comments.

As in your post, averaging the probs effectively erases the information from extreme individual probabilities, so I think you will agree that averaging log odds is not merely a more extreme version of averaging probs. I nonetheless think this is a very important issue: the difficulty of separating the extremizing effect of log odds from its actual effect.

This is an empirical question that we can settle empirically. Using Simon_M's script I computed the Brier and log scores for binary Metaculus questions of the extremized means and extremized log odds, with extremizing factors between 1 and 3 in intervals of 0.05. In this setting, the top-performing metric is the "optimally" extremized average log odds in terms of log loss, surpassing the "optimally" extremized mean of probs. Note that the Brier scores are identical, which is consistent with the average log odds outperforming the average probs only when extreme forecasts are involved. Also notice that the optimal extremizing factor for the average of log odds is lower than for the average of probabilities; this relates to your observation that the average log odds are already relatively extremized compared to the mean of probs.

There are reasons to question the validity of this experiment: we are effectively overfitting the extremizing factor to whatever gives the best results. And of course this is just one experiment. But I find it suggestive.

I am not sure I follow your argument here. I do agree that when new evidence comes in about the
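The experiment Jaime describes can be sketched in miniature. The data below is synthetic (the actual experiment used binary Metaculus questions via Simon_M's script), and the underconfidence model is an assumption for illustration:

```python
import math
import random

def log_score(p, outcome):
    # Negative log likelihood; lower is better.
    return -math.log(p if outcome else 1 - p)

def extremize(p, k):
    # Scale the log odds by k and convert back to a probability.
    log_odds = k * math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))

# Synthetic stand-in for the forecast data: pooled forecasts that are
# systematically shrunk toward 0.5 (i.e. underconfident).
random.seed(0)
data = []
for _ in range(2000):
    true_p = random.random()
    outcome = 1 if random.random() < true_p else 0
    pooled = 0.5 + 0.6 * (true_p - 0.5)  # underconfident pooled forecast
    data.append((pooled, outcome))

# Scan extremizing factors 1.00, 1.05, ..., 3.00, as in the experiment.
factors = [1 + 0.05 * i for i in range(41)]
best = min(factors, key=lambda k: sum(log_score(extremize(p, k), o)
                                      for p, o in data))
print(best)  # some factor above 1 should beat no extremizing at all
```

Because the synthetic forecasts are underconfident by construction, the optimal factor comes out above 1, which illustrates the overfitting worry: the "optimal" factor is chosen by looking at the very outcomes being scored.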
Announcing the Buddhists in EA Group
Thus, I present to you, the Buddhists in EA Facebook group.

Dead link. It says "Sorry, this content isn't available right now
The link you followed may have expired, or the page may only be visible to an audience you're not in."

G Gordon Worley III · 3y
Hmm, I'm not sure. The group is set to be publicly visible so anyone should be able to find it and ask to join, although it's a "private" group meaning only members can see who else are members and can see posts. The link is live and works for me, so I'm not sure. As an alternative you can search "Buddhists in Effective Altruism" on Facebook and that should find the group.
Why I think the Foundational Research Institute should rethink its approach

My critique of analytic functionalism is that it is essentially nothing but an assertion of this vagueness.

That's no reason to believe that analytic functionalism is wrong, only that it is not sufficient by itself to answer very many interesting questions.

Without a bijective mapping between physical states/processes and computational states/processes, I think my point holds.

No, it doesn't. I only claim that most physical states/processes have only a very limited collection of computational states/processes that it can reasonably be interpreted as, n... (read more)

I think that's being generous to analytic functionalism. As I suggested in Objection 2, […].

I'd like to hear more about this claim; I don't think it's ridiculous on its face (per Brian's and Michael_PJ's comments), but it seems a lot of people have banged their heads against this without progress, and my prior is that formalizing this is a lot harder than it looks (it may be unformalizable). If you could formalize it, that would have a lot of value for a lot of fields.

I don't expect you to either. If you're open to a suggestion about how to approach this in the future, though, I'd offer that if you don't feel like reading something but still want to criticize it, instead of venting your intuitions (which could be valuable, but don't seem calibrated to the actual approach I'm taking), you should press for concrete predictions.

The following phrases seem highly anti-scientific to me: […]. I.e., these statements seem to lack epistemological rigor, and seem to absolutely prevent you from updating in response to any evidence I might offer, even in principle (i.e., they're actively hostile to improving your beliefs, regardless of whether I am or am not correct). I don't think your intention is to be closed-minded on this topic, and I'm not saying I'm certain STV is correct. Instead, I'm saying you seem to be overreacting to some stereotype you initially pattern-matched me to, and I'd suggest talking about predictions is probably a much healthier way to move forward if you want to spend more time on this. (Thanks!)
Why I think the Foundational Research Institute should rethink its approach

That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.

If you expanded on this, I would be interested.

I didn't have in mind anything profound. :) The idea is just that "degree of information integration" is one interesting metric along which to compare minds, along with metrics like "number of neurons", "number of synapses", "number of ATP molecules consumed per second", "number of different brain structures", "number of different high-level behaviors exhibited", and a thousand other similar things.
Why I think the Foundational Research Institute should rethink its approach

Speaking of the metaphysical correctness of claims about qualia sounds confused, and I think precise definitions of qualia-related terms should be judged by how useful they are for generalizing our preferences about central cases. I expect that any precise definition for qualia-related terms that anyone puts forward before making quite a lot of philosophical progress is going to be very wrong when judged by usefulness for describing preferences, and that the vagueness of the analytic functionalism used by FRI is necessary to avoid going far astray.

Regardin... (read more)

I agree a good theory of qualia should help generalize our preferences about central cases. I disagree that we can get there with the assumption that qualia are intrinsically vague/ineffable. My critique of analytic functionalism is that it is essentially nothing but an assertion of this vagueness.

Without a bijective mapping between physical states/processes and computational states/processes, I think my point holds. I understand it's counterintuitive, but we should expect that when working in these contexts.

Correct; they're the sorts of things a theory of qualia should be able to address: necessary, not sufficient.

Re: your comments on the Symmetry Theory of Valence, I feel I have the advantage here since you haven't read the work. Specifically, it feels as though you're pattern-matching me to IIT and channeling Scott Aaronson's critique of Tononi, which is a bit ironic since that forms a significant part of PQ's argument for why an IIT-type approach can't work. At any rate I'd be happy to address specific criticism of my work. This is obviously a complicated topic and informed external criticism is always helpful. At the same time, I think it's a bit tangential to my critique of FRI's approach: as I noted,
To steelman the popcorn objection, one could say that separating "normal" computations from popcorn shaking requires at least certain sorts of conditions on what counts as a valid interpretation, and such conditions increase the arbitrariness of the theory. Of course, if we adopt a complexity-of-value approach to moral value (as I and probably you think we should), then those conditions on what counts as a computation may be minimal compared with the other forms of arbitrariness we bring to bear.

I haven't read Principia Qualia and so can't comment competently, but I agree that symmetry seems like not the kind of thing I'm looking for when assessing the moral importance of a physical system, or at least it's not more than one small part of what I'm looking for. Most of what I care about is at the level of ordinary cognitive science, such as mental representations, behaviors, learning, preferences, introspective abilities, etc. That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.
My current thoughts on MIRI's "highly reliable agent design" work

There's a strong possibility, even in a soft takeoff, that an unaligned AI would not act in an alarming way until after it achieves a decisive strategic advantage. In that case, the fact that it takes the AI a long time to achieve a decisive strategic advantage wouldn't do us much good, since we would not pick up an indication that anything was amiss during that period.

Reasons an AI might act in a desirable manner before but not after achieving a decisive strategic advantage:

Prior to achieving a decisive strategic advantage, the AI relies on cooperation wi... (read more)

That's assuming that the AI is confident that it will achieve a DSA eventually, and that no competitors will do so first. (In a soft takeoff it seems likely that there will be many AIs, thus many potential competitors.) The worse the AI thinks its chances are of eventually achieving a DSA first, the more rational it becomes for it to risk non-cooperative action at the point when it thinks it has the best chances of success - even if those chances were low. That might help reveal unaligned AIs during a soft takeoff. Interestingly this suggests that the more AIs there are, the easier it might be to detect unaligned AIs (since every additional competitor decreases any given AI's odds of getting a DSA first), and it suggests some unintuitive containment strategies such as explicitly explaining to the AI when it would be rational for it to go uncooperative if it was unaligned, to increase the odds of unaligned AIs really risking hostile action early on and being discovered...
What Should the Average EA Do About AI Alignment?

5) Look at the MIRI and 80k AI Safety syllabus, and see if how much of it looks like something you'd be excited to learn. If applicable to you, consider diving into that so you can contribute to the cutting edge of knowledge. This may make most sense if you do it through


Thanks, fixed. I had gotten partway through updating that to say something more comprehensive, decided I needed more time to think about it, and then accidentally saved it anyway.
Semi-regular Open Thread #35

Do any animal welfare EAs have anything to say on animal products from ethically raised animals, and how to identify such animal products? It seems plausible to me that consumption of such animal products could even be morally positive on net, if the animals are treated well enough to have lives worth living, and raising them does not reduce wild animal populations much more than the production of non-animal-product substitutes. Most animal welfare EAs seem confident that almost all animals raised for the production of animal products do not live lives wor... (read more)

I believe funding work on corporate engagement to improve farm animal welfare probably has much higher expected value than any personal decisions about diet. There are limitations in this area regarding room for more funding, but Compassion in World Farming USA is an effective organization that seems to have room for funding in corporate engagement.

That being said, I personally find these questions interesting, and here are some thoughts. I believe the average beef cattle in the US has net positive welfare. So in terms of direct effects on farm animal welfare, I believe eating beef increases welfare. There are indirect effects though, and some are presumably negative, including climate change, and mice and birds killed in fields for feed production. Other indirect effects might be positive (e.g. reducing insect suffering). There are other reasons why people might want to avoid beef though, such as the view that killing animals for food is inherently wrong, or the view that unnecessary harm to an animal (e.g. castration without anesthesia) cannot be offset by X number of happy days on pasture.

Beef cattle might be alone in this regard. I thought that the average dairy cow in the US might have net positive welfare, but I did some more investigation and now believe their welfare is somewhat net negative. Other potential candidates for animals in the US with net positive welfare may be other small ruminants (sheep, goats), but I couldn't find much evidence on the welfare of these animals.

The overwhelming majority of eggs in the US come from hens raised in battery cages, which I believe experience strongly net negative welfare. Moving from conventional eggs to cage-free eggs probably substantially
Lunar Colony

One thing to keep in mind is that we currently don't have the ability to create a space colony that can sustain itself indefinitely. So pursuing a strategy of creating a space colony in case human life on Earth is destroyed should probably look like capacity-building toward an indefinitely self-sustaining space colony, rather than just creating a space colony.

A new reference site: Effective Altruism Concepts

Even though the last paragraph of the expected value maximization article now says that it's talking about the VNM notion of expected value, the rest of the article still seems to be talking about the naive notion of expected value that is linear with respect to things of value (in the examples given, years of fulfilled life). This makes the last paragraph seem pretty out of place in the article.

Nitpicks on the risk aversion article: "However, it seems like there are fewer reasons for altruists to be risk-neutral in the economic sense" is a confu... (read more)

Thanks, I've made some further changes, which I hope will clear things up. Re your first worry, I think that's a valid point, but it's also important to cover both concepts. I've tried to make the distinction clearer. If that doesn't address your worry, feel free to drop me a message or suggest changes via the feedback tab, and we can discuss further.
Principia Qualia: blueprint for a new cause area, consciousness research with an eye toward ethics and x-risk

3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).

I'm curious about this, since you mentioned fixing IIT's flaws. I came to the comments to make the same complaint you were responding to Jessica about.

As I understand their position, MIRI tends to not like IIT because it's insufficiently functionalist, and too physicalist. On the other hand, I don't think IIT could be correct because it's too functionalist, and insufficiently physicalist, partially for the reasons I explain in my response to Jessica.

The core approach I've taken is to enumerate the sorts of problems one would need to solve if one were to formalize consciousness. (Whether consciousness is a thing-that-can-be-formalized is another question, of course.) My analysis is that IIT satisfactorily addresses 4 or 5 of the 8 problems. Moving to a more physical basis would address more of these problems, though not all (a big topic in PQ is how to interpret IIT-like output, a task independent of how to generate it). Other research along these same lines would be, e.g.:

Adam Barrett's FIIH
Max Tegmark's Perceptronium
I had the same response. The document claims that pleasure or positive valence corresponds to symmetry. This does not look like a metric that is tightly connected to sensory, cognitive, or behavioral features. In particular, it is not specifically connected to liking, wanting, aversion, and so forth. So, like IIT in the cases discussed by Scott Aaronson, it would seem likely to assign huge values (of valence rather than consciousness, in this case) to systems that lack the corresponding functions, and very low values to systems that possess them.

The document is explicit about qualia not being strictly linked to the computational and behavioral functions that lead us to, e.g., talk about qualia or withdraw from painful stimuli: […]. The falsifiable predictions are mostly claims that the computational functions will be (imperfectly) correlated with symmetry, but the treatment of boredom appears to allow that these will be quite imperfect: […].

Overall, this seems systematically analogous to IIT in its flaws. If one wanted to pursue an analogy to Aaronson's discussion of trivial expander graphs producing extreme super-consciousness, one could create an RL agent (perhaps in an artificial environment where it has the power to smile, seek out rewards, avoid injuries (which trigger negative reward), favor injured limbs, and consume painkillers (which stop injuries from generating negative reward)) whose symmetry could be measured in whatever way the author would like to specify. I think we can say now that we could program the agent in such a way that it sought out things that resulted in either more or less symmetric states, or was neutral to such things. Likewise, switching the signs of rewards would not reliably switch the associated symmetry. And its symmetry could be directly and greatly altered without systematic matching behavioral changes. I would like to know whether the theory in PQ is supposed to predict that
A new reference site: Effective Altruism Concepts

The article on expected value theory incorrectly cites the VNM theorem as a defense of maximizing expected value. The VNM theorem says that for a rational agent, there must exist some measure of value for which the rational agent maximizes its expectation, but the theorem does not say anything about the structure of that measure of value. In particular, it does not say that value must be linear with respect to anything, so it does not give a reason not to be risk averse. There are good reasons for altruists to have very low risk aversion, but the VNM theor... (read more)
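A standard worked example of the distinction (not from the article or comment): an agent with concave utility over wealth maximizes expected utility, and so satisfies the VNM axioms, while being risk-averse about wealth itself:

```python
import math

# A VNM-rational agent with the concave utility u(x) = sqrt(x) over wealth.
# It maximizes expected *utility*, not expected wealth, so it is
# economically risk-averse while fully satisfying the VNM axioms.

def u(x):
    return math.sqrt(x)

certain = u(50)                     # utility of $50 for sure: ~7.07
gamble = 0.5 * u(0) + 0.5 * u(100)  # fair coin for $0 or $100: 5.0
print(certain > gamble)             # prefers the certainty despite equal expected wealth
```

Both options have expected wealth of $50, yet the agent strictly prefers the sure thing, which is exactly the risk aversion the VNM theorem permits.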

Hi Alex, thanks for the comment, great to pick up issues like this. I wrote the article, and I agree and am aware of your original point. Your edit is also correct in that we are using risk aversion in the psychological/pure sense, and so the VNM theory does imply that this form of risk aversion is irrational. However, I think you're right that, given that people are more likely to have heard of the concept of economic risk aversion, the expected value article is likely to be misleading. I have edited to emphasise the way that we're using risk aversion in these articles, and to clarify that VNM alone does not imply risk neutrality in an economic sense. I've also added a bit more discussion of economic risk aversion. Further feedback welcome!
Ask MIRI Anything (AMA)

If many people intrinsically value the proliferation of natural Darwinian ecosystems, and the fact that animals in such ecosystems suffer significantly would not change their mind, then that could happen. If it's just that many people think it would be better for there to be more such ecosystems because they falsely believe that wild animals experience little suffering, and would prefer otherwise if their empirical beliefs were correct, then a human-friendly AI should not bring many such ecosystems into existence.

Ask MIRI Anything (AMA)

I am not a MIRI employee, and this comment should not be interpreted as a response from MIRI, but I wanted to throw my two cents in about this topic.

I think that creating a friendly AI to specifically advance human values would actually turn out okay for animals. Such a human-friendly AI should optimize for everything humans care about, not just the quality of humans' subjective experience. Many humans care a significant amount about the welfare of non-human animals. A human-friendly AI would thus care about animal welfare by proxy through the values of hu... (read more)

I am Nate Soares, AMA!

I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position.

Perhaps the first of them to voice a position on the matter expected a fast takeoff and was held in high regard by the others, so they followed along, having not previously thought about it?

I am Nate Soares, AMA!

It appears that the phrase "Friendly AI research" has been replaced by "AI alignment research". Why was that term picked?

Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on "AI alignment" (his suggestion, IIRC).
We are Seb Farquhar and Owen Cotton-Barratt from the Global Priorities Project, AUsA!

You seem to have had some success in influencing policy-makers, but almost exclusively UK policy-makers. Do you plan to approach policy-makers in other countries, or help other EAs do so?

Owen Cotton-Barratt · 7y
Yes. In fact we have some ongoing engagement on US policy. We have contacts in EU policy we plan to talk to when we have relevant material for them. We have no explicit plans to approach other countries at present, but are open to the possibility depending on where we seem to have the highest-value input to add on policy. We’d love to work with and help similarly-minded people working in this space, and have had a few conversations with US groups. Local groups may often be better placed to influence policy, particularly when the policy mostly has domestic effects.
$10k of Experimental EA Funding

When I read this, I thought, "that's an interesting idea. It's too bad no one will ever try it." I'm glad to see that I have underestimated you.

Jonas Vollmer · 7y
Maybe a link to that post should be included at the top of this article.
Problems and Solutions in Infinite Ethics

I'm not sure how to talk about the measurability result though; any thoughts on how to translate it?

Unfortunately, I can't think of a nice ordinary-language way of talking about such nonmeasurability results.

Problems and Solutions in Infinite Ethics

Some kind of nitpicky comments:

3.2: Note that the definition of intergenerational equity in Zame's paper is what you call finite intergenerational equity (and his definition of an ethical preference relation involves the same difference), so his results are actually more general than what you have here. Also, I don't think that "almost always we can’t tell which of two populations is better" is an accurate plain-English translation of "{X,Y: neither X ≿ Y nor Y ≿ X} has outer measure one", because we don't know anything about the inner measure. In f... (read more)

Thanks!

3.2: Good catch – I knew I was gonna mess those up for some paper. I'm not sure how to talk about the measurability result though; any thoughts on how to translate it?

4.3: Basically, yeah. It's easier for me to think about it just as a truncation though.

4.5: Yes, you're right – updated.

4.7: Yes, that's what I mean. Introducing quantifiers seems to make things a lot more complicated though.
Introduce Yourself

I'm Alex, a math student at UC Berkeley. My primary EA focus area is existential risk reduction, and I plan on either doing AI risk-reducing research or earning to give or both.