

As a small comment, I believe discussions of consciousness and moral value tend to downplay the possibility that most consciousness may arise outside of what we consider the biological ecosystem.

It feels a bit silly to ask “what does it feel like to be a black hole, or a quasar, or the Big Bang,” but I believe a proper theory of consciousness should have answers to these questions.

We don’t have that proper theory. But I think we can all agree that these megaphenomena involve a great deal of matter/negentropy and plausibly some interesting self-organized microstructure, though that’s purely conjecture. If we’re charting out EV, let’s keep the truly big numbers in mind (even if we don’t know how to count them yet).

Thank you for this list. 

#2:  I left a comment on Matthew’s post that I feel is relevant: https://forum.effectivealtruism.org/posts/CRvFvCgujumygKeDB/my-current-thoughts-on-the-risks-from-seti?commentId=KRqhzrR3o3bSmhM7c

#16: I gave a talk for Mathematical Consciousness Science in 2020 that covers some relevant items: I’d especially point to 7,8,9,10 in my list here: https://opentheory.net/2022/04/it-from-bit-revisited/

#18+#20: I feel these are ultimately questions for neuroscience, not psychology. We may need a new sort of neuroscience to address them. (What would that look like?)

I posted this as a comment to Robin Hanson’s “Seeing ANYTHING Other Than Huge-Civ Is Bad News” —


I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). Being able to make reasonable conjectures here will greatly improve our a priori expectations and our interpretation of available cosmological evidence.

Premise 1: Eventually, civilizations progress until they can engage in megascale engineering: Dyson spheres, etc.

Premise 2: Consciousness is the home of value: Disneyland with no children is valueless.

Premise 2.1: Over the long term we should expect at least some civilizations to fall into the attractor of treating consciousness as their intrinsic optimization target.

Premise 3: Civilizations will converge on the view that some qualia are intrinsically valuable, and on which sorts of qualia those are.

Conjecture: A key piece of evidence for discerning the presence of advanced alien civilizations will be megascale objects which optimize for the production of intrinsically valuable qualia.

Speculatively, I suspect black holes and pulsars might fit this description.


Reasonable people can definitely disagree here, and these premises may not work for various reasons. But I’d circle back to the first line: I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). In this sense I think we’re leaving value on the table.

Great, thank you for the response.

On (3) — I feel AI safety as it’s pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. I.e. the terms used in AI safety don’t try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren’t connecting, we should look for a synthesis such that they do.

That’s a little abstract; a concrete example would be the paper “Dissecting components of reward: ‘liking’, ‘wanting’, and learning” (Berridge et al. 2009), which describes experimental methods and results showing that ‘liking’, ‘wanting’, and ‘learning’ can be partially isolated from each other and triggered separately. I.e. a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels like there’s no ‘place to put’ this knowledge within the field. Affective neuroscience can discover things, but there’s no mechanism by which these discoveries will update AI alignment ontologies.

It’s a little hard to find the words to describe why this is a problem; perhaps it’s that not being richly connected to other fields risks ‘ghettoizing’ results, as many social sciences have ‘ghettoized’ themselves.

One of the reasons I’ve been excited to see your trajectory is that I’ve gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.

  1. What do you see as Aligned AI’s core output, and what is its success condition? What do you see the payoff curve being — i.e. if you solve 10% of the problem, do you get [0%|10%|20%] of the reward?
  2. I think a fresh AI safety approach may (or should) lead to fresh reframes on what AI safety is. Would your work introduce a new definition for AI safety?
  3. Value extrapolation may be intended as a technical term, but intuitively these words also seem inextricably tied to both neuroscience and phenomenology. How do you plan on interfacing with these fields? What key topics of confusion within neuroscience and phenomenology are preventing interfacing with these fields?
  4. I was very impressed by the nuance in your “model fragments” frame, as discussed at some past EAG. As best as I can recall, the frame was: that observed preferences allow us to infer interesting things about the internal models that tacitly generate these preferences, that we have multiple overlapping (and sometimes conflicting) internal models, and that it is these models that AI safety should aim to align with, not preferences per se. Is this summary fair, and does this reflect a core part of Aligned AI’s approach?

Finally, thank you for taking this risk.

I consistently enjoy your posts, thank you for the time and energy you invest.

Robin Hanson is famous for critiques in the form of “X isn’t about X, it’s about Y.” I suspect many of your examples may fit this pattern. To wit, Kwame Appiah wrote that “in life, the challenge is not so much to figure out how best to play the game; the challenge is to figure out what game you’re playing.” Andrew Carnegie, for instance, may have been trying to maximize status, among his peers or his inner mental parliament. Elon Musk may be playing a complicated game with SpaceX and his other companies. To critique assumes we know the game, but I suspect we only have a dim understanding of “the great game” as it’s being played today.

When we see apparent dysfunction, I tend to believe there is dysfunction, but deeper in the organizational-civilizational stack than it may appear. I.e. I think both Carnegie and Musk were/are hyper-rational actors responding to a very complicated incentive landscape.

That said, I do think ideas get lodged in peoples’ heads, and people just don’t look. Fully agree with your general suggestion, “before you commit yourself to a lifetime’s toil toward this goal, spend a little time thinking about the goal.”

That said, I’m also loath to critique doers too harshly, especially across illegible domains like human motivation. I could see how more cold-eyed analysis could lead to wiser aim in what things to build; I could also see it leading to fewer great things being built. I can’t say I see the full tradeoffs at this point.

Most likely infectious diseases also play a significant role in aging; I’ve seen some research suggesting that major health inflection points are often associated with an infection.

I like your post and strongly agree with the gist.

DM me if you’re interested in brainstorming alternatives to the vaccine paradigm (which seems to work much better for certain diseases than others).

Generally speaking, I agree with the aphorism “You catch more flies with honey than vinegar.”

For what it’s worth, I interpreted Gregory’s critique as an attempt to blow up the conversation and steer away from the object level, which felt odd. I’m happiest speaking of my research, and fielding specific questions about claims.

Gregory, I’ll invite you to join the object-level discussion between Abby and me.

Welcome, thanks for the good questions.

Asymmetries in stimuli seem crucial for getting patterns through the “predictive coding gauntlet.” I.e., that which can be predicted can be ignored. We demonstrably screen perfect harmony out fairly rapidly.
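To make that “predictive coding gauntlet” point concrete, here’s a toy sketch (my own illustration, not anyone’s actual model): a trivial predictor that expects a stimulus to repeat its value from one period ago drives prediction error to roughly zero for a perfectly regular signal, while an unpredictable signal keeps generating error and so keeps “getting through.”

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(2000)

periodic = np.sin(2 * np.pi * t / 50)   # perfectly regular stimulus (period 50)
noisy = rng.normal(size=t.size)         # unpredictable stimulus

def mean_prediction_error(signal, lag=50):
    # Trivial predictive model: expect the signal to repeat its value
    # from one period ago; error is what the model fails to explain away.
    predicted = signal[:-lag]
    actual = signal[lag:]
    return np.mean(np.abs(actual - predicted))

print(mean_prediction_error(periodic))  # near zero: fully "explained away"
print(mean_prediction_error(noisy))     # stays large: keeps producing error
```

The toy predictor here stands in for whatever hierarchy of top-down models the brain actually uses; the point is only that perfectly symmetric/repetitive input generates vanishing prediction error and gets screened out.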

The crucial context for STV on the other hand isn’t symmetries/asymmetries in stimuli, but rather in brain activity. (More specifically, as we’re currently looking at things, in global eigenmodes.)

With a nod back to the predictive coding frame, it’s quite plausible that the stimuli that create the most internal symmetry/harmony are not themselves perfectly symmetrical, but rather have asymmetries crafted to avoid top-down predictive models. I’d expect this to vary quite a bit across different senses though, and depend heavily on internal state.

The brain may also have mechanisms which introduce asymmetries in global eigenmodes, in order to prevent getting ‘trapped’ by pleasure — I think of boredom as fairly sophisticated ‘anti-wireheading technology’ — but if we set aside dynamics, the assertion is that symmetry/harmony in the brain itself is intrinsically coupled with pleasure.

Edit: With respect to the Mosers, that’s a really cool example of this stuff. I can’t say I have answers here, but as a punt, I’d suspect the “orthogonal neural coding of similar but distinct memories” will revolve around some pretty complex frequency regimes, and we may not yet be able to say exact things about how ‘consonant’ or ‘dissonant’ these patterns are to each other. My intuition is that this result about the golden mean being the optimal ratio for non-interaction will end up intersecting with the Mosers’ work. That said, I wonder if STV would assert that some sorts of memories are ‘hedonically incompatible’ due to their encodings being dissonant? Basically, as memories get encoded, the oscillatory patterns they’re encoded with could subtly form a network which determines what sorts of new memories can form and/or which sorts of stimuli we enjoy and which we don’t. But this is pretty hand-wavy speculation…
