My current thoughts on the risks from SETI

I posted this as a comment to Robin Hanson’s “Seeing ANYTHING Other Than Huge-Civ Is Bad News” —


I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). Being able to make reasonable conjectures here will greatly improve our a priori expectations and our interpretation of available cosmological evidence.

Premise 1: Eventually, civilizations progress until they can engage in megascale engineering: Dyson spheres, etc.

Premise 2: Consciousness is the home of value: Disneyland with no children is valueless.

Premise 2.1: Over the long term we should expect at least some civilizations to fall into the attractor of treating consciousness as their intrinsic optimization target.

Premise 3: There will be convergence on the view that some qualia are intrinsically valuable, and on which sorts of qualia those are.

Conjecture: A key piece of evidence for discerning the presence of advanced alien civilizations will be megascale objects which optimize for the production of intrinsically valuable qualia.

Speculatively, I suspect black holes and pulsars might fit this description.

Reasonable people can definitely disagree here, and these premises may not work for various reasons. But I’d circle back to the first line: I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). In this sense I think we’re leaving value on the table.

We're Aligned AI, AMA

Great, thank you for the response.

On (3) — I feel AI safety as it’s pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. I.e. the terms used in AI safety don’t try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren’t connecting, we should look for a synthesis such that they do.

That’s a little abstract; a concrete example would be the paper “Dissecting components of reward: ‘liking’, ‘wanting’, and learning” (Berridge et al. 2009), which describes experimental methods and results showing that ‘liking’, ‘wanting’, and ‘learning’ can be partially isolated from each other and triggered separately. I.e. a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels like there’s no ‘place to put’ this knowledge within the field. Affective neuroscience can discover things, but there’s no mechanism by which these discoveries will update AI alignment ontologies.

It’s a little hard to find the words to describe why this is a problem; perhaps that not being richly connected to other fields runs the risk of ‘ghettoizing’ results, much as many social sciences have ‘ghettoized’ themselves.

One of the reasons I’ve been excited to see your trajectory is that I’ve gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.

We're Aligned AI, AMA
  1. What do you see as Aligned AI’s core output, and what is its success condition? What do you see the payoff curve being — i.e. if you solve 10% of the problem, do you get [0%|10%|20%] of the reward?
  2. I think a fresh AI safety approach may (or should) lead to fresh reframes on what AI safety is. Would your work introduce a new definition for AI safety?
  3. Value extrapolation may be intended as a technical term, but intuitively these words also seem inextricably tied to both neuroscience and phenomenology. How do you plan on interfacing with these fields? What key topics of confusion within neuroscience and phenomenology are preventing interfacing with these fields?
  4. I was very impressed by the nuance in your “model fragments” frame, as discussed at some past EAG. As best as I can recall, the frame was: that observed preferences allow us to infer interesting things about the internal models that tacitly generate these preferences, that we have multiple overlapping (and sometimes conflicting) internal models, and that it is these models that AI safety should aim to align with, not preferences per se. Is this summary fair, and does this reflect a core part of Aligned AI’s approach?

Finally, thank you for taking this risk.

Flimsy Pet Theories, Enormous Initiatives

I consistently enjoy your posts, thank you for the time and energy you invest.

Robin Hanson is famous for critiques of the form “X isn’t about X, it’s about Y.” I suspect many of your examples may fit this pattern. To wit, Kwame Appiah wrote that “in life, the challenge is not so much to figure out how best to play the game; the challenge is to figure out what game you’re playing.” Andrew Carnegie, for instance, may have been trying to maximize status, whether among his peers or within his inner mental parliament. Elon Musk may be playing a complicated game with SpaceX and his other companies. To critique is to assume we know the game, but I suspect we have only a dim understanding of “the great game” as it’s being played today.

When we see apparent dysfunction, I tend to believe there is dysfunction, but deeper in the organizational-civilizational stack than it may appear. I.e. I think both Carnegie and Musk were/are hyper-rational actors responding to a very complicated incentive landscape.

That said, I do think ideas get lodged in people’s heads, and people just don’t look. Fully agree with your general suggestion: “before you commit yourself to a lifetime’s toil toward this goal, spend a little time thinking about the goal.”

Still, I’m also loath to critique doers too harshly, especially across illegible domains like human motivation. I could see how more cold-eyed analysis could lead to wiser aim in what things to build; I could also see it leading to fewer great things being built. I can’t say I see the full tradeoffs at this point.

EA Should Spend Its “Funding Overhang” on Curing Infectious Diseases

Most likely, infectious diseases also play a significant role in aging; I’ve seen some research suggesting that major health inflection points are often associated with an infection.

I like your post and strongly agree with the gist.

DM me if you’re interested in brainstorming alternatives to the vaccine paradigm (which seems to work much better for certain diseases than others).

A Primer on the Symmetry Theory of Valence

Generally speaking, I agree with the aphorism “You catch more flies with honey than vinegar.”

For what it’s worth, I interpreted Gregory’s critique as an attempt to blow up the conversation and steer away from the object level, which felt odd. I’m happiest speaking of my research, and fielding specific questions about claims.

A Primer on the Symmetry Theory of Valence

Gregory, I’ll invite you to join the object-level discussion between Abby and me.

A Primer on the Symmetry Theory of Valence

Welcome, thanks for the good questions.

Asymmetries in stimuli seem crucial for getting patterns through the “predictive coding gauntlet.” I.e., that which can be predicted can be ignored. We demonstrably screen perfect harmony out fairly rapidly.

The crucial context for STV, on the other hand, isn’t symmetries/asymmetries in stimuli, but rather in brain activity. (More specifically, as we’re currently looking at things, in global eigenmodes.)

With a nod back to the predictive coding frame, it’s quite plausible that the stimuli that create the most internal symmetry/harmony are not themselves perfectly symmetrical, but rather have asymmetries crafted to avoid top-down predictive models. I’d expect this to vary quite a bit across different senses though, and depend heavily on internal state.

The brain may also have mechanisms which introduce asymmetries in global eigenmodes, in order to prevent getting ‘trapped’ by pleasure — I think of boredom as fairly sophisticated ‘anti-wireheading technology’ — but if we set aside dynamics, the assertion is that symmetry/harmony in the brain itself is intrinsically coupled with pleasure.

Edit: With respect to the Mosers, that’s a really cool example of this stuff. I can’t say I have answers here, but as a punt, I’d suspect the “orthogonal neural coding of similar but distinct memories” will revolve around some pretty complex frequency regimes, and we may not yet be able to say exact things about how ‘consonant’ or ‘dissonant’ these patterns are to each other. My intuition is that this result about the golden mean being the optimal ratio for non-interaction will end up intersecting with the Mosers’ work. That said, I wonder if STV would assert that some sorts of memories are ‘hedonically incompatible’ due to their encodings being dissonant? Basically, as memories get encoded, the oscillatory patterns they’re encoded with could subtly form a network which determines what sorts of new memories can form and/or which sorts of stimuli we enjoy and which we don’t. But this is pretty hand-wavy speculation…

A Primer on the Symmetry Theory of Valence

Hi Abby, I understand. We can just make the best of it.

1a. Yep, definitely. Empirically we know this is true from e.g. Kringelbach and Berridge’s work on hedonic centers of the brain; what we’d be interested in looking into would be whether these areas are special in terms of network control theory.

1c. I may be getting ahead of myself here: the basic approach we intend for testing STV is looking at dissonance in global activity. Dissonance between brain regions likely contributes to this ‘global dissonance’ metric. I’m also interested in measuring dissonance within smaller areas of the brain, as I think it could help improve the metric down the line, but we definitely wouldn’t need to at this point.

1d. As a quick aside, STV says that ‘symmetry in the mathematical representation of phenomenology corresponds to pleasure’. We can think of that as ‘core STV’. We’ve then built neuroscience metrics around consonance, dissonance, and noise that we think can be useful for proxying symmetry in this representation; we can think of that as a looser layer of theory around STV, something that doesn’t have the ‘exact truth’ expectation of core STV. When I speak of dissonance corresponding to suffering, it’s part of this looser second layer.

To your question — why would STV be true? — my background is in the philosophy of science, so I’m perhaps more ready to punt to that domain. I understand this may come across as somewhat frustrating or obfuscating from the perspective of a neuroscientist asking for a neuroscientific explanation. But this is a universal thread across philosophy of science: why is such-and-such true? Why does gravity exist? Why is the speed of light what it is? Many things we’ve figured out about reality seem like brute facts. Usually there are hints of elegance in the structures we’re uncovering, but we’re not yet knowledgeable enough to see some universal grand plan. Physics deals with this a lot, and I think philosophy of mind is just starting to grapple with it in terms of NCCs (neural correlates of consciousness). Here’s something Frank Wilczek (who won the 2004 Nobel Prize in Physics for helping formalize the strong nuclear force) shared about physics:

>... the idea that there is symmetry at the root of Nature has come to dominate our understanding of physical reality. We are led to a small number of special structures from purely mathematical considerations--considerations of symmetry--and put them forward to Nature, as candidate elements for her design. ... In modern physics we have taken this lesson to heart. We have learned to work from symmetry toward truth. Instead of using experiments to infer equations, and then finding (to our delight and astonishment) that the equations have a lot of symmetry, we propose equations with enormous symmetry and then check to see whether Nature uses them. It has been an amazingly successful strategy. (A Beautiful Question, 2015)

So — why would STV be the case? “Because it would be beautiful, and would reflect and extend the flavor of beauty we’ve found to be both true and useful in physics” is probably not the sort of answer you’re looking for, but it’s the answer I have at this point. I do think all the NCC literature is going to have to address this question of ‘why’ at some point.

4. We’re ultimately opportunistic about what exact format of neuroimaging we use to test our hypotheses, but fMRI checks a lot of the boxes (though not all). As you say, fMRI is not a great paradigm for neurotech; we’re looking at e.g. headsets by Kernel and others, and also digging into the TUS (transcranial ultrasound) literature for more options.

5. Cool! I’ve seen some big reported effect sizes and I’m generally pretty bullish on neurofeedback in the long term; Adam Gazzaley’s Neuroscape is doing some cool stuff in this area too.

A Primer on the Symmetry Theory of Valence

Good catch; there’s plenty that our glossary does not cover yet. This post is at 70 comments now, and I can just say I’m typing as fast as I can!

I pinged our engineer (who has taken the lead on the neuroimaging pipeline work) about details, but as the collaboration hasn’t yet been announced I’ll err on the side of caution in sharing.

To Michael — here’s my attempt to clarify the terms you highlighted:

  • Neurophysiological models of suffering try to dig into the computational utility and underlying biology of suffering

-> existing theories talk about what emotions ‘do’ for an organism, and what neurochemicals and brain regions seem to be associated with suffering

  • symmetry

Frank Wilczek calls symmetry ‘change without change’. A limited definition is that it’s a measure of the number of ways you can rotate a picture and still get the same result. You can rotate a square by 90, 180, or 270 degrees and get something identical; you can rotate a circle by any angle and get something identical. Thus we’d say circles have more rotational symmetries than squares (which in turn have more than rectangles, etc.)
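This counting notion of rotational symmetry can be sketched in code. Below is a minimal Python toy of my own (an illustration only, not QRI code): represent a shape by its vertices in the complex plane and count how many rotations, in one-degree steps, map the vertex set exactly onto itself.

```python
import cmath

def rotational_symmetries(vertices, steps=360):
    """Count rotations (in 360/steps-degree increments) that map
    the vertex set exactly onto itself."""
    def key(z):
        # Round to tolerate floating-point error in the rotation.
        return (round(z.real, 6), round(z.imag, 6))

    original = {key(z) for z in vertices}
    count = 0
    for k in range(steps):
        angle = 2 * cmath.pi * k / steps
        rotated = {key(z * cmath.exp(1j * angle)) for z in vertices}
        if rotated == original:
            count += 1
    return count

square = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]      # matches at 0, 90, 180, 270 degrees
rectangle = [2 + 1j, -2 + 1j, -2 - 1j, 2 - 1j]  # matches at 0 and 180 degrees only
# A circle would match at every step: more rotational symmetry than any polygon.
```

The square scores 4 and the rectangle 2, matching the ordering above; a circle’s continuous symmetry is why it outranks any polygon.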

  • harmony

Harmony has been in our vocabulary a long time, but it’s not a ‘crisp’ word. This is why I like to talk about symmetry, rather than harmony — although they more-or-less point in the same direction

  • dissonance

The combination of multiple frequencies that have a high amount of interaction, but few common patterns. Nails on a chalkboard create a highly dissonant sound; playing the C and C# keys at the same time also creates a relatively dissonant sound
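This sensory notion of dissonance can be illustrated with a toy calculation. The sketch below is my own illustration using Sethares’ two-tone approximation of the Plomp–Levelt roughness curve (the constants come from that published model; this is not QRI’s CDNS algorithm): tones a semitone apart (C and C#) come out far rougher than tones a perfect fifth apart (C and G).

```python
import math

def dissonance(f1, f2):
    """Sethares' approximation of Plomp-Levelt roughness for two pure tones.
    Peaks when the tones are close but not identical, and falls off as
    their separation grows past the critical bandwidth."""
    f_low, f_high = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_low + 19)  # scaling tied to critical bandwidth
    x = f_high - f_low               # frequency separation in Hz
    return math.exp(-3.5 * s * x) - math.exp(-5.75 * s * x)

C4, Cs4, G4 = 261.63, 277.18, 392.00
print(dissonance(C4, Cs4))  # C vs C# (semitone): high roughness
print(dissonance(C4, G4))   # C vs G (perfect fifth): much lower roughness
```

Note this models dissonance in the stimulus; STV’s own claim concerns symmetry and dissonance in brain activity rather than in stimuli.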

  • resonance as a proxy for characteristic activity

I’m not sure I can give a fully satisfying definition here that doesn’t just reference CSHW; I’ll think about this one more.

  • Consonance Dissonance Noise Signature

A way of mathematically calculating how much consonance, dissonance, and noise there is when we add different frequencies together. This is an algorithm developed at QRI by my co-founder, Andrés 

  • self-organizing systems

A system which isn’t designed by some intelligent person, but follows an organizing logic of its own. A beehive or anthill would be a self-organizing system; no one’s in charge, but there’s still something clever going on

  • Neural Annealing

In November 2019 I released a piece of work describing the brain as a self-organizing system. Basically, “when the brain is in an emotionally intense state, change is easier,” similar to how when metal heats up and starts to melt, it’s easier to change the shape of the metal

  • full neuroimaging stack

All the software we need to do an analysis (and specifically, the CSHW analysis), from start to finish

  • precise physical formalism for consciousness

A perfect theory of consciousness, which could be applied to anything. Basically a “consciousness meter”

  • STV gives us a rich set of threads to follow for clear neurofeedback targets, which should allow for much more effective closed-loop systems, and I am personally extraordinarily excited about the creation of technologies that allow people to “update toward wholesome”,

Ah yes, this is a litttttle bit dense. Basically, one big thing holding back neurotech is that we don’t have good biomarkers for well-being. If we can design these biomarkers, we can design neurofeedback systems which work better (not sure how familiar you are with neurofeedback)
