

I follow Crocker's rules.


niplav's Shortform


Answer by niplav · May 09, 2023

🔭 Looking for good book on Octopus Behavior

Criteria: Scientific (which rules out The Soul of an Octopus), and up to date (which mostly rules out Octopus: Physiology and Behaviour of an Advanced Invertebrate).

Why: I've heard claims that octopuses are quite intelligent, with some claims going so far as to attribute the transmission of knowledge between individuals. I'd like to know more about how similar octopus behavior is to human behavior (perhaps shedding light on the space of possible minds/fragility of value).

🔭 Looking for good book/review on Universal Basic Income

Criteria: Book should be ~completely a literature review and summary of current evidence on universal basic income/unconditional cash transfers. I'm not super interested in any moral arguments. The more it talks about actual studies the better. Can be quite demanding statistically.

Why: People have differing opinions on the feasibility/goodness of universal basic income, and there have been a whole bunch of experiments, but I haven't been able to find a good review of that evidence.

🔭 Looking for a good textbook on Cryobiology

Criteria: The more of these properties the textbook has the better. Fundamentals of Cryobiology looks okay but has no exercises.

Why: I have signed up for cryonics, and would like to understand the debate between cryobiologists and cryonicists better.


The last person to have a case of smallpox, Ali Maow Maalin, dedicated years of his life to eradicating polio in the region.

On July 22nd, 2013, he died of malaria while traveling again after polio had been reintroduced.


Thank you! I'll review the pull request later today, but it looks quite useful :-)

Not sure how much free time I can spend on the forecasting library, but I'll add the sources to the post.


Thanks! Perhaps I'll find the time to incorporate the Metaforecast data into it sometime.


On mental health:

Since AI systems will likely have a very different cognitive structure than biological humans, it seems quite unlikely that they will develop mental health issues the way humans do. There are some interesting things that happen to the characters that large language models "role-play" as: they switch from helpful to mischievous when the right situation arises.

I could see a future in which AI systems are emulating the behavior of specific humans, in which case they might exhibit behaviors that are similar to the ones of mentally ill humans.

On addiction problems:

If one takes the concept of addiction seriously, wireheading is a failure mode remarkably similar to it.


I am somewhat more hopeful about society at large deciding how to use AI systems: I have the impression that wealth has accelerated moral progress (since people have more slack for caring about others). This becomes especially stark when I read about very poor people in the past and their behavior towards others.

That said, I'd be happier if we found out how to encode ethical progress in an algorithm and just run that, but I'm not optimistic about our chances of finding such an algorithm (if it exists).


There are several plans for this scenario.

  • Low alignment tax + coordination around alignment: Having an aligned model is probably more costly than having a non-aligned model. This "cost of alignment" is also called the "alignment tax". The goal in some agendas is to lower the alignment tax so far that it becomes reasonable to institute regulations mandating that these alignment guarantees be implemented, much like safety regulations in the real world for cars, factory work, and medicine. This approach works best in worlds where AI systems are relatively easy to align and don't become much more capable very quickly. Even if some systems are not aligned, we might have enough aligned systems that we are reasonably protected by those (especially since the aligned systems might be able to copy strategies that unaligned systems are using to attack humanity).
  • Exiting the acute risk period: If there is one (or very few) aligned superintelligent AI systems, we might simply ask it what the best strategy for achieving existential security is, and if the people in charge are at least slightly benevolent they will probably also ask how to help other people, especially at low cost. (I very much hope the policy people have something in mind to prevent malevolent actors from coming into possession of powerful AI systems, though I don't remember seeing any such strategies.)
  • Pivotal act + aligned singleton: If abrupt takeoff scenarios are likely, then one possible plan is to perform a so-called pivotal act. Concretely, such an act would (1) prevent anyone else from building powerful AI systems and (2) allow the creators to think deeply enough about how to build AI that implements our mechanism for moral progress. Such a pivotal act might be to build an AI system that is powerful enough to e.g. "turn all GPUs into Rubik's cubes" but not general enough to be very dangerous (for example by limiting its capacity for self-improvement), and then augment human intelligence so that the creators can figure out alignment and moral philosophy in full generality and depth. This strategy is useful in very pessimistic scenarios, where alignment is very hard, AIs become smarter through self-improvement very quickly, and people are very reckless about building powerful systems.

I hope this answers the question somewhat :-)


In my conception, AI alignment is the theory of aligning any stronger cognitive system with any weaker cognitive system, allowing for incoherencies and inconsistencies in the weaker system's actions and preferences.

I very much hope that the solution to AI alignment is not one where we have a theory of how to align AI systems to a specific human—that kind of solution seems fraudulent just on technical grounds (far too specific).

I would make a distinction between alignment theorists and alignment engineers/implementors: the former find a theory of how to align any AI system (or set of systems) with any human (or set of humans), the alignment implementors take that theoretical solution and apply it to specific AI systems and specific humans.

Alignment theorists and alignment implementors might be the same people, but the roles are different.

This is similar to many technical problems: You might ask someone trying to fit a line through a cloud of x/y points, with the smallest distance to each of those points, “But which dataset are you trying to apply the linear regression to?”—the answer is “any”.
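To make the analogy concrete, here is a minimal sketch of that generic procedure: an ordinary least-squares fit that accepts any cloud of points (the function name and sample data are illustrative, not from the original text).

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to any cloud of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # The slope minimizing the sum of squared vertical distances to the line.
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    b = mean_y - a * mean_x
    return a, b

# The same procedure applies to any dataset:
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5)])
```

The theorist's deliverable is `fit_line` itself; which dataset it gets applied to is the implementor's problem.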


There are three levels of answers to this question: What the ideal case would be, what the goal to aim for should be, and what will probably happen.

  • What the ideal case would be: We find a way to encode "true morality" or "the core of what has been driving moral progress" and align AI systems to that.
  • The slightly less ideal case: AI systems are aligned with the Coherent Extrapolated Volition of currently living humans. Hopefully that process figures out which relevant moral patients exist, and takes their interests into consideration.

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

  • What the goal to aim for should be: Something that is (1) good and (2) something humanity can coordinate around. In the best case this approximates Coherent Extrapolated Volition, but looks mundane: Humans build AI systems, and there is some democratic control over them; China has some relevant AI systems, the US has some, and the rest of the world rents access to those. Humanity uses them to become smarter, and figures out relevant mechanisms for democratic control over the systems (as we become richer and don't care as much about zero-sum competition).
  • What is probably going to happen: A few actors create powerful AI systems and figure out how to align them to their personal interests. They use those systems to colonize the universe, but burn most of the cosmic commons on status signaling games.

I think that AI safety as a technical discipline has no "say" in who the systems should be aligned with. That's for society at large to decide.


I like this idea :-)

I think that there are some tricky questions about comparing across different forecasters and their predictions. If you simply take the Brier score, this can be Goodharted: people can choose the "easiest" questions and get far better scores than those taking on difficult questions.

I can think of some approaches to this:

  • Ranking forecasters:
    • Any two forecasters are ranked against each other according to their Brier scores on questions they have both forecasted on. I fear that this will lead to cyclical rankings, which could be dealt with using the Smith set or a Hodge decomposition.
    • Forecasters are ranked according to their performance relative to all other forecasters on each question (making easier questions less impactful on a forecaster's score).
  • I'd like to look into credibility theory to see whether it has some insights into ranking with different sample sizes since IMDb uses it for ranking movies.
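The pairwise comparison in the first bullet can be sketched roughly as follows, assuming binary questions and forecasts expressed as probabilities (the function names and dict-based inputs are hypothetical, chosen for illustration):

```python
def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; a perfect forecaster scores 0.0.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def head_to_head(a, b, outcomes):
    """Compare two forecasters only on the questions both answered.

    a, b: dicts mapping question id -> forecast probability.
    outcomes: dict mapping question id -> 0 or 1.
    Returns 'a' or 'b', whoever has the lower Brier score on the shared questions.
    """
    shared = sorted(set(a) & set(b))
    score_a = brier([a[q] for q in shared], [outcomes[q] for q in shared])
    score_b = brier([b[q] for q in shared], [outcomes[q] for q in shared])
    return 'a' if score_a < score_b else 'b'
```

Because each comparison is restricted to the shared question set, forecaster A can beat B, B beat C, and C beat A on different overlaps, which is exactly the cyclical-ranking worry mentioned above.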