Lifelong recursive self-improver, on his way to exploding really intelligently :D
More seriously: my posts are mostly about AI alignment, with an eye towards moral progress and creating a better future, rather than focusing only on risk.
At the moment I am doing research at CEEALAR on agents whose behaviour is driven by a reflective process analogous to human moral reasoning, rather than by a metric specified by the designer. I'll probably post a short article on this topic before the end of 2023.
Here are some suggested readings from what I've written so far:
- Naturalism and AI alignment
- From language to ethics by automated reasoning
- Criticism of the main framework in AI alignment
Yes, I'd like to read a clearer explanation. You can leave the link here in a comment or send me a private message.
Hey!
Thanks for the suggestion. I've read part of the Wikipedia page on Jungian archetypes, but my background is not in psychology and it wasn't clear to me. The advantage of simply saying that our thoughts can be abstract (point 1) is that pretty much everyone understands what that means, whereas I'm not sure the same is true once we start introducing concepts like Jungian archetypes and the collective unconscious.
I agree with you that the AI (and AI safety) community doesn't seem to care much about Jungian archetypes. It might be that AI people get the idea anyway, maybe they just express it in different terms (e.g. they talk about the influence of culture on human values, instead of archetypes).
Maybe "only person in the world" is a bit excessive :)
As far as I know, no one else in AI safety is directly working on it. There is some research in the field of machine ethics, about Artificial Moral Agents, that has a similar motivation or objective. My guess is that, overall, very few people are working on this.
What you wrote about the central claim is more or less correct: I actually made only an existential claim about a single aligned agent, because the description I gave is sketchy and really far from the more precise algorithmic level of description. This single agent probably belongs to a class of other aligned agents, but it seems difficult to guess how large this class is.
That is also why I have not given a guarantee that all agents of a certain kind will be aligned.
Regarding the orthogonality thesis, you might find section 1.2 of Bostrom's 2012 paper interesting. He writes that objective and intrinsically motivating moral facts need not undermine the orthogonality thesis, since he uses the term "intelligence" in the sense of "instrumental rationality". I'd add that there is also no guarantee that the orthogonality thesis is correct :)
About psychopaths and metaethics: I haven't spent much time on that area of research. Like other empirical evidence, it doesn't seem easy to interpret.
Hey, I just wanted to thank you for writing this!
I'm looking forward to reading future posts in the series; actually, I think it would be great to have series like this one for each major cause area.