Robert Kralisch

Organiser @ AI Safety Camp
31 karma · Working (0-5 years)

Bio

Hey, I am Robert Kralisch, an independent conceptual/theoretical Alignment Researcher. I have a background in Cognitive Science and I am interested in collaborating on an end-to-end strategy for AGI alignment.

I am one of the organizers for the AI Safety Camp 2025, working as a research coordinator by evaluating and supporting research projects that fit under the umbrella of "conceptually sound approaches to AI Alignment".

The three main branches that I aim to contribute to are conceptual clarity (what should we mean by agency, intelligence, embodiment, etc), the exploration of more inherently interpretable cognitive architectures, and Simulator theory.

One of my concrete goals is to figure out how to design a cognitively powerful agent such that it does not become a Superoptimiser in the limit. 

Comments
2

I believe that you are too quick to label this story as absurd. Ordinary technology does not have the capacity to correct towards explicitly smaller changes that still satisfy the objective. If the AGI wants to prevent wars while minimally disturbing worldwide politics, I find it plausible that it would succeed.

Similarly, the fact that an AGI has very little visible impact does not mean that it isn't effectively in control. For a true AGI, it should be trivial to interrupt the second mover without any great upheaval. It should be able to suppress other AGIs from coming into existence without causing too much of a stir.

I do somewhat agree with your reservations, but I find that your way of addressing them seems uncharitable (i.e. "at best completely immoral").

Thank you for writing this. 

I think this post is helpful in two ways: first, by giving life advice on how to think about rejection (and perhaps also success); second, by giving more insight into the review process and how that should inform the application. Looking at the whole process from the reviewer's perspective is great advice.

To me, it seems like one could expand on how the first piece of advice generalizes to other areas of life. One is always presenting oneself in a particular context, where some features are paid attention to and others are not (or at least not initially). This is something that we need to understand clearly when interpreting feedback, be it a bad grade, the rejection of a romantic interest or the frown of a stranger. None of these events necessarily says something about "who we are"; they are judgements based on a very limited representation of ourselves, and there are multiple steps along the way where things might have gone wrong.
Naturally, this applies in the other direction as well, though that is less likely to occur. Demonstrating a positive quality without actually having it is less likely than failing to demonstrate said positive quality while actually having it. This can be offset by competent deception and premature self-rejection, respectively.
It is interesting to square this with the point in the post, where the reviewer focuses on avoiding the presumably less common false positive (letting an unsuitable applicant in). I think this just emphasizes that suitable applicants can get rejected and that they shouldn't take this to heart.

As it stands, the post presents the reviewing process as an immutable reality that the potential applicant should adapt to. I think this fits the scope of the post well and makes it immediately helpful.