All of rgb's Comments + Replies

Retrospective on thinking about my career for a year

Thanks for this. I was curious about "Pick a niche or undervalued area and become the most knowledgeable person in it." Do you feel comfortable saying what the niche was? Or even if not, can you say a bit more about how you went about doing this?

1 · careersthrowaway · 1y
I don't want to share more on the specific field. I did not start with a plan. As I say in the post, I started with writing one or two forum posts on the topic. People thought these were valuable. I read a few books on the topic. I connected with a few people as a result of this, either asking for advice or giving advice. I gave feedback on the writing of others. I focused on the same field during part of my internship, which also helped.
Parallels Between AI Safety by Debate and Evidence Law

This is very interesting! I'm excited to see connections drawn between AI safety and the law / philosophy of law. It seems there are a lot of fruitful insights to be had.

You write,

The rules of Evidence have evolved over long experience with high-stakes debates, so their substantive findings on the types of arguments that prove problematic for truth-seeking are relevant to Debate.

Can you elaborate a bit on this?

I don't know anything about the history of these rules about evidence. But why think that over this history, these rules have trended to...

2 · Cullen_OKeefe · 1y
Thanks for this very thoughtful comment!

I think it is accurate to say that the rules of evidence have generally aimed for truth-seeking per se. That is their stated goal, and it generally explains the liberal standard for admission (relevance, which is a very low bar and tracks Bayesian epistemology well), the even more liberal standards for discovery, and most of the admissibility exceptions (which are generally explainable by humans' imperfect Bayesianism).

You're definitely right that the legal system as a whole has many goals other than truth-seeking. However, those other goals are generally advanced through other aspects of the justice system. As an example, finality is a goal of the legal system, and is advanced through, among other things, statutes of limitations and repose. Similarly, the "beyond reasonable doubt" standard for criminal conviction is in some sense contrary to truth-seeking but advances the policy preference for underpunishment over overpunishment.

You're also right that there are some exceptions to this within evidence law itself, but not many. For example, the attorney–client privilege exists not to facilitate truth-seeking, but to protect the attorney–client relationship. Similarly, the spousal privileges exist to protect the marital relationship. (Precisely because such privileges are contrary to truth-seeking, they are interpreted narrowly. See, e.g., United States v. Aramony, 88 F.3d 1369, 1389 (4th Cir. 1996); United States v. Suarez, 820 F.2d 1158, 1160 (11th Cir. 1987)). And of course, some rules of evidence have both truth-seeking and other policy rationales.

Still, on the whole and in general, the rules of evidence are aimed towards truth.
AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

Thanks for the great summary! A few questions about it:

1. You call mesa-optimization "the best current case for AI risk". As Ben noted at the time of the interview, this argument hasn't yet really been fleshed out in detail. And as Rohin subsequently wrote in his opinion of the mesa-optimization paper, "it is not yet clear whether mesa optimizers will actually arise in practice". Do you have thoughts on what exactly the "Argument for AI Risk from Mesa-Optimization" is, and/or a pointer to the places where, in your opinion,...

8 · abergal · 1y

1. Oh man, I wish. :( I do think there are some people working on making a crisper case, and hopefully as machine learning systems get more powerful we might even see early demonstrations. I think the crispest statement of it I can make is "Similar to how humans are now optimizing for goals that are not just the genetic fitness evolution wants, other systems which contain optimizers may start optimizing for goals other than the ones specified by the outer optimizer."

Another related concept that I've seen (but haven't followed up on) is what johnswentworth calls "Demons in Imperfect Search", which basically advocates for the possibility of runaway inner processes in a variety of imperfect search spaces (not just ones that have inner optimizers). This arguably happened with metabolic reactions early in the development of life, greedy genes, managers in companies. Basically, I'm convinced that we don't know enough about how powerful search mechanisms work to be sure that we're going to end up somewhere we want.

I should also say that I think these kinds of arguments feel like the best current cases for AI alignment risk. Even if AI systems end up perfectly aligned with human goals, I'm still quite worried about what the balance of power looks like in a world with lots of extremely powerful AIs running around.

2. Yeah, here I should have said 'new species more intelligent than us'. I think I was thinking of two things here:
* Humans causing the extinction of less intelligent species
* Some folk intuition around intelligent aliens plausibly causing human extinction (I admit this isn't the best example...).

Mostly I meant here that since we don't actually have examples of existentially risky technology (yet), putting AI in the reference class of 'new technology' might make you think it's extremely impl