586 karma · Joined Aug 2020


Founded Northwestern EA club. Studied Math and Econ.

Starting a trading job in a few months and self-studying Python. Talk to me about cost-benefit analysis!


Assume there are two societies that passed the great filter and are now grabby. Society EA and society NOEA. 

Society EA, you could say, is quite similar to our own society. The majority of the dominant species is not concerned with passing the great filter, and most individuals are inadvertently increasing the chance of the species' extinction. However, a small contingent became utilitarian rationalists and specced heavily into reducing x-risk. Since the group passed the great filter, you can assume this is in large part due to this contingent of EAs/guardian angels. 

Now society NOEA is a species that passed the filter, but they didn't have EA rationalists. The only way they were able to pass the filter was because as a species, they are overall quite careful and thoughtful. The whole species rather than a divergent few has enough of a security mindset that there was no special group that "saved" them.

Which species would we prefer to get more control of resources? 

The punchline is that the very fact that we "need" EA on Earth might provide evidence that our values are worse than those of the species that didn't need EA to pass the filter.

I think this would be good. One thing: in many situations, if you can write p(success) in a meaningful way, then you should consider running a competition instead of grantmaking. It's not going to work in every situation, but I find this the most fair and transparent approach when possible. 

I definitely have very little idea what I’m talking about, but I guess part of my confusion is that inner alignment seems like a capability of AI? Apologies if I’m just confused.

I don't remember specifics, but he was looking at whether you could make certain claims about how models act on data not in the training data, based on the shape and characteristics of the training data. I know that's vague, sorry. I'll try to ask him and get a better summary. 

It seems plausible that there are ≥100,000 researchers working on ML/AI in total. That’s a ratio of ~300:1, capabilities researchers:AGI safety researchers.


Barely anyone is going for the throat of solving the core difficulties of scalable alignment. Many of the people who are working on alignment are doing blue-sky theory, pretty disconnected from actual ML models.

One question I'm always left with is: what is the boundary between being an AGI safety researcher and a capabilities researcher?

For instance, my friend is getting his PhD in machine learning. He barely knows about EA or LW, and definitely wouldn't call himself a safety researcher. However, when I talk to him, it seems like the vast majority of his work deals with figuring out how ML systems act when put in situations that are foreign with respect to the training data. 

I can't claim to really understand what he is doing, but it sounds to me a lot like safety research. And it's not clear to me that this is some "blue-sky theory". A lot of the work he does is high-level maths proofs, but he also does lots of interfacing with ML systems and testing stuff on them. Is it fair to call my friend a capabilities researcher?

So I can choose  then?

Yes, but to be very specific, we should call the problems A and B (for instance, the quiz is problem A and the exam is problem B), and a choice to work on problem A equates to spending your resource [1] on problem A in a certain time frame. We can represent this as  where {i} is the period in which we chose a and {j} is the number of times we have picked a before. j is sort of irrelevant for problem A, since we can use at most one resource to study, but it is relevant for problem B to represent the diminishing returns via  . 

What do we mean by 'last'? Do you mean that the choice in period 1,   , yields benefits (or costs) in periods 1 and 2, while the choice in period 2,   , only affects outcomes in period 2?

Neither, if I'm understanding you correctly. I mean that the Scale of problem A in period 2, , is 0. This also implies that the marginal utility of working on problem A in period 2 is 0. For instance, if I study for my quiz after it happens, this is worthless. This is different from the diminishing returns that are at play when repeatedly studying for the same exam. 

This is the extreme end of the spectrum though. We can generalize this by acknowledging that the marginal utility of a certain problem is a function of time. For instance, it's better to knock on doors for an election the day before than 3 years before but probably not infinitely better.

Can you define this a bit? Which 'choices' have different scale, and what does that mean? 

I think I may actually have used scale with two meanings: MU/resource, and "if we solve the entire problem, how much is that worth?" The second is basically importance, as described in the ITN framework, except maybe I didn't mean it as a function of the percent of work done but rather of the total. Generally, though, I think people consider this to be a constant (which I'm not sure they should...), and this being the case, we are basically talking about the same thing, except they are dividing by a factor of 100, which again doesn't matter for this discussion.

I think what Eliot meant is importance, so that's what I'm going to define it as, but I think you picked up on this confusion which is my bad. 

By choices, I meant the problems, like the quiz or the exam. I think I used the incorrect wording here though since choices also denote a specific decision to spend a resource on a problem. My fault for the confusion. 

Maybe you want to define the sum of benefits 





Yes basically but I think that 




are better notations, although it doesn't really matter. I got what you were saying.

where a and b are positive numbers, and  is a diminishing returns parameter?  

Essentially yes, but with my notation.

For 'different scale' do you just mean something like ?

No. Taking b to mean , b is the marginal utility of spending a resource in period 1 on problem B, not the total utility to be gained by solving problem B. Using the test example, the scale of B is either  since this is the maximum grade I can achieve based on the convergent geometric sum described, or 20%, since this is the maximum total grade, although maybe it's literally impossible for me to reach this. I'm not actually sure which to use, but I guess let's go with 20%, and denote a convergent sum as meaning 

What I meant was  or 20% > 10% in the test example

So this is like, above,   if 

I think this was the point I was trying to make with the examples I gave you. Basically, the decision at t = 1 in a sequence of decisions that maximizes utility over multiple periods of time is not the same as the decision that maximizes utility at t = 1, which is what I believe you are pointing out here. In effect

But actually, I think the claim I originally made in response to him was a lot simpler than that, more along the lines of "A problem being urgent does not mean that its current scale is higher than if it was not urgent". Taking U(A_i) to be the Scale of problem A in the ith period, and taking problem A being urgent to mean , which I'm getting from the OP saying

Some areas can be waited for a longer time for humans to work on, name it, animal welfare, transhumanism.

My original claim in response to Eliot is something like

 and  does not imply  


The fact that I get no value out of studying for a Monday quiz on Tuesday doesn't mean the quiz is now worth more than 10% of my grade. On the flip side, if the quiz was moved to Wednesday, it would still be worth 10% of my grade.

I think it was maybe not what Eliot meant. That being said, taking his words literally I do think this is what he implied. I'm not really sure honestly haha. 

But that's not just 'because a has no value in period ' but also because of the diminishing returns on b (otherwise I might just choose b in both periods).

Correct. I think there are further specifications that might make my point less niche, but I'm not sure. 
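To make the quiz/exam point concrete, here is a minimal sketch. The numbers are made up for illustration and are not from the discussion above: `A_GAIN`, `B_GAIN`, and the `decay` parameter are all hypothetical, and I'm assuming geometric diminishing returns on the exam. It enumerates every two-period plan and compares the best plan with the choice that looks best in period 1 alone.

```python
from itertools import product

# Hypothetical gains, in percentage points of the final grade.
A_GAIN = 8.0    # studying for the quiz (problem A) before it happens
B_GAIN = 10.0   # first study session for the exam (problem B)
PERIODS = 2     # one resource to spend in each period


def total_utility(plan, decay):
    """Total gain of a plan such as ("A", "B"), one choice per period."""
    total, times_b = 0.0, 0
    for period, choice in enumerate(plan, start=1):
        if choice == "A":
            # The quiz takes place after period 1, so later study is worthless.
            total += A_GAIN if period == 1 else 0.0
        else:
            # Assumed geometric diminishing returns on repeated exam study.
            total += B_GAIN * decay ** times_b
            times_b += 1
    return total


def best_plan(decay):
    """Plan with the highest total gain over both periods."""
    return max(product("AB", repeat=PERIODS),
               key=lambda plan: total_utility(plan, decay))


def myopic_first_choice():
    """Choice that maximizes the gain in period 1 only."""
    return "A" if A_GAIN > B_GAIN else "B"


print(myopic_first_choice())  # looks only at period 1
print(best_plan(decay=0.5))   # with diminishing returns on B
print(best_plan(decay=1.0))   # without diminishing returns on B
```

With these made-up numbers, the period-1 maximizer picks the exam, while the best two-period plan starts with the quiz. Setting `decay=1.0` removes the diminishing returns, and the best plan becomes the exam in both periods, which matches the point that the result depends on both the quiz expiring and the diminishing returns on b.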

As an aside, I'm not sure I'm correct about any of this but I do wish the forum was a little more logic and math-heavy so that we could communicate better.


  1. ^

    We could model a situation where you have multiple resources in every period, but here I choose to model it as if you have a single resource to spend in each period.

What do you mean by 

the rate at which they will grow or shrink over time.

Specifically, what mathematical quantity is "they"?

I don't fully comprehend why we can't include it. It seems like the ITN framework does not describe the future of the marginal utility per resource spent on the problem, but rather the MU/resource right now. If we want to generalize the ITN framework across time, which theoretically we need to do to choose a sequence of decisions, we need to incorporate the fact that tractability and scale are functions of time (and, further, of the previous decisions we make). 

All this is going to do is change the resulting answer from (MU/$) to MU/$(t), where t is time. Everything still cancels out the same as before. In practice, I don't know if this is actually useful.
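As a sketch of the cancellation, assuming the usual ITN factorization (scale as good done per % of the problem solved, tractability as % solved per % increase in resources, neglectedness as % increase in resources per extra dollar), and treating each factor as a function of t, which is my own notation rather than anything standard:

```latex
% Requires amsmath for \text.
\frac{\text{MU}}{\$}(t)
  = \underbrace{\frac{\text{good done}}{\%\ \text{of problem solved}}(t)}_{\text{scale}}
    \times
    \underbrace{\frac{\%\ \text{of problem solved}}{\%\ \text{increase in resources}}(t)}_{\text{tractability}}
    \times
    \underbrace{\frac{\%\ \text{increase in resources}}{\text{extra } \$}(t)}_{\text{neglectedness}}
  = \frac{\text{good done}}{\text{extra } \$}(t)
```

The intermediate units cancel at each t exactly as in the static version, so the only change is that the final ratio is indexed by time.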

Do you know if anyone else has written more about this? 
