Keerthana Gopalakrishnan

An AI killing everyone wouldn't earn a massive penalty in training, because there won't be humans alive in that scenario to assign the penalty.

Humans need not be around to give a penalty at inference time, just as GPT-4 is not penalized by individual humans; rather, the reward is learned / programmed. Even if all humans were asleep or dead today, GPT could still run inference according to the reward we preprogrammed. These systems are not doing pure online learning.
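As a toy illustration of the distinction above (a minimal sketch, not how GPT-4 or any real system is trained), the snippet below fits a tiny "reward model" once from a handful of human labels and then scores new outputs with no human in the loop. The function `fit_reward_model`, the word-averaging scheme, and the example strings are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def fit_reward_model(labeled: List[Tuple[str, float]]) -> Callable[[str], float]:
    """Toy 'training' step: average the human score of every word seen in the labels."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for text, score in labeled:
        for word in set(text.lower().split()):
            totals[word] = totals.get(word, 0.0) + score
            counts[word] = counts.get(word, 0) + 1
    weights = {word: totals[word] / counts[word] for word in totals}

    def reward(text: str) -> float:
        # Scoring uses only the frozen weights; no human is consulted here.
        known = [weights[w] for w in text.lower().split() if w in weights]
        return sum(known) / len(known) if known else 0.0

    return reward


# Humans are involved only here, when the labels are collected (hypothetical data).
human_labels = [("helpful polite answer", 1.0), ("harmful rude answer", -1.0)]
reward = fit_reward_model(human_labels)

# Later, the learned reward ranks unseen candidates without any human present.
candidates = ["polite helpful reply", "rude harmful reply"]
best = max(candidates, key=reward)
print(best, reward(best))
```

The sketch only shows that a reward signal, once learned, can keep being applied without a human assigning each penalty; it says nothing about whether such a learned reward generalizes safely.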

This is where a huge amount of lethality comes from on anything remotely resembling the present paradigm.  Unaligned operation at a dangerous level of intelligence*capability will kill you; so, if you're starting with an unaligned system and labeling outputs in order to get it to learn alignment, the training regime or building regime must be operating at some lower level of intelligence*capability that is passively safe, where its currently-unaligned operation does not pose any threat.

It is a logical fallacy to account for future increases in capabilities but not future advances in safety research. You're claiming AGI will be an x-risk based on scaling current capabilities only, but you're failing to scale safety. Generalization to unsafe scenarios is something we want to write tests for before deploying in situations where those scenarios may occur. Phased deployment should help test whether we can generalize to increasingly harder situations.

I'd expect the first AGI systems to be built by labs that are pushing full steam ahead on making crazy impressive things happen ASAP, which means you're actively optimizing against minds that are trying to limit their impact, intelligence, or power

The recent push for productization is making everyone realize that alignment is a capability. A gaslighting chatbot is a bad chatbot compared to a harmless, helpful one. As you can see currently, the world is deploying AI in phases: deploy, fix the bugs, then iterate.

You gave an argument that human goals overlap some with the goals of evolution, but you didn't give an argument that humans are non-catastrophic from the (pseudo-)perspective of evolution. That would depend on whether humans will produce lots of copies of human DNA in the future.

Humans are unaligned in various ways, and it looks like a lot of AIs will be deployed in the future, many aligned to different objectives. I'm skeptical of MIRI's modeling of risk because y'all only talk about one super-powerful, godlike AGI, but y'all haven't modeled multiple companies, multiple AGIs, multiple deployments. Unlike the former, this is the most likely scenario, yet it frequently goes unmentioned in forecasting. Future compute is going to be distributed among these AGIs too, so in many ways we end up with something akin to a modern society of humans.

Yep! The orthogonality doesn't just show that unfriendly goals are possible; it shows that friendly goals are possible too.

Then why the overemphasis/obsession on doom scenarios? It makes for a great robot-uprising sci-fi story but is unscientific. If you approximate the likelihood of future scenarios as a Gaussian distribution, wiping out all humans is so extreme and long-tailed that it is less likely than almost any other scenario in the set, and the least likely scenario in that set has a probability that approaches zero, given that an infinite set of possibilities must sum to 1.0. Given that the number of possibilities is infinite, the likelihood of any one possibility is far too small, close to zero. The likelihood of unaligned AGIs jerking each other off in a massive orgy for eternity is as likely as wiping out humans (more likely, accounting for resistance to the latter scenario).

There are a few obvious flaws in interpreting the survey as is:

  1. sample size and sampling bias: 13% respondents amount to 700-ish people surveyed. That is fairly small. Secondly, having MIRI's and AI Impacts' respective logos on the front page of the survey document introduces a bias into who takes the survey: very likely it is just the people familiar with LessWrong etc. who have heard of these organizations. Here's why a lot of serious AI researchers don't engage with MIRI et al.:
    1. MIRI hasn't shipped a SoTA result in AI alignment or AI research in the last 5-6 years.
    2. A quick look at the publications on their website shows they don't publish at top ML conferences (ICML, ICLR, NeurIPS, etc.). LessWrong "research" is not research; it is usually philosophy and not backed by experiments ("thought experiments" don't count).
  2. appeal to authority fallacy: a lot of people saying something doesn't make it true, so I'd advise people not to confuse "AI research is bad" with "a proportion of surveyed people think AI research could be bad". Some moral outrage in the comment section takes some people feeling this way as evidence for de facto truths, while the reality is that they are contested claims.
  3. modeling the future: Human beings are notoriously bad at modeling the future. Imagine if we had run a survey in Oct among EAs about FTX's health. Not only is modeling the future hard, but modeling the far future is exponentially harder, and existential risk analyses are often incomplete because:
    1. New improvements in AI safety research are not accounted for in these projections.
    2. Multiagent dynamics of a world with multiple AIs are not modeled in catastrophic scenario projections.
    3. Multiagent dynamics with governments/stakeholders are not modeled.
    4. Phased deployment (accelerate, then align to the use case, then accelerate again, as we are doing today) is also not modeled. AI deployment is currently also accelerating alignment research because alignment is needed to build a useful product: a gaslighting chatbot is a bad product compared to a harmless, helpful one.
    5. New research produces new knowledge previously not known.

Anthropic's write-up, afaik, is a nuanced take and may be a reasonable starting point for an informed and calibrated view of AI research.
I usually don’t engage with AI takes here because it is a huge echo chamber, but these are my two cents!

And perhaps an illustrative example here is that yes, some in the death metal community do put more status on tattoos while some do not - which, like polyamory to EA, is wholly separate from what the core of the community is actually focused on, namely death metal music. I accept and respect that to some people within the metal community I'm lower status due to not having tattoos. They aren't being mean to me by assigning me lower status.

Nice of you, but I do not accept or respect having lower status in EA due to being monogamous. They are being mean to me and to the thousands of monogamous women they're recruiting / want to recruit / who are dedicated EAs by assigning us lower status. I am not willing to participate in a community where I have lower status due to factors I didn't choose (race/gender/sexuality), and I'd think many other self-respecting people will also not put up with what you're calling "relative lower status". No, fuck that. We want equality, equal respect, equal opportunity.

So I'll start by saying it is absolutely awful you feel like the EA community is assigning you lower status for being a monogamous woman. I have to ask though, isn't this something where you can look at the individuals who were this insensitive towards you and call them assholes without calling the EA community as a whole asshole-ish?

I am not calling the whole EA community asshole-ish, but it is a big problem here because there are many such individuals. I'm not seeing widespread pushback against these people either. I'm also confused that you think individuals who assign me lower status are assholes after saying above yourself that maybe I should be ok with being assigned lower status, like you're ok being lower status elsewhere.

I'm sorry, but the death metal-tattoo analogy got lost on me. You can get a tattoo if you choose to; many people can't change their sexual preference, so it's a false comparison. It's like you're saying white people have higher status and I should be ok with that, but I can't paint my face white and become a white person even if I wanted to. Secondly, tattoos are nothing like sex. Sex involves two people (often) and conveys a relationship where you can benefit a higher-status individual. Your getting a tattoo is not pleasurable to high-status men. I do not want to get into the frame of arguing based on this analogy because the analogy doesn't model many complications.

  • I really doubt I need to make a case that concerns of women are being taken seriously. If they weren't the number of women in EA wouldn't be growing.

If you recruit more women than you hurt, and if you drive out and silence the ones who speak up, the number of women will grow, but that doesn't mean the concerns of women are taken seriously. I'm not saying this is happening, but your logic is flawed in many ways. Your implied causality, i.e., that growth in women's numbers is evidence that the concerns of women are taken seriously, is false.
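The arithmetic behind this point can be sketched with purely hypothetical numbers (none of these figures come from any survey or from EA membership data): headcount keeps rising as long as yearly recruitment exceeds yearly departures, regardless of why people leave.

```python
def simulate_headcount(start: int, joined_per_year: int, percent_leaving: int, years: int):
    """Track yearly headcount and cumulative departures using integer arithmetic."""
    history = [start]
    total_left = 0
    current = start
    for _ in range(years):
        left = current * percent_leaving // 100  # members who leave this year
        current = current - left + joined_per_year
        total_left += left
        history.append(current)
    return history, total_left


# Hypothetical inputs: 100 members, 60 recruited per year, 30% leaving per year.
history, total_left = simulate_headcount(start=100, joined_per_year=60, percent_leaving=30, years=3)
print(history)     # headcount at the start and after each year
print(total_left)  # cumulative number who left over the period
```

Under these made-up inputs the group grows every year even though more people leave over the period than were in it at the start, so a growing headcount alone says nothing about retention or about whether concerns are being addressed.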

Likewise, polyamory is quite stigmatised and we don't really have many role-models or representation.

Bay Area EA is around 60% poly, so I'd say monogamists are the minority here.

guess the question then is, if you possibly don't feel it wrong to assign a women more social status for being a good role-model for women as a woman, why do you feel uncomfortable when poly people are assigned more status for being a good role-model for poly people as a poly person themselves?

Your original statement said "if you're poly you're interesting and would be invited to speak on podcasts" so matter-of-factly. That is very different from "if you're a good role-model for poly people as a poly person...". Good role models should get status, I agree, but that's not what you said. The equivalent of what you initially said would be "if you're a man you're interesting and would be invited to speak on the Clearer Thinking podcast etc".


Which also speaks to a broader point: if you're poly you're interesting and get invited to speak on the Clearer Thinking podcasts etc etc


a woman feeling bad and left out when they compare themselves to other women who have gained social status in part for sharing their experiences as women.

Inverted causal reasoning. If we look carefully at your first quote, the order of events is: being poly gets status, which converts into the opportunity to speak on podcasts. But in your justification, sharing experiences gets status. Sharing experiences should get more status, but just being poly/mono shouldn't. Being a good role model for monogamous people and sharing that experience should also get higher status, but tell me, dear friend, are they getting invited to podcasts? I am amazing at monogamy, the absolute best, do get me an invite. Or are these opportunities gate-kept?


I like big-tent EA so people with near any preference might also like doing EA stuff. They might even be EA leaders. But we shouldn't let their preferences automatically lead us to conclude that is the preference of the community as a whole.

Have you heard the Netflix quote: "The actual company values, as opposed to the nice-sounding values, are shown by who gets rewarded, promoted, or let go"? Almost all top rationalists are poly, and many top EAs are as well. >50% of Bay Area EA is poly while the base population rate is 15% or less. Tell me once again that this is not a preference of the community :) Tell me what monogamous people need to do to rise through the ranks.

Ah, I just found this comment after it was referenced elsewhere. My definition of "heavily downvoted" = "lots of downvotes". My post had around 75 or so downvotes, which may be a third of total votes (speaking from memory).

but it is still a fair accusation to say they are being dishonest.

I'm making a credible accusation of harassment at the cost of my reputation, time and mental energy, and you're strawmanning me over two words in the whole excerpt, based on a subjective definition of "heavily downvoted", to call me dishonest overall? Dude.

If some seedy politician got up and said the project was clearly and obviously "heavily downvoted" and nowhere highlighted that actually most people voted in favour of it, you would be fucking pissed and right to accuse him of being dishonest.

My intention was not to claim that most people in EA voted against it, which would be a false characterization. The intention of my statement was to convey that there was heavy backlash against my post, which I believe is accurate. The evidence for the claim that there was "heavy backlash" is "lots of downvotes / a large fraction of downvotes". I wanted to get the information out that I faced a lot of attacks for saying what I did (even right now, I am defending my exact choice of words and defending against being characterized as a liar over a small difference in opinion), because that's useful information for survivors, and because it is an out-of-domain / unexpected response toward people coming out.

Ultimately my words are my words and not your words. Feel free to disagree with my exact framing but be more careful before accusing someone of intentional fabrication/lying and puncturing the overall credibility of people who are already taking much personal and reputational risk to talk about their truths. Dishonesty implies intention. One day, when it's your turn to tell your story, others may dismiss you as well based on small disagreements in adjective use. 

But if what you're saying is that absent any professional setting, absent any coworker or mentee/mentor relationship, people who identify as "EAs" should still not grant anyone any social status for being interesting when the topic of sexuality is brought up... what you're effectively demanding is for thousands of people around the world to change their personality and become less sexually liberal and less open-minded. 

You're using the words "sexually liberal", "open-minded" and "interesting" interchangeably. Catholic nuns can be interesting; monogamous people can be open-minded. Private sexual preferences have nothing to do with interestingness or open-mindedness.

What I think you're talking about is a problem where power dynamics is involved including mentor/mentee relationships and coworkers etc etc. This is a separate topic from the social status increase and feeling-dejected by it that Ivy and I are talking about.

I am not just talking about professional relationships. I'm also talking about what the community should value. Treating women as higher/lower status based on their sexuality is simply wrong. A lot of people are intentionally monogamous (like me). Assigning them lower status by default due to their private relationship preference is an awful practice that shouldn't be adopted community-wide.

Which also speaks to a broader point: if you're poly you're interesting and get invited to speak on the Clearer Thinking podcasts etc etc

Maybe you're struggling to understand my point, so let me try to demonstrate why this sort of language is troubling. If you substitute the word "poly" with "white" (ethnicity) / "male" (gender) / "homosexual" (orientation) or other equivalents, this sentence sounds so wrong. I don't think that my choice / programming of sexuality is something that needs work. I love being monogamous, just as I don't feel lesser because of my gender or ethnicity. All other things being equal, I want to be given equal status to someone with a different sexuality, just as I want the same status as a man / a white person / a person of a different nationality. That's all.

Likewise it is my problem for feeling like I "need to be better at being poly to be EA." This is something I genuinely feel.

I am sorry you feel this way. 

This is not really entirely different from how when EAs talk about AI Alignment and get status for it that I also feel uncomfortable and left out for not being smart enough. Let their social status increase. It's my problem for feeling insecure, not theirs.

This is actually status working the right way. Status can be used as an incentive to promote behaviors we want from people, because humans are great incentive maximizers. We want alignment researchers to gain status by producing high-quality alignment work, because EA thinks this work is of high impact. Conversely, we want more people to aspire to become alignment researchers because this work is highly regarded / high status in EA. Unlike promoting certain types of technical AI work over others, the community should not promote a certain type of sexuality over others. Let EA be about doing good alone, and decoupled from sexuality / race / ethnicity / orientation.

Again, this raises the question of why status in a community oriented around “doing good” has anything to do with sexuality and is not uniformly distributed across all sexualities. Status in EA should be a function of doing good and should be sexuality-neutral, period.

I think you’re reframing on a technicality. Status and success are fairly related in many ways in the real world, because status opens doors and signals greater opportunity.

EA might want to hire competent women but competent women might not want to stick around if they're lower status due to factors outside of their control such as sexuality/race/etc.

These discussions are quite enlightening. I had a gut feeling this is how things are but seeing it clearly verbalized confirms my intuition.

It's more like (to me) some boats are sped up by the sex positive-current and other boats miss the wave and just reach their destination the same time they counterfactually would have.


Which also speaks to a broader point: if you're poly you're interesting and get invited to speak on the Clearer Thinking podcasts etc etc. You gain status just due to your private relationship preference in EA, or such is my perception. Nobody cares if you're mono.

To retain competent people you need to sustain a competitive atmosphere. If success is not just a function of impact / work but also a function of sexual liaisons / sexuality, it makes for a toxic culture because one feels compelled to sleep around to get ahead. Even if you're not doing it, your peers are.

Do you realize how many competent women will be driven out of EA if they are not open to having sexual liaisons? They're not offered a seat on the fast boats, not because they're not smart/hard-working but because they're rejecting sex / have a different relationship preference.

How is that equality of opportunity? How is that inclusive? 

Hi Catherine, thank you for clarifying what measures were taken regarding each instance reported in the TIME article and for directly addressing each point.

Regarding my previous post, here's more context from a previous discussion on why I haven't yet involved CEA's Health team: I'll probably share more thoughts, especially regarding why I spoke to TIME, the women-friendly culture updates a movement can make, and more perspectives, when time permits me to think more clearly about this topic and write them down. Obviously, SA is a high-stress discussion; a lot of context is lost in translation and in the medium of communication; people can misrepresent/misinterpret; people also have jobs and other commitments; but I'm hoping we will have more clarity over time and update to a better state overall as a society given enough time.

Meanwhile, I'd like more clarification on one matter. I'm one of the people who connected Charlotte, the author of the TIME article, with the curious case of Aurora Quinn Elmore, an unofficial SA mediator who interviews people via Facebook and recommends actions for accused and accuser in EA-adjacent/rationalist communities in the Bay. This person was introduced to the EA-adjacent group house situation by an active EA (out of good intentions / lack of awareness, I think; it was a high-stress situation and all sides were acting sub-optimally), and I was told that this EA got the idea of involving the mediator from her work mediating SA cases at CFAR, the Center For Applied Rationality. I was told that this person has mediated at least 5 SA cases as far as this EA knows, and probably more. Can you verify this information? How many cases has she mediated in total? Why is CFAR, with millions in funding, using an unofficial individual (who is a PM in her day job) with no formal training in / affiliation to women's organizations to arbitrate SA cases? Some women who have had their situations arbitrated by this mediator have told me that they faced retaliation for speaking up, and that they were informed of a "no-gossip policy", i.e., if the mediator has arbitrated the case and ruled in favor of an accused, and the accuser then speaks about the case to her friends or others, she will face consequences up to and including career consequences and being removed from communities. Can someone from CFAR share more context/data? Thank you.
