All of Aleksi Maunu's Comments + Replies

Naively, I would trade a lot of clearly safe stuff being delayed or temporarily prohibited for even a minor decrease in the chance of safe-seeming-but-actually-dangerous stuff going through, which pushes me towards favoring a more expansive scope of regulation.

(in my mind, the potential loss of decades of life improvements currently pales in comparison to the potential non-existence of all lives in the long-term future)

I don't know how to think about it when accounting for public opinion, though. I expect a larger scope will attract more opposition to the regulation, which could be detrimental in various ways, the most obvious being a decreased likelihood of such regulation being passed, upheld, or disseminated to other places.

But the difficulty of alignment doesn't seem to imply much about whether slowing is good or bad, or about its priority relative to other goals.

At the extremes, if alignment-to-"good"-values by default were 100% likely, I presume slowing down would be net-negative and racing ahead would look great. It's unclear to me where the tipping point is: what distribution over alignment difficulty levels one would need to hold to tip from wanting to speed up to wanting to slow down AI progress.
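
As a toy illustration of that tipping point (my own framing for this comment; the symbols below are made up here, not taken from anything I'm replying to): slowing down looks better whenever

$$\Delta p \cdot V_{\text{future}} > c_{\text{delay}},$$

where $\Delta p$ is how much slowing down increases the probability of a good long-term outcome, $V_{\text{future}}$ is the value of that outcome, and $c_{\text{delay}}$ is the value of the near-term improvements forgone by slowing. On the view in my earlier parenthetical ($V_{\text{future}} \gg c_{\text{delay}}$), even a tiny positive $\Delta p$ favors slowing down, so the tipping point sits roughly where one's distribution over alignment difficulty implies $\Delta p \leq c_{\text{delay}} / V_{\text{future}}$, i.e. where slowing buys essentially no extra probability of a good outcome.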

Seems to me like the more longtermist one is, the more sl... (read more)

How is the "secretly is planning to murder all humans" part improving the model's scores on a benchmark?

(I personally don't find this likely, so this might accidentally be a strawman)

For example: planning and gaining knowledge are incentivized on many benchmarks -> instrumental convergence makes the model instrumentally value power, among other things -> a very advanced system that is great at long-term planning might conclude that "murdering all humans" is useful for power or other instrumentally convergent goals.

 

You could prove this. Make a psychopathic... (read more)

GPT-4 doesn't have the internal bits which make inner alignment a relevant concern.

Is this commonly agreed upon even after fine-tuning with RLHF? I assumed it's an open empirical question. The way I understand it is that there's a reward signal (human feedback) that's shaping different parts of the neural network that determines GPT-4's outputs, and we don't have good enough interpretability techniques to know whether some parts of the neural network are representations of "goals", let alone what specific goals they are.

I would've thought it's an ope... (read more)

2
RobertM
7mo
I don't know if it's commonly agreed upon; that's just my current belief based on available evidence (to the extent that the claim is even philosophically sound enough to be pointing at a real thing).

Anyone else not able to join the group through the link? 🤔 It just redirects me to the dashboard without adding me in

 

1
annaleptikon
9mo
Focusmate has changed a lot since this post was published; maybe invite links are disabled by now.

In those cases I would interpret agree votes as "I'm also thankful" or "this has also given me a lot to think about".

I think the stated reasoning there by OP is that it's important to influence OpenAI's leadership's stance and OpenAI's work on AI existential safety. Do you think this is unreasonable?

To be fair I do think it makes a lot of sense to invoke nepotism here. I would be highly suspicious of the grant if I didn't happen to place a lot of trust in Holden Karnofsky and OP.

(feel free to not respond, I'm just curious)

8
Ofer
1y
I do not think that reasoning was unreasonable, but I also think that deciding to give $30M to OpenAI in 2017 was not obviously net-positive, and it might have been one of the most influential decisions in human history (e.g. due to potentially influencing timelines, takeoff speed and the research trajectory of AGI, due to potentially[1] inspiring many talented people to pursue/invest in AI, and due to potentially[1] increasing the number of actors who competitively pursue the development of AGI). Therefore, the appointments of the fiancée and her sibling to VP positions, after OpenPhil's decision to recommend that $30M grant, seem very problematic.

I'm confident that HK consciously judged that $30M grant to be net-positive. But conflicts of interest can easily influence people's decisions by biasing their judgment and via self-deception, especially with respect to decisions that are very non-obvious and where deciding either way can be reasonable.

Furthermore, being appointed to VP at OpenAI seems very financially beneficial (in expectation) not just because of the salary from OpenAI. The appointments of the fiancée and her sibling to VP positions probably helped them to successfully approach investors as part of their effort to found Anthropic, which ended up raising at least $704M.

HK said in an interview: [...]

You wrote: [...]

I don't think there is a single human on earth whose judgement can be trusted to be resilient to severe conflicts of interest while making such influential non-obvious decisions (because such decisions can be easily influenced by biases and self-deception).

[1] EDIT: added "potentially"; the $30M grant may not have caused those things if OpenAI would have sufficiently succeeded without it.

I think if I were issuing grants, I would use misleading language in such a letter to make it less likely that the grantee organization fails to get registered for bureaucratic reasons. It's possible to mention that to the grantee in an email or call as well, to avoid any confusion. My guess would be that that's what happened here, but that's just my two cents; I have no relevant expertise.

2
LGS
1y
I agree this seems likely. I think it's bad to use misleading language to help neo-Nazi organizations pass bureaucratic checks, though, and I'm concerned that FLI showed no remorse for this. My guess is that what happened here is related to Tegmark's brother -- the brother wanted SND to be registered and had the organization ask FLI for a letter. I'm not sure, though, and I think the information we've received so far from FLI is insufficient and likely deceptive.

Thanks for the comment! I feel funny saying this without being the author, but I feel like the rest of my comment is a bit cold in tone, so I thought it's appropriate to add this :)

 

I lean more moral anti-realist, but I struggle to see how the concepts of "value alignment" and "decision-making quality" are any less orthogonal from a moral realist view than from an anti-realist view.

Moral realist frame: "The more the institution is intending to do things according to the 'true moral view', the more it's value-aligned."

"The better the institutions... (read more)

1
Indra Gesink
1y
Thanks for the post and for taking the time! My initial thoughts on trying to parse this are below; I think it will bring mutual understanding further.

You seem to make a distinction between intentions on the y-axis and outcomes on the x-axis. Interesting! The terrorist example seems to imply that if you want bad outcomes you are not value-aligned (aligned to what? to good outcomes?). They are value-aligned from their own perspective. And "terrorist" is also not a value-neutral term; for example, Nelson Mandela was once considered one, which would I think surprise most people now.

If we allow "from their own perspective" then "effectiveness" would do (and "efficiency" to replace the x-axis), but it seems we don't, and then "altruism" (or perhaps "good", with less of an explicit tie to EA?) would, without the ambiguity "value-aligned" brings on whether or not we do [allow "from their own perspective"]. (As not a moral realist, the option of "better value" is not available, so it seems one would be stuck with "from their own perspective" and calling the effective terrorist value-aligned, or moving to an explicit comparison to EA values, which I was supposing was not the purpose, and seems to be even more off-putting via the mentioned alienating shortcoming in communication.)

Next to value-aligned being suboptimal, which I also just supported further, you seem to agree with altruism and effectiveness (I would now suggest "efficiency" instead) as appropriate labels, but agree with the author about the shortcoming for communicating to certain audiences (alienation), with which I also agree. For other audiences, including myself, the current form perhaps has shortcomings. I would value clarity more, and call the same the same. An intentional, opaque-making change of words might additionally come across as deceptive, and as aligned with one's own ideas of good, but not with such ideas in a broader context. And that I think could definitely also count as / become a conse

(not the author)

 

4. When I hear "(1) IIDM can improve our intellectual and political environment", I'm imagining something like: if the concept of steelmanning becomes common in public discourse, we might expect that to indirectly lead to better decisions by key institutions.

Does anyone have thoughts on

  1. How does the FTX situation affect the EV of running such a survey? My first intuition is that running one while the situation's so fresh is worse than waiting 3-6 months, but I can't properly articulate why.

  2. What, if any, are some questions that should be added, changed, or removed given the FTX situation?

For what it's worth, connecting SBF and Musk might've been a time-sensitive situation for one reason or another. There would also still have been time to debate the investment in the larger community before the deal actually went through.

1
Aleks_K
1y
Seems quite implausible to me that this would have happened, and unclear if it would have been good. (Assuming "larger EA community" implies more than private conversations between a few people.)

Small note: the title made me think the platform was made by the organization OpenAI.

5
FlorentBerthet
2y
Same. I suggest "AI Safety Ideas: a collaborative AI safety research platform"
3
Yadav
2y
Yeah, I thought this too. 

(After writing this I thought of one example where the goals are in conflict: permanent surveillance that stops the development of advanced AI systems. Thought I'd still post this in case others have similar thoughts. Would also be interested in hearing other examples.)

 

I'm assuming a reasonable interpretation of the proxy goal of safety means roughly this: "be reasonably sure that we can prevent AI systems we expect to be built from causing harm". Is this a good interpretation? If so, when is this proxy goal in conflict with the goal of having "thing... (read more)

4
Linch
2y
Economic degrowth/stagnation is another example of something that prevents AI doom but will be very bad to have in the long run.

I'd be interested in the historical record for similar industries. Could you quickly list some examples that come to mind? No need to elaborate much.

Interesting, I hadn't thought of the anchoring effect you mention. One way to test this might be to poll the same audience about other, more outlandish claims, something like the probability of x-risk from alien invasion, or CERN accidentally creating a black hole.

Disclaimer: I don't feel like I know much about wild animal welfare; I last read about it around two years ago.

 

You're right, I think suffering-focused wasn't the right term to use, as all WAW interventions that come to my mind are about reducing animals' suffering. I should've asked if you're assuming that WAW people think that:

  1. animals' lives are usually net-negative
  2. the best way to help them and future animals is to kill them / cause them to not exist

I would guess that (1) is a common belief, but that only a minority of people who work in WAW belie... (read more)

Doesn't this depend on assuming negative utilitarianism, a suffering-focused ethic, or a particular set of assumptions about the net pleasure vs. pain in the life of an 'average' animal?

 

I don't think it depends on those things. What they meant by species not being inherently valuable is that each individual of a species is inherently valuable: the species' value comes from the value of the individuals (not taking into account value from things like possibly making ecological collapse less likely, etc.).

(I only read the beginning of your comment, sorry for not responding to the rest!)

To the extent that moral uncertainty pushes you to give more credence to common-sense ethical views, it does point towards prioritizing biodiversity more than a consequentialist view would otherwise imply, as "let's preserve species" and "let's preserve option value" are common-sense ethical views. Probably not enough to affect prioritization in practice, though.

How does biodiversity conflict with WAW? I would imagine that there are many possible interventions which are good both in terms of increasing the wellbeing of animals in the wild and in keeping species from going extinct. Are you assuming a suffering-focused view of WAW?

1
emwalz
2y
It's a complex interaction for sure, and gets into some thorny questions that inevitably run into making evaluations of lived (nonhuman) experiences we can't possibly claim to understand. Using long-term scales makes it clear that this issue will determine whether billions of trillions of individuals get to experience life. It's clear that certain land dynamics lead to more life, so if there's more inhabitable space on Earth's surface it means there's more room for life. Then those with decision-making power have to take a philosophical stance as to whether to ensure more possibility space for life or make twisted conclusions about QALYs we don't really understand...
2
Guy Raveh
2y
That's certainly the only one I've ever seen. Can you give an example of such a view, and such an intervention? Or describe how [an organisation] will find or test one?

Can you expand on this? Is it that anything to do with genes is controversial? Maybe also the possibility that success in this could increase rich people's societal advantages over poor people even more? (I listened to the post yesterday and might've forgotten some key points)

Great job on the talk! :)

I'd be curious to know in more detail how giving the books to the audience was done

1
Max Clarke
2y
For an overview of most of the current efforts on "epistemic infrastructure", see the comments on my recent post here: https://forum.effectivealtruism.org/posts/qFPQYM4dfRnE8Cwfx/project-a-web-platform-for-crowdsourcing-impact-estimates-of

Thanks a lot for this comment! I think delving into the topic of epistemic learned helplessness will help me learn how to form proper inside views, which is something I've been struggling with.

 

I'm very worried about this ceasing to be the case.

Are you worried just because it would be really bad if EA in the future (say, in 5 years) were much worse at coming to correct conclusions, or also because you think it's likely that will happen?

2
Thomas Kwa
2y
I'm not sure how likely this is but probably over 10%? I've heard that social movements generally get unwieldier as they get more mainstream. Also some people say this has already happened to EA, and now identify as rationalists or longtermists or something. It's hard to form a reference class because I don't know how much EA benefits from advantages like better organization and currently better culture. To form proper inside views I'd also recommend reading this post, which (in addition to other things) sketches out a method for healthy deference:

Just a heads up: there's an extra "." at the end of the link you posted

1
Jonas Moss
2y
Fixed!

Thanks for the post! I especially enjoyed the mini EA forum literature review aspect of it. 😄

 I personally definitely feel a disconnect between my intellectual understanding and feelings about suffering in the world, and am hoping meditation will help me have my emotions match my understanding more.

1
Merlin Herrick
2y
Thanks for the kind comment! Yes, connecting your feelings and intellectual understanding can be difficult, and meditation has helped me a lot with this. Good luck with your practice and feel free to private message me if you have any questions (bearing in mind I'm only a beginner myself). 

I wonder how one could explain the pleasures of learning about a subject as contentment, relief, or anticipated relief. Maybe they'd describe it as getting rid of the suffering-inducing desire for knowledge / acceptance from peers / whatever motivates people to learn?

I'm sure it would be possible to find meditators who came to the opposite conclusion about well-being.

If someone reading this happens to know of any, I'd be interested to know! I wouldn't be that surprised if they were very rare, since my (layman's) impression is that Buddhism aligns well with suffering-focused ethics, and I assume most meditators are influenced by Buddhism.

4
Teo Ajantaival
2y
Pleasures of learning may be explained by closing open loops, which include unsatisfied curiosity and reflection-based desires for resolving contradictions. And I think anticipated relief is implicitly tracking not only the unmet needs of our future self, but also the unmet needs of others, which we have arguably 'cognitively internalized' (from our history of growing up in an interpersonal world). Descriptively, some could say that pleasure does exist as a 'separable' phenomenon, but deny that it has any independently aggregable axiological value. Tranquilism says that its pursuit is only valuable insofar as there was a craving for its presence in the first place. Anecdotally, at least one meditator friend agreed that pleasure is something one can 'point to' (and that it can be really intense in some jhana states), but denied that those states are all that interesting compared to the freedom from cravings, which also seems like the main point in most of Buddhism.

We've also been toying around with this idea at Helsinki University and Aalto University, though we haven't done anything concrete yet.

80k has some (short) pointers here: https://80000hours.org/2020/08/ideas-for-high-impact-careers-beyond-our-priority-paths/#become-a-public-intellectual

CGP Grey is great! I'm also a fan of exurb1a's channel; they have many videos with EA-adjacent themes. This one sticks out to me as moving EA content: https://youtu.be/n__42UNIhvU

3
SiebeRozendal
2y
I find exurb1a a bit too nihilistic. The creator has also been accused of highly abusive behavior, so I feel iffy about the channel. (Sorry, no time to search for the link for you.)

"Accredited Investors can join Angel Investment Networks and other exclusive communities that provide unique opportunities for high impact."

 

Can you expand on this? What kinds of opportunities are you thinking of? Funding startups that have the potential to do good in an EA sense? Influencing high-net-worth individuals' donations? Making lots of money to donate?

1
Alex Barnes
2y
All of the above. I went to the Angel Investors Ontario meeting yesterday and invited 5 founders to attend as guests. Only one was from EA; 2 were from Complexity Weekend. I introduced them to angels, VCs, someone who works for the Ontario Premier, and someone who has met Obama multiple times. At the event, the head of Angel Investors Ontario interviewed the CEO of Sanofi and the Conservative Finance Minister for Ontario. The Liberal Federal Finance Minister / Deputy Prime Minister also presented (she is a rockstar! I say that as someone who is about to renew my Conservative Party membership). I'm not sure if someone needs to run an RCT on this...

Can you give a bit more context about what you're looking for? Is this a thought experiment type of thing? 

I think it's great that these are being posted on the forum! I've often found that I'd like to discuss an episode after listening to it, but haven't known of a place with active discussion on it (other than Twitter, which feels too scattered to me).

Thanks for the post!

We'll probably be trying this out at EA Helsinki.

2
Neel Nanda
3y
I'm curious whether you ended up trying these out?
2
Neel Nanda
3y
Ah, awesome! I'd love to hear how it goes

I agree that the discussion in that subreddit is not very good.

Do you think it would be a good idea to encourage EAs in other spaces to upvote a post about this and have it be the most upvoted post on the sub, so people see it when they sort by top of all time? Currently the most upvoted post is at 261, which is not a lot.

Reasons against this:

-Vote manipulation or something

-Maybe such a post could leave a negative impression of EA (framing is very important here)

-Such a post could stay in the top even after the subreddit becomes better, although in that case ... (read more)

3
EricHerboso
3y
I would strongly argue against this, primarily because it is against Reddit's rules. Although subreddits do get to choose many of the policies in their own space, the rule against vote manipulation is enforced site-wide.

This could be relevant:

https://futureoflife.org/2020/10/15/stephen-batchelor-on-awakening-embracing-existential-risk-and-secular-buddhism/

Hey, thanks for this post! I find it quite nice.