But the difficulty of alignment doesn't seem to imply much about whether slowing is good or bad, or about its priority relative to other goals.
At the extremes, if alignment to "good" values by default were 100% likely, I presume slowing down would be net-negative and racing ahead would look great. It's unclear to me where the tipping point is, i.e. what kind of distribution over alignment difficulty levels one would need to hold to tip from wanting to speed up to wanting to slow down AI progress.
Seems to me like the more longtermist one is, the more sl...
How is "secretly planning to murder all humans" improving the model's scores on a benchmark?
(I personally don't find this likely, so this might accidentally be a strawman)
For example: planning and gaining knowledge are incentivized on many benchmarks -> instrumental convergence makes model instrumentally value power among other things -> a very advanced system that is great at long-term planning might conclude that "murdering all humans" is useful for power or other instrumentally convergent goals
...You could prove this. Make a psychopathic
GPT-4 doesn't have the internal bits which make inner alignment a relevant concern.
Is this commonly agreed upon even after fine-tuning with RLHF? I assumed it's an open empirical question. The way I understand it is that there's a reward signal (human feedback) shaping different parts of the neural network that determines GPT-4's outputs, and we don't have good enough interpretability techniques to know whether some parts of the neural network are representations of "goals", and even less so what those specific goals are.
I would've thought it's an ope...
Anyone else not able to join the group through the link? 🤔 It just redirects me to the dashboard without adding me in
In those cases I would interpret agree votes as "I'm also thankful" or "this has also given me a lot to think about"
I think the stated reasoning there by OP is that it's important to influence OpenAI's leadership's stance and OpenAI's work on AI existential safety. Do you think this is unreasonable?
To be fair I do think it makes a lot of sense to invoke nepotism here. I would be highly suspicious of the grant if I didn't happen to place a lot of trust in Holden Karnofsky and OP.
(feel free to not respond, I'm just curious)
I think if I were issuing grants, I would use such language in the letter to make it more likely that the grantee organization can get registered despite bureaucratic hurdles. It's also possible to mention this to the grantee in an email or call so as not to cause any confusion. My guess would be that that's what happened here, but that's just my 2 cents. I have no relevant expertise.
Thanks for the comment! I feel funny saying this without being the author, but I felt the rest of my comment was a bit cold in tone, so it seemed appropriate to add this :)
I lean more moral anti-realist, but I struggle to see how the concepts of "value alignment" and "decision-making quality" are any less orthogonal under a moral realist view than under an anti-realist view.
Moral realist frame: "The more the institution is intending to do things according to the 'true moral view', the more it's value-aligned."
"The better the institutions...
(not the author)
4. When I hear "(1) IIDM can improve our intellectual and political environment", I'm imagining something like if the concept of steelmanning becomes common in public discourse, we might expect that to indirectly lead to better decisions by key institutions.
Does anyone have thoughts on
How does the FTX situation affect the EV of running such a survey? My first intuition is that running one while the situation is so fresh is worse than waiting 3-6 months, but I can't properly articulate why.
What, if any, are some questions that should be added, changed, or removed given the FTX situation?
For what it's worth, connecting SBF and Musk might've been a time-sensitive situation for one reason or another. There would've also still been time to debate the investment in the larger community before the deal actually went through.
(After writing this I thought of one example where the goals are in conflict: permanent surveillance that stops the development of advanced AI systems. Thought I'd still post this in case others have similar thoughts. Would also be interested in hearing other examples.)
I'm assuming a reasonable interpretation of the proxy goal of safety means roughly this: "be reasonably sure that we can prevent AI systems we expect to be built from causing harm". Is this a good interpretation? If so, when is this proxy goal in conflict with the goal of having "thing...
I'd be interested in the historical record for similar industries, could you quickly list some examples that come to your mind? No need to elaborate much.
Interesting, I hadn't thought of the anchoring effect you mention. One way to test this might be to poll the same audience about other, more outlandish claims, something like the probability of x-risk from alien invasion, or CERN accidentally creating a black hole.
disclaimer: I don't feel like I know much about wild animal welfare; I last read about it around 2 years ago
You're right, I think suffering-focused wasn't the right term to use, as all WAW interventions that come to my mind are about reducing animals' suffering. I should've asked if you're assuming that WAW people think that:
I would guess that (1) is a common belief, but that only a minority of people who work in WAW belie...
Doesn't this depend on assuming negative utilitarianism, a suffering-focused ethic, or a particular set of assumptions about the net pleasure vs pain in the life of an 'average' animal?
I don't think it depends on those things. What they meant by species not being inherently valuable is that each individual of a species is inherently valuable: the claim is that a species' value comes from the value of its individuals (not counting value from things like making ecological collapse less likely, etc.).
(I only read the beginning of your comment, sorry for not responding to the rest!)
To the extent that moral uncertainty pushes you to give more credence to common sense ethical views, it does point towards prioritizing biodiversity more than a consequentialist view would otherwise imply, as "let's preserve species" and "let's preserve option value" are common sense ethical views. Probably not enough to affect prioritization in practice though.
How does biodiversity conflict with WAW? I would imagine there are many possible interventions that are good both for increasing the wellbeing of animals in the wild and for keeping species from going extinct. Are you assuming a suffering-focused view of WAW?
Can you expand on this? Is it that anything to do with genes is controversial? Maybe also the possibility that success in this could increase rich people's societal advantages over poor people even more? (I listened to the post yesterday and might've forgotten some key points)
Great job on the talk! :)
I'd be curious to know in more detail how giving the books to the audience was done
Thanks a lot for this comment! I think delving into the topic of epistemic learned helplessness will help me learn how to form proper inside views, which is something I've been struggling with.
I'm very worried about this ceasing to be the case.
Are you worried just because it would be really bad if EA in the future (say 5 years) was much worse at coming to correct conclusions, or also because you think it's likely that will happen?
Thanks for the post! I especially enjoyed the mini EA forum literature review aspect of it. 😄
I personally definitely feel a disconnect between my intellectual understanding and feelings about suffering in the world, and am hoping meditation will help me have my emotions match my understanding more.
I wonder how one could explain the pleasures of learning about a subject as contentment, relief, or anticipated relief. Maybe they'd describe it as getting rid of the suffering-inducing desire for knowledge / acceptance from peers / whatever motivates people to learn?
I'm sure it would be possible to find meditators who came to the opposite conclusion about well-being.
If someone reading this happens to know of any I'd be interested to know! I wouldn't be that surprised if they were very rare, since my (layman) impression is that Buddhism aligns well with suffering-focused ethics, and I assume most meditators are influenced by Buddhism.
We've also been toying around with this idea at the University of Helsinki and Aalto University; we haven't done anything concrete yet, though.
80k has some (short) pointers here: https://80000hours.org/2020/08/ideas-for-high-impact-careers-beyond-our-priority-paths/#become-a-public-intellectual
CGP Grey is great! I'm also a fan of exurb1a's channel, they have many videos with EA-adjacent themes. This one sticks out to me as moving EA content: https://youtu.be/n__42UNIhvU
"Accredited Investors can join Angel Investment Networks and other exclusive communities that provide unique opportunities for high impact."
Can you expand on this, what kinds of opportunities are you thinking of? Funding startups that have potential to do good in an EA sense? Influencing high net worth individuals' donations? Making lots of money to donate?
Can you give a bit more context about what you're looking for? Is this a thought experiment type of thing?
I think it's great that these are being posted on the forum! I've often found that I'd like to discuss an episode after listening to it, but haven't known of a place with active discussion on it (other than twitter, which feels too scattered to me).
I agree that the discussion in that subreddit is not very good.
Do you think it would be a good idea to encourage EAs in other spaces to upvote a post about this so that it becomes the most upvoted post on the sub? That way people would see it when they sort by top of all time. Currently the most upvoted post is at 261, which isn't a lot.
Reasons against this:
-Vote manipulation or something
-Maybe such a post could leave a negative impression of EA (framing is very important here)
-Such a post could stay in the top even after the subreddit becomes better, although in that case ...
This could be relevant:
https://futureoflife.org/2020/10/15/stephen-batchelor-on-awakening-embracing-existential-risk-and-secular-buddhism/
Naively, I would trade a lot of clearly-safe stuff being delayed or temporarily prohibited for even a minor decrease in the chance of safe-seeming-but-actually-dangerous stuff going through, which pushes me towards favoring a more expansive scope of regulation.
(in my mind the potential loss of decades of life improvements currently pales next to the potential non-existence of all lives in the longterm future)
I don't know how to think about it when accounting for public opinion, though. I expect a larger scope would gather more opposition to regulation, which could be detrimental in various ways, the most obvious being a decreased likelihood of such regulation being passed, upheld, or disseminated to other places.