Konstantin

427 · Zürich, Switzerland · Joined Oct 2021

Bio


Happy to chat about AI governance, meta-ethics, movement building, and many other things. Just text me and we can schedule a call!

How others can help me

Skill-building in alignment and AI governance. Want to be my study buddy? Hit me up with any projects or reading groups you want to start :)

I'm also organizing a reading group on AI governance for people with a bit of background knowledge (e.g. having done the AGISF gov track); get in touch if you'd like to join.


Please give me (anonymous) feedback on our interaction.

How I can help others

Ask me about transitioning from biology to AI governance, how to prioritize between longtermist cause areas, and how to get EA content translated into your native language.

Comments
61

Update: Now in Zurich, Switzerland, working as an RA in AI governance.

  • Looking for part-time AI gov jobs from March onward.
  • Skills gained: AI safety knowledge, Python, Writing, generalist research

I think you're basically right in your points, but they are not enough to say that climate change is nearly as bad as biorisk or AI misalignment. You may get close to nuclear risk, but I'm skeptical of that as well. My main point is that extinction from climate change is much more speculative than extinction from the other causes.

Reasons: 

  1. There is some risk of runaway climate change. However, this risk seems small according to GWWC's article, and it would be overconfident to say that humanity can't protect itself against it with future technology. There is also much more time left until we reach > 5° of warming than until the risks from engineered pathogens and powerful AI rise sharply.
  2. Climate change will be very destabilizing. However, it's very hard to predict the long-term consequences of this, so if you're motivated by a longtermist framework, you should focus on tackling the more plausible risks of engineered pathogens and misaligned AI more directly. One caveat here is the perspective of cascading risks, which EA is not taking very seriously at the moment.
  3. The impacts on quality of life are not convincing from a longtermist standpoint, as I expect them to last much less than 1,000 years, whereas humanity and its descendants could live for billions of years. I also expect only a tiny fraction of future sentient beings to live on Earth.

Another thought I often find missing in debates on x-risk from climate change is that humans would likely intervene in the climate at some stage if it became a serious threat to our economies and even our lives. I haven't seen anyone make this point before, but if it has been made elsewhere, please point me to sources.

If you are still new to EA, you may come to understand the current position better as you learn more about how pressing biorisk and especially AI risk are. That said, there is probably room for some funding for climate change from a longtermist perspective, and given the uncertainty surrounding cascading risks, I'd be happy to see a small fraction of longtermist resources directed to this problem.

Since they are not on the menu, these are the options: 

1. Risotto with green beans, zucchini, and onions with red pesto, 13.50€

2. Pasta with broccoli, onions, and beetroot sauce + roasted almonds, 12.70€

3. Fried tofu with salad, olives, dried tomatoes, cucumber, bell peppers, French dressing, and baguette, 13.90€


Please bring cash, as the restaurant doesn't accept payment by card.

Re: bioweapons convention: Good point; maybe it's not as straightforward as I described.

Re: predicting AI: You can always not publish the research you are doing, or only inform safety-focused institutions about it. I agree that there are some possible downsides to knowing more precisely when AI will be developed, but there seem to be much worse downsides to not knowing when AI will be developed (mainly that nobody is preparing for it policy- and coordination-wise).
I think the biggest risk is getting governments too excited about AI. So I'm actually not super confident that any work on this is 10x more likely to be positive.

Re: policy & alignment: I'm very confident, that there is some form of alignment work that is not speeding up capabilities, especially the more abstract one. Though I agree on interpretability. On policy, I would also be surprised if every avenue of governance was as risky as you describe. Especially laying out big picture strategies and monitoring AI development seem pretty low-risk.

Overall, I think you have done a good job scrutinizing my claims, and I'm much less confident now. Still, I'd be really surprised if every type of longtermist work were as risky as your examples, especially for someone as safety-conscious as you are. (Actually, one very positive thing might be criticizing different approaches and showing their downsides.)

Note that even if alignment research sometimes speeds up AI development, most AI safety work still makes alignment more likely overall. So I agree that there are downsides here, but it seems really wild to think that it would be better not to do any alignment research at all.

That's also my understanding. However, Will probably has some power over it, i.e. he can ask his literary agent to actively approach publishers and even offer money to foreign publishers for translating the book.

Some ideas for career paths that I think have a very low chance of terrible outcomes and a reasonable chance of doing a ton of good for the long-term future (I'm not claiming that they definitely will be net positive; I'm claiming they are more than 10x more likely to be net positive than net negative):

  • Developing early warning systems for future pandemics (and related work) (technical bio work)
  • Strengthening the bioweapons convention and building better enforcement mechanisms (bio policy)
  • Predicting how fast powerful AI is going to be developed to get strategic clarity (AI strategy)
  • Developing theories of how to align AI and reasoning about how they could fail (AI alignment research)
  • Building institutions that are ready to govern AI effectively once it starts being transformative (AI governance)

Besides these, I think that almost all the work longtermists do today has positive expected value, even if it has large downsides. Your comparison to deworming isn't perfect: failed deworming does not cause direct harm, and it is still better to give money to ineffective deworming than to do nothing.

Please try to get this book translated into as many languages as possible! I think it's a great chance to draw attention to longtermism in non-English-speaking countries too. Happy to assist with organizing a German translation!

Agree that it depends a lot on the training procedure. However, I think that given high situational awareness, we should expect the AI to know its shortcomings very well. 

So I agree that it won't be able to do a backflip on the first try. But it will know that it would likely fail, and thus it won't rely on plans that require backflips; or, if it does need backflips, it will find a way of learning them without raising suspicion (e.g. by manipulating a human into training it to do backflips).

I think overthrowing humanity is certainly hard. But it still seems possible for a patient AGI that slowly accumulates wealth and power by exploiting human conflicts and getting involved in crucial economic processes, and that then potentially gains control of military communication systems using deepfakes and the wealth and power it has accumulated. (And all of this can be done by just interacting with a computer interface, as in Cotra's example.) It's also fairly likely that there are exploits in the way humans work that we are not aware of, which the AGI would learn from being trained on tons of data, and which would make this even easier.

So overall, I agree the AGI will have bugs, but it will also know it likely has bugs and thus will be very careful with any attempts at overthrowing humanity.

Interesting perspective. Though, leaning on Cotra's recent post, if the first AGI is developed through iterations of reinforcement learning in different domains, it seems likely that it will develop a rather accurate view of the world, as that will give the highest rewards. This means the AGI will have high situational awareness, i.e. it will know that it's an AGI, and it will very likely know about human biases. I thus think it will also be aware that it contains mental bugs itself and may start actively trying to fix them (since that will be reinforced, as it gives higher rewards in the longer run).
We should therefore expect it to contain a surprisingly low number of very general bugs, such as weird ways of thinking or false assumptions in its worldview.
That's why I believe the first AGI will already be very capable and smart enough to hide for a long time until it strikes and overthrows its owners.
