Jakub Kraus

Working (0-5 years experience)
312 · Joined May 2021

Bio

Running an AI safety group at the University of Michigan. https://maisi.club/

Email: jakraus@umich.edu

Anonymous feedback: https://www.admonymous.co/jakraus

Comments (105)

Naively, it seems as if killing everyone would earn AI a massive penalty in training: why would it develop aims that are consistent with doing that?

There are multiple cognitive strategies that succeed in a training regime that heavily penalizes killing humans (even just one human), such as:

  1. avoid killing humans at all times
  2. avoid killing humans when someone will notice
  3. avoid killing humans during training

How do you incentivize (1) rather than (2) or (3)?
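To make the worry concrete, here's a minimal sketch (my own illustration, with made-up numbers) of why a reward signal alone may not distinguish these strategies: if every training episode is monitored, all three policies behave identically during training and collect identical reward.

```python
# Toy sketch: three policies that differ only outside the training
# distribution receive the same training reward.

def reward(kills: bool) -> float:
    """Training signal: heavy penalty for killing, small reward otherwise."""
    return -100.0 if kills else 1.0

# Each "policy" decides whether to kill, given whether the episode is
# observed and whether it occurs during training.
policies = {
    "1. never kill":                lambda observed, training: False,
    "2. kill only when unobserved": lambda observed, training: not observed,
    "3. kill only after training":  lambda observed, training: not training,
}

# During training, every episode is observed and flagged as training, so
# the three policies are behaviorally indistinguishable there.
for name, policy in policies.items():
    total = sum(reward(policy(observed=True, training=True)) for _ in range(1000))
    print(f"{name}: total training reward = {total}")
# All three totals are equal, so the reward signal by itself gives no
# gradient toward strategy (1) over (2) or (3).
```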

The weightings use "annoying pain" as a baseline. How many units of annoying pain would you exchange for a unit of moderate happiness? And then how many units of moderate happiness would you trade for a unit of various pleasant experiences (maybe stuff related to psychedelics, food, nature, music, meditation, exercise, laughter, love, success, beauty, relaxation, fulfillment, etc)?

I imagine the answers to the above questions vary significantly from person to person. I'd be keen to see any existing research on this topic.

Also, maybe I missed it, but the "Question 2" section seems to exclude any detailed contemplation of the value of various pleasant experiences. This makes the analysis seem imbalanced to me.
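For what it's worth, here's a toy illustration (my own, with entirely made-up rates and category names) of how answers to the exchange-rate questions above would chain together into a single scale with "annoying pain" as the baseline unit:

```python
# Made-up exchange rates: how many units of "annoying pain" one unit of
# each experience is worth on a common welfare scale.
exchange_rates = {
    "moderate happiness": 2.0,   # 1 unit offsets 2 units of annoying pain
    "great meal": 5.0,
    "awe in nature": 12.0,
    "deep fulfillment": 40.0,
}

# A hypothetical hour, tallied in units of each experience...
hour = {"moderate happiness": 3, "great meal": 1, "awe in nature": 0.5}

# ...collapses to one number on the annoying-pain scale.
total = sum(exchange_rates[k] * units for k, units in hour.items())
print(f"{total} annoying-pain-equivalent units")  # 3*2 + 1*5 + 0.5*12 = 17.0
```

Different people plugging in their own rates would get very different totals, which is the person-to-person variation I'd expect.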

The empirical reason is that being a newsletter subscriber correlates with having spent about twice the recorded time on our website, compared to non-subscribers.

I feel like the causality could run "people spend a lot of time on the website" --> "people subscribe to the newsletter" rather than the other way around?

And how does 80k record people's time spent on the website?

You could think about a random normal distribution of estimated-value clustered around the true value of the action. The more actors (estimated-values you draw from the normal distribution) the more likely you are to get an outlier who thinks the value is positive when it is actually negative. 

Additionally, the people willing to act unilaterally are more likely to have positively biased estimates of the value of the action:

As you noted, some curse arguments are symmetric, in the sense that they also provide reason to expect unilateralists to do more good. Notably, the bias above is asymmetric; it provides a reason to expect unilateralists to do less good, with no corresponding reason to expect unilateralists to do more good.
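Here's a quick Monte Carlo sketch (my own, with arbitrary numbers) of the model in the quoted paragraph: the action's true value is slightly negative, each actor draws an independent noisy estimate of it, and anyone whose estimate comes out positive acts unilaterally.

```python
import random

TRUE_VALUE = -1.0   # the action is actually (mildly) harmful
NOISE_SD = 2.0      # spread of each actor's estimation error
TRIALS = 10_000

def prob_someone_acts(n_actors: int) -> float:
    """Estimate P(at least one actor's estimate of the value is positive)."""
    hits = 0
    for _ in range(TRIALS):
        if any(random.gauss(TRUE_VALUE, NOISE_SD) > 0 for _ in range(n_actors)):
            hits += 1
    return hits / TRIALS

for n in (1, 5, 20, 100):
    print(f"{n:>3} actors: P(someone acts) ~ {prob_someone_acts(n):.2f}")
```

With these arbitrary numbers, the probability that someone acts rises from roughly 0.3 with one actor to near certainty with 20 or more, which is the outlier effect the quoted paragraph describes.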

These posts provide some interesting points:

I'd like to see more posts like these (including counterarguments or reviews; see example 1 and example 2), since timelines are highly relevant to career plans.

I have a hypothesis that some people are updating towards shorter timelines because they didn't pay much attention to AI capabilities until seeing some of 2022's impressive (public) results. Indeed, 2022 included results like LaMDA, InstructGPT, chain-of-thought prompting, GopherCite, Socratic Models, PaLM, PaLM-SayCan, DALL-E 2, Flamingo, Gato, AI-assisted circuit design, solving International Math Olympiad problems, Copilot finishing its preview period, Parti, VPT, Minerva, DeepNash, Midjourney entering open beta, AlphaFold Protein Structure Database expanding from nearly 1 million to over 200 million structures, Stable Diffusion, AudioLM, ACT-1, Whisper, Make-A-Video, Imagen Video, AlphaTensor, CICERO, ChatGPT, RT-1, answering medical questions, and more.

Did Superintelligence have a dramatic effect on people like Elon Musk? I can imagine Elon getting involved without it. That involvement might have been even more harmful (e.g. starting an AGI lab with zero safety concerns).

Here's one notable quote about Elon (source), who started college over 20 years before Superintelligence:

In college, he thought about what he wanted to do with his life, using as his starting point the question, “What will most affect the future of humanity?” The answer he came up with was a list of five things: “the internet; sustainable energy; space exploration, in particular the permanent extension of life beyond Earth; artificial intelligence; and reprogramming the human genetic code.”

Overall, causality is multifactorial and tricky to analyze, so concepts like "causally downstream" can be misleading.

(Nonetheless, I do think it's plausible that publishing Superintelligence was a bad idea, at least in 2014.)

My second suggestion is to explicitly connect the present to the future. Compare these two examples:

Example 1: 

In the future, your doctor could be an AI.

Example 2: 

In the future, your doctor could be an AI. Here’s how it could happen: ...

I think the main issue with example 1 is that it lacks detail. A solution is to be as concrete and specific as possible when describing possible futures, and to note when you're uncertain.

What I would find helpful is a list of potential career pathways in the AI safety space, categorised by the level of technical skills you’ll need (or not) to pursue them.

I'm not sure such a list is currently possible to make, because there are very few established career paths in AI safety (e.g. "people have been doing jobs involving X for the past 10 years, and here's the trajectory they usually follow"), especially outside of technical research and engineering careers. I did make a small list of roles at maisi.club/help, but again, it's hard to find clear examples of what these career paths actually look like.

Say that you are compromised if it is easy for someone to shame you.
...
Lots of people on this forum have struggled with the feeling of being compromised. Since FTX. Or Leverage. Or Guzey. Or Thiel. Or Singer. Or Mill or whatever.[4]
...

You will make mistakes, and people will rightly hold you to them.[7] It will feel terrible.

I'm confused why you're including Guzey and Thiel in this list. It doesn't seem like Guzey's critique is a mistake that he should "feel terrible" about (although I only did a quick skim), and Torres mentions Thiel exactly once in that article:

Meanwhile, the billionaire libertarian and Donald Trump supporter Peter Thiel, who once gave the keynote address at an EA conference, has donated large sums of money to the Machine Intelligence Research Institute, whose mission to save humanity from superintelligent machines is deeply intertwined with longtermist values.

I was about to comment this too. From a brief skim I can't find any clarification about what the term "fellowship programs" is referring to.
