

Thanks for this great post! Really fascinating!

Sorry if this was already asked, but I couldn't see it: how likely is it that pathogens would be able to develop resistance to UVC, and how quickly might that happen? If it did happen, how big a concern would it be? E.g. would it just be a return to the status quo, or would it be an overcorrection?

I really like the way Derek Parfit distinguishes between consequentialist and non-consequentialist theories in 'Reasons and Persons'.

All moral theories give people aims. A consequentialist theory gives everyone the same aims (e.g. maximize total happiness). A non-consequentialist theory gives different people different aims (e.g. look after your own family).

There is a really important difference there. Not all moral theories are consequentialist.

Thanks for writing this up, this is a really interesting idea.

Personally, I find points 4, 5, and 6 really unconvincing. Are there any stronger arguments for these, that don't consist of pointing to a weird example and then appealing to the intuition that "it would be weird if this thing was conscious"?

Because to me, my intuition tells me that all these examples would be conscious. This means I find the arguments unconvincing, but also hard to argue against!

But overall I get that given the uncertainty around what consciousness is, it might be a good idea to use implementation considerations to hedge our bets. This is a nice post.

I think this is an interesting question, and I don't know the answer.

I think two quite distinct ideas are being conflated in your post though: (i) 'earning to give' and (ii) the GWWC 10% pledge.

These concepts are very different in my head.

'Earning to give': When choosing a career with the aim of doing good, some people should pick a career to maximize their income (perhaps subject to some ethical constraints), and then give a lot of it away to effective causes (probably a lot more than 10%). This idea tells you which jobs you should decide to work in.

GWWC pledge: Pretty much whoever you are, if you've got a decent income in a rich country, you should give 10% of it away to effective causes. This idea says nothing about which jobs you should be working in.

I think these two ideas are very different.

'Earning to give' gets a lot of criticism from people outside EA, but I don't see much criticism of the idea of donating 10% of your income. Sure, you can call the amount arbitrary and dispute the extent to which it is an obligation, but I think even major critics of EA often concede that the 10% pledge is still an admirable thing to do.

Thanks for this reply! That makes sense. Do you know how likely people in the field think it is that AGI will come from just scaling up LLMs vs requiring some big new conceptual breakthrough? I hear people talk about this question but don't have much sense about what the consensus is among the people most concerned about AI safety (if there is a consensus).

I've seen people already building AI 'agents' using GPT. One crucial component seems to be giving it a scratchpad to have an internal monologue with itself, rather than forcing it to immediately give you an answer.

If the path to agent-like AI ends up emerging from this kind of approach, wouldn't that make AI safety really easy? We can just read their minds and check what their intentions are?

 Holden Karnofsky talks about 'digital neuroscience' being a promising approach to AI safety, where we figure out how to read the minds of AI agents. And for current GPT agents, it seems completely trivial to do that: you can literally just read their internal monologue in English and see exactly what they're planning!

I'm sure there are lots of good reasons not to get too hopeful based on this early property of AI agents, although for some of the immediate objections I can think of I can also think of responses. I'd be interested to read a discussion of what the implications of current GPT 'agents' are for AI safety prospects.

A few reasons I can think of for not being too hopeful, and my thoughts:

  • Maybe AGI will look more like the opaque ChatGPT mode of working than the more transparent GPT 'agent' mode. (Maybe this is true, although ChatGPT mode seems to have some serious blind spots that come from its lack of a working memory. E.g. if I give it 2 sentences and just ask it which sentence has more words in it, it usually gets it wrong. But if I ask it to write the words in each sentence out in a numbered list first, thereby giving it permission to use the output box to do its working, then it gets it right. It makes intuitive sense to me that agent-like GPTs with a scratchpad would perform much better at general tasks and would be what superhuman AIs would look like).
  • Maybe future language model agents will not write their internal monologue in English, but use some more incomprehensible compressed format instead. Or they will generate so much internal monologue that it will be really hard to check it all. (Maybe. It seems pretty likely that they wouldn't use normal English. But it also feels likely that decoding this format and automatically checking for harmful intentions wouldn't be too hard i.e. easily doable with current natural language processing technology. As long as it's easier to read thoughts than to generate thoughts, it seems like we'd still have a lot of reason to be optimistic about AI safety).
  • Maybe the nefarious intentions of the AI will hide in the opaque neural weights of the language model, rather than in the transparent internal monologue of the agent. (This feels unlikely to me, for similar reasons to why the first bullet point feels unlikely. It feels like complex planning of the kind AI safety people worry about is going to require a scratchpad and an iterative thought process, not a single pass through a memoryless neural network. If I think about myself, a lot of the things my brain does are opaque, not just to outsiders, but to me too! I might not know why a particular thought pops into my head at a particular moment, and I certainly don't know how I resolve separate objects from the image that my eyes create. But if you ask me at a high level what I've been thinking about in the last 5 minutes, I can probably explain it pretty well. This part of my thinking is internally transparent. And I think it's these kinds of thoughts that a potential adversary might actually be interested in reading, if they could. Maybe the same will be true of AI? It seems likely to me that the interesting parts will still be internally transparent. And maybe for an AI, the internally transparent parts will also be externally transparent? Or at least, much easier to decipher than they are to create, which should be all that matters)
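To make the "reading thoughts is easier than generating thoughts" point concrete, here's a toy sketch of what a scratchpad monitor might look like. This is purely my own illustration, not an existing tool: the `RED_FLAGS` list and `monologue_is_safe` function are stand-ins, and a real monitor would presumably use a trained classifier rather than keyword matching.

```python
# Toy sketch (a hypothetical, not an existing tool): scan an agent's
# English scratchpad for red-flag intentions before letting it act.
# The point is only that checking text is far cheaper than generating it.
RED_FLAGS = [
    "disable oversight",
    "hide this from",
    "acquire resources covertly",
]

def monologue_is_safe(scratchpad: str) -> bool:
    """Return True if none of the red-flag phrases appear in the monologue."""
    text = scratchpad.lower()
    return not any(flag in text for flag in RED_FLAGS)

print(monologue_is_safe("Step 1: count the words in each sentence."))  # True
print(monologue_is_safe("First, hide this from the operators."))       # False
```

Even this crude filter illustrates the asymmetry: the monitor runs in linear time over the transcript, while producing the plan it checks took an entire forward pass (or many) of a large model.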

A final thought/concern/question: if 'digital neuroscience' did turn out to be really easy, I'd be much less concerned about the welfare of humans, and I'd start to be a lot more concerned about the welfare of the AIs themselves. It would make them very easily exploitable, and if they were sentient as well then it seems like there's a lot of scope for some pretty horrific abuses here. Is this a legitimate concern?

Sorry this is such a long comment, I almost wrote this up as a forum post. But these are very uninformed naive musings that I'm just looking for some pointers on, so when I saw this pinned post I thought I should probably put it here instead! I'd be keen to read comments from anyone who's got more informed thoughts on this!

I really like this argument. I think there's another way of framing it that occurred to me when reading it, that I also found insightful (though it may already be obvious):

  • Suppose the value of your candidate winning is X, and their probability of winning if you don't do anything is p.
  • If you could buy all the votes, you would pay X(1-p) to do so (value of your candidate winning minus a correction because they could have won anyway). This works out at X(1-p)/N per vote on average.
  • If p>1/2, then buying votes probably has diminishing returns (certainly this is implied by the unimodal assumption).
  • Therefore, if p>1/2, the amount you would pay for a single vote must be bounded below by X(1-p)/N.
  • If p<1/2, I think you can just suppose that you are in a zero-sum game with the opposition party(ies), and take their perspective instead to get the same bound reflected about p=1/2.

The lower bound this gives seems less strict (X/2N in the case that p=1/2, instead of X/N), but it helps me understand intuitively why the answer has to come out this way, and why the value of contributing to voting is directly analogous to the value of contributing to, say, Parfit's water tank for the injured soldiers, even though there are no probabilities involved there.
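As a numeric sanity check of the bound above, here's a quick calculation with made-up numbers (X, N, and p are all hypothetical choices of mine, not figures from the post):

```python
# Hypothetical numbers, purely for illustration:
X = 1_000_000_000  # value (to you) of your candidate winning
N = 10_000_000     # number of votes in play
p = 0.5            # probability your candidate wins without your help

# If you could buy all N votes you'd pay X*(1-p), i.e. this much per vote
# on average; with diminishing returns (p >= 1/2), a single marginal vote
# is worth at least this average.
per_vote_lower_bound = X * (1 - p) / N
print(per_vote_lower_bound)  # 50.0
```

So even with p at its least favourable value of 1/2, the per-vote value only falls by a factor of 2 relative to the simple X/N estimate, not by orders of magnitude.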

If as a group you do something with value O(1), then the value of individual contributions should usually be O(1/N), since value (even in expectation) is additive.

Point taken, although I think this is analogous to saying: Counterfactual analysis will not leave us predictably worse off if we get the probabilities of others deciding to contribute right.

Thank you for this correction, I think you're right! I had misunderstood how to apply Shapley values here, and I appreciate you taking the time to work through this in detail.

If I understand correctly now, the right way to apply Shapley values to this problem (with X=8, Y=2) is not to work with N (the number of players who end up contributing, which is unknown), but instead to work with N', the number of 'live' players who could contribute (known with certainty here, not something you can select), and then:

  • N'=3, the number of 'live' players who are deciding whether to contribute.
  • With N'=3, the Shapley value of the coordination is 1/3 for each player (expected value of 1 split between 3 people), which is positive.
  • A positive Shapley value means that all players decide to contribute (if basing their decisions off Shapley values as advocated in this post), and you then end up with N=3.

Have I understood the Shapley value approach correctly? If so, I think my final conclusion still stands (even if for the wrong reasons) that a Shapley value analysis will lead to a sub-optimal N (number of players deciding to participate), since the optimal N here is 2 (or 1, which has the same value).
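If my reading is right, the arithmetic can be sketched directly. Below is a minimal Shapley-value calculation; the characteristic function `v` is a stand-in of my own (it just assumes the coordination yields expected value 1 whenever at least one of the N'=3 'live' players contributes), not the exact game from the post, but it reproduces the 1/3-per-player figure by symmetry.

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal
    contribution to v over all orderings of the players."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            values[p] += v(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in values.items()}

# Stand-in characteristic function (an assumption, not the post's exact
# game): the coordination is worth 1 as soon as anyone contributes.
v = lambda coalition: 1.0 if coalition else 0.0
print(shapley_values(["A", "B", "C"], v))  # each player gets 1/3
```

Since each player's 1/3 share is positive, all three contribute under the "contribute iff your Shapley value is positive" rule, giving N=3 rather than the optimal N=2.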

As for whether the framing of the problem makes sense, with N as something we can select, the point I was making was that in a lot of real-world situations, N might well be something we can select. If a group of people have the same goals, they can coordinate to choose N, and then you're not really in a game-theory situation at all. (This wasn't a central point to my original comment but was the point I was defending in the comment you're responding to)

Even if you don't all have exactly the same goals, or if there's a lot of actors, it seems like you'll often be able to benefit by communicating and coordinating, and then you'll be able to improve over the approach of everyone deciding independently according to a Shapley value estimate: e.g. Givewell recommending a funding allocation split between their top charities.
