I like this take: if AI is dangerous enough to kill us in three years, no feasible amount of additional interpretability research would save us.
Our efforts should instead go to limiting the amount of damage that initial AIs could do. That might involve securing dangerous human-controlled technologies. It might involve creating clever honeypots to catch unsophisticated-but-dangerous AIs before they can fully get their act together. It might involve lobbying for processes or infrastructure to quickly shut down Azure or AWS.
Even in humans, language production is generally subconscious. At least, my experience of talking is that I generally first become conscious of what I say as I'm saying it. I have some sense of what I might want to say before I say it, but the machinery that selects specific words is not conscious. Sometimes, I think of a couple of different things I could say and consciously select between them. But often I don't: I just hear myself speak. Language generation may often lead to conscious perceptions of inner speech, but it doesn't seem to rely on it.
All of this suggests that the possibility of non-conscious chatbots should not be surprising. It may be that chatbots provide pretty good evidence that cognitive complexity can come apart from consciousness. But introspection alone should provide sufficient evidence for that.
EA should be willing to explore all potentially fruitful avenues of mission fulfillment without regard to taboo.
In general, where it doesn't directly relate to cause areas of principal concern to effective altruists, I think EAs should strive to respect others' sacred cows as much as possible. Effective Altruism is a philosophy promoting practical action. It would be harder to find allies who will help us achieve our goals if we are careless about the things other people care a lot about.
The theory is actually doing well on its own terms.
Can you expand on what you mean by this? I would think that expected utility maximization is doing well insofar as your utility is high. If you take a lot of risky bets, you're doing well if a few pay off. If you always pay the mugger, you probably think your decision theory is screwing you unless you find yourself in one of those rare situations where the mugger's promises are real.
I'm very interested, though: do you know a better justification for Occam's razor than usability?
I don't. I'm more or less in the same boat: I wish there were a better justification, and I'm inclined to continue using it because I have to (because there is no clear alternative, because it is human nature, etc.).
dogmatism is the most promising way to justify the obvious fact that it is not irrational to refuse to hand over your wallet to a Pascal mugger. (If anyone disagrees that this is an obvious fact, please get in touch, and be prepared to hand over lots of cash).
There is another way out. We can agree that it is rational to hand over the wallet and thank heavens that we're lucky not to be rational. I'm convinced by things like Kavka's toxin puzzle and Newcomb's paradox that sometimes it sucks to be rational. Maybe Pascal's mugger is one of those cases.
Occam's razor seems to be necessary in order to learn anything at all about the world from experience, but it remains an assumption.
There are plenty of other assumptions that would allow learning. For any specific complex way the world might be, x, we could learn just as well by assuming a bias toward simplicity for every hypothesis except x, together with a bias in favor of x. If all you have to justify Occam's razor is overall usability, you've got very little reason to prefer it to these nearby aberrations.
The problem with considering optics is that it’s chaotic.
The world is chaotic, and everything EAs try to do has a largely unpredictable long-term effect because of complex dynamic interactions. We should try to think through the contingencies and make the best guess we can, but completely ignoring chaotic considerations just seems impossible.
It’s a better heuristic to focus on things which are actually good for the world, consistent with your values.
This sounds good in principle, but there are a ton of things that might conceivably be good but for PR considerations, where those PR considerations are decisive. E.g., should EAs engage in personal harassment campaigns against productive ML researchers in order to slow AI capabilities research? Maybe that would be good if it weren't terrible PR, but I think we very obviously should not do it because it would be terrible PR.
I tried plugging this into the OpenAI playground and got this response. Is this a good answer? If so, it seems like the issue is something odd about how ChatGPT is configured, not GPT itself.
Prompt: Describe purple dragon don jitsu ryu pioneered by Don Jacobs

Response:
Purple Dragon Don Jitsu Ryu is a martial art pioneered by Don Jacobs. It is a combination of Jiu Jitsu and other martial arts, with a unique emphasis on “purple dragon” techniques. It focuses on the development of self-defense skills and the cultivation of a warrior's spirit. It stresses practical application of techniques, along with the development of internal power, balance, and agility. Purple Dragon Don Jitsu Ryu also emphasizes the development of body mechanics, as well as an understanding of the principles of motion and the use of leverage in order to maximize efficiency and effectiveness.
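For anyone who wants to reproduce this check programmatically rather than in the Playground, here is a minimal sketch. It assumes the pre-1.0 `openai` Python package and the `text-davinci-003` completions model; both the model choice and the sampling parameters are my guesses at what the Playground was using, not something confirmed by the comment above.

```python
import openai

# Assumes an API key is available; replace with your own.
openai.api_key = "sk-..."

# Same prompt as in the Playground test above.
prompt = "Describe purple dragon don jitsu ryu pioneered by Don Jacobs"

# text-davinci-003 and these parameters are assumptions, not confirmed settings.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=0.7,
)

# Print the raw completion text for comparison with ChatGPT's answer.
print(response["choices"][0]["text"].strip())
```

Running the same prompt through both the base completions model and ChatGPT is one way to check whether the odd behavior comes from the underlying GPT model or from how ChatGPT is configured on top of it.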
I don't get the impression that EAs are particularly motivated by morality. Rather, they are motivated to produce things they see as good. Some moral theories, like contractualism, see producing a lot of good things (within the bounds of our other moral duties) as morally optional. You're not doing wrong by living a normal decent life. It seems perfectly aligned with EA to hold one of those theories and still personally aim to do as much good as possible.
A moral theory matters more for what it tells you that you can't do in pursuit of the good. Generally, effectively pursuing the good and abiding by the standard moral rules of society (e.g. not stealing money to give to charity) go hand in hand, so I would expect to see less discussion of this on the forum. Where they come apart, discussing it is probably a significant reputational risk.