How “natural” are intended generalizations (like “Do what the supervisor is hoping I’ll do, in the sense that most humans would mean this phrase rather than in a precise but malign sense”) vs. unintended ones (like “Do whatever maximizes reward”)?
I think this is an important point. I consider the question in this paper, published last year in AI Magazine. See the "Competing Models of the Goal" section, and in particular the "Arbitrary Reward Protocols" subsection. (2500 words)
I think there's something missing from the discussion here, which the key point o...
This is very high-quality. No disputes just clarifications.
I don’t just mean meta-orgs.
I think working for a well-financed grantmaking organization is not outrageously unconventional, although I suspect most lean on part-time work from well-respected academics more than OpenPhil does.
And I think 80k may just be an exception (a minor one, to some extent), borne out of an unusually clear gap in the market. I think some of their work should be done in academia instead (basically whatever work it’s possible to do), but some of the very specific stuff like the ...
Thank you for the edit, and thank you again for your interest. I'm still not sure what you mean by a person "having access to the ground truth of the universe". There's just no sense I can think of in which this is a requirement for the mentor.
"The system is only safe if the mentor knows what is safe." It's true that if the mentor kills everyone, then the combined mentor-agent system would kill everyone, but surely that fact doesn't weigh against this proposal at all. In any case, more importantly: a) the agent will not aim to kill everyone regar...
Robin Hanson didn't occur to me when I wrote it or any of the times I read it! I was just trying to channel what I thought conventional advice would be.
So basically, just philosophy, math, and some very simple applied math (like, say, the exponential growth of an epidemic), but already that last example is quite shaky.
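To make the epidemic example concrete, here's a minimal sketch (all numbers are illustrative, not epidemiological estimates):

```python
# Minimal sketch of the "exponential growth of an epidemic" example.
# All numbers are illustrative, not epidemiological estimates.

def cases_after(initial_cases: float, daily_growth_rate: float, days: int) -> float:
    """Project case counts under pure exponential growth."""
    return initial_cases * (1 + daily_growth_rate) ** days

# 100 cases growing 20% per day: over 200x as many cases after 30 days.
print(round(cases_after(100, 0.20, 30)))
```

Even this is "shaky" in the sense above: the model is only first-principles-defensible while growth is unconstrained, and the empirical question of what the growth rate actually is can't be settled from the armchair.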
In fields where it's possible to make progress with first-principles arguments/armchair reasoning, I think smart non-experts stand a chance of outperforming. I don't want to make strong claims about the likelihood of success here; I just want to say that it's a live possibility. I am much more comfortable saying that outperforming conventional wisdom is extremely unlikely on topics where first-principles arguments/armchair reasoning are insufficient.
(As it happens, EAs aren't really disputing the experts in philosophy, but that's beside the point...)
academics themselves have criticized the peer review system a great deal for various reasons, including predatory journals, incentive problems, publication bias, Why Most Published Research Findings Are False, etc
I think we could quibble on the scale and importance of all of these points, but I'm not prepared to confidently deny any of them. The important point I want to make is: compared to what alternatives? The problem is hard, and even the best solution can be expected to have many visible imperfections. How persuaded would you be by a revolutionary enumerating t...
What "major life goals should include (emphasis added)" is not a sociological question. It is not a topic that a sociology department would study. See my comment that I agree "conventional wisdom is wrong" in dismissing the philosophy of effective altruism (including the work of Peter Singer). And my remark immediately thereafter: "Yes, these are philosophical positions, not sociological ones, so it is not so outrageous to have a group of philosophers and philosophically-minded college students outperform conventional wisdom by doing first-principles...
Upvoted this.
You generally shouldn't take Forum posts as seriously as peer-reviewed papers in top journals
I suspect I would advise taking them less seriously than you would advise, but I'm not sure.
It could also imply that EA should have fewer and larger orgs, but that's a question too complicated for this comment to cover
I think there might be a weak conventional consensus in that direction, yes. By looking at the conventional wisdom on this point, we don't have to deal with the complicatedness of the question--that's kind of my whole point. But even more im...
I'm someone who has read your work (this paper and FGOIL, the latter of which I have included in a syllabus), and who would like to see more work in a similar vein, as well as more formalism in AI safety. I say this to establish my bona fides, the way you established your AI safety bona fides.
Thanks! I should have clarified it has received some interest from some people.
...you don't show that "when a certain parameter of a certain agent is set sufficiently high, the agent will not aim to kill everyone", you show something more like "when you can desig
I'd love to chat with you about directions here, if you're interested. I don't know anyone with a bigger value of p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I've published technical AI existential safety research at top ML conferences/journals, and I've gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at michael.cohen@eng.ox.ac.uk.
Glad to hear about this!
I have a recommendation for the structure of it. I'd recommend that anonymous reviewers review submissions and share their reviews with the authors (perhaps privately) before a rebuttal phase (also perhaps private). And then reviewers can revise their reviews, and then chairs can make judgments about which submissions to publish.
I constructed an agent where you can literally prove that if you set a parameter high enough, it won't try to kill everyone, while still eventually at least matching human-level intelligence. Sure it uses a realizability assumption, sure it's intractable in its current form, sure it might require an enormously long training period, but these are computer science problems, not philosophy problems, and they clearly suggest paths forward. The underlying concept is sound. It struck me as undignified to say this in the past, but maybe dignity rightly construed ...
Here are some of mine to add to Vanessa's list.
One on imitation learning. [Currently an "accept with minor revisions" at JMLR]
One on conservatism in RL. A special case of Vanessa's infra-Bayesianism. [COLT 2020]
One on containment and myopia. [IEEE]
Among many things I agree with, the part I agree with most:
EAs give high credence to non-expert investigations written by their peers, they rarely publish in peer-review journals and become increasingly dismissive of academia
I think a fair amount of the discussion of intelligence loses its bite if "intelligence" is replaced with what I take to be its definition: "the ability to succeed at a randomly sampled task" (for some reasonable distribution over tasks). But maybe you'd say that perceptions of intelligence in the EA community...
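That definition has a standard formalization along the lines of Legg and Hutter's universal intelligence measure; as a sketch:

```latex
% Intelligence of a policy \pi as expected success over a distribution of tasks:
\[
  \Upsilon(\pi) \;=\; \sum_{\mu \in E} w(\mu)\, V^{\pi}_{\mu}
\]
% E:          a class of environments (the "tasks")
% w(\mu):     the weight of task \mu -- the "reasonable distribution over tasks"
%             (Legg and Hutter take w(\mu) = 2^{-K(\mu)}, K = Kolmogorov complexity)
% V^{\pi}_{\mu}: the expected return policy \pi achieves in environment \mu
```

The choice of weighting w is exactly where the word "reasonable" is doing its work.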
But that mechanism for belief transmission within EA, i.e. object-level persuasion, doesn't run afoul of your concerns about echochamberism, I don't think.
Getting too little exposure to opposing arguments is a problem. Most arguments are informal, so they aren't necessarily even valid, and even for the ones that are, we can still doubt their premises, because there may be other sets of premises that conflict with them but are at least as plausible. If you disproportionately hear arguments from a given community, you're more likely than otherwise to be biased towards the views of that community.
it doesn't follow that it's a good investment overall
Yes, it doesn't by itself--my point was only meant as a counterargument to your claim that the efficient market hypothesis precluded the possibility of political donations being a good investment.
Well, there are >100 million people who have to join some constituency (i.e. pick a candidate), whereas potential EA recruits aren't otherwise picking between a small set of ~~cults~~ philosophical movements. Also, AI-PhD-ready people are in much shorter supply than, e.g., Iowans, and they'd be giving up much, much more than someone just casting a vote for Andrew Yang.
we've had two presidents now who actively tried to counteract mainstream views on climate change, and they haven't budged climate scientists at all.
I have updated in your direction.
Of course, AI alignment is substantially more scientifically accepted and defensible than climate skepticism.
Yep.
You only mean this as a possibility in the future, if there is any point where AGI is believed to be imminent, right?
No, I meant starting today. My impression is that coalition-building in Washington is tedious work. Scientists agreed to avoid gene editing in...
That is plausible. But "definitely" definitely wouldn't be called for when comparing Yang with Grow EA. How many EA people who could be sold on an AI PhD do you think could be recruited with $20 million?
The other thing is that in 20 years, we might want the president on the phone with very specific proposals. What are the odds they'll spend a weekend discussing AGI with Andrew Yang if Yang used to be president vs. if he didn't?
But as for what a president could actually do: create a treaty for countries to sign that bans research into AGI. Very few researchers are aiming for AGI anyway. Probably the best starting point would be to get the AI community on board with such a thing. It seems impossible today that consensus could be built about such a ...
If you're super focused on that issue, then it will definitely be better to spend your money on actual AI research, or on some kind of direct effort to push the government to consider the issue (if such an effort exists).
I am, and that's what I'm wondering. The "definitely" isn't so obvious to me. Another $20 million to MIRI vs. an increase in the probability of Yang's presidency by, let's say, 5%--I don't think it's clear cut. (And I think MIRI is the best place to fund research).
Is your claim that AI policy is currently talent-constrained, and having Yang as president would lead to more people working on it, thereby making it money-constrained?
No--just that there's perhaps a unique opportunity for cash to make a difference. Otherwise, it seems like orgs are struggling to spend money to make progress in AI policy. But that's just what I hear.
Can you elaborate on this?
First pass: power is good. Second pass: get practice doing things like autonomous weapons bans, build a consensus around getting countries to agree to intern...
Additionally, Morning Consult shows higher support than all other pollsters. The average for Steyer in early states is considerably less favorable.
Good to know.
Steyer is running ads with little competition
Really?
I am in general more trusting, so I appreciate this perspective. I know he's a huge fan of Sam Harris and has historically listened to his podcast, so I imagine he's heard Sam's thoughts (and maybe Stuart Russell's thoughts) on AGI.
The stake of the public good in any given election is much larger than the stake of any given entity, so the correct amount for altruists to invest in an election should be much larger than for a self-interested corporation or person.
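The scaling claim here can be sketched with back-of-the-envelope arithmetic; every number below is hypothetical, chosen only to show the structure:

```python
# Back-of-the-envelope sketch of the scaling argument.
# Every number here is hypothetical, chosen only to show the structure.

voters = 100_000_000        # hypothetical number of people affected by the election
stake_per_person = 10.0     # hypothetical per-person value of the better outcome ($)
win_prob_gain = 0.01        # hypothetical bump in win probability from a donation

# A self-interested actor should pay at most its own expected gain;
# an altruist weighs everyone's expected gain.
selfish_willingness = stake_per_person * win_prob_gain
altruist_willingness = voters * stake_per_person * win_prob_gain

print(selfish_willingness, altruist_willingness)
```

The particular numbers don't matter; the point is that the altruist's willingness to pay scales linearly with the size of the affected population, while the self-interested actor's doesn't.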
not that he single-handedly caused Trump's victory.
Didn't claim this.
This is naive.
Not sure what this adds.
MIRI's current size seems to me to be approximately right for this purpose, and as far as I know MIRI staff don't think MIRI is too small to continue making steady progress.
My guess is that this intuition is relatively inelastic to MIRI's size. It might be worth trying to generate the counterfactual intuition here if MIRI were half its size or double its size. If that process outputs a similar intuition, it might be worth attempting to forget how many people MIRI employs in this area, and ask how many people should be working on a topic that by your est...
Do you have a minute to react to this? Are you satisfied with my response?