Those aspects are getting weaker, but the ability of ML to model humans is getting stronger, and there are other “computer acting as salesperson” channels which don’t go through Privacy Sandbox. But probably I’m just misusing the term “ad tech” here, and “convince someone to buy something” tech might be a better term.
Saying that consequentialist theories are “often agent neutral” may only add confusion, as it’s not a part of the definition and indeed “consequentialism can be agent non-neutral” is part of what separates it from utilitarianism.
Congratulations on the switch!
I enjoyed your ads blog post, by the way. Might be fun to discuss that sometime, both because (1) I’m funded by ads and (2) I’m curious how the picture will shift as ad tech gets stronger.
Nice to see you here, Ferenc! We’ve talked before when I was at OpenAI and you at Twitter, and always happy to chat if you’re pondering safety things these days.
In outer alignment one can write down a correspondence between ML training schemes that learn from human feedback and complexity classes related to interactive proof schemes. If we model the human as a (choosable) polynomial time algorithm, then
1. Debate and amplification get to PSPACE, and more generally $n$-step debate gets to $\Sigma_n^P$.
2. Cross-examination gets to NEXP.
3. If one allows opaque pointers, there are schemes that go further: market making gets to R.
Moreover, we informally have constraints on which schemes are practical based on prope...
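To make the debate-to-PSPACE direction concrete, here is a toy sketch I'm adding (not part of the original comment) of the standard reduction: polynomial-round debate can decide TQBF, a PSPACE-complete problem, with the proposer choosing existential variables, the opponent choosing universal ones, and a polynomial-time judge evaluating only the final fully assigned formula. The brute-force recursion below just computes the value of that game; the point is that each debate transcript is polynomially long and the judge's work is polynomial.

```python
from typing import Callable, Dict, List, Tuple

Formula = Callable[[Dict[str, bool]], bool]  # leaf predicate the judge can check in poly time

def debate_value(quantifiers: List[Tuple[str, str]], formula: Formula,
                 assignment: Dict[str, bool] = None) -> bool:
    """Value of the debate game: 'exists' moves belong to the proposer,
    'forall' moves to the opponent; both are assumed to play optimally."""
    assignment = dict(assignment or {})
    if not quantifiers:
        return formula(assignment)  # the only step the judge performs
    (kind, var), rest = quantifiers[0], quantifiers[1:]
    branches = [debate_value(rest, formula, {**assignment, var: b})
                for b in (False, True)]
    return any(branches) if kind == "exists" else all(branches)

# Example: "forall x exists y. (x != y)" is true, since y can always be chosen to differ.
print(debate_value([("forall", "x"), ("exists", "y")], lambda a: a["x"] != a["y"]))  # True
```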
That is also very reasonable! I think the important part is to not feel too bad about the possibility of never having a view (there is a vast sea of things I don't have a view on), not least because I think it actually increases the chance of getting to the right view if more effort is spent.
(I would offer to chat directly, as I'm very much part of the subset of safety close to more normal ML, but am sadly over capacity at the moment.)
Yep, that’s very fair. What I was trying to say was that if in response to the first suggestion someone said “Why aren’t you deferring to others?” you could use that as a joke backup, but agreed that it reads badly.
I think the key here is that they’ve already spent quite a lot of time investigating the question. I would have a different reaction without that. And it seems like you agree my proposal is best both for the OP and the world, so perhaps the real sadness is about the empirical difficulty of getting people to consensus?
At a minimum I would claim that there should exist some level of effort past which you should not feel bad about not arguing further, and then the remaining question is where that threshold lies.
As someone who works on AGI safety and cares a lot about it, my main conclusion from reading this is: it would be ideal for you to work on something other than AGI safety! There are plenty of other things to work on that are important, both within and without EA, and a satisfactory resolution to “Is AI risk real?” doesn’t seem essential to usefully pursue other options.
Nor do I think this is a block to comfortable behavior as an EA organizer or role model: it seems fine to say “I’ve thought about X a fair amount but haven’t reached a satisfactory conclusi...
> it would be ideal for you to work on something other than AGI safety!
I disagree. Here is my reasoning:
Thanks for giving me permission, I guess I can use this if I ever need the opinion of "the EA community" ;)
However, I don't think I'm ready to give up on trying to figure out my stance on AI risk just yet, since I still estimate it is my best shot at forming a more detailed understanding of any x-risk, and understanding x-risks better would be useful for establishing better opinions on other cause prioritization issues.
This is a great article, and I will make one of those spreadsheets!
Though I can't resist pointing out that, assuming you got 99.74% out of an Elo calculation, I believe the true probability of them beating you is way higher than 99.74%. :)
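(For concreteness, my own arithmetic rather than anything from the thread: under the usual logistic Elo model the win probability for a rating gap $d$ is $1/(1+10^{-d/400})$, so 99.74% corresponds to a gap of roughly 1030 points.)

```python
import math

def elo_win_prob(rating_diff: float) -> float:
    """Win probability of the stronger player under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

def rating_gap_for_prob(p: float) -> float:
    """Invert the Elo formula: rating gap implied by a given win probability."""
    return 400.0 * math.log10(p / (1.0 - p))

print(elo_win_prob(1000))           # ~0.997
print(rating_gap_for_prob(0.9974))  # ~1034
```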
The issue is not the complexity, but the information content. As mentioned, n qubits can’t store more than n bits of classical information, so the best way to think of them is “n bits of information with some quantum properties”. Therefore, it’s implausible that they correspond to exponential utility.
This is somewhat covered by existing comments, but to add my wording:
It's highly unlikely that utility is exponential in quantum state, for roughly the same reason that quantum information is not exponential in quantum state. That is, if you have n qubits, you can hold n bits of classical information, not 2^n. You can do more computation with n qubits, but only in special cases.
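The standard fact behind this is Holevo's bound (my paraphrase, not part of the original comment): the classical information accessible from an $n$-qubit state is at most $n$ bits,

$$I_{\text{acc}} \;\le\; \chi \;=\; S(\rho) - \sum_x p_x S(\rho_x) \;\le\; \log_2 \dim \mathcal{H} \;=\; n.$$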
The rest of this comment is interesting, but opening with “Ummm, what?” seems bad, especially since it takes careful reading to know what you are specifically objecting to.
Edit: Thanks for fixing!
Unfortunately we may be unlikely to get a statement from a departed safety researcher beyond mine (https://forum.effectivealtruism.org/posts/fmDFytmxwX9qBgcaX/why-aren-t-you-freaking-out-about-openai-at-what-point-would?commentId=WrWycenCHFgs8cak4), at least currently.
It can’t be up to date, since they recently announced that Helen Toner joined the board, and she’s not listed.
The website now lists Helen Toner, but does not list Holden, so it seems he is no longer on the board.
Unfortunately, a significant part of the situation is that people with internal experience and a negative impression feel both constrained and conflicted (in the conflict of interest sense) for public statements. This applies to me: I left OpenAI in 2019 for DeepMind (thus the “conflicted” part).
He is listed on the website.
> OpenAI is governed by the board of OpenAI Nonprofit, which consists of OpenAI LP employees Greg Brockman (Chairman & CTO), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D’Angelo, Holden Karnofsky, Reid Hoffman, Shivon Zilis, Tasha McCauley, and Will Hurd.
It might not be up to date though
I'm the author of the cited AI safety needs social scientists article (along with Amanda Askell), previously at OpenAI and now at DeepMind. I currently work with social scientists in several different areas (governance, ethics, psychology, ...), and would be happy to answer questions (though expect delays in replies).
I lead some of DeepMind's technical AGI safety work, and wanted to add two supporting notes:
This paper has at least two significant flaws when used to estimate relative complexity for useful purposes. In the authors' defense such an estimate wasn't the main motivation of the paper, but the Quanta article is all about estimation and the paper doesn't mention the flaws.
Flaw one: no reversed control
Say we have two parameterized model classes $A$ and $B$, and ask what $n$s are necessary for $A_n$ to approximate $B$ and for $B_n$ to approximate $A$. It is trivial to construct model classes for which ...
Ah, I see: you’re going to lean on the difference between “cause” and “control”. So to be clear: I am claiming that, as an empirical matter, we also can’t control the past, or even “control” the past.
To expand, I’m not using physics priors to argue that physics is causal, so we can’t control the past. I’m using physics and history priors to argue that we exist in the non-prediction case relative to the past, so CDT applies.
By “physics-based” I’m lumping together physics and history a bit, but it’s hard to disentangle them especially when people start talking about multiverses. I generally mean “the combined information of the laws of physics and our knowledge of the past”. The reason I do want to cite physics too, even for the past case of (1), is that if you somehow disagreed about decision theorists in WW1 I’d go to the next part of the argument, which is that under the technology of WW1 we can’t do the necessary predictive control (they couldn’t build deterministic twin...
As a high-level comment, it seems bad to structure the world so that the smartest people compete against each other in zero-sum games. It's definitely the case that zero-sum games are the best way to ensure technical hardness, as the games will by construction be right at the threshold of playability. But if we do this we're throwing most of the value away in comparison to working on positive-sum games.
Unfortunately, this is unlikely to be an effective use of resources (speaking as someone who has worked in high-performance computing for the past 18 years). The resources you can contribute will be dwarfed by the volume and efficiency of cloud services and supercomputers. Even then, due to network constraints the only possible tasks will be embarrassingly parallel computations that do not stress network or memory, and very few scientific computing tasks have this form.
So certainly physics-based priors are a big component, and indeed in some sense are all of it. That is, I think physics-based priors should give you an immediate answer of "you can't influence the past with high probability", and moreover that once you think through the problems in detail the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if laws remain the same), but that boundary condition priors should still tell us you can't influence the past. I'm happy to elaborate.
F...
I'm not sure anyone else is going to be brave enough to state this directly, so I'll do it:
After reading some of this post (and talking to Paul a bunch and Scott a little), I remain unconfused about whether we can control the past.
If we want to include a hits-based approach to careers, but also respect people not having EA goals as their exclusive life goal, I'd worry that signing this pledge is incompatible with staying in a career that the EA community subsequently decides is ineffective. This could be true even if, under the information known at the time of career choice, the career looked like terrific expected value.
The actual wording of the pledge seems okay under this metric, as it only promises to "seek out ways to increase the impact of my career", so maybe this is fine as long as the pledge doesn't rise to "switch career" in all cases.
As someone who's worked both in ML for formal verification with security motivations in mind, and (now) directly on AGI alignment, I think most EA-aligned folk who would be good at formal verification will be close enough to being good at direct AGI alignment that it will be higher impact to work directly on AGI alignment. It's possible this would change in the future if there are a lot more people working on theoretically-motivated prosaic AGI alignment, but I don't think we're there yet.
I think that isn't the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums. So your claim is narrowly true, but if instead the counterfactual were that my first exposure to EA was the EA Forum, then yes, I think the prominence of this kind of post would have made me substantially less likely to engage.
But fundamentally if we're running either of these counterfactuals I think we're already leaving a bunch of value on the table, as expressed by EricHerboso's post about false dilemmas.
I do too, FWIW. I read this post and its comments because I'm considering donating to/through ACE, and I wanted to understand exactly what ACE did and what the context was. Reading through a sprawling, nearly 15k-word discussion mostly about social justice and discourse norms was not conducive to that goal.
I am glad to have you around, of course.
My claim is just that I doubt you thought that if the rate of posts like this was 50% lower, you would have been substantially more likely to get involved with EA; I'd be very interested to hear I was wrong about that.
Not a non-profit, but since you mention AI and X-risk it's worth mentioning DeepMind, since program managers are core to how research is organized and led here: https://deepmind.com/careers/jobs/2390893.
5% probability by 2039 seems way too confident that it will take a long time: is this intended to be a calibrated estimate, or does the number have a different meaning?
Yep, that’s the right interpretation.
In terms of hardware, I don’t know how Chrome did it, but at least on fully capable hardware (mobile CPUs and above) you can often bitslice to make almost any circuit efficient if it has to be evaluated in parallel. So my prior is that quite general things don’t need new hardware if one is sufficiently motivated, and would want to see the detailed reasoning before believing you can’t do it with existing machines.
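As a concrete illustration of the bitslicing idea (a toy example I'm adding, not from the original comment): pack one instance per bit position of an ordinary machine word, and every bitwise operation then evaluates that gate across all instances at once. Here a 1-bit full adder runs on 64 independent inputs simultaneously.

```python
import random

MASK = (1 << 64) - 1  # treat Python ints as 64-bit words

def full_adder_bitsliced(a: int, b: int, c: int):
    """Each argument packs 64 independent input bits; returns (sum, carry) lanes."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s & MASK, carry & MASK

a, b, c = (random.getrandbits(64) for _ in range(3))
s, carry = full_adder_bitsliced(a, b, c)

# Check lane 0 against the plain 1-bit circuit.
a0, b0, c0 = a & 1, b & 1, c & 1
assert (s & 1) == (a0 ^ b0 ^ c0)
assert (carry & 1) == ((a0 & b0) | (c0 & (a0 ^ b0)))
```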
This is a great document! I agree with the conclusions, though there are a couple factors not mentioned which seem important:
On the positive side, Google has already deployed post-quantum schemes as a test, and I believe the test was successful (https://security.googleblog.com/2016/07/experimenting-with-post-quantum.html). This was explicitly just a test and not intended as a standardization proposal, but it's good to see that it's practical to layer a post-quantum scheme on top of an existing scheme in a deployed system. I do think if we need...
In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.
We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but has different views on the details and is very grounded in ML.
I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.
I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of them), so I don’t agree with some of the recent negativity about these classical arguments. That is, some fixing is required to make me like those arguments, but it doesn’t feel like the fixing is particularly hard.
Well, part of my job is making new people that qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI (e.g., https://distill.pub/2019/safety-needs-social-scientists).
I started working on AI safety prior to reading Superintelligence and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much a monoculture as you suggest.
I'm curious what your experience was like when you started talking to AI safety people after already coming to some of your own conclusions. E.g., I'm curious if you think that you missed major points that the AI safety people had spotted which felt obvious in hindsight, or if you had topics on which you disagreed with the AI safety people and think you turned out right.
Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.
As a meta-comment, I think it's quite unhelpful that some of these "good heuristics" are written as intentional strawmen where the author doesn't believe the assumptions hold. E.g., the author doesn't believe that there are no insiders talking about X-risk. If you're going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I'm one of the more senior AGI safety researchers.
Presumably there are two categories of heuristics, here: ones which relate to actual difficulties in discerning the ground truth, and ones which are irrelevant or stem from a misunderstanding. I think it seems bad that this list implicitly casts the heuristics as being in the latter category, and rather than linking to why each is irrelevant or a misunderstanding it does something closer to mocking the concern.
For example, I would decompose the "It's not empirically testable" heuristic into two different components. The first is something li...
“Quite possible” means I am making a qualitative point about game theory but haven’t done the estimates.
Though if one did want to do estimates, that ratio isn’t enough, as spread is superlinear as a function of the size of a group arrested and put in a single room.
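(To spell out the superlinearity with a crude model I'm adding for illustration, not something from the original exchange: in a well-mixed room of $n$ people with per-pair transmission probability $p$, the expected number of transmission events is about

$$p \binom{n}{2} = \frac{p\,n(n-1)}{2},$$

so doubling the size of the group roughly quadruples the expected transmissions.)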
Thanks, that’s all reasonable. Though to clarify, the game theory point isn’t about deterring police but about whether to let potential arrests and coronavirus consequences deter the protests themselves.
It's worth distinguishing between the protests causing spread and arresting protesters causing spread. It's quite possible more spread will be caused by the latter, and calling this spread "caused by the protests" is game theoretically similar to "Why are you hitting yourself?" My guess is that you're not intending to lump those into the same bucket, but it's worth separating them out explicitly given the title.
Aha. Well, hopefully we can agree that those philosophers are adding confusion. :)