All of Ben Garfinkel's Comments + Replies

On Deference and Yudkowsky's AI Risk Estimates

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would ei... (read more)

Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).

I appreciate this update! 

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

I am confused about you bringing in the claim of "at each stage of his career", given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence t... (read more)

I noted some places I agree with your comment here, Ben. (Along with my overall take on the OP.)

Some additional thoughts:

Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’... (read more)

I'm a bit confused about a specific small part:

tendency toward expressing dramatic views

I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.

 

In other words, I think "working at a 2nd x-risk job and believing it is very important" is mainly predicted by "working at a 1st x-risk job and believing it is very important", much more than by personality t... (read more)

5 Dr. David Mathers 9d
'Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities.' I think some of this is just a result of being a community founded partly by analytic philosophers. (though as a philosopher I would say that!). I think it's normal to encounter some of these ideas in undergrad philosophy programs. At my undergrad back in 2005-09 there was a whole upper-level undergraduate course in decision theory. I don't think that's true everywhere all the time, but I'd be surprised if it was wildly unusual. I can't remember if we covered population ethics in any class, but I do remember discovering Parfit on the Repugnant Conclusion in 2nd-year of undergrad because one of my ethics lecturers said Reasons and Persons was a super-important book. In terms of the Oxford phil scene where the term "effective altruism" was born, the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it. Most of the phil. physics people at Oxford were gung-ho for many worlds, it's not a fringe view in philosophy of physics as far as I know. (Though I think Oxford was kind of a centre for it and there was more dissent elsewhere.) As far as I can tell, Bayesian epistemology in at least some senses of that term is a fairly well-known approach in philos
6 ekka 9d
For what it's worth, I found this post and the ensuing comments very illuminating. As a person relatively new to both EA and the arguments about AI risk, I was a little bit confused as to why there was not much push back on the very high confidence beliefs about AI doom within the next 10 years. My assumption had been that there was a lot of deference to EY because of reverence and fealty stemming from his role in getting the AI alignment field started, not to mention the other ways he has shaped people's thinking. I also assumed that his track record on predictions was just ambiguous enough for people not to question his accuracy. Given that I don't give much credence to the idea that prophets/oracles exist, I thought it unlikely that the high confidence in his predictions was warranted, on the grounds that there doesn't seem to be much evidence supporting the accuracy of long-range forecasts. I did not think that there were such glaring mispredictions made by EY in the past, so thank you for highlighting them.
8 Verden 9d
I feel like people are missing one fairly important consideration when discussing how much to defer to Yudkowsky, etc. Namely, I've heard multiple times that Nate Soares, the executive director of MIRI, has models of AI risk that are very similar to Yudkowsky's, and that their p(doom) estimates are also roughly the same. My limited impression is that Soares is no less smart or otherwise capable than Yudkowsky. So, when having this kind of discussion, focusing on Yudkowsky's track record or whatever, I think it's good to remember that there's another very smart person, who entered AI safety much later than Yudkowsky, and who holds very similar inside views on AI risk.

I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate o

... (read more)
On Deference and Yudkowsky's AI Risk Estimates

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in ... (read more)

I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does and Carl Shulman and maybe some other people too (Gwern?), and if you restrict attention to the last 5 years then arguably about a dozen people have a somewhat better track record than him. 

(To my knowledge. I think... (read more)

On Deference and Yudkowsky's AI Risk Estimates

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely bec

... (read more)

I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, ... (read more)

On Deference and Yudkowsky's AI Risk Estimates

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-t... (read more)

Eliezer writes a bit about his early AI timeline and nanotechnology opinions here, though it sure is a somewhat obscure reference that takes a bunch of context to parse:  

Luke Muehlhauser reading a previous draft of this (only sounding much more serious than this, because Luke Muehlhauser):  You know, there was this certain teenaged futurist who made some of his own predictions about AI timelines -

Eliezer:  I'd really rather not argue from that as a case in point.  I dislike people who screw up something themselves, and then argue like

... (read more)
On Deference and Yudkowsky's AI Risk Estimates

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point.

I definitely do agree with that!

It's possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top.

If it's of interest: I say a little more about how I think about this, in response to Gwern's comment ... (read more)

On Deference and Yudkowsky's AI Risk Estimates

What?

I interpreted Gwern as mostly highlighting that people have updated toward Yudkowsky's views - and using this as evidence in favor of the view that we should defer a decent amount to Yudkowsky. I think that was a reasonable move.

There is also a causal question here ('Has Yudkowsky on-net increased levels of concern about AI risk relative to where they would otherwise be?'), but I didn't take the causal question to be central to the point Gwern was making. Although now I'm less sure.

I don't personally have strong views on the causal question - I haven't thought through the counterfactual.

On Deference and Yudkowsky's AI Risk Estimates

On 1 (the nanotech case):

I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.

I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his cu

... (read more)

One quick response, since it was easy (might respond more later): 

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don't think there are many who believe that that is going to happen. 

I don't think the distinction between 1 week and 1 year is that ... (read more)

On Deference and Yudkowsky's AI Risk Estimates

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said.

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don't think this case implies we should currently defer more to Yudkowsky's risk estimates than we do to Karnofsky's.)

0 Charles He 11d
Ugh. Y'all just made me get into "EA rhetoric" mode: What? No. Not only is this not true but this is indulging in a trivial rhetorical maneuver. My comment said that the counterfactual would be better without the involvement of the person mentioned in the OP. I used the retrospective as evidence. The retrospective includes at least two points for why the author changed their mind: 1. The book Superintelligence, which they explicitly said was the biggest event 2. The author moved to SF and learned about DL, and was informed by speaking to non-rationalist AI researchers, and then decided that LessWrong and MIRI were right. In response to this, Gwern states point #2, and asserts that this is causal evidence in favor of the person mentioned in the OP being useful. Why? How? Notice that #2 above doesn't at all rule out that the founders or culture were repellent. In fact it seems like a lavish and unlikely level of involvement.
On Deference and Yudkowsky's AI Risk Estimates

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade and a half (even though LOGI's successor was seeming... (read more)

On Deference and Yudkowsky's AI Risk Estimates

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectuall... (read more)

On Deference and Yudkowsky's AI Risk Estimates

I prefer to just analyse and refute his concrete arguments on the object level.

I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.

Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influ... (read more)

(I hadn't seen this reply when I made my other reply).

What do you think of legitimising behaviour that calls out the credibility of other community members in the future?

I am worried about displacing the concrete object level arguments as the sole domain of engagement. A culture in which arguments cannot be allowed to stand by themselves. In which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...

It feels like a worse epistemic culture.

We should expect to worry more about speculative risks

However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research).

The story does require there to be only a very limited number of arms that we initially think have a non-negligible chance of paying. If there are unlimited arms, then one of them should be both paying and easil... (read more)
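One rough way to make the dependence on the number of arms concrete, offered only as a sketch: suppose each arm independently pays with probability `p_pays` and is easy to evaluate with probability `p_knowable`, with no correlation between the two (the no-correlation case raised above). The function name and the numeric values below are illustrative assumptions, not figures from the post. Under those assumptions, the chance that at least one of `n_arms` arms is both paying and easy to evaluate is 1 - (1 - p_pays * p_knowable)^n_arms.

```python
def prob_some_arm_pays_and_is_knowable(n_arms: int, p_pays: float, p_knowable: float) -> float:
    """Probability that at least one arm is both paying and easy to evaluate.

    Assumes (hypothetically) that arms are independent and that, within an arm,
    paying and being easy to evaluate are uncorrelated.
    """
    per_arm = p_pays * p_knowable          # chance a single arm has both properties
    return 1.0 - (1.0 - per_arm) ** n_arms  # complement of "no arm has both"


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for n in (1, 2, 10, 100, 1000):
        print(n, prob_some_arm_pays_and_is_knowable(n, p_pays=0.1, p_knowable=0.5))
```

With few arms this probability stays small, which is the regime the story needs; as the number of arms grows it approaches 1, which is the case where a paying, easily evaluated arm should eventually turn up.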

Ben Garfinkel's Shortform

A follow-on:

The above post focused on the idea that certain traits -- reflectiveness and self-skepticism -- are more valuable in the context of non-profits (especially ones with long-term missions) than they are in the context of startups.

I also think that certain traits -- drivenness, risk-tolerance, and eccentricity -- are less valuable in the context of non-profits than they are in the context of startups.

Hiring advice from the startup world often suggests that you should be looking for extraordinarily driven, risk-tolerant people with highly idiosyncratic pe... (read more)

We should expect to worry more about speculative risks

The bandit problem is definitely related, although I'm not sure it's the best way to formulate the situation here. The main issue is that the bandit formulation, here, treats learning about the magnitude of a risk and working to address the risk as the same action - when, in practice, they often come apart.

Here's a toy model/analogy that feels a bit more like it fits the case, in my mind.

Let's say there are two types of slot machines: one that has a 0% chance of paying and one that has a 100% chance of paying. Your prior gives you a 90% credence that each ... (read more)

2 RyanCarey 1mo
Interesting, that makes perfect sense. However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research). Another way to explain why the uncertain risks look big would be that we are unable to stop society pulling the AI progress lever until we have proven it to be dangerous. Definitely risky activities just get stopped! Maybe that's implicitly how your model gets the desired result.
2 RyanCarey 1mo
Interesting, that makes perfect sense. However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and forget about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research). It seems you need an additional element where society is unable to stop itself pulling the AI progress lever...
Ben Garfinkel's Shortform

Couldn't the exact same arguments be made to argue that there would not be successful internet companies, because the fundamental tech is hard to patent, and any website is easy to duplicate?

Definitely!

(I say above that the dynamic applies to "most software," but should have said something broader to make it clear that it also applies to any company whose product - basically - is information that it's close to costless to reproduce/generate. The book Information Rules is really good on this.)

Sometimes the above conditions hold well enough for people to ... (read more)

4 RyanCarey 1mo
1. Ah, you do say that. Serves me right for skimming!
2. To start, you could have a company for each domain area that an AI needs to be fine-tuned, marketed, and adapted to meet any regulatory requirements. Writing advertising copy, editing, insurance evaluations, etc.
3. As for the foundation models themselves, I think training models is too expensive to go back to academia as you suggest. And I think that there are some barriers to getting priced down. Firstly, when you say you need "patents or very-hard-to-learn-or-rediscover trade secrets", does the cost of training the model not count? It is a huge barrier. There are also difficulties in acquiring AI talent. And future patents seem likely. We're already seeing a huge shift with AI researchers leaving big tech for startups, to try to capture more of the value of their work, and this shift could go a lot further.
Ben Garfinkel's Shortform

It’ll be interesting to see how well companies will be able to monetise large, multi-purpose language and image-generation models.

Companies and investors are spending increasingly huge amounts of money on ML research talent and compute, typically with the hope that investments in this area lead to extremely profitable products. But - even if the resulting products are very useful and transformative - it still seems like a bit of an open question how profitable they’ll be.

Some analysis:[1]

1.

Although huge state-of-the-art models are increasingly c... (read more)

9 Tamay 7d
This is insightful. Some quick responses:
* My guess would be that the ability to commercialize these models would strongly hinge on the ability for firms to wrap these up with complementary products, that would contribute to an ecosystem with network effects, dependencies, evangelism, etc.
* I wouldn't draw too strong conclusions from the fact that the few early attempts to commercialize models like these, notably by OpenAI, haven't succeeded in creating the preconditions for generating a permanent stream of profits. I'd guess that their business models look less-than-promising on this dimension because (and this is just my impression) they've been trying to find product-market-fit, and have gone lightly on exploiting particular fits they found by building platforms to service these
* Instead, better examples of what commercialization looks like are GPT-3-powered companies [https://nogood.io/2021/06/25/gpt-3-tools/], like copysmith [https://copysmith.ai/], which seem a lot more like traditional software businesses with the usual tactics for locking users in, and creating network effects and single-homing behaviour
* I expect that companies will have ways to create switching costs for these models that traditional software products don't have. I'm particularly interested in fine-tuning as a way to lock-in users by enabling models to strongly adapt to context about the users' workloads. More intense versions of this might also exist, such as learning directly from individual customers' feedback through something like RL [https://arxiv.org/abs/2009.01325]. Note that this is actually quite similar to how non-software services create loyalty
I agree that it seems hard to commercialize these models out-of-the-box with something like paid API access, but I expect that to be superseded by better strategies pretty soon.
4 RyanCarey 1mo
Couldn't the exact same arguments be made to argue that there would not be successful internet companies, because the fundamental tech is hard to patent, and any website is easy to duplicate? But this just means that instead of monetising the bottom layer of tech (TCP/IP, or whatever), they make their billions from layering needed stuff on top - search, social network, logistics.
2 Charles He 1mo
This was excellent!
We should expect to worry more about speculative risks

This is a helpful comment - I'll see if I can reframe some points to make them clearer.

Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default.

I'm actually not assuming human psychology is flawed. The post is meant to be talking about how a rational person (or, at least, a boundedly rational person) should update their views.

On the probabilities: I suppose I'm implicitly evoking both a subjective notion of probability ("What's a reasonable credence to assign to X h... (read more)

Ben Garfinkel's Shortform

Good points - those all seem right to me!

Ben Garfinkel's Shortform

A point about hiring and grantmaking, that may already be conventional wisdom:

If you're hiring for highly autonomous roles at a non-profit, or looking for non-profit founders to fund, then advice derived from the startup world is often going to overweight the importance of entrepreneurialism relative to self-skepticism and reflectiveness.[1]

Non-profits, particularly non-profits with longtermist missions, are typically trying to maximize something that is way more illegible than time-discounted future profits. To give a specific example: I think it's way ha... (read more)

3 Ben Garfinkel 1mo
A follow-on: The above post focused on the idea that certain traits -- reflectiveness and self-skepticism -- are more valuable in the context of non-profits (especially ones with long-term missions) than they are in the context of startups. I also think that certain traits -- drivenness, risk-tolerance, and eccentricity -- are less valuable in the context of non-profits than they are in the context of startups. Hiring advice from the startup world often suggests that you should be looking for extraordinarily driven, risk-tolerant people with highly idiosyncratic perspectives on the world.[1] And, in the context of for-profit startups, it makes sense that these traits would be crucial. A startup's success will often depend on its ability to outcompete large, entrenched firms in some industry (e.g. taxi companies, hotels, tech giants). To do that, an extremely high level of drivenness may be necessary to compensate for lower resource levels, lower levels of expertise, and weaker connections to gatekeepers. Or you may need to be willing to take certain risks (e.g. regulatory/PR/enemy-making risks) that would slow down existing companies in pursuing certain opportunities. Or you may need to simply see an opportunity that virtually no one else would (despite huge incentives to see it), because you have an idiosyncratic way of seeing the world. Having all three of these traits (extreme drivenness, risk tolerance, idiosyncrasy) may be necessary for you to have any plausible chance of success. I think that all of these traits are still valuable in the non-profit world, but I also think they're comparatively less valuable (especially if you're lucky enough to have secure funding). There's simply less direct competition in the non-profit world. Large, entrenched non-profits also have much weaker incentives to find and exploit impact opportunities. Furthermore, the non-profit world isn't even that big to begin with. So there's no reason to assume all t
5 Alexis Carlier 1mo
I think that’s mostly right, with a couple of caveats:
* You only mentioned non-profits, but I think most of this applies to other longtermist organizations with pretty illegible missions. Maybe Anthropic is an example.
* Some organizations with longtermist missions should not aim to maximise something particularly illegible. In these cases, entrepreneurialism will often be very important, including in highly autonomous roles. For example, some biosecurity organization could be trying to design and produce, at very large scales, “Super PPE [https://forum.effectivealtruism.org/posts/u5JesqQ3jdLENXBtB/concrete-biosecurity-projects-some-of-which-could-be-big-1#Super_PPE]”, such as masks, engineered with extreme events in mind.
* Like SpaceX, which initially aimed to significantly reduce the cost, and improve the supply, of routine space flight, the Super PPE project would need to improve upon existing PPE designed for use in extreme events, which is “bulky, highly restrictive, and insufficiently abundant”. (Alvea might be another example, but I don’t know enough about them).
* This suggests a division of labour where project missions are defined by individuals outside the organization, as with Super PPE, before being executed by others, who are high on entrepreneurialism. Note that, in hiring for leadership roles in the organization, this will mean placing more weight on entrepreneurialism than on self-skepticism and reflectiveness. While Musk did a poor job defining SpaceX's mission, he did an excellent job executing it. This seems true. It also suggests that if you can be extremely high on both traits, you’ll bring significant counterfactual value.
Simulation argument?

I think most people would probably regard the objection as a nitpick (e.g. "OK, maybe the Indifference Principle isn't actually sufficient to support a tight formal argument, and you need to add in some other assumption, but the informal version of the argument is just pretty clearly right"), feel the objection has been successfully answered (e.g. find the response in the Simulation Argument FAQ more compelling than I do), or just haven't completely noticed the potential issue.

I think it's still totally reasonable for the paper to have passed peer review. ... (read more)

Simulation argument?

To be clear, I'm not saying the conclusion is wrong - just that the explicit assumptions the paper makes (mainly the Indifference Principle) aren't sufficient to imply its conclusion.

The version that you've just presented isn't identical to the one in Bostrom's paper -- it's (at least implicitly) making use of assumptions beyond the Indifference Principle. And I think it's surprisingly non-trivial to work out exactly how to formalize the needed assumptions, and make the argument totally tight, although I'd still guess that this is ultimately possible.[1]


... (read more)
1 Leo 1mo
My version tried to be an intuitive simplification of the core of Bostrom's paper. I actually don't identify these assumptions you mention. If you are right, I may have presupposed them while reading the paper, or my memory may be betraying me for the sake of making sense of it. Anyway, I really appreciate that you took the time to comment.
Simulation argument?

I'm trying to understand the simulation argument. I think Bostrom uses the Indifference Principle (IP) in a weird way. If we become a posthuman civilization that runs many many simulations of our ancestors (meaning us), then how does the IP apply? It only applies when one has no other information to go on. But in this case, we do have some extra information -- crucial information! I.e., we know that we are not in any of the simulations that we have produced. Therefore, we do not have any statistical reason to believe that we are simulated.

I agree that t... (read more)

1 Alex Williams 1mo
Ok, thank you very much. But why then do so many people take the argument seriously? Is it surprising that the peer-review process didn't pick up this problem?
1 Leo 1mo
I would like to understand how that is a valid objection, because I honestly don't see it. To simplify a bit, if you think that 1 ('humanity won't reach a posthuman stage') and 2 ('posthuman civilizations are extremely unlikely to run vast numbers of simulations') are false, it follows that humanity will probably both reach a posthuman stage and run a vast number of simulations. Now if you really think this will probably happen, I can see no reason to deny that it has already happened in the past. Why postulate that we will be the first simulators? There's no empirical evidence to support it, given that we are talking about extremely detailed, realistic simulations, and as it was already agreed that simulations are so many, it seems very, very unlikely that we are located at the first level. In other words, if one believes that intelligent life is part of a process which normally culminates with a massive ancestor-simulation program, the fact that there is intelligent life is not enough to find out in what part of the process it is located.
Ben Garfinkel's Shortform

The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there's no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not "you must optimize X goal", the constraints are "in Y situations you must behave in Z ways", which doesn't constrain how you behave in totally diff

... (read more)
4 Rohin Shah 1mo
I mostly go ¯\_(ツ)_/¯ , it doesn't feel like it's much evidence of anything, after you've updated off the abstract argument. The actual situation we face will be so different (primarily, we're actually trying to deal with the alignment problem, unlike evolution). I do agree that in saying " ¯\_(ツ)_/¯ " I am disagreeing with a bunch of claims that say "evolution example implies misalignment is probable". I am unclear to what extent people actually believe such a claim vs. use it as a communication strategy. (The author of the linked post states some uncertainty but presumably does believe something similar to that; I disagree with them if so.) I like the general idea but the way I'd do it is by doing some black-box investigation [https://forum.effectivealtruism.org/posts/fyk9ypmw6NZMYaeYK/the-case-for-becoming-a-black-box-investigator-of-language] of current language models and asking these questions there; I expect we understand the "ancestral environment" of a language model way, way better than we understand the ancestral environment for humans, making it a lot easier to draw conclusions; you could also finetune the language models in order to simulate an "ancestral environment" of your choice and see what happens then. I agree with the murder example being a tiny bit reassuring for training non-murderous AIs; medium-reassuring is probably too much, unless we're expecting our AI systems to be put into the same sorts of situations / ancestral environments as humans were in. (Note that to be the "same sort of situation" it also needs to have the same sort of inputs as humans, e.g. vision + sound + some sort of controllable physical body seems important.)
Ben Garfinkel's Shortform

(Disclaimer: The argument I make in this short-form feels a little sophistic to me. I’m not sure I endorse it.)

Discussions of AI risk, particularly risks from “inner misalignment,” sometimes heavily emphasize the following observation:

Humans don’t just care about their genes: Genes determine, to a large extent, how people behave. Some genes are preserved from generation-to-generation and some are pushed out of the gene-pool. Genes that cause certain human behaviours (e.g. not setting yourself on fire) are more likely to be preserved. But people don’t care

... (read more)

The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there's no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not "you must optimize X goal", the constraints are "in Y situations you must behave in Z ways", which doesn't constrain how you behave in totally diffe... (read more)

Ben Garfinkel's Shortform

The existential risk community’s relative level of concern about different existential risks is correlated with how hard-to-analyze these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:

  1. Unaligned artificial intelligence[1]
  2. Unforeseen anthropogenic risks (tied)
  2. Engineered pandemics (tied)
  4. Other anthropogenic risks
  5. Nuclear war (tied)
  5. Climate change (tied)

This isn’t surprising.

For a number of risks, when you first hear about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a ... (read more)

3 Zach Stein-Perlman 1mo
Related: (source) [https://www.existential-risk.org/concept.pdf]
How likely is World War III?

Let’s call the hypothesis that the base rate of major wars hasn’t changed the constant risk hypothesis. The best presentation of this view is in Only the Dead, a book by an IR professor with the glorious name of Bear Braumoeller. He argues that there is no clear trend in the average incidence of several measures of conflict—including uses of force, militarized disputes, all interstate wars, and wars between “politically-relevant dyads”—between 1800 and today.

A quick note on Braumoeller's analysis:

He's relying on the Correlates of War (COW) dataset, whic... (read more)

Democratising Risk - or how EA deals with critics

I'm not familiar with Zoe's work, and would love to hear from anyone who has worked with them in the past. After seeing the red flags mentioned above, and being stuck with only Zoe's word for their claims, anything from a named community member along the lines of "this person has done good research/has been intellectually honest" would be a big update for me…. [The post] strikes me as being motivated not by a desire to increase community understanding of an important issue, but rather to generate sympathy for the authors and support for their positi

... (read more)
6 RAB 6mo
Thanks Ben! That's very helpful info. I'll edit the initial comment to reflect my lowered credence in exaggeration or malfeasance.
Why AI alignment could be hard with modern deep learning

FWIW, I haven't had this impression.

Single data point: In the most recent survey on community opinion on AI risk, I was in at least the 75th percentile for pessimism (for roughly the same reasons Lukas suggests below). But I'm also seemingly unusually optimistic about alignment risk.

I haven't found that this is a really unusual combo: I think I know at least a few other people who are unusually pessimistic about 'AI going well,' but also at least moderately optimistic about alignment.

(Caveat that my apparently higher level of pessimism could also be explai... (read more)

All Possible Views About Humanity's Future Are Wild

Thanks for the clarification! I still feel a bit fuzzy on this line of thought, but hopefully understand a bit better now.

At least on my read, the post seems to discuss a couple different forms of wildness: let’s call them “temporal wildness” (we currently live at an unusually notable time) and “structural wildness” (the world is intuitively wild; the human trajectory is intuitively wild).[1]

I think I still don’t see the relevance of “structural wildness,” for evaluating fishiness arguments. As a silly example: Quantum mechanics is pretty intuitively wild,... (read more)

Ben, that sounds right to me. I also agree with what Paul said. And my intent was to talk about what you call temporal wildness, not what you call structural wildness.

I agree with both you and Arden that there is a certain sense in which the "conservative" view seems significantly less "wild" than my view, and that a reasonable person could find the "conservative" view significantly more attractive for this reason. But I still want to highlight that it's an extremely "wild" view in the scheme of things, and I think we shouldn't impose an inordinate burden of proof on updating from that view to mine.

All Possible Views About Humanity's Future Are Wild

To say a bit more here, on the epistemic relevance of wildness:

I take it that one of the main purposes of this post is to push back against “fishiness arguments,” like the argument that Will makes in “Are We Living at the Hinge of History?”

The basic idea, of course, is that it’s a priori very unlikely that any given person would find themselves living at the hinge of history (and correctly recognise this). Due to the fallibility of human reasoning and due to various possible sources of bias, however, it’s not as unlikely that a given person would mistakenl... (read more)
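To make the arithmetic behind this kind of fishiness argument concrete, here is a minimal sketch of the Bayesian update involved. All three input numbers are hypothetical and chosen only for illustration; none of them comes from the post or from Will's paper, and the function name `posterior_hinge` is likewise just an illustrative label.

```python
from fractions import Fraction


def posterior_hinge(prior, p_believe_if_true, p_believe_if_false):
    """P(living at the hinge | the person believes they are), via Bayes' rule."""
    true_and_believe = prior * p_believe_if_true
    false_and_believe = (1 - prior) * p_believe_if_false
    return true_and_believe / (true_and_believe + false_and_believe)


# Hypothetical inputs (assumptions for illustration only):
prior = Fraction(1, 1_000_000)        # a priori chance a given person lives at the hinge
p_believe_if_true = Fraction(1, 2)    # chance they correctly recognise it
p_believe_if_false = Fraction(1, 1000)  # chance they mistakenly conclude it anyway

posterior = posterior_hinge(prior, p_believe_if_true, p_believe_if_false)
print(float(posterior))
```

With these made-up inputs the posterior stays below one in a thousand, because mistaken believers vastly outnumber correct ones. The conclusion is only as good as the assumed prior and error rate, which is exactly where the disagreement lies.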

We were previously comparing two hypotheses:

  1. HoH-argument is mistaken
  2. Living at HoH

Now we're comparing three:

  1. "Wild times"-argument is mistaken
  2. Living at a wild time, but HoH-argument is mistaken
  3. Living at HoH

"Wild time" is almost as unlikely as HoH. Holden is trying to suggest it's comparably intuitively wild, and it has pretty similar anthropic / "base rate" force.

So if your arguments look solid, "All futures are wild" makes hypothesis 2 look kind of lame/improbable---it has to posit a flaw in an argument, and also that you are living at a wildly improb... (read more)

7 ofer 1y
I think the more decision relevant probabilities involve "Someone believes they should act as if they live at the HoH" rather than "Someone believes they live at the HoH". Our actions may be much less important if 'this is all a dream/simulation' (for example). We should make our decisions in the way we wish everyone-similar-to-us-across-the-multiverse make their decisions. As an analogy, suppose Alice finds herself getting elected as the president of the US. Let's imagine there are 10^100 citizens in the US. So Alice reasons that it's way more likely that she is delusional than she actually being the president of the US. Should she act as if she is the president of the US anyway, or rather spend her time trying to regain her grip on reality? The 10^100 citizens want everyone in her situation to choose the former. It is critical to have a functioning president. And it does not matter if there are many delusional citizens who act as if they are the president. Their "mistake" does not matter. What matters is how the real president acts.
All Possible Views About Humanity's Future Are Wild

Some possible futures do feel relatively more "wild” to me, too, even if all of them are wild to a significant degree. If we suppose that wildness is actually pretty epistemically relevant (I’m not sure it is), then it could still matter a lot if some future is 10x wilder than another.

For example, take a prediction like this:

Humanity will build self-replicating robots and shoot them out into space at close to the speed of light; as they expand outward, they will construct giant spherical structures around all of the galaxy’s stars to extract tremendous v

... (read more)

Taboo "Outside View"

I suspect you are more broadly underestimating the extent to which people used "insect-level intelligence" as a generic stand-in for "pretty dumb," though I haven't looked at the discussion in Mind Children and Moravec may be making a stronger claim.

I think that's good push-back and a fair suggestion: I'm not sure how seriously the statement in Nick's paper was meant to be taken. I hadn't considered that it might be almost entirely a quip. (I may ask him about this.)

Moravec's discussion in Mind Children is similarly brief: He presents a graph of the co... (read more)

I do think my main impression of insect <-> simulated robot parity comes from very fuzzy evaluations of insect motor control vs simulated robot motor control (rather than from any careful analysis, of which I'm a bit more skeptical though I do think it's a relevant indicator that we are at least trying to actually figure out the answer here in a way that wasn't true historically). And I do have only a passing knowledge of insect behavior, from watching youtube videos and reading some book chapters about insect learning. So I don't think it's unfair to put it in the same reference class as Rodney Brooks' evaluations to the extent that his was intended as a serious evaluation.

Taboo "Outside View"

As a last thought here (no need to respond), I thought it might be useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”

A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used... (read more)

The Nick Bostrom quote (from here) is:

In retrospect we know that the AI project couldn't possibly have succeeded at that stage. The hardware was simply not powerful enough. It seems that at least about 100 Tops is required for human-like performance, and possibly as much as 10^17 ops is needed. The computers in the seventies had a computing power comparable to that of insects. They also achieved approximately insect-level intelligence.

I would have guessed this is just a funny quip, in the sense that (i) it sure sounds like it's just a throw-away quip, no e... (read more)

Taboo "Outside View"

Thank you (and sorry for my delayed response)!

I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate."

I also shudder a bit at that prospect.

I am sometimes happy making pretty broad and sloppy ... (read more)

2 kokotajlod 1y
I guess we can just agree to disagree on that for now. The example statement you gave would feel fine to me if it used the original meaning of "outside view" but not the new meaning, and since many people don't know (or sometimes forget) the original meaning... 100% agreement here, including on the bolded bit. Also agree here, but again I don't really care which one is overall more problematic because I think we have more precise concepts we can use and it's more helpful to use them instead of these big bags. I think I agree with all this as well, noting that this causal/deductive reasoning definition of inside view isn't necessarily what other people mean by inside view, and also isn't necessarily what Tetlock meant. I encourage you to use the term "causal/deductive reasoning" instead of "inside view," as you did here, it was helpful (e.g. if you had instead used "inside view" I would not have agreed with the claim about baseline bias)

Ben Garfinkel's Shortform

I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Mostly the former!

I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was.

For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that human... (read more)

Taboo "Outside View"

It’s definitely entirely plausible that I’ve misunderstood your views.

My interpretation of the post was something like this:

There is a bag of things that people in the EA community tend to describe as “outside views.” Many of the things in this bag are over-rated or mis-used by members of the EA community, leading to bad beliefs.

One reason for this over-use or mis-use is that the term “outside view” has developed an extremely positive connotation within the community. People are applauded for saying that they’re relying on “outside views” -- “outside

... (read more)
8 kokotajlod 1y
Wow, that's an impressive amount of charitable reading + attempting-to-ITT you did just there, my hat goes off to you sir! I think that summary of my view is roughly correct. I think it over-emphasizes the applause light aspect compared to other things I was complaining about; in particular, there was my second point in the "this expansion of meaning is bad" section, about how people seem to think that it is important to have an outside view and an inside view (but only an inside view if you feel like you are an expert) which is, IMO, totally not the lesson one should draw from Tetlock's studies etc., especially not with the modern, expanded definition of these terms. I also think that while I am mostly complaining about what's happened to "outside view," I also think similar things apply to "inside view" and thus I recommend tabooing it also. In general, the taboo solution feels right to me; when I imagine re-doing various conversations I've had, except without that phrase, and people instead using more specific terms, I feel like things would just be better. I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate." Now I'll try to say what I think your position is: How does that sound?
Taboo "Outside View"

On the contrary; tabooing the term is more helpful, I think. I've tried to explain why in the post. I'm not against the things "outside view" has come to mean; I'm just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!

My initial comment was focused on your point about conflation, because I think this point bears on the linguistic question more strongly than the other p... (read more)

2 kokotajlod 1y
I said in the post, I'm a fan of reference classes. I feel like you think I'm not? I am! I'm also a fan of analogies. And I love trend extrapolation. I admit I'm not a fan of the anti-weirdness heuristic, but even it has its uses. In general most of what you are saying in this thread is stuff I agree with, which makes me wonder if we are talking past each other. (Example 1: Your second small comment about reference class tennis. Example 2: Your first small comment, if we interpret instances of "outside view" as meaning "reference classes" in the strict sense, though not if we use the broader definition you favor. Example 3: your points a, b, c, and e. (point d, again, depends on what you mean by 'outside view,' and also what counts as often.)) My problem is with the term "Outside view." (And "inside view" too!) I don't think you've done much to argue in favor of it in this thread. You have said that in your experience it doesn't seem harmful; fair enough, point taken. In mine it does. You've also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by "reference class forecasting" you mean the stuff Tetlock's studies are about, then it really shouldn't include the anti-weirdness heuristic, but it seems like you are saying it does?) I found myself repeatedly thinking "but what does he mean by outside view? I agree or don't agree depending on what he means..." even though you had defined it earlier. You've said that you think the practices you call "outside view" are underrated and deserve positive reinforcement; I totally agree that some of them are, but I maintain that some of them are overrated, and would like to discuss each of them on a case by case basis instead of lumping them all together under one name. Of course you are free to use whatever terms you like, but I intend to continue to ask people to be more precise when I hear "outside view" or "inside view." :)
Taboo "Outside View"

I agree that people sometimes put too much weight on particular outside views -- or do a poor job of integrating outside views with more inside-view-style reasoning. For example, in the quote/paraphrase you present at the top of your post, something has clearly gone wrong.[1]

But I think the best intervention, in this case, is probably just to push the ideas "outside views are often given too much weight" or "heavy reliance on outside views shouldn't be seen as praiseworthy" or "the correct way to integrate outside views with more inside-view reasoning is... (read more)

9 kokotajlod 1y
On the contrary; tabooing the term is more helpful, I think. I've tried to explain why in the post. I'm not against the things "outside view" has come to mean; I'm just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post! By what definition of "outside view?" There is some evidence that in some circumstances people don't take reference class forecasting seriously enough; that's what the original term "outside view" meant. What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically underrated by the average intellectual?
Taboo "Outside View"

When people use “outside view” or “inside view” without clarifying which of the things on the above lists they mean, I am left ignorant of what exactly they are doing and how well-justified it is. People say “On the outside view, X seems unlikely to me.” I then ask them what they mean, and sometimes it turns out they are using some reference class, complete with a dataset. (Example: Tom Davidson’s four reference classes for TAI). Other times it turns out they are just using the anti-weirdness heuristic. Good thing I asked for elaboration!

FWIW, as a... (read more)

8 kokotajlod 1y
Thanks for this thoughtful pushback. I agree that YMMV; I'm reporting how these terms seem to be used in my experience but my experience is limited. I think opacity is only part of the problem; illicitly justifying sloppy reasoning is most of it. (My second and third points in "this expansion of meaning is bad" section.) There is an aura of goodness surrounding the words "outside view" because of the various studies showing how it is superior to the inside view in various circumstances, and because of e.g. Tetlock's advice to start with the outside view and then adjust. (And a related idea that we should only use inside view stuff if we are experts... For more on the problems I'm complaining about, see the meme, or Eliezer's comment.) This is all well and good if we use those words to describe what was actually talked about by the studies, by Tetlock, etc. but if instead we have the much broader meaning of the term, we are motte-and-bailey-ing ourselves.
What are things everyone here should (maybe) read?

Fortunately, if I remember correctly, something like the distinction between the true criterion of rightness and the best practical decision procedure actually is a major theme in the Kagan book. (Although I think the distinction probably often is underemphasized.)

It is therefore kind of misleading to think of consequentialism vs. deontology vs. virtue ethics as alternative theories, which however is the way normative ethics is typically presented in the analytic tradition.

I agree there is something to this concern. But I still wouldn't go so far as to... (read more)

6 Max_Daniel 1y
Yeah, I think these are good points. I also suspect that many deontologists and virtue ethicists would be extremely annoyed at my claim that they aren't alternative theories to consequentialism. (Though I also suspect that many are somewhat annoyed at the typical way the distinctions between these types of theories are described by philosophers in a broadly consequentialist tradition. My limited experience debating with committed Kantians suggests that disagreements seem much more fundamental than "I think the right action is the one with the best consequences, and you think there are additional determinants of rightness beyond axiology", or anything like that.)
What are things everyone here should (maybe) read?

A slightly boring answer: I think most people should at least partly read something that overviews common theories and frameworks in normative ethics (and the arguments for and against them) and something that overviews core concepts and principles in economics (e.g. the idea of expected utility, the idea of an externality, supply/demand, the basics of economic growth, the basics of public choice).

In my view, normative ethics and economics together make up a really large portion of the intellectual foundation that EA is built on.

One good book that overview... (read more)

I remember that reading up on normative ethics was one of the first things I focused on after I had encountered EA. I'm sure it was useful in many ways. For some reason, however, I feel surprisingly lukewarm about recommending that people read about normative ethics. 

It could be because my view these days is roughly: "Once you realize that consequentialism is great as a 'criterion of rightness' but doesn't work as 'decision procedure' for boundedly rational agents, a lot of the themes from deontology, virtue ethics, moral particularism, and moral plur... (read more)

Ben Garfinkel's Shortform

That's a good example.

I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn't say that this kind of variation is a "rounding error."

Over sufficiently long timespans, though, I think that technological/economic change has been more significant.

As an attempt to operationalize this claim: The average human society in 1000AD was obviously very differen... (read more)

Ben Garfinkel's Shortform

FWIW, I wouldn't say I agree with the main thesis of that post.

However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obvious limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.

In principle, non-em-based artificial intelligence is capable of expressing the enti

... (read more)
4 abergal 1y
Really appreciate the clarifications! I think I was interpreting "humanity loses control of the future" in a weirdly temporally narrow sense that makes it all about outcomes, i.e. where "humanity" refers to present-day humans, rather than humans at any given time period. I totally agree that future humans may have less freedom to choose the outcome in a way that's not a consequence of alignment issues. I also agree value drift hasn't historically driven long-run social change, though I kind of do think it will going forward, as humanity has more power to shape its environment at will.
Ben Garfinkel's Shortform

Do you have the intuition that absent further technological development, human values would drift arbitrarily far?

Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.

[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions of future humans otherwise.

I definitely think that's true. But I also think that was true ... (read more)

Ben Garfinkel's Shortform

A thought on how we describe existential risks from misaligned AI:

Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.

There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that the sedentary agriculture would i... (read more)

2 Aaron Gertler 9mo
Would you consider making this into a top-level post? The discussion here is really interesting and could use more attention, and a top-level post helps to deliver that (this also means the post can be tagged for greater searchability). I think the top-level post could be exactly the text here, plus a link to the Shortform version so people can see those comments. Though I'd also be interested to see the updated version of the original post which takes comments into account (if you felt like doing that).
9 Max_Daniel 1y
I agree with most of what you say here. [ETA: I now realize that I think the following is basically just restating what Pablo already suggested in another comment [https://forum.effectivealtruism.org/posts/kLYD95SK8tQFRmw4T/ben-garfinkel-s-shortform?commentId=dG9Xr8D44Sb7zBHPh] .] I think the following is a plausible & stronger concern, which could be read as a stronger version of your crisp concern #3. "Humanity has not had meaningful control over its future, but AI will now take control one way or the other. Shaping the transition to a future controlled by AI is therefore our first and last opportunity to take control. If we mess up on AI, not only have we failed to seize this opportunity, there also won't be any other." Of course, AI being our first and only opportunity to take control of the future is a strictly stronger claim than AI being one such opportunity. And so it must be less likely. But my impression is that the stronger claim is sufficiently more important that it could be justified to basically 'wager' most AI risk work on it being true.
5 · Rohin Shah · 1y
I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.
5 · abergal · 1y
Do you have the intuition that absent further technological development, human values would drift arbitrarily far? It's not clear to me that they would. In that sense, I do feel like we're "losing control", in that even non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions future humans would otherwise make. (It does also feel like we're missing the opportunity to "take control" and enable a new set of possibilities that we would endorse much more.) Relatedly, it doesn't feel to me like the values of humans 150,000 years ago and humans now and even ems in Age of Em are all that different on some more absolute scale.

Another interpretation of the concern, though related to your (3), is that misaligned AI may cause humanity to lose the potential to control its future. This is consistent with humanity not having (and never having had) actual control of its future; it only requires that this potential exists, and that misaligned AI poses a threat to it.

Ben Garfinkel's Shortform

Good point!

That consideration, along with the more basic consideration that more junior people often just know less, definitely pushes in the opposite direction. If you wanted to try some version of seniority-weighted epistemic deference, my guess is that the most reliable cohort would have studied a given topic for at least a few years but less than a couple decades.

Ben Garfinkel's Shortform

A thought on epistemic deference:

The longer you hold a view, and the more publicly you hold a view, the more calcified it typically becomes. Changing your mind becomes more aversive and potentially costly, you have more tools at your disposal to mount a lawyerly defense, and you find it harder to adopt frameworks/perspectives other than your favored one (the grooves become firmly imprinted into your brain). At least, this is the way it seems and personally feels to me.[1]

For this reason, the observation “someone I respect publicly argued for X many years a...

At least in software, there's a problem I see where young engineers are often overly bought into hype trains, but older engineers (on average) stick too closely to the technologies they know.

I would imagine something similar in academia, where hot new theories are over-valued by the young, but older academics have the problem you describe.

Ben Garfinkel's Shortform

I’d actually say this is a variety of qualitative research. At least in the main academic areas I follow, though, it seems a lot more common to read and write up small numbers of detailed case studies (often selected for being especially interesting) than to read and write up large numbers of shallow case studies (selected close to randomly).

This seems to be true in international relations, for example. In a class on interstate war, it’s plausible people would be assigned a long analysis of the outbreak of WW1, but very unlikely they’d be assigned short descriptions of the outbreaks of twenty random wars. (Quite possible there’s a lot of variation between fields, though.)

Ben Garfinkel's Shortform

In general, I think “read short descriptions of randomly sampled cases” might be an underrated way to learn about the world and notice issues with your assumptions/models.

A couple other examples:

I’ve been trying to develop a better understanding of various aspects of interstate conflict. The Correlates of War militarized interstate disputes (MIDs) dataset is, I think, somewhat useful for this. The project files include short descriptions of (supposedly) every case between 1993 and 2014 in which one state “threatened, displayed, or used force against anoth...

3 · Stefan_Schubert · 1y
Interesting ideas. Some similarities with qualitative research [https://en.wikipedia.org/wiki/Qualitative_research], but also important differences, I think (if I understand you correctly).
Ben Garfinkel's Shortform

The O*NET database includes a list of about 20,000 different tasks that American workers currently need to perform as part of their jobs. I’ve found it pretty interesting to scroll through the list, sorted in random order, to get a sense of the different bits of work that add up to the US economy. I think anyone who thinks a lot about AI-driven automation might find it useful to spend five minutes scrolling around: it’s a way of dropping down to a lower level of abstraction. I think the list is also a little bit mesmerizing, in its own right.

One up...

4 · abergal · 1y
I agree with the thrust of the conclusion, though I worry that focusing on task decomposition this way elides the fact that the descriptions of the O*NET tasks already assume your unit of labor is fairly general. Reading many of these, I actually feel pretty unsure about the level of generality or common-sense reasoning required for an AI to straightforwardly replace that part of a human's job. Presumably there's some restructure that would still squeeze a lot of economic value out of narrow AIs that could basically do these things, but that restructure isn't captured looking at the list of present-day O*NET tasks.