Reactions to the "cardinal comparability" objection (4.2):
Unlike validity, this is not a well-studied topic. When I came to look at it for my PhD, I struggled to find much of a literature on it. There were bits and pieces, but nothing that seemed to convincingly offer an overall assessment of the issue (see Plant 2020 where I try to offer one).
Understudied topics are where non-expert input is more likely to be useful. However, in this case, we do have a literature on the topic. The term is "scope insensitivity." It's one of the key cognitive biases Kahneman and others have examined in the behavioral economics literature. People will say they'd pay $80, $78, and $88 to protect 2,000, 20,000, or 200,000 birds from drowning in oil ponds. Just so, they'll report an 8/10 score for their marriage just about no matter how happy it makes them.
Although this is a weedy topic, it might be the key area where the intuitions of growth advocates diverge from mine. My model of scale shifts is that people anchor on a default value for self-reports. There are cultural defaults for how to describe or quantify your relationship with your spouse, kids, parents, hometown, job, and sports team, and these defaults are independent of the relationship's true quality. When people's circumstances worsen or improve, the normal way to describe them stays the same.
People's culture and identity inform their default response. If people see themselves as grateful, they have to give grateful-sounding responses, regardless of circumstances. This is the explicit, fundamental teaching of many philosophies and religions: they don't just teach that you should learn to feel happy in challenging circumstances; they teach that you should start by describing those circumstances with equanimity. They, along with psychologists, also note the difficulty of noticing and honestly reporting your true feelings, a challenge that is the basis for the perennial human potential movement.
Default responses also help people preserve their mental wellbeing. Too much complaining makes people feel worse. It would just feel sarcastic to claim life's 10/10 when it's not - and realistically can't be, since nothing's ever perfect. So we default to a description somewhere in the 4/10-8/10 range in most cases. When things are worse, we search for a way to see them as better than they are. When they're better, we find a way to commiserate.
In broad terms, I think we should be somewhat reassured about the cardinal comparability of subjective data. As one piece of evidence, see Figure 9 below, taken from a YouGov poll. Here, American individuals were asked to give ratings from 0 (very negative) to 10 (very positive) for different words like: “very bad”, “terrible”, “outstanding”, “excellent”, and “perfect.”
The ‘bumps’ represent the proportion of people that give each answer. The overlap isn’t perfect, but it’s pretty good. If you ask people to score ‘perfect’, basically everyone says it means 10/10. If people thought this task was meaningless, they’d answer at random, and the lines would be flat. So, it seems that people are sensibly able to compare verbal labels and numerical labels and do so in the same sort of way.
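The flat-versus-bumpy reasoning can be illustrated with a small simulation. This is a toy sketch, not the YouGov data: it just contrasts what the response distribution looks like when people answer at random versus when they share a common understanding of a word like "perfect" (the 95% consensus rate is an invented assumption).

```python
# Toy illustration (not the actual survey data): random responders produce
# a near-flat distribution over 0-10, while responders who share a common
# meaning for "perfect" produce a sharp spike at 10.
import random
from collections import Counter

random.seed(0)
N = 10_000

# Random responders: every score 0-10 is equally likely.
random_ratings = Counter(random.randint(0, 10) for _ in range(N))

# Consensus responders rating "perfect": assume ~95% say 10/10.
consensus_ratings = Counter(
    10 if random.random() < 0.95 else random.randint(0, 9) for _ in range(N)
)

flat_max = max(random_ratings.values()) / N   # tallest "bump" if answers are random
peak = consensus_ratings[10] / N              # the spike at 10 under consensus
print(f"random responders' most common score: {flat_max:.0%} of answers")
print(f"consensus responders saying 10/10:   {peak:.0%} of answers")
```

Under random answering, no score attracts much more than the uniform 1/11 of responses, which is why genuinely meaningless ratings would produce flat lines in the figure.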
This supports the hypothesis that people can put their language on a common scale. It doesn't mean they do: there are no cultural or emotional stakes in this study. At most, it leaves open the possibility that people are able to report their real subjective wellbeing. It doesn't surprise me that the cultural default in some countries is a 2-3, while in others it's a 7-8, which is how I interpret Figure 3. That data is perfectly compatible both with the idea that people are scaling their happiness according to, say, relative wealth, and with the idea that these countries simply have stable default reporting values, which cluster geographically as so many other cultural traits do.
People aren't using "bespoke scales which change from moment to moment." They're using broken scales that stay fixed across time, regardless of their true feelings or changing circumstances.
As you point out:
There’s some evidence from memory data that individuals keep the same scales over their own lives (this is from Prati and Senik 2020, which I discuss in Plant 2020).
Scale shift could still happen generationally: an 8/10 to someone born in 1950 represents a lower level of whatever subjective thing is being measured than an 8/10 to someone born in 2000. I can’t think of any research that addresses this specific concern. It doesn’t strike me as particularly likely, though. It implies that if I, my parents, and my grandparents were each to say we were 10/10 happy, we would assume I would be happier than my parents, who would be happier than my grandparents.
I would absolutely believe that your 10/10 represents a higher level of felt happiness or satisfaction than your grandparents experienced. This corresponds perfectly to the idea of an intergenerational fixed default report.
An alternative to arguing about possible scale changes would be to take a general theory of how happiness works, consider how economic growth changes our lives and society, and ask whether we should expect it to increase or reduce happiness as a result. For my money, the most promising option is to conceive of happiness and unhappiness as “Mother Nature’s” reward and punishment mechanisms for evolutionary fitness. In this light, we want to consider humanity’s environment of evolutionary adaptation, i.e. from about 100,000 years ago to the present day. But it’s not obvious this analysis favours the growth advocate. Notably, Hidaka (2012) argues that depression is rising as a result of modernity, and points to the fact that “modern populations are increasingly overfed, malnourished, sedentary, sunlight-deficient, sleep-deprived, and socially-isolated”.
Evolutionary fitness is the number of offspring you produce, and how many they produce, and so on ad infinitum. This predicts that people will report greater happiness the more children they have. Yet, in fact, two children, which is below replacement, is the preferred number across many countries.
Overfed? Sounds evolutionarily adaptive to me. Malnourished? Your breakfast cereal is fortified, nobody gets scurvy, and vitamin shops are everywhere. Sedentary? Relaxed. Sunlight-deficient? It's brighter in my house than it is outside at the moment. Sleep-deprived? Watching Netflix while not sick with cholera. Socially isolated? People have options about whether and whom to be friends with.
Weirdly, I feel somewhat anxious in saying this, as though I've broken a mild taboo by responding in these ways to Hidaka's list. That's what I get for violating the cultural default report.
I'm going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.
I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that

extremely simple generic architectures written down in a few dozen lines of code

with large capability differences between very similar lines of code

solving many problems in many fields and subsuming entire subfields as simply another minor variant

with large generalizing models...powered by OOMs more compute

steadily increasing in agency

is

a short description of Yudkowsky's views on what the runup will look like

and how DL now works.
We don't have a formalism to describe what "agency" is. We do have several posts on the Alignment Forum that try to define it.
While it might not be the best choice, I'm going to use Gradations of Agency as a definition, because it's more systematic in its presentation.
"Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them."
This doesn't seem like what any ML model does. So we can look at "Level 2," which gives the example "You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward."
This seems like how all ML works.
So using the "Gradations of Agency" framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don't appear to be changing levels of agency. They aren't identifying other successful ML models and imitating them.
Gradations of Agency doesn't address whether there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within Level 2, where all ML seems to reside?
This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky's predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.
A 10-trillion-parameter model now exists, and it's been suggested that a 100-trillion-parameter model, which might be created as soon as this year, could be roughly comparable to the power of the human brain.
It's scary to see that we're racing full-on toward a very near-term ML project that might plausibly be AGI. However, if a 100-trillion-parameter ML model is not AGI, then we'd have two strikes against Yudkowsky. If neither a small coded model nor a 100-trillion-parameter trained model using 2022-era ML results in AGI, then I think we have to take a hard look at his track record of predicting what technology is likely to result in AGI. We also have his "AGI well before 2050" statement from "Beware boasting" to work with, although that's not much help.
On the other hand, I think his assertiveness about the importance of AI safety and risk is appropriate even if he proves wrong about the technology by which AGI will be created.
I would critique the OP, however, for not being sufficiently precise in its critiques of Yudkowsky. As its "fairly clearcut examples," it uses 20+-year-old predictions that Yudkowsky has explicitly disavowed. Then, at the end, it complains that he hasn't "acknowledged his mixed track record." Yet in the post it links, Yudkowsky's quoted as saying:
To be a slightly better Bayesian is to spend your entire life watching others slowly update in excruciatingly predictable directions that you jumped ahead of 6 years earlier so that your remaining life could be a random epistemic walk like a sane person with self-respect.
6 years is not 20 years. It's perfectly consistent to say that a youthful, 20+-years-in-the-past version of you thought wrongly about a topic, but that you've since become so much better at making predictions within your field that you're 6 years ahead of Metaculus. We might wish he'd stated these predictions in public and specified what they were. But his failure to do so doesn't make him wrong; it just leaves him without evidence of his superior forecasting ability. These are distinct failure modes.
Overall, I think it's wrong to conflate "Yudkowsky was wrong 20+ years ago in his youth" with "not everyone in AI safety agrees with Yudkowsky" with "Yudkowsky hasn't made many recent, falsifiable near-term public predictions about AI timelines." I think this is a fair critique of the OP, which claims to be interrogating Yudkowsky's "track record."
But I do agree that it's wise for a non-expert to defer to a portfolio of well-chosen experts, rather than the views of the originator of the field alone. While I don't love the argument the OP used to get there, I do agree with the conclusion, which strikes me as just plain common sense.
Non-EAs are receptive to a proposal to substitute bivalves for other meat. They are not receptive to proposals to go vegetarian/vegan. Bivalves are also healthier than plant-based meat. Therefore, bivalves are the most effective way to reduce overall animal suffering.
I interpret the linked post about receptivity to proposals to go vegetarian/vegan as providing evidence that people are receptive to these proposals. It states:
However, polls suggest that the percentage of the population that’s vegetarian has stayed basically flat since 1999. In short, we’re basically treading water: for every new vegetarian we convince, someone else quits. As you’d expect given this fact, more than four-fifths of vegans and vegetarians eventually abandon their diets.
If the number of vegetarians is flat, yet people are abandoning the diet, this requires a constant inflow of new vegetarians to balance the outflow. The whole point of the post is that people are receptive to trying a vegetarian diet out, but that they struggle to maintain it long-term.
The claim in this post that "non-EAs are receptive to a proposal to substitute bivalves for other meat" comes with no supporting evidence; perhaps there's just a missing link.
I think it's important to make this claim about receptivity to bivalve/meat substitutions more specific. Is this population of non-EAs members of the general public? What countries do they live in? How often, and how much, and for what kinds of meat would they consider bivalves an acceptable substitute? Will they pay more to substitute bivalves for other meats on a pound-for-pound basis, and if so, how much more?
Personally, I enjoy the taste of scallops, oysters, mussels, clams more than the meats like beef, pork, chicken. I believe that most consumers would have the same palate, except for a few countries like the United States that have idiosyncratic food preferences.
Shellfish consumption tends to be aggregated with seafood consumption in general, but the USA consumes a moderate amount of seafood relative to other countries. It's not clear to me what you mean by the USA having "idiosyncratic food preferences." I was only able to find this data on American oyster consumption specifically. While low, I think this is better explained by a combination of the high cost of oysters and the fact that America has a great deal of non-coastal land, a long tradition of ranching, and excellent farmland, making cheap, high-quality meat widely available to the population.
I suspect that meat freezes and ships much better than shellfish. I have no hesitation about eating a steak from a cow slaughtered 2,000 miles away, but I think of "Midwestern sushi" as a rare case where combining a place name with a food name (cf. "Washington cherries," "Argentinian french fries," "French pastry") makes the food sound worse rather than better.
I would want to see a deeper investigation into the tractability of upscaling shellfish aquaculture, a stronger argument on the market failure explanation for why normal market mechanisms are inadequate to motivate increased production, and better information on people's receptivity to bivalves as a meat substitute.
That said, I love oysters, and if we can altruistically make them cheap enough that I can eat them on the daily, that alone will make the EA movement a success as far as I am concerned.
This is very helpful, thank you! I've been mainly looking into design projects for the summer, and the impression I picked up at EAGxBoston was that just having low-cost UVC devices available was a key bottleneck. Working on a design sounded like it might fit the bill. Based on what you've said, it sounds like this is more of a logistics and social coordination problem than a money problem. I'll keep this in mind for the future, though.
So this is a definitional issue: is it accurate to call the most Hispanic district in the 14th most Hispanic state (per Wikipedia) "not a heavily Hispanic area or anything?"
We can answer this quantitatively.
17.4% of the citizen voting age population of OR-6 is Hispanic. Of the 9 candidates who ran in OR-6, two, Salinas and Leon, are Hispanic, making Hispanics 22.2% of the candidate pool. So they were not particularly over- or under-represented in this race. It is slightly surprising that the strongest candidate in this race happened to be Hispanic, but 22.2% chances happen all the time. Obviously, referring to this as "chance" in no way suggests that Salinas won "by luck"; she's clearly a skilled legislator.
Matt says that "this is the only viable opportunity to add a Hispanic Democrat to the caucus this year." It seems like we have to consider four counterfactuals here:
1. Salinas didn't run
I think it's a safe assumption that people who vote for Hispanic candidates specifically because they are Hispanic and represent Hispanic issues are a subset of the Hispanic population.
Let's say that the entire Hispanic vote in OR-6 went for Salinas (surely an overestimate), that this represents 17.4% of votes in this election, and that 2/3 of them would have switched their votes to Leon if Salinas hadn't run. That would have given Leon an additional 6,000-7,000 votes or so, which would have been enough to beat Flynn if Salinas's other votes were redistributed evenly or in proportion to vote share to other candidates.
That's a pretty generous assumption in favor of the idea that Leon was a viable candidate in this counterfactual scenario, one that reasonable people could disagree on.
2. Flynn didn't run
In this case, let's assume Flynn's votes would have been redistributed evenly or in proportion to vote share to other candidates. Then Salinas would still have won.
3. Salinas and Flynn didn't run
In this case, let's say once again that Leon would have received an additional 6,000-7,000 Hispanic votes, while the remaining voters would have been redistributed among the other candidates either evenly or in proportion to vote share. In this case, Leon would have been the frontrunner. Indeed, under this model, she could have received more like 1/3 of the Hispanic vote, with the remainder of the votes being split up equally, and been neck and neck with Reynolds. But reasonable people can probably still disagree on whether she'd have received even this much of the Hispanic vote.
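The redistribution arithmetic behind these counterfactuals can be sketched concretely. Everything below is an invented assumption for illustration, not actual OR-6 data: the 17.4% Hispanic share and the 2/3 switch rate come from the discussion above, but the turnout (~56,000, chosen so the switched votes come to roughly 6,500) and the candidate totals are hypothetical.

```python
# Toy model of the counterfactual scenarios above. All numbers are
# illustrative assumptions, not actual OR-6 results.

TOTAL = 56_000            # assumed turnout
HISPANIC_SHARE = 0.174    # Hispanic share of the citizen voting-age population
SWITCH_RATE = 2 / 3       # assumed fraction of Hispanic voters moving to Leon

# Scenario 1: Salinas doesn't run and part of the Hispanic vote moves to Leon.
votes_to_leon = TOTAL * HISPANIC_SHARE * SWITCH_RATE
print(f"Extra votes for Leon: {votes_to_leon:,.0f}")  # roughly 6,500

def redistribute(votes, dropped):
    """Drop one candidate and share their votes among the rest in
    proportion to each remaining candidate's existing total."""
    votes = dict(votes)
    pool = votes.pop(dropped)
    rest = sum(votes.values())
    return {c: v + pool * v / rest for c, v in votes.items()}

# Hypothetical first-choice totals (invented for illustration).
votes = {
    "Salinas": 20_000, "Flynn": 10_000, "Reynolds": 9_000,
    "Leon": 4_000, "others (5 combined)": 13_000,
}

# Scenario 2: Flynn doesn't run; proportional redistribution keeps Salinas ahead.
scenario2 = redistribute(votes, "Flynn")
print(max(scenario2, key=scenario2.get))  # Salinas
```

Proportional redistribution preserves the total vote count and each remaining candidate's relative standing, which is why dropping Flynn alone can't change the winner under this model.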
4. Flynn had run in a district where his top competitor was white
Let's say that Flynn had run in a different district where his top competitor had equal local appeal and political skill to that of Salinas. However, in this counterfactual district, the prospect of putting an additional Hispanic legislator in the Democratic caucus was not on the table, because Flynn's top competitor was not Hispanic.
Matt is suggesting that, in this case, that competitor may not have been able to attract a big PAC spend of their own, and Flynn's campaign funding, along with his qualities as a candidate, may have been sufficient to win him the election. I don't read this as a dig against Salinas's skill as a politician. I read it as an explanation for why she in particular, among other strong candidates in other districts, was able to attract over a million dollars in PAC spending of her own. Given that BoldPAC is an explicitly pro-Hispanic Democratic PAC, it seems like they themselves would agree that giving a strong Democratic Hispanic candidate extra funding to help them beat non-Hispanic rivals is exactly their agenda.
Flynn couldn't help being from the district he was from, and in this election, there was an extremely limited supply (1) of explicitly EA candidates with a heavy focus on pandemic prevention. So the fact that he happened to be up against a main competitor who is Hispanic and could therefore attract this specific form of campaign financing does seem to be a matter of luck.
It seems possible, but unlikely, that Flynn got "unlucky" in facing an unusually strong opponent. Salinas is clearly very good, and my guess is that in most contested primaries, there is at least one very skilled, appealing, and reasonably well-funded legislator in the running.
From the outside view, we ought to perhaps view an EA candidate as being basically a "random sample" of the candidate quality pool. As we can see in this election, vote distributions are long-tailed, and a randomly sampled candidate in a 9-candidate election will usually be lackluster. That doesn't mean it's a bad idea to take a shot. I think we should not update overmuch on the strategy of "just throw money behind EA-aligned candidates."
For the hypothesis that BoldPAC's late-campaign spend turned a Flynn victory into a Salinas victory, we are in effect positing that Salinas's skill and Flynn's money had them neck and neck, but that Salinas could benefit from an influx of cash and advertising much more than Flynn because of diminishing marginal returns. On May 5th, Salinas and Flynn polled at 18% and 14% respectively. So around the time of the BoldPAC ad buy, this hypothesis might have seemed reasonable. Looking at the voting results and assuming we should have known at the time that Salinas would receive twice the support of Flynn is just hindsight bias. Going further and dismissing the idea that Flynn could have won any election at all as "totally spurious" and an "ugly," "backwards interpretation" is, well, the sort of analysis that I deleted my Facebook account in order to avoid.
I strongly upvoted your post, and thanks for taking the time to write it.
I note that you’re effectively recommending a strategy of lobbying instead of electioneering in order to advance the cause of pandemic preparedness. Do you have data or personal experience to support the idea that lobbying is a more effective method than campaign sponsorship of aligned candidates to build political support for an issue?
Matt Lerner spent some time looking into lobbying for altruistic causes and posted about it on the EA forum. I appreciate his research, and would like to see more exploring the effectiveness of altruistic lobbying and how to do it well.
But to me the thrust of this post (and the phenomenon I was commenting on) was: there are many people with the ability to solve the world's biggest problems. It would be a shame to lose them purely due to our community building (CB) strategies. If our strategy could be nudged to make a better impression at people's first encounter with EA, we could capture more of this talent and direct it to the world's biggest problems.
Another way of stating this is that we want to avoid misdirecting talent away from the world's biggest problems. This might occur if EA has identified those problems, effectively motivates its high-aptitude members to work on them, but fails to recruit the maximum number of high-aptitude members, due to CB strategies optimized for attracting larger numbers of low-aptitude members.
This is clearly a possible failure mode for EA.
The epistemic thrust of the OP is that we may be missing out on information that would allow us to determine whether or not this is so, largely due to selection and streetlight effects.
Anecdata is a useful starting place for addressing this concern. My objective in my comment above is to point out that this is, in the end, just anecdata, and to question the extent to which we should update on it. I also wanted to focus attention on the people who I expect to have the most valuable insights about how EA could be doing better at attracting high-aptitude members; I expect that most of these people are not the sort of folks who refer to EA as a "cult" from the next table down at a Cambridge fresher's fair, but I could be wrong about that.
In addition, I want to point out that the character models of "Alice" and "Bob" are the merest speculation. We can spin other stories about "Cindy" and "Dennis" in which the smart, independent-minded skeptic is attracted to EA, and the aimless believer is attracted to some other table at the fresher's fair. We can also spin stories in which CB folks wind up working to minimize the perception that EA is a cult, and this having a negative impact on high-talent recruitment.
I am very uncertain about all this, and I hope that this comes across as constructive.
The criticisms of EA movement building tactics that we hear are not necessarily the ones that are most relevant to our movement goals. Specifically, I’m hesitant to update much on a few 18 year olds who decide we’re a “cult” after a few minutes of casual observation at a fresher’s fair. I wouldn’t want to be part of a movement that eschewed useful tools for better-integrating its community because it’s afraid of the perception of a few sarcastic teenagers.
Instead, I’m interested in learning about the critiques of EA put forth by highly-engaged EAs, non-EAs, semi-EAs, and ex-EAs who care about or share at least some of our movement goals, have given them a lot of thought, are generally capable people, and have decided that participation in the EA movement is therefore not for them.
What follows is mere speculation on my part.
I tend to assume that the physical health interventions we promote via global health initiatives are also the most tractable ways to improve mental health. Losing a child to malaria, or suffering anemia due to a worm infection, or being extremely poor, or living and dying through wars and plagues, seem like they’d have a devastating impact on people’s mental health.
Because EAs don’t typically suffer from these problems, and because we allow for a lot of self-care, it does not surprise me that EAs focus on specifically mental health interventions for themselves. This can smack of “for me but not for thee,” but I read it as “I expect that therapy will do me more good than bednets, and bednets will do you more good than therapy.”
If that’s false, it might be useful to promote the counterargument more loudly. I expect you’ve researched it deeply, given your work, so it might just be a matter of promoting that research more vigorously.
I think Matt’s on the right track here. Treating “immortal dictators” as a separate scenario from “billions of lives lost to an immortal dictator” smacks of double-counting.
Really, we’re asking if immortality will tend to save or lose lives on net, or to improve or worsen QoL on net.
We can then compare the possible causes of lives lost/worsened vs gained/bettered: immortal dictators, or perhaps immortal saints; saved lives from life extension; lives less tainted by fear of death and mourning; lives more free to pursue many paths; alignment of individual self-interest with the outcome of the long-term future; the persistent challenge of hyperbolic discounting; the question of how to provide child rearing experiences in a crowded world with a death rate close to zero; the possible need to colonize the stars to make more room for an immortal civilization; the attendant strife that such a diaspora may experience.
When I just make a list of stuff in this manner, no individual item jumps out at me as particularly salient, but the collection seems to point in the direction of immortality being good when confined to Earth, and then being submerged into the larger question of whether a very large and interplanetary human presence would be good.
I think that this argument sort of favors a more near-term reach for immortality. The smaller and more geographically concentrated the human population is by the time it’s immortal, the better able it is to coordinate and plan for interplanetary growth. If humanity spreads to the stars, then coordination ability declines. If immortality is bad in conjunction with interplanetary civilization, the horse is out of the barn.