Why scientific research is less effective in producing value than it could be: a mapping

My experience talking with scientists and reading science in the regenerative medicine field has shifted my opinion against this critique somewhat. Published papers are not the fundamental unit of science. Most labs are 2 years ahead of whatever they’ve published. There’s a lot of knowledge within the team that is not in the papers they put out.

Developing a field is a process of investment not in creating papers, but in creating skilled workers using a new array of developing technologies and techniques. The paper is a way of stimulating conversation and a loose measure of that productivity. But just because the papers aren’t good doesn’t mean there’s no useful learning going on, or that science is progressing in a wasteful manner. It’s just less legible to the public.

For example, I read and discussed with the authors a paper on a bioprinting experiment. They produced a one centimeter cube of human tissue via extrusion bioprinting. The materials and methods aren’t rigorously controllable enough for reproducibility. They use decellularized pig hearts from the local butcher (what had the pig been eating, what were its genetics, how was it raised?), and an involved manual procedure to prepare and extrude the materials.

Several scientists in the field have cautioned me against assuming that figures in published data are reproducible. Yet does that mean the field is worthless? Not at all. New bioprinting methods continue to be developed. The limits of achievement continue to expand. Humanity is developing a cadre of bioengineers who know how to work with this stuff and sometimes go on to found companies with their refined techniques.

It’s the ability to create skilled workers in new manufacturing and measurement techniques, skilled thinkers in some line of theory, that is an important product of science. Reproducibility is important, but that’s what you get after a lot of preliminary work to figure out how to work with the materials and equipment and ideas.

What's wrong with the EA-aligned research pipeline?

Looking forward to hearing about those vetting constraints! Thanks for keeping the conversation going :)

Help me find the crux between EA/XR and Progress Studies

Imagine we can divide up the global economy into natural clusters. We'll refer to each cluster as a "Global Project." Each Global Project consists of people and their ideas, material resources, institutional governance, money, incentive structures, and perhaps other factors.

Some Global Projects seem "bad" on the whole. They might have directly harmful goals, irresponsible risk management, poor governance, or many other failings. Others seem "good" on net. This is not in terms of expected value for the world, but in terms of the intrinsic properties of the GP that will produce that value.

It might be reasonable to assume that Global Project quality is normally distributed. One point of possible difference is the center of that distribution. Are most Global Projects of bad quality, neutral, or good quality?

We might make a further assumption that the expected value of a Global Project follows a power law, such that projects of extremely low or high quality produce disproportionately more value (or more harm). Perhaps, if Q is project quality and V is value, $V = Q^N$. But we might disagree on the details of this power law.

One possibility is that in fact, it's easier to destroy the world than to improve the world. We might model this with two power laws, one for Q > 0 and one for Q < 0, like so:

  • $V = Q^{N_{good}}$, Q >= 0
  • $V = -|Q|^{N_{bad}}$, Q < 0

In this case, whether or not progress is good will depend on the details of our assumptions about both the project quality distribution and the power law for expected value:

  • The size of N, and whether or not the power law is uniform or differs for projects of various qualities. Intuitively, "is it easier for a powerful project to improve or destroy the world, and how much easier?"
  • How many standard deviations away from zero the project quality distribution is centered, and in which direction. Intuitively, "are most projects good or bad, and how much?"

In this case, whether or not average expected value across many simulations of such a model is positive or negative can hinge on small alterations of the variables. For example, if we set N = 7 for bad projects and N = 3 for good projects, but we assume that the average project quality is +0.6 standard deviations from zero, then average expected value is mildly negative. At project quality +0.7 standard deviations from zero, the average expected value is mildly positive.
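A minimal sketch of this kind of simulation, under assumptions of my own: project quality is drawn from a normal distribution with standard deviation 1, value is Q^N for good projects and -|Q|^N for bad ones, and each draw is a single project. The function names, sample size, and seed are hypothetical, and this is not the code behind the figures quoted above, so it should not be expected to reproduce them exactly.

```python
import random


def project_value(q, n_good=3, n_bad=7):
    """Piecewise power law (assumed form): q**n_good if q >= 0, else -|q|**n_bad."""
    if q >= 0:
        return q ** n_good
    return -(abs(q) ** n_bad)


def mean_value(center, n_projects=20000, seed=0):
    """Average value of simulated projects with quality ~ Normal(center, 1).

    Reusing the same seed for different centers gives the same underlying
    random draws, shifted by `center`, so comparisons across centers are
    not swamped by sampling noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projects):
        q = rng.gauss(center, 1.0)
        total += project_value(q)
    return total / n_projects


if __name__ == "__main__":
    # Sweep the center of the quality distribution and watch the sign of the mean.
    for center in (0.0, 0.5, 0.6, 0.7, 1.0, 2.0):
        print(f"center = {center:+.1f}  mean value = {mean_value(center):+.3f}")
```

Because the bad-project exponent is larger, a handful of very negative draws dominates the average, which is why the sign of the result can flip with a small shift in the center.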

Here's what an X-risk "we should slow down" perspective might look like. Each plotted point is a simulated "world." In this case, the simulation produces negative average EV across simulated worlds.

And here is what a Progress Studies "we should speed up" perspective might look like, with positive average EV.

The joke is that it's really hard to tell these two simulations apart. In fact, I generated the second graph by altering the center point of the project quality distribution 0.01 standard deviations to the right relative to the first graph. In both cases, a lot of the expected value is lost to a few worlds in which things go cataclysmically wrong.

One way to approach a double crux would be for adherents of the two sides to specify, in the spirit of "if it's worth doing, it's worth doing with made up statistics," their assumptions about the power law and project quality distribution, then argue about that. In practice, though, I think both sides understand that we don't have any reliable way of saying what those numbers ought to be. Since the details matter on this question, it seems to me that it would be valuable to find common ground.

For example, I'm sure that PS advocates would agree that there are some targeted risk-reduction efforts that might be good investments, along with a larger class of progress-stimulating interventions. Likewise, I'm sure that XR advocates would agree that there are some targeted tech-stimulus projects that might be X-risk "security factors." Maybe the conversation doesn't need to be about whether "more progress" or "less progress" is desirable, but about the technical details of how we can manage risk while stimulating growth.

What's wrong with the EA-aligned research pipeline?

Yeah, I am worried we may be talking past each other somewhat. My takeaway from the grantmaker quotes from FHI/OpenPhil was that they don't feel they have room to grow in terms of determining the expected value of the projects they're looking at. Very prepared to change my mind on this; I'm literally just going from the quotes in the context of the post to which they were responding.

Given that assumption (that grantmakers are already doing the best they can at determining EV of projects), then I think my three categories do carve nature at the joints. But if we abandon that assumption and assume that grantmakers could improve their evaluation process, and might discover that they've been neglecting to fund some high-EV projects, then that would be a useful thing for them to discover.

What's wrong with the EA-aligned research pipeline?

Your previous comment seemed to me to focus on demand and supply and note that they'll pretty much always not be in perfect equilibrium, and say "None of those problems indicate that something is wrong", without noting that the thing that's wrong is animals suffering, people dying of malaria, the long-term future being at risk, etc.

In the context of the EA forum, I don't think it's necessary to specify that these are problems. To state it another way, there are three conditions that could exist (let's say in a given year):

  1. Grantmakers run out of money and aren't able to fund all high-quality EA projects.
  2. Grantmakers have extra money, and don't have enough high-quality EA projects to spend it on.
  3. Grantmakers have exactly enough money to fund all high-quality EA projects.

None of these situations indicate that something is wrong with the definition of "high quality EA project" that grantmakers are using. In situation (1), they are blessed with an abundance of opportunities, and the bottleneck to do even more good is funding. In situation (2), they are blessed with an abundance of cash, and the bottleneck to do even more good is the supply of high-quality projects. In situation (3), they have two bottlenecks, and would need both additional cash and additional projects in order to do more good.

No matter how many problems exist in the world (suffering, death, X-risk), some bottleneck or another will always exist. So the simple fact that grantmakers happen to be in situation (2) does not indicate that they are doing something wrong, or making a mistake. It merely indicates that this is the present bottleneck they're facing.

For the rest, I'd say that there's a difference between "willingness to work" and "likelihood of success." We're interested in the reasons for EA project supply inelasticity. Why aren't grantmakers finding high-expected-value projects when they have money to spend?

One possibility is that the teams who could work on such projects aren't sufficiently motivated by the monetary and non-monetary rewards on the table. Perhaps if this were addressed, we'd see an increase in supply.

An alternative possibility is that high-quality ideas/teams are rare right now, and can't be had at any price grantmakers are willing or able to pay.

What's wrong with the EA-aligned research pipeline?

In particular, I think it implies the only relevant type of "demand" is that coming from funders etc., whereas I'd want to frame this in terms of ways the world could be improved.

My position is that "demand" is a word for "what people will pay you for." EA exists for a couple of reasons:

  1. Some object-level problems are global externalities, and even governments face a free rider problem. Others are temporal externalities, and the present time is "free riding" on the future. Still others are problems of oppression, where morally-relevant beings are exploited in a way that exposes them to suffering.

    Free-rider problems by their nature do not generate enough demand for people to do high-quality work to solve them, relative to the expected utility of the work. This is the problem EA tackled in earlier times, when funding was the bottleneck.
  2. Even when there is demand for high-quality work on these issues, supply is inelastic. Offering to pay a lot more money doesn't generate much additional supply. This is the problem we're exploring here.

The underlying root cause is lack of self-interested demand for work on these problems, which we are trying to subsidize to correct for the shortcoming.

EA is a Career Endpoint

I can see how you might interpret it that way. I'm rhetorically comfortable with the phrasing here in the informal context of this blog post. There's a "You can..." implied in the positive statements here (i.e. "You can take 15 years and become a domain expert"). Sticking that into each sentence would add flab.

There is a real question about whether or not the average person (and especially the average non-native English speaker) would understand this. I'm open to argument that one should always be precisely literal in their statements online, to prioritize avoiding confusion over smoothing the prosody.

EA is a Career Endpoint

Thanks for that context, John. Given that value prop, companies might use a TB-like service under two constraints:

  1. They are bottlenecked by having too few applicants. In this case, they have excess interviewing capacity, or more jobs than applicants. They hope that by investigating more applicants through TB, they can find someone outstanding.
  2. Their internal headhunting process has an inferior quality distribution relative to the candidates they get through TB. In this case, they believe that TB can provide them with a better class of applicants than their own job search mechanisms can identify. In effect, they are outsourcing their headhunting for a particular job category.

Given that EA orgs seem primarily to lack specific forms of domain expertise, as well as well-defined project ideas/teams, what would an EA Triplebyte have to achieve?

They'd need to be able to interface with EA orgs and identify the specific forms of domain expertise that are required. Then they'd need to be able to go out and recruit those experts, who might never have heard of EA, and get them interested in the job. They'd be an interface to the expertise these orgs require. Push a button, get an expert.

That seems plausible. Triplebyte evokes the image of a huge recruiting service meant to fill cubicles with basically-competent programmers who are pre-screened for the in-house technical interview. Not to find unusually specific skills for particular kinds of specialist jobs, which it seems is what EA requires at this time.

That sort of headhunting job could be done by just one person. Their job would be to do a whole lot of cold-calling, getting meetings with important people, doing the legwork that EA orgs don't have time for. Need five minutes of a Senator's time? Looking to pull together a conference of immunologists to discuss biosafety issues from an EA perspective? That's the sort of thing this sort of org would strive to make more convenient for EA orgs.

As they gained experience, they would also be able to help EA orgs anticipate what sort of projects the domain experts they'd depend upon would be likely to spring for. I imagine that some EA orgs must periodically come up with, say, ideas that would require some significant scientific input. Some of those ideas might be more attractive to the scientists than others. If an org like this existed, it might be able to tell those EA orgs which ones the scientists are likely to spring for.

That does seem like the kind of job that could productively exist at the intersection of EA orgs. They'd need to understand EA concepts and the relationships between institutions well enough to speak "on behalf of the movement," while gaining a similar understanding of domains like the scientific, political, business, philanthropic, or military establishment of particular countries.

An EA diplomat.

EA is a Career Endpoint

Great thoughts, ishaan. Thanks for your contributions here. Some of these thoughts connect with MichaelA's comments above. In general, they touch on the question of whether or not there are things we can productively discover or say about the needs of EA orgs and the capabilities of applicants that would reduce the size of the "zone of uncertainty."

This is why I tried to convey some of the recent statements by people working at major EA orgs on what they perceive as major bottlenecks in the project pipeline and hiring process.

One key challenge is triangulation. How do we get the right information to the right person? 80,000 Hours has solved a piece of this admirably, by making themselves into a go-to resource on thinking through career selection from an EA point of view.

This is a comment section on a modestly popular blog post, which will vanish from view in a few days. What would it take to get the information that people like you, MichaelA, and many others have, compile it into a continually maintained resource, and get it into the hands of the people who need it? Is that knowledge durable enough to be worth compiling, general enough to be worth broadcasting, and EA-specific enough not to be available elsewhere?

I'm primarily interested here in making statements that are durably true. In this case, I believe that EA grantmakers will always need to have a bar, and that as long as we have a compelling message, there will consequently always be some people failing to clear it who are stuck in the "zone of uncertainty."

With this post, I'm not trying to tell them what they should do. Instead, I am trying to articulate a framework for understanding this situation, so that the inchoate frustration that might otherwise result can be (hopefully) transmuted into understanding. I'm very concerned about the people who might feel like "bycatch" of the movement, caught in a net, dragged along, distressed, and not sure what to do.

That kind of situation can produce anger at the powers that be, which is a valid emotion. However, when the "powers that be" are leaders in a small movement that the angry person actually believes in, it could be more productive to at least come to a systemic understanding of the situation that gives context to that emotion. Being in a line that doesn't seem to be moving very fast is frustrating, but it's a very different experience if you feel like the speed at which it's moving is understandable given the circumstances.

EA is a Career Endpoint

Good thoughts. I think this problem decomposes into three factors:

  1. Should there be a bar, or should all EA projects get funded in order of priority until the money runs out?
  2. If there's a bar, where should it be set, and why?
  3. After the bar is set, when should grantmakers re-examine its underlying reasoning to see if it still makes sense under present circumstances?

My post actively argues that we should have a bar, is agnostic on how high the bar should be, and assumes that the bar is immobile for the purposes of the reader.

At some point, I may give consideration to where and how we set the bar. I think that's an interesting question both for grant makers and people launching projects. A healthy movement would strive for some clarity and consensus. If neophytes could more rapidly gain skill in self-evaluation relative to the standards of the "EA grantmaker's bar," without killing the buzz, it could help them make more confident choices about "looping out and back" or persevering within the movement.

For the purposes of this comment section, though, I'm not ready to develop my stance on it. Hope you'll consider expanding your thoughts in a larger post!
