All of JoshuaBlake's Comments + Replies

So the question is basically whether the (upkeep costs + opportunity cost of money - benefit from events) is more or less than discount from selling quickly?
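That comparison can be sketched as a toy calculation (all figures below are hypothetical, purely to illustrate the structure of the trade-off):

```python
# Toy comparison: hold the property and sell at market value later,
# vs. sell quickly at a discount. All figures are hypothetical.

def net_from_holding(market_value, annual_upkeep, discount_rate,
                     annual_event_benefit, years_to_sell):
    # Opportunity cost of capital is modelled by discounting the
    # eventual sale price back to today.
    sale_today_equivalent = market_value / (1 + discount_rate) ** years_to_sell
    running_costs = (annual_upkeep - annual_event_benefit) * years_to_sell
    return sale_today_equivalent - running_costs

def net_from_quick_sale(market_value, quick_sale_discount):
    return market_value * (1 - quick_sale_discount)

hold = net_from_holding(20e6, 1e6, 0.05, 0.4e6, 2)
quick = net_from_quick_sale(20e6, 0.15)
print(f"hold: ${hold:,.0f}, quick sale: ${quick:,.0f}")
```

Whichever side comes out larger depends entirely on the assumed discount, upkeep, and time to sell.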

4
Habryka
1mo
Yep, I think that would be a reasonable calculation. 

What do you mean by take a huge loss? I'm not sure paper losses are relevant here.

I mean it in the sense that they will have to sell substantially below market value if they want to sell it quickly.

This kind of property tends to have huge bid-ask-spreads and the usual thing to do is to continue operating the property while looking for a buyer (my guess is they would succeed at selling it eventually at market value, but it would take a while).

Interesting read, but I'm left unconvinced that traditional pharma is moving much slower than optimal. That would seem to imply that they're leaving a lot of money on the table (quicker approval = longer selling the drug before the patent expires).

I have three speculative ideas on why this might be. Cost of the process, ability to scale the process, and risk (e.g. amount of resources wasted if a drug fails at some stage in development).

As the article points out, pharma can do this when the incentives are right (COVID vaccines) which implies there's a reason to not do it normally.

You need a step beyond this though. Not just that we are coming up with harder moral problems, but that solving those problems is important to future moral progress.

Perhaps a structure as simple as the one that has worked historically will prove just as useful in the future; or, as you point out has happened in the past, wider societal changes (not progress in moral philosophy as an academic discipline) are the major driver. In either case, all this complex moral philosophy is not the important factor for practical moral progress across society.

3
Rafael Ruiz
2mo
Fair! I agree with that, at least up to this point in time. But I think there could be a time when we have picked most of the "social low-hanging fruit" (cases like the abolition of slavery, universal suffrage, universal education), so there's not a lot of easy social progress left to do. At that point, comparatively, investing in the "moral philosophy low-hanging fruit" will look more worthwhile. Some important cases of philosophical moral problems that might have great axiological moral importance, at least under consequentialism/utilitarianism, could be population ethics (totalism vs averagism), our duties towards wild animals, and the moral status of digital beings. I think figuring them out could have great importance. Of course, if we always just keep them as interesting philosophical thought experiments and don't do anything about promoting any outcomes, they might not matter that much. But I'm guessing people in the year 2100 might want to start implementing some of those ideas.

Bear in mind that even if FTX can pay everyone back now, that does not mean they were solvent at the point they were put into bankruptcy.

1
bern
2mo
Agree. In fact, SBF himself described FTX International as insolvent on his substack.  Although I think people may be using the term "solvency" in slightly different ways in discussions around FTX. I think that in FTX's case, illiquidity effectively amounted to insolvency, and that it's uncertain how much they could have sold their illiquid assets for. If for some reason you were to trust SBF's own estimate of $8b, their total assets would have (just) covered their total liabilities. Sullivan & Cromwell's John Ray said in December 2022 "We’ve lost $8bn of customer money" and I think most people have interpreted this as FTX having a net asset value of minus $8b. Presumably, though, Ray was referring either to the temporary shortfall in liquid funds or to the accounting discrepancy that was uncovered that summer/fall. SBF also claimed that he could have raised enough liquidity to make customers substantially whole given a few more weeks, but was under extreme pressure to declare bankruptcy. I think there's a good chance this is accurate, in part because most of the pressure came from Sullivan & Cromwell and a former partner of the firm, who are now facing a class action lawsuit for their alleged role in the fraud. (If anyone has evidence that FTX's liabilities did in fact exceed its assets by $8b at the time of the bankruptcy, I would be interested in seeing it.)
8
Ben Millwood
2mo
My understanding (for whatever it's worth) is that most of the reason why a full repayment looks feasible now is a combination of:

  • Creditors are paid back the dollar value of their assets at the time of bankruptcy. Economically it's a bit like everyone was forced to sell all their crypto to FTX at the bankruptcy date, and then the crypto FTX held appreciated a bunch in the meantime.
  • FTX held a stake in Anthropic, and for general AI hype reasons that's likely to have appreciated a lot too.

I think it's reasonable to think of both of these as luck, and certainly a company relying on them to pay their debts is not solvent.
2
Nathan Young
2mo
They almost certainly were not. (99%)

And even if they were solvent at the time, that does not mean they were not fraudulent.

If I took all my customers money, which I had promised to safekeep, and went to the nearest casino and put it all on red, even if I won it would still be fraud.

In your argument for 3, I think I accept the part that moral philosophising hasn't happened much historically. However, I can't really find the argument that it probably will in the future. Could you perhaps spell it out a bit more explicitly, or highlight where you think the case is being made please?

Great and interesting post though, I love seeing people rigorously exploring EA ideas and fitting them into the wider academic literature.

3
Rafael Ruiz
2mo
Sure! So I think most of our conceptual philosophical moral progress until now has been quite poor. Looked at under the lens of moral consistency reasoning I outlined in point (3), cosmopolitanism, feminism, human rights, animal rights, and even longtermism all seem like slight variations on the same argument ("There are no morally relevant differences between Amy and Bob, so we should treat them equally").

In contrast, I think the fact that we are starting to develop cases like population ethics, infinite ethics, and complicated variations of thought experiments (there are infinite variations of the trolley problem we could conjure up), which really test the limits of our moral sense and moral intuitions, hints that we might need a more systematic, perhaps computerized approach to moral philosophy. I think the likely path is that most conceptual moral progress in the future (in the sense of figuring out new theories and thought experiments) will happen with the assistance of AI systems. I can't point to anything very concrete, since I can't predict the future of moral philosophy in any concrete way, but I think philosophical ethics might become very conceptually advanced and depart heavily from common-sense morality. I think this gap has been increasing since the Enlightenment. Challenges to common-sense morality have been slowly increasing. We might be at the early beginning of that exponential takeoff.

Of course, many of the moral systems that AIs will develop we will consider to be ridiculous. And some might be! But in other cases, we might be too backwards or morally tied to our biologically and culturally shaped moral intuitions and taboos to realize that it is in fact an advancement. For example, the Repugnant Conclusion in population ethics might be true (or the optimal decision in some sense, if you're a moral anti-realist), even if it goes against many of our moral intuitions. The effort will take place in separating the wheat from the chaff.

Thank you Ricardo, this is an insightful analysis. I'd like to see more EA Forum posts with this level of investigation invested into them. In particular, the balance of more longtermist and less global health funding is in contrast with other analyses on the forum.

I think your write-up could be improved more than the underlying analysis. To make this more accessible to others, and your work higher impact, I'd recommend the following.

  • Include your most important takeaways, and less information on your methods (eg the link to the notebook) in the tl;dr. Ve
... (read more)

This seems weird. We don't write 0156 for the year 156. I think this is likely to cause confusion.

This would surprise me. Surveillance is a very expensive ongoing cost, and the actions you should take upon detecting a new microbe which could potentially be a pathogen are unclear. Have you got a more detailed version of why you think this?

1
PandemicRiskMan
2mo
“Surveillance is a very expensive ongoing cost”

1. Do you have any estimates for that? The cost of annual surveillance is surely a pittance compared to the cost of a Covid-19 pandemic every 20-30 years. We have an insurance industry for this reason. One would also expect the cost of that surveillance to fall over time, and the quality of the info it provides to improve too.
2. The cost of developing, trialling, manufacturing, and distributing 8bn doses of an unproven vaccine to every corner of the world every time a novel PPP is discovered (!!) would surely be more than the annual surveillance costs. It also offers worse health outcomes. Plus vaccines are our last line of defence. If we aim to defend ourselves there, then it is only a question of time before we lose. Global vaccination is, to be frank, an insane approach to pandemic risk management. This, thankfully, is starting to be understood by some prominent epidemiologists: https://www.oecd-forum.org/posts/entering-the-age-of-pandemics-we-need-to-invest-in-pandemic-preparedness-even-while-covid-19-continues

“the actions you should take upon detecting a new microbe which could potentially be a pathogen are unclear”

First you alert the world to the fact that there is an unidentified / novel pathogen circulating. Then you implement your national pandemic prevention plans. Remember, this is not a scientific matter. Making decisions in uncertain environments is risk management, not science. Real world planning, preparation, resource management, and tactical decision-making in uncertain environments are required to protect humanity from pandemics, but they are not scientific skills. They are not taught to, or by, scientists, so the methods of science are of little value in a crisis. A scientist can tell you what a hurricane is, whereas a risk manager can tell you what to do about it. That's the key difference. But, that's also a major hurdle that we'll need to overcome, as scientists are very influential but a

Do you know of anything else that feels similar to this? People in public areas collecting biological samples from volunteers (perhaps lightly compensated).

Afraid not. The closest I can think of is collecting samples from healthy volunteers without any benefit to them, but not in public areas. In particular, I'm thinking of swabbing in primary health settings (eg RCGP/UKHSA run something like this in England, I can't remember if it only includes those with respiratory symptoms) and testing blood donations (normally serological testing looking for antibo... (read more)

Thank you for that very detailed reply Jeff, I learnt a lot about how to think about costing this.

The easiest way to collect a pooled sample is to walk around some building and sample everyone. This gets you a big sample pretty cheaply, but it's not a great one if you want to understand the containing city because it's likely that many people in the building will get sick on a similar timeframe.

I agree this is true for an office block, but I would think you can do much better without much cost. For example, if you use a high-traffic commuter train sta... (read more)

4
Jeff Kaufman
3mo
Definitely! Right after writing to you I started thinking about this, estimating costs, and talking to coworkers; sorry for not posting back! I do think something along these lines could work well. My main update since then is that if you do it at a transit station you probably need to compensate people, but also that a small amount of compensation doesn't sink this. Giving people $5 or a candy bar for a swab is possible, and if a team of two people at a busy transit station can get 50-200 swabs in an hour that's your biggest sample acquisition cost. I still think $1k is practical for the sequencing. I'm trying to come up with examples of people doing something similar, which we'd want for presenting this to the IRB. Two examples so far:

  • XpresCheck for COVID tracking at airports (site, [consent brochure](https://www.xprescheck.com/xpresresources/CDC_COVID_Testing_Brochure.pdf))
  • Various companies that sample for bone marrow compatibility testing (ex: Be The Match)

Do you know of anything else that feels similar to this? People in public areas collecting biological samples from volunteers (perhaps lightly compensated).

am I practicing my handwriting in 1439?

I'm not sure what the question is here, I find your metaphor opaque. I guess this is a reference to the invention of the printing press around then, which in some sense makes handwriting pointless. But, being able to have legible handwriting seems pretty useful up until at least this century, perhaps until widespread smartphones.

Thank you for this write-up, very interesting. I'm excited to see more investigations of different surveillance systems' potential.

Hopefully, the SIREN 2.0 study, running this winter, will generate some more data to answer this question.

A few questions now I've had time to consider this post a bit more. Apologies if these are very basic, I'm pretty unfamiliar with metagenomics.

First, how do you relate relative abundance to detection probability? I would have thought the total number of reads of the pathogen of interest also matters. That is, if you tested... (read more)

4
Jeff Kaufman
3mo
Lots of great questions! Thanks for pointing this out; I hadn't seen it and it's super relevant. I don't see what sample type they're using in the press release, but any kind of ongoing metagenomics to look at respiratory viruses is great!

It depends on your detection method, but modeling it as needing some number of cumulative reads hitting the pathogen is a good first approximation. If you think it would take N reads of the pathogen to flag it, then if you know RA(1%) and the exponential growth rate you can make a back-of-the-envelope estimate of how much sequencing you'd need on an ongoing basis to flag it before X% of people had ever been sick. For example, if you need 100 reads to flag, it doubles weekly, and RA(1%) is 1e-7, then to flag at a cumulative incidence of 1% (and current weekly incidence of 0.5%) you'd need 100/1e-7 = 1e9 reads a week. (I chose 1% cumulative incidence and weekly doubling to make the mental math easier. At 1% CI half the people got sick this week and half in previous weeks, and the cumulative infection rate across all past sequencing should sum to 1%, so we can use RA(1%) directly. Though I might have messed this up since I'm doing it in my head lying in bed.)

If you collected a large enough sample volume and sequenced deeply, though, yes.

It doesn't, for three reasons:

  • Sequencing in bulk is a lot cheaper per read. You might pay $13k for 10B read pairs, or $1k for 100M. But that's just ~10x.
  • Some components (lab time, kits) vary in proportion to the number of samples and don't go up much as your samples are bigger.
  • It's only your sequencing costs that vary with relative abundance, and while with wastewater I expect the cost of sequencing to dominate that's not the case for any other sample type I can think of (maybe air?)

If you're sampling from individuals the cost of getting the samples is likely quite high (we were recently quoted $80/person from a contractor, and while I think we can do better if you want 1k peopl
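Jeff's back-of-the-envelope above can be written out explicitly (same hypothetical numbers: 100 cumulative reads needed to flag, relative abundance 1e-7 at the target cumulative incidence):

```python
# Jeff's back-of-the-envelope, written out. All numbers hypothetical:
# 100 cumulative pathogen reads to flag, and relative abundance 1e-7
# at the target cumulative incidence.

def weekly_reads_needed(reads_to_flag, relative_abundance):
    # Expected pathogen reads = total reads * relative abundance, so
    # total reads needed = reads_to_flag / relative_abundance.
    return reads_to_flag / relative_abundance

print(weekly_reads_needed(100, 1e-7))  # ~1e9 reads per week
```

The same one-liner makes it easy to see why relative abundance, not just detection threshold, drives sequencing cost.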

Tl;dr: epidemic and statistical modelling PhD looking for roles in biosecurity, global health, and quantitative generalist roles.

Skills & background: I am about to submit a biostatistics PhD (University of Cambridge, UK), focusing on statistical methods to estimate the incidence of COVID-19 in England and survival analysis. I have experience providing scientific advice to the UK government on the pandemic. Broad Bayesian statistical skillset, as well as skills in engaging critically with literature. View my past posts for less academic samples of my wo... (read more)

Feel free to message me if you're interested in going deeper into what a typical viral load might look like. I can generate trajectories, based on the data from the ATACCC study. Note that this is in viral RNA copies, not Ct values - they did the conversion as part of that study.

2
Jeff Kaufman
3mo
Thanks! I'm most interested in viral load in the sense of the relative abundance you get with untargeted shotgun sequencing (since you need sequencing (or something similarly general) to detect novel threats and/or avoid having a trivially-bypassable detection system) but there's not much literature on this.

Do you have timezone requirements?

2
Gavin
3mo
No

I don't have a strong opinion here. I would guess having the information out and findable is the most important thing. My initial instinct is directly on, or linked from, the fund page or applicant info.

As someone considering applying to LTFF, I found even rough numbers here very useful. I would have guessed success rates 10x lower.

If it is fairly low-cost for you (e.g.: can be done as an automated database query), publishing this semi-regularly might be very helpful for potential applicants.

4
Linch
3mo
Thanks for the feedback! Do you have thoughts on what platform would be most helpful for you and other (potential) applicants? Independent EAF shortform, a point attached somewhere as part of our payout reports, listed on our website, or somewhere else?

We will be publishing more posts, including information about our other ideas, in the coming weeks.

I can't find these posts on the forum (I checked the post history of both of this post's authors). Could you please point me towards them?

Thank you Ben! The 80% CI[1] is an output from the model.

The rough outline is:

  1. Start with an uninformative prior on the rate of accidental pandemics.
  2. Update this prior based on the number of accidental pandemics and amount of "risky research units" we've seen; this is roughly equivalent to Laplace's rule of succession in continuous time.
  3. Project forward the number of risky research units by extrapolating the exponential growth.
  4. If you include the uncertainty in the rate of accidental pandemics per risky research unit, and random variation, then it turns out the number of
... (read more)
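A minimal sketch of steps 1-2 (the continuous-time analogue of Laplace's rule is a Gamma-Poisson update; the prior parameters and numbers here are illustrative, not the ones used in the actual model):

```python
# Gamma-Poisson update: a continuous-time analogue of Laplace's rule
# of succession. Prior parameters and numbers are illustrative only,
# not the ones used in the actual analysis.

def posterior_rate(events, exposure, a=0.5, b=0.0):
    # Gamma(a, b) prior on the rate of accidental pandemics per
    # "risky research unit"-year; observing `events` pandemics over
    # `exposure` unit-years gives a Gamma(a + events, b + exposure)
    # posterior. a=0.5, b=0 is a Jeffreys-style uninformative prior.
    return (a + events) / (b + exposure)  # posterior mean rate

# e.g. 1 observed accidental pandemic over 500 hypothetical unit-years
print(posterior_rate(1, 500))
```

Step 3 then extrapolates exposure forward exponentially and applies this posterior rate.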
2
Ben Stewart
3mo
Awesome, thanks!

What's your basis for assuming GiveWell's estimates are more accurate?

3
Vasco Grilo
3mo
Thanks for the question, Joshua. Briefly:

  • I think way more time has been invested into GiveWell's recommendations of AMF and Malaria Consortium than the time dedicated to producing the best investment paper on malaria:
      • A typical paper is written in a few months.
      • GiveWell has been doing cost-effectiveness analyses of AMF's interventions since 2012, and Malaria Consortium's since 2016.
      • GiveWell spends 51.1 kh/year (= 0.75*40*46*37) doing research.
  • From what I have seen, GiveWell often "goes beyond the papers" in order to really get at the best guess of the intervention's cost-effectiveness, often adjusting the results found in the literature.
  • GiveWell understands well the limitations of naive cost-effectiveness estimates.

I should also note I meant more accurate conditional on a given set of values (namely, the value of saving lives as a function of age and country, and the value of income compared to health):

  • GiveWell has spent much more time studying the interventions than figuring out their moral weights.
  • I tend to think economic growth is a better proxy for contributing to a better world than GiveWell's moral weights, which I suspect put too much weight on health, and do not account for saving lives in low income countries contributing less to the global economy.

Link-commenting my Twitter thread of immediate reactions and a summary of the paper. Some light editing for readability. I would be interested in feedback on whether this slightly unusual content for a forum comment is helpful or interesting to people.

Overall take: this is a well done survey, but all surveys of this sort have big caveats. I think this survey is as good as it is reasonable to expect a survey of AI researchers to be. But, there is still likely bias due to who chooses to respond, and it's unclear how much we should be deferring to this group. It would be good to ... (read more)

1
MarcKrüger
3mo
Thanks for citing the survey here, and thank you Joshua for your analysis. Your post doesn't seem strange to me in this place; at the very least I can't find any harm in posting it here. (If someone is more interested in other discussions, they may read the first two lines and then skip it.) The only question would be if this is worth YOUR time, and I am confident you are able to judge this (and you apparently did and found it worth your time). Since you already delved that deep into the material and since I don't see myself doing the same, here's a question for you (or whoever else feels inclined to answer): Was there a significant share of experts who thought that HLMI and/or FAOL are downright impossible (at least with anything resembling our current approaches)? I do hear/read doubts like these sometimes. If so, how were these experts included in the mean, since you can't just include infinity with non-zero probability without the whole number going up to infinity? (If they even used a mean. "Aggregate Forecast" is not very clear; if they used the median or something similar the second question can be ignored.)

Good post, aligns with a lot of my (anecdotal) experience in a related but different field (biostatistics, still doing computational work but not ML and much more mature as a field).

Under communication: I think you're missing actively reading papers (or listening to presentations). Each time you read a paper, ask yourself whether it was easy to understand the idea, and why or why not. A big problem in research writing IMO is that the reader often is not reading the paper for the main reason you wrote it. Perhaps they care more about your methodology than your result... (read more)

I agree.

Reflecting, in the everything-is-Gaussian case a prior doesn't help much. Here, your posterior mean is a weighted average of prior and likelihood, with the weights depending only on the variance of the two distributions. So if the likelihood mean increases but with constant variance then your posterior mean increases linearly. You'd probably need a bias term or something in your model (if you're doing this formally).

This might actually be an argument in favour of GiveWell's current approach, assuming they'd discount more as the study estimate becomes increasingly implausible.
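The everything-is-Gaussian point above can be made concrete (a sketch, not GiveWell's actual model): with a Normal prior on the effect and a study estimate with known standard error, the posterior mean is a precision-weighted average, so it grows linearly in the estimate however implausible the estimate becomes.

```python
# Sketch of the everything-is-Gaussian case (not GiveWell's actual
# model): N(prior_mean, prior_sd^2) prior, study estimate with
# standard error estimate_se. The posterior mean is a
# precision-weighted average, so it grows linearly in the estimate.

def gaussian_posterior_mean(prior_mean, prior_sd, estimate, estimate_se):
    w_prior = 1 / prior_sd**2    # precision of the prior
    w_est = 1 / estimate_se**2   # precision of the study estimate
    return (w_prior * prior_mean + w_est * estimate) / (w_prior + w_est)

# With equal precisions the study is always discounted by the same
# factor, no matter how extreme the result:
for x in [2, 4, 8, 16]:
    print(x, gaussian_posterior_mean(1, 1, x, 1))
```

This is why a Gaussian prior alone never "rejects" an implausibly large study result: the discount factor is constant in the estimate.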

I don't think this evaluation is especially useful, because it only presents one side of the argument: why spreadsheets are bad, not their advantages or how errors typically occur in programming languages.

The bottom line you present (quoted below) is in fact not very action relevant. It's not strong enough to even support that the switching costs are worth it IMO.

We are far from certain that writing cost-effectiveness analyses in an ordinary programming language would reduce the error rate compared to spreadsheets - quantitative estimates of the error ra

... (read more)
3
EdoArad
4mo
Totally agree with the need for a more balanced and careful analysis!

Pascal's mugging should be addressed by a prior which is more sceptical of extreme estimates.

GiveWell are approximating that process here:

We’re reluctant to take this estimate at face value because (i) this result has not been replicated elsewhere and (ii) it seems implausibly large given the more muted effects on intermediate outcomes (e.g., years of schooling).

3
ClimateDoc
4mo
It's a potential solution, but I think it requires the prior to decrease quickly enough with increasing cost-effectiveness, and this isn't guaranteed. So I'm wondering whether there is any analysis to show that the methods being used are actually robust to this problem, e.g. exploring how answers would look if the deworming RCT results had been higher or lower, and checking that they change sensibly? A document that looks to give more info on the method used for deworming looks to be here, so perhaps that can be built on - but from a quick look it doesn't seem to say exactly what shape is being used for the priors in all cases, though they look quite Gaussian from the plots.
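One way to run the sensitivity check ClimateDoc asks about is numerically on a grid: compare Normal measurement error against a heavier-tailed error model. With Normal error the posterior mean keeps tracking the study estimate linearly; if the error model has heavier tails than the prior (allowing for the possibility that the study is simply wrong), the posterior snaps back towards the prior as the estimate becomes implausible. A rough sketch with illustrative distributions, not GiveWell's:

```python
import math

# Posterior mean of an effect size theta under a N(0, 1) prior,
# computed on a grid. `log_lik` is the log-likelihood of the study
# estimate given theta. Illustrative distributions, not GiveWell's.
def posterior_mean(log_lik, grid):
    logs = [-0.5 * t**2 + log_lik(t) for t in grid]
    m = max(logs)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logs]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)

grid = [i * 0.01 for i in range(-1000, 3001)]  # theta in [-10, 30]
nu = 3  # degrees of freedom for the heavy-tailed error model

for x in [2, 5, 10]:
    # Normal measurement error: posterior mean tracks x linearly.
    gauss = posterior_mean(lambda t: -0.5 * (x - t) ** 2, grid)
    # Student-t(3) error: an implausibly large x is treated as an
    # outlier and the posterior reverts towards the prior.
    heavy = posterior_mean(
        lambda t: -(nu + 1) / 2 * math.log(1 + (x - t) ** 2 / nu), grid)
    print(x, round(gauss, 2), round(heavy, 2))
```

Whether GiveWell's implied priors and error models behave like the first case or the second is exactly the question the comment raises.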

Could you expand on this please? Isn't this going to be roughly equivalent to "we kept our GitHub repo private"?

2
EdoArad
4mo
The main point is that access management is more natively associated with the structure of the model in software settings. Say, you are less likely to release a model without its prerequisites. But I agree that this could also be messed up in software environments, and that it's mainly an issue of UI and culture. I guess I generally argue for a modeling environment that is "modeling-first" rather than something like "explainable-results-first".

I agree with your point 2. To be Bayesian: if your prior is much more uncertain than your likelihood, the likelihood dominates the posterior.

Isn't 1 addressed by Noah's submission? That you will rank noisily-estimated interventions higher.

2
Karthik Tadepalli
4mo
If 2 holds, the risk of noise causing interventions to be re-ranked is small, because the noise distribution is more compressed than the true gap between interventions.
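Karthik's point can be checked with a quick simulation (all numbers hypothetical): if true cost-effectiveness gaps are wide relative to estimation noise, re-ranking is rare.

```python
import random

# Quick simulation (hypothetical numbers): two interventions with
# true effects 0 and `true_gap`, each estimated with independent
# Gaussian noise. How often does the worse one look better?

def p_rank_flip(true_gap, noise_sd, n_sims=100_000, seed=0):
    rng = random.Random(seed)
    flips = sum(
        rng.gauss(0, noise_sd) > rng.gauss(true_gap, noise_sd)
        for _ in range(n_sims)
    )
    return flips / n_sims

print(p_rank_flip(true_gap=3, noise_sd=1))    # wide gap: flips rare
print(p_rank_flip(true_gap=0.5, noise_sd=1))  # narrow gap: flips common
```

The flip probability depends only on the ratio of the gap to the noise, which is the compression Karthik describes.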

It would be much more reasonable imo to say “Ord’s estimate is much higher than my own prior, and I didn’t see enough evidence to justify such a large update”.

Except for the use of Bayesian language, how is that different to the following passage?

We saw in Parts 9-11 of this series that most experts are deeply skeptical of Ord’s claim, and that there are at least a dozen reasons to be wary. This means that we should demand especially detailed and strong arguments from Ord to overcome the case for skepticism.

9
calebp
4mo
Thanks for pointing that out. I re-read the post and now think that the OP was more reasonable. I'm sorry I missed that in the first place. I also didn't convey the more important message of "thank you for critiquing large, thorny, and important conclusions". Thinking about P(bio x-risk) is really quite hard relative to lots of research reports posted on the forum, and this kind of work seems important.

I don't care about the use of Bayesian language (or at least I think the bit you quoted does all the Bayesian language stuff I care about). Maybe I should read the post again more carefully, but the thing I was trying to communicate was that I don't understand why he thinks that Ord's estimates are unreasonable, and I don't think he provided much evidence that Ord had not already accounted for in his estimate. It may have just been because I was jumping in halfway through a sequence - or because I didn't fully understand the post. The thing I would have liked to see was something like:

  1. Here is my (somewhat) uninformed prior of P(bio x-risk) and why I think it's reasonable
  2. Here are a bunch of arguments that should cause updates from my prior
  3. Here is my actual P(bio x-risk)
  4. This seems much lower than Ord's

or

  1. Here is how Ord did his estimate
  2. Here are the specific methodological issues or ways he interpreted the evidence incorrectly
  3. Here is my new estimate after updating on the evidence correctly

or

  1. Here is how Ord did his estimate
  2. I don't think that Ord took into account evidence a, b and c
  3. Here is how I would update on a, b and c
  4. Here is my final estimate (see that it is much lower than Ord's)

On reflection, I think this is an unreasonable bar or ask, and in any case, I expect to be more satisfied by David's sequence on his site.

OK, I can't commit right now, but I'll look out for if you're advertising again for February (or feel free to get in touch with me). Good luck, great project!

What's the timeline you're after for a firm commitment? I might be interested but need to prioritise what I'm doing over the next 2-3 months so would not be able to commit immediately.

2
joshcmorrison
4mo
A month at a time so just January at this point

What would be the best thing(s) to read for those of us who know ~nothing about Zach and his views/philosophy?

I'm planning to publish some forum posts as I get up to speed in the role, and I think those will be the best pieces to read to get a sense of my views. If it's helpful for getting a rough sense of timing, I'm still working full-time on EV at the moment, but will transition into my CEA role in mid-February.

1
tobytrem
4mo
I published it yesterday, you can find it here. Thanks for waiting!

Optimistically, Gavi and partners do their thing and we get a nice efficient rollout across the relevant areas. But I have very limited knowledge of this space. I don't know what the bottlenecks or the process here are.

Thanks - this is definitely a relevant example, especially the health facilities. I've updated towards more uncertainty here.

The food security effects seem to be the impact of interventions, rather than pure fear, which is the mechanism Gopal et al. suggest.

The reduced mobility of farmers and other agricultural workers, but also the difficulty of getting products to harbours due to the quarantine zone, prevented affected countries from being able to produce and sell their goods [4,8,9]. The epidemic killed and drove out many farmers, leading to the abandonment of field

... (read more)
4
Julia_Wise
4mo
True, I hadn't properly looked at the amount of the agricultural disruption that was due to interventions.

How does the availability of malaria vaccines change AMF's strategy going forward?

For a one-off, you should be able to copy and paste into a Google Doc or Word document then export. The formatting might be a bit iffy but a small amount of manual fixing should sort that.

A more involved alternative, but perhaps more reliable, would be to run the Markdown version through pandoc.

I downvoted because it isn't directly relevant to the dispute. High spending in longtermist EA communities is a question that has been frequently discussed on this forum without a consensus view. I don't think restarting that argument here is productive.

2
ElliotJDavies
4mo
Thanks for providing context here, similar to Vaipan, I wasn't sure why people were disagree/downvoting me. 

Very valuable contribution. Crowd sourcing this type of effort seems good.

Maybe! I'm hoping it at least saves people some energy. It's too late for me, but I confess I'm ambivalent myself about the point of all this. Spot-checking some high level claims is at least tractable, but are there decisions that depend on the outcome? What I care about isn't whether Nonlinear accurately represented what happened or what Ben said. I was unlikely to ever cross paths with Nonlinear or even Ben beforehand. I want people to get healthy professional experience, and I want the EA community to have healthy responses to internal controversy and ... (read more)

Is this post meant to be a provocative start of a discussion or the argument in its entirety? If the latter, it really needs some attempt to be more precise about tractability. How much of the problem will marginal funding solve?

I would like that; however, how much they care about external reactions is unclear to me.

he's accepted the position without even knowing why they did what they did at a high level

I don't think this is correct, from the same statement:

Before I took the job, I checked on the reasoning behind the change. The board did not remove Sam over any specific disagreement on safety, their reasoning was completely different from that. I'm not crazy enough to take this job without board support for commercializing our awesome models.

5
JWS
5mo
Thanks for this, have retracted that sentence. Feels like some version of the reasoning should be made available to investors/Microsoft/the public in some short-term timeframe though? I feel like that would do a fair amount to quell some of the reactions.

I think you're missing other features of study design, notably feasibility, timeliness, and robustness. Adaptive designs generally require knowledge of earlier enrolees' outcomes to inform the randomisation of future enrolees, but these are often not yet known, especially if the outcome you're measuring takes a long time to observe.

EDIT: the paper also points out various other practical issues with adaptive designs in section 3. These include making it harder to do valid statistical inference on your results (statistical inference is essentially assessing your uncertainty, such as ... (read more)

1
Jonathan Nankivell
5mo
Good comment and good points. I guess the aim of my post was two-fold:
1. In all the discussion of the explore-exploit trade-off, I've never heard anyone describe it as a frontier that you can be on or off. The explore-exploit frontier is hopefully a useful framework to add to this dialogue.
2. The literature on clinical trial design is imo full of great ideas never tried. This is partly due to actual difficulties and partly due to a general lack of awareness about the benefits they offer. I think we need good writing for a generalist audience on this topic and this is my attempt.

You're definitely right that the caveat is a large one. Adaptive designs are not appropriate everywhere, which is why this post raises points for discussion and doesn't provide a fixed prescription.

To respond to your specific points. Section three discusses whether adaptive designs lead to:
1. a substantial chance of allocating more patients to an inferior treatment
2. reduced statistical power
3. more challenging statistical inference
4. difficulty making robust inferences if there is potential for time trends
5. a trial that is more challenging to implement in practice.

My understanding of the authors' position is that it depends on the trial design. Drop-the-Loser, for example, would perform very well on issues 1 through 4; other methods, less so. I only omit 5 because CROs are currently ill-equipped to run these studies; there's no fundamental reason for this, and if demand increased, this obstacle would shrink. In the meantime, this unfortunately does raise the burden on the investigating team.

This is not an objection I've heard before. I presume the effect of this would be equivalent to the presence of a time trend, hence some designs would perform well (DTL, DBCD, etc.) and others wouldn't (TS, FLGI, etc.). This is often true, although generalised methods built to address this can be found. See here for an example.

In summary: While I think that these difficulties can of
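For readers unfamiliar with outcome-adaptive allocation, here is a minimal Thompson sampling sketch for a two-arm binary-outcome trial. The response rates, patient count, and function name are invented for illustration; none of the specific designs named above (Drop-the-Loser, DBCD, FLGI) are implemented here.

```python
import random


def thompson_two_arm(true_p, n_patients, seed=0):
    """Outcome-adaptive allocation for a two-arm binary-outcome trial
    via Thompson sampling with Beta(1, 1) priors on each arm."""
    rng = random.Random(seed)
    successes = [0, 0]
    failures = [0, 0]
    allocations = [0, 0]
    for _ in range(n_patients):
        # Draw each arm's success rate from its Beta posterior and
        # assign the next patient to the arm with the larger draw.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                 for a in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1
        allocations[arm] += 1
        # Simulate the (eventually observed) binary outcome.
        if rng.random() < true_p[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return allocations


# Hypothetical trial: arm 1 is genuinely better (0.7 vs 0.3 response rate),
# so Thompson sampling should allocate most patients to it.
allocations = thompson_two_arm((0.3, 0.7), 500)
```

Note that this sketch assumes each patient's outcome is observed before the next patient is randomised, which is exactly the feasibility concern raised in the parent comment.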

Great post! A particular issue is that E(cost/effect) is infinite or undefined if you have a non-zero probability that the effect is 0. This is very commonly the case.

Another interesting point, highlighted by your log normal example, is that higher variance will tend to increase the difference between E(1/X) and 1/E(X).
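For the log-normal case this gap has a closed form: if X ~ LogNormal(mu, sigma), then 1/X ~ LogNormal(-mu, sigma), so the ratio E[1/X] / (1/E[X]) equals exp(sigma^2), which grows with the variance parameter. A small sketch (parameter values arbitrary):

```python
import math


def inverse_moment_gap(mu, sigma):
    """Return (E[1/X], 1/E[X]) for X ~ LogNormal(mu, sigma).

    E[X] = exp(mu + sigma**2 / 2), and since 1/X ~ LogNormal(-mu, sigma),
    E[1/X] = exp(-mu + sigma**2 / 2).
    """
    e_inv_x = math.exp(-mu + sigma ** 2 / 2)
    inv_e_x = 1.0 / math.exp(mu + sigma ** 2 / 2)
    return e_inv_x, inv_e_x


# The ratio E[1/X] / (1/E[X]) equals exp(sigma**2): it grows with variance.
for sigma in (0.5, 1.0, 2.0):
    e_inv_x, inv_e_x = inverse_moment_gap(0.0, sigma)
    print(f"sigma={sigma}: ratio={e_inv_x / inv_e_x:.3f}")
```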

6
MichaelStJules
6mo
E[effect/cost] will also inflate the cost-effectiveness, giving too much weight to cases where you spend little and too little weight to your costs (and opportunity costs) when you spend a lot. If there’s a 90% chance the costs are astronomical and there's no impact or it's net negative, the other 10% could still make it "look" cost-effective. The whole project could have negative expected effects (negative E[effect]), but positive E[effect/cost]. That would not be a project worth supporting. You should usually be estimating E[costs]/E[effects] or E[effects]/E[costs], not the expected values of ratios.

It seems likely that these roles will be extremely competitive to hire for. Most applicants will have similar values (ie: EA-ish). Considering the size of the pool, it seems likely that the top applicants will be similar in terms of quality. Therefore, why do you think there's a case that someone taking one of these roles will have high counterfactual impact?

2
AnonymousTurtle
6mo
I'm not from Open Philanthropy, but it's likely people worry too much about this.
6
lukeprog
6mo
Echoing Eli: I've run ~4 hiring rounds at Open Phil in the past, and in each case I think if the top few applicants disappeared, we probably just wouldn't have made a hire, or made significantly fewer hires.

Empirically, in hiring rounds I've previously been involved in for my team at Open Phil, it has often seemed to be the case that if the top 1-3 candidates just vanished, we wouldn't make a hire. I've also observed hiring rounds that concluded with zero hires. So, basically I dispute the premise that the top applicants will be similar in terms of quality (as judged by OP).

I'm sympathetic to the take "that seems pretty weird." It might be that Open Phil is making a mistake here, e.g. by having too high a bar. My unconfident best-guess would be that our bar h... (read more)

Afraid I don't have good ideas here.

Intuitively, I think there should be a way to take advantage of the fact that the outcomes are heavily structured. You have predictions on the same questions and they have a binary outcome.

OTOH, if in 20% of cases the worse forecaster is better on average, that suggests that there is just a hard bound on how much we can get.

I am so excited to see this, as it looks like it might address many uncertainties I have but have not had a chance to think deeply about. Do you have a rough timeline on when you'll be posting each post in the series?

Thanks, Joshua! We'll be posting these fairly rapidly. You can expect most of the work before the end of the month and the rest in early November.

As it stands I struggle to justify GHD work at all on cluelessness grounds. GiveWell-type analyses ignore a lot of foreseeable indirect effects of the interventions e.g. those on non-human animals. It isn't clear to me that GHD work is net positive.

Would you mind expanding a bit on why this applies to GHD and not other cause areas please? E.g.: wouldn't your concerns about animal welfare from GHD work also apply to x-risk work?

4
JackM
6mo
I'll direct you to my response to Arepo