My understanding of the results: for the preregistered tasks you measured effects of 1 IQ point (for RAPM) and 2.5 IQ points (for BDS), with a standard error of ~2 IQ points. This gives weak evidence in favor of a small effect, and strong evidence against a large effect.
You weren't able to measure a difference between vegetarians and omnivores. For the exploratory cognitive tasks you found no effect. (I don't know if you'd expect those tests to be sensitive enough to notice such a small effect.)
At this point it seems a bit unlikely to me that there is a clinically significant effect; maybe I'd bet at 4:1 against the effect being >0.05 SD. That said, I still think it would be worthwhile for someone to do a larger study that could detect a 0.1 SD effect, since that would be clinically significant and is very weakly suggested by this data (and would make supplementation worthwhile given how cheap it is).
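For concreteness, here's the rough conversion between IQ points and standard deviations I'm relying on (a minimal back-of-the-envelope sketch, assuming the conventional scale where 1 SD = 15 IQ points):

```python
# Rough conversion between IQ points and standard deviations,
# assuming the conventional IQ scale with SD = 15 points.
IQ_SD = 15

for label, iq_points in [("RAPM effect", 1.0), ("BDS effect", 2.5), ("standard error", 2.0)]:
    print(f"{label}: {iq_points} IQ points = {iq_points / IQ_SD:.3f} SD")

# The thresholds discussed above, converted to IQ points:
print(f"0.05 SD = {0.05 * IQ_SD} IQ points, 0.1 SD = {0.1 * IQ_SD} IQ points")
```

So the measured effects are roughly 0.07 and 0.17 SD with a standard error of roughly 0.13 SD, which is why I read this as weak evidence for a small effect and strong evidence against a large one.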
(See also gwern's meta-analysis.)
I think the "alignment difficulty" premise was given higher probability by superforecasters, not lower probability.
Agree that it's easier to talk about (change)/(time) rather than (time)/(change). As you say, (change)/(time) adds better. And agree that % growth rates are terrible for a bunch of reasons once you are talking about rates >50%.
I'd weakly advocate for "doublings per year": (i) 1 doubling / year is more of a natural unit, since that's already a pretty high rate of growth, and it's easier to talk about multiple doublings per year than a fraction of an OOM per year, (ii) there is a word for "doubling" and no word for "increased by an OOM," (iii) I think the arithmetic is easier.
But people might find factors of 10 so much more intuitive than factors of 2 that OOMs/year is better. I suspect this is increasingly true as you are talking more to policy makers and less to people in ML, but might even be true in ML since people are so used to quoting big numbers in scientific notation.
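To make the unit conversion explicit (my own back-of-the-envelope, not anything from the post): a growth factor of g per year is log2(g) doublings/year or log10(g) OOMs/year, so the two units differ by a constant factor of log2(10) ≈ 3.32.

```python
import math

# Converting between the two candidate units for growth rates.
doublings_per_oom = math.log2(10)            # ≈ 3.32 doublings in one order of magnitude

def oom_per_year_to_doublings(oom_per_year):
    return oom_per_year * doublings_per_oom

print(oom_per_year_to_doublings(1.0))        # 1 OOM/year ≈ 3.32 doublings/year
print(math.log2(1.5))                        # "50% annual growth" ≈ 0.58 doublings/year
```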
(I'd probably defend my definitional choice for slow takeoff, but that seems like a different topic.)
Yes, I'm not entirely certain Impossible meat is equivalent in taste to animal-based ground beef. However, I do find the evidence I cite in the second paragraph of this section somewhat compelling.
Are you referring to the blind taste test? It seems like that's the only direct evidence on this question.
It doesn't look like the preparations are necessarily analogous. At a minimum the plant burger had roughly 5x as much salt. All burgers were served with a "pinch" of salt, but it's hard to know what that means, and in any case the plant burger probably ended up at least 2x as salty.[1] You note this as a complicating factor, but salt has a huge impact on taste and it seems to me like it can easily dominate the results of a 2-3 bite taste test between vaguely comparable foods.
I also have no idea at all how good or bad the comparison burger was. Food varies a lot. (It's kind of coincidental that the salt happened to show up in the nutrition information; otherwise I wouldn't even be able to make this concrete criticism.) It seems really hard to draw conclusions about the taste competitiveness of a meat substitute from this kind of n=1 study, beyond saying that you are in the same vague zone.
Have you compared these foods yourself? I eat both of them regularly. Taste competitiveness seemed plausible the first time I ate impossible ground beef, but at this point the difference feels obviously large. I seriously doubt that the typical omnivore would consider them equivalent after eating them a few times.
Overall, despite these caveats on taste, lots of plant-based meat was still sold, so it was "good enough" in some sense, but there was still potentially little resulting displacement of beef (although maybe somewhat more of chicken).
My conclusion would be: plant substitutes are good enough that some people will eat them, but bad enough that some people won't. They are better than some foods and worse than others.
It feels like you are simultaneously arguing that high uptake is a sign that taste is "good enough," and that low uptake is a sign that "good enough" taste isn't sufficient to replace meat. I don't think you can have it both ways; it's not like there is a "good enough" threshold where sales jump up to the same level as if you had competitive taste. Better taste just continuously helps with sales.
I agree and discuss this issue some in the Taste section. In short, this is part of why I think informed taste tests would be more relevant than blind: in naturalistic settings, it is possible that people would report not liking the taste of PBM even though it passes a blind taste test. So I think this accurately reflects what we should expect in practice.
I disagree. Right now I think that plant-based meat substitutes have a reputation as tasting worse than meat largely because they actually taste worse. People also have memories of disliking previous plant-based substitutes they tried. In the past the gap was even larger and there is inertia in both of these.
If you had taste competitive substitutes, then I think their reputation and perception would likely improve over time. That might be wrong, but I don't see any evidence here against the common-sense story.
The plant burger had about 330mg vs 66mg of salt. If a "pinch" is 200mg then it would end up exactly 2x as salty. But it's hard to know exactly what a pinch means, and it also matters whether you cook salt into the beef or put a pinch on top, and so on.
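A quick sanity check of that arithmetic (my own sketch; the 200mg pinch is just the assumption above, and this ignores how the salt is distributed in the patty):

```python
# Compare total salt in the plant burger vs the beef burger, each with a "pinch" added.
plant_salt, beef_salt, pinch = 330, 66, 200   # mg

ratio = (plant_salt + pinch) / (beef_salt + pinch)
print(ratio)   # ≈ 2.0, i.e. the plant burger ends up roughly 2x as salty
```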
The linked LW post points out that nuclear power was cheaper in the past than it is today, and that today the cost varies considerably between different jurisdictions. Both of these seem to suggest that costs would be much lower if there were a lower regulatory burden. The post also claims that nuclear safety is extremely high, much higher than we expect in other domains and much higher than would be needed to make nuclear preferable to alternative technologies. So from that post I would be inclined to believe that overregulation is the main reason for a high cost (together with the closely related fact that we've stopped building nuclear plants and so don't benefit from economies of scale).
I can definitely believe the linked post gives a misleading impression. But I think if you want to correct that impression it would be really useful to explain why it's wrong. It would be even better to provide pointers to some evidence or analysis, but just a clear statement of disagreement would already be really helpful.
Do you think that greater adoption of nuclear power would be harmful (e.g. because the safety profile isn't good, because it would crowd out investments in renewables, because it would contribute to nuclear proliferation, or something else)? That lowering regulatory requirements would decrease safety enough that nuclear would become worse than alternative power sources, even if it isn't already? That regulation isn't actually responsible for the majority of costs? A mixture of the above? Something else altogether?
My own sense is that using more nuclear would have been a huge improvement over the actual power mix we've ended up with, and that our failure to build nuclear was mostly a policy decision. I don't fully understand the rationale, but it seems like the outcome was regulation that renders nuclear uncompetitive in the US, and it looks like this was a mistake driven in large part by excessive focus on safety. I don't know much about this so I obviously wouldn't express this opinion with confidence, and it would be great to get a link to a clear explanation of an alternative view.
I'm confused about your analysis of the field experiment. It seems like the three options are {Veggie, Impossible, Steak}. But wouldn't Impossible be a comparison for ground beef, not for steak? Am I misunderstanding something here?
Beyond that, while I think Impossible meat is great, I don't think it's really equivalent on taste. I eat both beef and Impossible meat fairly often (>1x / week for both) and I would describe the taste difference as pretty significant when they are similarly prepared.
If I'm understanding you correctly then 22% of the people previously eating steak burritos switched to Impossible burritos, which seems like a really surprisingly large fraction to me.
(Even further, consumer beliefs are presumably anchored to their past experiences, to word of mouth, etc., and so even if you did have taste equivalence here I wouldn't expect people's decisions to be perfectly informed by that fact. If you produced a taste-equivalent meat substitute tomorrow and were able to get 22% of people switching in your first deployment, that would seem like a surprisingly high success rate that's very consistent with even a strong form of PTC; I wouldn't expect consumers to switch immediately even if they would switch eventually. Getting those results with Impossible meat vs steak seems even more encouraging.)
I didn't mean to imply that human-level AGI could do human-level physical labor with existing robotics technology; I was using "powerful" to refer to a higher level of competence. I was using "intermediate levels" to refer to human-level AGI, and assuming it would need cheap human-like bodies.
Though mostly this seems like a digression. As you mention elsewhere, the bigger crux is that it seems to me like automating R&D would radically shorten timelines to AGI and be amongst the most important considerations in forecasting AGI.
(For this reason I don't often think about AGI timelines, especially not for this relatively extreme definition. Instead I think about transformative AI, or AI that is as economically impactful as a simulated human for $X, or something along those lines.)
My point in asking "Are you assigning probabilities to a war making AGI impossible?" was to emphasize that I don't understand what 70% is a probability of, or why you are multiplying these numbers. I'm sorry if the rhetorical question caused confusion.
My current understanding is that 0.7 is basically just the ratio (Probability of AGI before thinking explicitly about the prospect of war) / (Probability of AGI after thinking explicitly about prospect of war). This isn't really a separate event from the others in the list, it's just a consideration that lengthens timelines. It feels like it would also make sense to list other considerations that tend to shorten timelines.
(I do think disruptions and weird events tend to make technological progress slower rather than faster, though I also think they tend to pull tiny probabilities up by adding uncertainty.)
That's fair; this was an inference that is probably not justified.
To spell it out: you think brains are as effective as 1e20-1e21 flops. I claimed that humans use more than 1% of their brain when driving (e.g. our visual system is large, and driving seems like a typical task that engages the full capacity of the visual system during the high-stakes situations that dominate performance), but you didn't say this. I concluded (but you certainly didn't say) that a human-level algorithm for driving would not have much chance of succeeding using 1e14 flops.
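Here is that inference spelled out as a back-of-the-envelope calculation (my reconstruction, with the 1% figure as my assumption rather than a claim you made):

```python
# If the brain is worth ~1e20-1e21 flops and driving engages more than ~1% of it,
# then a human-level driving algorithm plausibly needs ~1e18+ flops, which is
# several orders of magnitude more than 1e14.
brain_flops_low = 1e20                     # low end of the brain-equivalence estimate
driving_fraction = 0.01                    # ">1% of the brain" assumption
available_flops = 1e14

driving_flops = driving_fraction * brain_flops_low   # ~1e18
print(driving_flops / available_flops)               # ~1e4, i.e. roughly 4 OOMs short
```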
Yes, I'd bet the effects are even smaller than what this study found. This study gives a small amount of evidence of an effect > 0.05 SD. But without a clear mechanism I think an effect of < 0.05 SD is significantly more likely. One of the main reasons we were expecting an effect here was a prior literature that is now looking pretty bad.
That said, this was definitely some evidence for a positive effect, and the prior literature is still some evidence for a positive effect even if it's not looking good. And the upside is pretty large here since creatine supplementation is cheap. So I think this is good enough grounds for me to be willing to fund a larger study.