Senior Research Scientist at NTT Research, Physics & Informatics Lab. jessriedel.com, jessriedel[at]gmail[dot]com
I listed this example in my comment; it was incorrect by an order of magnitude, and it was a retrodiction. "I didn't look up the data on Google beforehand" does not make it a prediction.
I'm also a little surprised you think that modeling when we will have systems using similar compute as the human brain is very helpful for modeling when economic growth rates will change. (Like, for sure someone should be doing it, but I'm surprised you're concentrating on it much.) As you note, the history of automation is one of smooth adoption. And, as I think Eliezer said (roughly), there don't seem to be many cases where new tech was predicted based on when some low-level metric would exceed the analogous metric in a biological system. The key threshold for recursive feedback loops (*especially* compute-driven ones) is how well they perform on the relevant tasks, not all tasks. And the way in which machines perform tasks usually looks very different than how biological systems do it (birds vs. airplanes, etc.). If you think that compute is the key bottleneck/driver, then I would expect you to be strongly interested in what the automation of the semiconductor industry would look like.
I like this post a lot but I will disobey Rapoport's rules and dive straight into criticism.
Historically, many AI researchers believed that creating general AI would be more about coming up with the right theories of intelligence, but over and over again, researchers eventually found that impressive results only came after the price of computing fell far enough that simple, "blind" techniques began working (Sutton 2019).
I think this is a poor way to describe a reasonable underlying point. Heavier-than-air flying machines were pursued for centuries, but airplanes appeared almost instantly (on a historic scale) after the development of engines with sufficient power density. Nonetheless, it would be confusing to say "flying is more about engine power than the right theories of flight". Both are required. Indeed, although the Wright brothers were enabled by the arrival of powerful engines, they beat out other would-be inventors (Ader, Maxim, and Langley) who emphasized engine power over flight theory. So a better version of your claim has to be something like "compute quantity drives algorithmic ability; if we independently vary compute (e.g., imagine an exogenous shock) then algorithms follow along", which (I think) is what you are arguing further in the post. But this also doesn't seem right. As you observe, algorithmic progress has been comparable to compute progress (both within and outside of AI). You list three "main explanations" for where algorithmic progress ultimately comes from and observe that only two of them explain the similar rates of progress in algorithms and compute. But both of these draw a causal path from compute to algorithms without considering the (to-me-very-natural) explanation that some third thing is driving them both at a similar rate. There are a lot of options for this third thing! Researcher-to-researcher communication timescales, the growth rate of the economy, the individual learning rate of humans, new tech adoption speed, etc. It's plausible to me that compute and algorithms are currently improving more or less as fast as they can, given their human intermediaries, through one or all of these mechanisms.
The causal structure is key here, because the whole idea is to try and figure out when economic growth rates change, and the distinction I'm trying to draw becomes important exactly around the time that you are interested in: when the AI itself is substantially contributing to its own improvement. Because then those contributions could be flowing through at least three broad intermediaries: algorithms (the AI is writing its own code better), compute (the AI improves silicon lithography), or the wider economy (the AI creates useful products that generate money which can be poured into more compute and human researchers).
Of course, even if AI performance is, in principle, predictable as a function of scale, we lack data on how AIs are currently improving on the vast majority of tasks in the economy, hindering our ability to predict when AI will be widely deployed. While we hope this data will eventually become available, for now, if we want to predict important AI capabilities, we are forced to think about this problem from a more theoretical point of view.
Humans have been automating mechanical tasks for many centuries, and information-processing tasks for many decades. Moore's law, the growth rate of the thing (compute) that you argue drives everything else, has been stated explicitly for almost 58 years (and was presumably applicable for at least a few decades before that). Why are you drawing a distinction between all the information processing that happened in the past and "AI", which you seem to be taking as a basket of things that have mostly not had a chance to be applied yet (so no data)?
If compute is the central driving force behind AI, and transformative AI (TAI) comes out of something looking like our current paradigm of deep learning, there appear to be a small set of natural parameters that can be used to estimate the arrival of TAI. These parameters are:
The total training compute required to train TAI
The average rate of growth in spending on the largest training runs, which plausibly hits a maximum value at some significant fraction of GWP
The average rate of increase in price-performance for computing hardware
The average rate of growth in algorithmic progress
This list is missing the crucial parameters that would translate the others into what we agree is most notable: economic growth. I think this needs to be discussed much more in section 4 for it to be a useful summary/invitation to the models you mention.
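To make the quoted parameter list concrete, here is a minimal sketch of how the four parameters could be combined into an arrival estimate. The function name, the argument names, and every number below are hypothetical toy values chosen for easy arithmetic, not estimates from the post. Note that the output is only the year a training run becomes affordable; it says nothing about economic growth, which is the gap pointed out above.

```python
def years_until_affordable(required_flop, spend_now, spend_growth, spend_cap,
                           flop_per_dollar_now, hardware_growth, algo_growth,
                           max_years=200):
    """Return the first whole year in which the largest training run reaches
    required_flop of effective compute, or None if it never does.

    All growth rates are annual fractions (1.0 means doubling each year).
    This is an illustrative toy model, not the post's actual model.
    """
    for year in range(max_years + 1):
        # Spending grows exponentially until it hits the cap (e.g. a fraction of GWP).
        spend = min(spend_now * (1 + spend_growth) ** year, spend_cap)
        # Hardware price-performance: FLOP purchasable per dollar.
        flop_per_dollar = flop_per_dollar_now * (1 + hardware_growth) ** year
        # Algorithmic progress modeled as a multiplier on physical compute.
        algo_multiplier = (1 + algo_growth) ** year
        if spend * flop_per_dollar * algo_multiplier >= required_flop:
            return year
    return None


# Toy inputs: spending and algorithms both double yearly, hardware is flat.
capped = years_until_affordable(1000, 1, 1.0, 4, 1, 0.0, 1.0)
uncapped = years_until_affordable(1000, 1, 1.0, float("inf"), 1, 0.0, 1.0)
print(capped, uncapped)
```

With these toy inputs the spending cap delays the crossing year, which illustrates why the cap matters as a parameter in its own right.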
I agree it's important to keep the weaker fraud protection on debit cards in mind. However, for the use I mentioned above, you can just lock the debit card and only unlock it when you have a cash flow problem. (Btw, you should lock your IB debit card even if you never use it.) Debit card liability is capped at $50 and $500 if you report fraudulent transactions within 2 days and 60 days, respectively.
That said, I have most of my net worth elsewhere, so I'm less worried about tail risks than you would reasonably be if you're mostly invested through IB.
If you have non-qualified investments and just keep money in a savings account in case of unexpected large expenses or interruptions to your income, it may be better to instead move the money in the savings account to Interactive Brokers and invest it. Crucially, you can get a debit card from Interactive Brokers that allows you to spend on margin (borrow) at a low rate (~5%, much less than credit cards) using your investments there as collateral. That way you keep essentially all your money invested (presumably earning more than the savings account) while still having access to liquidity when you need it.
Just to be clear: we mostly don’t argue for the desirability or likelihood of lock-in, just its technological feasibility. Am I correctly interpreting your comment to be cautionary, questioning the desirability of lock-in given the apparent difficulty of doing so while maintaining sufficient flexibility to handle unforeseen philosophical arguments?
If the Federal government is just buying, on the open market, an amount of coal comparable to how much would have been sold without government action, then it's going to drive up the price of coal and increase the total amount of coal extracted. How much extra coal gets extracted depends on the supply and demand curves, and the amount of coal actually burned will almost certainly be less than in the world where the government didn't act, but it does mean the environmental benefits of this plan will be significantly muted.
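The argument above can be illustrated with a toy linear market. This is a sketch under strong assumptions that are mine, not part of the original comment: supply is Q = supply_slope * price, private demand is Q = demand_intercept - demand_slope * price, and the government buys a fixed quantity regardless of price and leaves it unburned. All numbers are made up.

```python
def coal_market(supply_slope, demand_intercept, demand_slope, gov_purchase=0.0):
    """Solve a toy linear coal market with a fixed government purchase.

    The market clears when supply equals private demand plus the government
    purchase: supply_slope * p = demand_intercept - demand_slope * p + gov_purchase.
    Returns (price, extracted, burned). Hypothetical model for illustration only.
    """
    price = (demand_intercept + gov_purchase) / (supply_slope + demand_slope)
    extracted = supply_slope * price        # total coal mined
    burned = extracted - gov_purchase       # only privately bought coal is burned
    return price, extracted, burned


# Baseline: no government action.
base_price, base_extracted, base_burned = coal_market(2.0, 100.0, 2.0)

# The government buys as much coal as would have been sold in the baseline.
new_price, new_extracted, new_burned = coal_market(
    2.0, 100.0, 2.0, gov_purchase=base_extracted
)

print(base_price, base_extracted, base_burned)
print(new_price, new_extracted, new_burned)
```

In this toy case the price rises, extraction rises, and burned coal falls by only half of the amount purchased. How the purchase splits between extra extraction and reduced burning depends on the relative slopes of the two curves.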
Paul Graham writes that Noora Health is doing something like this.
https://twitter.com/Jess_Riedel/status/1389599895502278659
https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/96773753706640817147890456629920587151705670001482122310561805592519359070209
Regarding your 4 criteria, I think they don't really delineate how to make the sort of judgment calls we're discussing here, so it really seems like it should be about a 5th criterion that does delineate that.
Sorry I was unclear. Those were just 4 desiderata that the criteria need to satisfy; the desiderata weren't intended to fully specify the criteria.
If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria.
Certainly possible, but I think this would partly be because MIRI would explicitly talk in their paper about the (putative) connection to TAI safety, which makes it a lot easier for me to see. (Alternative interpretation: it would be tricking me, a non-expert, into thinking there was more of a substantive connection to TAI safety than is actually there.) I am trying not to penalize researchers for failing to talk explicitly about TAI, but I am limited. I think it's more likely the database has inconsistencies of the kind you're pointing at from CHAI, OpenAI, and (as you've mentioned) DeepMind, since these organizations have a self-described (partial) safety focus while still doing lots of non-safety and near-term-safety research. When confronted with such inconsistencies, I will lean heavily toward not including any of them since this seems like the only feasible choice given my resources. In other words, I select your final option: "The hypothetical MIRI work shouldn't have made the cut".
I definitely agree that you shouldn't just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included
Here I understand you to be suggesting that we use a notability criterion that can make up for the connection to TAI safety being less direct. I am very open to this suggestion, and indeed I think an ideal database would use criteria like this. (It would make the database more useful to both researchers and donors.) My chief concern is just that I have no way to do this right now because I am not in a position to judge the notability. Even after looking at the abstracts of the work by Raghunathan et al. and Wong & Kolter, I, as a layman, am unable to tell that they are quite notable.
Now, I could certainly infer notability by (1) talking to people like you and/or (2) looking at a citation trail. (Note that a citation count is insufficient because I'd need to know it's well cited by TAI safety papers specifically.) But this is just not at all feasible for me to do for a bunch of papers, much less every paper that initially looked equally promising to my untrained eyes. This database is a personal side project, not my day job. So I really need some expert collaborators or, at the least, some experts who are willing to judge batches of papers based on some fixed set of criteria.