Regarding the questions about feedforward networks, a really short answer is that regression is a very limited form of inference-time computation that e.g. rules out using memory. (Of course, as you point out, this doesn't apply to other 2020 algorithms beyond MLPs.) Sorry about the lack of clarity -- I didn't want to take up too much space in this piece going into the details of the linked papers, but hopefully I'll be able to do a better job explaining it in a review of those papers that I'll post on LW/AF next week.
(I also want to reply to your top-level comments about the evolutionary anchor, but I'm a bit short on time right now: I don't have cached technical answers for those questions and will have to remind myself of the context. I'll definitely get to it next week.)
Hi Charles, thanks for all the comments! I'll reply to this one first since it seems like the biggest crux. I completely agree with you that feedforward NNs != RNN/LSTM... and that I haven't given a crisp argument that the latter can't scale to TAI. But I don't think I claim to in the piece! All I wanted to do here was to (1) push back against the claim that the UAT for feedforward networks provides positive evidence that DL->TAI, and (2) give an example of a strategy that could be used to argue in a more principled way that other architectures won't scale up to certain capabilities, if one is able to derive effective theories for them as was done for MLPs by Roberts et al. (I think it would be really interesting to show this for other architectures and I'd like to think more about it in the future.)
Thanks, that's a good point. Just posted it here: https://www.lesswrong.com/posts/TMHWfRE7zZkzgFDSo/a-review-of-the-bio-anchors-report