Not too long ago, the prevailing view in cognitive science was that connectionism would fail. Many cognitive scientists held that connectionism was fundamentally limited, and that it was highly unlikely we could replicate the functions of the human mind by imitating the neural circuitry of the human brain.
Characteristically, Steven Pinker, in "The Blank Slate" (2002), criticized connectionism, arguing that
“Humans don’t just loosely associate things that resemble each other, or things that tend to occur together. They have combinatorial minds that entertain propositions about what is true of what, and about who did what to whom, when and where and why. And that requires a computational architecture that is more sophisticated than the uniform tangle of neurons used in generic connectionist networks” (Pinker 2002, 79).
Pinker goes on to enumerate the limitations that follow from this insufficiently sophisticated computational architecture, all in order to show that a complete human thought cannot be represented in the generic networks used in machine learning.
Specifically, neural networks would not be able to distinguish between kinds and individuals, entertain thoughts that are more than a summation of smaller parts (compositionality), handle logical quantification, embed one thought within another (recursion), or reason categorically. Given all these limitations, it would be surprising for connectionism to find any success.
Richard Sutton, in his “Bitter Lesson” (2019), remarks that when Kasparov was defeated in 1997, researchers were reluctant to accept that exceptional chess playing could come down to leveraging computation. Instead, it was widely believed that the winning methods would be those built on human knowledge.
This, however, was not the case; as Sutton writes, “a simpler search-based approach with special hardware and software proved vastly more effective”, and that was not only surprising for researchers at the time but also disappointing. The same pattern repeated in speech recognition: researchers did not predict that less human knowledge and more computation, combined with large training sets once more data became available, would produce highly successful speech recognition systems.
The conviction behind the human-knowledge approach was that there is special knowledge of how the mind works, and that this knowledge, once introduced into machine learning systems, can produce intelligence. Of course, human minds remain endlessly complex. Yet taking advantage of compute and data is what has made neural networks highly successful.
Overall, it seems we live in a world where intelligence isn't particularly hard to find: neural networks have a sufficiently sophisticated architecture, and there is good reason to think that they are very similar to human brains, or at least that they are capable of producing behavior similar to that produced by brains.
More research points in this direction, alongside the success of Large Language Models like GPT-3 and GPT-4. For example, in “Learning Deep Architectures for AI” (Bengio, 2009), the claim is that the similarity between the human brain and artificial neural networks lies in having multiple layers, with each layer representing the input data at a different level of abstraction. If this is true, it becomes easier to argue that brains and neural networks share features of their computational architecture, which would explain the success of the latter.
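The layered-abstraction idea can be sketched minimally: each layer applies a transformation to the previous layer's output, so deeper layers operate on progressively re-represented versions of the raw input rather than on the input itself. The following toy illustration in plain Python (with arbitrary, untrained weights chosen purely for demonstration, not drawn from Bengio's paper) shows this structure:

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer: each output unit is a nonlinear
    (sigmoid) combination of every input unit."""
    outputs = []
    for w_row, b in zip(weights, biases):
        activation = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-activation)))  # sigmoid squashing
    return outputs

# Raw input: a 4-dimensional "stimulus" vector.
x = [0.0, 1.0, 1.0, 0.0]

# Layer 1 re-represents the 4-d raw input as 3 intermediate features.
h1 = layer(x,
           weights=[[1.0, -1.0, 0.5, 0.0],
                    [0.0, 2.0, -1.0, 1.0],
                    [0.5, 0.5, 0.5, 0.5]],
           biases=[0.0, -0.5, 0.1])

# Layer 2 computes its output only from layer 1's features, never
# from the raw input directly: a representation of a representation.
y = layer(h1, weights=[[1.5, -2.0, 1.0]], biases=[0.2])

print(len(x), len(h1), len(y))  # 4 3 1
```

The point is structural rather than numerical: the second layer has no access to the original stimulus, only to the first layer's abstraction of it, which is the sense in which depth yields successive levels of abstraction.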