This book provides a framework for thinking about foundational philosophical questions surrounding the use of deep artificial neural networks ("deep learning") to achieve artificial intelligence. Specifically, it links recent breakthroughs to classic works in empiricist philosophy of mind. In recent assessments of deep learning's potential, scientists have cited historical figures from the philosophical debate between nativism and empiricism, which concerns the origins of abstract knowledge. These empiricists were faculty psychologists; that is, they argued that the extraction of abstract knowledge from experience involves the active engagement of psychological faculties such as perception, memory, imagination, attention, and empathy. This book explains how recent deep learning breakthroughs realized some of the most ambitious ideas about these faculties from philosophers such as Aristotle, Ibn Sina (Avicenna), John Locke, David Hume, William James, and Sophie de Grouchy. It illustrates the utility of this interdisciplinary connection by showing how it can provide benefits to both philosophy and computer science: computer scientists can continue to mine the history of philosophy for ideas and aspirational targets to hit, and philosophers can see how some of the historical empiricists' most ambitious speculations can now be realized in specific computational systems.
I had to write a book review for my philosophy course. Here it is.
Deep learning and empiricism: A review of C. J. Buckner’s From Deep Learning to Rational Machines
Cameron J. Buckner’s From Deep Learning to Rational Machines is a well-written book that explains the success of deep learning through the faculty psychology of the classical empiricists. Buckner is very knowledgeable both about recent breakthroughs in deep learning and about the relevant research in cognitive science. In this book, he channels these two areas of expertise to argue for a moderate empiricist position in the ongoing philosophical debate between nativists and empiricists over the origin of knowledge. This age-old debate can be traced back to the Ancient Greek philosophers. Plato is often counted among the most radical nativists: he thought that all abstract ideas were innate. At the other extreme, radical empiricists hold that the human mind starts as a blank slate that is then formed by sensory experience. Buckner stresses that the positions in this debate should be viewed on a continuum [1, Ch. 1.3]. This matters because the main goal of the book is to argue for a moderate empiricist position, one that derives from Locke, Hume, and other classical empiricists and is consistent with recent developments in cognitive science and deep learning.

The basic tenet of Buckner’s empiricism is a variation of Hume’s “Copy Principle” that he dubs the “Transformation Principle”: “The mind’s simple concepts (or conceptions) are in their first appearance derived from systematic, domain-general transformations of sensory impressions.” [1, p. 19] These domain-general transformations deserve special attention, since they are what separate Buckner’s position from his predecessors’. Domain-general transformations are transformations of the raw sensory data that can work across many domains. They can become specialized through learning, but initially they do not track specific features such as faces or food. In this way Buckner gets by without assuming that the mind has any innate ideas. Moreover, he states that the best way to model rational cognition in AI is through (Do)main-General Modular Architectures, giving his position the memorable name “the new empiricist DoGMA”. The modularity here refers to the idea that AIs should have different modules that perform the tasks of different faculties of the human mind.

The book is structured into seven chapters. The first chapter sets the philosophical stage, introducing the empiricist position to be argued for. The second gives an overview of deep learning, highlighting both the properties that separate human cognition from current DNNs and the question of how to evaluate the rationality of artificial systems. The remaining five chapters each focus on a single mental faculty: perception, memory, imagination, attention, and social cognition. Each chapter gives an empiricist explanation of the operations of the faculty by focusing on the arguments of one classical philosopher and introducing the deep learning models that have been used to capture some of its workings.

The book provides many valuable insights that I cannot cover in detail here. One is the “Tiers of Rationality” framework Buckner uses to evaluate the rationality of agents; it emphasizes the importance of flexibility, mental models, and social capabilities. Another is found in his criticism of nativists, who tend to overestimate the abilities of human cognition. He calls this tendency “anthropofabulation” and points out many instances of it.
Yet another point, and a strong argument for the DoGMA, is the observation that the shortcomings of current AIs sometimes resemble the shortcomings of humans or animals that lack capabilities in some mental faculty. This supports the idea that building AI systems from multiple modules that perform the tasks of different faculties might resolve these issues. Having reviewed some of the book’s primary insights, let us now turn to a theme in the book where I thought Buckner was slightly imprecise in the presentation of his argument.
What can deep learning tell us about human cognition?
As you might have guessed from the title of the book, deep learning plays a large role in Buckner’s philosophical argument. Deep learning models have proven capable of solving various tasks that nativists previously thought required innate concepts, and they do this purely by learning from examples using domain-general learning rules. These models can not only show how the human mind might possibly acquire knowledge; they also undermine many arguments prized by nativists, as we will see later. Surprisingly, Buckner wants to go even further and claim that DNNs (deep neural networks) can also show us how the mind actually performs a certain task. To support this claim he relies mainly on work by Lisa Miracchi [2]. She holds that this can be accomplished by developing three models. The first is a model of the relevant human-intelligence explanandum, called the agent model. The second is a model of the artificial system in computational terms that enables manipulation and measurement, called the basis model. The third is a model of how features represented by the basis model transform into features represented by the agent model, called the generative model. Using these models, one can transfer knowledge about the artificial system into knowledge about the human mind.

Throughout the book, when Buckner discusses some faculty of human intelligence, he explains the research in cognitive science and philosophy that he deems relevant and also describes the DNN architectures used to solve similar tasks. These discussions would correspond to the agent model and the basis model, respectively. As he admits, the generative model is still missing. However, he claims that this book “tee[s] up their development” [1, p. 41]. What he means is that he has done the groundwork of identifying appropriate agent and basis models from previous research, leaving only the technical job of developing the generative model. Here, it strikes me that without this work having been done, there is quite a large hole in the philosophical argument for his empiricist position as an account of human cognition. The discussion of the deep learning models still provides a proof of concept of how the faculty could possibly work in humans, and the discussion of models of human mental faculties and their relation to the artificial models is an insightful synthesis of philosophical and psychological arguments, inspiring a path for future research in artificial intelligence, cognitive science, and the relation between the two. But there is a leap to be made from artificial computational structures to the processing of the human brain, and without this leap, empiricist artificial models cannot confirm empiricism about human cognition. A counterpoint Buckner might make here is that the goal of the book is not to provide indisputable evidence for his empiricist position, but rather to argue that recent evidence from deep learning points towards its being close to the truth. The “how actually” approach is only an ideal towards which we should strive; we are not there yet. Let us now explore Buckner’s arguments in the chapter where he has the strongest case: the chapter on the faculty with the strongest link between human and artificial processing, perception.
The faculty of perception
Nativists tend to be rationalists who explain our knowledge by appealing to an innate theory of the world’s structure and the human ability to reason. They often claim that empiricists are unable to explain how we possess abstract concepts like PRIME NUMBER. Buckner’s main goal in this chapter on perception is to counter this claim by showing that DNNs can recognize abstract patterns. He does this by focusing on perceptual abstraction, which enables the mind to recognize similarities in sensory data representing different phenomena that belong to the same abstract category. Buckner begins the philosophical discussion of this chapter by exploring four classical empiricist approaches, drawn from Locke and Hume, to explaining how we learn abstractions. These are:

• Abstraction-as-subtraction: The abstract category starts with all the specifications of a single example. When another example of the category appears, with some features different, the features that differ are removed from the specification (a toy sketch of this approach follows the list).

• Abstraction-as-composition: Categories are formed by composition from other categories. For example, a triangle consists of three lines; the category of lines is used to define the category of triangles.

• Abstraction-as-representation: We can reason about abstract categories by performing the reasoning on one exemplar of that category.

• Abstraction-as-invariance: Categories are defined by properties that are invariant under some systematic transformations.
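To make the first of these approaches concrete, here is a toy sketch of my own (not code from the book) that treats abstraction-as-subtraction as intersecting feature specifications across observed examples; the feature sets and category names are purely illustrative:

```python
# Toy sketch of abstraction-as-subtraction (illustrative only): start from one
# example's full feature specification and drop every feature that a later
# example of the same category contradicts.
from functools import reduce

examples = [
    {"has_feathers", "lays_eggs", "flies", "is_small", "is_brown"},   # sparrow
    {"has_feathers", "lays_eggs", "flies", "is_small", "is_yellow"},  # canary
    {"has_feathers", "lays_eggs", "is_large", "is_black"},            # ostrich
]

# What survives the repeated subtraction is the abstract category BIRD.
bird = reduce(set.intersection, examples)
print(bird)  # {'has_feathers', 'lays_eggs'}
```

The sketch also makes the approach's main flaw easy to see: whatever features the observed examples happen to share, relevant or not, end up in the abstraction.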
These approaches all have their flaws. To address them, Buckner argues that the approaches are not mutually exclusive. He introduces a new framework for understanding abstraction called “transformational abstraction” that draws insights from all of the aforementioned approaches. He does not give a clear definition of the framework, but there are some key ideas. Among them is the observation that the differences between examples from the same category are not as chaotic as they are often thought to be. Sources of variation are often systematic parameters that machine learning researchers call “nuisance variables”; for visual object recognition, an example could be the location or rotation of the object in the visual field. These nuisance variables are often well behaved in the sense that they exhibit symmetries in spatial or temporal dimensions. The idea is then that the mind can learn systematic, domain-general transformations that move examples from the same category closer to each other in some representation space by eliminating the nuisance variables. The examples that fall into an abstract category are those whose representations transform into a distinct region of the representation space corresponding to that category. The regions of the representation space are inherently inaccessible to our consciousness, which explains why we cannot visualize an abstract category.

The deep learning architecture considered in this chapter is the “Deep Convolutional Neural Network” (DCNN). These networks perform two key operations that conventional neural networks do not: convolution and pooling. Another key feature is that they are deep and hierarchical; they have multiple layers at different levels of abstraction. The earliest layers of a visual convolutional network might detect edges, while later ones can detect higher levels of abstraction like ‘cat’. Buckner likens the effect of the convolution operation to ‘abstraction-as-composition’ and the effect of pooling to ‘abstraction-as-subtraction’. A simplified picture of the abstraction performed by these networks is that the data is transformed in each layer: the convolution operations detect features, and the pooling layers eliminate the nuisance variables. This fits well with Buckner’s description of transformational abstraction. DCNNs have proven capable of many abstraction tasks, most notably classifying images, and they provide a clear proof of concept of how abstraction can be obtained within the empiricist framework. To quote Buckner: “The success of DCNNs provides perhaps the strongest line of established empirical evidence in favor of the new empiricist DoGMA.” [1, p. 129] I agree; this is very strong evidence that the empiricist position is consistent with reality. It also supports the DoGMA, since DCNNs can be seen as a module for perception: they have a specific structure designed for perception (the hierarchical convolution and pooling layers) that is nevertheless very domain-general, since this structure can learn to recognize many different categories from multiple domains. But remember, Buckner is not satisfied with a proof of concept. As he mentions early in the chapter: “I will argue that these differences allow DCNNs to model core aspects of abstraction in the mammalian brain.” [1, p. 119]
But when it comes to the discussion, the strongest statement he writes in this regard is: “[W]e may now see how a single mechanism which bears some structural similarity to mammalian cortex implements [the four forms of abstraction] simultaneously. We thus have good reason to believe that properties associated with each form of abstraction cluster together in (at least) the mammalian brain.” [1, p. 130] The structural similarity he is referring to is the fact that the mammalian neocortex has a hierarchical processing structure, which was indeed an inspiration for the early convolutional networks. The discussion in the chapter is sound, but, to the critical reader, it does not seem to justify the claims made early in the chapter. We might thus wonder whether clarifying these limitations would have made the scope of the arguments more definite and the text more resilient to critique.
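To give a minimal, self-contained illustration of the convolution-and-pooling picture described above (my own sketch under simplifying assumptions, not code from the book), the following uses nothing beyond NumPy: a hand-crafted kernel detects a diagonal edge, and max pooling then discards its exact position, the nuisance variable, so that two differently placed copies of the edge yield the same peak response.

```python
# Toy illustration of transformational abstraction in a single DCNN layer:
# convolution detects a local feature, pooling discards its exact location.
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation (the 'convolution' used in DCNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps 'was the feature present nearby?'
    and throws away exactly where it was."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Two images containing the same diagonal edge at different locations
# (location is the nuisance variable here).
edge = np.eye(3)
img_a = np.zeros((6, 6)); img_a[0:3, 0:3] = edge
img_b = np.zeros((6, 6)); img_b[2:5, 2:5] = edge   # shifted by two pixels

kernel = np.eye(3)  # a hand-crafted detector for this diagonal pattern
pooled_a = max_pool(convolve2d(img_a, kernel))
pooled_b = max_pool(convolve2d(img_b, kernel))

# Both images produce the same maximal response: the 'contains this edge'
# feature has become insensitive to where the edge appears.
print(pooled_a.max(), pooled_b.max())  # 3.0 3.0
```

In a trained DCNN the kernels are of course learned from data rather than hand-crafted, and many such layers are stacked so that later layers detect increasingly abstract feature combinations.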
Final words
In this text, I have surveyed the main topics and arguments of this excellent book. The book is ambitious and serves many purposes, including arguing for a philosophical empiricist position, pointing AI researchers towards productive areas of inquiry, and providing groundwork for research in cognitive science. These are very ambitious goals, and my main criticism has concerned the unclear distinction between the parts that are relevant to the philosophical argument and those that only contribute to the broader discussion. This does not invalidate the arguments made in the book, and the surrounding discussion remains excellent and informative; so much so that I think the philosophical and psychological context provided in the book makes it a must-read for all AI researchers who want to implement the next iteration of human-like AI.
References
[1] C. J. Buckner, From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence, Oxford University Press, 2023.
[2] L. Miracchi, A competence framework for artificial intelligence research, Philosophical Psychology, 32 (2019), pp. 588–633.
Really exciting discussion at the intersection of cognitive psychology, AI, and philosophy. I loved the depth of engagement with current work on DNNs - not too technical, but plenty of concrete details from work spanning the inception of neural networks through 2023, with a focus on recent developments.
This is exactly what I wanted as a follow-up to the empiricist vs nativist debate in cognitive psychology. Buckner uses arguments from the history of philosophy to make the case that, in principle, no innate knowledge is needed for DNN-based AI architectures to make significant progress toward human capabilities. Rather, domain-general faculties are not only sufficient for our goals, but more promising than GOFAI while avoiding certain of its pitfalls, like context-based fragility. He follows these arguments with modern experiments implementing the relevant features, and generally ends each chapter with speculation about the challenges and opportunities that may come with the approaches he advocates.
I do wonder if he could have engaged more deeply with the potential benefits of nativist approaches, and the pitfalls (not necessarily just challenges) of empiricist approaches. In particular, the completed picture he advocates looks to me a lot like a regular human, with all of our cognitive biases and developmental baggage. One bottleneck is finding ways to make AI more energy efficient and computationally efficient (e.g., no longer having to simulate PDP on a classical machine). Until then, it may not be feasible to stack up separate modules for memory, perception, imagination, and attention (assuming that each of these is implemented by only one module) and have the result run in real time, go through a series of human-like developmental stages in a fraction of the time, be cost-efficient enough for anyone to afford, and still have superhuman intelligence in several areas. Meanwhile, nativism offers what seem like shortcuts without obvious drawbacks in all cases.
Overall, fantastic book! It’s very well written, I learned a lot, and I have a lot left to chew on.
The central idea is that the current debate between empiricists and nativists within the field of artificial intelligence presents a unique opportunity to translate a timeless philosophical question into a testable scientific one. Buckner proposes that deep learning, rather than requiring a nativist approach, can achieve rational cognition through a "moderately empiricist" perspective. He examines the work of influential empiricist philosophers, applying their theories to deep learning and exploring how various mental faculties, such as perception, memory, and imagination, can be implemented in artificial systems. He ultimately argues that deep learning, by incorporating principles of "ecological rationality" and a nuanced understanding of human cognition, can potentially yield knowledgeable and human-like artificial agents. But this argument, interesting and eloquently put as it is, seems far-fetched as far as current deep learning systems and architectures are concerned.
Yet even if it seems far-fetched, this approach of invoking moderate empiricism has great potential for these long-unresolved philosophical issues. Buckner situates moderate empiricism as a philosophical stance within the broader empiricism-nativism debate. It posits that while abstract knowledge stems from sensory experience, this process involves the active participation of innate, domain-general faculties. This nuanced perspective contrasts with radical empiricism, which minimizes the role of innate factors, and radical nativism, which emphasizes innate, domain-specific knowledge.
In the race to AGI, understanding precisely which human faculties one is trying to replicate, and how, is fundamental. The recent wave of books on the topic, impressive as it is, lacks this perspective. Buckner's "From Deep Learning to Rational Machines" is a worthy exception.
Firmly rooted in the philosophical debate between empiricists (Locke, Hume) and nativists (Fodor, Chomsky), the book is also a comprehensive explanation of most, if not all, of the recent technologies underpinning LLMs.