I had to write a book review for my philosophy course. Here it is.
Deep learning and empiricism
A review of C. J. Buckner’s From Deep Learning to Rational Machines
Cameron J. Buckner’s From Deep Learning to Rational Machines is a well-written book that explains the success of deep learning through the faculty psychology of the classical empiricists. Buckner is very knowledgeable about recent breakthroughs in deep learning and about the relevant research in cognitive science. In this book, he channels these two areas of expertise to argue for a moderate empiricist position in the ongoing philosophical debate between nativists and empiricists on the origin of knowledge.
This age-old debate can be traced back to the Ancient Greek philosophers. Plato is often seen as among the most radical nativists; he thought that all abstract ideas were innate. On the other hand, radical empiricists hold that the human mind starts as a blank slate that is then formed by sensory experiences.
Buckner stresses that the positions in this debate should be looked at on a continuum [1, Ch 1.3]. This is important since the main goal of the book is to argue for a moderate empiricist position, one that derives from Locke, Hume, and other classical empiricists, and is consistent with recent developments in cognitive science and deep learning.
The basic tenet of Buckner’s empiricism is a variation of Hume’s “Copy Principle” that he dubs the “Transformation Principle”:
“The mind’s simple concepts (or conceptions) are in their first appearance derived from systematic, domain-general transformations of sensory impressions.” [1, p. 19]
We should direct special attention to these domain-general transformations. They are what separate Buckner’s position from his predecessors’. Domain-general transformations are transformations of the raw sensory data that can work across many domains. They can become specific through learning, but initially they do not track specific features such as faces or food. In this way he gets by without assuming that the mind has any innate ideas. Moreover, he states that the best way to model rational cognition in AI is through (Do)main General Modular Architectures, giving his position the memorable name “the new empiricist DoGMA”. The modularity here refers to the idea that AIs should have different modules that perform the tasks of different faculties of the human mind.
The book is structured into seven chapters. The first chapter sets the philosophical stage, introducing the empiricist position to be argued for. The second chapter gives an overview of deep learning, highlighting both the properties that separate human cognition from current DNNs and how to evaluate the rationality of artificial systems. The remaining five chapters each focus on a single mental faculty: perception, memory, imagination, attention, and social cognition. Each chapter gives an empiricist explanation of the operations of the faculty by focusing on the arguments of one classical philosopher and introducing the deep learning models that have been used to capture some of its workings.
The book provides various valuable insights that I cannot cover in detail here. One of these is the “Tiers of Rationality” framework he uses to evaluate the rationality of agents, which emphasizes the importance of flexibility, mental models, and social capabilities. Another insight is found in his criticism of nativists, who tend to overestimate the abilities of human cognition; he calls this “anthropofabulation” and points out many instances of it. Yet another point, and a strong argument for the DoGMA, is the observation that the shortcomings of current AIs are sometimes similar to the shortcomings of humans or animals that lack capabilities in some mental faculty. This supports the idea that creating AI systems with multiple modules that perform the tasks of different faculties might solve these issues.
Having reviewed some of the book’s primary insights, we now shift our focus to a theme in the book where I thought Buckner was slightly imprecise in the presentation of his argument.
What can deep learning tell us about human cognition?
As you might have guessed from the title of the book, deep learning plays a large role in Buckner’s philosophical argument. Deep learning models have proven capable of solving various tasks that nativists previously thought required innate concepts, and they do this purely by learning from examples using domain-general learning rules. These models can not only show how the human mind might possibly acquire knowledge; they also refute many arguments prized by nativists, as we will see later. Surprisingly, Buckner wants to go even further and claim that DNNs (Deep Neural Networks) can also show us how the mind actually performs a certain task. To support this claim he relies mainly on work by Lisa Miracchi [2]. She holds that this can be accomplished by developing three models. The first is a model of the relevant human intelligence explanandum, called the agent model. The second is a model of the artificial system in computational terms that enables manipulation and measurement; this is called the basis model. The third is a model of how features represented by the basis model transform into features represented by the agent model; this is called the generative model. Using these models, one can transfer knowledge about the artificial system into knowledge about the human mind.
Throughout the book, when Buckner discusses some faculty of human intelligence, he explains the research in cognitive science and philosophy that he deems relevant and also describes DNN architectures used to solve similar tasks. These discussions correspond to the agent model and the basis model, respectively. As he admits, the generative model is still missing. However, he claims that this book “tee[s] up their development” [1, p. 41]. What he means by this is that he has done the groundwork of identifying appropriate agent and basis models from previous research, leaving only the technical job of developing the generative model. Here, it strikes me that as long as this work remains undone, there is quite a large hole in the philosophical argument for his empiricist position about human cognition. The discussion of the deep learning models still provides a proof of concept of how a faculty could possibly work in humans. And the discussion of models of human mental faculties and their relation to the artificial models is an insightful synthesis of philosophical and psychological arguments, inspiring a path for future research in artificial intelligence, cognitive science, and the relation between the two. But a leap remains to be made, from artificial computational structures to the processing of the human brain, and without this leap, empiricist artificial models cannot confirm empiricism about human cognition. A counterpoint Buckner might make here is that the goal of the book is not to provide indisputable evidence for his empiricist position, but rather to argue that recent evidence from deep learning points towards it being close to the truth. The “how actually” approach is only an ideal towards which we should strive, but we are not there yet.
Let us now explore Buckner’s arguments in the chapter where he has the strongest case: the chapter on perception, the faculty with the strongest link between human and artificial processing.
The faculty of perception
Nativists tend to be rationalists who explain our knowledge by appealing to an innate theory of the world’s structure and the human ability to reason. They often claim that empiricists are unable to explain how we possess abstract concepts like PRIME NUMBER. Buckner’s main goal in this chapter on perception is to counter this claim by showing that DNNs can recognize abstract patterns. He does this by focusing on perceptual abstraction. Perceptual abstraction enables the mind to recognize similarities in sensory data representing different phenomena that belong to the same abstract category.
Buckner begins the philosophical discussion of this chapter by exploring four classical empiricist approaches from Locke and Hume to explain how we learn abstractions. These are:
• Abstraction-as-subtraction: The abstract category starts with all the specifications of a single example.
When another example of the category appears, differing in some features, the differing features are removed from the specification.
• Abstraction-as-composition: Categories are formed by composition from other categories. For example, a triangle consists of three lines. The category of lines is used to define the category of triangles.
• Abstraction-as-representation: We can reason about abstract categories by performing the reasoning
on one exemplar for that category.
• Abstraction-as-invariance: Categories are defined by properties that are invariant under some systematic transformations.
These approaches all have their flaws. To address them, Buckner argues that the approaches are not mutually exclusive. He introduces a new framework for understanding abstraction called “transformational abstraction” that draws insights from all the aforementioned approaches. He does not give a clear definition of the framework, but there are some key ideas. Among them is the observation that the differences between examples from the same category are not as chaotic as they are often thought to be. Sources of variation are often systematic parameters that machine learning researchers call “nuisance variables”. For visual object recognition, an example of this could be the location or rotation of the object in a visual field. These nuisance variables are often well-behaved in the sense that they exhibit symmetries in spatial or temporal dimensions.
The idea is then that the mind can learn systematic domain-general transformations that move examples from the same category closer to each other in some representation space, by eliminating the nuisance variables. The examples that fall into an abstract category are those whose representations transform into a distinct region in the representation space corresponding to the category. The regions of the representation space are inherently inaccessible to our consciousness, which explains why we cannot visualize an abstract category.
The deep learning architecture considered in this chapter is the “Deep Convolutional Neural Network” (DCNN). These networks perform two key operations that conventional neural networks do not: convolution and pooling. Another key feature is that they are deep and hierarchical: they have multiple layers at different levels of abstraction. The earliest layers of a visual convolutional network might detect edges, while later ones can detect higher-level abstractions like ‘cat’. Buckner likens the effect of the convolution operation to ‘abstraction-as-composition’ and the effect of pooling to ‘abstraction-as-subtraction’. A simplified picture of the abstraction performed by these networks is that the data is transformed in each layer: the convolution operations detect features, and the pooling layers eliminate the nuisance variables. This fits well with Buckner’s description of transformational abstraction.
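To make the convolution-and-pooling picture concrete, here is a minimal NumPy sketch (my own illustration, not from the book) in which a hand-picked edge-detecting kernel stands in for a learned convolutional filter, and max pooling discards the edge’s exact location, a nuisance variable in Buckner’s sense:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution (cross-correlation): slide the kernel over the
    image, taking dot products to produce a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep only the strongest response in each
    window, discarding its exact position (a nuisance variable)."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

# A vertical-edge detector (hand-picked here; real DCNNs learn such kernels).
kernel = np.array([[1., -1.],
                   [1., -1.]])

# Two 6x6 images containing the same vertical edge at different columns.
img_a = np.zeros((6, 6)); img_a[:, :2] = 1.0   # edge after column 1
img_b = np.zeros((6, 6)); img_b[:, :4] = 1.0   # edge after column 3

pooled_a = max_pool(convolve2d(img_a, kernel))
pooled_b = max_pool(convolve2d(img_b, kernel))

# Pooling abstracts away *where* the edge is: both images yield an equally
# strong "edge present" response.
print(pooled_a.max() == pooled_b.max())  # prints True
```

Real DCNNs learn many such kernels from data and stack several convolution-and-pooling layers, but the same principle recurs at every level: detect a feature, then throw away the nuisance variation in how it appeared.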
DCNNs have proven to be capable in many abstraction tasks, most notably classifying images. They provide a clear proof of concept of how abstraction can be obtained within the empiricist framework. To quote Buckner:
“The success of DCNNs provides perhaps the strongest line of established empirical evidence in
favor of the new empiricist DoGMA.” [1, p. 129]
I agree: this is very strong evidence that the empiricist position is consistent with reality. It also supports the DoGMA, since DCNNs can be seen as a module for perception. They have a specific structure designed for perception (the hierarchical convolution and pooling layers) that is still very domain-general, since this structure can learn to recognize many different categories from multiple domains.
But remember, Buckner is not satisfied with a proof of concept. As he mentions early in the chapter:
“I will argue that these differences allow DCNNs to model core aspects of abstraction in the mammalian brain.” [1, p. 119]
But when it comes to the discussion, the strongest statement he writes in this regard is:
“[W]e may now see how a single mechanism which bears some structural similarity to mammalian cortex implements [the four forms of abstractions] simultaneously. We thus have good reason to believe that properties associated with each form of abstraction cluster together in (at least) the mammalian brain.” [1, p. 130]
The structural similarity he is referring to is the fact that the mammalian neocortex has a hierarchical processing structure that was indeed an inspiration for the early convolutional networks. The discussion in the chapter is sound, but, to the critical reader, it does not seem to justify the claims made early in the chapter. We might thus wonder whether clarifying these limitations would have made the scope of the arguments more definite and the text more resilient to critique.
Final words
In this text, I have surveyed the main topics and arguments of this excellent book. The book is ambitious and serves many purposes: arguing for a philosophical empiricist position, pointing AI researchers towards productive areas of inquiry, and providing groundwork for research in cognitive science. My main criticism has concerned the unclear distinction between the parts that are relevant to the philosophical argument and those that only contribute to the broader discussion. This does not invalidate the arguments made in the book, and the surrounding discussion remains excellent and informative, so much so that I think the philosophical and psychological context provided in the book makes it a must-read for all AI researchers who want to implement the next iteration of human-like AI.
References
[1] C. J. Buckner, From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence, Oxford University Press, December 2023.
[2] L. Miracchi, A competence framework for artificial intelligence research, Philosophical Psychology, 32 (2019), pp. 588–633.