Tomasello argues for the thesis that many cognitive capacities that distinguish humans from the great apes essentially rely on one foundational capacity, that of joint and collective intentionality. Joint intentionality is the capacity for two humans, in their interaction, to both understand that they have a shared goal, and to be mutually aware of the perspective or role of the other. For example, even eight-month-old infants are capable of pointing to an object with the goal that their caretakers become aware of the object alongside themselves. When the caretaker attends to the object, the infant is implicitly aware that the caretaker sees the object and knows that the infant also sees the object. There are recursive inferences involved in joint intentionality scenarios: I am aware that you are aware of me. Joint intentionality may be summed up by the description that an individual is able to take up the perspective of the "we," of herself as joined with another.
Chimpanzees and bonobos, the great apes that are most phylogenetically similar to humans, are incapable of joint intentionality. While they manifest cooperative behaviors, the motive for any individual to work together with any other is purely instrumental in nature. That is, any apparently altruistic action is made solely for one's own benefit. For example, a chimpanzee will break up a fight between two others only if the fight might end up harming her; otherwise, even if one of her relatives is being injured in the fight, she will not intervene. Likewise, mothers will not share their extra food with their infants.
This capacity for joint intentionality develops into the capacity for collective intentionality around the age of three years. The "we" that defines the perspective central to joint intentionality is composed of two individuals, the self and another particular person. The "we" that defines the perspective central to collective intentionality is composed of a society or community as a whole. Three-year-olds are capable of recognizing that rules and norms imposed by adults are not based in the adult's authority alone; rather, the adult is like a representative of a greater community, who passes communal knowledge down to the three-year-old. When an adult, for example, uses an object in a particular way (waters a flower with a watering can), the child understands that this is the way the community as a whole does it, not just the way this particular adult acts. Three-year-olds will enforce social norms on their peers if the latter violate them. Three-year-olds will also make up rules among themselves in playing games and enforce those rules; they implicitly understand that the social normativity of rules comes from social agreement, rather than from any other source.
Such joint and collective intentionality becomes the basis of moral reasoning and behavior. In joint intentionality, infants already implicitly understand that the self and other in the joint relation are equals; both must participate and fulfill their respective roles in order for the goal to be attained and for any one person to receive the good outcomes. An understanding of human equality arises from this. In collective intentionality, we become aware that all of our actions are evaluable by societal norms. We care about belonging to our social group, since evolutionarily such belongingness possesses critical adaptive advantages.
In collective intentionality scenarios, an individual occupies the perspective of the "we" that refers to the entire community; she will see herself and surrounding objects in terms of communal social norms. Young children "internalize" this perspective. Tomasello proposes that this internalization process occurs by virtue of the role-reversal capacity. This is the capacity to simulate and understand the perspective of another person with whom one is joined in a joint intentionality scenario. If you read me a book, I can understand what it is like to read a book to someone, even though I am only literally listening to your reading. According to Tomasello, in collective intentionality scenarios in which young children only literally evaluate others according to social norms, by virtue of reversing roles they can simulate and understand "from the inside" what it's like for those others to be evaluated as such. Through such experience, children develop the capacity to sense themselves as evaluated in the same way by potential others. Even in the absence of any adults or peers, children can possess an omnipresent evaluative stance on their own movements; they cannot help but cognitively and emotionally react to whatever they do according to the societal evaluation that would be made of their action.
In contrast, great apes are not capable of collective intentionality or morality. A striking study that exemplifies this involves an "ultimatum" situation in which a subject must choose one of two options: either the subject accepts an unfair division of a certain amount (2 out of 10 goods), or she refuses and receives nothing at all. Human adults often refuse. But great apes always take the option in which they attain the most material goods, even if it is unfair. Great apes are not sensitive to fairness, and so for them standards of goodness are defined by numerical quantity; but for humans, standards of goodness are defined by moral standards of fairness and equality.
Another key example is that both great apes and human infants are capable of recognizing themselves in mirrors. But human infants are immediately shy or coy, or they even turn away and bury their heads in their mothers' laps, upon seeing their reflection. Apes do not react in any such way; they inspect their bodies in the mirrors without any emotional difficulty or complexity. This shows that even young human infants can't help but see themselves with the implicit understanding that this is how other people see them; hence they get shy or coy. Apes do not possess any such societal perspective, which they take up and let frame their experience of viewing themselves in mirrors.
So overall, we humans always see and think from the perspective of community; social norms are always used to evaluate and interpret any situation, before we can become self-aware of the situation or choose to see or think about it in any other way. These social norms are structured to serve community interests. So humans and apes are similar in that both fundamentally see and think about the world in instrumental or goal-seeking ways. But they differ in that these goals are inevitably communal for humans, while they are always individualistic for apes.
I found this book an extremely important read. I've followed some of Tomasello's work previously, but did not notice the thesis that he proposes here. It goes against the grain of the vast majority of western philosophical thinking, which assumes that our perspective is purely our own, and a societal perspective requires efforts to think beyond oneself. Tomasello shows that our perspective is always social in nature, and it requires efforts to abstract from that and try to imagine how one would see the world independently of evaluation of social norms (and perhaps this effort is futile). Tomasello shows this by presenting an abundance of empirical studies. These are very fun and easy to read. His thesis in the end is very well-grounded. It's a stunningly powerful way to understand how we can form the societal complexes and bodies of knowledge that we do. It's a stunningly elegant way to understand how we can differ from apes so fundamentally, given that humans and apes are pretty much identical with respect to many sensorimotor and cognitive capacities.
I'm interested in the implications of Tomasello's work for our understanding of language. Tomasello only briefly mentions that linguistic items (words, phrases, whole utterances) are socially normative in nature, and that language crucially enables us to attend to the same object, when that object is not available in the distal environment. Moreover, language gives us greater access to each other's perspectives or roles in a given joint situation, lets us clarify our common ground, and lets us justify our actions if they appear to violate social norms. I think Tomasello is spot-on that the ontogenetic origin of language is the motivation to get others to attend to the same thing that is preoccupying oneself. I think he is also spot-on that language allows us to "travel" across different perspectives. It is amazing to think about the features and underlying processes of language that must be in place in order for language to fulfill these functions. This is not Tomasello's project in any way, but is just something I'm curious about.