This is a quite famous and somewhat controversial book. It is widely rumored that its bleak evaluation of the limitations of perceptrons led to the dramatic decline in neural network research until it resurged in the PDP era. I must say that I like this book. In many respects, it caught me off guard. It is first and foremost a mathematical treatise with a more or less definition-theorem style of presentation. More surprising to me, the mathematical tools are algebra and group theory, not statistics as one might expect.

Minsky and Papert's purpose in writing this book was to present the first steps toward a rigorous theory of parallel computation. To be able to build a mathematical theory, they had to restrict themselves to a narrow yet interesting subspecies of parallel computing machines: perceptrons. Their perceptron is crucially different from what we would call a perceptron today. In today's parlance, a perceptron is a single-layer (i.e., no hidden layers) neural network with threshold units in its output layer: sum w_i * x_i > theta. Minsky and Papert think in terms of boolean predicates instead of the x_i's directly. For them, a perceptron takes a weighted sum of a set of boolean predicates defined on the input: sum w_i * b_i(X) > theta, where b_i(X) is a predicate (a 0-1 valued function). For example, b(X) could be [x_1 and x_2 and (not x_3)]. Under this definition, today's perceptron is a special case of theirs in which each b_i(X) depends on only a single x_j. For Minsky and Papert, that would be an order-1 predicate, because the predicate involves only one input.

Building on this notion of order, they define the order of a problem as the maximum predicate order one needs to solve it. The famous XOR result, then, is the statement that the XOR problem is not of order 1 (it is of order 2). Interestingly, this is only mentioned in passing; it is not an important part of the book. It is not even proved! Minsky and Papert are more interested in problems of infinite order, i.e., problems whose order grows with the problem size. (For contrast, the convexity problem for a figure in 2D is of finite order, in fact of order 3: whatever the size of the input retina, predicates of order 3 are enough to solve it.) Their most important results concern such infinite-order problems. For example, it turns out that the parity problem, i.e., deciding whether the number of 1s in the input is odd or even (XOR in high-dimensional spaces), is not of finite order: if you have N inputs, you need at least one predicate of order N to solve it. Another example problem of infinite order is connectedness, i.e., deciding whether a figure is connected. Minsky and Papert build a mathematical theory based on algebra and group theory to prove these results. Another interesting result is that for certain problems the coefficients become ill-conditioned, in the sense that the ratio of the largest to the smallest w_i grows very large, which raises practical concerns about learnability by perceptrons.

The last part of the book is on learning, where they look at the perceptron convergence theorem among other things; here one sees a little bit of the currently popular optimization-by-gradient-descent perspective, when they discuss perceptron learning as a hill-climbing strategy. In an epilogue added some years later (right around the time PDP became popular), Minsky and Papert respond to some of the criticisms. I think this chapter was valuable.
For example, they respond to the claim that none of their results are relevant for multi-layer networks because multi-layer networks can approximate any function (i.e., learn any predicate). Of course, Minsky and Papert's concerns are far from irrelevant: how efficiently we can solve problems with these models is still an important question, one that we will have to face one day even if not now.
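
To make the order idea above concrete, here is a minimal sketch in Python; it is my own illustration, not code from the book. It treats a Minsky-Papert style perceptron as a linear threshold over 0-1 valued predicates and uses a brute-force search over small integer weights to check that XOR cannot be computed from order-1 predicates alone, but can be once an order-2 predicate is added. The helper name `separable` and the search bound are assumptions made for the illustration.

    from itertools import product

    def separable(predicates, samples, labels, bound=3):
        # Evaluate every predicate on every sample, then brute-force all
        # integer weights and thresholds in [-bound, bound] to see whether
        # some linear threshold over the predicates reproduces the labels.
        feats = [[p(x) for p in predicates] for x in samples]
        for *w, theta in product(range(-bound, bound + 1), repeat=len(predicates) + 1):
            outputs = [int(sum(wi * fi for wi, fi in zip(w, f)) > theta) for f in feats]
            if outputs == labels:
                return True
        return False

    xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    xor_labels = [0, 1, 1, 0]

    # Order-1 predicates: each one looks at a single input coordinate.
    order1 = [lambda x: x[0], lambda x: x[1]]
    # Add an order-2 predicate: the AND of both inputs.
    order2 = order1 + [lambda x: x[0] and x[1]]

    print(separable(order1, xor_inputs, xor_labels))  # False: XOR is not of order 1
    print(separable(order2, xor_inputs, xor_labels))  # True: order 2 suffices

Running the same kind of experiment on the parity of N bits would, per the result quoted above, keep failing until a predicate touching all N inputs is included.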