Solid introduction to the theory of neural networks from before the deep learning revolution. Not particularly useful for implementation, since it predates popular architectures like CNNs and LSTMs, along with all the associated optimization and regularization schemes. But it has probably the clearest and most in-depth coverage of vanilla MLPs and stochastic networks I've come across -- far better than more modern texts like Goodfellow (which, frankly, generally sucks).