The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly available information. Huge amounts of information are private and are of interest to local communities, such as the records of customers of a business. This information is overwhelmingly text and serves a record-keeping purpose, but an automated analysis might be desirable to find patterns in the stored records. Text mining is analogous to data mining: it also finds patterns and trends in information samples, but it works with far less structured ingredients, though ones with greater immediate utility for users. This book focuses on the concepts and methods needed to expand horizons beyond structured, numeric data to automated mining of text samples. It introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search. It also explores new research areas that rely on evolving text-mining techniques, such as information extraction and document summarization.
The writing in this book is very often amazingly obtuse, just as you'd expect from a Springer publication. In addition, some of the content is estranged from how things are done with the libraries I am aware of; for example, neither R's tm nor Python's NLTK nor any other widely available text mining software I know of induces decision rules from text as described in this volume. On the other hand, it is (perhaps mercifully) short and can at least give the reader a high-level impression of how things go in this field.
Excellent crash course in standard Data Mining and Natural Language Processing methods. There are a couple of chapters dense with math, and that might throw some people off, but you can skip the equations if you're not interested and still get a lot out of this. At ~200 pages, the book concisely and with great clarity describes the terminology and methods used to parse unstructured text and derive some meaning out of it (within a document, and across a collection of documents).