
Text Mining: Predictive Methods for Analyzing Unstructured Information

Data mining is a mature technology. The prediction problem, looking for predictive patterns in data, has been widely studied. Strong methods are available to the practitioner. These methods process structured numerical information, where uniform measurements are taken over a sample of data. Text is often described as unstructured information. So, it would seem, text and numerical data are different, requiring different methods. Or are they? In our view, a prediction problem can be solved by the same methods, whether the data are structured numerical measurements or unstructured text. Text and documents can be transformed into measured values, such as the presence or absence of words, and the same methods that have proven successful for predictive data mining can be applied to text. Yet, there are key differences. Evaluation techniques must be adapted to the chronological order of publication and to alternative measures of error. Because the data are documents, more specialized analytical methods may be preferred for text. Moreover, the methods must be modified to accommodate very high dimensions: tens of thousands of words and documents. Still, the central themes are similar.
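
As a rough illustration of the idea in the description, that documents can be turned into measured values such as the presence or absence of words, here is a minimal Python sketch. It is not taken from the book: the function names (`tokenize`, `build_vocabulary`, `to_binary_vector`), the simple letters-only tokenizer, and the two sample documents are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens.

    Assumption: a letters-only regex is a good enough tokenizer for a demo.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Return a sorted list of every distinct word seen in the documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def to_binary_vector(document, vocabulary):
    """Map a document to 1/0 features: 1 if the word is present, else 0."""
    present = set(tokenize(document))
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical sample documents, invented for illustration only.
documents = [
    "Stock prices rise on strong earnings",
    "Earnings fall as prices drop",
]

vocabulary = build_vocabulary(documents)
vectors = [to_binary_vector(doc, vocabulary) for doc in documents]

for doc, vec in zip(documents, vectors):
    print(vec, doc)
```

Each document becomes a fixed-length row of 0s and 1s, one column per vocabulary word, which is the kind of structured numerical input that standard prediction methods expect. With a real collection the vocabulary would run to tens of thousands of words, so a sparse representation would likely be preferable to the plain lists used here.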

249 pages, Hardcover

First published October 25, 2004

3 people are currently reading
43 people want to read

About the author

Sholom M. Weiss

14 books · 3 followers

Ratings & Reviews


Community Reviews

5 stars: 7 (33%)
4 stars: 5 (23%)
3 stars: 6 (28%)
2 stars: 3 (14%)
1 star: 0 (0%)
Displaying 1 - 3 of 3 reviews
Muhammad al-Khwarizmi
123 reviews · 38 followers
October 1, 2016
The writing in this book is very often amazingly obtuse, just as you'd expect from a Springer publication. In addition, some of the content is estranged from how things are done with the libraries I am aware of; for example, neither R's tm nor Python's NLTK nor any other widely available text mining software I know of induces decision rules from text as described in this volume. On the other hand, it is (perhaps mercifully) short and can at least give the reader a high-level impression of how things go in this field.
Pete Aven
66 reviews · 1 follower
September 17, 2007
Excellent crash course in standard Data Mining and Natural Language Processing methods. There are a couple of chapters dense with math, and that might throw some people off, but you can skip the equations if you're not interested and still get a lot out of this. At ~200 pages, the book concisely describes the terminology and methods used to parse unstructured text and derive some meaning out of it (within a document, and across a collection of documents) with great clarity.