A good collection of data mining techniques. However, for actual implementation of the presented algorithms you might need to look somewhere else because the presented information is not always clear and the examples are often difficult to transform to your own problems.
Jiawei Han was my professor for Data Mining at U of I. He knows a ton and is one of the most cited professors (if not the most) in the Data Mining field. I felt this book reflects that; honestly, his book explains many of the concepts of Data Mining in a more efficient and direct manner than he can in a class setting.
I enjoyed reading his book and learned a lot. There is a reason this is the standard Data Mining book for graduate studies, and I would recommend it to anyone wishing to learn Data Mining.
Very good intro to data mining concepts, well explained and easy to understand. The math was a bit difficult for me, but you can always come back to it if you ever need the basics.
Good overview of Data Science techniques and some algorithms.
3 Stars because some computer scientists need to learn Set theory properly. There's no legitimate reason to exchange the symbols of Union and Intersection in a textbook. Mathematics has a well defined pedagogy and history, and with something as basic as a Venn Diagram, the CS field should actually use accepted terminologies. And I am surprised a professional editor would let this pass.
Should we also exchange the functional operations of addition and subtraction...just for data mining? No. Reading through algorithms where the Union symbol means "Intersection" is just a serious impediment to learning for any student of mathematics.
Every Automata Theory textbook I've read defines these symbols properly. No one I've read exchanges Union and Intersection symbols when proving a language is regular.
There's a predefined history of common operations...use them.
Good for those who want high-level knowledge of data mining in general. As a software engineer, I found it beneficial to learn new techniques for the data mining phases that lead to knowledge discovery.
One of the essential reference books on searching and organizing big data. What is nice about it is that it starts in an order that makes it easy for a non-specialist to understand the subject from the beginning, through definitions, terminology, and basic rules.
I'm biased because I took the class with the author, professor Han, so I had more time to digest all the math in it, but I find it an extremely useful coverage of the field.
First of all, I would like to mention that I am not familiar with data mining and its technology. So you can take my review as a summary of the book, with my personal opinion (not a professional one) where it is needed.
Now, I'm reading:
**UNIT 6: Mining Frequent Patterns**, which deals with finding all frequent itemsets and generating strong association rules.
Every rule A ⇒ B holds in transaction set D with support s and confidence c. Support = the probability that a transaction in D contains both A and B. Confidence = the conditional probability that a transaction in D which contains A also contains B.
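To make the two definitions concrete for myself, here is a minimal sketch in Python. The transaction set `D` and the helper names `count`, `support` and `confidence` are my own illustration, not taken from the book.

```python
# Toy transaction set D (hypothetical data, not from the book).
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

print(support({"milk", "bread"}, D))       # 3 of the 5 transactions
print(confidence({"milk"}, {"bread"}, D))  # 3 of the 4 transactions with milk
```

Note that `confidence` assumes the antecedent occurs at least once; otherwise the division fails.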
The Apriori algorithm is the fundamental method for finding frequent itemsets by confined candidate generation (it is time consuming). P:248.
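As far as I understand it, the level-wise idea can be sketched as below. This is a simplified version under my own assumptions (a minimum count instead of a minimum support ratio, a naive join over all pairs, toy data), not the book's pseudocode.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: count} for itemsets found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan of the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: another scan of the data for this level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, not from the book.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]
for itemset, n in sorted(apriori(D, min_count=2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The repeated scans in the count step and the growing candidate sets are what make the approach time consuming on large data.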
Improving the efficiency of Apriori can be done using different variations: P:255-256
* Frequent Pattern Growth (FP-growth) method for finding frequent itemsets without the costly candidate generation process. P:257.
* Using Vertical Data format (personally didn't find it interesting)
* Mining Closed and Max Patterns: This requires us to prune the search space as early as possible using one of the following strategies: item merging, sub-itemset pruning, or item skipping.
Section 6.3 (P:264) presents the important idea that support and confidence alone are not enough and can produce misleading "strong" association rules. For that reason, the book suggests using correlation measures:
- Lift: lift(A,B) = 1 means A and B are independent; lift(A,B) < 1 means A and B are negatively correlated; lift(A,B) > 1 means A and B are positively correlated.
- χ² (the squared difference between observed and expected values, divided by the expected value, summed over all cells of the contingency table).
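Both measures can be computed from four counts. The sketch below is my own illustration: the function name `lift_and_chi2` and the example counts are assumptions, and it assumes no expected cell count is zero.

```python
def lift_and_chi2(n_ab, n_a, n_b, n):
    """Lift and chi-square for itemsets A and B.

    n_ab: transactions containing both A and B
    n_a:  transactions containing A
    n_b:  transactions containing B
    n:    total number of transactions
    """
    # lift = P(A and B) / (P(A) * P(B))
    lift = (n_ab / n) / ((n_a / n) * (n_b / n))

    # 2x2 contingency table: rows are A / not A, columns are B / not B.
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return lift, chi2

# Illustrative counts: 10,000 transactions, 6,000 with A, 7,500 with B, 4,000 with both.
lift, chi2 = lift_and_chi2(n_ab=4000, n_a=6000, n_b=7500, n=10000)
print(round(lift, 3), round(chi2, 1))
```

With these counts the lift is below 1, so A and B come out as negatively correlated even though the rule A ⇒ B has a confidence of about 67%.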
Other pattern evaluation measures that have gained interest lately are all_confidence, max_confidence, Kulczynski, and cosine. Each measure takes a value between 0 and 1; the higher the value, the closer the relationship between A and B.
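A small sketch of these four measures, written in terms of the two conditional probabilities P(B|A) and P(A|B). The function name `pattern_measures` and the counts are my own illustration, and it assumes both A and B occur at least once.

```python
from math import sqrt

def pattern_measures(n_ab, n_a, n_b):
    """Four pattern evaluation measures from co-occurrence counts.

    n_ab: transactions containing both A and B
    n_a:  transactions containing A
    n_b:  transactions containing B
    """
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "all_confidence": min(p_a_given_b, p_b_given_a),
        "max_confidence": max(p_a_given_b, p_b_given_a),
        "kulczynski": 0.5 * (p_a_given_b + p_b_given_a),
        "cosine": n_ab / sqrt(n_a * n_b),
    }

# Illustrative counts: 6,000 transactions with A, 7,500 with B, 4,000 with both.
for name, value in pattern_measures(n_ab=4000, n_a=6000, n_b=7500).items():
    print(name, round(value, 3))
```

None of the four uses the total number of transactions, so transactions containing neither A nor B do not change the values.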
I selected this book hoping to understand the difference between Data Mining, which I wasn't familiar with yet, and the fields already known to me, Machine Learning and Statistics. This book provides a very good overview of Data Mining techniques in general, and it is also packed with practical examples that give good intuition about what Data Mining actually is and how it relates to Machine Learning and Statistics.
I read the translated Chinese version, not the original English version, and I don't really like it. Many of the more abstract names and concepts are translated in weird ways. This is my personal opinion. I guess the English version may be easier to read, especially since the book is mainly about concepts.
Another good book on data mining. It explains data mining algorithms and provides examples of their usage. The book is used as a textbook for Master's level studies in computer science.
I have read three or four books on data mining, taken two classes, and attended a conference. This is the best place to begin learning data mining basics and everything related to data analysis.
The book uses simple language and is very easy to understand. Very good from a student's perspective. The diagrams are easy to follow and explain the concepts thoroughly.