Data Mining: Concepts and Techniques

Name: Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)
Rating: 3.89 (22 reviews)
ISBN: 9780123814791

Jiawei Han, Micheline Kamber, Jian Pei

Genres: Computer Science, Textbooks, Programming, Reference, Nonfiction, Computers, Technology

744 pages, Hardcover

First published August 1, 2000

About the author

Jiawei Han

35 books, 4 followers

Community Reviews

5 stars

122 (29%)

4 stars

165 (39%)

3 stars

95 (22%)

2 stars

28 (6%)

1 star

5 (1%)

Radek Lát

20 reviews, 1 follower

January 17, 2015

A good collection of data mining techniques. However, for actual implementation of the presented algorithms you might need to look somewhere else because the presented information is not always clear and the examples are often difficult to transform to your own problems.

Austin

13 reviews, 3 followers

January 18, 2014

Jiawei Han was my professor for Data Mining at U of I. He knows a ton and is one of the most cited professors (if not the most) in the Data Mining field. I felt this book reflects that; honestly, his book explains many of the concepts of Data Mining in a more efficient and direct manner than he can in a class setting.

I enjoyed reading his book and learned a lot. There is a reason this is the standard Data Mining book for graduate studies, and I would recommend it to anyone wishing to learn Data Mining.

Parisa

21 reviews, 4 followers

March 30, 2022

This book is suitable for both beginners and intermediate learners. I enjoyed reading this book immensely.

Emmi

137 reviews

August 16, 2017

Finished reading the important sections of this book. It gives a clear understanding of data mining techniques.

eri b.❀

518 reviews, 40 followers

June 22, 2022

Very good intro to data mining concepts, well explained and easy to understand. The math was a bit difficult for me though, but you can always come back to it if you ever need the basics.

Thomson Kneeland

44 reviews, 4 followers

March 17, 2019

Good overview of Data Science techniques and some algorithms.

3 Stars because some computer scientists need to learn Set theory properly. There's no legitimate reason to exchange the symbols of Union and Intersection in a textbook. Mathematics has a well defined pedagogy and history, and with something as basic as a Venn Diagram, the CS field should actually use accepted terminologies. And I am surprised a professional editor would let this pass.

Should we also exchange the functional operations of addition and subtraction...just for data mining? No. Reading through algorithms where the Union symbol means "Intersection" is just a serious impediment to learning for any student of mathematics.

Every Automata Theory textbook I've read defines these symbols properly. No one I've read exchanges Union and Intersection symbols when proving a language is regular.

There's a predefined history of common operations...use them.

computer-science-tech mathematics

Khaled Al-Ansari

65 reviews, 1 follower

October 5, 2020

Good for those who want to get high-level knowledge about data mining in general. As a software engineer, I found it beneficial to learn new techniques for the data mining phases that lead to knowledge discovery.

soft-copy

رائد الغامدي

Author of 5 books, 21 followers

December 18, 2015

One of the essential reference books on searching and organizing big data. The nice thing about it is that it starts in an order that makes it easy for a non-specialist to understand the subject from the beginning, through the definitions, terminology, and basic rules.

Fabio

144 reviews, 6 followers

January 17, 2019

I'm biased because I took the class with the author, professor Han, so I had more time to digest all the math in it, but I find it an extremely useful coverage of the field.

Sayma

4 reviews

October 8, 2019

This book really helped me with my course.

study-metarial

Ramasubramaniam

14 reviews

July 24, 2021

My favourite classic in data mining.

Soma Boubou

14 reviews

November 16, 2013

First of all, I would like to mention that I am not familiar with data mining and its technology, so you can take my review as a summary of the book with my personal opinion (not a professional one) where it is needed.

Now, I'm reading:

Unit 6, Mining Frequent Patterns, which deals with finding all frequent itemsets and generating strong association rules.

Every rule A => B holds in the transaction set D with support s and confidence c:
Support = the probability that a transaction in D contains both A and B.
Confidence = the probability that a transaction in D which contains A also contains B, i.e. P(B|A).
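
To make the two definitions concrete, here is a minimal Python sketch. The transaction set `D` and the helper names `support` and `confidence` are made up for illustration and do not come from the book.

```python
# Toy transaction set D (hypothetical data, not taken from the book).
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as sup(A and B) / sup(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"bread", "milk"}, D)       # share of transactions with both items
c = confidence({"bread"}, {"milk"}, D)  # share of bread transactions that also have milk
print(f"support={s:.2f} confidence={c:.2f}")
```

Here three of the five transactions contain both bread and milk, and three of the four bread transactions also contain milk.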

The Apriori algorithm is the fundamental method for finding frequent itemsets by confined candidate generation (which is time-consuming). P:248.
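
A rough sketch of the level-wise idea, as I understand it, is below. It is a simplified illustration rather than the book's pseudocode: the function name `apriori`, the `min_count` parameter, and the toy data are my own assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset seen in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates with one scan of the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: each pair appears 3 times, the triple only twice.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
result = apriori(toy, min_count=3)
```

The repeated scan over all transactions at each level is where the cost comes from, which is what the variations listed next try to reduce.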

Improving the efficiency of Apriori can be done using different variations: P:255-256

- Hash-based Techniques.
- Transaction reduction.
- Partitioning.
- Sampling.
- Dynamic itemset counting.

* Frequent Pattern Growth (FP-growth) method for finding frequent itemsets without the costly candidate generation process. P:257.

* Using Vertical Data format (personally didn't find it interesting)

* Mining Closed and Max Patterns: This requires us to prune the search space as soon as possible using one of the next strategies:
- Item merging.
- Sub-itemset pruning.
- Item skipping.

Section 6.3 (P:264) presents the important idea that support and confidence alone are not enough and can produce misleading "strong" association rules. For that reason, the book suggests using correlation measures:
- Lift(A,B) = 1 means A and B are independent.
  Lift(A,B) < 1 means A and B are negatively correlated.
  Lift(A,B) > 1 means A and B are positively correlated.

- X^2 (chi-square): the squared difference between observed and expected values, divided by the expected value, summed over all cells of the contingency table.
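
As a rough illustration of these two correlation measures, the sketch below computes both from the counts of a 2x2 contingency table. The function names and the counts (10,000 transactions, 6,000 with A, 7,500 with B, 4,000 with both) are illustrative assumptions.

```python
def lift(n_ab, n_a, n_b, n):
    """Lift(A,B) = P(A and B) / (P(A) * P(B)), estimated from counts."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))


def chi_square(n_ab, n_a, n_b, n):
    """Chi-square statistic over the four cells of the A/B contingency table."""
    observed = {
        (True, True): n_ab,
        (True, False): n_a - n_ab,
        (False, True): n_b - n_ab,
        (False, False): n - n_a - n_b + n_ab,
    }
    row = {True: n_a, False: n - n_a}   # totals for A present / absent
    col = {True: n_b, False: n - n_b}   # totals for B present / absent
    total = 0.0
    for (has_a, has_b), obs in observed.items():
        expected = row[has_a] * col[has_b] / n
        total += (obs - expected) ** 2 / expected
    return total


# Illustrative counts, assumed for this example.
l = lift(4000, 6000, 7500, 10000)
x2 = chi_square(4000, 6000, 7500, 10000)
print(f"lift={l:.2f} chi_square={x2:.1f}")
```

With these counts the lift falls below 1, so A and B come out negatively correlated even though the rule A => B has a confidence of 4000/6000, about 67%.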

Other pattern evaluation measures that have gained interest lately are:
all_confidence, max_confidence, Kulczynski, and cosine. Each measure takes a value between 0 and 1; the higher the value, the closer the relationship between A and B.

all_conf(A,B) = min{P(A|B), P(B|A)}
max_conf(A,B) = max{P(A|B), P(B|A)}
Kulc(A,B) = 1/2 {P(A|B) + P(B|A)}
cosine(A,B) = sqrt{P(A|B) * P(B|A)}
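
A small sketch of how these four measures could be computed from support counts, taking cosine as the square root of the product of the two conditional probabilities. The function name `pattern_measures` and the example counts are hypothetical.

```python
from math import sqrt


def pattern_measures(sup_a, sup_b, sup_ab):
    """Compute the four measures from the support counts of A, B, and A-and-B."""
    p_a_given_b = sup_ab / sup_b
    p_b_given_a = sup_ab / sup_a
    return {
        "all_conf": min(p_a_given_b, p_b_given_a),
        "max_conf": max(p_a_given_b, p_b_given_a),
        "kulc": 0.5 * (p_a_given_b + p_b_given_a),
        "cosine": sqrt(p_a_given_b * p_b_given_a),
    }


# Hypothetical counts: B always occurs with A, but A mostly occurs without B.
m = pattern_measures(sup_a=1000, sup_b=100, sup_ab=100)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Only the supports of A, B, and both together enter the calculation; the total number of transactions never appears.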

The previous six measures were examined on six typical data sets (Page:269).
Lift and X^2 are strongly influenced by the number of null-transactions.

A measure is null-invariant if its value is free from the influence of null-transactions.

Imbalance Ratio (IR) assesses the imbalance of two itemsets, A and B, in rule implications.

IR(A,B) = |sup(A)-sup(B)| / (sup(A)+sup(B)-sup(A&B)) [0~...]

For imbalanced data, where the last four measures give confusing values, we use the two measures IR and Kulc together.
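
To see why the two are read together, here is a minimal sketch that computes IR and Kulc side by side. The function names and the support counts are assumptions chosen for illustration.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """IR(A,B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A and B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


def kulczynski(sup_a, sup_b, sup_ab):
    """Kulc(A,B) = average of P(A|B) and P(B|A)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


# Two hypothetical cases that share the same Kulc value.
balanced = (1000, 1000, 500)     # sup(A), sup(B), sup(A and B)
skewed = (1100, 11000, 1000)

for name, (a, b, ab) in {"balanced": balanced, "skewed": skewed}.items():
    print(f"{name}: kulc={kulczynski(a, b, ab):.2f} ir={imbalance_ratio(a, b, ab):.2f}")
```

Both cases give a neutral Kulc of about 0.5, but IR separates them: it is 0 for the balanced case and close to 0.9 for the skewed one.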

Daniel Korzekwa

26 reviews, 12 followers

June 2, 2012

I selected this book hoping to understand the difference between Data Mining, which I wasn't familiar with yet, and the fields already known to me, Machine Learning and Statistics. This book provides a very good overview of Data Mining techniques in general, and it is also packed with practical examples that give good intuition about what Data Mining actually is and how it relates to Machine Learning and Statistics.

异次元骇客

9 reviews

July 3, 2016

I read the translated Chinese version, not the original English version. I don't really like the Chinese version of this book. For some serious abstract names and concepts, there are lots of weird translations. This is my personal opinion.
I guess the English version may be easier to read, especially when it is the concepts that are mainly concerned.

Dustin

1 review

Read

September 10, 2007

Still reading...

Ayman Sieny

13 reviews, 3 followers

February 4, 2011

Another good book on data mining. It explains data mining algorithms and provides examples of their usage. The book is used as a textbook for Master's level studies in computer science.

Darin

125 reviews, 19 followers

reference-only

March 20, 2021

This is a good, high level book on data mining. If you want heavy theory, you will need to look elsewhere.

artificial-intelligence computer-science data-mining

Ohud Saud

93 reviews, 4 followers

April 21, 2015

WOW

I have read three or four books on data mining, taken two classes, and attended a conference. This is the best place for you to start learning data mining basics and everything related to data analysis.

data-mining intrusion-detection

Mona Mahfouz

63 reviews, 14 followers

August 17, 2014

Read for Data Mining course. Well written and easy to follow with good examples.

Audrey

210 reviews, 39 followers

June 23, 2016

Very clear explanations!

258 reviews, 18 followers

November 30, 2016

The book has simplistic language and is very easy to understand.
Very good from a student's perspective.
The Diagrams are easy to understand and explain the concept thoroughly.

computer-science