This textbook is like the Swiss Army knife of machine learning books—it's packed with tools and techniques to help you tackle a wide range of real-world problems.
It takes you on a journey through the exciting landscape of machine learning, equipped with powerful libraries like Scikit-Learn, Keras, and TensorFlow.
It explains each library in depth: I was more interested in the first one, as Keras and TensorFlow are too advanced for my interests and knowledge.
It is actually funny to read the Natural Language Processing (NLP) and LLMs section, written prior to ChatGPT.
NOTES:
Supervised Learning: the algorithm is trained on a labeled dataset, meaning the input data is paired with the correct output. The model learns to map the input to the output, making predictions or classifications when new data is introduced. Common algorithms include linear regression, logistic regression, decision trees, support vector machines, and neural networks.
Unsupervised Learning: deals with unlabeled data, where the algorithm explores the data's structure or patterns without any explicit supervision. Clustering and association are two primary tasks in this type. Clustering algorithms, like K-means or hierarchical clustering, group similar data points together. Association algorithms, like the Apriori algorithm, find relationships or associations among data points.
Reinforcement Learning: involves an agent learning to make decisions by interacting with an environment. It is often used in robotics: the agent learns by receiving feedback in the form of rewards or penalties as it navigates a problem space. The goal is to learn the optimal actions that maximize the cumulative reward. Algorithms like Q-learning and Deep Q Networks (DQN) are used in reinforcement learning scenarios.
Additionally, there are subfields and specialized forms within these categories, such as semi-supervised learning, where algorithms learn from a combination of labeled and unlabeled data, and transfer learning, which involves leveraging knowledge from one domain to another. These types and their variations offer diverse approaches to solving different types of problems in machine learning.
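A minimal sketch contrasting the supervised and unsupervised workflows described above, using scikit-learn with made-up toy data (the points and labels are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical toy data: six 2-D points, with labels for the supervised case.
X = np.array([[1, 2], [1, 4], [2, 3], [8, 8], [9, 10], [10, 9]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels available -> supervised

# Supervised: learn a mapping from X to y, then predict on new data.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0, 0], [9, 9]]))  # e.g. [0 1]

# Unsupervised: no labels, just look for structure (here, 2 clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)  # cluster assignment per point
```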
Gradient descent is a fundamental optimization algorithm widely used in machine learning for minimizing the error of a model by adjusting its parameters. It's especially crucial in training models like neural networks, linear regression, and other algorithms where the goal is to find the optimal parameters that minimize a cost or loss function.
- Objective: In machine learning, the objective is to minimize a cost or loss function that measures the difference between predicted values and actual values.
- Optimization Process: Gradient descent is an iterative optimization algorithm. It works by adjusting the model parameters iteratively to minimize the given cost function.
- Gradient Calculation: At each iteration, the algorithm calculates the gradient of the cost function with respect to the model parameters. The gradient essentially points in the direction of the steepest increase of the function.
- Parameter Update: The algorithm updates the parameters in the direction opposite to the gradient (i.e., descending along the gradient). This step size is determined by the learning rate, which controls how big a step the algorithm takes in the direction of the gradient.
- Convergence: This process continues iteratively, gradually reducing the error or loss. The algorithm terminates when it reaches a point where further iterations don't significantly decrease the loss or when it reaches a predefined number of iterations.
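A minimal NumPy sketch of these steps, applied to linear regression with batch gradient descent (the synthetic data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

# Illustrative data: y ≈ 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(size=(100, 1))

X_b = np.c_[np.ones((100, 1)), X]   # add a bias column of 1s
theta = rng.normal(size=(2, 1))     # random initialization of the parameters
eta, n_iterations, m = 0.1, 1000, len(X_b)

for _ in range(n_iterations):
    # Gradient of the MSE cost function with respect to theta
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    # Step in the direction opposite to the gradient, scaled by the learning rate
    theta -= eta * gradients

print(theta)  # should end up close to [[4], [3]]
```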
There are variations of gradient descent, such as:
Batch Gradient Descent: Calculates the gradient over the entire dataset.
Stochastic Gradient Descent (SGD): Computes the gradient using a single random example from the dataset at each iteration, which can be faster but noisier. The randomness helps escape local optima.
Mini-batch Gradient Descent: Computes the gradient using a small subset of the dataset, balancing between the efficiency of SGD and the stability of batch gradient descent.
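For comparison, scikit-learn ships the stochastic variant as SGDRegressor; a short usage sketch on the same kind of synthetic data (the hyperparameter values are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = (4 + 3 * X + rng.normal(size=(100, 1))).ravel()  # 1-D targets

# Stochastic GD: one instance at a time, with a decaying learning rate.
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, eta0=0.1, penalty=None,
                       random_state=42)
sgd_reg.fit(X, y)
print(sgd_reg.intercept_, sgd_reg.coef_)  # roughly 4 and 3
```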
Gradient descent plays a vital role in training machine learning models by iteratively adjusting parameters to find the optimal values that minimize the error or loss function, leading to better model predictions and performance.
It is commonly used in conjunction with various machine learning algorithms, including regression models. It serves as an optimization technique to train these models by minimizing a cost or loss function associated with the model's predictions.
Support Vector Machines SVM
It can perform linear or nonlinear classification, regression and even outlier detection.
Well suited for classification of complex small to medium sized datasets.
They tend to work effectively and efficiently when there are many features compared to the observations, but SVM is not as scalable to larger data sets and it’s hard to tune its hyperparameters.
SVM is a family of model classes that operate in high-dimensional space to find an optimal hyperplane that separates the classes with a maximum margin between them. Support vectors are the points closest to the decision boundary that would change it if they were removed.
It tries to fit the widest possible space between the classes, staying as far as possible from the closest training instances: large margin classification.
Adding more training instances far away from the boundary does not affect SVM, which is fully determined/supported by the instances located at the edge of the street, called support vectors.
N.B. SVMs are sensitive to the feature scales.
Soft margin classification is generally preferred to the hard version, because it is tolerant of outliers: it's a compromise between perfectly separating the two classes and having the widest possible street.
Unlike logistic regression, SVM classifiers do not output probabilities.
Nonlinear SVM classification adds polynomial features; thanks to the kernel trick we get the same result as if we had added many high-degree polynomial features, without actually adding them, so there is no combinatorial explosion of the number of features.
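A short sketch of the kernel trick in practice with scikit-learn's SVC (the moons dataset and the hyperparameter values are illustrative; note the scaling step, since SVMs are sensitive to feature scales):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# A polynomial kernel behaves as if many polynomial features had been added,
# without actually creating them.
poly_kernel_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)
poly_kernel_svm.fit(X, y)
print(poly_kernel_svm.score(X, y))
```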
SVM Regression reverses the objective: it tries to fit as many instances as possible on the street while limiting margin violations, that is, training instances that fall off the street.
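A corresponding regression sketch, where epsilon controls the width of the street (the synthetic data and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = (4 + 3 * X + rng.normal(size=(100, 1))).ravel()

# epsilon sets the width of the street; instances inside it do not
# contribute to the loss.
svm_reg = LinearSVR(epsilon=0.5, random_state=42)
svm_reg.fit(X, y)
print(svm_reg.intercept_, svm_reg.coef_)
```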
Decision Trees
They have been used for the longest time, even before they were turned into algorithms.
It searches for the pair (feature, threshold) that produces the purest subsets (weighted by their size) and does so recursively; however, it does not check whether the split will lead to the lowest possible impurity several levels down.
Hence it does not guarantee a globally optimal solution.
The computational complexity of predictions does not explode, since traversing each node only requires checking the value of one feature; training is costlier, because the algorithm compares all features on all samples at each node.
Node purity is measured by the Gini impurity or entropy: a node's impurity is generally lower than its parent's.
Decision trees make very few assumptions about the training data, as opposed to linear models, which assume that the data is linear. If left unconstrained, the tree structure will adapt itself to the training data, fitting it very closely, indeed, most likely overfitting it.
Such a model is often called a non-parametric model: it has parameters, but their number is not determined prior to training.
To avoid overfitting, we need regularization hyperparameters that restrict the decision tree's freedom during training: pruning (deleting unnecessary nodes), setting a maximum depth, a maximum number of leaves, etc.
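A quick illustration of such regularization in scikit-learn (the dataset and hyperparameter values are arbitrary examples):

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

# Unconstrained tree: free to grow until every leaf is pure (likely overfits).
deep_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Regularized tree: limit depth and require several samples per leaf.
regularized_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42
).fit(X, y)

print(deep_tree.get_depth(), regularized_tree.get_depth())
```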
We can also have decision tree regression, which, instead of predicting a class in each leaf, predicts a value.
They are simple to understand and interpret, easy to use, versatile and powerful.
They don’t care whether the training data is scaled or centered: no need to scale features.
However, they produce orthogonal decision boundaries, which makes them sensitive to training set rotation; more generally, they are very sensitive to small variations in the training data, so the model may not generalize well. Random Forests can limit this instability by averaging predictions over many trees.
Random Forests
It is an ensemble of Decision Trees, generally trained via bagging or sometimes pasting, typically with the max_samples set to the size of the training set.
Instead of using a BaggingClassifier, you can use the RandomForestClassifier class, which is optimized for Decision Trees and has all their hyperparameters, plus those controlling the ensemble itself.
Instead of searching for the best feature when splitting a node, it searches for the best feature among a random subset of features, which results in a greater tree diversity.
It also makes it easy to measure feature importance, by looking at how much each feature reduces impurity on average across the trees.
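For example (using the iris dataset purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(iris.data, iris.target)

# Impurity-based feature importances, averaged over all trees in the forest.
for name, score in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")
```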
Boosting
Adaptive Boosting
One way for a new predictor to correct its predecessor is to pay a bit more attention to the training instances that the predecessor underfitted. This results in new predictors focusing more and more on the hard cases. For example, when training an AdaBoost classifier, the algorithm first trains a base classifier (such as a Decision Tree) and uses it to make predictions on the training set. The algorithm then increases the relative weight of misclassified training instances, trains a second classifier using the updated weights, again makes predictions on the training set, updates the instance weights, and so on. Once all predictors are trained, the ensemble makes predictions much like bagging, except that the predictors have weights depending on their overall accuracy on the weighted training set.
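A minimal AdaBoost sketch with scikit-learn, using shallow trees ("decision stumps") as base estimators (the dataset and hyperparameters are illustrative; the parameter is named base_estimator in older scikit-learn versions):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
ada.fit(X, y)
print(ada.score(X, y))
```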
Gradient Boosting
It works by sequentially adding predictors to an ensemble, each one correcting its predecessor.
Instead of tweaking the instance weights at every iteration like AdaBoost does, it tries to fit the new predictor to the residual errors made by the previous predictor.
[XGBoost Python Library is an optimised implementation]
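The residual-fitting idea can be sketched by hand with two small regression trees before reaching for GradientBoostingRegressor or XGBoost (the synthetic quadratic data is illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.random((100, 1)) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)

# The first tree fits the raw targets.
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)

# The second tree fits the residual errors left by the first one.
residuals = y - tree1.predict(X)
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals)

# The ensemble's prediction is the sum of the trees' predictions.
X_new = np.array([[0.2]])
print(tree1.predict(X_new) + tree2.predict(X_new))
```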
Stacking
Stacked generalization involves training multiple diverse models and combining their predictions using a meta-model (or blender).
Unlike bagging, where the predictors' outputs are simply aggregated, stacking proceeds in stages: the base models are trained first, and the meta-model is then trained on their predictions.
The idea is to let the base models specialize in different aspects of the data, and the meta-model learns how to weigh their contributions effectively.
Stacking can involve multiple layers of models, with each layer's output serving as input to the next layer.
It requires a hold-out (validation) set to generate the predictions the meta-model is trained on, to prevent overfitting on the training data.
Stacking is a more complex ensemble method compared to boosting and bagging.
[Not supported by Scikit-Learn at the time the book was written; recent versions provide StackingClassifier and StackingRegressor]
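With a recent scikit-learn version, a stacking sketch looks roughly like this (the choice of base models and blender is arbitrary):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model / blender
    cv=5,  # out-of-fold predictions are used to train the blender
)
stack.fit(X, y)
print(stack.score(X, y))
```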
Unsupervised Learning
Dimensionality Reduction
Reducing dimensionality does cause some information loss and makes pipelines more complex and thus harder to maintain, but it speeds up training.
A major benefit is that it becomes much easier to rely on data visualization once we have fewer dimensions.
[the operation can be reversed, we can reconstruct a data set relatively similar to the original]
Intuitively dimensionality reduction algorithm performs well if it eliminates a lot of dimensions from the data set without losing too much information.
The Curse of Dimensionality
As the number of features or dimensions in a dataset increases, certain phenomena occur that can lead to difficulties in model training, performance, and generalization.
- Increased Sparsity: In high-dimensional spaces, data points become more sparse. As the number of dimensions increases, the available data tends to be spread out thinly across the feature space. This sparsity can lead to difficulties in estimating reliable statistical quantities and relationships.
- Increased Computational Complexity: The computational requirements grow exponentially with the number of dimensions. Algorithms that work efficiently in low-dimensional spaces may become computationally expensive or impractical in high-dimensional settings. This can affect the training and inference times of machine learning models.
- Overfitting: In high-dimensional spaces, models have more freedom to fit the training data closely. This can lead to overfitting, where a model performs well on the training data but fails to generalize to new, unseen data. Regularization techniques become crucial to mitigate overfitting in high-dimensional settings.
- Decreased Intuition and Visualization: It becomes increasingly difficult for humans to visualize and understand high-dimensional spaces. While we can easily visualize and interpret data in two or three dimensions, the ability to comprehend relationships among variables diminishes as the number of dimensions increases.
- Increased Data Requirements: As the dimensionality increases, the amount of data needed to maintain the same level of statistical significance also increases. This implies that more data is required to obtain reliable estimates and make accurate predictions in high-dimensional spaces.
- Distance Measures and Density Estimation: The concept of distance becomes less meaningful in high-dimensional spaces, and traditional distance metrics may lose their discriminative power. Similarly, density estimation becomes challenging as the data becomes more spread out.
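A tiny NumPy experiment hints at the distance issue above: as dimensionality grows, pairwise distances between random points concentrate and become nearly indistinguishable (the point counts and dimensions are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(42)

for d in (2, 10, 100, 1000):
    X = rng.random((200, d))   # 200 random points in the d-dimensional unit hypercube
    dists = pdist(X)           # all pairwise Euclidean distances
    # The relative spread of distances shrinks as d grows:
    # every point starts to look roughly equally far from every other point.
    print(d, round(dists.std() / dists.mean(), 3))
```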
Projection
In most real-world problems, training instances are not spread out uniformly across all dimensions: many features are almost constant whereas others are highly correlated.
As a result, all training instances lie within a much lower dimensional subspace of the high-dimensional space.
If we project every instance perpendicularly onto this subspace, we get a new lower-dimensional dataset.
Manifold Learning focuses on capturing and representing the intrinsic structure or geometry of high-dimensional data in lower-dimensional spaces, often referred to as manifolds.
The assumption is that the task will be simpler if expressed in the lower dimensional space of the manifold, which is not always true: the decision boundary may not always be simpler with lower dimensions.
PCA Principal Component Analysis
It identifies the hyperplane that lies closest to the data and then projects the data onto it, while retaining as much of the original variance as possible.
PCA achieves this by identifying the principal components of the data, which are linear combinations of the original features; the first principal component is the axis that accounts for the largest amount of variance in the training set.
[It's essential to note that PCA assumes that the principal components capture the most important features of the data, and it works well when the variance in the data is aligned with the directions of maximum variance. However, PCA is a linear technique and may not perform optimally when the underlying structure of the data is nonlinear. In such cases, non-linear dimensionality reduction techniques like t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) might be more appropriate.]
It identifies the principal components via a standard matrix factorization technique, Singular Value Decomposition.
Before applying PCA, it's common to standardize the data by centering it (subtracting the mean) and scaling it (dividing by the standard deviation). This ensures that each feature contributes equally to the analysis.
PCA involves the computation of the covariance matrix of the standardized data. The covariance matrix represents the relationships between different features, indicating how they vary together.
It is useful to compute the explained variance ratio of each principal component which indicates the proportion of the dataset’s variance that lies along each PC.
Rather than choosing the number of dimensions arbitrarily, it is common to reduce down to the number of dimensions that accounts for a large portion of the variance, e.g., 95%.
After dimensionality reduction the training set takes up much less space.
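In scikit-learn this takes a few lines; passing a float to n_components asks PCA to keep enough components to preserve that fraction of the variance (the digits dataset is used purely for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)        # 64-dimensional images
X_std = StandardScaler().fit_transform(X)  # center and scale first

pca = PCA(n_components=0.95)               # keep 95% of the variance
X_reduced = pca.fit_transform(X_std)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_[:5])   # variance explained by the first PCs
```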
- Dimensionality Reduction: The primary use of PCA is to reduce the number of features in a dataset while retaining most of the information. This is beneficial for visualization, computational efficiency, and avoiding the curse of dimensionality.
- Data Compression: PCA can be used for data compression by representing the data in a lower-dimensional space, reducing storage requirements.
- Noise Reduction: By focusing on the principal components with the highest variance, PCA can help filter out noise in the data.
- Visualization: PCA is often employed for visualizing high-dimensional data in two or three dimensions, making it easier to interpret and understand.
Kernel PCA, Unsupervised Algorithm
The basic idea behind Kernel PCA is to use a kernel function to implicitly map the original data into a higher-dimensional space where linear relationships may become more apparent. The kernel trick avoids the explicit computation of the high-dimensional feature space but relies on the computation of pairwise similarities (kernels) between data points.
Commonly used kernel functions include the radial basis function (RBF) or Gaussian kernel, polynomial kernel, and sigmoid kernel. The choice of the kernel function depends on the characteristics of the data and the desired transformation.
After applying the kernel trick, the eigenvalue decomposition is performed in the feature space induced by the kernel. This results in eigenvalues and eigenvectors, which are analogous to those obtained in traditional PCA.
The final step involves projecting the original data onto the principal components in the higher-dimensional feature space. The projection allows for non-linear dimensionality reduction.
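A brief KernelPCA sketch with an RBF kernel (the dataset and the gamma value are illustrative; in practice gamma usually needs tuning):

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# Implicitly map the data to a high-dimensional feature space via the RBF kernel,
# then project onto the leading principal components found there.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)
```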
Kernel PCA is particularly useful in scenarios where the relationships in the data are not well captured by linear techniques. It has applications in various fields, including computer vision, pattern recognition, and bioinformatics, where the underlying structure of the data might be highly non-linear.
However, it's important to note that Kernel PCA can be computationally expensive, especially when dealing with large datasets, as it involves the computation of pairwise kernel values. The choice of the kernel and its parameters can also impact the performance of Kernel PCA, and tuning these parameters may be necessary for optimal results.
Clustering: K-Means
It is the task of identifying similar instances and assigning them to clusters or groups of similar instances.
It is an example of using data science not to predict but to group and make sense of the existing data.
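A minimal K-Means example (the blob data and k=3 are illustrative; in practice k is usually chosen with inertia or silhouette analysis):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)   # one centroid per cluster
print(labels[:10])               # cluster assigned to each instance
```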
Use cases:
- Customer segmentation: You can cluster your customers based on their purchases and their activity on your website. This is useful to understand who your customers are and what they need, so you can adapt your products and marketing campaigns to each segment.