PART I - Understanding Machine Learning

Chapter 1: Machine Learning Basics
Chapter Goal: This chapter familiarizes and acquaints readers with the basics of machine learning and the industry-standard workflows followed for machine learning processes, and expands on the different types of machine learning and deep learning algorithms.
No. of pages: 50-60
Sub-Topics:
1. Brief on machine learning, definitions and concepts
2. Industry standard for data mining processes - CRISP-DM and adoption in ML
3. Brief on data processing, visualization, feature extraction/engineering concepts
4. Types of learning algorithms - supervised, unsupervised, reinforcement learning
5. Advanced models - time series, deep learning
6. Model building and validation concepts
7. Applications of machine learning

Chapter 2: The Python Machine Learning Ecosystem
Chapter Goal: This chapter introduces readers to the Python language and the entire ecosystem built around machine learning with Python tools, frameworks and libraries. Overview and code samples are given for each tool to depict its usage and effectiveness.
No. of pages: 50-60
Sub-Topics:
1. Brief on Python
2. Why is Python effective for machine learning and data science
3. Brief overview on the Python ecosystem followed by data scientists (includes anaconda distribution)
4. Reproducible research with ipython
5. Data processing and computing with pandas, numpy, scipy
6. Statistical learning with statsmodels
7. ML frameworks - scikit-learn, pyml etc.
8. NLP frameworks - nltk, pattern, spacy
9. DL frameworks - theano, tensorflow, keras

PART II - The Machine Learning Pipeline

Chapter 3: Processing, wrangling and visualizing data
Sub-Topics:
1. Data retrieval mechanisms (crawling, databases, APIs etc.)
2. Data processing (handling various forms of data - SQL, JSON, XML, images)
3. Data attributes and features (numeric, categorical etc.)
4. Data wrangling (cleaning, handling missing values, normalizing data)
5. Data summarization
6. Data visualization (bar, histogram, boxplot, line, scatter etc.)

Chapter 4: Feature Engineering and Selection
Chapter Goal: This chapter focuses on the next stage in the ML pipeline: feature extraction, engineering and selection. Readers will learn about both basic and advanced feature engineering methods for different data formats including numeric, text and images. We will also focus on methods for effective feature selection.
No. of pages: 50-60
Sub-Topics:
1. Features - understanding your data
2. Basic feature engineering
3. Extracting features from numeric, categorical variables
4. Extracting features from date/timestamp variables
5. Extracting basic features from textual data (bag of words)
6. Advanced feature engineering
7. Extracting complex features from textual data (word vectorization, tfidf, topic models)
8. Extracting features from images (pixels, edge detection, shapes)
9. Time series features
10. Feature scaling and standardization
11. Feature selection techniques
12. Using forward/backward selection techniques
13. Using machine learning models like random forests
14. Other methods

Chapter 5: Building, tuning and deploying models
Chapter Goal: This chapter focuses on the final stage in the ML pipeline where readers will learn how to fit and build models on data features, how to optimize and tune model
We are trying to implement a data lake on our network. While I was working to launch Spark as the processing layer, the book gave me a lot of ideas. I adapted the examples to run on a cluster and then submitted them to ours.
The book is OK for getting an overview of some machine learning practices. It doesn't really explain much of the (mathematical) background of what's going on, so if you don't have that background already you will have a harder time in places. The examples feel a little random, but on the other hand they cover a fairly broad spectrum of application areas, which is good. Probably the most important issue with the book nowadays is that it was written in 2017, and in the fast-evolving landscape of ML, especially deep learning model architecture, it often feels woefully out of touch with topics of interest today. Also, a lot of the Jupyter notebooks don't run anymore, since the current versions of the tools they use have made API changes that are not backward compatible. It would have helped if the authors had provided a Docker image (or similar) with frozen versions of all the Python libraries that the Jupyter notebooks depend on.
This book presents random examples of machine learning exercises. Unfortunately, it is neither an introduction to machine learning nor a guide to the ML ecosystem in Python.