Feature Engineering Bookcamp

Rating: 4.33 (6 ratings, 1 review)
ISBN: 9781617299797

Sinan Özdemir

Deliver huge improvements to your machine learning pipelines without spending hours fine-tuning parameters! This book’s practical case studies reveal feature engineering techniques that upgrade both your data wrangling and your ML results.

In Feature Engineering Bookcamp you will learn how to:

Identify and implement feature transformations for your data
Build powerful machine learning pipelines with unstructured data like text and images
Quantify and minimize bias in machine learning pipelines at the data level
Use feature stores to build real-time feature engineering pipelines
Enhance existing machine learning pipelines by manipulating the input data
Use state-of-the-art deep learning models to extract hidden patterns in data

Feature Engineering Bookcamp guides you through a collection of projects that give you hands-on practice with core feature engineering techniques. You’ll work with feature engineering practices that reduce the time it takes to process data and deliver real improvements in your model’s performance. This instantly useful book skips the abstract mathematical theory and minutely detailed formulas; instead, you’ll learn through interesting code-driven case studies, including tweet classification, COVID detection, recidivism prediction, stock price movement detection, and more.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Get better output from machine learning pipelines by improving your training data! Use feature engineering, a machine learning technique for designing relevant input variables based on your existing data, to simplify training and enhance model performance. While fine-tuning hyperparameters or tweaking models may give you a minor performance bump, feature engineering delivers dramatic improvements by transforming your data pipeline.

About the book
Feature Engineering Bookcamp walks you through six hands-on projects where you’ll learn to upgrade your training data using feature engineering. Each chapter explores a new code-driven case study, taken from real-world industries like finance and healthcare. You’ll practice cleaning and transforming data, mitigating bias, and more. The book is full of performance-enhancing tips for all major ML subdomains—from natural language processing to time-series analysis.

What's inside

Identify and implement feature transformations
Build machine learning pipelines with unstructured data
Quantify and minimize bias in ML pipelines
Use feature stores to build real-time feature engineering pipelines
Enhance existing pipelines by manipulating input data

About the reader
For experienced machine learning engineers familiar with Python.

About the author
Sinan Ozdemir is the founder and CTO of Shiba, a former lecturer in data science at Johns Hopkins University, and the author of multiple textbooks on data science and machine learning.

Table of Contents
1 Introduction to feature engineering
2 The basics of feature engineering
3 Healthcare: Diagnosing COVID-19
4 Bias and fairness: Modeling recidivism
5 Natural language processing: Classifying social media sentiment
6 Computer vision: Object recognition
7 Time series analysis: Day trading with machine learning
8 Feature stores
9 Putting it all together

272 pages, Paperback

Published October 4, 2022

Community Reviews

5 stars: 3 (50%)
4 stars: 2 (33%)
3 stars: 1 (16%)
2 stars: 0 (0%)
1 star: 0 (0%)

Walter Ullon

332 reviews, 165 followers

December 21, 2022

A great resource for learning feature engineering theory and best practices in a structured manner. I wish this book had been available back in my early days as a data science practitioner.

The author does a superb job explaining the types of data (nominal, ordinal, ratio, interval, etc.) and the types of transformations and improvements that help turn these data into usable components for machine-learning pipelines.

Drawing on use cases across time series forecasting, image classification, sentiment analysis, and more, he outlines techniques specific to each and shows iterative model improvement by checking each version against the baseline.

I award the most kudos to the author for showing how to detect, deal with, and mitigate bias in a machine-learning pipeline (using the COMPAS recidivism dataset). In all my years reading these types of books, I have yet to come across another resource that even mentions these techniques, let alone devotes an entire chapter packed with tools, techniques, and analytical summaries. This was my favorite chapter by far.

Even more kudos for being the first book I've come across to explain, show, and implement a feature store for machine learning. Some might consider this the province of data engineers rather than modelers/data scientists, but in most small companies these are one and the same, and it pays to have the exposure. You would think books such as Fundamentals of Data Engineering: Plan and Build Robust Data Systems would cover this thoroughly, but they only mention it in passing, with no implementations to follow along.

It is not perfect, though. I wish the author had structured each chapter/use case to show how to test many feature engineering pipelines efficiently at once. For instance, in the first couple of chapters, Ozdemir performs feature improvements/transformations, checks against the baseline, performs more transformations, checks again, and so on. It would have been far more efficient and useful for readers if he had built each variant as its own pipeline and fed them all to a grid search or AutoML tool to find the best combination.

I would also have liked to see some implementations of these techniques combined with AutoML frameworks such as Optuna, TPOT, auto-sklearn, AutoKeras, etc.
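The reviewer's suggestion (search over many feature-engineering pipelines at once rather than transform, check, and repeat by hand) can be sketched with scikit-learn's `Pipeline` and `GridSearchCV`. This is a minimal illustration, not the book's code: the synthetic data from `make_classification`, the candidate scalers, the `PCA` size, and the logistic-regression classifier are all arbitrary assumptions chosen only to show the pattern.

```python
# Sketch: evaluate several feature-engineering pipelines in one search,
# instead of transforming, checking against a baseline, and repeating by hand.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (assumption: any tabular X, y would work here).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# One pipeline skeleton; each named step is a slot that the search swaps out.
pipe = Pipeline([
    ("scale", "passthrough"),
    ("reduce", "passthrough"),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Every combination of these step values is a distinct feature pipeline.
# "passthrough" skips the step, so the untransformed baseline is included.
param_grid = {
    "scale": ["passthrough", StandardScaler(), MinMaxScaler()],
    "reduce": ["passthrough", PCA(n_components=5)],
}

# 5-fold cross-validation scores all 3 x 2 = 6 candidate pipelines.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best pipeline:", search.best_params_)
print("best CV accuracy: %.3f" % search.best_score_)
```

Because every candidate, including the baseline, is scored with the same cross-validation folds, the comparison the book does step by step becomes a single call; swapping `GridSearchCV` for an AutoML or hyperparameter-optimization library follows the same idea.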

Overall, the quibbles don't outweigh the great content. Highest recommendation!

ai_machine_learning data-science