Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions

Use machine learning to understand your customers, frame decisions, and drive value. The business analytics world has changed, and Data Scientists are taking over. Business Data Science takes you through the steps of using machine learning to implement best-in-class business data science. Whether you are a business leader with a desire to go deep on data, or an engineer who wants to learn how to apply Machine Learning to business problems, you’ll find the information, insight, and tools you need to flourish in today’s data-driven economy. You’ll learn the key building blocks of Machine Learning: sparse regularization, out-of-sample validation, and latent factor and topic modeling. Understand how to use ML tools in real-world business problems, where causation matters more than correlation. Solve data science problems by scripting in the R programming language. Today’s business landscape is driven by data and constantly shifting. Companies live and die on their ability to make and implement the right decisions quickly and effectively. Business Data Science is about doing data science right. It’s about the exciting things being done around Big Data to run a flourishing business. It’s about the precepts, principles, and best practices that you need to know for best-in-class business data science.

591 pages, Kindle Edition

Published August 23, 2019


About the author

Matt Taddy

5 books · 1 follower

Ratings & Reviews


Community Reviews

5 stars: 27 (46%)
4 stars: 17 (29%)
3 stars: 11 (18%)
2 stars: 3 (5%)
1 star: 0 (0%)
Displaying 1 - 8 of 8 reviews
111 reviews · 35 followers
September 9, 2019
Matt Taddy's Business Data Science should be required reading for anyone doing applied econometrics in 2019. While the title makes it sound like any one of dozens of books on introductory statistics for business, rehashing 30- or even 100-year-old material for bored business school students in a mandatory class, this is actually a tastefully arranged tour of, and hands-on introduction to, the tools and practices that a present-day data scientist at a top-tier company will use in solving real business problems. Its author is an economist with practical experience at eBay, Microsoft, and Amazon, as well as an academic record spanning top-tier publications in both economics and ML. Concretely, that means that while it eases you in with basic regression, it proceeds quickly to modern statistical learning tools, starting with the bootstrap, moving to regularized regression and classification, and reaching to tree-based methods and unsupervised learning, all accompanied by inline code (in R) and abundant practical advice from years of real-world experience. The selection of topics is opinionated rather than encyclopedic; methods that are found useful, including particular variants of the lasso, get detailed code examples, often coming from Taddy's own research, while methods which are now mostly passé, like SVMs, get a dismissive sentence or two, which is appropriate for an introduction. The general subject matter and style are comparable to books like Hastie, Tibshirani, and Friedman's "Elements of Statistical Learning" and Efron and Hastie's "Computer Age Statistical Inference", but the focus, applications, and presumed reader knowledge are targeted to business and economics in a way which will make it a better introduction for that class of readers.

One of the standout features of this book is its real data examples, for which the accompanying code introduces you, through learning by doing, to the kinds of data cleaning, formatting, and computational implementation details (like the very nice coverage of sparse matrix and parallel and distributed computing facilities in R, along with candid admissions of when Python would be preferred instead) that make up so much of the work life of the professional data scientist but are nowhere to be seen in your typical intro class, where students work with mtcars, iris, UCI repository data sets, MNIST, etc. (There are some of these canned data sets in the book for simple examples, but enough real ones that you get the feel for it.) Having all of this inline, rather than hidden away or in some special "data cleaning" chapter, makes for somewhat more challenging reading for the reader not well versed in R, and I personally kept an R terminal open while reading to try out bits needing clarification, but presenting it as it came up, for the specific issues of each specific data set, kept it manageable while also demonstrating what it requires in practice.

The "Business" in the title is not just because the author taught in a business school; beyond the business and economic examples, careful thought about when and how the methods should be used for business goals is integrated throughout. Causal inference gets two chapters, covering experiments and control, which go over AB testing and a quick run through the "Mostly Harmless" playbook that makes up most of the distinctive econometric knowledge of many applied economists (IV, RD, diff-in-diff, you know the drill), but also more recent methods incorporating machine learning components, like variants of orthogonal ("doubly robust") estimators and causal trees and forests for high-dimensional control and heterogeneous treatment effects estimation. The discussion of demand estimation here is particularly well informed. Much of Taddy's own work has been on the use of text data, and the discussion of it is highly practical, focusing on preprocessing tasks and issues with the use and interpretation of simple methods, like lasso and partial least squares text regression on the supervised side and PCA and LDA on the unsupervised side. This, like the book's heavy focus on lasso methods, reflects a taste for simple, scalable, and interpretable methods with stable, well-established implementations, which is appropriate both for an introductory text and for real business decisions, which require input that takes both data and real thinking rather than just ML black boxes. Deep learning is postponed to the last chapter, which is more descriptive than instructional. That may be disappointing for people swept up in the hype (and those unfortunate engineers at companies that mandate all data analysis be performed remotely via a TensorFlow API), but it is defensible for the intended business and economics audience. I personally introduce a little bit of Keras when teaching similar material just to demonstrate that deep learning is no more challenging than any of the rest of the material, but given how fast the frameworks change and the abundance of alternative resources, it's fine to leave it out.

In terms of complaints, aside from not having released the book earlier when I was teaching classes on similar material and had to build up notes from scattered papers and software guides, I don't have many. Some of the figures and math notation get jumbled up or cut off in the Kindle edition, which is a minor annoyance. The included subsection on inference for the lasso, while appropriately heavily caveated, gives readers by its very presence the pernicious idea that "standard errors for lasso coefficients" is an idea that makes any sense at all, rather than an elementary misunderstanding that overzealous and underinformed referees should be persuaded out of. Specifically, in high dimensions where a bias-variance tradeoff is unavoidable, valid confidence intervals "for" a regularized estimator cannot be made to be centered on the estimates (at least not without being massively inflated). One can produce intervals for the underlying model, or for particular functionals of the model, but jointly achieving rate-optimal predictions and rate-optimal valid (frequentist) inference for a model of this type is known to be simply impossible (Giné and Nickl's book is a useful source on this). The proposed purely heuristic approaches, based on undersmoothing and subsampling, seem unnecessary given that the later causal chapter covers a semiparametric approach for estimation of functionals which is demonstrably fit for purpose. As a matter of disclosure, I have not personally been forced into this by a misguided editor or referee, but I have repeatedly been told by economists reluctant to use machine learning or even classical nonparametric methods that expected editors' demands for valid confidence intervals are holding them back from applying them. So devising Potemkin methods designed only to get an editor off your back without genuinely solving the problem, rather than forcefully explaining what cannot and can be done here (including valid inference as a separate task from point estimation!), seems like a missed opportunity.
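
The point about intervals centered on a regularized estimate can be illustrated with a small simulation. This is a minimal sketch and not taken from the book: it assumes a single coefficient in an orthonormal design, where the lasso solution reduces to soft-thresholding the OLS estimate, and the function names and parameter values (`beta`, `lam`, `n`, and so on) are hypothetical choices made only for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution in an orthonormal design: shrink z toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def simulate(beta=1.0, sigma=1.0, n=100, lam=0.15, reps=20000, seed=0):
    """Compare bias and naive interval coverage for OLS and the lasso.

    Assumption (illustrative only): under an orthonormal design the OLS
    coefficient is distributed as beta + N(0, sigma^2 / n).
    """
    rng = np.random.default_rng(seed)
    se = sigma / np.sqrt(n)

    ols = beta + se * rng.standard_normal(reps)   # unbiased estimate
    lasso = soft_threshold(ols, lam)              # shrunk, hence biased

    def coverage(estimates):
        # Share of replications where a naive interval, estimate +/- 1.96 * se,
        # centered on the estimate, contains the true coefficient.
        return float(np.mean(np.abs(estimates - beta) <= 1.96 * se))

    return {
        "ols_bias": float(ols.mean() - beta),
        "lasso_bias": float(lasso.mean() - beta),
        "ols_coverage": coverage(ols),
        "lasso_coverage": coverage(lasso),
    }

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the OLS interval covers the truth at roughly its nominal rate, while the same-width interval centered on the shrunken estimate falls well short, because the shrinkage bias is of the same order as the standard error. Widening the interval enough to absorb that bias is the "massively inflated" option mentioned above.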

Overall, this book is an incomparable resource, bringing modern data science practices into reach for applied economists, who even at the forefront could learn a lot of immediate practical relevance from it.

Haydn
125 reviews · 3 followers
October 7, 2021
Useful in that Taddy outlines which techniques are actually used in industry and why others - including the well-known - are not. But ultimately he gets stuck between discussing theory, writing code, and professional applications. This book attempts to give an overview of all three but ends up not giving you enough on any.

It gives you a destination but sadly no usable map of how to get there.
Terran M.
78 reviews · 105 followers
March 9, 2022
For an audience who already knows how to fit and interpret models, Taddy's book shows you how to apply the techniques you know to causal inference, one of the most important strategic questions in business today. This book is a much quicker read and more accessible than either Angrist and Pischke or Imbens and Rubin, and will get you pointed in the right direction. You'll also get clear thinking about the difference between prediction and inference and where each can be used in the modern firm.

There are substantial errors in the equations and code, and I could not find any published errata list, which makes this four stars instead of five. These errors may seem obvious to the experienced but can derail learners for days.
Wei Cui
18 reviews
December 22, 2019
The special feature of Business Data Science is that it combines a deep theoretical foundation, built upon the author’s academic research, with VERY practical tips, drawn from the author’s industry experience, on applying algorithms to real business problems. This is rare and invaluable. As a professional in the data field, I constantly switched while reading between “ok, now I understand this method’s theoretical foundation” and “wow, I should try this trick in my project”.

I also like the writing style, with examples from various fields and the author’s clear explanations; sometimes it feels like an in-person chat with the author. It was indeed a pleasurable reading experience, which is not common for a fairly technical book.

I especially like the final chapter, Artificial Intelligence, where the author presents his thoughts on where the industry is heading and what talent will be key for this exciting future. I reread this chapter a couple of times.

Overall, a great book for data scientists looking to strengthen their theoretical foundation, expand their tool sets, and plan for career development!
20 reviews
Want to read
October 14, 2022
Data Science and ML plan:

(0) Ace the Data Science Interview (Kevin Huo)
(1) Business Data Science
(2) Naked Statistics — Stripping the Dread From the Data
(3) Machine Learning Simplified
(4) Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 3rd Edition
(5) Practical Statistics for Data Scientists
(6) Elements of Statistical Learning
(7) Machine Learning Yearning
(8) Artificial Intelligence in Practice: How 50 Successful Companies Used AI
(9) Deep Learning Illustrated
(10) Deep Learning with Python
(11) Deep Learning (by Ian Goodfellow et al)
(12) Interpretable Machine Learning with Python
(13) Mastering 'Metrics: The Path from Cause to Effect
Ed Barton
1,303 reviews
December 22, 2019
Covering a lot of statistics, with snippets of R code and a touch of Python, this book will give you a good overview of business-oriented data science. A caution: if you are not reasonably strong on your stats before you dive into this book, you will quickly become befuddled. I've got 12+ credit hours of stats and found the book a challenge. Having said that, it covers the basics well and touches on pretty much everything from simple linear regression to machine learning and AI. If this is your discipline, it's a good intro.
O.
38 reviews
July 13, 2021
Professor Taddy was a superstar professor at Booth School (UChicago). My friends took classes from him and recommended buying his book. I purchased it the day it was released, and it is one of my favorite books on ML (kudos for using the R language too). Taddy left Booth for Amazon; I wish I could have taken his class.
6 reviews
September 7, 2024
I think this book found a good balance between technical details and business intuition. It also highlighted interesting real-world applications of business data science and structured the different areas logically.
