Software Engineering for Data Scientists: From Notebooks to Scalable Systems

Name: Software Engineering for Data Scientists: From Notebooks to Scalable Systems
Rating: 4.18 (5 reviews)
ISBN: 9781098136161

Catherine Nelson

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering and clearly explains how to apply the best practices from software engineering to data science.

Examples are provided in Python, drawn from popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:

Understand data structures and object-oriented programming
Clearly and skillfully document your code
Package and share your code
Integrate data science code with a larger code base
Learn how to write APIs
Create secure code
Apply best practices to common tasks such as testing, error handling, and logging
Work more effectively with software engineers
Write more efficient, maintainable, and robust code in Python
Put your data science projects into production
And more

Genres: Programming, Technical, Science, Textbooks, Coding

393 pages, Kindle Edition

Published April 16, 2024

26 people are currently reading

79 people want to read

About the author

Catherine Nelson

2 books

Community Reviews

5 stars: 16 (47%)
4 stars: 11 (32%)
3 stars: 5 (14%)
2 stars: 1 (2%)
1 star: 1 (2%)

Displaying 1 - 5 of 5 reviews

Emma Stowe

56 reviews, 1 follower

September 11, 2024

Read this as part of an Innovation Sprint at work and I feel like I was the exact target audience. I know I tend to write ad-hoc code that is not easily scalable, especially by other people, so it was great to learn some software engineering best practices, especially around design/refactoring, formatting, and documentation. Lots of good packages mentioned throughout. And the tone was conversational enough that it was really easy to read.
Overall, it was validating to me that I’m not alone in my coding weaknesses as a data scientist, but also gave me good paths forward to get better. Yay.

Efgefg

1 review, 7 followers

June 29, 2024

I just loved this book for its simplicity and clarity. I found it extremely useful and discovered a lot of techniques I didn’t know. I would recommend it to everyone wanting to pursue a career in data science.

Hadar Sharvit

7 reviews

September 4, 2025

The book positions itself as being suitable for both junior and senior data scientists, but in practice, it only serves the former. In the book's opening, Catherine claims it can address more advanced audiences. It does not.

By attempting to cover nearly every aspect of software engineering within a data science context, the author sacrifices depth. Many topics are treated superficially, and in some cases, poor practices are suggested while core fundamentals are overlooked entirely.

The first nine chapters offer little value to anyone with even minimal CS education or more than a month of industry experience. In contrast, Chapters 10 through 12 are worth reading, as they provide clear introductions to version control, APIs, automation, and deployment. These, in my experience, are areas where junior data scientists and researchers lack knowledge.

Bottom line: The book may work as a primer for beginners but fails to provide meaningful insights for intermediate or senior-level data scientists.

Milele

235 reviews, 8 followers

June 11, 2024

This book has a great list of topics for data folks who are asked to build software that has to run reliably more than once: it's time to bring in some software engineering. I think the overall scope is perfect, from technical details like modular code and using appropriate data structures for data pipelines, to fuzzier, higher-level people topics like automated deploys, operations, and working in teams.

Ege Strider

25 reviews

October 3, 2025

Although most chapters covered material I already knew, I quite liked the overall refresher, and as a seasoned data scientist (5 years) I found that I was missing out on some newer tools like mypy, ruff, and poetry. Learning about those was quite good.

I recommend it as a light read; I read it during a break between projects.
