Effective Pandas: Patterns for Data Manipulation

Best practices for manipulating data with Pandas. This book will arm you with years of knowledge and experience, condensed into an easy-to-follow format. Rather than spending months reading blogs and websites and searching mailing lists and groups, you can use this book to learn how to write good Pandas code.

It

497 pages, Paperback

Published December 8, 2021

28 people are currently reading
340 people want to read

About the author

Matt Harrison

40 books, 24 followers

Ratings & Reviews

Community Reviews

5 stars: 41 (49%)
4 stars: 28 (33%)
3 stars: 11 (13%)
2 stars: 1 (1%)
1 star: 2 (2%)
Displaying 1 - 11 of 11 reviews
Profile Image for GleeGMJournal.
301 reviews, 1 follower
June 30, 2024
A helpful reference/cookbook for the Python pandas library. The content is thorough for a beginner who wants a first taste of pandas: Python data types, DataFrames, time series, exporting, joining, filtering, aggregation, and groupby.

The author keeps adding his personal takes on which methods he prefers and which are more intuitive to him. Those are helpful. However, as I don't use visualization in Python much, the topics I will revisit most are filtering, aggregation, and groupby.

The editing and formatting could be improved. Many figures and code snippets are placed on separate pages from the text that discusses them, which required me to flip back and forth. That makes it inconvenient to read.
Profile Image for Robert.
302 reviews
July 2, 2022
Effective Pandas is an excellent opinionated guide to pandas. “Opinionated” is important in the context of pandas: it’s a very flexible library that gives you rope to hang yourself, often contradicting the zen of Python: “There should be one — and preferably only one — obvious way to do it”. This can lead to some very messy code, in which the time-pressed data scientist ends up melding several different programming philosophies just to get their aggregation to work.

No doubt some of the people reading this will consider “effective pandas” to be an oxymoron. This is justified. Pandas seldom feels like it was “designed” in the way that R’s tidyverse is. It is a collection of tricks, wherein fluency only arises from many hours spent pulling one’s hair out.

Effective Pandas is Harrison’s effort to define and encourage “idiomatic pandas”, using chaining. It just so happens that chaining is the style of pandas that I had converged on due to its readability and elegance. Seeing a nice piece of chained pandas should mellow the complaints of tidyverse folks.
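For readers who have not seen the chaining style described above, here is a minimal sketch. It is not taken from the book; the `sales` data and column names are invented for illustration, and the particular methods chained are just one plausible combination.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "units": [10, 3, 7, 12, 5],
        "price": [2.0, 4.0, 2.0, 4.0, 3.0],
    }
)

# One chain, read top to bottom; each step returns a new object,
# so the original `sales` frame is never modified.
summary = (
    sales
    .assign(revenue=lambda df: df["units"] * df["price"])  # add a derived column
    .query("units > 4")                                    # keep larger orders only
    .groupby("region", as_index=False)
    .agg(total_revenue=("revenue", "sum"))                 # named aggregation
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Because no step assigns to an intermediate variable, there is no stale `df2` or `df_tmp` to pick up by accident, which is a large part of the readability argument.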

One area that wasn’t properly addressed (which is why I give this 4 stars instead of 5) is memory usage and performance. Some of this is quite important, as there are some methods in pandas that create copies of objects but others that modify objects in place.

This aside, Effective Pandas is a useful and readable outline of an important tool; it has the flavour of a user guide rather than a documentation reference. I skimmed through several chapters because I’m reasonably familiar with pandas, but I’d recommend the book to anyone who uses pandas a lot.

My highlights here.
Profile Image for Daniel Walton.
107 reviews, 2 followers
January 5, 2024
A nice pandas reference that was actually quite readable, too. I picked up some useful knowledge, so I count that as a win. In particular, I really liked the chaining technique and the use of ‘.pipe’, as well as the debugging ideas. The display formatting was new to me, too, and it is quite useful. Even with ChatGPT majorly assisting with df manipulation these days, knowing these patterns and what’s available helps a lot.
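One way ‘.pipe’ can help with debugging a chain is sketched below. This is an assumption about the general idea, not the book's own code: `log_shape` is a hypothetical helper, and the `readings` data is made up.

```python
import pandas as pd


def log_shape(df, label, log):
    """Hypothetical helper: record the frame's shape, return the frame unchanged."""
    log.append((label, df.shape))
    return df


# Invented sample data with one missing value.
readings = pd.DataFrame(
    {"sensor": ["a", "a", "b", "b"], "value": [1.0, None, 3.0, 4.0]}
)

shape_log = []

clean = (
    readings
    .pipe(log_shape, "raw", shape_log)            # inspect without breaking the chain
    .dropna(subset=["value"])
    .pipe(log_shape, "after dropna", shape_log)
    .groupby("sensor", as_index=False)
    .agg(mean_value=("value", "mean"))
    .pipe(log_shape, "after groupby", shape_log)
)

for label, shape in shape_log:
    print(label, shape)
```

Since the helper returns its input untouched, the `.pipe(...)` lines can be commented out or deleted once the chain behaves as expected.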
Profile Image for Dang-Khoa Le Tan.
41 reviews, 14 followers
August 3, 2023
More like a manual than a book with "Effective" in the title (like Effective C++ or Effective Python). There is no deep dive into pandas' internal design or best practices (there are some tips and tricks here and there, but not many). The pandas API is grouped into categories by functionality, so I guess it serves as a better reference than the one on the official website.
Profile Image for Ethan J.
356 reviews, 11 followers
October 4, 2022
I love pandas, but I had hoped to see more detail; this book feels like a printout of the official documentation...
But what differentiates it from the documentation is that it provides more structure? :/
Meh.
Profile Image for Walter Ullon.
331 reviews, 164 followers
March 7, 2022
Pandas is one of those libraries that suffers from the "guitar principle" (also known as the "Bushnell Principle" in video game circles): it is easy to use, but difficult to master.

Truly, it is one of the most straightforward and powerful data manipulation libraries, yet, because it is so easy to use, no one really spends much time trying to understand the best, most pythonic way to employ the library to its full extent.

If you haven't read Matt Harrison's book and use Pandas, chances are you're like that Chad at the picnic or camping trip that pulls out his guitar to strum along the same basic chords for an hour straight... Well, NO MORE!

Matt Harrison is ready to drop some knowledge on you and have you riffing your own data manipulation solos like you're Slash in "November Rain", or Prince in "Purple Rain"...

The book goes beyond explaining the data structures and methods that underpin Pandas; Harrison also provides a ton of practical advice on best practices in data manipulation and transformation.

For instance, by the time you're done you'll know which functions to use to leverage Pandas' vectorized structures to ensure your code is fast and efficient, which data types provide huge savings in terms of memory allocation, how to chain operations to ensure you're always accessing the correct intermediary dataframe, how to utilize indices to give you superpowers over your data, how to debug chains, merge, join, melt, style, and more.
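To give a sense of the kind of memory saving from data types mentioned above, here is a small sketch. It is not from the book; the data is invented, and the choice of `category` and `int16` is an assumption that happens to suit this particular example (few distinct strings, small integers).

```python
import pandas as pd

# Invented data: a low-cardinality string column and a small-integer column.
n = 10_000
df = pd.DataFrame(
    {
        "city": ["Lisbon", "Oslo", "Lima", "Cairo"] * (n // 4),
        "count": range(n),
    }
)

# Store the repeated strings as categories and the integers in 16 bits.
# int16 is safe here only because every value is below 32,768.
compact = df.astype({"city": "category", "count": "int16"})

before = df.memory_usage(deep=True).sum()
after = compact.memory_usage(deep=True).sum()

print(f"before: {before:,} bytes")
print(f"after:  {after:,} bytes")
```

The exact byte counts depend on the pandas version and platform, so none are quoted here; the point is only that the converted frame holds the same values in less memory.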

It is, by far, the best book you can get yourself if you want to take your data science skills to the next level. After all, they say modern data science is 90% data cleaning. I mostly agree.

I have recommended this book to every member of my team. REQUIRED READING.

Highest possible recommendation.
Profile Image for Joao Pedro.
20 reviews
June 12, 2022
This is the best book on pandas out there. It is the first work on pandas that seriously leverages the power of method chaining for writing clean and efficient pandas code. There are no in-place commands, no copies of data frames (wonderful!). It is also legitimately hands-on material, and even the section on debugging chains, which starts with commenting out commands before applying more robust debugging methods, mimics what data scientists really do in the wild.

This is the future of pandas and this type of material paves the way for lazy evaluation.