Group-by From Scratch

I've found that one of the best ways to grow in my scientific coding is to compare the efficiency of different approaches to implementing algorithms I find useful. Doing so builds intuition for the performance of the building blocks of the scientific Python ecosystem.

In this vein, today I want to take a look at an operation that is in many ways fundamental to data-driven exploration: the group-by, otherwise known as the split-apply-combine pattern. An archetypal example of a summation group-by is shown in this figure, borrowed from the Aggregation and Grouping section of the Python Data Science Handbook:

[Figure: a summation group-by, from the Aggregation and Grouping section of the Python Data Science Handbook]

The basic idea is to split the data into groups based on some value, apply a particular operation to the subset of data within each group (often an aggregation), and then combine the results into an output dataframe. Python users generally turn to the Pandas library for this type of operation, where it is implemented efficiently via a concise object-oriented API.
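To make the three steps concrete, here is a minimal from-scratch sketch of a summation group-by in pure Python. The toy `keys` and `data` lists and the `groupby_sum` helper are hypothetical names chosen for illustration; this is one simple way to express the pattern, not how Pandas implements it.

```python
from collections import defaultdict

# Hypothetical toy data: parallel lists of group keys and values
keys = ['A', 'B', 'C', 'A', 'B', 'C']
data = [1, 2, 3, 4, 5, 6]


def groupby_sum(keys, values):
    """Sum the values within each group of equal keys."""
    # Split: collect the values belonging to each key
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Apply and combine: aggregate each group and gather the results
    return {key: sum(group) for key, group in groups.items()}


result = groupby_sum(keys, data)
print(result)  # {'A': 5, 'B': 7, 'C': 9}
```

In Pandas, assuming a DataFrame `df` holding the same values in columns named `key` and `data`, the equivalent aggregation would be `df.groupby('key')['data'].sum()`.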

Published on March 22, 2017 10:00


Jake VanderPlas's Blog
