MongoDB Aggregation I: Counting and Grouping

This is the first in a series of articles detailing the syntax, patterns, and
use cases for MongoDB’s aggregation functions:

Counting and Grouping
Grouping Elaborated
Map-reduce Basics

Here, as an introduction, we cover the basics of count(), distinct(), and group().

Count

count() simply returns the number of documents in a collection.

Suppose we have a collection where each document represents a pageview. We can
get the total number of pageviews like so:

// Javascript
db.pageviews.count()

# Ruby
@db['pageviews'].count

Admittedly more useful would be the number of pageviews from a given month:

// Javascript
db.pageviews.count({date:
{'$gte': new Date("Nov 1, 2009"), '$lt': new Date("Dec 1, 2009")}})

# Ruby
@db['pageviews'].find({:date =>
{'$gte' => Time.utc(2009, 11), '$lt' => Time.utc(2009, 12)}}).count

Notice that we’ve passed a document selector to the count() method.
This ensures that only documents matching the selector are included in the total.

Distinct

Another simple but useful aggregator is distinct(), which returns an
array of unique values for a given key in a collection. We can specify a
root-level key or a nested key. Here, we request unique ip-addresses and user agents:

// Javascript
db.pageviews.distinct('ip-address')
db.pageviews.distinct({'user.agent'})

# Ruby
@db['pageviews'].distinct('ip-address')
@db['pageviews'].distinct('user.agent')

Group

MongoDB’s group() command provides some of the same functionality as
SQL’s GROUP BY, although the syntax is rather different. Like most database
commands, group() takes a document whose keys designate the parameters
of the operation:

// Group by user-agent, returning a sum for each user-agent.
db.pageviews.group(
{
key: {'user.agent': true},
initial: {sum: 0},
reduce: function(doc, prev) { prev.sum += 1}
});

// Returns (Note: simplified)
[
{
'user.agent': 'Mozilla 5.0 / Gecko'
'sum': 241
},
{
'user.agent': 'Mozilla 5.0 / Webkit'
'sum': 79
}
]

Looking at the command itself, the first parameter is clear enough:
key specifies which key or keys to group by.

The next two parameters, initial and reduce, may be less
familiar. Their use is analgous to most programming language implementations of
inject.

With initial, we provide a base, aggregator document for each grouped
result. Here, we’re just saying that the base document should contain one key,
sum, having a value of 0.

// Initial, aggregator document
{sum: 0}

The reduce function take two parameters: 1) the current document being
processed, and 2) the document we’re aggregating over (which starts out as the initial document described above).

// Reduce function
function(doc, prev) {
prev.sum += 1;
}

In this simple case, we don’t even use the doc parameter; we just
increment our initial document’s sum key. Since a new document is used
for each grouping, the group() function returns all those documents as
an array, adding any keys we’re grouping by, so that we’re given a result like this:

// Result (Note: simplified)
[
{
'user.agent': 'Mozilla 5.0 / Gecko'
'sum': 241
},
{
'user.agent': 'Mozilla 5.0 / Webkit'
'sum': 79
}
]

The command we’ve been describing could be run from Ruby as follows:

# Group from Ruby
@db['pageviews'].group(['user.agent'], nil, {'sum' => 0},
"function(doc, prev) { prev.sum += 1}")

With the second parameter, here nil, we can provide a query selector, so
that group() will only operate over a certain subset of the collection.
In the next installment, we explore that along with some of group()’s more advanced features.

In the meantime, fire the up the MongoDB shell, experiment with some of these
functions, and don’t hesitate to consult the docs.

View more on Kyle Banker's website »

Like • 0 comments • flag

Published on May 02, 2013 22:57

No comments have been added yet.

Kyle Banker's Blog

Kyle Banker's profile

Kyle Banker isn't a Goodreads Author (yet), but they do have a blog, so here are some recent posts imported from their feed.

delete edit this post