MongoDB: a Fast and Easy Way to Calculate Aggregated Values without Map-Reduce

The MongoDB aggregation framework lets you calculate aggregated values without using map-reduce. Map-reduce is a powerful tool, but it is often slow when processing large volumes of data. In this article, I compare map-reduce with the aggregation framework and show the significant benefits of the latter.

Aggregation Framework vs Map-Reduce

The main differences between the aggregation framework and map-reduce are:

  • declarative syntax, with no need to write JavaScript code;

  • operations are described as a chain of stages to apply;

  • built-in expressions for evaluating values;

  • higher performance, because the aggregation framework is implemented natively in C++ instead of running JavaScript;

  • projection of the returned data, so you can add computed fields, sub-objects, etc.
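Because the syntax is declarative, a pipeline in the Node.js driver is just an array of stage objects rather than code that runs on the server. The sketch below is illustrative only: it assumes hypothetical page-view documents with fields time (a Date), url and ip, and shows a $match filter followed by a $project stage that adds a computed field and a sub-object.

```javascript
// A hypothetical pipeline over page-view documents with fields time (a Date),
// url and ip. It is plain data: an array of stages, each an object whose single
// key names the operator.
const pipeline = [
  // Filter with an ordinary query predicate, as in find().
  { $match: { url: { $regex: "^/blog/" } } },
  // Reshape each document: keep url, add a computed field and a sub-object.
  {
    $project: {
      _id: 0,
      url: 1,
      hour: { $hour: "$time" }, // expression evaluated by the server
      visitor: { ip: "$ip" },   // new embedded document built from a field path
    },
  },
];

// Because the pipeline is data, it can be built, inspected or serialized in code.
console.log(JSON.stringify(pipeline.map((stage) => Object.keys(stage)[0]))); // ["$match","$project"]
```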

Framework concepts

The aggregation framework provides logic similar to the SQL GROUP BY clause. It is built on two main concepts: pipelines and expressions. A pipeline is a chain of operators (stages), each of which processes a stream of documents and passes its results to the next stage. Expressions calculate values from input documents to produce the output documents. Some of the pipeline operators are:

  • $match – filters documents using the same query predicates as collection.find();

  • $project – reshapes documents: includes or excludes fields and adds computed values, sub-objects, etc.;

  • $unwind – deconstructs an array field, producing one output document per array element;

  • $sort – sorts documents;

  • $limit – passes on at most a specified number of documents;

  • $skip – skips a specified number of documents;

  • $group – groups documents by the key given in _id and computes accumulated values such as $sum.
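To make the idea of a chain of stages concrete, here is a toy in-memory model in plain JavaScript of what $match, $unwind, $sort, $skip and $limit do to a stream of documents. It only illustrates the semantics (real pipelines run inside the MongoDB server), and the sample documents are made up.

```javascript
// A toy, in-memory model of pipeline stages: each stage maps an array of documents
// to a new array, and the pipeline feeds each stage's output into the next one.

// $match: keep the documents that satisfy a predicate.
const match = (predicate) => (docs) => docs.filter(predicate);

// $unwind: one output document per element of the array field. Missing, null and
// empty arrays are dropped (MongoDB's default); a non-array value passes through
// unchanged, as in MongoDB 3.2 and later.
const unwind = (field) => (docs) =>
  docs.flatMap((doc) => {
    const value = doc[field];
    if (Array.isArray(value)) return value.map((item) => ({ ...doc, [field]: item }));
    return value === undefined || value === null ? [] : [doc];
  });

// $sort on a single field: 1 for ascending, -1 for descending.
const sort = (field, direction = 1) => (docs) =>
  [...docs].sort((a, b) =>
    a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
  );

// $skip and $limit: drop the first n documents / keep at most n documents.
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Chain the stages in order.
const runPipeline = (docs, stages) => stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample data.
const pages = [
  { url: "/a", views: 5, tags: ["news", "tech"] },
  { url: "/b", views: 9, tags: ["tech"] },
  { url: "/c", views: 7, tags: [] },
  { url: "/d", views: 2, tags: ["misc"] },
];

const result = runPipeline(pages, [
  match((doc) => doc.views > 3), // drops /d
  unwind("tags"),                // /a becomes two documents, /c (empty array) disappears
  sort("views", -1),             // /b comes first
  skip(1),                       // skips /b
  limit(2),
]);
console.log(result); // the two /a documents, one with tags "news", one with tags "tech"
```

Modelling each stage as a function from documents to documents mirrors the key property of the framework: stages are independent and can be reordered or combined freely.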

Using MongoDB in Node.js: our hands-on experience

MongoDB has drivers for many programming languages and platforms, including Node.js. You can install the Node.js driver by running npm install mongodb.

All MongoDB features, including map-reduce and the aggregation framework, are available in the driver. Our task was to aggregate a large collection by three fields to build a statistical report. The collection contained about 500k records of web page view statistics, and each document included time, IP address and URL fields.

We needed to group the data by time, IP address and URL. The first version of this logic was implemented with map-reduce.
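A minimal sketch of such a map-reduce job, assuming hypothetical fields time (already bucketed, for example to the hour), ip and url. In MongoDB's map-reduce, the map function runs with the current document bound to `this` and calls `emit(key, value)`, while `reduce(key, values)` folds the values emitted for one key. So the functions can be tried without a server, a small hypothetical stand-in, `runMapReduce`, plays the server's role; in real use, map and reduce would be sent to MongoDB's map-reduce command instead.

```javascript
// Map-reduce version (sketch). Assumes hypothetical documents with fields
// time (already bucketed, e.g. "2018-02-02T10" for one hour), ip and url.

// Stand-in for the emit() that MongoDB provides inside map(); set by runMapReduce.
let emit;

// map: runs once per document, with the document bound to `this`.
function map() {
  emit({ time: this.time, ip: this.ip, url: this.url }, 1);
}

// reduce: folds all values emitted for one key. It returns the same kind of value
// that map emits, because MongoDB may re-reduce partially reduced results.
function reduce(key, values) {
  return values.reduce((sum, n) => sum + n, 0);
}

// Hypothetical local stand-in for the server, only so the functions can be tried.
// Unlike MongoDB, it calls reduce even for keys that received a single value.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // serialized key -> { key, values }
  emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  // Results use the { _id, value } shape of MongoDB map-reduce output.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

// Made-up sample data.
const views = [
  { time: "2018-02-02T10", ip: "10.0.0.1", url: "/home" },
  { time: "2018-02-02T10", ip: "10.0.0.1", url: "/home" },
  { time: "2018-02-02T10", ip: "10.0.0.2", url: "/home" },
];
console.log(runMapReduce(views, map, reduce)); // two groups: 10.0.0.1 -> 2, 10.0.0.2 -> 1
```

On the server, both functions run in MongoDB's JavaScript engine for every document, which is a large part of why this approach is slow on big collections. Map-reduce has been deprecated since MongoDB 5.0 in favor of aggregation pipelines.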

Processing 500k records took about one minute. That was too slow, so we decided to switch to the MongoDB 2.1 aggregation framework and rewrite the aggregation logic as a pipeline.
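A sketch of what the aggregation version can look like with the Node.js driver. The article does not state the $match criteria, so the time-range filter below is a hypothetical stand-in, as are the field names time (assumed to be already bucketed), url and ip and the output field count. The `aggregate(...).toArray()` call reflects current versions of the driver.

```javascript
// Aggregation version (sketch). The time-range $match and the field names
// time, url, ip and count are hypothetical.
function buildViewStatsPipeline(from, to) {
  return [
    // Stage 1: select the required records (uses an ordinary query predicate).
    { $match: { time: { $gte: from, $lt: to } } },
    // Stage 2: group by the compound key and count the records in each group.
    {
      $group: {
        _id: { time: "$time", url: "$url", ip: "$ip" },
        count: { $sum: 1 },
      },
    },
  ];
}

// Runs the pipeline on a driver Collection. In current versions of the Node.js
// driver, aggregate() returns a cursor and toArray() resolves to all results.
async function viewStats(collection, from, to) {
  return collection.aggregate(buildViewStatsPipeline(from, to)).toArray();
}
```

With a connected `MongoClient`, usage would be along the lines of `await viewStats(client.db("stats").collection("pageviews"), start, end)`, where the database and collection names are hypothetical. Keeping the pipeline in its own function makes it easy to inspect or unit-test without a database.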

The new version uses two pipeline stages, $match and $group. $match selects the required records, and $group aggregates them by three fields: time, URL and IP. These fields form the grouping key because we specified them in the _id field, and the $sum expression counts the records that share the same key. Each output document therefore contains the key in _id along with the computed count.
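A $group stage whose _id is `{ time: "$time", url: "$url", ip: "$ip" }` and whose accumulator is `count: { $sum: 1 }` emits one document per distinct key, shaped like `{ _id: { time, url, ip }, count }`. As an illustration of consuming that output, the hypothetical helper below flattens such documents into report rows sorted by count; the field name count and the sample input are made up.

```javascript
// Flattens grouped documents of the form { _id: { time, url, ip }, count }
// into flat report rows, ordered by descending count. Hypothetical helper.
function toReportRows(groupedDocs) {
  return groupedDocs
    .map(({ _id, count }) => ({ time: _id.time, url: _id.url, ip: _id.ip, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample of grouped output.
const sample = [
  { _id: { time: "2018-02-02T10", url: "/home", ip: "10.0.0.2" }, count: 1 },
  { _id: { time: "2018-02-02T10", url: "/home", ip: "10.0.0.1" }, count: 2 },
];
console.log(toReportRows(sample)[0]); // the 10.0.0.1 row, with count 2
```

The same reshaping could also be done on the server by appending a $project stage (for example `time: "$_id.time"`) and a $sort stage to the pipeline.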

Result

Using the aggregation framework significantly improved processing performance: 500k records are now processed in 3-4 seconds, compared with about one minute for map-reduce. The MongoDB aggregation framework is a powerful, simple and lightweight tool that can substantially speed up the calculation of aggregated values without map-reduce.

Alexander P.
February 02, 2018