MongoDB – Aggregate and Group example

mongodb-group-example

In this tutorial, we will show you how to use MongoDB aggregate function to group documents (data).

1. Test Data

Data in JSON format, shows the hosting provider for website.

website.json
{ "_id" : 1, "domainName" : "test1.com", "hosting" : "hostgator.com" }
{ "_id" : 2, "domainName" : "test2.com", "hosting" : "aws.amazon.com"}
{ "_id" : 3, "domainName" : "test3.com", "hosting" : "aws.amazon.com" }
{ "_id" : 4, "domainName" : "test4.com", "hosting" : "hostgator.com" }
{ "_id" : 5, "domainName" : "test5.com", "hosting" : "aws.amazon.com" }
{ "_id" : 6, "domainName" : "test6.com", "hosting" : "cloud.google.com" }
{ "_id" : 7, "domainName" : "test7.com", "hosting" : "aws.amazon.com" }
{ "_id" : 8, "domainName" : "test8.com", "hosting" : "hostgator.com" }
{ "_id" : 9, "domainName" : "test9.com", "hosting" : "cloud.google.com" }
{ "_id" : 10, "domainName" : "test10.com", "hosting" : "godaddy.com" }

Imports into a “website” collection.

> mongoimport -d testdb -c website --file website.json
connected to: 127.0.0.1
Mon Jan 13 14:30:22.662 imported 10 objects
Note
If the collection is existed, add --upsert option to override the data.


> mongoimport -d testdb -c website --file website.json --upsert

2. Grouping Example

Uses db.collection.aggregate and $group to perform the data grouping.

2.1 The following example groups by the “hosting” field, and display the total sum of each hosting.


> db.website.aggregate(
    { 
	$group : {_id : "$hosting", total : { $sum : 1 }}
    }
  );

Output


{
        "result" : [
                {
                        "_id" : "godaddy.com",
                        "total" : 1
                },
                {
                        "_id" : "cloud.google.com",
                        "total" : 2
                },
                {
                        "_id" : "aws.amazon.com",
                        "total" : 4
                },
                {
                        "_id" : "hostgator.com",
                        "total" : 3
                }
        ],
        "ok" : 1
}

The equivalent SQL.


SELECT hosting, SUM(hosting) AS total
       FROM website
       GROUP BY hosting

2.2 Add sorting with $sort.


>  db.website.aggregate(
     { 
	$group : {_id : "$hosting", total : { $sum : 1 }}
     },
     {
	$sort : {total : -1}
     }
  );

Output – Display “total” in descending order. For ascending order, uses $sort : {total : 1}.


{
        "result" : [
                {
                        "_id" : "aws.amazon.com",
                        "total" : 4
                },
                {
                        "_id" : "hostgator.com",
                        "total" : 3
                },
                {
                        "_id" : "cloud.google.com",
                        "total" : 2
                },
                {
                        "_id" : "godaddy.com",
                        "total" : 1
                }
        ],
        "ok" : 1
}

2.3 Add $match condition, groups by “hosting” for “aws.amazon.com” only.


> db.website.aggregate(
    { 
	$match : {hosting : "aws.amazon.com"}
    },
    { 
	$group : { _id : "$hosting", total : { $sum : 1 } }
    }
  );

Output


{
        "result" : [
                {
                        "_id" : "aws.amazon.com",
                        "total" : 4
                }
        ],
        "ok" : 1
}
More Examples
Refer to this official MongoDB Aggregation guide for more advance aggregation and group examples.

3. Exports Grouping Result to CSV or JSON

Often times, we need to export the grouping results in csv or JSON format. To solve it, inserts the group results in a new collection, and exports the new collection via mongoexport.

3.1 Set the group results in a variable. In this case, the variable name is “groupdata”.


> var groupdata = db.website.aggregate(
    { 
	$group : {_id : "$hosting", total : { $sum : 1 }}
    },
    {
	$sort : {total : -1}
    }
  );

3.2Inserts groupdata.toArray() into a new collection.


> db.websitegroup.insert(groupdata.toArray());

> db.websitegroup.find().pretty()
{ "_id" : "aws.amazon.com", "total" : 4 }
{ "_id" : "hostgator.com", "total" : 3 }
{ "_id" : "cloud.google.com", "total" : 2 }
{ "_id" : "godaddy.com", "total" : 1 }
>

3.3 Exports the collection “websitegroup” to a csv file.


c:\> mongoexport -d testdb -c websitegroup -f _id,total -o group.csv --csv
connected to: 127.0.0.1
exported 4 records
group.csv

_id,total
"aws.amazon.com",4.0
"cloud.google.com",2.0
"godaddy.com",1.0
"hostgator.com",3.0

3.4 Exports the collection “websitegroup” to a JSON file.


c:\> mongoexport -d testdb -c websitegroup -o group.json
connected to: 127.0.0.1
exported 4 records
group.json

{ "_id" : "aws.amazon.com", "total" : 4 }
{ "_id" : "cloud.google.com", "total" : 2 }
{ "_id" : "godaddy.com", "total" : 1 }
{ "_id" : "hostgator.com", "total" : 3 }

4. Large Sort Operation

Changed in version 2.6 – Read this Memory Restrictions
In MongoDB, the in-memory sorting have a limit of 100M, to perform a large sort, you need enable allowDiskUse option to write data to a temporary file for sorting.

To avoid the sort exceeded memory limit error, enable the allowDiskUse option.


db.website.aggregate(
[
	{$group : {_id : "$hosting", total : { $sum : 1 }}},
	{$sort : {total : -1}}
],
	{allowDiskUse: true}
);

References

  1. MongoDB Aggregation
  2. MongoDB db.collection.aggregate()
  3. Aggregation Pipeline Limits
  4. MongoDB Hello World Example
author image

mkyong

Founder of Mkyong.com, love Java and open source stuff. Follow him on Twitter. If you like my tutorials, consider make a donation to these charities. Read all published posts by

Comments

avatar
3000
newest oldest most voted
Tamilanban Rajendran
Guest
Tamilanban Rajendran

How to perform the group operation based on sub document in Java Springboot. I have the Mongo DB query..If I convert in Java its not working . Can anyone help me on this .

Ulysses Alexandre Alves
Guest
Ulysses Alexandre Alves

Your tutorial is much better than the official MongoDB documentation for aggretation and grouping! Thank you for sharing.

prtk
Guest
prtk

why {$sum:1} is use in this aggregation ?

rue lim
Guest
rue lim

Hello Sir.

I need help with or condition in the where (‘$match’) query. I want something like this but it doesn’t work. Could you please help me?

> db.website.aggregate(
{
$match : {$or : [ {hosting : “aws.amazon.com”},{hosting: “cloud.google.com”} ] } },
{
$group : { _id : “$hosting”, total : { $sum : 1 } }
}
);

Suraj
Guest
Suraj

Aggregate takes filters(match, group etc..) inside square brackets.
db.website.aggregate(
[
{
$match : {$or : [ {hosting : “aws.amazon.com”},{hosting: “cloud.google.com”} ] } },
{
$group : { _id : “$hosting”, total : { $sum : 1 } }
}
]
);

JH
Guest
JH

And can I return multiful fields? not for query options but as results.

carl
Guest
carl

Mkylong, thanks for this tutorial, also had the same problem that Erics reports and resolved as he says.
once again, thanks

Erik
Guest
Erik

Hi, tnx for the tutorial. This statement did not work:

db.websitegroup.insert(groupdata.result);

But changing it to

db.websitegroup.insert(groupdata.toArray());

did the trick.

Arslan
Guest
Arslan

Hello Mkyong
This tutorial is very helpful I want to ask aboutHow to process unstructured data ? like any web data and also how to handle large files like audio,video etc Thanks

Udhayakumar GK
Guest
Udhayakumar GK

Great Tutorial. I got it useful.

Thanks,
Udhayakumar G K
http://saibabaprayerrequest.blogspot.in