MongoDB Performance Bottlenecks and Optimization Strategies

I will try to describe here the main potential performance bottlenecks, together with possible solutions and tips for performance optimization. But first of all, you should ensure that MongoDB was the right choice for your project. You should clearly understand that MongoDB is a completely non-relational database (I mean: no joins), and that MongoDB is a document-oriented database (not a graph-oriented one). It is really important to be sure that you made the right choice of database.

0. Map-Reduce
Before the MongoDB 2.4 update (the main point here is the move to the V8 engine) MongoDB had been using SpiderMonkey as its JavaScript engine, and the problem was that it is single-threaded (which was pretty awkward when Map-Reduce ran on only 1 core out of e.g. 48). So after the 2.4 update performance went up, but there are still many pitfalls, and you’d better read this out, and this one too.

N.B. Bear in mind that the performance of Map-Reduce depends on the “state of the data”: the difference between running Map-Reduce on the data “as-is” and on sorted data is huge (on sorted data Map-Reduce will be something like 10-100x faster than without sorting). So, to raise the performance of Map-Reduce you need to:

  • Find the key that you will use for the Map-Reduce job (usually it is the same as the emit key) and ENSURE that you have added an index on this key (you can check this by running your query filter).
  • Add an input sort on that key (the emit key) for the Map-Reduce job.

Also, please take a look at this doc
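To make the two bullets above concrete, here is a hedged sketch (the `visits` collection, its fields and the filter values are all made up for illustration) of how the pieces of a Map-Reduce call fit together; the `sort` option on the emit key is what gives the speed-up described above:

```javascript
// A hypothetical Map-Reduce job that sums prices per district.
// "district" is the emit key, so that is the field we index and pre-sort by.
var map = function () { emit(this.district, this.price); };
var reduce = function (key, values) {
    return Array.sum(values); // Array.sum is a mongo shell helper
};

// Options for db.visits.mapReduce(map, reduce, options):
var options = {
    query: { price: { $gte: 0 } }, // narrow the input set
    sort: { district: 1 },         // input sort on the emit key - the 10-100x trick
    out: { inline: 1 }             // return results inline instead of a collection
};
```

Before running it, make sure the emit key is indexed, e.g. `db.visits.ensureIndex({ district: 1 })` — the input sort cannot use an index otherwise.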

1. Sharding
It is hella cool to have sharding out of the box, but along with sharding you also get one of the main performance pitfalls: the choice of the shard key.
Shard keys should satisfy the following:

  • “Distributable” – the worst case for a shard key is an auto-incremented value (this entails the “hot shard” behavior, where all writes are routed to a single shard – here is the bottleneck). An ideal shard key should be as “random” as possible.
  • An ideal shard key should be the primary field used in your queries.
  • An easily divisible shard key makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values can result in chunks that are “unsplittable.”
  • Unique fields in your collection should be part of the shard key.

Here is the doc about shard key
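For illustration only (the database and collection names are made up), this is how a shard key is declared from the mongo shell; a hashed key, available since MongoDB 2.4, is an easy way to get the “randomness” described above:

```javascript
// mongo shell (run via mongos), not Node.js code
sh.enableSharding("mydb")
// a hashed _id spreads writes evenly instead of creating a "hot shard"
sh.shardCollection("mydb.users", { _id: "hashed" })
```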

2. Balancing
You should bear in mind that moving chunks from one shard to another is a very expensive operation (adding new shards may significantly slow down performance).
As a helpful option, you could stop the balancer during “prime time”.
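For example (a sketch; the window hours are arbitrary), you can restrict the balancer to a night-time activity window from the mongo shell, or stop it entirely:

```javascript
// mongo shell, against the "config" database of the cluster:
use config
db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
    true // upsert
)
// or simply stop/start it around your prime time:
sh.stopBalancer()
sh.startBalancer()
```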

3. Disk I/O Operations
You should understand that in most cases the hardware bottleneck will be the disk (not CPU or RAM), especially if you have several shards.
So, as your data grows, the number of I/O operations will rapidly increase; also keep monitoring free disk space. Fast disks are even more important if you are using sharding.
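A quick way to keep an eye on this from the mongo shell (the collection name is just an example) is the built-in stats helpers:

```javascript
// mongo shell: database-level storage stats, scaled to megabytes
db.stats(1024 * 1024)
// per-collection breakdown: data size, storage size, index sizes
db.flats.stats(1024 * 1024)
```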

4. Locks
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation.
When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
Locks are “writer greedy,” which means writes have preference over reads. When both a read and write are waiting for a lock, MongoDB grants the lock to the write.

And the very sad point – MongoDB implements locks on a per-database basis for most read and write operations (before the 2.2 update there was a global lock – one per instance for all databases).
This is a very important point: if you have too many write requests, this will be your bottleneck (in theory you could hack around it with several databases, but you’d better forget about that).

In case your application has too many write operations, it may make sense to think about migrating to something like Cassandra (in Cassandra, a write is atomic at the row level, meaning that inserting or updating columns for a given row key is treated as one write operation).
Please take a look at the concurrency docs to ensure that you understand mongo concurrency.

5. Fast Writes
Use Capped Collections for Fast Writes
Capped Collections are circular, fixed-size collections that keep documents well-ordered, even without the use of an index. This means that capped collections can receive very high-speed writes and sequential reads.

These collections are particularly useful for keeping log files but are not limited to that purpose. Use capped collections where appropriate.
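Creating one takes a single command; here is a sketch for a hypothetical log collection (the 100 MB size is arbitrary):

```javascript
// mongo shell: a fixed-size, circular collection for high-speed writes
db.createCollection("log", { capped: true, size: 100 * 1024 * 1024 })
```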

6. Fast Reads
Use Natural Order for Fast Reads. To return documents in the order they exist on disk, return sorted operations using the $natural operator. On a capped collection, this also returns the documents in the order in which they were written.
Natural order does not use indexes but can be fast for operations when you want to select the first or last items on disk.
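As a small sketch (assuming a capped collection named `log`):

```javascript
// mongo shell: documents in on-disk (insertion) order
db.log.find().sort({ $natural: 1 })
// or the most recently written documents first
db.log.find().sort({ $natural: -1 }).limit(10)
```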

7. Query Performance
Read out about query performance, especially please pay attention to Indexes and Compound Indexes.
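As an illustration (the field names are hypothetical), a compound index that matches your most frequent filter-and-sort pattern, plus explain() to check that the index is actually used:

```javascript
// mongo shell: index for queries that filter by rentType and sort by price
db.flats.ensureIndex({ rentType: 1, price: -1 })
// inspect the query plan - a BtreeCursor means the index is used
db.flats.find({ rentType: 1 }).sort({ price: -1 }).explain()
```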

8. Remove Expired Data
It is good practice to enable TTL (time to live) on your collections: add an expireAfterSeconds value and use the “Expire Data from Collections by Setting TTL” technique. This approach will allow you to get rid of “unnecessary data” automatically.
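A sketch of such a TTL index (the collection and field names are made up; the timeout is arbitrary):

```javascript
// mongo shell: documents expire ~1 hour after their whenCreated date
db.sessions.ensureIndex(
    { whenCreated: 1 },
    { expireAfterSeconds: 3600 }
)
```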

9. Database Size
As you might understand, MongoDB will store e.g. this document

{
    UserFirstAndLastName: "Mikita Manko",
    LinkToUsersFacebookPage: ""
}

“as-is”. I mean that the names of the fields “UserFirstAndLastName” and “LinkToUsersFacebookPage” will eat up free space.
By using the “name shortening” technique you can minimize the usage of storage (you can get rid of something like 30-40% of unnecessary data):

{
    FL: "Mikita Manko",
    BFL: ""
}

Obviously this will require a “mapper” in your code (you should map the shortened, unreadable names from the database to long ones, so that you can use readable fields in your code).
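A minimal sketch of such a mapper in plain JavaScript (the short names are the ones from the example above; in a real project the map would live next to your data-access code):

```javascript
// Map between readable field names (code) and shortened ones (database)
var fieldMap = {
    UserFirstAndLastName: 'FL',
    LinkToUsersFacebookPage: 'BFL'
};

// Shorten keys before writing to the database
function toDb(doc) {
    var out = {};
    Object.keys(doc).forEach(function (key) {
        out[fieldMap[key] || key] = doc[key];
    });
    return out;
}

// Restore readable keys after reading from the database
function fromDb(doc) {
    var reverse = {};
    Object.keys(fieldMap).forEach(function (key) {
        reverse[fieldMap[key]] = key;
    });
    var out = {};
    Object.keys(doc).forEach(function (key) {
        out[reverse[key] || key] = doc[key];
    });
    return out;
}
```

So `toDb({ UserFirstAndLastName: "Mikita Manko" })` produces `{ FL: "Mikita Manko" }`, and `fromDb` restores the readable names on the way back.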

A. Application Design
Take a look at these notes and bear them in mind while designing your architecture and solutions.

B. Profiling and Investigations
You should be familiar with such features as the database profiler, the explain() helper, and the mongostat / mongotop utilities.

C. Updates
The most obvious point: stay on the cutting edge of technology, and investigate and install the latest updates.

As I mentioned before – use MongoDB not just for fun; make sure your project is a good fit for a Document Oriented Database, that is the most important point.


Mongoose aggregate with $group by nested field

Let’s say we have a collection in mongodb with a document structure similar to this:

// schema on nodejs via mongoose:
var mongoose = require('mongoose')
    , Schema = mongoose.Schema
    , ObjectId = Schema.ObjectId;
var flatSchema = new Schema({
    _id: ObjectId,
    rentType: Number,
    price: Number,
    thumbnail: String,
    address: {
        country: String,
        city: String,
        street: String,
        district: Number
        // ... more fields
    }
    // ... some more fields
}, { collection: 'flats'});
module.exports = mongoose.model('Flat', flatSchema);

and we need to get some items (that match some request), then group all the documents by “address.district” (a nested field), and after this we also need to collect some stats – e.g. min and max price and the flats count for each district.

Let’s do it with usage of Mongoose:

    // e.g. let's define the request (the field name in the first rule
    // is just an example filter - adjust it to your own schema)
    var rules = [{ rentType: 1 }, { price: { $gte: 200 } }];
    // and here is the grouping request:
    Flat.aggregate([
        { $match: { $and: rules } },
        { $project: {
            _id: 0, // let's remove bson ids from the request's result
            price: 1, // we need this field
            district: '$address.district' // and let's turn the nested field into a usual field (a simple renaming)
        } },
        { $group: {
            _id: '$district', // grouping key - group by the district field
            minPrice: { $min: '$price' }, // we need some stats for each group (for each district)
            maxPrice: { $max: '$price' },
            flatsCount: { $sum: 1 }
        } }
    ], {}, callback);

First of all – it’s not possible to group by a nested field such as “address.district” directly, that’s why we worked around this by renaming the field in $project.

Here is the list of interesting reading:

That was a simple example of how to group something via mongoose on nodejs without direct access to mongo’s db.collection.


How to use mongoose with MongoDB in Node.js

Here is an example of how to start a simple project using Mongoose for MongoDB on Node.js, with the mongoDB database hosted on mongolab.

First of all, you need to correct the package.json file in your project to something like this:

{
    "name": "rent",
    "version": "0.0.1",
    "private": true,
    "scripts": {
        "start": "node app"
    },
    "dependencies": {
        "express": "3.0.2",
        "jade": "*",
        "log4js": "0.5.6",
        "mongoose": "3.5.4",
        "moment": "1.7.2",
        "mongoose-pureautoinc": "*"
    }
}

Make sure that you have added the mongoose package; the other stuff is just for the example.
Mikitas-MacBook-Air-2:test nik$ sudo npm install
or just “npm install” (for windows or if you configured permissions)
You can also use “npm install mongoose” to install it.

Let’s create a mongoDB “model”. Somewhere in your DAL try this code:

var mongoose = require('mongoose')
    , Schema = mongoose.Schema
    , ObjectId = Schema.ObjectId;
// let's create scheme for some random stuff:
var postSchema = new Schema({
    _id: ObjectId,
    thumbnail: String,
    whenCreated: Date,
    whenUpdated: Date,
    comments: [{
        author: {
            authorId: ObjectId,
            name: String,
            thumbnail: String
        },
        date: String,
        text: String
    }],
    rating: Number,
    isApproved: Boolean,
    text: String
}, { collection: 'post'});
// and let's register this scheme as a model and make global for this module
module.exports = mongoose.model('Post', postSchema);

Now we can include this file:

var Post = require('../DAL/models/Post');

And use it, for example, to get all items from the collection:

exports.getPosts = function (callback){
	return Post.find({/* query */}, function (err, docs){
		if (!err) {
			callback(docs); // pass the found documents to the caller
		}
		else {
			throw err;
		}
	});
};
Note that the operation is asynchronous and you will get the result only in the callback.
Please check these mongo docs out.

Or we can insert new item into collection:

exports.addPost = function (){
    var post = new Post();
    post.rating = 5.89;
    post.text = "Here is one more post to my blog on";
    post.isApproved = true;
    // other fields;
    post.save(function (err) { // don't forget to persist the new document
        if (err) throw err;
    });
};

Here is one more link to the mongo docs.

Finally, to use these “DAL” methods we need to create a connection to the DB.
You can use a service like mongolab to host your DB for free (while your DB is smaller than 500 MB).
After you have created a new DB on mongolab you can open the connection in the following way:

var mongoose = require('mongoose');
mongoose.connect('mongodb://<your login>:<your password>@<your id>/<database name>');
// here you can call methods described above

That’s it: just several lines of code, a few clicks, and you have a MongoDB database already hosted on the web.
A perfect solution for hackathons, quick tests and startups.

By the way, you can host your Node.js code on a service like Heroku, also for free (until it exceeds certain limits; for details, check the Heroku website).
