MongoDB’s key features

Nowadays, most developers are familiar with SQL, and most of us appreciate the beauty of a normalized data model, even when it comes at a cost. But when we start advocating other data store technologies, questions about their usefulness arise. In this post, I will try to answer one of them: why can MongoDB be interesting for developers?

1- Document-oriented database

MongoDB’s data model is document-oriented. It replaces the concept of a row with a more flexible concept, the document. By allowing embedded documents, the document-oriented approach makes it easier to represent complex hierarchical relationships within a single record. If you’re not yet familiar with this concept, the following example demonstrates it:

{
  // (1) this is the primary key
  _id: ObjectId("5037cac65f3257931833902b"),
  text: "MongoDB is an awesome #nosql document-oriented database bit.ly/23444",
  // (2) tags saved as array of strings
  tags: ["nosql"],
  favorite_count: 56,
  urls: ["http://bit.ly/23444"],
  // (3) comments stored as array of comment objects
  comments: [
    {
      user: "john",
      text: "Very interesting article!"
    }, {
      user: "julia",
      text: "I like this article"
    }
  ]
}

As you can see, a document is a JSON-like object: a set of property names and their values. A value can be a simple type (string, integer, etc.), but it can also be a complex type such as an array (2), or even an array of other documents (3). To see the power of this data model clearly, let’s contrast it with a standard relational database representation:

[Figure: EER diagram]

The figure above is the EER diagram for entries on the social site. You can see that we use 7 tables and 7 foreign keys, which is a much heavier structure than the single document. Another point: to display an entry, we need to join the entry table with both the tag and url tables. Last but not least, each table has a strictly defined schema, so if you want to change a column type, add a new column or delete a column, you have to alter the table explicitly.
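To make the contrast concrete, here is a minimal sketch in plain JavaScript (no MongoDB needed) that treats the document as an ordinary object. The helper renderEntry and the extra location field are hypothetical names introduced only for this illustration.

```javascript
// A document as a plain JavaScript object (same shape as the example above).
const entry = {
  text: "MongoDB is an awesome #nosql document-oriented database bit.ly/23444",
  tags: ["nosql"],
  favorite_count: 56,
  urls: ["http://bit.ly/23444"],
  comments: [
    { user: "john", text: "Very interesting article!" },
    { user: "julia", text: "I like this article" }
  ]
};

// Hypothetical helper: renders an entry for display. No join is needed,
// because tags, urls and comments travel with the entry itself.
function renderEntry(doc) {
  const comments = doc.comments.map(c => `  ${c.user}: ${c.text}`);
  return [doc.text, `tags: ${doc.tags.join(", ")}`, ...comments].join("\n");
}

// Flexible schema: a second document can carry a field the first one lacks
// (the "location" field is made up), with no ALTER TABLE step.
const entryWithLocation = { ...entry, location: "Paris" };

console.log(renderEntry(entry));
```

Everything needed to display the entry is reachable from one object, which is the point the single-document model is making.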

2- Aggregation framework

Relational databases support dynamic queries through select, project and join operations. Join in particular lets us fetch data from many tables with just one query. Unfortunately, most NoSQL databases, MongoDB included, were not designed around the join operation. But MongoDB preserves most of that query power through a rich query language and an aggregation framework, which lets us build complex aggregations from simple pieces. Let’s start with a simple query. Suppose we want to fetch all entries tagged with the term “MongoDB” and having more than 5 favorites. The SQL query would look like this:

SELECT * FROM entry
  INNER JOIN entry_tags ON entry.id = entry_tags.entry_id
  INNER JOIN tag ON tag.id = entry_tags.tag_id
  WHERE tag.text = 'MongoDB'
    AND entry.favorite_count > 5;

The equivalent query on MongoDB is:

db.entries.find({
  'tags': 'MongoDB',
  'favorite_count': {'$gt': 5}
});
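The find() query above only filters documents. As a hedged sketch of the aggregation framework itself, the pipeline below counts, for entries with more than 5 favorites, how many entries carry each tag. In the mongo shell it would be passed to db.entries.aggregate(pipeline). The function countTagsInMemory and the sample data are hypothetical: they are a plain-JavaScript equivalent meant only to show what each stage computes, not how MongoDB executes it.

```javascript
// Aggregation pipeline built from simple stages.
// In the mongo shell this would run as: db.entries.aggregate(pipeline)
const pipeline = [
  { $match: { favorite_count: { $gt: 5 } } },        // keep popular entries
  { $unwind: "$tags" },                              // one document per tag
  { $group: { _id: "$tags", count: { $sum: 1 } } },  // count entries per tag
  { $sort: { count: -1 } }                           // most used tags first
];

// Hypothetical in-memory equivalent of the four stages above.
function countTagsInMemory(entries) {
  const counts = new Map();
  entries
    .filter(e => e.favorite_count > 5)               // $match
    .flatMap(e => e.tags)                            // $unwind
    .forEach(tag => counts.set(tag, (counts.get(tag) || 0) + 1)); // $group
  return [...counts]
    .map(([tag, count]) => ({ _id: tag, count }))
    .sort((a, b) => b.count - a.count);              // $sort
}

// Made-up sample data for illustration.
const sample = [
  { tags: ["nosql", "MongoDB"], favorite_count: 56 },
  { tags: ["MongoDB"], favorite_count: 12 },
  { tags: ["sql"], favorite_count: 3 }
];
console.log(countTagsInMemory(sample));
```

Each stage takes the output of the previous one, which is what "building complex aggregations from simple pieces" means in practice.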

3- File storage

MongoDB supports an easy-to-use protocol for storing large files and their metadata. MongoDB stores objects in a binary format called BSON (Binary JSON), and a single BSON document is limited to 16 MB. For anything larger, MongoDB provides GridFS, which splits a big file into chunks and stores the chunks and the file’s metadata in separate collections. Storing a 100 MB video in MongoDB is therefore not a problem.
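The chunking idea can be sketched in a few lines of Node.js. This is not the driver’s implementation: toChunks and fromChunks are hypothetical helpers, and the file id "video-1" is made up. The chunk documents are shaped like those GridFS keeps in its chunks collection (files_id, n, data), and 255 kB is the default GridFS chunk size.

```javascript
// Simplified sketch of GridFS-style chunking (not the driver's implementation).
const CHUNK_SIZE = 255 * 1024; // default GridFS chunk size: 255 kB

// Hypothetical helper: splits a Buffer into chunk documents shaped like
// GridFS chunk documents (files_id, n, data).
function toChunks(filesId, buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: filesId,                         // which file the chunk belongs to
      n,                                         // chunk sequence number
      data: buffer.subarray(offset, offset + chunkSize)
    });
  }
  return chunks;
}

// Reassembles the original bytes by concatenating chunks in order of n.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map(c => c.data));
}

const file = Buffer.alloc(1024 * 1024, 7);       // a made-up 1 MiB "file"
const chunks = toChunks("video-1", file);
console.log(chunks.length, fromChunks(chunks).equals(file));
```

Because every chunk stays far below the document size limit, the total file size is bounded only by storage, not by the BSON limit.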

4- Secondary indexes

MongoDB implements secondary indexes as B-trees, which are optimized for many types of queries, such as range scans and queries with sort clauses. However, a collection is limited to 64 indexes.
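In the mongo shell, an index on the favorite count would be created with something like db.entries.createIndex({ favorite_count: 1 }). To show why an ordered index helps range scans, here is a simplified sketch that uses a sorted array as a stand-in for a B-tree. The functions buildIndex, lowerBound and rangeScan are hypothetical, and the documents are made up.

```javascript
// Simplified stand-in for a B-tree index: a sorted array of [key, docId] pairs.
function buildIndex(docs, field) {
  return docs
    .map((doc, id) => [doc[field], id])
    .sort((a, b) => a[0] - b[0]);
}

// Binary search for the first position whose key is >= min.
function lowerBound(index, min) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < min) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Range scan: jump to the first matching key, then walk forward in key order.
// Results come back already sorted by key, so a sort clause costs nothing extra.
function rangeScan(index, min, max) {
  const ids = [];
  for (let i = lowerBound(index, min); i < index.length && index[i][0] <= max; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

// Made-up documents; the array position plays the role of the document id.
const docs = [
  { favorite_count: 56 }, { favorite_count: 3 },
  { favorite_count: 12 }, { favorite_count: 7 }
];
const index = buildIndex(docs, "favorite_count");
console.log(rangeScan(index, 5, 20));
```

Without the index, the same query would have to inspect every document in the collection.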

5- Replication

In MongoDB, replication is achieved by keeping copies of the data on several machines (one primary node and one or more secondary nodes). This topology is known as a replica set. A replica set is like master-slave replication, but it supports automated failover: when the primary node fails, the replica set promotes a secondary node to primary.
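The failover idea can be illustrated with a deliberately simplified sketch. Real elections rely on a voting protocol, member priorities and more, so treat electPrimary, the member names and the lastWrite counters as assumptions made for this example only.

```javascript
// Highly simplified failover sketch. Real elections use a voting protocol,
// member priorities and more; this only illustrates the idea.
function electPrimary(members) {
  const healthy = members.filter(m => m.healthy);
  // A new primary needs a majority of the set to be reachable.
  if (healthy.length <= members.length / 2) return null;
  // Assumption: prefer the member with the most recent replicated write.
  return healthy.reduce((best, m) => (m.lastWrite > best.lastWrite ? m : best));
}

// Made-up replica set in which the primary has just failed.
const replicaSet = [
  { name: "node-a", healthy: false, lastWrite: 120 }, // failed primary
  { name: "node-b", healthy: true, lastWrite: 118 },
  { name: "node-c", healthy: true, lastWrite: 119 }
];
const next = electPrimary(replicaSet);
console.log(next ? next.name : "no primary");
```

The majority check is why replica sets are usually deployed with an odd number of members: losing one node out of three still leaves enough members to elect a new primary.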

6- Sharding

MongoDB makes horizontal scaling manageable by providing a partitioning mechanism known as sharding. Sharding means putting a subset of the data on each machine. This lets us store more data and handle more requests and load.
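As a rough illustration of putting a subset of the data on each machine, the sketch below routes a key to a shard by hashing it. MongoDB itself assigns ranges of shard-key values to shards rather than taking a modulo, so this is only a model of the idea. The shard names, hashKey and shardFor are hypothetical.

```javascript
// Simplified routing sketch: hash the shard key and map it to a shard.
// MongoDB itself assigns ranges of shard-key values to shards;
// the modulo below is only an illustration.
const SHARDS = ["shard0", "shard1", "shard2"]; // hypothetical shard names

// Small deterministic string hash (FNV-1a, 32-bit, over UTF-16 code units).
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// The same key always lands on the same shard.
function shardFor(key) {
  return SHARDS[hashKey(key) % SHARDS.length];
}

// Distribute some made-up user names across the shards.
const placement = {};
for (const user of ["john", "julia", "maria", "ahmed"]) {
  placement[user] = shardFor(user);
}
console.log(placement);
```

Each shard holds only the documents routed to it, so adding machines increases both storage capacity and the number of requests the cluster can serve.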

I hope you liked this post. I don’t claim that it’s perfect, so your feedback is welcome.

KISS

KISS is an acronym for “Keep it simple, stupid”, “Keep it short and simple”, “Keep it smart and simple”, etc. Many variants exist, but the concept is the same: simplicity matters.

You may consider solutions that follow the KISS principle boring or even stupid, but you can’t deny that they are the simplest and most understandable ones.

[Figure: complexity vs. simplicity]

Principle Statement

A simple solution is better than a complex one—even if the solution looks stupid.

Origin

The acronym was coined by Kelly Johnson (1910–1990), lead engineer at the Lockheed Skunk Works. The principle is well illustrated by the story of Johnson handing a team of design engineers a handful of tools, with the challenge that the jet aircraft they were designing must be repairable by an average mechanic in the field under combat conditions with only these tools. Hence, the “stupid” refers to the relationship between the way things break and the sophistication available to repair them.

Benefits

  • Simple solutions are easier to understand
  • Simple solutions are easier to maintain
  • Simple solutions are faster to implement
  • Simple solutions are easier to extend and modify

Applying the KISS principle

  • Divide and conquer
  • Functions should do one thing
  • Classes should have one responsibility
  • Functions and classes should be small
  • Avoid complicated concepts
  • Solve the problem, then code it
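A couple of these points ("divide and conquer", "functions should do one thing", "functions should be small") can be sketched in JavaScript. The function names and the sample sentence are made up for this example; it is one possible decomposition, not the only one.

```javascript
// Each function does one small thing; summarize() only composes them.

// Extracts hashtags from a text, without the leading '#', in lower case.
function parseTags(text) {
  return (text.match(/#\w+/g) || []).map(t => t.slice(1).toLowerCase());
}

// Counts whitespace-separated words.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Divide and conquer: the summary is built from the two helpers above.
function summarize(text) {
  return { tags: parseTags(text), words: countWords(text) };
}

console.log(summarize("MongoDB is an awesome #nosql database"));
```

Each helper can be read, tested and changed on its own, which is where the benefits listed earlier come from.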