Difference Between Cassandra and MongoDB (With Table)

Advances in technology have brought mobile phones, wireless networks and, above all, the Internet, which puts vast volumes of data just a click away. That data is stored electronically in databases, each of which is controlled by a database management system (DBMS). Cassandra and MongoDB are two such database systems.

Cassandra vs MongoDB

The main difference between Cassandra and MongoDB is that the former uses a hybrid data model that combines a tabular structure with key-value storage and a “peer-to-peer” architecture, while the latter uses an object- and document-oriented data model and a “master-slave” (primary-secondary) architecture.

Cassandra is an open-source NoSQL database that uses a “peer-to-peer” architecture. A Cassandra cluster has no single master node; every node plays the same role, so any node can accept reads and writes. If one node goes down, the remaining nodes continue to serve requests, so the database stays responsive. This design is what gives Cassandra its high data availability and flexible scalability.

MongoDB is also an open-source NoSQL database, and it is based on the “master-slave” model, in which the master is called the primary and the slaves are called secondaries. When the primary node fails, a secondary node can take over its role, but the transition is not instantaneous (with default settings it typically takes on the order of seconds), and writes cannot be accepted until it completes. This affects data availability. Write scalability within a single replica set is also limited, because only the primary accepts writes; the secondary nodes are useful mainly for reads.

Comparison Table Between Cassandra and MongoDB

| Parameter of Comparison | Cassandra | MongoDB |
| --- | --- | --- |
| Data Model | A hybrid of key-value and tabular structure that uses rows and columns | An object- and document-oriented data model |
| Programming Language Support | C++, Python, Java, JavaScript, .NET, Ruby, PHP, Scala, Perl, C#, Clojure, Go, Erlang, Haskell | C, C++, C#, Clojure, ColdFusion, Dart, Delphi, Ruby, Python, Scala, JavaScript, Java, Erlang, Go, Groovy, Haskell, PHP, Perl, Lisp, Lua, MATLAB, PowerShell, Prolog, Smalltalk |
| Aggregation Framework | No built-in aggregation framework; relies on external tools such as Hadoop or Apache Spark | Has a built-in aggregation framework |
| Schema | Flexible schema; rows within the same column family need not have the same number of columns | Schema is optional; newer versions let users decide whether to enforce one, which makes the database more flexible |
| Query Language Support | Cassandra Query Language (CQL), Cassandra’s own query language | No SQL-like query language; queries are expressed as JSON-structured documents |

What is Cassandra?

Cassandra was developed at Facebook to power inbox search and was released as open source in 2008. It became an Apache project in 2009 and has been known as Apache Cassandra since then.

Cassandra is a NoSQL database that stores data in a basic structure made up of keyspaces, column families, rows and columns. Because Cassandra has a flexible schema, rows within the same column family can have different numbers of columns.
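As a rough illustration of rows in one column family carrying different columns, the sketch below models a column family as a Python dictionary keyed by row key. The names `users`, `alice` and `bob` are invented for the example, and the dictionary is only a mental model, not a description of how Cassandra lays data out on disk.

```python
# Toy model: a column family maps each row key to that row's own columns.
users = {
    "alice": {"email": "alice@example.com", "city": "Oslo"},
    "bob": {"email": "bob@example.com", "city": "Pune", "phone": "555-0100"},
}

def columns_of(column_family, row_key):
    """Return the sorted column names stored for one row."""
    return sorted(column_family[row_key])

print(columns_of(users, "alice"))  # ['city', 'email']
print(columns_of(users, "bob"))    # ['city', 'email', 'phone']
```

Both rows live in the same column family, yet the second row has a `phone` column that the first one lacks.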

Every node in a Cassandra cluster has the same role and can receive any type of request. Instead of the “master-slave” model, Cassandra uses the idea of a “coordinator node”: whichever node receives a client’s request acts as the coordinator for that request. The coordinator forwards the request to the node or nodes that actually hold the data, collects their responses, and returns the result to the client.
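The coordinator idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Node` and `Cluster` classes are hypothetical, the placement rule (hash the key, take it modulo the node count) stands in for Cassandra’s real partitioning, and replication is left out entirely.

```python
import hashlib

class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}  # data this node is responsible for

class Cluster:
    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def owner_of(self, key):
        # Assumed placement rule: hash the key and map it onto a node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def write(self, coordinator, key, value):
        # The coordinator forwards the write to the node that owns the key.
        self.owner_of(key).rows[key] = value
        return coordinator.name

    def read(self, coordinator, key):
        # The coordinator fetches the value from the owner and returns it.
        return self.owner_of(key).rows.get(key)

cluster = Cluster(["node-a", "node-b", "node-c"])
first, _, last = cluster.nodes

cluster.write(first, "user:42", "Asha")  # the client happens to reach node-a
print(cluster.read(last, "user:42"))     # the client happens to reach node-c
```

The client gets the same answer whichever node it contacts, because the contacted node only coordinates; the data itself stays with the node that owns the key.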

A few prominent users of Cassandra are Netflix, Twitter, Viocom Hosting, Walmart Labs, Spotify, Reddit, Instagram, and Facebook.

What is MongoDB?

MongoDB is a NoSQL database that was developed by 10gen, presently known as MongoDB, Inc., in 2007 to address issues concerning scalability.

As it is a document-oriented database, its primary unit of storage is the document: the basic structure that holds a single unit of data. Because no schema is imposed, documents with different structures and contents can be stored in the same collection.

Documents in MongoDB have a JSON-like structure, and queries are also expressed as JSON-like documents; as a result, the model maps naturally onto object-oriented programming.
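To make the idea of differently shaped documents and JSON-style queries concrete, here is a minimal sketch in plain Python. The `collection` data and the `find` helper are hypothetical and need no database; the helper only checks field equality, whereas MongoDB’s real query language also supports operators and richer matching rules.

```python
# Three documents with different fields, all in the same collection.
collection = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Lars", "city": "Oslo", "tags": ["admin"]},
    {"_id": 3, "title": "Untitled note"},
]

def find(docs, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in docs
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]

print(find(collection, {"city": "Oslo"}))
```

The query is itself a small JSON-like document, and documents that lack the `city` field are simply skipped rather than causing an error.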

MongoDB is based on a master-slave model, so if the master node stops functioning, the database cannot accept writes until a replacement takes over. To limit this downtime, MongoDB uses a replica set, which consists of one master node, called the primary, and a number of secondary nodes. The primary receives all write requests from clients and records every change in its operation log (oplog). The secondary nodes read the primary’s operation log and apply the same changes to their own copies of the data to stay consistent.
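The operation-log mechanism can be sketched as follows. This is a toy model, not MongoDB’s implementation: the `Member` and `Primary` classes and the `sync` function are hypothetical, and the log holds simple key/value pairs rather than real oplog entries.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries already applied

class Primary(Member):
    def __init__(self, name):
        super().__init__(name)
        self.oplog = []  # ordered record of every change

    def write(self, key, value):
        # Record the change in the log, then apply it locally.
        self.oplog.append((key, value))
        self.data[key] = value
        self.applied = len(self.oplog)

def sync(secondary, primary):
    """Replay the log entries the secondary has not applied yet."""
    for key, value in primary.oplog[secondary.applied:]:
        secondary.data[key] = value
    secondary.applied = len(primary.oplog)

primary = Primary("primary")
secondary = Member("secondary")

primary.write("stock", 10)
primary.write("stock", 9)
sync(secondary, primary)
print(secondary.data)  # {'stock': 9}
```

Because the secondary replays the same changes in the same order, its copy of the data ends up matching the primary’s.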

If the primary node dies, MongoDB relies on two mechanisms, “heartbeats” and “elections”. Every two seconds, the members of the replica set send heartbeats to each other; if a member fails to answer within ten seconds, the other members mark it as unreachable. When the unreachable member is the primary, the replica set holds an election in which the members vote for a secondary node to become the new primary. A candidate must win a majority of the votes to be elected. A replica set can also include a third kind of node, known as the arbiter, which stores no data but casts a vote and so helps the set reach a majority.
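The heartbeat and election steps described above can be sketched as two small functions. The two-second and ten-second values come from the text; everything else is an assumption made for illustration, including the function names, the sample timestamps, and the rule that a winner needs votes from a strict majority of all members.

```python
HEARTBEAT_INTERVAL = 2  # seconds between heartbeats, per the text
FAILURE_TIMEOUT = 10    # seconds of silence before a member is presumed dead

def unreachable(last_heard, now, timeout=FAILURE_TIMEOUT):
    """Return the members whose last heartbeat is at least `timeout` old."""
    return sorted(name for name, t in last_heard.items() if now - t >= timeout)

def elect(votes, member_count):
    """Return the candidate backed by a strict majority of members, else None."""
    tally = {}
    for candidate in votes.values():
        tally[candidate] = tally.get(candidate, 0) + 1
    for candidate, count in tally.items():
        if count > member_count // 2:
            return candidate
    return None

now = 20
last_heard = {"primary": 8, "sec1": now - HEARTBEAT_INTERVAL, "sec2": now}
print(unreachable(last_heard, now))  # ['primary']

votes = {"sec1": "sec1", "sec2": "sec1"}  # voter -> chosen candidate
print(elect(votes, member_count=3))       # sec1
```

With three members, two votes form a majority, so `sec1` can become primary even though the old primary never votes. If the votes were split one each, `elect` would return `None` and no primary would be chosen in that round.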

Adobe, Google, Forbes, Facebook, eBay, Bosch and Cisco are some of the prominent users of MongoDB.

Main Differences Between Cassandra and MongoDB

  1. While Cassandra uses a tabular structure to store data, MongoDB uses an object- and document-oriented model.
  2. Cassandra uses a cluster of equal nodes to ensure high data availability, whereas MongoDB relies on a single master node for writes, which limits its availability while a failover is in progress.
  3. Cassandra provides flexible scalability because all the nodes in the ring are equal. In contrast, write scalability in a MongoDB replica set is limited because a single master node accepts all the writes.
  4. Cassandra does not have a built-in aggregation framework and therefore relies on external tools, whereas MongoDB has a built-in aggregation framework that is best suited to small and medium-sized data traffic.
  5. While Cassandra offers components such as memtables, commit logs, clusters, data centers and nodes, MongoDB supports ad-hoc queries, file storage, collections, replication and transactions.
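To give a feel for what a built-in aggregation framework offers, the sketch below writes a small pipeline in the stage-based form MongoDB uses (`$match`, then `$group` with `$sum`) and then computes the same result in plain Python so that it runs without a database. The `orders` data and the `total_paid_by_city` helper are invented for the example.

```python
# The pipeline as it might be handed to MongoDB: filter, then group and sum.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}},
]

orders = [
    {"city": "Oslo", "status": "paid", "amount": 40},
    {"city": "Pune", "status": "paid", "amount": 25},
    {"city": "Oslo", "status": "open", "amount": 99},
    {"city": "Oslo", "status": "paid", "amount": 10},
]

def total_paid_by_city(docs):
    """Plain-Python equivalent of the pipeline above for this sample data."""
    totals = {}
    for doc in docs:
        if doc["status"] == "paid":  # the $match stage
            totals[doc["city"]] = totals.get(doc["city"], 0) + doc["amount"]  # $group / $sum
    return totals

print(total_paid_by_city(orders))  # {'Oslo': 50, 'Pune': 25}
```

In MongoDB the database itself evaluates the pipeline, whereas with Cassandra this kind of grouping would typically be handed to an external tool such as Apache Spark.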

Conclusion

Cassandra and MongoDB are both NoSQL database management systems, but they differ in several significant ways. Cassandra tends to be the better fit for transactional data, while MongoDB is more useful for running real-time analytics.
