MongoDB for absolute beginners: ACID and CAP

ACID property in SQL :

You probably already know what the ACID properties are, but let me explain them briefly anyway.

Relational database management systems (RDBMSs) support transactions through SQL. A database transaction is "a transformation of state" that has the ACID properties. A key feature of transactions is that they execute virtually at first, allowing the programmer to undo (using ROLLBACK) any changes that may have gone awry during execution; if all has gone well, the transaction can be reliably committed. Let's take a moment to revisit what this really means.

ACID is an acronym for Atomic, Consistent, Isolated, Durable, which are the gauges we can use to assess that a transaction has executed properly and that it was successful:

Fig : ACID property

Atomic :

Atomic means “all or nothing”; that is, when a transaction is executed, every update within it must succeed for the transaction to be called successful. There is no partial failure where one update was successful and another related update failed. The common example here is a monetary transfer at an ATM: the transfer requires subtracting money from one account and adding it to another account. This operation cannot be subdivided; both steps must succeed, or neither takes effect.
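The monetary transfer example above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the `accounts` table, the `transfer` helper, and the account names are made up for this example. The credit is applied first on purpose, so that a failing debit shows the earlier update being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # enough money: both updates commit
failed = transfer(conn, "alice", "bob", 500)  # overdraft: whole transfer rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, bob's balance does not include the 500 that was briefly credited inside the transaction.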

Consistent :

Consistent means that data moves from one correct state to another correct state, with no possibility that readers could view different values that don’t make sense together. For example, if a transaction attempts to delete a Customer and her Order history, it cannot leave Order rows that reference the deleted customer’s primary key; this is an inconsistent state that would cause errors if someone tried to read those Order records.
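As a rough sketch of the Customer and Order example, the snippet below uses sqlite3 with a foreign key constraint. The table and column names are assumptions made for illustration, and note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1)")
conn.commit()

# Deleting only the customer would orphan order 10, so the database refuses.
try:
    with conn:
        conn.execute("DELETE FROM customers WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the orders and the customer in one transaction keeps the data consistent.
with conn:
    conn.execute("DELETE FROM orders WHERE customer_id = 1")
    conn.execute("DELETE FROM customers WHERE id = 1")

orphans = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE customer_id NOT IN (SELECT id FROM customers)"
).fetchone()[0]
print(rejected, orphans)
```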

Isolated :

Isolated means that transactions executing concurrently will not become entangled with each other; they each execute in their own space. That is, if two different transactions attempt to modify the same data at the same time, then one of them will have to wait for the other to complete.
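One way to see this is with two sqlite3 connections to the same database file. This is a sketch under a few assumptions: the `counter` table is invented for the example, and `timeout=0` is used so that the second writer fails immediately with "database is locked" instead of silently waiting, which makes the blocking visible. Other databases implement isolation differently (and offer several isolation levels).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
# timeout=0: raise an error right away instead of waiting for the lock.
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("CREATE TABLE counter (n INTEGER)")
a.execute("INSERT INTO counter VALUES (0)")

a.execute("BEGIN IMMEDIATE")               # transaction A takes the write lock
a.execute("UPDATE counter SET n = n + 1")  # not committed yet

try:
    b.execute("BEGIN IMMEDIATE")           # transaction B wants to write too
    blocked = False
except sqlite3.OperationalError:
    blocked = True                         # B has to wait for A

# B can still read, but it does not see A's uncommitted change.
seen_by_b = b.execute("SELECT n FROM counter").fetchall()[0][0]

a.execute("COMMIT")                        # A finishes, the lock is released

b.execute("BEGIN IMMEDIATE")               # now B can proceed
b.execute("UPDATE counter SET n = n + 1")
b.execute("COMMIT")

final = a.execute("SELECT n FROM counter").fetchall()[0][0]
a.close()
b.close()
print(blocked, seen_by_b, final)
```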

Durable :

Once a transaction has succeeded, the changes will not be lost. This doesn’t imply another transaction won’t later modify the same data; it just means that writers can be confident that the changes are available for the next transaction to work with as necessary.
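A very small sketch of the idea, again with sqlite3 and a made-up `events` table: committed data is still there after the connection is closed and the file is reopened, while a change that was never committed is gone. Real durability is about surviving crashes and power loss, which a script like this can only hint at.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE events (msg TEXT)")
with writer:  # commits when the block exits without an error
    writer.execute("INSERT INTO events VALUES ('committed')")

writer.execute("INSERT INTO events VALUES ('never committed')")
writer.close()  # closing without commit discards the pending change

# A fresh connection stands in for "the next transaction" after a restart.
reader = sqlite3.connect(path)
rows = [r[0] for r in reader.execute("SELECT msg FROM events")]
reader.close()
print(rows)
```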

CAP Theorem for distributed systems :

Horizontal scaling of software systems has become necessary in recent years, due to the global nature of computing and the ever-increasing performance demands on applications. In many cases, it is no longer acceptable to run a single server with a single database in a single data center adjacent to your company’s headquarters. We need truly distributed environments to tackle the business challenges of today.

Unfortunately, the performance benefits that horizontal scaling provides come at a cost - complexity. Distributed systems introduce many more factors into the performance equation than existed before. Data records vary across clients/nodes in different locations. Single points of failure destroy system up-time, and intermittent network issues creep up at the worst possible time.

These concerns of consistency (C), availability (A), and partition tolerance (P) across distributed systems make up what Eric Brewer called the CAP theorem. Simply put, the CAP theorem states that a distributed system cannot guarantee C, A, and P simultaneously. Instead, trade-offs must be made at any given point in time to achieve the level of performance and availability required for a specific task.

We must understand the CAP theorem when we talk about NoSQL databases or when we design any distributed system.

Fig : CAP theorem

Consistency :

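Consistency in CAP is related to, but not quite the same as, the C in ACID. Here it means that every read returns the most recent write, so all clients see the same data at the same time no matter which node they talk to, with no possibility that readers could view different values that don’t make sense together.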

Typical relational databases are consistent: SQL Server, MySQL, and PostgreSQL.

Availability :

The system remains operational at all times. Every request gets a response, regardless of the state of any individual node in the system; in the strict CAP sense, every request that reaches a working node must be answered. This is easy to check: either you can submit read/write commands, or you cannot.

Typical relational databases are also available when run as a single server, since there is no network between nodes to partition: SQL Server, MySQL, and PostgreSQL. This means that relational databases exist in the CA space - consistency and availability.

Note : CA is not reserved for relational databases - some document-oriented tools like Elasticsearch are also commonly placed under the CA umbrella.

Partition Tolerance :

Partition tolerance describes how well your system copes when the network between its nodes is split (partitioned), i.e. the system continues to work despite message loss or partial failure.

Most people think of their data store as a single node in the network. “This is our production SQL Server instance”. Anyone who has run a production instance for more than four minutes quickly realizes that this creates a single point of failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

Storage systems that fall under Partition Tolerance with Consistency (CP): MongoDB, Redis, AppFabric Caching, and MemcacheDB. CP systems make for excellent distributed caches since every client gets the same data, and the system is partitioned across network boundaries.

In theory, it is impossible to guarantee all 3 properties at once. CAP says that a distributed system can deliver at most 2 of the 3.

Fig : CAP theorem

Just FYI, MongoDB falls under "Consistency and Partition tolerance", which means it compromises on "Availability". We'll discuss this point when we talk about MongoDB replication.
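The CP versus AP trade-off can be sketched with a toy model in Python. Everything here is hypothetical (the `Replica` and `Cluster` classes, the two-node setup, the `partitioned` flag) and it is not how MongoDB or any real database is implemented; it only shows what each choice gives up when the link between two replicas breaks.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas; `partitioned` simulates a broken network link between them."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Stay available by accepting the write locally: replicas now disagree.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)

print(cp_write_accepted, cp.read(cp.a, "x"), cp.read(cp.b, "x"))
print(ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

In CP mode both replicas still agree but the write was refused; in AP mode the write went through but the two replicas return different values.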

BASE :

Luckily for the world of distributed computing, its engineers are clever. How do the vast data systems of the world, such as Google’s BigTable, Amazon’s Dynamo, and Facebook’s Cassandra (to name only three of many), deal with a loss of consistency and still maintain system reliability? The answer is BASE (Basically Available, Soft state, Eventual consistency). A BASE system gives up strong consistency across the distributed system in exchange for availability.

Fig : BASE concept

Basically Available :

This constraint states that the system guarantees availability of the data in the CAP theorem sense: there will be a response to any request. However, that response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.

Soft State :

The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’

Eventual Consistency :

The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves on to the next one.
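Soft state and eventual consistency can be illustrated with another toy sketch. The `EventuallyConsistentStore` class and its `propagate` method are invented for this example, and a real system would replicate in the background over a network rather than on demand, but the observable behaviour is similar: a replica may return stale data for a while, then catches up.

```python
from collections import deque


class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet propagated to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale data while writes are still pending.
        return self.replica.get(key)

    def propagate(self, max_ops=None):
        """Apply up to max_ops pending writes to the replica (all if None)."""
        applied = 0
        while self.pending and (max_ops is None or applied < max_ops):
            key, value = self.pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied


store = EventuallyConsistentStore()
store.write("balance", 100)
store.write("balance", 80)

stale = store.read_replica("balance")    # nothing has propagated yet
store.propagate(max_ops=1)
partial = store.read_replica("balance")  # replica is one write behind
store.propagate()                        # no new input: replica catches up
final = store.read_replica("balance")
print(stale, partial, final)
```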

Next Post : "MongoDB Overview"

Tuesday 7 June 2016
