Introduction to NoSQL
In this article I will explain the following points:
1- What is NoSQL?
2- The types of NoSQL databases
3- ACID properties and the CAP theorem
4- Consistency in ACID versus CAP
What is NoSQL?
• When people use the term “NoSQL database”, they typically mean any non-relational database. Some say “NoSQL” stands for “non-SQL”, while others say it stands for “not only SQL”. Either way, most agree that NoSQL databases store data in a format other than relational tables.
• The data structures used by NoSQL databases differ from those used in relational databases.
What are the types of NoSQL databases?
1- Key Value Store
The key-value store is the simplest type of NoSQL database. Every item is stored as a key together with its value, and each key is unique. Key-value databases are highly partitionable and scale horizontally. They offer random access to data by key, much like a hash table or dictionary data structure, where the time complexity of a lookup is O(1) on average.
Use case: Session stores, for example per-user session information
Example: Redis
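As a rough illustration of the key-value access pattern described above, here is a minimal in-memory sketch in Python. It is not Redis and not a Redis client; the class name SessionStore, its methods, and the "session:42" key format are assumptions made up for this example.

```python
from typing import Any


class SessionStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # A Python dict is a hash table, so lookups by key are O(1) on average.
        self._data: dict = {}

    def set(self, key: str, value: Any) -> None:
        # Each key is unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key: str) -> Any:
        # Return None when the key is absent instead of raising.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = SessionStore()
store.set("session:42", {"user": "alice", "cart_items": 3})
print(store.get("session:42"))  # the stored session dictionary
print(store.get("session:99"))  # None, because the key was never written
```

The store never inspects the value; it only maps a key to an opaque value, which is what makes this model easy to partition across machines by key.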
2- Column Store
Column store databases are optimized for queries over large datasets and store the values of a column together instead of storing whole rows together. Column stores in the style of Cassandra use a concept called a keyspace, which is roughly like a schema in the relational model. The keyspace contains all the column families (roughly like tables in the relational model), which contain rows, which in turn contain columns.
Use case: Data-intensive applications
Example: Cassandra
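The nesting of keyspace, column family, row, and column can be pictured with plain Python dictionaries. This is only a sketch of the logical layout, not of how Cassandra stores data on disk, and the names keyspace, "users", and read_column are hypothetical.

```python
# Logical layout: keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "users": {  # a column family, roughly like a table
        "alice": {"email": "alice@example.com", "city": "Cairo"},
        "bob": {"email": "bob@example.com"},  # rows may have different columns
    },
}


def read_column(keyspace, family, column):
    """Return {row_key: value} for one column, skipping rows that lack it."""
    rows = keyspace[family]
    return {key: cols[column] for key, cols in rows.items() if column in cols}


print(read_column(keyspace, "users", "email"))
print(read_column(keyspace, "users", "city"))
```

Reading a single column across many rows, as read_column does, is the kind of query this family of databases is meant to serve efficiently.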
3- Document Store
A document database is a type of non-relational database designed to store and query data as JSON-like documents. Document databases make it easier for developers to store and query data by using the same document-model format they use in their application code. They also offer flexibility when updating entities: if the data model needs to change, only the affected documents need to be updated. No schema update is required, and no database downtime is necessary to make the changes.
Use case: Content management systems, such as blog and video platforms
Example: MongoDB
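To make the flexibility of the document model concrete, the following sketch keeps JSON-like documents in a Python list. It does not use MongoDB or its driver; the posts collection, its field names, and the find helper are assumptions for illustration.

```python
# Two documents in the same collection with different shapes.
posts = [
    {"_id": 1, "title": "Hello", "tags": ["intro"]},
    {"_id": 2, "title": "Launch video", "video_url": "https://example.com/v.mp4"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Evolving the data model touches only the affected document:
# no schema change is declared anywhere.
posts[0]["author"] = "alice"

print(find(posts, title="Hello"))
print(find(posts, author="alice"))
```

Documents that lack a field are simply not matched by a query on that field, so old and new document shapes can coexist in one collection.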
4- Graph Store
A graph database is designed to treat the relationships between data as equally important to the data itself. It is intended to hold data without constraining it to a predefined model. Instead, the data is stored the way we would first sketch it out, showing how each entity connects with or is related to others. Graph databases are well suited to highly connected data that would be awkward and slow to query in a traditional relational database. They’re most notably used for social networks, because relationship-heavy queries such as finding friends of friends tend to perform much better.
Use Case: Social Media
Example: Neo4j
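A relationship-centric query can be sketched with an adjacency structure in Python. This is a toy model, not Neo4j or its query language; the follows graph and the friends_of_friends function are hypothetical names chosen for the example.

```python
# Each person maps to the set of people they are directly connected to.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": set(),
}


def friends_of_friends(graph, person):
    """People exactly two hops away, excluding direct friends and the person."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


print(sorted(friends_of_friends(follows, "alice")))  # ['dave', 'erin']
```

The query walks relationships directly instead of joining tables, which is the access pattern graph databases are built around.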
ACID and the CAP theorem
1- ACID (atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors.
- Atomicity: A transaction is an atomic unit: either all of its instructions execute successfully, or none of them take effect. For example, consider a transaction that transfers 20 dollars from Alice’s bank account to Bob’s bank account. If any of its instructions fails, the entire transaction should abort and roll back.
- Consistency: A database starts in a consistent state and should remain consistent after every transaction. Suppose that the transfer in the previous example fails partway through, after Write(A_b) has updated one balance but before the other balance is written, and the transaction is not rolled back. The database will then be inconsistent, because the sum of Alice’s and Bob’s money after the transaction will not equal the amount they had before it.
- Isolation: Other operations cannot access or see data in an intermediate state while a transaction is in progress.
- Durability: Once the user has been notified of success, the transaction will persist and will not be undone.
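The transfer example used for atomicity and consistency can be sketched with Python's built-in sqlite3 module. This is one possible illustration under stated assumptions: the accounts table, the starting balances, and the helper names open_bank, transfer, and balances are all invented for the example.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, sender, receiver, amount):
    """Debit sender and credit receiver as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
            credited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            if credited.rowcount == 0:
                # The debit above has already run; raising here undoes it.
                raise LookupError(f"no such account: {receiver}")
        return True
    except LookupError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_bank()
print(transfer(conn, "alice", "bob", 20), balances(conn))
print(transfer(conn, "alice", "nobody", 20), balances(conn))
```

The second transfer fails after the debit has executed, yet the rollback restores Alice's balance, so the total across both accounts stays at 150 in every observable state.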
2- CAP theorem: The CAP theorem (also called Brewer’s theorem, after computer scientist Eric Brewer) states that a distributed data store cannot simultaneously provide more than two of the following three guarantees: Consistency, Availability, and Partition Tolerance.
- Consistency: A system is said to be consistent if all nodes see the same data at the same time. Put simply, a read on a consistent system returns the value of the most recent write, no matter which node serves it.
Consistency in ACID versus CAP: The concept of consistency is different in ACID and in CAP.
ACID consistency is all about database rules. If a schema declares that a value must be unique, then a consistent system will enforce uniqueness of that value across all operations. If a foreign key implies deleting one row will delete related rows, then a consistent system will ensure the state can’t contain related rows once the base row is deleted.
CAP consistency promises that every replica of the same logical value, spread across nodes in a distributed system, has exactly the same value at all times. Note that this is a logical guarantee rather than a physical one. Because of the speed of light, replicating values across a cluster takes some non-zero time. The cluster can still present a consistent logical view by preventing clients from seeing different values at different nodes.
- Availability: Availability in a distributed system means that every request received by a non-failing node gets a (non-error) response, regardless of the state of any other individual node.
Note: this does not guarantee that the response contains the most recent write.
- Partition Tolerance: The system continues to run even when an arbitrary number of messages between nodes are dropped or delayed by the network. A partition-tolerant system can sustain any amount of network failure that does not result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages. In modern distributed systems, partition tolerance is not optional; it is a necessity. Hence, when a partition occurs, we have to trade off between Consistency and Availability.
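The trade-off between Consistency and Availability during a partition can be sketched with a toy two-replica cluster in Python. This is a deliberately simplified model, not how any real database is implemented; the Replica and Cluster classes and the "CP" and "AP" mode names are assumptions for illustration.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas joined by a link that can be cut (toy model only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            replica.value = value
            other.value = value
        elif self.mode == "CP":
            # Keep replicas identical by refusing the write (gives up availability).
            raise RuntimeError("unavailable: cannot reach the other replica")
        else:
            # Stay available by accepting the write locally (replicas diverge).
            replica.value = value

    def read(self, replica):
        return replica.value


cp = Cluster("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
except RuntimeError as err:
    print("CP:", err, "| both replicas still hold", cp.read(cp.a), cp.read(cp.b))

ap = Cluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
print("AP: replica a =", ap.read(ap.a), "| replica b =", ap.read(ap.b))
```

In the CP run the write is rejected and both replicas keep agreeing; in the AP run the write succeeds but a read from the other replica returns a stale value, which matches the note under Availability that a response is not guaranteed to contain the most recent write.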
References:
- Martin Kleppmann, Designing Data-Intensive Applications (O’Reilly Media, 2017)
- https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321