A key aspect of any Large scale system is the ability to handle a large amount of Data.
A Database is used to persist information that will be useful later on. Broadly there are two types of database categories available to store the data and those are SQL or NoSQL. Which one to use depends on the needs of the system. For many years in the past SQL or Relational databases were the standard to store information. As the systems became more and more complex and the data grew many-fold NoSQL databases have gained their ground over the last decade and emerged as the primary data store for many applications. To understand which database is better for your needs to let's understand the difference between these two.
SQL or Relational Databases
A SQL or Relational database stores data in an organized set of tables with rows and columns. All the data related to a particular entity is stored in one table or more tables. A row stores all the related values for a particular instance of entity or object and each column stores one attribute of that object. Queries data using SQL syntax and JOINS. CRUD uses SCHEMA and transactions.
NoSQL Database refers to Non Relational database, In contrast to a relational database where data is stored in well-defined tabular relations and Structured Query Language is used to access data, a NoSQL database uses different mechanisms for storage and retrieval. Different flavors of NoSQL use proprietary storage methods. Based on the problem they are solving NoSQL Databases store data in a wide variety of forms like key-value pairs, documents, columnar forms, graphs, or special time-sequenced events.
SQL v/s NoSQL
SQL: Data is nicely organized in appropriate tables according to the predefined schema, which reduces redundant information. Data conforms to the applied constraints. Requires a lot of initial thought to minimize changes later which might require migration of existing data and related downtime. Structured data prevents developers from sloppily adding data, and constraints prevent the corruption of data due to software bugs. This translates to more effort for developers initially.
NoSQL: schema is flexible, depending upon the type of NoSQL database, we might add or skip different columns and data in each document. It provides the flexibility of the data model to the developers allowing for easy iterations through the development process, without any downtime. On the flip side, it can lead to data corruption, if constraints and checks are not implemented correctly at the application layer.
SQL: Data is typically stored across multiple tables in normalized forms to prevent duplication of data.
NoSQL: Data is stored in a nested form with everything related to an entity stored in a single document. Though it introduces data redundancy but provides very fast access.
SQL: Due to structured data, the Relational database provides a powerful SQL(Structured Query Language) interface for Data definition, Data Control, and Data Manipulation. The split structure allows us to join data in any way. Numerous join types are available and can be done with any number of tables in any way.
NoSQL: Since data for a particular entity is stored together, data access is simple and does not require any complex queries.
SQL: With proper use of indexes and query tuning, it is possible to query very high volumes of data with reasonable performance.
NoSQL: Depending upon the type of NoSQL certain queries and access patterns are extremely optimized and fast. for example, the Key-Value database provides fast lookup, Column based database provides fast aggregation capabilities.
SQL: SQL Database is powerful and flexible but constrained in terms of scaling. Scaling SQL Database systems require expensive hardware to scale vertically. If we distribute SQL over multiple servers using data partitioning, however, performance suffers for certain queries and joins. This can also be attributed to the fact that relational databases came into existence in the early 1970s, in those times the scale and traffic that we witness today were unheard of.
NoSQL: These are built for web-scale applications and are easily scalable horizontally. queries. NoSQL does not use custom expensive hardware that allows for faster retrieval of data at a high scale. Since most of the NoSQL databases were created in recent times, they were developed cloud-native and with a distributed approach in mind. Almost all popular cloud vendors provide managed solutions that scale very easily according to growing traffic.
SQL: Majority of SQL databases offer strong ACID compliance and ensure consistency and reliability of data that is stored, thus making them relevant for transactional data even today.
NoSQL:, Unlike SQL counterparts, most NoSQL databases offer vague interpretations of ACID compliance and guarantees. These are majorly designed for scale and hence focus on Availability and Partition Tolerance by compromising on the consistency of the data. Always dig deep about the guarantees offered by the database that you are using before making any assumptions.
After discussing all the differences between SQL vs NoSQL, it is essential to note that in recent times, the lines between the SQL vs NoSQL has been blurring.
SQL providing NoSQL like Features
SQL databases like PostgreSQL and MySQL now provide support for storing, manipulating, and querying JSON Data. Similarly we can use PostgreSQL HStore to store key-value pairs.
NoSQL providing SQL like Features
MongoDB provides support for Joins and Transactions, although the underlying implementation and guarantees can be very different.
Choosing between SQL vs NoSQL
While designing a system or during a system design interview, it is of utmost importance to decide on the right database for storing data as it will allow you to get good performance out of your system. We need to choose the database that is the best fit for a particular use case.
For example: If you are designing a Banking application or other financial services-related application, It is a no-brainer to choose a Relational Database because of the data guarantees and security it has to offer which is a prime concern for this use case.
On the other hand, If you are designing a web crawler or search engine application, it is not advisable to use a Relational database due to the unstructured nature of web data and the scale at which you need to operate.
It is a common source of confusion for the new and in-experienced developers on whether to choose SQL or NoSQL while doing the required system design. So how to approach this?
- Always think in terms of the access patterns or operations that you will need, does the database supports that as a core feature ?.
- What kind of Guarantees and Isolation level (for transactions) do you need for your application?
Here is a list of Popular Use cases along with the most suitable Database types for each use case.
|Sno||Use case||Choice of Database|
|1||Cache||NoSQL inmemory datastore : Redis, Memcached|
|2||Banking Application||SQL Database|
|3||Social Network||NoSQL Graph Database: Neo4j|
|4||Facebook Ad Platform||NoSQL columnar Databse: Cassandra, BigTable|
Below is Genius Cheatsheet created by Satish Chandra Gupta, that can help you in picking the correct DB for your use case.
Here is a list of all the Popular Database and their rankings - DB Ranking