Database Design of Netflix System Design

Netflix uses two different databases i.e. MySQL(RDBMS) and Cassandra(NoSQL) for different purposes. 

4.1. EC2 Deployed MySQL

Netflix saves data like billing information, user information, and transaction information in MySQL because it needs ACID compliance. Netflix has a master-master setup for MySQL and it is deployed on Amazon’s large EC2 instances using InnoDB. 

The setup follows the “Synchronous replication protocol” where if the writer happens to be the primary master node then it will be also replicated to another master node. The acknowledgment will be sent only if both the primary and remote master nodes’ write have been confirmed. This ensures the high availability of data.  Netflix has set up the read replica for each and every node (local, as well as cross-region). This ensures high availability and scalability. 

All the read queries are redirected to the read replicas and only the write queries are redirected to the master nodes.

  • In the case of a primary master MySQL failure, the secondary master node will take over the primary role, and the route53 (DNS configuration) entry for the database will be changed to this new primary node.
  • This will also redirect the write queries to this new primary master node.  

4.2. Cassandra

Cassandra is a NoSQL database that can handle large amounts of data and it can also handle heavy writing and reading. When Netflix started acquiring more users, the viewing history data for each member also started increasing. This increases the total number of viewing history data and it becomes challenging for Netflix to handle this massive amount of data.

Netflix scaled the storage of viewing history data-keeping two main goals in their mind:

  • Smaller Storage Footprint.
  • Consistent Read/Write Performance as viewing per member grows (viewing history data write-to-read ratio is about 9:1 in Cassandra).

Total Denormalized Data Model  

  • Over 50 Cassandra Clusters
  • Over 500 Nodes
  • Over 30TB of daily backups
  • The biggest cluster has 72 nodes.
  • 1 cluster over 250K writes/s

Initially, the viewing history was stored in Cassandra in a single row. When the number of users started increasing on Netflix the row sizes as well as the overall data size increased. This resulted in high storage, more operational cost, and slow performance of the application. The solution to this problem was to compress the old rows.

Netflix divided the data into two parts:

  • Live Viewing History (LiveVH):
    • This section included the small number of recent viewing historical data of users with frequent updates. The data is frequently used for the ETL jobs and stored in uncompressed form.
  • Compressed Viewing History (CompressedVH):
    • A large amount of older viewing records with rare updates is categorized in this section. The data is stored in a single column per row key, also in compressed form to reduce the storage footprint.



System Design Netflix | A Complete Architecture

Designing Netflix is a quite common question of system design rounds in interviews. In the world of streaming services, Netflix stands as a monopoly, captivating millions of viewers worldwide with its vast library of content delivered seamlessly to screens of all sizes. Behind this seemingly effortless experience lies a nicely crafted system design. In this article, we will study Netflix’s system design.

Important Topics for the Netflix System Design

  • Requirements of Netflix System Design
  • High-Level Design of Netflix System Design
    • Microservices Architecture of Netflix 
  • Low Level Design of Netflix System Design
    • How Does Netflix Onboard a Movie/Video?
    • How Netflix balance the high traffic load
    • EV Cache
    • Data Processing in Netflix Using Kafka And Apache Chukwa
    • Elastic Search
    • Apache Spark For Movie Recommendation
  • Database Design of Netflix System Design

Similar Reads

1. Requirements of Netflix System Design

1.1. Functional Requirements...

2. High-Level Design of Netflix System Design

We all are familiar with Netflix services. It handles large categories of movies and television content and users pay the monthly rent to access these contents. Netflix has 180M+ subscribers in 200+ countries....

2.1. Microservices Architecture of Netflix

Netflix’s architectural style is built as a collection of services. This is known as microservices architecture and this power all of the APIs needed for applications and Web apps. When the request arrives at the endpoint it calls the other microservices for required data and these microservices can also request the data from different microservices. After that, a complete response for the API request is sent back to the endpoint....

3. Low Level Design of Netflix System Design

3.1. How Does Netflix Onboard a Movie/Video?...

3.1. How Does Netflix Onboard a Movie/Video?

Netflix receives very high-quality videos and content from the production houses, so before serving the videos to the users it does some preprocessing....

3.2. How Netflix balance the high traffic load

1. Elastic Load Balancer...

3.3. EV Cache

In most applications, some amount of data is frequently used. For faster response, these data can be cached in so many endpoints and it can be fetched from the cache instead of the original server. This reduces the load from the original server but the problem is if the node goes down all the cache goes down and this can hit the performance of the application....

3.4. Data Processing in Netflix Using Kafka And Apache Chukwa

When you click on a video Netflix starts processing data in various terms and it takes less than a nanosecond. Let’s discuss how the evolution pipeline works on Netflix....

3.5. Elastic Search

In recent years we have seen massive growth in using Elasticsearch within Netflix. Netflix is running approximately 150 clusters of elastic search and 3, 500 hosts with instances. Netflix is using elastic search for data visualization, customer support, and for some error detection in the system....

3.6. Apache Spark For Movie Recommendation

Netflix uses Apache Spark and Machine learning for Movie recommendations. Let’s understand how it works with an example....

4. Database Design of Netflix System Design

Netflix uses two different databases i.e. MySQL(RDBMS) and Cassandra(NoSQL) for different purposes....

Contact Us