What is Data partitioning?

It is a database procedure of partitioning that involves breaking up a very large table into a number of smaller sections. Queries that access only a tiny portion of the data can run faster since there is fewer data to scan when huge tables are divided into smaller individual tables. When the amount of data is large and a single system cannot handle it, partitioning is used.

Now let us discuss different methods of achieving partitioning that is as follows:

Partitioning Methods

There are 3 types of partitioning that are listed below and later discussed as follows: 

  1. Horizontal Partitioning 
  2. Vertical Partitioning 
  3. Directory based Partitioning

Let us discuss them in detail for better understanding as follows:

1. Horizontal Partitioning: Without the need to make separate tables for each portion, horizontal partitioning divides big tables into smaller, more manageable pieces. A partitioned table’s data is physically kept in row groups known as partitions. It is possible to access and save each partition independently. In horizontal partitioning, each shard has the same schema as the parent database.

Application: Zipcode 

Note: It is also known as shading or sometimes referred as range-based partitioning.

2. Vertical Partitioning: Tables with fewer columns are created using the vertical partitioning technique, and the remaining columns are stored in new tables. Data is presented in a vertical format.

The main purpose of vertical table partitioning is to increase SQL Server speed, particularly when a query needs to fetch all columns from a database with a lot of text or BLOB columns.

Application: Large Reports(be it of any domain) 

3. Directory-based Partitioning: A search function that is aware of the partitioning structure and decouples it from the database access code. It enables modifying the partitioning scheme or adding new database servers without impacting the application. It results in a horizontally scalable application that is loosely connected. Since key-based partitioning requires the use of a hash function that cannot be often updated.

  • Directory-based partitioning is more adaptable than key-based or range-based partitioning.
  • Range-based partitioning establishes range values that cannot be changed.

However, since directory-based partitioning is a more dynamic method and is therefore more flexible, we can use any technique to assign data to the shards.

Now geeks you must be wondering what is the criteria behind above discussed methods of partitioning. So let us do discuss them now to get grasp understanding over concept of partitioning that is as follows:

Partition Criteria:

  1. Key or Hash-based Robin Partitioning: To determine the partition number, we apply the hash function to the entry’s key attribute.
  2. List Robin Partitioning: The column that corresponds to one of the sets of discrete values is used to choose which partition to use. A set of appropriate values is assigned to the specific partition.
  3. Round Robin Partitioning: The ith tuple is assigned to partition number i%n if there are n partitions. This implies that (i%n) nodes would receive the ith data. Sequential assignments are made to the data. The distribution of data is guaranteed by this partitioning criterion.
  4. Consistent hashing: This form of division is novel. The hash-based partitioning had the drawback of requiring a change in the hash function when adding new servers. A server outage and data redistribution would result from changing the hash function.

Advantages of Partitioning:

  • Performance Optimization
  • Availability
  • Load Balancing 
  • Scalability
  • More manageable

Disadvantages of Partitioning:

  • The complexity of the software must be maintained, including the logic for routing inquiries and aggregating compute results.
  • Additional hardware management: more DevOps work.
  • Additional overhead when redundancy. 

Complete Reference to Databases in Designing Systems – Learn System Design

Previous Parts of this System Design Tutorial

Similar Reads

What is a Database?

When we store data, information, or interrelated data, in an organized manner in one place, it is known as Database....

Types of Databases

They are of 3 types as follows as listed and shown below media as follows:...

Databases Basics In System Designing

Role of Database in System Design...

Blob Storage

Let us say we are up to designing a Uber system where we are up to the booking, renting cabs, and many other services....

How to select the right database for the service?

It is a very crucial step when it comes to databases in designing systems. In order to get the right database for our data, we need to first look over 5 factors that are as follows:...

Challenges to databases while Scaling

We are facing a problem of increased cost for query operations no matter what the type of database. It is because the CPU is responsible for query operation whereas our data is stored in hard disk(secondary memory). Now CPU is computing a million input per second (MIPS) whereas our hard disk is only doing <100 operations per second no matter how fast it be. So they cannot interact with each other directly but have to correspond to which we bring primary memory (RAM) into play which can operate faster via caching but it is not optimized as perceived from the below media:...

How to overcome challenges to Databases while Scaling

Now let us discuss below concepts that help us in scaling our databases and overcoming these challenges that are as follows:...

What is Indexing?

Indexing is a procedure introduced for database operations and other queries (received by CPU) are optimized by reducing the amount of time needed to complete a query, indexing helps optimize queries and other database processes while fetching data in lesser time. The indexes are stored using the B-tree data structure. Only utilize indexing if the data is massive and the application requires a lot of reading. Indexing may slow down write operations if an application is write-intensive....

What is Data partitioning?

It is a database procedure of partitioning that involves breaking up a very large table into a number of smaller sections. Queries that access only a tiny portion of the data can run faster since there is fewer data to scan when huge tables are divided into smaller individual tables. When the amount of data is large and a single system cannot handle it, partitioning is used....

What is Sharding?

Sharding is a very important concept that helps the system to keep data in different resources according to the sharding process. The word “Shard” means “a small part of a whole“.  Sharding means dividing a larger part into smaller parts. In DBMS, Sharding is a type of database partitioning in which a large Database is divided or partitioned into smaller data and different nodes. These shards are not only smaller, but also faster and hence easily manageable....

Contact Us