Definition

Partitioning is the process of splitting a dataset into multiple, smaller datasets and then assigning the responsibility of storing and processing them to different nodes of a distributed system. This allows us to increase the size of the data our system can handle, by adding more nodes to the system.

Types of partitioning

Vertical

Vertical partitioning involves splitting a table into multiple tables with fewer columns, using additional columns to relate rows across the resulting tables (recombining them is commonly done with a join operation). These different tables can then be stored in different nodes. Normalization [8] is one way to perform vertical partitioning, but general vertical partitioning can go far beyond that, splitting columns even when they are already normalized.

Horizontal (sharding)

Horizontal partitioning, on the other hand, involves splitting a table into multiple, smaller tables, where each of those tables contains a subset of the rows of the initial table. These different sub-tables can then be stored in different nodes.
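The difference between the two types can be illustrated with a minimal Python sketch. The `users` table, its column names, and the helpers `vertical_partition`, `horizontal_partition` and `join_on_key` are hypothetical and exist only for this illustration; a real system would place each resulting table on a different node.

```python
def vertical_partition(rows, key, column_groups):
    """Split rows into one table per column group.

    Every resulting table keeps the key column, so that rows can be
    related across tables again later.
    """
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


def horizontal_partition(rows, rows_per_part):
    """Split rows into consecutive chunks; each row lands in exactly one chunk."""
    return [rows[i:i + rows_per_part] for i in range(0, len(rows), rows_per_part)]


def join_on_key(left, right, key):
    """Recombine two vertical partitions by matching rows on the key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical example table.
users = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "country": "UK"},
    {"id": 2, "name": "Linus", "email": "linus@example.com", "country": "FI"},
    {"id": 3, "name": "Grace", "email": "grace@example.com", "country": "US"},
    {"id": 4, "name": "Ken", "email": "ken@example.com", "country": "US"},
]

# Vertical: fewer columns per table, all rows present in each table.
profile, contact = vertical_partition(users, "id", [["name", "country"], ["email"]])

# Horizontal: all columns per table, a subset of the rows in each table.
first_half, second_half = horizontal_partition(users, 2)
```

Joining `profile` and `contact` on `id` rebuilds the original rows, while concatenating `first_half` and `second_half` rebuilds the original table.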

Horizontal Partitioning Algorithms

Range partitioning

Range partitioning is a technique in which a dataset is split into ranges according to the value of a specific attribute. Each range is then stored in a separate node. The system must store and maintain a list of all these ranges, along with a mapping that indicates which node stores each range. When the system receives a request for a specific value (or a range of values), it consults this mapping to identify the node (or nodes, respectively) to which the request should be redirected.
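One possible way to represent the mapping is a sorted list of range boundaries that is searched with a binary search. The sketch below assumes half-open ranges (lower bound inclusive, upper bound exclusive); the `RangeMap` class, the boundary values and the node names are hypothetical.

```python
from bisect import bisect_right


class RangeMap:
    """Maps key ranges to nodes.

    boundaries must be sorted. Range i covers keys k with
    boundaries[i-1] <= k < boundaries[i]; the first and last
    ranges are unbounded on their outer side.
    """

    def __init__(self, boundaries, nodes):
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(nodes) != len(boundaries) + 1:
            raise ValueError("need exactly one more node than boundaries")
        self.boundaries = list(boundaries)
        self.nodes = list(nodes)

    def node_for(self, key):
        """Return the single node responsible for the given key."""
        return self.nodes[bisect_right(self.boundaries, key)]

    def nodes_for_range(self, low, high):
        """Return every node that may hold keys in the closed interval [low, high]."""
        first = bisect_right(self.boundaries, low)
        last = bisect_right(self.boundaries, high)
        return self.nodes[first:last + 1]


# Hypothetical cluster: keys are strings, split at "g", "n" and "t".
range_map = RangeMap(["g", "n", "t"], ["node-a", "node-b", "node-c", "node-d"])

single = range_map.node_for("house")            # falls in ["g", "n")
several = range_map.nodes_for_range("h", "p")   # spans two ranges
```

A point lookup is routed to exactly one node, whereas a range that crosses a boundary has to be sent to every node whose range overlaps it.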

The advantages of this technique are:

its simplicity and ease of implementation.
the ability to perform range queries using the partitioning key.
good performance for range queries on the partitioning key, when the queried range is small and resides in a single node.
an easy and efficient way to adjust the ranges (re-partition), since one range can be increased or decreased by exchanging data only between 2 nodes.
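The last advantage can be sketched as follows: moving a single boundary only shifts keys between the two nodes on either side of it. The function `move_boundary` and the in-memory `stores` (one dictionary per node) are hypothetical stand-ins for real nodes, and ranges are again assumed to be half-open.

```python
def move_boundary(boundaries, stores, index, new_boundary):
    """Move boundaries[index] and migrate the affected keys.

    stores[i] holds the keys of range i, so only stores[index] and
    stores[index + 1] are touched. Returns the number of keys moved.
    """
    if index > 0 and new_boundary <= boundaries[index - 1]:
        raise ValueError("new boundary would cross the previous boundary")
    if index + 1 < len(boundaries) and new_boundary >= boundaries[index + 1]:
        raise ValueError("new boundary would cross the next boundary")

    old_boundary = boundaries[index]
    left, right = stores[index], stores[index + 1]

    if new_boundary < old_boundary:
        # The left range shrinks: keys >= new_boundary move to the right node.
        moving = [k for k in left if k >= new_boundary]
        source, target = left, right
    else:
        # The left range grows: keys < new_boundary move to the left node.
        moving = [k for k in right if k < new_boundary]
        source, target = right, left

    for k in moving:
        target[k] = source.pop(k)
    boundaries[index] = new_boundary
    return len(moving)


# Hypothetical state: three nodes, with ranges split at 100 and 200.
boundaries = [100, 200]
stores = [
    {10: "a", 90: "b"},
    {120: "c", 150: "d", 180: "e"},
    {250: "f"},
]

# Relieve the middle node by growing the first range up to 160.
moved = move_boundary(boundaries, stores, 0, 160)
```

In this example, the third node is never contacted, because its range is not adjacent to the boundary that was moved.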

Some of its disadvantages are:

the inability to perform range queries using keys other than the partitioning key.
poor performance for range queries on the partitioning key, when the queried range is large and spans multiple nodes.
an uneven distribution of the traffic or the data, which can cause some nodes to be overloaded.
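The uneven distribution arises when the keys themselves are skewed, because the ranges follow the key space rather than the actual data. A small sketch with made-up keys and boundaries (the helper `keys_per_node` is hypothetical) shows the effect:

```python
from bisect import bisect_right
from collections import Counter


def keys_per_node(keys, boundaries):
    """Count how many keys fall into each half-open range."""
    counts = Counter(bisect_right(boundaries, key) for key in keys)
    return [counts[i] for i in range(len(boundaries) + 1)]


# Made-up dataset in which most keys happen to start with "s".
names = ["sam", "sara", "scott", "sean", "seth", "sue", "adam", "mia"]

# Three nodes: keys before "i", keys from "i" up to "r", keys from "r" onwards.
distribution = keys_per_node(names, ["i", "r"])
```

Here the third node ends up with six of the eight keys, even though the boundaries split the alphabet into ranges of similar width.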

Hash partitioning