Definition

Partitioning is the process of splitting a dataset into multiple smaller datasets and assigning the responsibility for storing and processing each of them to a different node of a distributed system. This lets the system handle more data as more nodes are added.

Types of partitioning

Horizontal Partitioning Algorithms

Range partitioning

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f5bf9787-c2d3-4b23-ab18-bb5a17301e1a/Untitled.png

Range partitioning is a technique where a dataset is split into ranges according to the value of a specific attribute. Each range is then stored on a separate node. The system must store and maintain a list of all these ranges, along with a mapping that indicates which node stores each range. When the system receives a request for a specific value (or a range of values), it consults this mapping to identify the node (or nodes, respectively) to which the request should be routed.
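The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the range boundaries, the node names, and the helper functions node_for_key and nodes_for_range are all hypothetical, and the mapping is kept as a sorted list of lower bounds held in memory, which a real system would need to store and update reliably.

```python
from bisect import bisect_right

# Assumed layout (illustrative only): partition i owns the keys in
# [LOWER_BOUNDS[i], LOWER_BOUNDS[i + 1]), and the last partition is open-ended.
LOWER_BOUNDS = ["a", "h", "p"]          # sorted lower bound of each range
NODES = ["node-1", "node-2", "node-3"]  # NODES[i] stores the range starting at LOWER_BOUNDS[i]


def node_for_key(key: str) -> str:
    """Return the node whose range contains `key`."""
    # bisect_right counts the lower bounds that are <= key;
    # the last of those is the range that owns the key.
    index = bisect_right(LOWER_BOUNDS, key) - 1
    if index < 0:
        raise KeyError(f"{key!r} is below the smallest range boundary")
    return NODES[index]


def nodes_for_range(start: str, end: str) -> list[str]:
    """Return every node whose range overlaps the inclusive interval [start, end]."""
    if start > end:
        raise ValueError("start must not be greater than end")
    first = bisect_right(LOWER_BOUNDS, start) - 1
    last = bisect_right(LOWER_BOUNDS, end) - 1
    if first < 0:
        raise KeyError(f"{start!r} is below the smallest range boundary")
    # Ranges are contiguous, so the overlapping nodes form a contiguous slice.
    return NODES[first:last + 1]
```

With these assumed boundaries, node_for_key("alice") resolves to a single node, while nodes_for_range("g", "i") spans two nodes because the interval crosses the boundary at "h". Binary search keeps the lookup cost logarithmic in the number of ranges.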

The advantages of this technique are:

Some of its disadvantages are:

Hash partitioning

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ee17b699-45ba-4bf9-83af-a0eb0fefcfd8/Untitled.png