Overview

Developed by Facebook(then became open sourced) and inspired by Dynamo and BigTable papers.

Goal: extremely high availability (high throughput/low latency with emphasis to heavy write load)

It consists of keyspaces, that can contain multiple tables. A table contains multiple rows characterized by the schema. The schema defines a structure for each row and a primary key.

Primary key has two components:

  1. Partition key. Defines distribution of data. Rows are split to different partitions. Partition resides on one disk space.
  2. (Optional) Clustering columns. Defines how rows on the same partition will be stored on a disk(asc desc etc).

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/354b056a-1518-494d-bc39-55f46b2c8a75/Untitled.png

If both are present, then it's called compound primary key.

Each partition contains rows with the same value for defined partition key.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b9cd39d2-2c9e-402d-9bd5-6ebc95ba2fd0/Untitled.png

Architecture

Cassandra uses consistent hashing for load distribution. Each partition replicated across N nodes and it's called the replication factor.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/83e7b0fa-4f0c-41a4-9a41-b85d16129a93/Untitled.png

Consistent Hashing

The nodes of the cluster communicate with each other using gossip protocol. In this way nodes are able to keep track of which nodes are responsible for which token ranges. Also it's used to determine unhealthy servers.

Gossip protocol

Cassandra doesn't have leader, so node that has been requested is called coordinator node. It's responsible for managing and executing request on behalf of the client. This node identifies nodes that contain requested partition and dispatches requests.

Nodes can handle writes concurrently. So in case of conflict Cassandra uses LWW (last write wins). Each row is written with timestamp. Then after collecting responses coordinator returns latest.