Developed by Facebook(then became open sourced) and inspired by Dynamo and BigTable papers.
Goal: extremely high availability (high throughput/low latency with emphasis to heavy write load)
It consists of keyspaces, that can contain multiple tables. A table contains multiple rows characterized by the schema. The schema defines a structure for each row and a primary key.
Primary key has two components:
If both are present, then it's called compound primary key.
Each partition contains rows with the same value for defined partition key.
Cassandra uses consistent hashing for load distribution. Each partition replicated across N nodes and it's called the replication factor.
The nodes of the cluster communicate with each other using gossip protocol. In this way nodes are able to keep track of which nodes are responsible for which token ranges. Also it's used to determine unhealthy servers.
Cassandra doesn't have leader, so node that has been requested is called coordinator node. It's responsible for managing and executing request on behalf of the client. This node identifies nodes that contain requested partition and dispatches requests.
Nodes can handle writes concurrently. So in case of conflict Cassandra uses LWW (last write wins). Each row is written with timestamp. Then after collecting responses coordinator returns latest.