- batch processing systems
- system groups data into batches (groups)
- batches are large (e.g. all items for one day)
- high throughput at the cost of high latency
Example: MapReduce
- stream processing systems
- continuous stream of data
- low latency at the cost of decreased throughput
Example: Flink
- micro-batch processing systems
- very small batches
- balance between latency and throughput
Example: Spark (Spark Streaming)
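The latency/throughput trade-off above can be made concrete with a toy model (not a real engine, just an illustration): items arrive one per tick, and a processor emits results only once its batch is full, so each item waits for the rest of its batch. Batch size 1 behaves like stream processing, batch size = all items behaves like pure batch processing, and sizes in between are the micro-batch middle ground.

```python
def average_latency(num_items, batch_size):
    """Average ticks an item waits between arriving and being emitted.

    Toy model: items arrive one per tick; a batch is emitted when its
    last item arrives, so earlier items in the batch wait longer.
    """
    total_wait = 0
    for i in range(num_items):
        # the batch containing item i is emitted when its last item arrives
        batch_end = ((i // batch_size) + 1) * batch_size - 1
        total_wait += batch_end - i
    return total_wait / num_items

# batch_size=1  -> stream-like: no waiting
# batch_size=10 -> batch-like: maximal waiting
# batch_size=5  -> micro-batch: in between
latencies = {bs: average_latency(10, bs) for bs in (1, 5, 10)}
```

Larger batches also mean fewer per-batch overheads (scheduling, I/O), which is where the throughput gain of batch systems comes from; this sketch only models the latency side.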
Spark vs MapReduce
MapReduce allows developing and running a huge number of parallel computations across many machines at the same time. But every job has to read its input from disk and write its output back to disk, so the lower bound on execution time is determined by disk speed. Spark avoids this by keeping intermediate data in memory (as RDDs), so multi-stage pipelines do not pay the disk round trip between stages.
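The MapReduce model itself can be sketched in miniature, pure Python on a single machine (a real cluster would write the map output to disk and shuffle it across machines, which is exactly the disk cost described above). The function names here are illustrative, not part of any real API:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key (on a cluster: disk + network)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```

Chaining several such jobs means each one's `reduce_phase` output is written to disk and re-read by the next job's `map_phase`; Spark's in-memory RDDs skip that round trip.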