Data Processing Systems

batch processing systems

The system groups data into batches.
Batches are large (for example, all items collected in one day).
High throughput at the cost of high latency. Example: MapReduce
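A minimal sketch of the batch model, assuming a simple item count as the job (`process_batch` and the sample data are hypothetical). The whole batch is collected first and processed in one pass, so no result exists until the entire batch has been read.

```python
from collections import Counter
from typing import Iterable


def process_batch(records: Iterable[str]) -> Counter:
    """Process a whole batch (e.g. one day's items) in a single pass."""
    counts = Counter()
    for item in records:
        counts[item] += 1
    # The result is only available once every record in the batch has been seen.
    return counts


day_items = ["a", "b", "a", "c", "a"]  # hypothetical "one day" of items
print(dict(process_batch(day_items)))  # {'a': 3, 'b': 1, 'c': 1}
```

The per-item work is cheap and there is no per-result overhead, which is where the throughput comes from; the latency is roughly the length of the batch window.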

stream processing systems

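Data arrives as a continuous stream and each record is processed as soon as it arrives.
Low latency at the cost of lower throughput. Example: Flink (Spark's streaming engine is usually classed as micro-batch rather than true record-by-record streaming)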
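The same count done in the streaming style, as a hedged sketch (`process_stream` is a hypothetical name, not any engine's API). State is updated and a result is emitted for every single event, which keeps latency to one event but pays the emit overhead every time.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def process_stream(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Update state and emit a result for every event as soon as it arrives."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        # Emit immediately: low latency, but this cost is paid per event.
        yield event, counts[event]


for update in process_stream(["a", "b", "a"]):
    print(update)
```

Because `process_stream` is a generator, it works on an unbounded input as well; results appear while the input is still arriving.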

micro-batch processing systems

very small batches
balance between latency and throughput
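A sketch of the micro-batch middle ground, under the simplifying assumption that batches are cut by count (real engines typically cut them by a time interval). The helper names are hypothetical. Each small batch is processed as a unit, so overhead is paid per batch rather than per event, while results still arrive much sooner than in a full daily batch.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[str], size: int) -> Iterator[List[str]]:
    """Cut an incoming event sequence into small fixed-size batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_micro_batches(events: Iterable[str], size: int) -> Iterator[Dict[str, int]]:
    """Process each small batch as a unit and emit a snapshot after every batch."""
    counts = Counter()
    for batch in micro_batches(events, size):
        counts.update(batch)  # one update per batch, not per event
        yield dict(counts)


for snapshot in process_micro_batches(["a", "b", "a", "c", "a"], size=2):
    print(snapshot)
```

Raising `size` moves the behaviour toward batch processing (more throughput, more latency); setting it to 1 degenerates into per-event streaming.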

Spark vs MapReduce

MapReduce lets you develop and run a huge number of parallel computations on many machines at the same time. However, every job has to read its input from disk and write its output to disk, so the lower bound on a job's execution time is set by disk I/O speed.
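A single-process sketch of why disk sets the floor, assuming a word-count job (all function and file names are hypothetical; real MapReduce distributes the phases across machines). The map phase reads input from disk and writes intermediate pairs back to disk, and the reduce phase reads them again before writing its own output to disk. Chaining several such jobs repeats this disk round trip for every job.

```python
import json
import os
import tempfile
from collections import defaultdict


def map_phase(input_path: str, intermediate_path: str) -> None:
    """Read input from disk and write (word, 1) pairs back to disk."""
    with open(input_path) as src, open(intermediate_path, "w") as out:
        for line in src:
            for word in line.split():
                out.write(json.dumps([word, 1]) + "\n")


def reduce_phase(intermediate_path: str, output_path: str) -> None:
    """Read the intermediate pairs from disk, sum them, write the result to disk."""
    totals = defaultdict(int)
    with open(intermediate_path) as src:
        for line in src:
            word, n = json.loads(line)
            totals[word] += n
    with open(output_path, "w") as out:
        json.dump(dict(totals), out)


def word_count_job(input_path: str, workdir: str) -> str:
    """One job = disk read -> map -> disk write -> disk read -> reduce -> disk write."""
    intermediate = os.path.join(workdir, "intermediate.jsonl")
    output = os.path.join(workdir, "output.json")
    map_phase(input_path, intermediate)
    reduce_phase(intermediate, output)
    return output


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "input.txt")
        with open(path, "w") as f:
            f.write("a b a\nb c\n")
        with open(word_count_job(path, d)) as f:
            print(json.load(f))
```

Every arrow in `word_count_job` touches the file system, so no amount of extra CPU makes the job faster than the disk can move its data.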