OLAP vs. OLTP (Chapter 3 — Designing Data-Intensive Applications)
Here are my notes from the third chapter of the book — Designing Data-Intensive Applications by Martin Kleppmann.
The chapter focuses on data structures used for storing and querying databases.
- This is Part-IV and the last post of chapter 3. In the previous three posts ,we looked at an alternate storage mechanism to B-tree, known as LSM trees which are optimized for write-heavy workloads.
- In this post,
- we take a step back to draw another classification scheme for storing and retrieving data in databases — OLAP vs. OLTP
- re-visit LSM vs. B-trees with respect to the above higher-level classification.
OLAP/OLTP
1. While we focused on comparing databases as LSM vs. B-tree models in the earlier posts, they are common also in the sense that they both primarily for OLTP class of storage engine.
OLTP (Online Transaction Processing) systems are user-facing which implies they see a large volume of requests but each request touches only a small number of records. Think of a customer account query in a bank or a transaction during checkout at a grocery shop. Main bandwidth: Disk seek time.
Compare this to OLAP (Online Analytical Processing) systems which are known for summarizing, aggregating type data operations. As a consequence of the nature of these operations, they touch a large number of records. Main bottleneck — disk bandwidth.
2. Just like we compared LSM and B-trees under OLTP category, row vs. column-oriented storage models can be compared under OLAP. Here’s an example comparing row vs. column-oriented storage formats.
Source: DDIA Book, Chapter 3.
Benefits of column-oriented storage for analytical queries:
- Better compression since all values of a single column are stored together.
- Better IO utilization since analytical queries typically access/process a subset of columns in the database table.
This concludes posts around chapter 3 from this book. Do drop a note if this helps or for any feedback/questions.
Mahesh SV