Chapter 2 — Querying paradigms (Designing Data-Intensive Applications)

Jul 27, 2021

This post covers my notes on the query-language section of Chapter 2 of Designing Data-Intensive Applications by Martin Kleppmann.

The chapter focuses on the following theme:

In Part I, we looked at different types of data models and the specific use cases in which each of them excels. The author compares the relational, document, and graph data models in detail. This post addresses the following question: if data models are a way of representing data, what are the paradigms for querying them? The three covered here are declarative, imperative, and MapReduce.

The main differences between them are as follows.

Declarative: In the declarative style, you specify only what information you are looking for, not how to retrieve it. For example:

SELECT * FROM animals WHERE family = 'Sharks';

In the SQL query above, you are just telling the database engine what you need: information about animals that belong to the Sharks family. You do not care whether, behind the scenes, the engine uses an index or scans every record to find the required information.

So, from that perspective, the declarative style is simple for the end user.
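To make the "what, not how" idea concrete, here is a minimal JavaScript sketch. It assumes a hypothetical `select(table, where)` helper standing in for the database engine, and the `animals` rows are made-up sample data. The caller passes only a description of the rows it wants (`{ family: "Sharks" }`); how those rows are found stays inside `select`.

```javascript
// Hypothetical sample data; not from the book.
const animals = [
  { name: "Great white", family: "Sharks" },
  { name: "Clownfish", family: "Anemonefish" },
  { name: "Hammerhead", family: "Sharks" },
];

// Hypothetical stand-in for a database engine. The caller describes
// WHAT it wants as column/value pairs (like a WHERE clause); HOW the
// matching rows are found is entirely up to this function.
function select(table, where) {
  const conditions = Object.entries(where);
  return table.filter((row) =>
    conditions.every(([column, value]) => row[column] === value)
  );
}

// Roughly equivalent to: SELECT * FROM animals WHERE family = 'Sharks';
const sharks = select(animals, { family: "Sharks" });
console.log(sharks.map((a) => a.name).join(", ")); // prints "Great white, Hammerhead"
```

The caller never mentions loops or indexes; that detail is what a real engine is free to change.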

Another aspect: for the query above, suppose the engine goes through each record and returns it if it matches the filter (the WHERE clause).

Now, consider the scenario where you add an index on the family attribute to retrieve the same information faster. Would that require any change to your query above?

The answer is no! In other words, the declarative style lets the database change its implementation without requiring any change to the user's query.
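The index scenario can be sketched with a toy engine in JavaScript. `ToyEngine`, `createIndex`, and `query` are hypothetical names, and the hash index is a heavy simplification of what a real database does. The point is only that the caller issues the identical query before and after the index exists, while the engine alone switches from a full scan to an index lookup.

```javascript
// Hypothetical sample data; not from the book.
const animals = [
  { name: "Great white", family: "Sharks" },
  { name: "Clownfish", family: "Anemonefish" },
  { name: "Hammerhead", family: "Sharks" },
];

// A toy "engine" for illustration only. Queries are always the same
// (column, value) description; the engine picks the access path.
class ToyEngine {
  constructor(rows) {
    this.rows = rows;
    this.indexes = new Map(); // column name -> Map(value -> matching rows)
  }

  // Build a simple hash index on one column.
  createIndex(column) {
    const index = new Map();
    for (const row of this.rows) {
      const key = row[column];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(row);
    }
    this.indexes.set(column, index);
  }

  // Execute an equality query: WHERE column = value.
  query(column, value) {
    const index = this.indexes.get(column);
    if (index) {
      // Index lookup: jump straight to the matching rows (copied so
      // callers cannot mutate the index).
      return [...(index.get(value) ?? [])];
    }
    // No index: full scan, checking every row against the filter.
    return this.rows.filter((row) => row[column] === value);
  }
}

const engine = new ToyEngine(animals);
const before = engine.query("family", "Sharks"); // full scan
engine.createIndex("family");
const after = engine.query("family", "Sharks"); // same query, now an index lookup
console.log(before.length, after.length); // prints "2 2"
```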

A few additional points to note:

The author presents another example of a query language that follows the declarative style: Cypher, a query language for property graphs created for the Neo4j graph database. SPARQL is another declarative language, used to query RDF data in triple-stores.
The declarative style is not limited to database query languages. CSS and XSL follow the same paradigm, and the book offers examples of both.

Imperative: In the imperative paradigm, you not only specify what you need but also spell out the step-by-step procedure for getting it. For example, the same query written imperatively in JavaScript:

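function getSharks() {
  // Assumes a global `animals` array of objects with a `family` property.
  var sharks = [];
  for (var i = 0; i < animals.length; i++) {
    // Keep only the records whose family is "Sharks".
    if (animals[i].family === "Sharks") {
      sharks.push(animals[i]);
    }
  }
  return sharks;
}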

Code snippet source: Book, Page 42

In the database world, declarative query languages such as SQL are generally considered better than imperative query APIs. They hide the implementation details of the engine, which leaves the database free to introduce performance improvements (for example, using a new index) without requiring any change to existing queries.

Food for thought:
Of the declarative and imperative styles, which do you think is more amenable to parallel execution across multiple cores?
Give it some thought and see whether your answer matches the one at the end of the post.

MapReduce: This is neither a declarative query language nor a fully imperative query API, but somewhere in between. The author highlights the imperative aspect by noting that "the logic is expressed with snippets of code", but does not comment on the declarative aspect. I believe the map and reduce abstractions represent the declarative, high-level side of it.

Much of the MapReduce paradigm's usefulness in distributed computing comes from a constraint it enforces: the map and reduce functions must be pure functions. They can only use the data passed to them as input, cannot perform additional database queries, and must not have side effects. In other words, they are self-contained, which lets the framework run them anywhere, in any order, and rerun them on failure.
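The pure-function constraint can be illustrated with a minimal single-process sketch in JavaScript. It is loosely modelled on the book's example of counting shark sightings per month, but `runMapReduce`, `mapSightings`, `reduceSightings`, and the sample records are hypothetical. This is not MongoDB's actual `mapReduce` API, and nothing here is distributed.

```javascript
// Hypothetical observation records; the values are made up.
const observations = [
  { observationTimestamp: "1995-12-25T10:00:00Z", family: "Sharks", numAnimals: 3 },
  { observationTimestamp: "1995-12-12T16:17:00Z", family: "Sharks", numAnimals: 4 },
  { observationTimestamp: "1996-01-05T09:30:00Z", family: "Sharks", numAnimals: 2 },
  { observationTimestamp: "1995-12-20T08:00:00Z", family: "Dolphins", numAnimals: 7 },
];

// Pure map function: looks only at its input record and returns zero or
// more [key, value] pairs. No queries, no side effects.
function mapSightings(record) {
  if (record.family !== "Sharks") return [];
  const yearMonth = record.observationTimestamp.slice(0, 7); // e.g. "1995-12"
  return [[yearMonth, record.numAnimals]];
}

// Pure reduce function: combines all values emitted for one key.
function reduceSightings(key, values) {
  return values.reduce((sum, n) => sum + n, 0);
}

// Minimal single-process MapReduce runner (a sketch, not a real framework).
function runMapReduce(records, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values
  for (const record of records) {
    for (const [key, value] of mapFn(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const sightingsPerMonth = runMapReduce(observations, mapSightings, reduceSightings);
console.log(JSON.stringify(sightingsPerMonth)); // prints {"1995-12":7,"1996-01":2}

// Because map and reduce are pure, processing the input in a different
// order yields the same totals.
const reversed = runMapReduce([...observations].reverse(), mapSightings, reduceSightings);
console.log(reversed["1995-12"] === sightingsPerMonth["1995-12"]); // prints true
```

Since neither function reaches outside its arguments, a real framework could run `mapSightings` on different machines for different records and still combine the results correctly.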

Answer to the question above

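The answer is the declarative paradigm. A declarative query specifies only the pattern of the results, not the algorithm used to compute them, so the engine is free to choose a parallel implementation. In the imperative style, you specify a procedure whose instructions must run in a particular order and may depend on one another, which makes it hard to parallelize.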
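The answer can be illustrated with a small JavaScript sketch of partitioned execution. The `partition` helper and the sample rows are hypothetical, and nothing here actually runs on multiple cores. The sketch only shows why a declarative filter parallelizes well: because the query states just the condition, an engine can apply it to each partition independently and merge the results.

```javascript
// Hypothetical sample data; not from the book.
const animals = [
  { name: "Great white", family: "Sharks" },
  { name: "Clownfish", family: "Anemonefish" },
  { name: "Hammerhead", family: "Sharks" },
  { name: "Bottlenose", family: "Dolphins" },
  { name: "Tiger shark", family: "Sharks" },
];

// Split the table into roughly equal, contiguous partitions.
function partition(rows, numPartitions) {
  const size = Math.ceil(rows.length / numPartitions);
  const parts = [];
  for (let i = 0; i < rows.length; i += size) {
    parts.push(rows.slice(i, i + size));
  }
  return parts;
}

// The declarative query reduces to a predicate: WHERE family = 'Sharks'.
const isShark = (row) => row.family === "Sharks";

// Each partition is filtered independently; no partition depends on
// another, so a real engine could run these on separate cores or
// machines. Here they simply run one after another.
const partialResults = partition(animals, 2).map((part) => part.filter(isShark));

// Concatenating the partial results gives the same rows, in the same
// order, as filtering the whole table at once.
const merged = partialResults.flat();
console.log(merged.map((a) => a.name).join(", ")); // prints "Great white, Hammerhead, Tiger shark"
```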

That's all for this chapter. The next post goes under the hood into database storage internals and the data structures used there.

Thanks for reading! It would be great to hear whether you liked the post, or any questions you may have.

Written by Mahesh S Venkatachalam