Ep. 98, Monday, January 26, 2026

Replication, Partitioning, & Transactions - Designing Data-Intensive Applications by Martin Kleppmann

Book Covered

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

by Martin Kleppmann

Get the book →

Book links are affiliate links. We earn from qualifying purchases.

Author

Martin Kleppmann

Hosts

Carter Morgan, Host
Nathan Toups, Host

Transcript

This transcript was auto-generated by our recording software and may contain errors.

Carter Morgan (00:00)

This is kind of all of the low-level problems that a lot of the existing databases are solving. The people who designed the databases need a lot of the advice in this book, and this book is much better for understanding problems

and not necessarily for understanding the solutions.

Hey there, you're listening to Book Overflow, the podcast for software engineers by software engineers, where every week we read one of the best technical books in the world in an effort to improve our craft. I'm Carter Morgan. I'm joined here as always by my co-host, Nathan Toups. How are you doing, Nathan?

Nathan Toups (00:35)

Doing great. Hey, everybody.

Carter Morgan (00:37)

Well, thanks for listening, everyone. As always, like, comment, subscribe wherever you're at. Leave a five-star review if you're on an audio platform. You can always book time with us on Leland if you want some one-on-one coaching with Nathan or I. You can join our Discord if you'd like, we have a link for that, and we're giving out a special, Nathan, I need to know more about Discord terminology, are these badges?

Nathan Toups (00:58)

Yeah, they're called roles, and they can have special privileges. I think we're going to do something cool with them later. Alpha Tester just closed out with the first 128 members on the server. You could still become a Beta Tester, which is our second round of new folks. I think we're going to go up to 512 users there. And you'll have this distinction for the rest of the life of the server, that you were an OG.

Carter Morgan (01:01)

Roles, okay.

There we go. So get in now, folks. And yeah, I think that's it for housekeeping up top. This is another week of Designing Data-Intensive Applications. Thanks everyone who listened to the first episode. It's had a great response from our listener base, and you've got lots more episodes coming your way. I think we said this is going to be a four-parter. Maybe it'll be a five-parter. I don't know. There's going to be a lot of this. Yes. And boy, it was.

Nathan Toups (01:45)

It's getting denser as we go. We'll see what happens.

Carter Morgan (01:51)

dense this week. We read chapters five through seven this week. Let's do the author introduction and the book introduction for anyone who forgot from last week or is just tuning in for this episode. The author is Martin Kleppmann. He is an associate professor at the University of Cambridge, where he works on distributed systems and local-first collaboration software. Before academia, he was in the trenches. He co-founded Rapportive, which was acquired by LinkedIn in 2012, where he worked on large-scale data infrastructure.

He's also one of the people behind Automerge, an open source library for building collaborative applications. And the book introduction is: data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all of these buzzwords? In this practical and comprehensive guide,

author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice and how to make full use of data in modern applications. So we read chapters one through four last week. We're on chapters five through seven this week. This book is divided into four clean parts, but they're not all equal in length.

Nathan Toups (03:16)

Oh, no, it's

actually, so yeah, a correction from last week: it's three parts, and we're having to subdivide more and more, because they get long. Yeah. Yeah.

Carter Morgan (03:19)

But, ⁓ what do we know?

Yes, yes. So I guess three parts. And again,

how we divide this episode up is, or this series of episodes up is still anyone's guess.

Nathan Toups (03:35)

We don't

quite finish part two. So I think that there's either one or two more chapters for part two, which is all about distributed data.

Carter Morgan (03:38)

Yes.

I guess,

yes, let me, at the top of the episode, why don't I give the chapters, the titles of the chapters we read, just so everyone has an idea. Let me see, I don't wanna get these wrong, so I'm pulling it up right now. Okay, so.

That would be, wait just a sec. What are they? Do you have them? Okay, you got them. Yes, okay.

Nathan Toups (04:07)

Yeah, I actually have them right here. yeah, chapter five is on replication, chapter six

is on partitioning, and chapter seven is on transactions. And so that's how far we got. We finished transactions for this discussion episode. And yeah, I can go ahead and share some general thoughts that I have if you want. So.

Carter Morgan (04:25)

Please do, please do.

Nathan Toups (04:28)

It's dense. And I think that we're gonna get denser and denser, right? I feel like we're getting deeper into a forest. Like we were kind of at the fun outskirts where most of the people take their hikes, and we've gone down a trail that maybe hasn't been maintained as much. And this is kind of the feedback, like even the comments in our first episode, they're like, yeah, I've gotten to chapter three, but I haven't read past that. And I was like, if I wasn't reading this for the podcast, I easily could have gotten stuck at chapter three or chapter four.

Carter Morgan (04:30)

really.

Yeah.

Right.

Nathan Toups (04:56)

And then, like, you know what? I've got other stuff I need to do. I still think you can get a lot out of the book, but I will tell you, it's been rewarding to hold myself accountable. These are important chapters. Distributed data is hard. And I think that one of the things that we're gonna really kind of go over here is that...

there's a lot of really great solutions that didn't exist in 2017 that are hosted and managed solutions. And you should probably not have to solve these problems yourself. That being said, you won't get an appreciation for why PlanetScale or AWS Aurora is amazing unless you read these chapters, right? This is really gonna give you a deep understanding of, like, man, I don't wanna do that myself. Like self-managing, you know.

Carter Morgan (05:22)

Yes.

Right, right.

Nathan Toups (05:45)

Dirty reads or something like that's hard. That's a hard problem. So yeah
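As an aside for readers: the "dirty read" Nathan mentions can be sketched in a few lines. The toy `InMemoryDB` class below is invented for this illustration; real databases implement read committed very differently, but the visible effect is the same.

```python
# Toy illustration of a dirty read: without isolation, one transaction can
# observe another transaction's uncommitted (and later rolled-back) write.

class InMemoryDB:
    """Holds committed data plus per-transaction uncommitted writes."""

    def __init__(self):
        self.committed = {}       # key -> committed value
        self.uncommitted = {}     # txn_id -> {key: value}

    def write(self, txn_id, key, value):
        self.uncommitted.setdefault(txn_id, {})[key] = value

    def read(self, txn_id, key, read_committed=True):
        if not read_committed:
            # Dirty read: see ANY transaction's uncommitted write.
            for writes in self.uncommitted.values():
                if key in writes:
                    return writes[key]
        return self.committed.get(key)

    def commit(self, txn_id):
        self.committed.update(self.uncommitted.pop(txn_id, {}))

    def abort(self, txn_id):
        self.uncommitted.pop(txn_id, None)


db = InMemoryDB()
db.committed["balance"] = 100

db.write("T1", "balance", 999)          # T1 writes but has NOT committed
dirty = db.read("T2", "balance", read_committed=False)
clean = db.read("T2", "balance", read_committed=True)
db.abort("T1")                          # T1 rolls back; 999 never "existed"

print(dirty)  # 999 -> T2 saw data that was later rolled back
print(clean)  # 100 -> read committed hides uncommitted writes
```

The dirty reader acted on a balance of 999 that, after the abort, was never part of the database's history, which is exactly why this is treated as a hard correctness problem.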

Carter Morgan (05:50)

Every time he mentions, like, this is a problem that developers have struggled with for years and at present no solution exists, I'm like, huh, I wonder if there is a solution now, right? Beats me. Yeah. Yeah. I mean, yeah, this book is dense. And I was just thinking about this, because I'm listening to this on audiobook, going back occasionally to read the ebook.

Nathan Toups (06:01)

I think this is a call to action. I think a whole entire generation of startups are like, okay, I read Kleppmann's book. Let's solve this problem.

Carter Morgan (06:20)

And, you know, sometimes I feel bad because, like, I'm not committing everything that this book says to memory. But then I had the thought, I was like, okay, let's say I'm at a university teaching undergrad and I'm teaching Designing Data-Intensive Applications, right? It's like a senior-level class. Like, I don't think I would use this entire book as the textbook. And I don't mean that in a bad way, as in there are not useful parts of it, but that there is so much material in this textbook that for an entire

college-level class, I think this would be too much to expect a student to kind of understand over the course of an entire class. And so I'm giving myself some credit covering it over the course of four episodes of a podcast. But this book does expose you to a lot of surface area. And I think even if you listen to this or read it and kind of come away like, okay, that concept I didn't entirely grok,

it's still a concept you understand a lot better. And I'm finding that with my experience listening to this: anything which I have kind of heard of before, and he has a chapter or a section of a chapter really digging deeper into it, I'm getting a lot out of that. Like ACID, he had a whole chapter on ACID and what that means and kind of the evolution of it. I'm like, okay, this is great, because I already had a decent understanding of ACID and now, you know, I'm getting a deeper understanding.

And then we get to some parts like, I don't know. I'm just trying to even see here. Like, there's a lot, guys. Like, um, the whole transactions chapter, like snapshot isolation and weak isolation levels and serial execution, two-phase locking. And I'm just like, okay, I have now heard these terms, and in the future, if I hear them again, there'll be, you know, some good surface area there. But, uh, yeah, it's a lot. So.

Nathan Toups (07:59)

yeah, snapchat.

Carter Morgan (08:16)

I guess that's just my, maybe a call to sympathy for anyone who's reading this. If you're reading this and going, like, man, there's a lot here and I don't feel like I'm quite getting all of this. This is our, not our job, but, you know, Nathan and I read a lot of software engineering books. So if Nathan and I are coming away from this thing like, there's a lot here and I'm not grokking all of it, it's like, well, no, I don't know. It's a really intense book. And so I get why

there's a badge of honor within our community if you've read this book, and we will claim that badge of honor.

Nathan Toups (08:50)

Also, another quick shout out to the Discord: we open these books up for discussion in, we have like a forum section on book discussions. So if you're ever curious about a preview of our thinking on stuff, or if you even want to hear what the other readers who are now starting to join along are thinking, it's a really cool opportunity. I was able to kind of talk about

where my brain was on some of this in that discussion forum. And so that was really useful, because I actually forgot to fill out my general thoughts in our guide document this week. And I was like, let me go back and look in the Discord and see what I was actually thinking.

Carter Morgan (09:26)

Yeah.

Yeah, with that, we can get right into it. Again, beginning at part two, we've got chapters five through seven here. I love the part two introduction. It says, why distribute data? And it kind of gives a couple of reasons. Because, you know, your simplest database setup can just be a single database on a single machine, right? But then he gives a couple of reasons for why you want to do this. He says scalability, which makes a lot of sense. Basically, if you max out that one database:

it's well known that vertical scaling, the idea of making one machine stronger, you know, more RAM, more storage, has its limits and becomes expensive much more quickly than horizontal scaling. And so if you can distribute your data across multiple machines, you can get a higher level of scalability. He talks about fault tolerance and high availability. So that's different from scalability, because if you just had one copy,

you distribute across multiple nodes, but if each one of those machines has a unique copy of its data, then you're not necessarily achieving fault tolerance or high availability, at least for that subset of the data. I guess you could say if one section goes down, you can still serve data from all of the other sections. And then latency is another characteristic he talks about here, which is that if you can distribute your data to be closer to where the actual user is accessing it,

you can decrease latency. And again, this would be more in the case where you have multiple copies of the same data. Which you might think, well, that sounds pretty easy, right? Just have multiple copies distributed. No. That's what this whole section is about: how stinking hard it is to have multiple copies of even the same data.

Nathan Toups (11:46)

Yeah, it's nuts. And I think when I was reading these, I kept thinking about, like, you're spinning plates. Like I feel like when you start building your custom things, it's like, okay, I'm spinning this plate, I'm spinning this plate. And then you're like those old comedy routines where the guy's running around trying to keep all the plates spinning. This book really does a great job

of being like, yeah, so you think this is a great idea. Here's all the things that can go wrong. I think a lot of, especially in databases, databases have been around a long time, right? They actually predate most computing for most people, right? Companies were buying mainframes with database infrastructure for a very, very long time. SQL's been around a tremendous amount of time. And they are also probably one of the worst.

Carter Morgan (12:11)

Yeah.

Nathan Toups (12:35)

places for marketing jargon. And he gets into this, where, you know, what does it even mean to be fault tolerant? What does it even mean to have high availability? You really start having to dig into, like, past a claim. And he doesn't talk about this too much in the book. Actually, at the beginning of chapter eight, I noticed, I saw a peek of it, and he mentions Jepsen, I think. Which, are you familiar with Jepsen at all? I'm gonna bring it up for a hot second. Yeah, so.

Carter Morgan (13:02)

I am not familiar with Jepsen. I'm familiar with

the Jetsons TV show.

Nathan Toups (13:05)

Yeah, Jetsons, yeah, no, so

Jepsen is this, it was a project that came out of Stripe, I think. Correct me in the comments if I'm wrong. But Aphyr is the person who wrote it. It's all written in Clojure. There's this whole thing, and the people who are into Jepsen are really into it. But it's a testing framework to test the claims of distributed databases. And the whole reason I'm bringing this up is that

there were a lot of companies, like MongoDB, that made claims about what their system could do and how resilient it was. And Jepsen would just blow it out of the water. It would come up with failure states. That's why Mongo's gotten a lot better over the years. So it got a really bad rap around this 2015 to 2017 time period, because there were some pretty catastrophic failures when it came to, like, the...

Carter Morgan (13:43)

Really? Wow.

Nathan Toups (13:59)

consensus being divided and then coming back together, and how does it heal, and who's in charge, and how do you decide who the leader is, and all these other kinds of resiliency claims. And Jepsen kind of changed the whole framing of, like, how do I make a claim about how my system shards, or how replication works, or how partitioning works, or the whole leader versus leaderless thing. And so I will say that, like, I think it's books like this and tools like Jepsen

that have really matured the conversation. And I wrote down some stuff, some research I had done, and we'll talk about it later. But there's, like, whole categories of databases that didn't exist. Like, I think technically Spanner existed, Google's Spanner database existed in 2017. I don't think it was GA yet, and so I don't think most people could use it. But now, if you need a consistent distributed global database, it's like a,

Carter Morgan (14:43)

Yeah, yeah.

Yeah, was.

Nathan Toups (14:57)

it's a switch that you turn on in your Google Cloud, right? Again, you won't appreciate how amazing of a technology Spanner is, or CockroachDB, or a lot of these other kinds of tools are, unless you've read books like this and go, oh man, replication's hard, I'm not doing this.

Carter Morgan (15:00)

Yeah.

Yeah, right. And I think that's what's challenging about reading a book like this: how much of this is directly applicable to most software engineers' work, right? Cause at what point do you just use Google Spanner, right? Or MongoDB? And I think we're going to talk about this later, but there's even just a question of, like, how many people run into these kinds of issues

in their career. I mean, if you're at any sort of startup or small company that starts running into these sorts of issues, like, you've hit the jackpot, right? Because that means your company's probably doing really, really well. And even if you get to some really, really big companies, like Big Tech, a lot of times a lot of this stuff is somewhat abstracted away from you, and they'll have entire teams dedicated to maintaining these custom, you know, intensive databases to make it so you don't have to worry about all this stuff.

Which I am very grateful for, because, yeah, this whole book is like, sounds simple, right? Wrong. And also this book says, like, a million times, we use this term in the industry, but really this term is poorly defined, or it is poorly named because of this, this, and this. I'm just like, dang. And, like, Kleppmann, he knows his stuff. But man.

Nathan Toups (16:22)

Right?

Yeah, and

it's a very approachable tone. I will say, this book, dense as it is, is actually an easy read from any individual section that you're in. You might have to pause and go, what did I just read? You know, like, let me really think. He'll give some, especially later when we get into these kind of contrived but useful thought experiments, like a scheduling system for,

Carter Morgan (16:42)

Yes, yes.

Yes.

Yeah.

Nathan Toups (17:06)

like, an office meeting room, and how you can get these weird data races. And you're like, yeah, that is hard to reason about. You kind of have these little aha moments. I had to rewind it a few times. I was listening to the audiobook for a good chunk of it. Like I said, I'd go back and look at the diagrams and some other things, but there were definitely times in which it would make me start thinking, and then I realized I didn't hear the last three paragraphs and I had to rewind.

Carter Morgan (17:16)

Yeah.

Yeah.

Well, I'll say, if Kleppmann is listening and taking advice for the second edition: he does give examples every now and again, like a meeting scheduling software, or one is about doctors at a hospital and their on-call schedule, right? And anytime he does that, I'm like, oh, okay, I see what we're talking about here. I wish he did it more often. I think if he did it more often, or if, like, every kind of concept had an example like that, then

Nathan Toups (17:44)

Yeah, that was a good one.

Carter Morgan (18:00)

that might help it stick a little more. But as it stands, yeah, I would agree, it's fairly approachable. And so when we talk about distributed data, he talks about two mechanisms for data distribution. And we talked a bit about this up top.

The two mechanisms are replication, which is the same data on multiple nodes, and partitioning, which is different data on different nodes. And it's really important to kind of understand the differences between these two concepts. And that's basically what the whole episode is devoted to. Because a lot of times, like, if you're in a system design interview and we're talking about large amounts of data, what we jump to immediately is partitioning. We talk about sharding, you know, as our data grows. But

replication is also important, because that helps you achieve fault tolerance and lower latency, like we talked about at the beginning. But replication is not as easy as flipping a switch. There is a lot that goes into making sure that your replicated data is consistent across the entire system. And so, yeah, I guess we can get into that with chapter five, replication. Nathan, is there anything that stands out to you here when we talk about the importance of data replication?
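For readers, the distinction Carter draws can be sketched in a few lines. The node names and helper functions below are invented for illustration: replication puts the same record on every node, while partitioning hashes each key to exactly one owning node.

```python
# Sketch contrasting the two distribution mechanisms from the chapter:
# replication (same data on every node) vs partitioning (each key on one node).

import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names


def replicate(key, value, stores):
    """Replication: write the same record to every node's store."""
    for node in NODES:
        stores[node][key] = value


def partition_for(key):
    """Partitioning: hash the key to pick exactly one owning node."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]


stores = {node: {} for node in NODES}
replicate("user:42", {"name": "Ada"}, stores)
owner = partition_for("user:42")

# Every node holds the replicated record...
print(all("user:42" in stores[n] for n in NODES))  # True
# ...while partitioning would place it on exactly one owner node.
print(owner in NODES)  # True
```

Real systems combine both: each partition is itself replicated across several nodes for fault tolerance.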

Nathan Toups (19:18)

So there's a few reasons. You really need to understand, and this is what the chapter really dives into, what's the shape of the change management of your data? If you're doing something that writes occasionally and reads a ton, there's a lot more stuff you can do as far as how you distribute read data.

And so, for instance, we did this actually at my fintech company, and I think this would be worthwhile bringing up, because we had to think a lot about this. We'd have very high read traffic. We would do a bunch of model training, and we were doing like 30,000 requests per second. But the data would only change on a big data pipeline at night, you know, between like midnight and five in the morning. So we could do really aggressive distributed caching,

because the data wasn't changing all the time. And so it made it so that we could highly optimize for read throughput. But if we were constantly changing the data through the day, this would have been an inappropriate architecture. So we could make these really cool replication patterns, specifically with edge caching and a bunch of other kinds of cool stuff, and know that the data wasn't stale. And this is the piece that we're talking about here: as soon as you start having

Carter Morgan (20:22)

Mm-hmm.

Nathan Toups (20:40)

transaction data mixed in with a bunch of distributed data, replication gets really hard. Because you're like, in which part of your system is it okay to get a stale read, right? So the idea is, say you're playing a video game, I think he brings this up in this section: what is essential to block on and get the correct information, versus what can be asynchronous? And if you get, kind of, like,

Carter Morgan (20:52)

Yeah.

Nathan Toups (21:06)

you know, a total hit count, maybe that can have a lag to it. Or maybe you can get eventually consistent reads of, like, the number of people watching your video stream. Versus, should you block and show literally everyone on the stream the exact watcher count at all times? If that's the case, then you have a much harder distributed systems problem, right? And so, I thought it was kind of cool, because there was no "this is right, this is wrong." It's really just thinking about, like,

well, what does it mean if you have to have synchronous data and it's a distributed system? You also then have to have some way of guaranteeing consistency, right? I think we get into the ACID stuff later. The idea of consistency or eventual consistency is a big part of data replication. You know, if you have atomic consistency, meaning give me the state of the data, the database blocks

and says, here's the authoritative answer, and it gives it to everyone. That can actually be very expensive in large-scale systems. You end up having all these coordination problems. Or you could go asynchronous, where you're like, you know, writes will always be consistent, we will be very pedantic about that, but maybe reads can be asynchronous. Yeah, it doesn't really matter if, like, my profile description on my Twitter account

Carter Morgan (22:30)

Right.

Nathan Toups (22:34)

isn't seen by everyone as soon as I make the update, you know?
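The synchronous-versus-asynchronous trade-off Nathan describes can be sketched as follows. The `Leader` and `Follower` classes are invented for this toy: a synchronous write blocks until every follower has applied the change, while an asynchronous write returns as soon as the leader has it, leaving followers briefly stale.

```python
# Sketch of synchronous vs asynchronous replication on a single leader.

class Follower:
    def __init__(self):
        self.data = {}
        self.pending = []           # replication stream not yet applied

    def receive(self, key, value):
        self.pending.append((key, value))

    def apply_pending(self):
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value, synchronous):
        self.data[key] = value
        for f in self.followers:
            f.receive(key, value)
        if synchronous:
            # Block until every follower has applied the write
            # before acknowledging the client.
            for f in self.followers:
                f.apply_pending()
        return "ack"


followers = [Follower(), Follower()]
leader = Leader(followers)

leader.write("x", 1, synchronous=True)
print([f.data.get("x") for f in followers])   # [1, 1] -> no stale reads possible

leader.write("y", 2, synchronous=False)
print([f.data.get("y") for f in followers])   # [None, None] -> stale until applied
```

The synchronous path buys consistency at the cost of latency and availability (one slow follower stalls every write); the asynchronous path is fast but opens the stale-read window the hosts are discussing.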

Carter Morgan (22:38)

Right,

right. Well, and this is the whole idea behind the CAP theorem, right? That for a database, from what we understand, there's consistency, availability, and partition tolerance, and you can have two of the three. So your kind of large distributed NoSQL database has availability and is resistant to partitioning, but it is not consistent, right?

And there are ways you can sort of fudge it a bit. So a very common pattern in a distributed database is the, I mean, there's a lot of names for it. In the past this was called the master-slave pattern; now there's leader-follower, primary-replica. I think leader-follower is what this book goes with, and so that's the terminology we'll use.

But that's the idea: all writes must go to the leader, which will then eventually write those to the followers, and then the reads can happen from the followers. But there are some examples where you don't want to just send a read to any follower, because you want, from the user's perspective, that data to appear consistent. So like you were talking about: if I go to someone's Twitter bio, and it turns out they actually updated it in the past minute,

and it's not actually updated yet, that's probably not a big deal to me. But what about my own Twitter bio? I update it, and I want to know that it was updated. And so if I update it and the system tells me, oh yeah, it's going to be updated eventually, how am I supposed to know that? It's just gonna look broken to me. So they say, or Kleppmann says, a common pattern for all user-related data is

to make sure that it is read directly from the leader node. So you will write and update the leader immediately, and then you're always getting your own data from the leader node. So to you, everything looks immediately available and entirely consistent, but it might not look like that for all other profiles on the website.
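A minimal sketch of that read-your-own-writes routing, with a hypothetical router function (the node names are invented): reads of a user's own profile go to the leader, while reads of anyone else's profile may be served by a possibly-lagging replica.

```python
# Sketch of read-your-own-writes routing in a leader-follower setup.

import random


def route_read(requesting_user, profile_owner, leader, replicas):
    """Pick which node serves a profile read."""
    if requesting_user == profile_owner:
        # The user may have just updated their own data: read from the
        # leader so their write is always visible to them.
        return leader
    # Anyone else can tolerate a slightly stale replica.
    return random.choice(replicas)


leader, replicas = "leader", ["replica-1", "replica-2"]

print(route_read("alice", "alice", leader, replicas))            # leader
print(route_read("bob", "alice", leader, replicas) in replicas)  # True
```

The book discusses other variants of the same idea, such as routing to the leader for a short window after any write, or having the client remember the timestamp of its last write.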

Nathan Toups (24:55)

Yeah, these are some really juicy problems, because you start running into things like determinism, you run into problems of, like, when did an event happen first? There's a lot of focus on how data replication,

Carter Morgan (25:09)

Right.

Nathan Toups (25:21)

I'm trying to, yeah, for instance: if you have to have consistency, maybe that's one of the design goals. You always want to see all of the same data all the time, right? So if I update my profile, everyone sees my profile update as soon as that's happened. I have to make a choice. I either say, well, I'll do consistency and availability, and my trade-off is that my system can't operate despite

network partitions, right? So I say that partition tolerance isn't something I can guarantee, if availability and consistency are my most important things. A single server running, like, Postgres is a very good example of making available, consistent data in a single place. But if now I need network partitions, like sharding in a distributed database, I can't guarantee that

the data that you see in one shard and the data you see in another shard are gonna be exactly identical, right? Now if I want consistency and partition tolerance, I might lose availability, right? So again, there are these trade-offs that we're now talking about, and there is no perfect answer. Now I will say, one of the things I thought was interesting, you talked about eventual consistency: it's this very amorphous term, right?

What does eventual mean? And I think this is one of the things I really appreciate about this book: it's left ambiguous. In some systems, eventual means sub three seconds, right? That's the eventual consistency. It's like a pretty quick thing. But if you need real-time data that's always accurate, this is not gonna meet that goal. But if it's like, hmm, three seconds of lag, who cares? That's still eventually consistent. You still could get a stale read in that three-second window.

That's pretty okay. And I think a lot of people, with, like, S3: for the longest time it was eventually consistent. You would get the correct answer the first time you wrote a blob to the blob storage, but if you ever made a change in the S3 key-value store, so if you made an update to the file, it was eventually consistent. So if people were requesting it, the change had to propagate through their data availability infrastructure, and you might get a stale read

for some period of time. That's different now. You can now tell S3 to be consistent: if I make a change, don't make the data available until it's actually the correct response. But again, the world's different than it was in 2017. Yeah.

Carter Morgan (28:00)

Yeah, but I think

it's just a good habit to get into as a software engineer, which is: we can't treat the tools we use as magic, right? I think you can pull something like Mongo off the shelf and be like, oh my gosh, infinitely scalable, perfect, this solves all of our needs. But you need to be aware of all of the trade-offs that come with that. We talked a bit about that last week with indexes, and making sure that you're not writing these really ridiculously slow queries. But another one of those is, like,

I mean, I'm thinking about this for myself. Like, we use Mongo. I'm not sure what Mongo's consistency guarantees are, right? And we've only got three nodes on our database, but what is the lag time there between when you write to the leader and when it ultimately shows up on the followers? I'm sure Mongo gives some amount of SLO regarding that, but I have no idea what it is.

Nathan Toups (28:51)

Yeah,

that would be really interesting. Also, one of the things the book brought up well, and I haven't always set this up, but I regretted it when I didn't: most managed databases have a replication lag metric that you can watch now. And that's a really nice one to set alerts on if you're doing asynchronous read replication, which, in a lot of workloads, makes total sense,

especially if you don't have to have up-to-the-second data. It'll tell you what the lag is for that read to be consistent. So it might be that it's 200 milliseconds, and that's pretty consistent. But if it gets backed up for some reason, and it goes up to four, five, six seconds, or it goes to two minutes, if I was an SRE, I would wanna know, right? Like, I would wanna know that all of a sudden the behavior of my application has changed, and it's something that we need to think about.
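That alerting idea can be sketched in a few lines. `fetch_lag_seconds` below is a stand-in for whatever your managed database actually exposes (for example, a monitoring API or a gauge scraped from the replica); the threshold and alert hook are likewise hypothetical.

```python
# Sketch of a replication-lag alert: poll a lag metric and fire an alert
# when it exceeds a threshold, as Nathan suggests an SRE would want.

LAG_ALERT_SECONDS = 5.0


def check_replication_lag(fetch_lag_seconds, alert):
    """Compare current lag against the threshold; fire the alert hook if high."""
    lag = fetch_lag_seconds()
    if lag > LAG_ALERT_SECONDS:
        alert(f"replication lag is {lag:.1f}s (threshold {LAG_ALERT_SECONDS}s)")
    return lag


fired = []
check_replication_lag(lambda: 0.2, fired.append)    # normal: no alert
check_replication_lag(lambda: 120.0, fired.append)  # backed up: alert fires
print(len(fired))  # 1
```

In practice you would run a check like this on a schedule from your monitoring system rather than polling by hand, but the logic is the same: the metric is only useful if something is watching it.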

Carter Morgan (29:39)

Right, right.

The terminology I find really funny is that if the leader node goes down, then a follower node may become the leader. And the term for this is an election: basically, a follower node will nominate itself to be the leader, and then all the other nodes in the database vote on who the new leader should be. And from my understanding, this is different across database systems, but basically, like,

the vote is not a preferential vote. It's an algorithm that runs to determine which particular node satisfies the criteria, exactly.
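A toy version of the election Carter describes, hedged heavily: real consensus algorithms also track terms with timeouts, compare log freshness, and handle message loss, all of which this sketch omits. The `Node` class and names are invented; the one rule kept is that each node votes at most once per term and a majority elects the leader.

```python
# Very simplified sketch of a majority-vote leader election.

class Node:
    def __init__(self, name):
        self.name = name
        self.voted_in_term = {}   # term -> candidate this node voted for

    def request_vote(self, term, candidate):
        # Grant at most one vote per term (first come, first served).
        if term not in self.voted_in_term:
            self.voted_in_term[term] = candidate
            return True
        return self.voted_in_term[term] == candidate


def run_election(candidate, nodes, term):
    """Candidate asks every node for a vote; a strict majority wins."""
    votes = sum(node.request_vote(term, candidate) for node in nodes)
    return votes > len(nodes) // 2


cluster = [Node(f"n{i}") for i in range(5)]
print(run_election("n0", cluster, term=1))  # True: n0 collects all five votes
print(run_election("n1", cluster, term=1))  # False: everyone already voted for n0
```

The one-vote-per-term rule is what prevents two candidates from both winning the same term, which is the toy version of the split-brain problem discussed later in the conversation.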

Nathan Toups (30:24)

Yeah, it's like a fitness function. And

the two big ones that they talk about in this book, I remember back at this time, because it's funny, they talk about this database technology called Riak, which I was obsessed with for a little while. Nobody cares about Riak anymore. But Riak was this really cool consensus thing. It used the Raft protocol, which was new at the time, actually. John Ousterhout is one of the authors of the Raft protocol.

Carter Morgan (30:48)

No.

Nathan Toups (30:51)

But Paxos and Raft were the two big consensus algorithms. There's another one called Gossip, which isn't talked about as much, but it's like a third algorithm for how you distribute information. But Raft has kind of won. So it's funny that you were talking about this. Paxos was very complex; Raft is much simpler. It's a very simple election model. And it's based off of, like,

you observe, you know, what's the latency? If you need to elect a new leader, it's like, how low is the latency? Are they giving me honest answers? There's several things. And then I'm like, I think this should be the leader. And if everybody agrees, then that's who gets elected. There are actually some really nice web visualizations of how the Raft protocol works, like how the consensus piece works and how chatty it is. And I'll tell you, if you've used etcd, which is the key-value store for Kubernetes,

Carter Morgan (31:19)

Mm-hmm.

Nathan Toups (31:43)

you're using the Raft protocol. And Consul, which is what HashiCorp uses, also uses the Raft protocol. It's become pretty ubiquitous since the writing of this book. But it's non-trivial. It really is, there are these hard problems. Actually, this section also gets into this, of like, how does replication work? They talk about

Carter Morgan (31:45)

Interesting.

Nathan Toups (32:10)

leader and leaderless replication, you know, there's all of these, like, how failovers function, what problems can happen. Like, for instance, let's say you get your network cut in two, you've elected two leaders, and then now the network heals, and you need to go back to having one real leader. Well, if this leader was coordinating writes, and this leader is coordinating writes, how do I heal that? How do I say which,

Carter Morgan (32:37)

Mm-hmm.

Nathan Toups (32:40)

you know, write-ahead log is the one that wins. And this is not a trivial problem. Literally any distributed database system has to come up with novel ways of handling this. And if you're managing these databases, you need to understand what happens when this kicks in. Cause you can have, just like we've had as software engineers, a merge conflict, right? Two of us are working on the same piece of data. You have to make decisions like, okay, well actually, you know what? Carter's work on this function

makes more sense than mine, but I actually fixed this other thing, and we have to come up with some complex way of fixing this merge conflict. Distributed databases have to do this automatically, or return some failure state to let you know that you can't reason about the system, there's something wrong, we don't know what the state of the data is. This book, again, gives you this really great breakdown of, like, ooh man, this is a hard problem.

Also, quick shout out, he talks about ways that you can do conflict resolution. I think I ended up talking about this at the very end of the podcast too, which is CRDTs, which is something that I know of, but I've never spent time with. That's conflict-free replicated data types. And so CRDTs, if you've used Google Docs, right? In your collaborative editing in Figma or Google Docs,

they're all using some form of a CRDT. It's some way of saying, let's say we're both working on paragraph two, who wins, who gets the data in there? If I delete something and you write something, there's a conflict resolution algorithm. These were brand new research in 2017, and there are major, major applications that are using very mature versions of CRDTs now. Down to Kleppmann, who is now working on Automerge. Automerge is a CRDT project, and I think there's...

Carter Morgan (34:30)

Damn.

Nathan Toups (34:35)

Was it Yjs or something? There's a few others that are very interesting. And they're juicy. I can see why he gravitated to this, because he makes this case of all these crazy weird conflict resolutions. And you're like, if you're a math PhD or computer science PhD and you want to work on something, CRDTs seem like a really worthwhile problem.
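For a feel of how CRDTs sidestep conflicts, here's one of the simplest ones, a grow-only counter (G-Counter), sketched minimally. This is a standard textbook CRDT, not anything specific from Automerge or Yjs.

```python
# G-Counter: each replica increments only its own slot, and merging takes
# the per-replica maximum. Merges are commutative, associative, and
# idempotent, so replicas converge no matter the order of exchanges.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.replica_id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.replica_id] += 1

    def merge(self, other):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()   # two increments on replica A
b.increment()                  # one concurrent increment on replica B
a.merge(b); b.merge(a)         # exchange state in any order
print(a.value(), b.value())    # both converge to 3
```

Text CRDTs like the ones in collaborative editors are far more involved, but they rest on the same property: merge is defined so that no coordination is ever needed.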

Carter Morgan (34:55)

Yeah. I always want to ask authors, and sometimes we do, I'm just like, why? I remember we asked this of Adrienne Braganza, who wrote Looks Good to Me, which is a whole book on, like, pull request and code review etiquette, right? And I'm like, why, why this? Why did your mind gravitate to this of all things? And I feel this with Kleppmann. Like, obviously his mind didn't just gravitate to this, it was consumed by this. He wrote the book on

designing data-intensive applications. And I'd be fascinated to learn, like, what about this did he find so fascinating? Like we talked about, I love it when a book has a pitfalls or anti-patterns chapter or section. And I find that really helps me to internalize a concept, when he says, okay, we've been talking about these patterns or strategies, but this is why you'd wanna use them. Because if you don't use these patterns or strategies, you wind up with these problems.

We talked about one of them already, which is reading your own writes, which is the idea that if the user submits data, but then the next read comes from a stale follower, it'll look like they didn't submit their data. So the solution there is to make sure the user is always reading from the leader for their own data. Another one, he calls it monotonic reads. This is the idea that the user reads from a fresh replica and then a stale replica, so time appears to go backwards. Do you remember? He had an example for this in the book which I can't quite remember,

but basically, like, if you were to, I don't know, maybe there's some sort of time log, like a log of everything someone is doing during their day, right? And then you refresh the page expecting to see the next entry, but actually maybe a new entry hasn't been entered, and worse, you now get routed to a stale replica. And so that last entry in the daily log is now gone.

And so that would be an example of time moving backwards. Another simple solution here: just route each user to the same replica. So you hash the user ID and make sure that they're always reading from the same replica. Again, we could say this is an easy solution, but you can already imagine the problems that this introduces, because theoretically one of the points of a distributed database is fault tolerance. And so if one replica goes down, it's not a big problem, because the user just goes to the next replica. And so, and he talks about that too, which is that the boundaries when designing a distributed

data application get fuzzy between what is being handled at the data level and what is being handled at the application level. Because you can't write into the database, hey, if a user with this ID comes in, send them to this node. That's something that needs to happen at the application level. And then that brings in problems, because, like, what about service discovery, or node discovery, right? What happens when you add nodes, now you have three nodes or four nodes or five nodes, right?

Nathan Toups (37:31)

Great.

Carter Morgan (37:51)

How is your hashing function taking care of that? ⁓ And so, yeah, it's a complicated.
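The sticky-routing fix for monotonic reads might be sketched like this. It uses a stable hash (not Python's process-randomized `hash()`), so the same user ID maps to the same replica across processes; the function name is invented for illustration.

```python
# Route every read for a given user to one fixed replica, so that user
# never flips between a fresh copy and a stale one.

import hashlib

def replica_for(user_id, num_replicas):
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % num_replicas

# The same user always lands on the same replica.
print(replica_for("carter", 3) == replica_for("carter", 3))  # True
```

And this sketch shows exactly the problem raised above: if `num_replicas` changes (a node dies, or you add a fourth), the modulo shifts and most users get remapped, which is what consistent hashing schemes exist to mitigate.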

Nathan Toups (37:56)

And so this is funny, this is

a funny thing too, because some of these actually have been largely solved with modern Kubernetes infrastructure and stuff, which is that service discovery used to be this really complex system called ZooKeeper. And they talk about it in the book a little bit. No one uses ZooKeeper anymore, unless you are on some legacy system that has to use ZooKeeper. There are so many other systems now that allow you to do service discovery

Carter Morgan (38:09)

Mm-hmm. Yes.

Nathan Toups (38:24)

in a really nice way. There are now partitioning algorithms, and he also talks about consistent hashing and some of the other things that were non-solved. I'm not claiming that they're a solved problem now, but the ergonomics and sane defaults are in these modern service-oriented infrastructure things. I used to have to think about this stuff a lot, and I don't really have to anymore if I'm using nice,

modern tooling. So again, sometimes there's a lot of innovation that's happened over the last 10 years that's been largely invisible. Like, I can easily look back and go, man, some stuff really hasn't changed that much. I mean, large language models have changed that conversation a lot recently. But when we think about core web technology, I'm like, you know what? Everything is just nicer. Even if I'm self-hosting a database, I have YugabyteDB, and even modern Postgres, like,

Carter Morgan (39:07)

Right.

Nathan Toups (39:22)

half the stuff that he talks about here in Postgres, like, oh, Postgres doesn't do this yet, or Postgres doesn't do that. I was doing research, because I was like, I thought Postgres did do that. And I'm like, oh, well, Postgres has been doing this since version 10, and now we're at version, like, what, 15 or 17 or something. And it's like, Postgres seems to have done an amazing job of just listening, and being like, oh yeah, that would be a nice feature. Could we give a guarantee on, you know,

Carter Morgan (39:31)

Yeah, right.

Nathan Toups (39:51)

can we give a guarantee on serialization of writes or something? It's evolved in a lot of really great ways. And so, yeah. Again, I appreciate, I think this is one of those let's-honor-our-forefathers kind of sections of the book, of how, if you try to implement something yourself or build something, you're like, I don't like any of the databases, I'm gonna build my own database,

Carter Morgan (40:07)

Yeah.

Right,

right.

Nathan Toups (40:15)

these are all the things where you're going to have to stand on the shoulders of giants, because there's a lot of expectations now and there's a lot of really amazing tooling. A fun little side note: one of the things that's not talked about in this book is blockchain stuff, and blockchain also has to deal with a lot of these pieces, like who's the leader, and what's consistency. And I think that community is very pedantic around certain guarantees, which is why it works a certain way.

There were also a lot of criticisms, and I completely understand those. But if you look now, there are some cooler new databases. There's one called TigerBeetle, I don't know if you've heard of this one. It is a double-entry, like a double-ledger, sort of double-entry accounting style database that allows you to do high-frequency financial transactions. It's just a super purpose-built database specifically for doing financial stuff, and

Carter Morgan (40:52)

Haven't heard of that.

Nathan Toups (41:12)

I love it, because they really write a lot about what the trade-offs are and how you should do distributed data, and they're, you know, Jepsen-from-day-one type folks who make sure that the claims that they're making are true. And I don't know, the databases that are out there now are so cool. Yes, there's Mongo, yes, there's DynamoDB, those have been around, those are in this book. But there's also, yeah, TigerBeetle. FoundationDB is in here as well. There's weird stuff too, like DuckDB.

I mean, if you wanna get into some weird, obscure databases and cool trade-offs and guarantees, just like with programming languages, there's not a better time to be alive. There's just so many cool databases out there. So, anyway.

Carter Morgan (41:56)

Yeah. Well, I want to

make sure we move on to the next section, which is partitioning. But I think I did mention the three anomalies, and so I do want to make sure we cover all three. Let's see, I have...

Nathan Toups (42:09)

yeah, yeah. And we

might have to speed-run partitioning, unfortunately. Transactions is also one where I definitely want to at least speak on a couple of things.

Carter Morgan (42:15)

yeah, yeah.

Yeah, the last one was consistent prefix reads, which is the idea that, let's say it's an email system: someone sends an email, the next person replies to that email, but it's possible that the first email and the second email get written to different nodes, or they get written to the leader and then replicated at different speeds across the different nodes. And so you might wind up on a node

that has the response to the email, but not the original email itself. And so that causal order is violated. So the solution there would be to write causally related data to the same partition. Again, easier said than done. Let's talk about partitioning and save plenty of time for transactions. What stands out to you here, Nathan, for partitioning?

Nathan Toups (43:07)

Yeah, so, you know, partitioning, there's a lot of terms for this. A lot of people call it sharding. The book calls out that Bigtable calls them tablets, HBase calls them regions, Cassandra calls them vnodes. He just settles on the term partition, because I think it gives a better understanding of it. This is where you've decided to carve up your database so that the pieces operate independently on

a subset of its data, right? And so one of the things that he talks about is that, for instance, you might have a strategy for spreading your data evenly, quote unquote. But you should think of it like, and this is gonna date myself, but when I was growing up, they made all these infomercials, and inevitably your family would buy an encyclopedia set, right? It happened at some point.

Carter Morgan (44:00)

Yeah.

Nathan Toups (44:03)

And we'd get the encyclopedia set, and, you know, some of them, like A and B, might be clumped together because they're relatively small. S might be spread across three volumes because there's so many things in that letter, and then X, Y, and Z might also be in one little volume. That's a sharding algorithm, right? The encyclopedia company figured out the right way of balancing the data across a set of books. I thought that was a really great way of thinking about it. I mean, again, if you've never seen an encyclopedia set, if you're too young,

Carter Morgan (44:32)

Yeah.

Nathan Toups (44:33)

but imagine you have the world's knowledge and you want to spread it across a set of books in alphabetical order. You're not going to make an A book, a B book, and a C book, because some of those are gonna be really, really thin and some of them are gonna be super thick. And so when you decide to partition things, you have to think about what's the best way to partition my data, right? What are the advantages and disadvantages for doing queries? And of course, it introduces a lot of things, like how do I do a join

Carter Morgan (44:46)

Right.

Nathan Toups (45:03)

on a set of data across shards, right? You know, these are interesting and weird problems to think about.

Carter Morgan (45:12)

And then you have read problems too, like the hotspot, which is, you know, they point out, like, Twitter: you have a celebrity that's very popular, and so lots of people want to read that person's tweets or look at that person's profile. That's going to put a lot of strain on one of your shards. And so he recommended you can add a random suffix to split the hot key across partitions. So correct me if I'm wrong, my understanding here is basically that you would store that celebrity on multiple partitions, and then

that way, when someone queries that celebrity on the application side, you could attach one of those random suffixes, and so you might get that celebrity from one of the different partitions based on, kind of, the application's random logic. Is that correct?

Nathan Toups (45:59)

Yeah, and again, there's like lots of strategies for this. There's also like, this is where if you look into it, like modulo arithmetic gets really kind of cool because you can actually say, hey, I want my data evenly spread. Let's say I have a cluster of 15 nodes and I want it evenly spread at least in three places across the set of 15 nodes. How do I balance this in a way? And so there's like, how do I make my partition key?

Carter Morgan (46:15)

Mm-hmm.

Nathan Toups (46:24)

so that if any one node goes down, I still have data availability on at least two other nodes, right? And how do I rebalance this? And if I need to add five more nodes because I need more capacity, what's the algorithm that I'm using to rebalance this and have discovery of where the partitions show up? And you still run into all these really interesting problems that come up. Again, the book frames it so well. You're like, yeah, but what happens if this happens? What happens if that happens?
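The hot-key trick Carter describes can be sketched in a few lines. The suffix count and key names here are invented for illustration; the trade-off the sketch makes visible is that writes get cheaper to absorb, but reads must now fan out across all the sub-keys.

```python
# Split a hot key (e.g. a celebrity's tweets) across N sub-keys by
# appending a random suffix on write. Each sub-key can then live on a
# different partition, spreading the load.

import random

NUM_SUFFIXES = 4  # arbitrary; more suffixes = more spread, more read fan-out

def write_key(hot_key):
    """Pick one sub-key at random for this write."""
    return f"{hot_key}#{random.randrange(NUM_SUFFIXES)}"

def read_keys(hot_key):
    """Reads must gather ALL sub-keys and merge the results."""
    return [f"{hot_key}#{i}" for i in range(NUM_SUFFIXES)]

print(write_key("celebrity_tweets") in read_keys("celebrity_tweets"))  # True
print(len(read_keys("celebrity_tweets")))                              # 4
```

Which is why, as the book notes, you only do this for keys you know are hot; doing it for every key would make every read four times as expensive.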

Carter Morgan (46:38)

Right.

Nathan Toups (46:53)

Secondary indexes are a perfect example, where you might actually have a really nice partitioning scheme on a primary index, so I can easily find the data on the primary index. But a secondary index is another way of looking at the data, right? So, for the uninitiated, I might have user IDs, and I can look a record up efficiently by user ID and, like, a timestamp, say I have one with a sort key and a user ID. But maybe I also want to index by, you know, game ID. So every user has played a set of games, and I want to be able to index by the games and then look up the number of users in those games, or the users in those games. The way it would balance game IDs and user IDs isn't necessarily the same;

they're in a one-to-many relationship. There's all these other things to consider. And if I'm rebalancing users, now game IDs can have all these weird other issues, right? The hotspots and availability and finding stuff. And so it gets really strange to think about how to balance those two. And at the time of this book, and I think still now, it's not an easy problem. He talks about

Carter Morgan (47:52)

Right, right.

Nathan Toups (48:18)

all of the caveats of like, ⁓ if you do this and you have a secondary index, this secondary index is gonna be eventually consistent. I think you talked about this with like DynamoDB, where DynamoDB guarantees some sort of performance on a primary index, but on a secondary index, there's like some trade-offs that you have to make to make it work. ⁓ Yeah.

Carter Morgan (48:38)

Whenever we talk about partitioning, and I feel like this comes up in systems design interviews, the question is always, okay, what are we partitioning? What about resharding, you know, when you've got to divide the data again? Because that can be a really intensive process, and if you do it wrong it can disrupt your application. It's like, okay, well, how do you handle that? One solution he gives is, at the beginning, to actually create a predetermined number of shards, like a thousand. And so you

have a thousand shards for the application, but those are not distributed across a thousand different machines. If you have two machines with your two databases, each one would have 500 of the shards. And then let's say you want to add a third one. Well, now each one has 333; a fourth one, each one has 250. And so you're migrating the shards. The shards kind of serve as these autonomous units, these holders for your data,

and then you're migrating those across. And similarly, if you need to scale down, you can also group them together. I thought that was interesting. Is that how Kafka works?

Nathan Toups (49:43)

Kafka, this is very popular, yeah,

Kafka works like this too. And they actually encourage you to make this an arbitrarily high number, because, yeah, you might need to horizontally scale out your cluster. And then, we'll get into this idea of serialization and stuff too: you can easily serialize by shard. You cannot

easily serialize across shards, right? And again, it becomes this eventually consistent sort of distributed database piece, or we have to create locks that go across shards, which could be multi-region in some cases. Serializability is one of the things that comes up with data flow and a bunch of other stuff that's going on. But yeah, how we partition, and again,

I will say we're very lucky. Kafka, for instance, is super mature at this point. It was sort of new, more cutting-edge technology in 2017, and now it's ubiquitous. A lot of people are even just picking tools like Kafka instead of traditional queuing services like RabbitMQ, because Kafka not only gives you queuing services, it gives you what they call event sourcing, and all these cool things that you can do

based off of these sort of sharded replay things. There's always nice characteristics to it. ⁓
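The fixed-partition scheme described above can be sketched naively like this. The node names are made up, and real systems (Kafka, Riak, etc.) move a minimal set of partitions rather than re-deriving the whole assignment as this round-robin does; the sketch just shows why keys never need re-hashing.

```python
# Create many partitions up front; keys map to partitions forever, and
# rebalancing means moving WHOLE partitions between nodes, never
# re-hashing individual keys.

def assign_partitions(num_partitions, nodes):
    """Naive round-robin of whole partitions across the current nodes."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}

two_nodes = assign_partitions(1000, ["node-a", "node-b"])
four_nodes = assign_partitions(1000, ["node-a", "node-b", "node-c", "node-d"])

# With 2 nodes each holds 500 partitions; with 4 nodes, 250 each.
print(sum(1 for n in two_nodes.values() if n == "node-a"))   # 500
print(sum(1 for n in four_nodes.values() if n == "node-a"))  # 250
```

The "choose the number of shards up front" pain Carter jokes about next is visible here too: 1000 is fixed at creation time, so it caps how many nodes you can ever usefully spread across.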

Carter Morgan (51:01)

It's hilarious, he was like, choosing the number of shards is a very difficult task, and too many could be bad, and too little can also be bad. Good luck with that! Like, okay, hey.

Nathan Toups (51:06)

Right.

Yeah, exactly.

It is true, though. And I think that these are non-trivial things. Again, luckily, this is where modern blogs are really nice. Most really cool companies that you might respect are going to share in public how they were able to scale from X to Y and how they set up Kafka to do it. And there are even modern things: there's Redpanda, which is a super efficient version of Kafka. There's hosted Kafka.

Carter Morgan (51:28)

Right.

Nathan Toups (51:39)

Again, there's also, I think, PlanetScale and Supabase, which have kind of largely taken off; both are hosted. PlanetScale is Vitess, which is the database technology, the version of MySQL, that YouTube used to scale out. And again, they have a really advanced sharding mechanism so that they can horizontally scale YouTube.

But you can benefit from that. There's an open source database that allows you to shard like they do, and it's solved a lot of that problem for you. And if you don't even want to manage it, just use PlanetScale. And PlanetScale, again, you can use it as a small cluster, but if you get to a point where your data needs to get sharded and scaled out this way, you just use their system, and it does it for you. And you don't have to sit here and hand-wring about your partition key balancing strategy and all this other stuff, because really smart folks have already done it.

Carter Morgan (52:35)

Right, right. Well, we wanted to save plenty of time for transactions, which is all of chapter seven. Nathan, you've been very interested to talk about this, so why don't you get us started?

Nathan Toups (52:45)

So actually, I think this might be the most important chapter in the book, at least as far as where I got. Maybe this also has to do with the fact that it's the last chapter I've read, and each chapter I keep thinking this is the most important chapter. But transactions, I think it's really fun to think about replication and partitioning, especially if you have a computer science-y background, if you wanna do cool engineering, you're like, distributed databases and all other stuff. But transactions are like,

this is a really meaningful and interesting chapter, regardless of whether you choose to have a distributed database or not, right? Thinking about the, I'm gonna miss it, atomicity, right? The atomic nature of a change. And again, this is where we get into ACID, right? And I think this is very useful, because I never thought of ACID as a loose marketing term,

Carter Morgan (53:28)

Yeah, yeah.

Nathan Toups (53:43)

which is how he describes it. I was like, ACID, that's the gold standard. If you have an ACID-compliant database, then you can have certainty. And he's like, yeah, not really. Specifically, he thinks that the C in ACID should not be there. I thought that was, like...

Carter Morgan (53:48)

Yeah, right, right.

Yes, consistency.

I have it here. I don't entirely follow his logic; you know, he understands it better than I do.

Nathan Toups (54:10)

His

hot take, I think, and this is definitely his hot take, is that atomicity, isolation, and durability are 100% in the domain of databases, and that's where a database is making a claim. And the consistency is actually, I think he kind of argues, an emergent property. Like, if you have a database that handles atomic work, isolated work, and it has durability,

Carter Morgan (54:23)

Yes.

Nathan Toups (54:37)

that you can make consistency claims, but that those consistency claims are actually responsibility of the application. That is a.

Carter Morgan (54:44)

So

So just a refresher for our listeners who haven't heard of ACID in a bit. It's atomicity, consistency, isolation, durability. A rough definition: atomicity is all or nothing; basically, if an error occurs, you abort or roll back, so you can't have a half-finished transaction. Consistency, and we're going to talk about this in a bit: application-level invariants are maintained. Isolation: concurrent transactions don't interfere, so basically each transaction pretends it's the only one,

Nathan Toups (54:46)

Yeah. Yeah.

Carter Morgan (55:12)

and durability: committed data survives crashes. As far as consistency goes, he says: the idea of ACID consistency is that you have certain statements about your data, invariants, that must always be true. For example, in an accounting system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these invariants, and any writes during the transaction preserve the validity, then you can be sure that the invariants are always satisfied. However, this idea of consistency depends on the application's notion of invariants,

and it's the application's responsibility to define its transactions properly or correctly so that they preserve consistency. This is not something that the database can guarantee: if you write bad data that violates your invariants, the database can't stop you. And then he says, thus the letter C doesn't really belong in ACID. Now I also have this pulled up in ChatGPT here. And again, ChatGPT, disclaimer, right? ChatGPT defines consistency as: the database must enforce all rules that define what valid data looks like. These rules include unique constraints, foreign keys, schemas.

It says a transaction can't leave the database in a state that breaks these rules. So according to that definition, it seems like the C would belong in ACID when you're talking about a database, but I don't know, right?
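The distinction they're circling can be shown concretely. A database can enforce local rules (a length limit, a NOT NULL), but a cross-row invariant like "debits equal credits" only holds if the application writes balanced transactions. This sketch uses a made-up in-memory ledger, not any real database API.

```python
# Kleppmann's accounting example: the application, not the database,
# is what guarantees the "debits == credits" invariant.

entries = []  # stand-in for a ledger table

def post_transaction(rows):
    """Application-level invariant check: each transaction must balance.
    The database can't know this rule; only the application can."""
    if sum(amount for _, amount in rows) != 0:
        raise ValueError("unbalanced transaction: debits must equal credits")
    entries.extend(rows)  # in a real DB this extend would be an atomic commit

post_transaction([("cash", -100), ("revenue", +100)])   # balanced: accepted
try:
    post_transaction([("cash", -100)])                  # unbalanced: rejected
except ValueError as e:
    print(e)
```

The database's contribution is making `post_transaction` atomic and isolated; whether the rows it commits make sense is the application's call. That's exactly the split Kleppmann argues for.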

Nathan Toups (56:20)

Yeah, and

I think his argument was basically: I can delegate atomicity, I can delegate isolation, I don't have to know the inner workings of those, the database really should handle those. But do you, as the application developer, understand what the consistent state of this is? And the database might hold consistency at a layer that it defines as consistent, meaning it's not doing things that are inconsistent, but your application itself might

Carter Morgan (56:29)

Right.

Nathan Toups (56:50)

existentially do things that are inconsistent with how you think the world should look. ⁓

Carter Morgan (56:54)

But I guess it just

depends, right? At what layer are you defining that consistency? Because I can tell, like, a SQL database, I can say this field is not allowed to be greater than 256 characters, right? And then if I try to make that write, the database will reject it and say, this is inconsistent with the schema you've provided me before. But in the accounting example, it's like, debits must equal credits. And of course your application can just write a bajillion debits and no credits, and that's not the database's fault. And so, you know, this is right.

Nathan Toups (57:21)

Right, and

I will say there are also some really interesting trade-offs too, because if you take the assumption that consistency actually is the responsibility of the application, for instance, the way that Vitess works, and if I'm mispronouncing that database, y'all roast me, but PlanetScale, I'll put it that way, the way that YouTube functions, they actually had to get rid of foreign key constraints

Carter Morgan (57:47)

really?

Nathan Toups (57:48)

to

get sharding to work. And so the trade-off is you have to handle in the application whether one table is referring to a dead key from another table. And traditional database people are, like, freaking out. They're like, this is the most disgusting thing I've ever heard, I can't do cascading deletes and all the other stuff that you would do in a traditional SQL database. But if you remove that, you actually can get

Carter Morgan (57:57)

Interesting.

Ha ha ha ha.

Nathan Toups (58:14)

application performance that was just not possible before. And as long as you know that you have to build this into your application, you're taking control of the consistency, right? The inconsistency is, I might be referring to a key that no longer exists. But you take that off the plate of the database, and just make sure that any operation I do on the database is atomic, that the changes are isolated, that there's durability; that's its domain. So I think it is interesting to think about, like,

and maybe the pedantic part here is what our expectations of the database should be. I think his argument is, if you just focus your database on actually being isolated and durable and atomic, that's really what we want, and the consistency stuff is a nice-to-have. It's an interesting argument, basically, is how I was looking at it. So, yeah.
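Handling a dangling reference in the application, once the database no longer enforces foreign keys, might look like this. The tables and fields here are hypothetical stand-ins, not Vitess's actual schema or API.

```python
# Without foreign key constraints, a child row can outlive its parent.
# The application has to decide what to do: skip, flag, or lazily clean up.

users = {1: "carter"}                        # user 2 has been deleted
videos = [{"id": 10, "owner_id": 1},
          {"id": 11, "owner_id": 2}]         # owner_id 2 is a dead key

def videos_with_owners():
    """Skip rows whose owner no longer exists, instead of relying on the
    database to have cascaded the delete for us."""
    return [v for v in videos if v["owner_id"] in users]

print(videos_with_owners())  # only video 10 survives
```

It's the same consistency-moves-to-the-application trade: more code and more ways to get it wrong, in exchange for a database that can shard freely.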

Carter Morgan (59:07)

Well, we

really get going with transactions. Then we get into a whole bunch of sections which were good for me to hear about for the first time, because we get, like, actual serial execution, two-phase locking, serializable snapshot isolation. I mean, I'd be lying... thank goodness there was no exam associated with this podcast, because I don't know if I could answer a ton of these with great confidence. Was there anything in the rest of the transaction chapter

that I really rang true to you and you wanted to make sure we were able to discuss.

Nathan Toups (59:41)

So one of the big things is, I mean, serializability is kind of the thing, it's a major theme of this whole idea. Again, serializability meaning you guarantee that one thing is being done after the other, right? That's the whole idea. Then, when you have something that is making a snapshot,

a read snapshot, so that my atomic transaction is internally consistent, but I'm not worrying about the rest of the database, all these weird things can start to happen. And he actually has a really good breakdown of race conditions that you would never think about. That internally, a database might make it feel like you're doing something that's ACID, but you've actually ended up doing something that has a side effect that can catch you off guard. A perfect example of this is, he talks about,

Carter Morgan (1:00:15)

All right.

Nathan Toups (1:00:30)

he has this interesting breakdown of read committed, snapshot isolation, and serializability as, like, three strategies for how to deal with different types of anomalies. Like, for instance, a dirty read. And again, a dirty read is this idea that you're reading data that's not been fully committed, and so it's in this kind of intermediary state, and if the database goes down and comes back up, the end of your record disappears, kind of thing, right?
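A toy illustration of how read committed prevents that dirty read. Real databases do this with locks or multiple versions per row; this sketch just keeps uncommitted writes in a separate buffer that readers never see, and handles only one in-flight transaction for simplicity.

```python
# Read committed, minimally: writes land in a pending buffer, readers
# only ever see committed values, so half-finished state is invisible.

class Store:
    def __init__(self):
        self.committed = {}
        self.pending = {}   # one in-flight transaction, for simplicity

    def write(self, key, value):
        self.pending[key] = value

    def read(self, key):
        return self.committed.get(key)  # never looks at pending

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

s = Store()
s.write("balance", 500)
print(s.read("balance"))  # None: the write is uncommitted, no dirty read
s.commit()
print(s.read("balance"))  # 500
```

This blocks dirty reads but nothing more; read skew, lost updates, and write skew need the stronger strategies Nathan lists next.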

And so I only wanna read from the database if it's read committed, meaning that it's guaranteed that it's in the database, and at that point I will take it. Dirty reads are prevented by all three of these strategies: read committed, snapshot isolation, and serializability. Snapshot isolation, again, is like taking a snapshot of all the relevant stuff that's in the database for what you need. It kind of solves it internally and then...

puts it back in. And then serializability, meaning only one thing is being done after the other, and by doing this, I know that the correct order of operations is always happening. He talks about these different anomalies, like dirty reads, dirty writes (which are way harder to think about), read skew, lost updates, write skew, phantoms. There are all these kind of fun and juicy things. The only thing that addresses all of these weird anomalies is serializability. Meaning, if you are
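To make the dirty-read anomaly Nathan describes concrete, here's a toy Python sketch (not from the book; the `ToyStore` class and its method names are made up for illustration). A "read uncommitted" reader can see an in-flight write that later vanishes on rollback, while a "read committed" reader only ever sees durable values:

```python
# Toy key-value store illustrating a dirty read: reading data from a
# transaction that hasn't committed yet, which may disappear if that
# transaction aborts. "Read committed" avoids this by only exposing
# committed values.

class ToyStore:
    def __init__(self):
        self.committed = {}    # durable, committed values
        self.uncommitted = {}  # in-flight transaction writes

    def begin_write(self, key, value):
        self.uncommitted[key] = value

    def commit(self, key):
        self.committed[key] = self.uncommitted.pop(key)

    def abort(self, key):
        # Crash/rollback: the in-flight write simply disappears.
        self.uncommitted.pop(key, None)

    def read_uncommitted(self, key):
        # Dirty read: sees in-flight data if any exists.
        return self.uncommitted.get(key, self.committed.get(key))

    def read_committed(self, key):
        return self.committed.get(key)

store = ToyStore()
store.begin_write("balance", 500)          # transaction in progress
dirty = store.read_uncommitted("balance")  # sees 500: a dirty read
clean = store.read_committed("balance")    # sees nothing yet
store.abort("balance")                     # transaction rolls back
after = store.read_uncommitted("balance")  # the 500 has vanished
print(dirty, clean, after)  # 500 None None
```

The dirty reader acted on a value (500) that, after the rollback, was never in the database at all; that's the "end of your record disappears" failure mode.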

Carter Morgan (1:01:55)

Interesting.

Nathan Toups (1:01:55)

doing something that's not serializable, you can't guarantee that every possible race condition is handled. But again, serializability is a hard distributed systems problem. And so it's like, how do I minimize the number of places where I can't serialize my data, while I also need to guarantee that something happens? And so anyway, it's way more than we can get into on the podcast. But I love it. This is the section that I kept having to rewind. Like, I was on a run and I was like,

Carter Morgan (1:02:18)

Ha

Right.

Nathan Toups (1:02:24)

I would just, like, stare off into space while I'm running, and I'm like, oh man, I just missed it. I had to go back and, like, re-listen to chapter seven. Chapter seven took me a long time to get through, because I kept having to go back. Yeah, it's some cool, juicy stuff. So you might have heard about pessimistic locking versus optimistic locking. Again, pessimistic is: block if something might go wrong. And then,

optimistic is where you, like, proceed until you check at commit. And again, there are trade-offs: you can waste a lot of CPU cycles if you do optimistic locking improperly. He kind of goes into the trade-offs on those. Serializable snapshot isolation: way more than we can talk about right now. Two-phase locking, okay. Two-phase locking has, like, fallen out of favor. And I would say that if you're building a modern system now,

Carter Morgan (1:03:16)

Interesting.

Nathan Toups (1:03:19)

and you're using two-phase locking, you're probably either not using an alternative that's available to your database or you're on some legacy system. Two-phase locking is super expensive, but it's correct. Two-phase locking basically says: I would like a lock; stop the world until I do my thing; and then undo the lock. And there's this two-part piece to it; I think it's readers block writers, and then writers block readers. And so there's this two-piece thing.

And if you're writing, like, a basic mutex-type thing, you're doing two phases, it's whatever. But there are now these optimistic and other sorts of locking things where you can sort of preempt the system so that you're not having to do two phases. You can do it in a single phase and just either abandon or whatever. Those are much more performant; they're much better. You're probably dealing with serializability at some level, and...

I would highly recommend just learning how these guarantees, these serializable lock sort of guarantees, work. And I think your life will be a lot better. But spend time in chapter seven, sitting with it. Even if your eyes glaze over in these other sections, even if your eyes glaze over in chapter seven, there's a lot to learn. Again, I learned a lot, and I already care about this topic a lot.
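The optimistic-locking idea Nathan contrasts with two-phase locking can be sketched in a few lines of Python (a made-up `VersionedCell` class, not any particular database's API): proceed without blocking, then validate a version number at commit time, and retry if someone else got there first.

```python
# Sketch of optimistic concurrency control: no locks while working;
# a version check at commit time detects conflicts, and the losing
# transaction aborts and retries (the "wasted CPU cycles" trade-off).

class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def try_commit(self, new_value, read_version):
        # Commit succeeds only if nobody committed since our read.
        if self.version != read_version:
            return False  # conflict: caller must redo its work
        self.value = new_value
        self.version += 1
        return True

cell = VersionedCell(100)

# Two "transactions" read the same version concurrently...
v1, ver1 = cell.read()
v2, ver2 = cell.read()

ok_a = cell.try_commit(v1 + 10, ver1)  # first committer wins
ok_b = cell.try_commit(v2 + 10, ver2)  # stale version: rejected

# The loser retries from a fresh read instead of ever blocking.
v3, ver3 = cell.read()
ok_b_retry = cell.try_commit(v3 + 10, ver3)
print(ok_a, ok_b, ok_b_retry, cell.value)  # True False True 120
```

Note how this avoids the lost-update anomaly (both increments land, final value 120, not 110) without any stop-the-world lock; the cost is that conflicting work gets thrown away and redone.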

Carter Morgan (1:04:34)

Ha ha ha ha.

Nathan Toups (1:04:42)

I was like, ugh, I've been doing this wrong, or I haven't been thinking about this the right way.

Carter Morgan (1:04:46)

Yeah. And I'll say with this book that it's not the kind of book you have to read cover to cover. If you just picked it up and said, "I just want to learn about transactions," and just read chapter seven, you're going to come away with a lot of useful information. It's not like there's a lot of base knowledge built up beforehand. There might be a little bit, but my guess is, if you're the kind of person who says "I want to learn about transactions," then you probably have the base knowledge necessary to understand this chapter. I want to make sure we get to our hot takes this week, which...

Nathan Toups (1:05:12)

It looks like we're gonna

have a disagreement. I love it.

Carter Morgan (1:05:14)

Yeah. Well, let's start with some of the ones that

aren't disagreements. I think most people, most software engineers, don't need any of the advice in this book, because this is kind of all of the low-level problems that a lot of the existing databases are solving. The people who designed the databases need a lot of the advice in this book. And this book is much better for understanding problems

and not necessarily for understanding the solutions.

And I don't mean that because, like, he doesn't offer any of the solutions. But oftentimes the solutions that he offers are the solutions that the database vendors took and implemented themselves. And so you might be reading this and think, "oh, if I run into problem A, then I'll use solution C," or whatever. But that's not necessarily the case; often, you know, that solution might be built into your database. And so I think understanding all of the problems that can occur

with this distributed data is really, really important, but more so so that you can understand why, and how, your database of choice deals with that problem. Maybe it has a built-in solution, maybe it doesn't, and then you can start to go exploring those solutions. But also, that's just kind of par for the course for a book written in 2017. The problems haven't changed, but the solutions that exist for them have. So this is great for understanding

the theory and the problems. But if you are, like, designing a big system, and you're like, "I know, I'll read Designing Data-Intensive Applications because that will have all the answers to my questions," this book might not necessarily have all those answers. But it will give you a much better understanding of all the problems you may deal with.

Nathan Toups (1:06:57)

Yeah, I think that's a really good point. And there are two sides to this. Number one is: just use Postgres. I know that's become cliche, but it's true. Number one, Postgres is just more capable than it's ever been. And then I take it a step further: if you were going to use Postgres or something fancy, use the managed version. The odds that your investors, your startup, whatever you're bootstrapping, care that you

Carter Morgan (1:07:05)

Yeah, yeah, right.

Bye.

Nathan Toups (1:07:26)

self-manage and self-host some database, and that you've got those backups locked down and everything else, are probably not true. And the modern database infrastructure that's out there is so cool. So, like, take advantage of the fact that you're standing on the shoulders of giants. Yeah. Let's... yeah, so I skipped over it earlier, and there's a caveat, and I understand where you're coming from. Hear me out:

Carter Morgan (1:07:41)

Absolutely. Well, you have a hot take.

Okay.

Nathan Toups (1:07:55)

stored procedures have their place. I know everyone wants to hate on stored procedures. There have really been awful, awful stories about them. For the uninitiated, stored procedures are extra functions, extra functionality, that you can put in your database. I hate...

Carter Morgan (1:08:11)

but it gets triggered

based on some database operations, right?

Nathan Toups (1:08:15)

that

gets triggered on some database operations, meaning that you're doing transformations to your data outside of your normal change-management process, and so therefore the database becomes harder to reason about. I understand why people hate stored procedures. I have one use case that I will defend.

Carter Morgan (1:08:30)

Okay, one, because I have

in my notes: "hear me out, stored procedures have their place," to which I replied, "stored procedures are the devil, Nathan." Because I just think, like, anytime you're modifying that data, I don't like the lack of a paper trail. I don't like... right. So what's your use case?

Nathan Toups (1:08:44)

I completely agree and I will tell you,

like, Postgres stored procedures and stuff, they are a code smell, and I really do avoid them. But there was one time that I really needed an atomic operation in Redis, and so I'm kind of throwing this one out there. Redis has the ability to run Lua, where you can store your Lua script

inside of the key-value store and then call that function by its key; it's, like, the content hash of the function. And so that Lua code actually lives in your code base. So in this case, you would initialize it; it lives in your... in my case, I was using Python, I think, at the time. Or no, I was using Go, I was using Go. So I have my little Lua script, and it gets serialized and shoved into Redis. And I needed to do something on the... so, Redis is single-threaded,

which means you have serializability, right? So it's a nice little... if you're using Redis on a single server. And I needed to guarantee that a timestamp didn't drift outside of a time-to-live allotment that I gave it. I could have done this application-side, where I had to, you know, do a lock, pull down the Redis value, look at it, and then send it back up if it was a thing. Or, atomically,

I could have a function that says: allow this to overwrite the previous one if these time constraints are in place. And because Redis has Lua, and it can atomically evaluate a timestamp, because it can, like, deserialize JSON or whatever inside of the thing, I was able to make this really, like, five-line code check that was a stored procedure inside of Redis, that said: if this thing is true, some guarantee inside of the data packet itself,

then allow this to write a new key or overwrite the existing one. And it made my system so much more performant, and I was able to take advantage of Redis there. I think that there is a... he kind of advocates for the fact that stored procedures can have value, and I just wanted to put it out there, because I know you're correct. They are the devil. They really are the devil.

Carter Morgan (1:10:58)

Right.

Hahaha.

Nathan Toups (1:11:06)

But there was this one case where I was like, okay, actually, if you're looking at this database and you're looking at transactions, and you're really wanting to have a provably correct, serializable data event, this was really nice. I guess that was my point.
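Nathan's pattern can be sketched in miniature. The following is a hypothetical pure-Python stand-in for his atomic check-and-set, not his actual code: in real Redis, this logic would live in a Lua script (loaded via SCRIPT LOAD and invoked by its SHA-1 hash with EVALSHA), which is atomic because Redis executes scripts on its single command-processing thread. The function name, key names, and payload shape here are all invented for illustration.

```python
# Hypothetical stand-in for a Redis Lua check-and-set: only overwrite
# a key if the stored timestamp has drifted past its time-to-live
# allotment. In Redis proper, the same few lines of Lua would run
# atomically on the server, so no client-side lock is needed.

import json

store = {}  # stands in for the Redis keyspace

def checked_overwrite(key, payload, now, ttl):
    """Write payload unless the existing entry is still fresh."""
    existing = store.get(key)
    if existing is not None:
        record = json.loads(existing)  # Lua would cjson.decode here
        if now - record["ts"] < ttl:
            return False  # still within its TTL allotment: refuse
    store[key] = json.dumps({"ts": now, "payload": payload})
    return True

first = checked_overwrite("job", "a", now=100, ttl=30)   # empty: writes
second = checked_overwrite("job", "b", now=110, ttl=30)  # fresh: refused
third = checked_overwrite("job", "c", now=140, ttl=30)   # stale: writes
print(first, second, third)  # True False True
```

The point of pushing this server-side is that the read-check-write sequence can't interleave with another client's, which is exactly the "provably correct, serializable data event" being described.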

Carter Morgan (1:11:21)

Ah, you have the one scenario in human history where stored procedures work. And that's... you should know, everyone, the amount of preamble Nathan needed to give to justify a stored procedure: that's how much preamble you need to give. Your stored procedure is not good enough unless there's that much preamble.

Nathan Toups (1:11:33)

I did.

It's true, no, no, no.

It is defensible, but I would say you're probably wrong, and I might have even been wrong, but I did this in a single Redis transaction, yeah.

Carter Morgan (1:11:44)

Yeah, let us know.

In the comments, if you find Nathan's stored procedure egregious, as almost all stored procedures are, then you can... Well, let's talk about what we're gonna do differently in our careers as a result of having read this book. I can go first. I had here: look at MongoDB's sharding strategy, just because I don't even know how that's working under the hood. But I think, in general, I need to read, like, the primer on how MongoDB works underneath the hood. We use it a lot.

We do not understand it as much as we should. And so I need to kind of step up, take some ownership there, and become the subject matter expert, at least on the underpinnings. We do have an engineer who's very, very good with understanding, like, the Mongo query language and how that works. And thank goodness, because he saves our bacon plenty of times. How about you, Nathan?

Nathan Toups (1:12:38)

Yeah, so I want to spend more time with some CRDTs, specifically Automerge. Because, number one, I have so much respect for Kleppmann, you know, seven chapters into this book. And if this is something he's decided to take time and energy to think about, I think it would behoove me to spend at least some time learning about the whole idea behind local-first and the kind of weird and cool things you could do with a system like Automerge.

Carter Morgan (1:13:09)

Very nice. Well, who would you recommend this book to?

Nathan Toups (1:13:14)

So, same as last time. I said this is a book for software engineers that are deeply curious about systems architecture, but I'm gonna add a little asterisk to this, which is that you really have to be committed to get this far into the book. And we're just over halfway, right? I mean, we've got some work ahead of us, so...

Carter Morgan (1:13:26)

Yep, yep.

Yeah, I think few engineers would find this book incredibly practical. And honestly, this book is so well known in the industry that there's a lot of content produced around Designing Data-Intensive Applications, this podcast being one of them. But if you go to YouTube, there are going to be channels doing their kind of summary of Designing Data-Intensive Applications. For most engineers, that's what I would recommend first: just, you know, get the Designing Data-

Intensive Applications for dummies version first. But you know what? There's a real badge of honor to reading this book. And, you know, my co-workers all know about the podcast, and half the time, more than half the time, probably eighty percent of the time, they ask, "what book are you reading this week?" And I'll say it, and they're like, "I've never heard of that book before." With this one, when they heard I was doing Designing Data-Intensive Applications, it was like, "wow, you're tackling that one." And so

there's a real badge of honor to reading this book. And so I think, if you want that badge of honor, this is one of the very few books in our industry that most engineers know and recognize. So, hey, go out and claim it, and you can be like us absolute sickos who read this book.

Nathan Toups (1:14:45)

Yeah,

I will also say, if you're just somebody who loves, like, playing around with distributed stuff... So if you're a programmer who's also doing, like, DevOps-type things and you're playing with systems like Kubernetes, this book's super valuable. If you're doing data pipelines that have, you know, data lake, large-scale stuff... Like, if you're the same type of person who read Grokking Concurrency and got a lot out of it, this is sort of a complement to it from an infrastructure and data,

larger-view standpoint, of a whole domain of problems that are just... I don't know, if you find these kinds of things fun to think about, you should absolutely read this book. I'm kicking myself for not reading it sooner.

Carter Morgan (1:15:26)

This consumed a lot of mental energy, listening to this. And it's minorly related, but have you ever played the game Balatro, or heard of Balatro?

Nathan Toups (1:15:36)

No... no. Did you mention this before? Somebody in Discord mentioned you and this book.

Carter Morgan (1:15:37)

Blah.

Yeah.

Yeah. Have I mentioned this before? I love Balatro. I'm obsessed with it. It's a roguelike deck-builder game around poker. And so the idea is that you have to earn a certain amount of chips in each round, and you earn those chips by playing poker hands. But you can upgrade your deck and you can upgrade your hands, and you have these joker cards that sit at the top and they'll give you points. So it's like, you get

plus 30 chips or whatever if you play a spade card or something like that, right? And so it's all about kind of constructing your deck and then organizing your jokers. Anyhow, the order of your jokers matters a lot. And so I'm trying to get... I have 100% of this game. I have gotten every possible thing you can do, except you can literally get a score so high, because your score in Balatro is your chips times a multiplier, that it breaks the game, and your score becomes "naneinf." And so your chips number is not a number, and your

multiplier is infinite, and, you know, it just breaks the game. And so I am trying to achieve this, and there's a very specific strategy: you have to get specific jokers and order them in a specific way. So anyhow, I'm trying to do this, and in Balatro you can run into these challenges where, like, it'll debuff you in some way. And the challenge I had was, it flipped all my jokers upside down and shuffled them. And my order was so important that I had to glean... I had to play one hand

and see how the jokers execute. I recorded this. And then I had to watch the recording, and, like, me and my eight-year-old son made index cards with our jokers, what they were, and laid them out on the table. And we had to do, like, this operation to reorder them in exactly the right way. And it was like doing brain surgery. Anyhow, it was ridiculously complicated. We spent, like, 45 minutes trying to do this to play a single hand. And this was after listening to Designing Data-Intensive Applications all week. I told my wife, like, I'm calling out of work today.

Like, I cannot... I have no more reasoning ability left. So anyhow, minor Balatro tangent at the end of the episode. But yes, this book will consume a lot of your brain power, and it's going to continue to consume a lot of our brain power over at least the next two episodes, and maybe three; we reserve the right to make this a five-parter, depending on how intensive the book gets. But thanks for tuning in, folks. Stick around for the rest of it.

Nathan Toups (1:17:41)

All

Right.

Carter Morgan (1:18:03)

And yeah, you can always contact us at contact@bookoverflow.io or on Twitter at BookOverflowPod. I'm on Twitter at Carter Morgan. Nathan, and all of his work with his consulting agency, is at rojoroboto.com, and his newsletter for that is at rojoroboto.com/newsletter. Thanks for tuning in, folks, and we will see you later.