Ep. 100 · Monday, February 2, 2026

Time is an Illusion - Designing Data-Intensive Applications by Martin Kleppmann

Chapters 8-10

Book Covered

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

by Martin Kleppmann


Author

Martin Kleppmann

Hosts

Nathan Toups (Host)
Carter Morgan (Host)

Transcript

This transcript was auto-generated by our recording software and may contain errors.

Nathan Toups (00:00)

This book is dense and I will tell you that I am holding on for dear life. I'm loving it. I mean, it is it's it's like I'm in a CrossFit class, you know, and I'm doing, you know, overhead press and, you know, a bunch of burpees or something. That's what my mental gymnastics are going through.

Carter Morgan (00:04)

(laughs)

Hey there, welcome to Book Overflow. This is the podcast for software engineers, by software engineers, where every week we read one of the best technical books in the world in an effort to improve our craft. I am Carter Morgan, and I'm joined here as always by my co-host, Nathan Toups. How are you doing, Nathan?

Nathan Toups (00:37)

Doing great here, everybody.

Carter Morgan (00:39)

As always, make sure to like, comment, subscribe, wherever you're at listening to the podcast. Give us a five-star review on an audio platform. Share with your friends and coworkers. All of this helps the podcast grow and helps us keep making it. You can always book time with us on Leland if you want some one-on-one coaching with Nathan. And you should also join our Discord, because it is really, really fun in there. We're at about 200 members. I am having a great time. I love seeing all the people come in, and particularly we have an introduce-yourself channel, which I'm loving.

I think it's so cool to see what everyone does and how they found the podcast. And, you know, Nathan, you can give me your thoughts on this, but like, I think we have the seeds of something really cool going here, because I think finding quality software engineering discussion can be challenging. Like, your Twitter is just going to be full of hot takes, LinkedIn is full of maniacs. Reddit, sometimes you can have good stuff and, like, experienced devs, but

lately it's all just been, like, AI doom and gloom. And so I think having a forum that is full of other engaged software engineers... and if you're a software engineer listening to this podcast, I think you're probably pretty engaged in bettering your career. Like, I think there's something really special going on. Do you have the same feeling, Nathan?

Nathan Toups (01:53)

Yeah, I've been pleasantly surprised at how cordial and civil and nice it's been. And I mean, I don't want to toot my own horn, because I think you're out of control. But I think it's a lot of the tone and temperament of this podcast. The people who've enjoyed the podcast and have decided to get on the Discord, that's a subset, if you think of it as a funnel.

Carter Morgan (02:00)

Yeah, yeah.

Except for me.

Yeah, right.

Nathan Toups (02:21)

They've been great. It's just been really nice to have these pleasant conversations. And so maybe it's because we've cultivated a culture around that. I think if you're attracted to the tone of how we talk about long-form topics, or if we have a culture of wanting to help each other out, that's really what I'm seeing in the Discord. We have a career section, or job search, and we have a general section. We've been actually...

Carter Morgan (02:40)

Right.

Nathan Toups (02:48)

If you come join the Discord and you have an idea for a new channel, I've just been, like, spinning it up. And, like, well, some of them are active and some of them are kind of a ghost town. We're just experimenting. And so if you want to be part of that, please come join.

Carter Morgan (02:53)

Yeah.

Yeah. So I think, for anyone who's kind of felt like, I wish I had a community, I wish I had, you know, a tribe... you know, I saw it mentioned in the channel, like, I listen to the podcast because I want to improve my career, and I feel like I don't get kind of the quality of technical discussion at work, and so the podcast has been really helpful for that. Well, come join the Discord, because you're going to find lots of other people like you. So that's one, another plug for the Discord. I made sure to give it extra time today, just cause I've been having a ton of fun with it.

But you should also check out our interview with Austin McDonald, which should be live when you are listening to this. What a cool guy. He's the author of Mastering Behavioral Interviews. He's someone who we had known was going to come on the podcast, and so I've been really looking forward to it. Yeah. And we've talked about on this podcast before, like, how have we done this for almost two years and never had an episode devoted to behavioral interviews? So not only did we get to read the book, we got to interview Austin, which was super cool.

Nathan Toups (03:59)

Yeah. And I'll tell you, yes, you might want to pick this book up if you're in the job search, or maybe you've had some frustrations with it. Have you fine-tuned your behavioral interview strategy well enough? I would argue you probably haven't, especially after reading this book. I'm a big believer that there's more tuning that you can do. But this book's also generally applicable to other aspects of, like,

learning how to tell a good story about your journey in your career, whether this is, like, you know, part of your annual review process, or just kind of understanding how to stick up for yourself at work. And we talk about this in the discussion, right? So I think if you're kind of on the fence of, is this useful to me? Austin has just such a great perspective. He's got a great presence on video and audio as well. And I don't know, it was one of the highlights of interviews that we've had, for sure.

Carter Morgan (04:54)

Yeah. Well, really, really cool guy. Yeah, so check that out. Now we get into the meat of today, which is Designing Data-Intensive Applications by Martin Kleppmann. This is our long, long journey. This is episode three of four. I think we know it's going to be four now, right, Nathan?

Nathan Toups (05:13)

We're locked in, there's two chapters left. They're hefty, but there's two chapters left after this.

Carter Morgan (05:16)

Great, hefty chapters.

Okay, great. So this is part three of four. What was this week? This was chapters eight, nine, and 10, right? Does that sound right?

Nathan Toups (05:29)

Yeah, do we... before? Yep, it's chapters eight, nine, ten. We're rounding out part two and we're getting started on part three. There's only three parts in the book. These get hefty. Before we dive in, do we want to do the author and book introduction?

Carter Morgan (05:32)

Yeah.

Yeah, let's, let's

do the author and book introduction. For anyone who's tuning in for the first time or just needs a reminder: this is written by Martin Kleppmann, who is an associate professor at the University of Cambridge, where he works on distributed systems and local-first collaboration software. Before academia, he was in the trenches. He co-founded Rapportive, which was acquired by LinkedIn in 2012, where he worked on large-scale data infrastructure. He's also one of the people behind Automerge, an open source library for building collaborative applications. And the book introduction is:

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing,

but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. So like we said, this is chapters eight, nine, and 10. I just want to give those names for you so you have an idea of what we're talking about. Chapter eight, the trouble with distributed systems. Took us until chapter eight, I guess, to start talking about that. It's funny because the whole book could just be titled The Trouble with Distributed Systems.

Nathan Toups (07:07)

Oh, for sure.

Carter Morgan (07:09)

Chapter nine, consistency and consensus, and chapter 10, batch processing. And chapter 10 is the first part of part three, which is derived data. Okay, so we're now about three quarters of the way through this book. Nathan, give me your thoughts on this last week of reading.

Nathan Toups (07:28)

This book is dense and I will tell you that I am holding on for dear life. I'm loving it. I mean, it is it's it's like I'm in a CrossFit class, you know, and I'm doing, you know, overhead press and, you know, a bunch of burpees or something. That's what my mental gymnastics are going through.

Carter Morgan (07:33)

(laughs)

Nathan Toups (07:46)

Yeah, this was an interesting one. We didn't get a clean cut across parts, and so there's a little shift in tone. But again, this book is so well

structured that you understand why. And I really like it that we know why we're going where we're going as we get there. Chapters eight and nine round out part two. We really get into some juicy parts about distributed systems and thinking about things. There's things that I think we'll be able to touch on, like Byzantine fault tolerance and some other stuff that I like to think about a lot. And then part three sort of rounds out

the sort of rest of the book as far as like how data starts flowing and patterns around that data flow. ⁓ We start with batch processing, it's gonna end up getting into stream processing and other stuff, which I'm pretty excited about. I've worked in this data pipeline type world. ⁓ But I like, again, I loved, we'll get into this, like, he takes a little like step and goes, before he gets into batch processing in chapter 10, he has this whole section on the Unix philosophy.

which I think would resonate with anybody who's been reading any of the software books that we're into, because he just kind of goes, hey, let me take a step back and talk about this amazing thing, and then makes it his thesis on why batch processing is actually a natural extension of this in distributed systems, and I just never thought of it that way. Yeah, and so again, even if I'm kind of going back and relistening to parts or being like, man, I don't think I comprehended everything that he said here. ⁓

Carter Morgan (08:56)

Yeah.

Yeah, that's a good point.

Nathan Toups (09:24)

I really appreciate the sort of like, there's almost like a poetry to the way that he's writing it. And I do think you can look at these systems and go, oh, this is beautiful, even if I don't fully understand what I'm looking at. So yeah.

Carter Morgan (09:36)

Yeah, I mean, this book is a beast. Like, it has a reputation as being a bit of a beast, and that certainly holds up on a read. I mean, our episode notes today are 27 pages long. They usually are like nine or 10. And granted, how are we making these notes? This is all AI, right?

Nathan Toups (09:58)

Yeah, so it's a mix. Like, I'll tell you, it's AI-augmented. If I can get a scraped copy of the book (I paid for all the books and stuff), I will use this. I have, like, an interactive chat window. Actually, I'll tell you, I actually forgot to delete a section, so I think we're actually at maybe 19 pages and not 27. I forgot to delete one from last week, so a little bit better.

Carter Morgan (10:02)

Okay.

Okay. Well, you know, that's

that's better.

Nathan Toups (10:25)

But basically

what we do is we use this to summarize chapters with key points, just to kind of remind us of where we are. And then we also have quotes that we've been looking up ourselves. But this has been evolving over time as the tools have gotten better. Like, when we started this a year and a half ago, I tried doing this and just completely disregarded it; it was very manual, and our notes were pretty sparse. Now it's been awesome because, yeah, we were able to dive in. Maybe we...

Carter Morgan (10:32)

Right, right.

Yeah.

Nathan Toups (10:53)

It'd be cool to do, like, sort of a meta episode on our process, or how we do this, because I think it would be kind of fun. Yeah, it's like we increasingly have agentic AI stuff. I actually have a skill that I built to help make chapter markers in the style that I like. Actually, last week was the first one I did a one-shot on, which is pretty cool. But it was, like, months of me tuning this skill. Yeah, yeah, anyway.

Carter Morgan (10:57)

Yeah, let us know in the comments if that's something you'd be interested in or the Discord.

nice.

There you go.

Well, so yeah, it's long, it's dense. Just in the spirit of honesty and transparency with the pod, or, you know, with our listeners, this week has been particularly crazy for me at work. It's been a good week. We've just had... we're scaling, which is good. But we're starting to see cracks in kind of the original system the original engineers built, and so my job has been to fix all of that. And leadership has been excellent, and we've all been able to kind of swarm on this problem.

But my head space has been there, and I haven't been able to give this maybe as much attention as I would have liked. I've listened to everything, but I also would say, as far as the audiobook goes, I can't recommend just the audiobook for this. I think the audiobook is great, and we talked about this last week, Nathan, you said, like, you'll pause it and sometimes you'll read the physical copy, or I guess the text copy, yourself when you get to a part you don't understand.

We've had plenty of books on this podcast which were great as audiobooks, and I don't think I would have gotten anything more meaningful out of reading the text itself. Obviously, kind of, your less technical books like Radical Candor or Made to Stick are great for that. But even some technical books, like The DevOps Handbook, were a really, really pleasant read, or I guess listen, on audiobook. So we've had lots of great audiobooks here. This one,

I'm very glad it comes in an audiobook, but I think if you really want to truly understand and grok this material at a deep level, the audiobook is not going to cut it. So I have a ChatGPT window held up, and I've just been looking things up in these notes, like, okay, wait a minute. Like, I remember we talked about this topic, or the book talked about this topic, but I don't remember exactly what that was. And so, you know, it gives you a good jumping-off point, gives you some good

surface-level understanding of a lot of these things, but I think if you really, truly want to deeply understand this book, you can't do the audiobook. So take that for what it's worth, but we still have plenty of great stuff to discuss today and are really excited for what we got going. So with that, we can get into it.

All right, well, let's start with chapter eight, the trouble with distributed systems. Lots going on here. I think we were both partial to the fact that our clocks are lying to us. This book spends quite a bit of time telling you that your clock cannot be trusted, which I thought was pretty funny.

Nathan Toups (14:22)

Yeah, it's funny, whenever I was doing architecture, I would always say, very obnoxiously, time is an illusion. Like, I would just say, time is an illusion, there's all these problems with it. If you had to make something that had to do with, like, an atomic counter of something referencing some point in time in a distributed system, and then try to reconcile which thing happened first... I knew sort of intuitively, and I knew from certain aspects of time,

Carter Morgan (14:32)

(laughs)

Nathan Toups (14:52)

time drift and some other things. But this book actually scientifically and mathematically shows why you can't trust clocks, especially in distributed systems, and all the weird problems that can come up from this. I think he actually made some really interesting framing here, where he talked about how, like, even on a single system, right, even if you have a single computer, you can't trust the clock

unless you're using specific types of clock systems, because systems can pause; you can pause a process and resume. Now that we have multi-core processors and all these other things, you actually have a lot of the problems of a distributed system in a single system. We don't have to get into all the details here, because there's a lot to cover in these next three chapters.

But one of the things I loved was he talked about cloud computing versus supercomputing, and how, like, supercomputing is a big distributed system, but it's built like it's one big system operating by itself. And so, therefore, you know, it has this idea of, like, a checkpointing system. It can stop and start work, but it still behaves like one big computer. Versus cloud computing, where it's a bunch of pieces that are very loosely coupled.

Carter Morgan (16:11)

Right, right.

Nathan Toups (16:17)

And if you have a very loosely coupled distributed system where the systems are all sort of acting autonomously as their own actors, it gets way more complex, like, what does a partial failure look like, right? Like, when I own a computer, my program might crash and, whatever, I'll start over, or maybe pick up wherever the write-ahead log was. But in distributed computing, we can have these partial failures where, like, maybe aspects of the caching layer failed, but it wrote to the disk, and then, like,

okay, well, where's the checkpoint? Where do I pick up? And if there's 300 players doing something... And again, the book does such a good job of framing all of these things that a lot of times we would think of as, like, edge cases. And he's like, these aren't edge cases, especially at scale. You're having to deal with this stuff all the time. Yeah.

Carter Morgan (17:08)

Yeah, I thought it was interesting. He points out that really clocks have two uses, or I guess time in general has two uses. There's duration, how long did something take? And then there's points in time, you know, when did something happen? And you might want to use two different kinds of clocks for that. Like, you know, he calls them time-of-day clocks, which just return what time the clock perceives it to be right now. But even then, those can get out of sync. And, like,

Nathan Toups (17:16)

Mm-hmm.

Carter Morgan (17:37)

it has crazy terms here, like quartz crystal drift. Basically, a clock can drift by about 17 seconds per day. And, like, honestly, on a computer, VM clocks can pause during migrations. Leap seconds! I forgot we had to worry about leap seconds. But then he also talks about, kind of, monotonic clocks, and these are clocks that don't actually care what time it is. They just care about

Nathan Toups (17:47)

Right? On a single computer. Right.

Carter Morgan (18:06)

how much time has passed; they can only ever increase. And so that might be the kind of clock you want to use if you're just asking, how long did something take? But the more I learn about all this stuff, yeah, yeah.
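The two kinds of clocks being described here map directly onto calls in, for example, Python's standard library. A quick illustrative sketch (the 0.1-second sleep just stands in for real work):

```python
import time

# Time-of-day ("wall") clock: tells you *when* something happened,
# but NTP can step it backwards or forwards at any moment.
start_wall = time.time()

# Monotonic clock: useless for "what time is it?", but it only ever
# increases, so it's the right tool for "how long did this take?".
start_mono = time.monotonic()

time.sleep(0.1)  # stand-in for real work

elapsed_wall = time.time() - start_wall       # could even be negative after an NTP step
elapsed_mono = time.monotonic() - start_mono  # always >= 0 on this machine

print(f"wall-clock elapsed: {elapsed_wall:.3f}s")
print(f"monotonic elapsed:  {elapsed_mono:.3f}s")
```

Note that a monotonic reading is only meaningful relative to another reading on the same machine; comparing monotonic values across two machines doesn't work.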

Nathan Toups (18:18)

But

it's mind-boggling, right? So, like, yeah, time-of-day clocks are, again, how most of us are thinking about it. And think about it: let's say you're on a VM, and this was a good example that I hadn't really thought about. I'm on a VM, let's say I'm on a physical host, and let's say we've over-provisioned this physical host, and I've had to freeze one of my VMs and move it to another physical host, right?

Carter Morgan (18:36)

Mm-hmm.

Nathan Toups (18:46)

There's a... in the mind of the application, time stood still during that migration, and then it wakes itself back up and then time bolts forward. There's this little gap, and it breaks a lot of assumptions, right? Even if you've written a good program, if you can just teleport forward in time, you get into all these weird issues, like, okay, well, let's say you had a lease on a lock.

And you still think you have that lease on the lock because you jumped forward in time, but you actually jumped far enough ahead in time that your time-to-live for this lock went away. And so there's all these interesting things that end up coming up. So he talks about how time-of-day clocks are unsuitable for measuring elapsed time because of all these weird things. Your clock skews; network time protocol can actually make you move backwards in time. So let's say you've clock-skewed five seconds into the future for whatever reason.

Carter Morgan (19:20)

Right.

Nathan Toups (19:42)

The NTP endpoint gets hit. It says, no, actually, it's five seconds in the past, and you update. Your logs, right? If you're using a time-of-day clock, your logs will jump backwards in time, right? Which we know didn't actually happen, right? It's just your reference point moved. Yeah, monotonic clocks always move forward, but even those have problems. And it was really interesting, because he talks about monotonic clocks and I'm like, cool, yeah, this is great. And he's like, but you cannot compare them across machines. And I was like,

Carter Morgan (19:54)

Right.

(laughs)

Nathan Toups (20:12)

Very good point. Also, NTP, Network Time Protocol, still has... you still have physics, right? I'm gonna ask the network time protocol server to give me an update to my clock, even if I guarantee I won't move backwards in time. I'll ask it for an update to the clock, but there's an amount of time that the electrons take to pass over the wire and then come back to my machine, and there's going to be some small amount of skew.

Even then, even when I do this update and get the same network time protocol across multiple machines, there's just no absolute value, right? Even if we were all coordinating to a network time protocol server, all of my machines are gonna be off by some small margin of error. And that was the part that, like... I think he talks about clock synchronization problems, right? And...

there's this... he gets into Google Spanner. I guess maybe he kind of ties all this together. I don't know if people know, but Google Spanner is kind of epic, because everyone said you can't have a globally distributed database and get clocks right, have this, like, guaranteed ordering of operations. And Google Spanner actually shows that it is possible to do this, but they have this crazy infrastructure that almost no one else can reproduce,

where they're using atomic clocks in each data center. They have all this GPS offset stuff. But what I didn't know was they actually do this, like, uncertainty interval, where anytime a machine writes (and again, I might mess up some of the details on this), it actually, like, discloses the last time it's talked to a network time protocol endpoint.

And then there's, like, the uncertainty interval, so that it can calculate the drift probability, like, what's the probability of drift in this bell curve across when I think that something happened in time versus what another system thinks. And it can do this sort of statistical analysis of knowing which event happened next in some solvable way. And I was like, I need to read more about how their time system works, because it's actually

quite amazing, especially with the way that the book frames this. This is a non-trivial problem if you have to do things based in time. Yeah.
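The uncertainty-interval idea can be sketched with a toy example. This is not Spanner's actual API (the `TTInterval` class and `now` helper here are made up for illustration), but it captures the core trick: each timestamp is an interval that definitely contains the true time, and one event is only certainly before another if their intervals don't overlap.

```python
from dataclasses import dataclass

@dataclass
class TTInterval:
    """A TrueTime-style timestamp: the true time lies somewhere in [earliest, latest]."""
    earliest: float
    latest: float

    def definitely_before(self, other: "TTInterval") -> bool:
        # A happened-before B is only certain if A's latest possible
        # time is still earlier than B's earliest possible time.
        return self.latest < other.earliest

def now(wall_reading: float, uncertainty: float) -> TTInterval:
    # Hypothetical clock API: a wall-clock reading plus/minus the
    # known error bound since the last sync with a time master.
    return TTInterval(wall_reading - uncertainty, wall_reading + uncertainty)

# Two events 5 ms apart, each with 7 ms of clock uncertainty:
# their intervals overlap, so we CANNOT say which happened first.
a = now(100.000, 0.007)
b = now(100.005, 0.007)
print(a.definitely_before(b))  # False: the intervals overlap

# But an event whose interval starts after a's interval has fully
# passed IS guaranteed to be later (a.latest=100.007 < c.earliest=100.015).
c = now(100.022, 0.007)
print(a.definitely_before(c))  # True
```

Spanner's "commit wait" follows directly from this: after a write, wait out the uncertainty before making it visible, so the next transaction's timestamp interval cannot overlap with it.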

Carter Morgan (22:47)

It's funny. I've kind of stubbornly refused to specialize in my career, just because, I don't know, I kind of like working on whatever I think is going to be the highest impact. So this past month or six weeks or so, that's been a lot of, like, platform engineering, site reliability engineering stuff, cause I felt like, okay, this is the biggest problem facing the business. We're wrapping up some of that, and I'm going to move on to building out some new features the business has wanted

for a while, and I think they're gonna be pretty high impact anyhow. And so I like kind of being a generalist, and then as I read this book, I'm finding my boundaries, and I'm finding that if I ever wind up in a role where I am debugging clock synchronization, I'm gonna be like, that's it, this is too far. Like, I don't wanna do this. Although who knows, right? Cause I think I've spent the past six weeks really, really invested in, like,

OpenTelemetry and improving our metrics processing and, like, our deployment pipelines, and, like, do we have clean handoffs between hosts? And I think if you had asked me three or four years ago if that's what I would really want to spend my time doing, I would've been like, what? No, are you kidding me? So who knows, maybe four years from now this will be my life, just making sure that all the clocks line up across all these different systems.

Nathan Toups (23:55)

Yeah.

I look at this in the sense of there are hard things that make people's eyes glaze over. And I think that most of the time we really don't need to spend our time thinking about it much. But every once in a while you get into some domain where you're like, wow, this really matters to the type of problem I'm trying to solve at work.

And that's the kind of thing where I'll just chase that down. I'll be like, okay, I really need to know why. And a lot of times, for me, my background was SRE, and DevOps before that. So I really was mostly doing operational work and less backend programming. And I kind of got into programming more and more for systems programming over time... I mean, not systems programming, distributed systems programming over time,

Carter Morgan (24:25)

Right, right.

Nathan Toups (24:49)

and had to think about how the software layer works. But a lot of times it's kind of like in graduate algorithms, where we learned about NP-hard problems, right? And NP-hard problems are at least as hard as the nondeterministic polynomial time (NP) problems. These are the ones that, as they get more complex, we don't actually know how long it will take to solve. And some of them get exponentially more difficult the more variables you add, which means you really just can't even solve them in a reasonable amount of time.

Carter Morgan (25:06)

Right.

Nathan Toups (25:20)

A big thing that you take away from that class is: okay, I'm identifying that this is an NP-hard problem, I'm not going to try to brute force my way through it. This is not solvable, right? Like the traveling salesman problem: I can't get a 100% correct solution in a reasonable amount of time. Okay, I should use the shortcut, the best-practices version that gets me, like, good enough, right, within whatever bound. I think a lot of these problems that we're getting into in chapters eight and nine are really this. It's like, can I identify

Carter Morgan (25:40)

Right, right.

Nathan Toups (25:48)

that the reason we're struggling with this is because time skew is hard, and so, therefore, I really shouldn't try to solve this problem myself. I should really look and see what the latest thinking in distributed database technology is, and that's what I should use, right? Like, who else has solved this problem, outlined why it was hard? And then I trust them. And then, again, there's a few people, like research engineers and really excellent C or Rust programmers, that really do

want to, like, implement some new academic white paper on a new way that you could deal with time skew or something. Like, that's the only kind of people that are really gonna be working on these problems. And I will cheer you on from the sidelines. I can, you know... I don't have to be... I can look at athleticism and be like, wow, you know, that's an amazing basketball player, without being like, you know what?

I should really work on my three pointers because my three pointers are nowhere near as good as theirs. I can just be like, yeah, you're great. You're good at it. And I'm going to cheer you on to do that.
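The "identify it's NP-hard and take the shortcut" move can be made concrete with the traveling salesman example. Here's a minimal sketch of the classic nearest-neighbor heuristic: it gives a good-enough tour in O(n²) instead of searching all n! orderings, with no guarantee of optimality.

```python
import math

def nearest_neighbor_tour(points):
    """Greedy TSP heuristic: always visit the closest unvisited city next.

    Not optimal (it can miss the best tour badly on adversarial inputs),
    but it's fast and usually "good enough".
    """
    unvisited = list(range(1, len(points)))
    tour = [0]  # start at city 0
    while unvisited:
        last = points[tour[-1]]
        # pick the unvisited city nearest to where we currently are
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (10, 0), (0, 1), (10, 1)]
print(nearest_neighbor_tour(cities))  # [0, 2, 3, 1]
```

Swapping in a better approximation (2-opt, Christofides) is the same trade: accept a provably imperfect answer to keep the runtime sane.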

Carter Morgan (26:53)

Yeah, not to make everything kind of about AI, but I think we've seen, like, a crisis of confidence among software engineers who, I think, kind of fundamentally viewed themselves like, my job is to take the requirements and translate that into code. And, for better or worse, that's a commodity now, right? Like, anyone can take requirements and translate them into code. And granted, there's still a lot of effort to make sure that it's the right code,

Nathan Toups (26:59)

Mm-hmm.

Carter Morgan (27:23)

you know, it's performant and efficient and all that, right? But I think engineers who for a while have thought of themselves like, I'm the kind of guy who solves problems, have been surviving this kind of AI transition a little better. And, you know, I think, yeah, it's like you're saying, Nathan. Like, I'm looking at myself right now and saying, I don't want to be debugging clock synchronization problems, right? But if I ever found myself in a role and it was like, this is the most critical thing right now, right? Like,

this is the highest leverage impact point, and we've got to figure out how to get these clocks synchronized? Like, I bet I would find myself more animated by it. Who knows? Who knows what the future holds? If any of you have any jobs out there which need me to synchronize your clocks, let me know. Maybe we could work together.

Nathan Toups (28:00)

Right.

To kind of round out, because I do want to give time for chapter nine and chapter 10: a couple of things also come up out of this, like, when we start talking about how hard clock skew is, and how do we deal with pausing and these other things. I knew of them, but I never thought about them that much, or why they're so important. But he talks about how locks aren't good enough, especially in systems that can pause, which is all systems that we kind of program on.

Carter Morgan (28:14)

Yeah.

Nathan Toups (28:39)

He talks about what fencing tokens are. Were you familiar with fencing tokens before this book? Because I was not, and I'm sure I've used them in systems and just transparently didn't know about it. I'm gonna take a little side step and talk about them for a second. So here's the problem. Let's say that machine A gets a lock, and let's say it expires in 30 seconds. Let's say that machine A, for whatever reason, gets

Carter Morgan (28:45)

No, no.

Nathan Toups (29:08)

paused or frozen, and it comes back and somehow warps forward in time past that lock. It comes back up and it still has this valid lock in its mind, but its lock has expired. So, because the lock has been given back to the pool, system B comes in and gets a valid lock. Now A and B both have a lock on the same system. Then A comes back to life and now tries to write to the database,

and B has already written to the database, and now the data is in some weird funky place. There's this really strange thing where the assumptions of linearizability are broken. B should only be able to get a lock if A exceeded its time limit, but A wakes back up and it's now this sort of frozen caveman problem. So the argument is the solution is a fencing token, which does this. It says A gets a lock with a token. Let's say it's like token 33.

Carter Morgan (29:59)

Right, right.

Nathan Toups (30:08)

⁓ A pauses, whatever, the lock expires, and B now gets a new lock granted to it, but it's with a fencing token of ⁓ 34 as the ID. And so then what happens is that B goes ahead and writes something to the database, ⁓ and the storage engine understands what these fencing tokens are. So it says, cool, this token's higher than the last token I've seen, and so I'll accept this write. And then A comes back in and says, cool, write to the database.

here's token 33, and it goes, wait a second, that's below the last token I've seen, rejected. And it's this nice little clever piece where the storage engine doesn't actually have to know all the locks that are out there. All it has to know is, is this a number that increments higher than the last fencing token that I've seen? And it was just one of these very clever things where you're not having to do a bunch of coordination back and forth. It's just, I'm given some sort of claim.

⁓ I go do my task and I submit my little token with the work that I'm doing. And then the storage system goes, I'll accept this. Right. And so it kind of just moves forward. The storage system doesn't have to care about all the reasons that maybe some lock was screwed up on the backend. And so anyway, I thought that that was really clever. Again, I've probably run across similar patterns before, but this is the kind of like fun, if you like that kind of thought experiment.
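The fencing-token check Nathan walks through can be sketched in a few lines of Python. This is a toy, single-process model with invented names (`FencedStorage`, `write`), not how ZooKeeper or any real lock service implements it, but it shows the one rule the storage engine enforces: only accept writes whose token is strictly higher than the last one seen.

```python
class FencedStorage:
    """Toy storage engine: remembers only the highest fencing token seen."""

    def __init__(self):
        self.last_token = 0
        self.data = {}

    def write(self, token, key, value):
        # Reject any write whose token isn't strictly newer than the last
        # accepted one -- that writer's lock must have expired meanwhile.
        if token <= self.last_token:
            return False  # the stale "frozen caveman" client
        self.last_token = token
        self.data[key] = value
        return True

storage = FencedStorage()
# A took the lock with token 33, then paused past its lock expiry.
# B takes the lock with token 34 and writes first:
assert storage.write(34, "doc", "B's update") is True
# A wakes up and tries to write with its stale token 33:
assert storage.write(33, "doc", "A's update") is False
assert storage.data["doc"] == "B's update"
```

The storage engine never tracks who holds which lock; comparing one monotonically increasing integer is enough to fence off the late writer.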

Carter Morgan (31:25)

Hahaha.

Nathan Toups (31:37)

This book is for you. There's like so many of these weird little things to deal with. So yeah.

Carter Morgan (31:39)

You

Yeah, well, does that take us through chapter eight then?

Nathan Toups (31:47)

Yeah, yes, asterisk. I'm going to mention it briefly. The other one that I'm like kind of obsessed with is something called Byzantine fault tolerance. As some people know, ⁓ you know, I've done some work in blockchain, the blockchain world, ⁓ but the Byzantine fault tolerance is this idea that's much earlier than that. And again, all weird things aside of like whether, you know,

Carter Morgan (31:50)

Okay.

⁓ yes.

Nathan Toups (32:13)

Bitcoin is going to destroy the earth because of the energy consumption, all these other things. ⁓ Distributed systems have to think about Byzantine fault tolerance and what is Byzantine fault tolerance? So the idea is that a lot of distributed systems basically have an operating assumption that all the nodes in the system are at least trying to cooperate. They're trying to be a good actor. ⁓ And so the Byzantine generals problem ⁓ is this idea that

what if we have generals surrounding a city and they're trying to coordinate this attack, I think is what the main piece is, but ⁓ they all have to communicate over time. And some subset of these generals are, ⁓ they're actually playing for the other team. And the generals have to figure out who could be traitors. And so how do I coordinate messages ⁓ across the trustworthy generals and not allow the untrustworthy generals to do things like,

Carter Morgan (32:58)

Yeah, that could be traitorous.

Nathan Toups (33:10)

send false messages, do these other bad things, ⁓ this malicious behavior. And it's a very interesting thing to think about because ⁓ it could be that there's a software bug in the code. It could be that one of the systems gets hacked. So therefore, the bad actor is actually a hacker trying to do something to a distributed system. ⁓ But how do we make a decision as a group ⁓ in a way that's acceptable?

for some next action. And so again, he starts getting into this. He talks about it more in chapter nine. ⁓ But I thought it was very interesting because he goes through all of the ways that this is actually a super hard problem and everything is a set of trade-offs. So, yeah.

Carter Morgan (33:51)

Right.

Well, and, ⁓ yeah, the concept of elections keeps coming up in this book: the nodes will form a quorum, and they'll have an election to determine, like, we know that when a secondary node wants to become the primary, right, it will submit itself as a candidate, and then the other nodes will vote and hold an election. That determines who becomes the new primary. Same with like Byzantine fault tolerance, like.

Nathan Toups (34:02)

Yeah, very democratic. Yeah.

Carter Morgan (34:24)

nodes will hold an election amongst themselves to determine, you know, like, I don't even understand that the election mechanism in Byzantine fault tolerance, you have a better understanding of how that works?

Nathan Toups (34:35)

So obviously there's no one way to handle Byzantine fault tolerance. Actually, some really interesting academic papers came out of the 70s and 80s ⁓ that basically talked about this. ⁓ There are several ways of doing this. Let ⁓ me see, I actually have some notes here.

Carter Morgan (34:39)

Right, right.

Nathan Toups (35:03)

Yeah. So, you know, you're basically trying to get rid of like malicious actors or hardware corruption, these other things. All of the Byzantine fault tolerance systems have some sort of consensus model, right? So like a good example is the Bitcoin blockchain: if 51% of the network agrees on what the longest mined chain is, that's the winner.

Right, so in their case, you would have to take over, they call it a 51 % attack, this is like their Byzantine fault tolerance idea. If I could take over 51 % of mining on the entire Bitcoin network, I could then rewrite history, is kind of the idea here. And so as long as at least half the network has agreed on what the state of truth is, and this is why there's all these restrictions, Bitcoin can handle seven transactions per second, it's like the maximum amount.

Carter Morgan (35:44)

Uh-huh.

Nathan Toups (35:58)

of transactions, because of how the consensus work has to happen. ⁓ There's all these, again, these big trade-offs. Also, there's what they call a proof-of-stake network, which is much more efficient, but you still have to get some sort of quorum across. ⁓ And you have to think about who could be conspiring with whom and all these problems. ⁓ Sometimes it's a random election. Sometimes it's a, we all agree on some cadence. This is the way blockchains work, is that we agree on some cadence of like,

who's proposing some change to the system, we get a consensus that that's the correct change, and then we move forward. ⁓ And there's an idea of like what happens in the fault mode. So for instance, do we stop the chain? Like if we get into some existential thing, do we just stop processing? ⁓ Can we deal with, like, a partially?

In other distributed systems, do we deal with partially synchronous models, or do we deal with partial crash recovery? How do these things fall into place? I think that what it comes down to is, he calls it safety versus liveness. So safety is that nothing bad can ever happen, but you will halt operations, right? If I want to have 100% safety, it means that there might be certain cases in which I stop doing stuff.

Carter Morgan (37:23)

Right,

right.

Nathan Toups (37:24)

And then liveness is that basically I might trade off some safety, but I'll get more availability. And so this actually is like a really good segue into chapter nine, where we now start talking about consistency and consensus.

Carter Morgan (37:41)

Well, let's talk about it. I think, um, can we talk a bit about linear-lies, linearize-ability, which is a real tongue twister of a word. I was going to say linear-lies-ability, but it's linear-ize-ability. Um, which, uh, yeah, this is interesting because this was one that I kind of, I had a window pulled up because I was like, it's a concept that seems intuitive

Nathan Toups (37:51)

It's a mouthful.

Carter Morgan (38:11)

to like us as humans, but actually does matter as far as consistency guarantees go. So this is to add to the definition, but I think it's very in line with what the book said. So linearizability is one of the strongest, most intuitive correctness guarantees for distributed systems and concurrent operations. It defines how operations on shared data should appear to clients.

And for some reason, GPT-5 does this dumb way of speaking where it's like, here's the clean, practical definition engineers use. Like, have you seen this? It's like, here's the no nonsense, no fluff, right?

Nathan Toups (38:41)

Ugh.

No,

I will tell you, I am a, I'm an anti-Sam Altman, anti-ChatGPT person, so I.

Carter Morgan (38:53)

I just canceled

my ChatGPT Pro subscription. ⁓ I switched to Claude.

Nathan Toups (38:56)

Yeah, I'm

not anti-LLMs. I am cautious about them, but I've just found that Anthropic, the Claude stuff, has been more useful, especially with the way I've been able to customize the tooling. So yeah, I did used to use ChatGPT, but I just kept running into this stuff where I'm like, I don't trust you. I also don't trust Sam Altman, but I mean, that's like, I think it's a top-down problem, but it's, ⁓ yeah.

Carter Morgan (39:20)

I'm only using it

right now because I tried to cancel and they're like, well, do you want a month for free? I'm like, sure, I'll take the month for free. And then I'm trying to preserve all of my Anthropic tokens for Claude Code. And so anytime I just have questions like this, I'm still using ChatGPT just for the next month or so. Anyhow, this is what it says. A system is linearizable if every operation appears to take effect instantly at a single point in time between its invocation and its response, and all clients agree on the same order of operations.

Nathan Toups (39:35)

That's so funny, that's so funny.

Carter Morgan (39:48)

even though operations may actually run concurrently or on different machines, the system behaves as if the operations happened one by one in some real-time order. This I think is a little confusing because, ⁓ I guess, is this guarding against the case where two operations legitimately take place at the exact same time, down to the millisecond? Cause it doesn't seem like that's happening very often in big distributed systems?

Nathan Toups (40:14)

So,

I might get this a little wrong, but it can. And again, time is an illusion. And if we listen to Papa Einstein, we know that the frame of reference actually matters on the order of events. And I don't mean this so that you can travel back in time. Anything that's causal has to move forward. The arrow of time moves forward. But what appears to be two simultaneous events in one frame of reference can actually be one event that happens after the other. And again,

Carter Morgan (40:18)

Right.

Hahaha.

Nathan Toups (40:44)

we can get super weird and we can think about some of his thought experiments, but it is true. If I'm moving towards the speed of light, ⁓ something can reach me ⁓ before something else reaches me. ⁓ Or if I'm looking at it from another frame of reference, it looks like those two events happened at the same time. ⁓ That's not a contradiction. This is how special relativity works.

And he even talks about this, like people will bring this up and how it's like an overuse of this abstraction. But I think it's important to think about the fact that like, ⁓ how can a distributed system play back a set of events so that they're deterministic, right? And I think that's a, actually from a linearizability standpoint, he also talks about them being also called atomic consistency or strong consistency or immediate consistency or external.

consistency. So this is the idea that I have a consistent view. Now he also warns that it sounds like linearizability and serializability are the same thing, and they're not. And this is like kind of breaking my brain a little bit, but you can serialize a set of things from a snapshot. And so a serializable thing, which again happens one thing after the other, can still have a linearizable contradiction. ⁓

where I do a set of things in a serialized snapshot, then when I put these things back in place, are they linearizable? And this is where like dirty writes and some of these other kind of weird things that we can get twisted up into, we have to be really careful with. ⁓ I'm not going to pretend to be an expert in this. I'm like, I was nodding my head and like reading these sections and like taking notes on the fact that, ⁓

Carter Morgan (42:31)

You

Nathan Toups (42:37)

sometimes when we talk about serializability, we actually do mean linearizability. Or, can I, ⁓ you know, let's see what it says in my notes: once a write completes, all subsequent reads by any client see that write. So that's like one of the criteria. ⁓ This, he said, is also called the recency guarantee. So the idea is that like, is it ⁓ consistent

universally, is this thing. And some things must be linearizable. For instance, you would say like the account balance in your bank account, right? That has to be linearizable. You don't want something where, if I refresh the page three times, the order of my bank, you know, ins and outs can reshuffle. ⁓ And so this is the kind of stuff where we really have to have these consistency guarantees on the order of operations, or what really happened

as experienced by all participants, right? ⁓ Yeah, so I think he talks about which systems are linearizable: single-leader replication, consensus algorithms. So like if we get a consensus, and again, this is why a blockchain works, because a blockchain is a provable set of reproducible events that happen over time, right? They're a chain of these, and the latest item in the chain has a hash.

It's got its unique identifier, but it also includes the hash of the previous block, which includes the hash of the previous one. So therefore I can prove that, whatever the hash is from this, or Git is a good example. Whatever my latest Git commit is, it's actually got this history to it that comes into place. But he also talks about things like multi-leader is not linearizable, like this is one where he kind of really draws a hard line. Or

leaderless replication, he was like, it could be. Like, it's probably not. ⁓ Because there's like weird race conditions that can happen even if you have quorums and all these other things. ⁓ But that like having linearizability is this like, it's a really hard problem, especially ⁓ if you're doing distributed systems. And if it's eventually consistent, for instance, like it's not linearizable. ⁓
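To make the recency-guarantee point concrete, here's a toy Python model of a leader with an asynchronously lagging replica. The class and method names are invented for illustration, not from any real database; the point is just that a read served by the replica after a completed write can disagree with the leader, which is exactly the non-linearizable, eventually consistent behavior being described.

```python
class LaggingReplicaStore:
    """Toy leader + async replica: reads from the replica can be stale."""

    def __init__(self):
        self.leader = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.leader[key] = value
        self.pending.append((key, value))  # replication lags behind

    def replicate(self):
        # Apply the backlog -- in a real system this happens asynchronously.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_leader(self, key):
        return self.leader.get(key)

    def read_replica(self, key):
        return self.replica.get(key)

store = LaggingReplicaStore()
store.write("balance", 100)
store.replicate()
store.write("balance", 42)             # this write completes on the leader
stale = store.read_replica("balance")  # another client reads the replica
fresh = store.read_leader("balance")
# stale == 100 while fresh == 42: two clients disagree after a completed
# write, so this system is not linearizable until replicate() catches up.
```

Once `replicate()` runs, the two reads agree again, which is "eventually consistent" in miniature.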

Carter Morgan (45:03)

And all of this, because I know you've talked about like, well, I have experience doing some of this in the past and some of this is review to me. I was like, how on earth does he have experience doing this? Like this is all, you know, pretty intense, but I totally forgot you've been in blockchain and FinTech, right? I mean, I can see why you'd be touching on this a lot.

Nathan Toups (45:17)

Yeah.

Fintech and blockchain. We also care about this a lot with, yeah, so anything in the financial industry whatsoever. And it is funny because the kind of people writing the code for this, they literally go off and read white papers and then see if they can apply it. And it's the first time I've ever worked with research engineers who literally think in mathematics. And it's funny because you see this and you're like, ⁓ this is what programming used to be.

Carter Morgan (45:34)

Right.

Nathan Toups (45:46)

For like most people it was like, ⁓ how can I get this computer to reliably run these algorithms that express some, you know, mathematically provably correct hypothesis, right? And we use this every day with cryptography. It's a good example of like, I think we're kind of at two different ends of the world. We'll go off on like a little tangent for a second. ⁓ On one side, we're doing like statistical stuff.

Carter Morgan (45:46)

Yeah.

Nathan Toups (46:14)

And statistical is all about fuzzy math. Obviously, we want it to be reproducible when possible, anything that has to do with rounding numbers, anything that has to do with floating point, we're trying to have a model of the universe that explains something. And we use that model to make a prediction or figure some other piece out. And it's interesting because we're using math to give us a best estimate of something. It's too hard to model every little

atom in the universe, so therefore we make a model of how fluids work and do a bunch of math and come up with fluid dynamics or whatever. Then there's the other side, which is the number theory side with cryptography, where if even a single bit is off, I know that this algorithm is inaccurate. It's the most pedantically exact kind of mathematics you can imagine, where you're literally like,

Carter Morgan (46:46)

Mm.

Right, right.

Nathan Toups (47:11)

doing factorization of large primes to prove that nothing can be altered, and you're doing these crazy things with encryption and, ⁓ you know, anything that's making guarantees of, like, unalterable data. And it's funny because the computer does both of these things. I don't know, sometimes I'm in awe thinking about the fact that like, if I'm pedantic about the order of operations or that this data has been unaltered, I can actually make

some pretty bold, like, I can make some pretty bold design choices if I know I can trust some piece of the system. Like, okay, well, if I get a block, the latest block on the blockchain, I don't have to trust anybody out there. As soon as it's been written, it's gone through this huge amount of effort to get written to it. And so even if I don't trust everyone else in the system, I know that all of these disinterested parties have come to this consensus.

And therefore, I can make all of these choices and decisions on top of it. And in a less absurd way, this is the same thing with a CI/CD pipeline. When you have the hash of a Docker container, I can go, I know that none of the data in my last commit has changed, so I only have to redeploy services two and three, and services one and four can be left unaltered. Or we can just skip it, because I can prove that the hash of the contents has not changed.

These things still matter. And again, this is still like thinking through these kinds of ideas of what can I lean on, what's provably correct. I think it's a really kind of fun part of computer science to dabble in. ⁓
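Both tricks Nathan mentions, the blockchain-style chain of hashes and the CI/CD skip-if-unchanged check, fall out of the same idea: hash the content, and include the previous hash when you want tamper-evident history. A rough sketch with Python's hashlib (the block layout here is invented for illustration, nothing like Bitcoin's real format):

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# CI/CD-style check: redeploy only if the artifact's content hash changed.
deployed_hash = digest(b"service-two v1")
new_build = b"service-two v1"
needs_redeploy = digest(new_build) != deployed_hash  # unchanged: skip it

# Chain of hashes: each block commits to the previous block's hash.
def make_block(prev_hash: str, payload: str) -> dict:
    return {
        "prev": prev_hash,
        "payload": payload,
        "hash": digest((prev_hash + payload).encode()),
    }

genesis = make_block("0" * 64, "genesis")
b1 = make_block(genesis["hash"], "tx: alice -> bob")
b2 = make_block(b1["hash"], "tx: bob -> carol")

def chain_is_valid(chain: list) -> bool:
    # Every block must link to its predecessor AND hash to what it claims.
    for prev, block in zip(chain, chain[1:]):
        recomputed = digest((block["prev"] + block["payload"]).encode())
        if block["prev"] != prev["hash"] or block["hash"] != recomputed:
            return False
    return True

assert chain_is_valid([genesis, b1, b2])
b1["payload"] = "tx: alice -> mallory"        # tamper with history...
assert not chain_is_valid([genesis, b1, b2])  # ...and the chain breaks
```

Altering any block changes its hash, which breaks every link after it; that is the "I don't have to trust anybody" property in miniature.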

Carter Morgan (48:57)

Do we want

to talk about how the CAP theorem is bull crap?

Nathan Toups (49:00)

Yeah, this one

kind of blew my mind. Like, I've definitely been, you know, cocktail-party dropping the CAP theorem. I think we just talked about the CAP theorem last week. ⁓ And, you know,

Carter Morgan (49:09)

I think we did, yeah. Consistency,

availability, and partition tolerance. Pick two of the three. That's the basic idea of the CAP theorem.

Nathan Toups (49:13)

Pick two of the three.

And he basically makes the argument that like, ⁓ this is misleading framing and that partitions will happen. Like if you're building ⁓ systems online, you're going to partition at some point. So really, this is: do you want to optimize for consistency under a partition, or are you going to optimize for availability under a partition? You're not really making other trade-offs. That's really what this comes down to.

I walked away from this going, he's correct. It was a very compelling argument. Basically, I think he says it's unhelpful. That was the word, right.

Carter Morgan (49:57)

Basically that partitions are not optional.

Right. And so like, you're going to have to partition your data eventually. And so really, yeah. Like I guess what would be a consistent and available database? Is he saying the CAP theorem is not useful at large scale? Because I guess you could imagine a very consistent, available database. It's just a single node, right? Like just a single database.

Nathan Toups (50:20)

Well,

so CAP theorem is specifically about distributed systems, right? So the whole idea is that the data no longer fits on a single node. I think if you do have data on a single node, yes, you can actually have all three. Right, because partition equals one, right? That is what the thing is. ⁓ I thought this was interesting. His argument was, ⁓

Carter Morgan (50:24)

I guess that makes sense about distributed. Right, right.

Right.

Yeah, it's tolerant of partitions because it never gets partitioned, right?

Right, right.

Nathan Toups (50:51)

why was the CAP theorem unhelpful? That it only considers network partitions, ⁓ that it says nothing about network delays. This was an interesting one, right? Like, availability is this kind of nebulous term. And I think we talked about this briefly, that a lot of database technology really suffers from marketing jargon. And so some of the terms that we're using aren't really useful.

Carter Morgan (51:15)

Right.

Nathan Toups (51:18)

And he does a really good job here of being like, we really need to define the terms that we're talking about. Availability, what does that even mean? Partition, are you partitioning by what? And because you can see he has this much more, he's a PhD, right? So he has this much more mathematical, provable sort of mindset. I think CAP is like a cool shortcut to thinking about the trade-offs that you're making, that you can't have both availability

Carter Morgan (51:33)

Right.

Nathan Toups (51:47)

and consistency guaranteed at the same time in a distributed system. His basic argument though is that like, as soon as you start thinking about what the trade-offs are, all of that stuff doesn't really matter anymore. You really need to decide, is my system linearizable or not? Right? Like I think that's his major argument: okay, well, one of these trade-offs is that I can't guarantee linearizability. You know, eventually consistent systems may not be

100% reproducible. ⁓ And then, you know, or am I making other trade-offs in, sort of, ⁓ you know, what fault tolerance, these other kinds of pieces. So anyway, we're not doing it justice. Like I guarantee if you get into chapter nine, I would highly recommend reading his work on the CAP theorem because it's very compelling. I've heard other people make this argument

in that sort of weird LinkedIn screed, you know, where it looks like half of it was written by ChatGPT. And I was just like, it was so obnoxious in its writing that I just kind of dismissed it. But I see what they did now. They kind of plagiarized ideas from this. And this was actually well argued. So.

Carter Morgan (52:48)

You

You

Well, I want to make sure we have plenty of time to devote to Chapter 10 batch processing. Is there anything else we want to talk about in Chapter 9 before going to Chapter 10?

Nathan Toups (53:05)

Mm-hmm.

trying to think about, ⁓ yeah, just kind of some like, you know, best hits. I didn't know much about Lamport timestamps. ⁓ That was kind of a cool idea, which is this idea that you have like a counter and a node ID, so you have this like sort of partitioned ID. It gives you some nice guarantees. The reason I bring up Lamport is he's actually the one who coined the term Byzantine fault tolerance. ⁓

Carter Morgan (53:38)

cool.

Nathan Toups (53:39)

Yeah, so this guy, classic 1970s white paper gray beard kind of person who's thinking about cool stuff before it was cool. And the timestamps by themselves, he has some cool ideas, but there's other things that have to be built on top of it to work. I think, what was another thing that stood out to me in this section? We talked about linearizability.

Vector clocks. This was another one that I didn't know. It was like comparing Lamport timestamps to vector clocks. And again, y'all, this is what you're getting into if you read this book. But vector clocks are, again, it's this frame-of-reference thing. And I'm not going to get too deep into it, but I thought it was really cool, which was that basically, from my understanding, a vector clock is, you want to make some claim that A caused B, right? But maybe some events happened concurrently.

Carter Morgan (54:16)

Hehehehehe

Nathan Toups (54:36)

And so basically I'll declare a vector clock, which is ⁓ like my count, and then what I know of your count. In this, like, let's say there's three nodes in the system, I'll write down my ⁓ vector clock of my understanding of what time was to me, and the last time I observed something from like these two other places. I think that that's what this is. ⁓ And then we like compare these vector clocks

to kind of try to see if we can yank out what actually happened in what order. ⁓ Again, this is stretching my imagination. I'm not going to pretend to be some guru of it. I definitely wrote it down in my notes of I need to spend more time thinking about vector clocks or researching it, because it sounded interesting. ⁓ so since these weird ideas are the same ones that are kind of buzzing around Byzantine fault tolerance and some other things that are there, it's like,

I think worth spending some time on.
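A minimal sketch of the vector-clock comparison Nathan is reaching for, in Python. Lamport timestamps (one counter plus a node ID) give you a total order but can't tell concurrency apart; vector clocks keep one counter per node, so two events are concurrent exactly when neither clock dominates the other. The function names here are my own, and this skips all the message plumbing:

```python
def vc_increment(clock, node):
    # Local event on `node`: bump its own entry.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(local, received, node):
    # On receiving a message: element-wise max of both clocks, then bump.
    nodes = set(local) | set(received)
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return vc_increment(merged, node)

def happened_before(a, b):
    # a -> b iff a <= b element-wise and a != b.
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and a != b

def concurrent(a, b):
    # Neither clock dominates: no causal path either way.
    return not happened_before(a, b) and not happened_before(b, a)

# Node A does an event and sends to B; meanwhile C does its own event.
a1 = vc_increment({}, "A")  # {"A": 1}
b1 = vc_merge({}, a1, "B")  # {"A": 1, "B": 1}: causally after A's event
c1 = vc_increment({}, "C")  # {"C": 1}: independent of both

assert happened_before(a1, b1)  # A's event happened before B's
assert concurrent(b1, c1)       # genuinely concurrent
```

Comparing the clocks is how you "yank out what actually happened in what order", and when the answer is "neither", that's where conflict resolution has to kick in.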

Carter Morgan (55:39)

Yeah, what was I gonna say about vector clocks... anyway. Yeah, I think, well, we'll talk about this more kind of at the end, but yeah, it's insane to me how deep this book goes, right? Like, yeah, again, Kleppmann, he really, really knows his stuff.

Nathan Toups (56:04)

And

he's still going like, oh, I'm just scratching the surface. Like, I'm not joking. At the end of every chapter, he's got these huge sets of references. I guess you don't see this if you're listening to the audiobook, which I'm not. It's like two or three pages of references. Like anything he talks about, there's links back to academic papers or blogs or other things. It is incredible. It is like a research PhD thesis document for laymen. Right.

Carter Morgan (56:07)

Yeah

Yeah.

Nathan Toups (56:33)

You can go

in and be like, I really want to learn more about vector clocks. OK, there's a link to the academic paper that it came from, stuff like that.

Carter Morgan (56:41)

Yeah, computer science is such an interesting field because it's a made-up science in the sense that, like you talk about marketing jargon, this isn't like biology where there are some hard observed truths about the universe. We invented computers, but it is true that at a very base level, they do follow the laws of physics. And so, ⁓ I don't know, this book is really getting at

some of those ideas. It's kind of like, you read something like Fundamentals of Software Architecture, which is going to have a lot of thoughts about proper software architecture design, or A Philosophy of Software Design, which has a lot of thoughts about proper code design. And like, is that computer science? Like, it's not really a science. Like there's still a lot of subjectivity in how you design these systems, even if there are some best practices or sensible defaults, as Martin Fowler calls them. This is computer science. This is getting into the, yeah.

Nathan Toups (57:35)

Yeah, this is...

Carter Morgan (57:38)

the hard boundaries of reality, right? And just, you know, the physical, the laws of physics that govern our universe. ⁓ And so it's just been funny for me because my work with kind of the site reliability engineering stuff I've been doing, I've been driving the bottleneck kind of deeper and deeper into the stack, which is great. I mean, that's a sign that things are improving and that, you know, we're

understanding the system better and making it more performant. The stuff Martin Klempman is talking about, he has driven that bottleneck down to like the speed of light, down to like how, you know, how time ⁓ works. It's really, really interesting stuff. But I also, I respect anyone who says like, I'm comfortable not operating in that plane. I'm comfortable with my bottleneck being like the number of connections to my MongoDB cluster.

not Einstein's theory of relativity. ⁓ But then we've got, go ahead, go ahead.

Nathan Toups (58:37)

⁓ Yeah,

absolutely. think that that was, ⁓ yeah, and that actually, that rounds out part two of this book, and boy have we been on a journey. ⁓ We do need to touch briefly on part three. So part two is distributed data. Part three is actually called derived data. And I wanted to make sure I got the name of this correct because... ⁓

Carter Morgan (59:00)

Yes.

Nathan Toups (59:03)

basically the last three chapters of this book, it's gonna be batch processing, which we're covering today briefly, stream processing, and then the future of data systems. And again, this is the future of data systems as seen from 2017, So it'll be interesting. ⁓ It'll be interesting to see what happens. ⁓ What was your feeling about this batch processing section?

Carter Morgan (59:29)

I really liked it. I mean, ⁓ he kind of starts off comparing all of batch processing to Unix and the Unix philosophy, which is a really cool way of thinking about it. Basically the Unix philosophy, and go listen to our episode about UNIX: A History and a Memoir if you haven't, because, one, UNIX: A History and a Memoir was written by Brian Kernighan, one of the, of course, founding contributors to Unix, and he's been on the podcast twice. ⁓

But that's, it's not only a really great book from like a history perspective, like it was really fun to learn about Bell Labs and, and all the people who work there, but it also really gets into the Unix philosophy, like what makes Unix special and why did Unix eat the world? And then listen to our follow-up interview with Brian Kernighan, cause we asked him kind of like, if he thought that the creation of Unix was inevitable or if it really was something special that came out of Bell Labs. Anyway, really interesting stuff. But the Unix philosophy is this idea that

Nathan Toups (1:00:10)

All

Carter Morgan (1:00:23)

rather than having one giant program that should be able to handle everything, really programs should be able to be segmented and that you should have one program that does one thing really well. And then the key behind it is pipes. And who was it? Was it Kenneth Richie?

Nathan Toups (1:00:40)

⁓ was Mick. ⁓

shoot. That's going to drive me nuts.

Carter Morgan (1:00:45)

Right.

can't remember. Well, you look it up. I'll explain the concept. ⁓ Dennis Ritchie? I don't...

Nathan Toups (1:00:48)

It's, yeah, I'm pretty sure it's, no, it's

Doug. ⁓ that's gonna drive me absolutely.

Carter Morgan (1:00:57)

Okay,

well you try to find it, I'll explain it, but someone, a key contributor, who we're trying to find right now, ⁓ explained, he wrote back in like the 70s, when they're first inventing Unix, like, programs should be like garden hoses. There should be a way to basically take the output of one and then screw it into another. Do you have it? Doug. Doug McIlroy.

Nathan Toups (1:01:17)

Douglas McIlroy, yeah, it was

1964 when he first pitched this idea, and pipes landed in Unix in 1973, yeah.

Carter Morgan (1:01:24)

Yeah. And so that is the pipe philosophy in Unix. And if you are, ⁓ you know, if you're doing any sort of bash scripting, you'll see that, you know, the pipe character, and that's literally what it does. It says, here's the output of this program, move it into the next program. And he talks about how in Unix, a file is just a sequence of bytes. And like, that seems very natural to us these days, but it's actually kind of miraculous that,

All, since all files are a sequence order of bytes, they can all, they all share that same interface. And so you can pipe the contents of one file into another file. Um, and, they all share that, you know, again, that, that same interface. so it says that's the idea behind Unix, right? Derived data and batch processing is that at a mega scale, right? Which is this idea that you should be able to take data.

and have one kind of function. And again, that can be a massive function that runs across dozens of hosts, but it formats the data in one way and then it sends it to another function, right? And it's really just that Unix pipe philosophy at a gigantic scale.
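What Carter is describing is literally the `|` operator. A minimal sketch, where the file name and contents are made up purely for illustration:

```shell
# A tiny made-up "log" file; to Unix, any file is just a sequence of bytes
printf 'alice GET /home\nbob GET /about\nalice POST /login\n' > access.txt

# Screw three "garden hoses" together: extract the user column,
# sort it, then count how many requests each user made
cut -d ' ' -f 1 < access.txt | sort | uniq -c
```

Because every program reads and writes plain byte streams, `cut`, `sort`, and `uniq` compose without knowing anything about each other.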

Nathan Toups (1:02:41)

Yeah, it's funny, a little shout out to years ago at Texas Linux Fest, I think this was probably in 2015 or 2014, I actually gave a talk called Pipes, Chains, and Indirection about using Unix command line tools and standard in, standard out, and standard error. This is something I've thought was just a thing of beauty for a long time. And so yeah, when we ended up getting to talk to Brian Kernighan, it was just a dream come true because

Carter Morgan (1:02:55)

Nice.

Nathan Toups (1:03:10)

I actually think, and I'm gonna be cheating here a little bit, the opening quote for chapter 11 on stream processing is so useful for what we were just talking about that I'm gonna bring it up. So John Gall, who wrote Systemantics, also known as the Systems Bible, I think, he says in 1975: a complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition

also appears to be true: a complex system designed from scratch never works and cannot be made to work. And I think that actually...

Carter Morgan (1:03:43)

Yeah. We haven't talked about it

enough, but the opening quotes for these chapters are all bangers.

Nathan Toups (1:03:51)

They're all, every

one, I think the chapter 10 is like a Knuth, ⁓ like a Donald Knuth quote. Let me see if I can, yeah, let me see if I can find it. Yeah, a system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments. Donald Knuth. And again, every one of these is kind of like set your mind in the right state.

Carter Morgan (1:03:57)

Yeah.

Nathan Toups (1:04:19)

before you get started in the chapter. I wanted to speak about batch processing, because I actually have a lot of experience doing this from a systems side. We used to call this ML ops or data ops kind of work. I worked a lot with something called Airflow. Airflow builds directed acyclic graphs (DAGs) that do data processing. And so anybody who's using patterns like MapReduce and any of these other, like, you

Carter Morgan (1:04:41)

Yeah.

Nathan Toups (1:04:49)

data lakes, all these kinds of patterns that came out. For a long time, you ended up using batch processing. Eventually we get into stream processing for certain types of workloads, but it's true. If you've ever built a command line tool where you have, let's say, some log data, and then you wanted to pipe it into something that would sort it, and then pipe it into something that grepped out certain keywords, and then

you wanted to maybe strip out some other stuff, and you pipe all these things together, and then some output data would come out at the end. ⁓ That is not that different than how large scale data pipelines work. You have ⁓ some defined set of inputs and outputs. You have some expectations of what these schemas are going to be. You have some sort of intermediate data types, and you have some output data. And he talks about why, for instance, you would never in a good pipeline

update a database in the middle of your pipeline, because you want this immutability and reproducibility. And just like with Unix pipelines, the input data is typically thought of as immutable. You don't pipe something in and then overwrite the source data; that would be antithetical to what you would normally do. You would have some input files and then you produce some output files or whatever. And the same thing happens in data pipelines.
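The sort-and-grep pipeline Nathan sketches might look something like the following, where the file names and log format are hypothetical. Note that the source log is only ever read, never overwritten; the derived data goes to a separate output file:

```shell
# Hypothetical source log; treated as immutable input
printf '2026-01-01 ERROR disk full\n2026-01-01 INFO ok\n2026-01-02 ERROR timeout\n' > app.log

# Sort, grep out a keyword, strip the date field,
# and write the derived data to a separate output file
sort app.log | grep 'ERROR' | cut -d ' ' -f 2- > errors.txt
```

If a step fails, you can rerun the whole pipeline from the untouched input and get the same `errors.txt`, which is exactly the reproducibility property the book emphasizes.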

And you can really see where these ideas were coming from. Google came out with this MapReduce paper, and it kicked off this entire industry. And it's really kind of funny, because by the time the industry had really kicked off, Google had actually already moved on. They had moved on to stream processing and a bunch of other things. But MapReduce stuck around a long time, especially if you've ever heard of tools like Hadoop. Hadoop is a really popular MapReduce framework, and it ended up really kind of taking over the industry for a while.

There's lots of other tools that are out now that use either batch or stream processing. ⁓ But because they...

Carter Morgan (1:06:54)

Hadoop is a running joke between my wife and me just because,

and it has nothing to do with this, but I just remember my very first internship. I was given, like, some Android problem. And, like, you don't know what you don't know when you're starting out, right? And so I'm just totally in the weeds, and I'm on Stack Overflow trying to figure out something. And I remember getting to a Stack Overflow answer like, it may be a problem with your Hadoop configuration. Like, that's not a word I know. Like, what, what is this? And so now I know, but back then I was like,

Nathan Toups (1:07:23)

And

then you, yeah. No, and you look it up and you're like, it's an elephant? Like, what is this, what am I even looking at here? Yeah, no, yeah, no, no, no. And so yeah, he does a really good job of basically saying, okay, I think he says, you know, in MapReduce we want to read inputs, and then we map these to key-value pairs. And a lot of times, it's really kind of funny, we have these, like,

Carter Morgan (1:07:24)

Not even close to, you know.

Yeah. Yeah. Yeah. Sorry.

Nathan Toups (1:07:53)

we break up these things, and a lot of times the key-value step is actually very obvious, but it's very important: you have to map out what data you're actually using. It gets sorted, and then you do this reduce step. Like, you know, a really toy app, the Hello World of MapReduce, is that you count the instances of words. And you could do this on a really large dataset, and it all fans out, does a bunch of stuff, and then it comes back and aggregates your counts across some huge dataset.
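The word-count Hello World Nathan mentions has a rough single-machine analogue as a Unix pipeline: `tr` plays the map step (emit one word per line), `sort` plays the shuffle (bring identical keys together), and `uniq -c` plays the reduce (aggregate a count per key). A sketch:

```shell
# Map: split the input into one word per line
# Shuffle: sort brings identical words together
# Reduce: uniq -c counts each run of identical words
printf 'to be or not to be\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

The real MapReduce does the same thing, except each stage fans out across many machines and the "sort" happens in the distributed shuffle.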

Carter Morgan (1:08:12)

Right.

Right.

Nathan Toups (1:08:23)

And it does a bunch of things. Like, you know, a lot of times you break your pipelines up so that they're interruptible, for instance. And you have to go back, and they talk about this in the book: why did Google build it this way? Well, Google would over-provision servers. They would knowingly schedule more work than the servers could actually handle. And so they wanted to build the software so that it could be resumable from intermediate steps

Carter Morgan (1:08:41)

Mm-hmm.

Nathan Toups (1:08:50)

and rerun it as resources became available. That's not always necessary; the overhead for that isn't necessary for everyone. And so again, the book does a really good job of breaking down when you actually need this and when you don't. I had been working in this stuff for a long time, and so as I was listening to this on my run, it was like a nice way, again, the very pleasant Kleppmann way, of framing things

for a topic that was actually quite familiar to me. So unlike chapter nine, where I had to listen really hard even with some of the topics I was familiar with, this one I was like, oh yeah, oh cool, yeah, okay, I got this. It was really nice, though, to hear this sort of evolution. And you can see he's setting up a layup, because he starts talking about things like dataflow engines, like Spark, which, again, everyone started using over time.

Carter Morgan (1:09:24)

Right, right.

Nathan Toups (1:09:48)

That's gonna get us into the stream processing, even on top of these MapReduce patterns. And again, I don't wanna talk too much on it, but it's really good framing. If anybody listening to this deals with large-scale data stuff, you very likely are still doing some batch processing work; some workloads are just structured for this really well. And this section does a really good job of showing how the foundation of all the concepts from the first nine chapters

are starting to be applied here, and we have enough vocabulary to have an intelligent conversation as to why Hadoop is set up the way it is. Well, it has these characteristics of fault tolerance and this resumability, and you have this proof of doing the thing. It's really cool that we get to this section of the book and you're like, okay, I actually couldn't have understood all of this in exact language without the first nine chapters.

Carter Morgan (1:10:44)

Yeah, and

I guess maybe it's time to move on, unless you have anything else you want to say about that. And I agree that in the last section we're going to talk a lot more about that sort of stuff. As far as some of our hot takes on this book go, and this bleeds a little more into who I'd recommend this book to: I just remember when I was preparing for some of my big tech interviews, I had heard a lot about this book and I knew that big tech was all about scale. And so I was like, I'm going to buy Designing Data-Intensive Applications. I think it might have actually been the very first tech book,

like, you know, the kind of books we read on this podcast, probably the very first one I ever bought. I got like a chapter into it, and then my interview cycle started heating up and I just didn't have as much time to read. But I always kind of kicked myself, like, I should have read that book, right? Because that would have been really, really good to know before these big tech interviews. This book would have been a waste of time specifically for, like, a big tech system design interview.

Right? Which is not to say like this book obviously has a fantastic reputation and is the authoritative book on designing data intensive applications. But I think there are very, very few engineers who wind up dealing with these kinds of problems. And so if you're going to read this book, you really need to read it as more of like an academic exercise. Like because you're just curious about what goes into all of this stuff. But

Even, I mean, again, I did a stint at big tech, and I knew lots of people working on lots of different teams. I still know lots of big tech engineers. None of us were touching the kind of low-level problems that this book deals with. The one exception is MapReduce, and I'm curious as we read this final section, because that did start touching a bit on some of the work we were doing. So that's not to say that

Nathan Toups (1:12:37)

Mm-hmm.

Carter Morgan (1:12:41)

the book isn't good. It's obviously the best in the world at doing what it does. But this idea that, like, I should read this because it's really going to help me level up and get the big tech job? Lots of other books I would read first. I mean, off the top of my head: Alex Xu's System Design Interview, Fundamentals of Software Architecture, The DevOps Handbook. Those are, I think, more immediately applicable to your career, if that's what you're looking to get out of it.

Nathan Toups (1:12:59)

Yeah.

Right. If you're a data engineer, and I guess maybe that's where I'm influenced by this a bit. When I was working in fintech, we built this big Kubernetes cluster, had all this service-oriented architecture stuff that we were doing, but that was one part of the business. The other part was this huge data pipeline where we were processing NASDAQ data and other data sources. We had this big batch job that would run from maybe three in the morning, when the earliest data was available,

to about an hour before the trading clock started, so around eight o'clock in the morning. And so these pipelines had to be perfect, and it would be all hands on deck if certain things didn't run. And so we spent a lot of time and energy doing the data engineering and understanding how we make things idempotent, reproducible, all this kind of really important stuff. So I had to sit in the trenches with our data engineers for a while and understand enough to intelligently talk about how to build systems around it.

And unless you're in that space, a lot of this section will not be applicable, right? At the same time, at some point in your career, you might have to get your hands dirty on some data engineering, and this book is beautifully crafted for thinking about why these problems are so hard. But yeah, so my quick two hot takes, I think I actually already talked about them. One is that time is an illusion.

Carter Morgan (1:14:28)

Right, right.

Nathan Toups (1:14:37)

And also that thinking about Byzantine fault tolerance is actually more relevant than even this book emphasizes. So again, I know I'm biased here, but thinking about how systems can misbehave is really important, and understanding how things can fall over sideways is really important. And you can be better equipped to understand partial failure modes if you read this book, if that's the domain that you're in, yeah.

Carter Morgan (1:15:07)

Well, okay, let's talk, we can talk about what we're do differently in our careers as we, ⁓ because we've read, you know, the third quarter of this book, right? What do you got, Nathan?

Nathan Toups (1:15:22)

Yeah, I'm repeating myself, and we didn't even talk about it today, but he does a whole section on CRDTs, the conflict-free replicated data types, and what CRDTs can do in certain systems where we're trying to understand which events took place first. I know that this is what Kleppmann's been working on with Automerge.

This further enhanced the argument as to why I should look at local-first and Automerge as a side project, sort of as learning for myself.

Carter Morgan (1:16:02)

Right. As far as me, I want to touch on MapReduce, which is just a concept I had a bit of experience with at a previous job, but didn't get a ton of hands-on experience with. And I have a feeling we might be approaching a place where this could be relevant at my current company, and so I just want to make sure I'm kind of ahead of the ball there. As far as the book goes, who would we recommend this to, Nathan? What do you got?

Nathan Toups (1:16:32)

Each week, the bar gets higher and higher. You know, week one, I was like, software engineers who are deeply curious about architecture. And then last week, I was like, but you really gotta be committed. And that's like 5x true now. It's like, boy, if you made it this far, congratulations, you've made it through part two. I've gotten a lot out of this book. I also understand why most people don't finish it. I think if it doesn't feel

Carter Morgan (1:16:34)

Yeah, I know, right?

Yeah.

Nathan Toups (1:17:01)

applicable to your life, it'd be really hard to justify finishing this book. But if you've kind of been around the block a couple of times, and you've been lacking a PhD's sort of beautiful description of these systems, and you go, ah, that's what ties all these things together, it's a very rewarding book. So yeah, that's...

Carter Morgan (1:17:22)

Right.

Yeah. I would recommend this book to sickos and maniacs. If you are, right, if you are like us, yeah, exactly, right, and have, like, decided to devote too much of your time to reading software engineering books: hey, check this out. In a more real sense, like I mentioned before, I think a lot of people's exposure to this book honestly should just be this podcast. Like, you should listen to this podcast.

Nathan Toups (1:17:27)

Yeah.

Guilty as charged.

Carter Morgan (1:17:51)

And as you're listening to these episodes, if you're like, wow, that sounds really, really interesting, I'd love to learn more about that, then pick the book up. I always tell people, and it's interesting, I've got to figure out what a CS 101 class looks like now, because, like, ChatGPT can just write all of your homework for you, so I wonder how schools are dealing with that. But I would always tell people in the before times, right, the before times being 2022, that if you're interested in computer science,

Go, you know, at your local university, take CS 101. And if you like it, there aren't really a ton of surprises after that. Like that, that loop of like, get the problem, write the program, debug it, see the solution. If that kind of makes you feel excited and energized, like then this is the field for you. and I kind of feel the same way with this book. And like, if you're listening to this podcast and you're hearing about like, you know, clock synchronization issues and MapReduce and

You know, all of these subjects we've covered today, and you're like, this sounds really interesting, then that's a good sign that not only should you pick up this book, but maybe you should be pivoting your career to kind of focus more on this level of stuff. But I think there are also a lot of other engineers like me who are kind of drawn to just the natural, like, CRUD application, and they find that really interesting, and are maybe a little more on the product engineering side. And you might be listening to this podcast,

Nathan Toups (1:19:04)

Mm-hmm.

Carter Morgan (1:19:19)

or reading the book in my case, and going like, well... I'm not sure this is what I want to spend my career focusing on. And, you know, if that's the case, again, just listen to this podcast. And you don't need to be a sicko or a maniac; you can just enjoy some good quality conversation between Nathan and me. Especially because I feel like this episode, it's like, Nathan, it's like with NFL commentators: you're giving the play-by-play and I'm providing the color commentary, right?

Nathan Toups (1:19:48)

Right,

yeah, no, it's, yeah, ⁓ this one's been interesting. And again, I think I, I don't know how much I've spent on this book, because I have the physical copy, it's in storage up in Colorado right now. I bought the Kindle and I have the audiobook. So I think I've spent well over like $120 or something on material with this. ⁓ But this book has just sat on my shelf for years. I kind of looked at it a couple times and was like, I'm not, I don't have time to read this.

Carter Morgan (1:20:00)

Right,

Yeah, yeah.

Yeah, yeah.

Nathan Toups (1:20:18)

And I'm so glad I finally did. yeah.

Carter Morgan (1:20:21)

There we

go. Well, uh, fantastic. And speaking of the NFL, I'd be remiss... I grew up in Washington, I am a Seahawks fan, and I'm quite excited for them to be going to the Super Bowl. So this is what I get for BYU being robbed of its College Football Playoff spot. So, you know, things are looking up in Morgan land over here. Anyhow, thanks for listening, folks. We are very excited. We're going to do the last chapter, the last

quarter of this book next week. And we've already decided we're rearranging our schedule a bit: we will be covering an essay the week after this. We're going to give ourselves a well-deserved break after powering through all of Designing Data-Intensive Applications. And we'll talk about it more next week, but the essay we've picked out, we're really quite excited about. It's going to be great. Anyhow, what did we pick?

Nathan Toups (1:21:07)

Yeah, we're touching on

Ken Thompson's "Reflections on Trusting Trust." I think this was actually a recommendation from someone in our Discord. We actually have a, yeah, we have a book recommendations forum subsection, and if you have books that seem like they'd be something we should cover, throw them in there; they might make it into the backlog. Yeah.

Carter Morgan (1:21:13)

Yeah, yeah.

It was, wasn't it?

There we go. All right. Well, thanks for tuning in folks. As always, contact us at contact at bookoverflow.io. Find us on Twitter at Book Overflow Pod or I'm on Twitter at Carter Morgan. Nathan and his work with his consulting agency, Rojo Roboto and his newsletter are at rojoroboto.com slash newsletter. Thanks for tuning in folks. We will see you next week.

Nathan Toups (1:21:51)

See you.