Hey there, welcome to Book Overflow, the podcast for software engineers by software engineers where every week we read one of the best technical books in the world in an effort to improve our craft. I am Carter Morgan and I'm joined here as always by my co-host Nathan Toups. How are you doing, Nathan?

Nathan Toups (00:27)

Doing great. Hey everybody.

Carter Morgan (00:29)

Well, thanks for joining us everyone. Like, comment, subscribe, join the discord, all that good stuff. I'm just realizing I was looking before we recorded at like in Riverside, our podcast editor, it just shows you like thumbnails of like the past episodes. I think I've worn the same shirt for all of this book series. I think you're always in the same shirt. Yeah. Yeah. You're always in black and I am, I am generally in a BYU shirt of some capacity, but I was realizing I'm like, this is a, yeah, I think it's the same one. So.

Nathan Toups (00:46)

Really? I'm always in black. Yeah.

That's great.

Carter Morgan (00:59)

What a treat for our YouTube audience that for all of Software Architecture: The Hard Parts, I've worn the same shirt.

Nathan Toups (01:05)

You

can imagine that we just did like hours and hours back to back of covering this book.

Carter Morgan (01:08)

Well,

in every one of these episodes actually is six hours of raw footage and then we meticulously edit it. Yeah. Okay. Well, we're excited. This is our final episode of Software Architecture: The Hard Parts. Thanks everyone who was listening. These have done really well for us actually. Like, I don't know. We're glad you guys are liking the content.

Nathan Toups (01:16)

yeah. Imagine all the outtakes, just us mumbling.

Carter Morgan (01:38)

But as far as we go, this is just kind of another week of reading quality software books. So I guess I should say, if you're liking this content, stick around. This is what we do every single week. And, you know, we've enjoyed this book, fun to finish it. Usually we do the author introductions, but there's so many of them. We'll just give them a shout out: Neal Ford, Mark Richards, Pramod Sadalage, and Zhamak Dehghani.

Neal Ford, Mark Richards, and Pramod are all friends of the podcast. We've interviewed them all. Neal Ford, we've interviewed twice. So if you've been listening to this episode series and have been enjoying it, go check out those interviews. Really, really cool guys. As far as the book goes, just to recap for anyone who is maybe tuning in for the first time or just forgot from last week, this is the book introduction. Software Architecture: The Hard Parts tackles the difficult problems in distributed architecture, the ones with no clear best practices, where every choice is a trade-off.

Neal Ford, Mark Richards, Pramod Sadalage, and Zhamak Dehghani walk through strategies for service granularity, managing workflows and orchestration, decoupling contracts between services, and handling data across distributed systems. They thread all of this through a fictional case study called the Sysops Squad, showing how these decisions play out in practice. So we've now finished this book. I think we've both been listening to it on audio. Nathan, you've been better about referencing some of the diagrams.

I tend to, usually the night before or the morning of the podcast, just open up the book and read through some of the parts I was most interested in. But that just kind of gives the audience an idea of how we consume this book. So give me your thoughts, Nathan. We now have the complete picture of Software Architecture: The Hard Parts. What do you think?

Nathan Toups (03:17)

Yeah, this third section was great. I'm in the middle of doing a data mesh project right now, so this was actually really cool to see all the ways that I could really do a terrible job if I'm not careful. So data meshes, and they go really into detail on sagas and contracts, all really great things if you're doing software architecture. I don't normally lead with a quote from the book, but I'm going to lead with a quote for my general thoughts, which is, "OK.

I finally think I get it." This is from the Sysops Squad. It says, "We can't really rely on generic advice for architecture. It's too different from all the others. We have to do the hard work of trade-off analysis constantly." And that's this book in a nutshell. It's like all of these things are trade-offs. All of these things are analysis where you're constantly having to sort of understand the state of the world, understand how the behavior of systems are in place, and

as much as I love this book, it doesn't leave you with this like, now I know I can do a good job forever. It's really like, you finish this book and you're like, man, this is hard. This is really hard stuff. And I think I'm going to screw this up a lot while I'm going. At least I can go back and look at this book and be like, where am I going sideways? This is super important.

Carter Morgan (04:19)

Yeah, I know.

Yeah.

Yeah. I just want to say again, Neal Ford and Mark Richards are very talented authors, and Pramod and Zhamak, I'm sure, are very, very talented as well. I'm only calling out Neal Ford and Mark Richards because we have read several books from them. And as the people who have probably read more software engineering books than conceivably anyone in the world, like,

We tend to only read good books, but even out of the good books, there is a spectrum of how easily digestible a book is. I mean, it's hard for me to think of people who do this better than Mark Richards and Neil Ford. I mean, would you agree?

Nathan Toups (05:20)

Yeah, especially in their domain. I mean, again, we did the sort of March Madness ranking. And obviously they got up to the very top. A Philosophy of Software Design, I'm just thinking of books that stick out as particularly well written and deeply full of wisdom. And yeah, Neal Ford and Mark Richards books, anything that they've kind of tacked on to, they're actionable. They're readable.

Carter Morgan (05:35)

Right, right.

Nathan Toups (05:48)

This is in that same domain, right? You could read Evolutionary Architectures, you can read this book, you can read fundamentals, and you're going to come away just a better engineer, for sure.

Carter Morgan (05:58)

Yeah, like they make it almost seem like easy, not easy so much, but like I think we all kind of have that, you know, Made to Stick talks about it, the curse of knowledge: once you know something, it's hard to imagine not knowing it. I think a lot of engineers are actually struggling with this with agentic coding tools like Claude Code or Codex, because like, just for example, yesterday I was ripping with Claude all day and we were building out a notifications feature. So like, this is not rocket science, right? Like.

Not exactly a novel problem, building notifications, but you know, got a ton done in the day. And at the end of the day, I'm just a little like, dang, does Claude just do my job now? And then it's a little like, I think we're underestimating how much we're guiding and steering these tools. And I read a book like this, and sometimes there's just a part of me that's like, well, anyone could know this stuff. They just read the book. And then you're like, one, you have to have a lot of background knowledge as a software engineer to understand this, but two, Mark Richards and Neal Ford do a very good job, you know.

They know who they're talking to, and we've talked about that before, that sometimes we read a book and I'm like, I wish you had identified your audience a bit more. And Mark Richards and Neal Ford do that. They're like, this is for senior-level, architect-level engineers, and that's who we're talking to.

Nathan Toups (07:12)

It makes me think of it. Well, I'll say what doesn't go out of fashion is clarity of thought, right? Like it doesn't matter what the what the tools and the expectations are. If you have a clear set of principles, a clear set of axioms upon which you can react to the world, then we've all seen this. You see somebody who can navigate uncertainty and they really come down to being a principled person and they can kind of just stay cool through the storm. I think that a good software architect

Carter Morgan (07:18)

Yeah.

Nathan Toups (07:40)

has to have those characteristics. Because you really look at what this book is about, it's a deep understanding of the technologies, but it's a deep understanding of what the business needs for trade-offs. And I think that that's an area that it's like, yes, sure, I'm really glad you understand how Kafka works. You should, if you're doing distributed event stream stuff. But when is that actually servicing the needs of the business processes? And that's what an architect has to do. An architect really has to understand,

Carter Morgan (07:53)

Right, right.

Nathan Toups (08:10)

Kafka's great, but it adds complexity in this area. Or there's this semantic coupling, which is something that they bring up and keep iterating on in this book. You will never reduce semantic coupling. All you can do is actually make it worse. You can make the other types of coupling harder, but the semantic coupling, and we'll get into what these things mean, is one of these constants. And so with that constraint, how do I design a system in these other domains? And this is the kind of...

maturity in understanding how to navigate stuff that you just don't get from an algorithms book. You don't get it from a how-to-do-well-in-your-next-interview type of book. This is the real kind of, I don't know. This is the hard parts. This is the stuff where you come in with the best of intentions and you walk out and it's, what is that one where it starts with the tail end of the horse and it's like, the horse that you drew? Well, the hard

Carter Morgan (08:46)

Right, right.

yeah, yeah, yeah.

Nathan Toups (09:07)

Software architecture, the hard part is literally, are you actually capable of drawing that nice horse? Can you implement the thing that's in your mind? And I think that's also what you're talking about with agentic stuff. Any of us can write a prompt, but it's like, do you understand how to do the dance? There's a dance that you have to do with these agentic tools. And it's like, how do I rein it in? How do I speak clearly? How do I delegate in a way that it's effective?

Carter Morgan (09:13)

Yeah.

Right.

Nathan Toups (09:35)

and reduces the burden of it just kind of going and losing its mind. All that's aesthetics. All of that is in these principles and these constraints. And the things that make you good in architecture, there's a lot of that that transfers over to how we build stuff with the agentic tools. And I think we'll get into that. One of the things that I think is kind of cool is this, what is it? Orchestration versus choreography.

I think maybe that would be a good place to move into.

Carter Morgan (10:06)

Yeah.

No, I think so too. I think the one last general thought I want to give is that this book was hard to listen to in a good way, because sometimes with books I'll just listen and kind of be able to consume it. But this book, like every other sentence would have some nugget of wisdom where I'd have to pause and think about how it applied to my job. And then I just kept pausing and thinking about the stuff I was building at work and

Nathan Toups (10:27)

Yes.

Carter Morgan (10:36)

Anyhow, like very, very good book. Yeah. Let's talk about orchestration versus choreography. Let's take a quick break and then we'll be right back. Marshmallow three, two, one go. Okay. Orchestration versus choreography. So this is all in chapter 11, managing distributed workflows. And it's this idea of like, okay, how, in a microservice architecture, because remember this book was written in 2021, when microservices were all the rage. And so it's in a very microservice-y domain. Although I'll say like,

Look, any system sufficiently at scale turns into microservices to a degree, right? So how are you managing communication between those services? And they say there's two types, orchestration versus choreography. Orchestration is this idea that you have one central orchestrator service. And Nathan, help me understand what this, so the orchestrator service

This is basically right, like the orchestrator service is gonna tell another service, you do a thing, and then that other service responds back to the orchestrator saying, I have completed said thing, right? And then the orchestrator kicks off the next.

Nathan Toups (11:46)

Exactly.

And a perfect example of this, if you've ever used Kubernetes: if you know the term for what Kubernetes actually is, like the core services, it's the container orchestration layer. So Kubernetes is the central place that reconciles a desired state with reality. You put all your manifests in there, and it's now

Carter Morgan (12:04)

Yeah.

Nathan Toups (12:14)

talking to all of the things, the different services that you have in place and you're saying, now that's actually on your deployment layer. You also have business logic level orchestration. And again, you can think of it as sort of like a central conductor that kind of understands the state of the system as it should be.
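As a minimal sketch of the orchestration idea described here, assuming three hypothetical in-process "services" (`reserve_inventory`, `charge_payment`, `schedule_shipping`) that stand in for real networked ones: a single `Orchestrator` tells each service to do its step, waits for the reply, and only then kicks off the next. None of these names come from the book.

```python
from typing import Callable

# Hypothetical services: each receives the order and replies when it is done.
def reserve_inventory(order: dict) -> str:
    return f"reserved {order['item']}"

def charge_payment(order: dict) -> str:
    return f"charged {order['amount']}"

def schedule_shipping(order: dict) -> str:
    return f"shipping to {order['address']}"

class Orchestrator:
    """Central coordinator: calls one service, waits for its reply, then calls the next."""

    def __init__(self, steps: list[Callable[[dict], str]]):
        self.steps = steps
        self.log: list[str] = []  # the orchestrator owns the workflow state

    def run(self, order: dict) -> list[str]:
        for step in self.steps:
            reply = step(order)     # "you do a thing"
            self.log.append(reply)  # "I have completed said thing"
        return self.log

orchestrator = Orchestrator([reserve_inventory, charge_payment, schedule_shipping])
result = orchestrator.run({"item": "book", "amount": 30, "address": "221B Baker St"})
print(result)
```

The point of the sketch is only that the ordering and the record of what has happened live in one place; the services themselves know nothing about each other.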

Carter Morgan (12:33)

Right.

Yeah, and that's the whole difference between orchestration versus choreography. Those terms, like, orchestration is like an orchestra. There's a conductor who, I mean, the conductor doesn't actually do much. And I heard a funny joke about that, which is like, there's an orchestra, you know, they have this big performance and the conductor is sick and can't come. And so the second violin, she works up her courage and she says, I'll do it. I think I can do this. I will conduct the show. She gets up, she conducts the show marvelously.

Nathan Toups (12:39)

Mm-hmm.

Carter Morgan (13:02)

sweating, you know, giving it her all. So proud of herself. Next day at practice, she sits down, the first violin turns to her and says, hey, where were you last night? Right? So I don't know.

Nathan Toups (13:12)

boy.

This is funny because I feel

like this is like when everybody's talking about whose job AI is gonna take, it's never their job. Like the first violinist is like, my job will never, it's like, it's the conductor that needs, and of course the conductor's like, I can replace the entire symphony orchestra because I'll just have a robot play all of the parts. Everybody thinks it's somebody else, because they understand that their job's complicated, and I think it turns out everyone's job is pretty complicated.

Carter Morgan (13:23)

Yeah, right,

Yeah.

I know, I think actually to like give myself some assurance today, I'm gonna, I think just like in a completely separate window, I'm gonna spin up like a work tree. I'm gonna back up my branch back to right before I built kind of this unit of work yesterday. And I'm gonna prompt Claude, like I think a pretty intelligent product manager would prompt it, right? And I'm curious just like, yeah, you know, let's see what it develops. I might do that as a thought exercise today. I'll report back.

Nathan Toups (14:09)

Cool.

Carter Morgan (14:09)

You know, choreography. So choreography is, not the opposite, but there is no central orchestrator. Right. And my understanding is this is kind of done in a very event-driven way. Like you're emitting events to some kind of event stream and then the different services are consuming it. And so there's no central service that says, okay, you do this, and now that you've done that, you do this. Instead, every service is just kind of emitting its own events, and then they are

deciding how to act based on what events come in. Have I got the right read on that?
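A small sketch of the choreographed version, assuming a toy in-memory `EventBus` as a stand-in for a real event stream (it dispatches synchronously, which a real broker would not) and three hypothetical services. No component knows the whole workflow; each one reacts to an event and may emit the next.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Toy in-memory stand-in for an event stream (an assumption, not a real broker)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []  # kept only so the flow can be inspected

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)

bus = EventBus()

# Hypothetical services: each reacts to one event and emits its own.
def payment_service(order: dict) -> None:
    bus.emit("payment_captured", order)

def shipping_service(order: dict) -> None:
    bus.emit("shipment_scheduled", order)

def email_service(order: dict) -> None:
    order["emailed"] = True  # end of the chain, nothing further to emit

bus.subscribe("order_placed", payment_service)
bus.subscribe("payment_captured", shipping_service)
bus.subscribe("shipment_scheduled", email_service)

order = {"id": 1}
bus.emit("order_placed", order)
print(bus.history)
```

Notice that the workflow order exists only implicitly, in who subscribes to what, which is the flexibility and also the difficulty the hosts go on to discuss.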

Nathan Toups (14:44)

Yeah, that's my understanding. And I think that one of the ways that I've done this pattern in the past, and again, you know, roast us in the comments if I'm off, I typically reach towards orchestration when possible. But one thing with choreography that can be useful is if you have a strong idea of state machines, right? So state machines are this idea that you can map out the way that a process moves from step one to step two. If you have these really nice

Carter Morgan (14:57)

interesting.

Nathan Toups (15:14)

parts, you sort of get this self-describing orchestrator that has no central authority to it. And state machines can be really cool. Actually, not in the comments, but in our Discord, we had a really nice discussion about languages and their type systems that have proper enums and, you know, these different parts of the type system that allow you to have, at compile time,

you know, constraints that give us really nice descriptions of state machines. And I think there are folks who get into this world where it just becomes super nice. Like, you can lean on something like choreography if you have strong state mutation, where you understand, I can go forward this way, I can go backwards this way. One state always exists, it's self-describing. But it's a trade-off, if I have these

Carter Morgan (16:04)

Right.

Nathan Toups (16:11)

distributed systems and I don't want to have to centrally coordinate or it's too complicated to centrally coordinate. ⁓ The trade off being you have to have this like really strong idea of how do things go forward and backwards within the dependency graph.
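To make the state machine idea concrete, here is a rough sketch with a hypothetical `OrderState` enum and an explicit table of allowed transitions. One caveat worth stating: the discussion is about type systems that enforce this at compile time, whereas Python only checks it at runtime, so this shows the shape of the idea rather than the compile-time guarantee.

```python
from enum import Enum, auto

class OrderState(Enum):
    PLACED = auto()
    PAID = auto()
    SHIPPED = auto()
    CANCELLED = auto()

# Hypothetical transition table: which states each state may move to.
TRANSITIONS: dict[OrderState, set[OrderState]] = {
    OrderState.PLACED: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),     # terminal
    OrderState.CANCELLED: set(),   # terminal
}

def transition(current: OrderState, target: OrderState) -> OrderState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target

state = transition(OrderState.PLACED, OrderState.PAID)
state = transition(state, OrderState.SHIPPED)
print(state.name)
```

Because every service can consult the same table, each one can decide locally whether an incoming event represents a legal move, without asking a coordinator.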

Carter Morgan (16:26)

This is, I worry about this at my work. Just like, we have switched to an event-driven system. It is event-driven, but we only have one backend. Really our thought with this wasn't like, it was, okay, if we ever did grow to have multiple teams, event-driven does give us that kind of flexibility. More of what we wanted out of it was, how do we remove things the user doesn't care about from the critical path?

Right. And so, notifications are a great example, right? When someone does something and you want to emit a notification, we don't require, in that path, the create post endpoint to also call that notification logic. All the create post endpoint does is emit an event, like, hey, a post was created, and that winds up on our event stream. And then we have a downstream consumer that then knows, okay,

I'm the notifications consumer. I will consume this. And then really all they actually do is call an internal endpoint back on the backend, right? But we like kind of the decoupled nature of it. And it also makes things faster for, you know, users. But it's not like orchestration, where it's like, hey, you do this, and now that you've done that, I'll tell you to do this. Like it's probably more choreography.

Nathan Toups (17:32)

Right.

Yeah, absolutely.

Carter Morgan (17:53)

but as one guy dancing with himself. So I don't know, right? ⁓

Nathan Toups (17:56)

Right.

And I will say, like, you end up having to think a lot about state management, which, again, thinking a lot about state management is just not a bad idea in general. You really do need to. Like, we saw this with Designing Data-Intensive Applications. It's the edge cases that kill you, right? Like, if you just think about the happy path, then, you know, those problems aren't that difficult. But you're like, okay, well, what...

Carter Morgan (18:17)

yeah, yeah.

Nathan Toups (18:25)

race conditions can come up, or what happens in any distributed system, for instance, where partial failure states are a thing? You lose atomicity, right? You lose the ability to have atomic changes to your system because you're eventually going to get to some eventually consistent aspects. Because again, you have the Two Generals' Problem, where you're having to communicate and you're trying to get consensus and understanding. And so you have to have a very resilient state management system that says,

we agree that you can make a change to the system without coordination. So, yeah, that's a.

Carter Morgan (18:56)

Right.

And

that's what, yeah, like, it's a trade-off. Like that's the whole point of this book, that all software architecture is a trade-off, because what you get with orchestration is you don't need to handle that state management as well. And you don't need to have more sophisticated rollback patterns if something goes wrong. But the trade-off is that you're putting a lot of pressure on the single orchestrator service.

Nathan Toups (19:23)

Mm-hmm.

Carter Morgan (19:31)

And if that goes down, then your whole business goes down, and it's got to be able to handle basically all of the scale of all the other services combined. Right. And then there's also more coupling from a team dynamic there, because now it's kind of like whoever manages the orchestrator service is kind of a little like top dog, you know, and the other services, while important, right, kind of

Nathan Toups (19:45)

Yep.

Carter Morgan (20:01)

have to constantly defer to how the orchestrator service does things. ⁓
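One way to picture why rollback is simpler with an orchestrator: because it alone knows which steps have completed, it can undo them in reverse order when a later step fails. The sketch below assumes hypothetical step names and a made-up `run_with_compensation` helper; it is an illustration of the general compensating-action idea, not a pattern quoted from the book.

```python
from typing import Callable

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensate)

def run_with_compensation(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, undo the completed ones in reverse order."""
    log: list[str] = []
    completed: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"did {name}")
            completed.append((name, compensate))
        except Exception:
            log.append(f"failed {name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undid {done_name}")
            break  # stop the workflow after rolling back
    return log

def declined() -> None:
    raise RuntimeError("card declined")  # simulated failure in the second step

def noop() -> None:
    pass

steps: list[Step] = [
    ("reserve", noop, noop),
    ("charge", declined, noop),
    ("ship", noop, noop),
]
log = run_with_compensation(steps)
print(log)
```

In a choreographed system the same knowledge is spread across services, so each one would need to listen for failure events and compensate on its own.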

Nathan Toups (20:07)

And a perfect example of this, again, if we go back to Kubernetes, Kubernetes being an orchestration layer, there are real scalability issues. They've had to do amazing bending over backwards so that you can manage thousands or tens of thousands of nodes in a single Kubernetes cluster. And then imagine you need to do multi-region Kubernetes. And then how do we communicate across those? Orchestration gets super difficult at scale. And there are even orchestrators that have

addressed this directly. HashiCorp has an alternative called, no, this is going to drive me nuts. What is the name of the orchestrator? HashiCorp, yeah, it's at the tip of my tongue. Consul is their key-value store. Nomad, Nomad, thank you. So Nomad's like a simpler, philosophically simpler one, but it can actually handle much larger node networks. Now it's still an orchestrator.

Carter Morgan (20:47)

It's an alternative to Kubernetes.

Nomad, yeah.

Nathan Toups (21:06)

It's still the sort of central place that coordinates state. And again, if you think about it from change management for deployments and how services talk to each other and the service mesh registry, you really do need central coordination. You need some authority that says this service is ready and healthy and happy, it can communicate with this one, right? If you're going to have those completely decoupled. Now, the funny thing is inside of the processes,

and I think that this isn't like a one versus the other. There might be layers of your architecture that are orchestrated. And there might be layers that actually use choreography. Choreography is nice because you can do embarrassingly parallel stuff. You can do really large things. You don't have to coordinate. So if you can find those little, like, maybe you have some central orchestrator, but you branch off into whole domains in which they can operate with choreography, that sounds like it could be a very pragmatic way

Carter Morgan (22:00)

All

Nathan Toups (22:03)

of approaching things where certain things have centralized state management and the other ones, you have some sort of procedural state transition type thing.

Carter Morgan (22:12)

This is the kind of stuff I wish I had known when I was at, I'm really enjoying being at a startup right now. It's been a ton of fun, but you're just not dealing with problems at this scale. And I had only worked at really big companies before this. And I wish I had known about these sorts of patterns, because it would have been neat to try to identify, in the domains I was working in, like, okay, are we orchestration or are we choreography? Are we doing what you're talking about? Like, is there some sort of

mix between the two. And I think that's why reading books like this or listening to this podcast, just the more you can expose yourself to these concepts, right? Like, it helps. It helps as you're exploring your own domain naturally to have, like we talked about this, I think, in the first episode, just some vocab for these concepts. It's surprising how much easier it makes it to encounter them in the wild and for them to stick.

Kind of in the same domain as managing distributed workflows, we have the saga patterns, right? Which is basically this idea that if you're gonna have lots of services communicating with each other, right, there's basically three dimensions along which that communication happens. And then depending on where you fall in those dimensions, you wind up in what they call a saga. They have...

And they give them names. They give them the epic saga, the phone tag saga, fairy tale saga, time travel saga, fantasy fiction saga, horror story saga (we're talking about that one), parallel saga, and anthology saga. I wanted to pull it up just so that I have it right here. I was looking at this. So basically they say, okay, there's a couple of domains that you can be evaluated on for these sagas. Yes.

Nathan Toups (23:58)

It's this three-dimensional, and the

graphic for this is kind of cool. It's a cube, and on the x-axis is choreography and orchestration. So you're either on the choreography end or the orchestration end of the spectrum. On the y-axis, it's sync and async, right? So you can be sync or async. Yep. And then you have atomic versus eventual, and that's sort of the z-axis, your, what is that, eventual.

Carter Morgan (24:04)

Yeah.

Mm-hmm.

Communication. Yeah, yeah.

Yeah. Consistency.

It's the three C's. Yes, they've got communication, like you said, sync or async; consistency, atomic or eventual; and coordination, which is orchestrated or choreographed. And so based on where you fall there, you wind up as the epic saga or the phone tag saga or the fantasy fiction saga. You know, this is not a book report podcast.

Nathan Toups (24:28)

Consistency. Yeah.

Carter Morgan (24:48)

But we do want to talk about some of these patterns in particular. We want to talk about the horror story pattern Do you want to talk about this Nathan?
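The three dimensions and the eight names can be laid out as a lookup table. The sketch below uses the book's three-letter shorthand (communication, consistency, coordination). The transcript itself only spells out the horror story, parallel, and anthology combinations, so the other five rows are filled in from the book's saga matrix as I recall it and are worth checking against the text; the enum and function names are illustrative only.

```python
from enum import Enum

class Communication(Enum):
    SYNC = "s"
    ASYNC = "a"

class Consistency(Enum):
    ATOMIC = "a"
    EVENTUAL = "e"

class Coordination(Enum):
    ORCHESTRATED = "o"
    CHOREOGRAPHED = "c"

# Shorthand code -> saga name (communication, consistency, coordination).
SAGAS = {
    "sao": "Epic Saga",
    "sac": "Phone Tag Saga",
    "seo": "Fairy Tale Saga",
    "sec": "Time Travel Saga",
    "aao": "Fantasy Fiction Saga",
    "aac": "Horror Story",
    "aeo": "Parallel Saga",
    "aec": "Anthology Saga",
}

def saga_name(comm: Communication, cons: Consistency, coord: Coordination) -> tuple[str, str]:
    """Build the three-letter code from the three choices and look up the name."""
    code = comm.value + cons.value + coord.value
    return code, SAGAS[code]

print(saga_name(Communication.ASYNC, Consistency.ATOMIC, Coordination.CHOREOGRAPHED))
print(saga_name(Communication.ASYNC, Consistency.EVENTUAL, Coordination.ORCHESTRATED))
```

Seen this way, moving from horror story to anthology or parallel is a matter of flipping one or two of the three choices, which is how the conversation treats it.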

Nathan Toups (24:55)

Yeah,

so the combination, right, so they'll talk about this and they'll refer to them in the book. You kind of learn the shorthand as you get used to it. So, horror story is AAC, right? AAC meaning asynchronous, atomic, and choreographed, right? And so that's pretty much the worst combination of those things. Like when we're talking about trade-offs, you now get a really, it's a horror story about reason.

Carter Morgan (25:06)

Haha.

⁓ no.

I'm just reading from the

book, it says, one of the patterns must be the worst possible combination. It's aptly named Horror Story.

Nathan Toups (25:27)

Right. Right.

Right. And again, it's because when we're thinking through these sagas, you want to think through, again, the tradeoffs. What are the risks that are involved? And yeah, horror story is, right, so the communication is asynchronous, the consistency is atomic, coordination is choreographed, we don't have a central. So what you're asking of the system is you need atomic changes.

Carter Morgan (25:56)

Yes.

Nathan Toups (25:56)

but you're not using

a central orchestrator, right? So like, first of all, that just seems like a conflict in terms, in my opinion, because now, if I'm letting everybody kind of do their own state management, but I also need to guarantee that it's atomic, and I'm using asynchronous communication, right? Because you could still get away with atomic consistency and choreographed coordination if I have synchronous communication, because I can still at least guarantee one thing happens after the other, right?

Carter Morgan (26:04)

Right.

Yeah.

Nathan Toups (26:26)

spinning a lot of plates, but I can still pull it off. But I'm putting this in hard mode, where I can't reason about the system. And so the coupling, they say, is about medium, because you've got some coupling and some non-coupling stuff. Asynchronous reduces the coupling, but atomic increases it, and then there's the choreographed part. Complexity, though, is incredibly high.

Carter Morgan (26:32)

right way.

Nathan Toups (26:56)

you really have to think through all of the edge cases of how a system can behave. The responsiveness is low, meaning because it's asynchronous and you can't move forward in the system until you have atomically proven that whatever state transition happened is correct, you have to communicate a ton, right? You're constantly going back and forth asking, I'm cool. Are you cool? I'm cool. Are you cool? Like, over and over again. So the responsiveness and the availability are low,

Carter Morgan (27:01)

right.

Nathan Toups (27:26)

and the scale is like medium. It'd be one thing, even dealing with a horror story, if you're like, well, but it scales to infinity, right? It's super complicated, but it scales to infinity. It doesn't. It's like, you're going to be stuck in this ugly middle with this horrible saga pattern.

Carter Morgan (27:41)

Right, Yeah,

I mean, just to, you know, maybe give some credit to anyone who wound up here. They say, "Why might an architect choose this option? Asynchronicity is appealing as a performance boost, yet the architect may still try to maintain transactional integrity, which has many myriad failure modes. Instead, an architect would be better off choosing the anthology pattern," and that is asynchronous, eventual, and choreographed, "which removes holistic transactionality."

And I think this is just the sort of thing where it's like, you probably build this if you know just too much to be dangerous, right? I think if you're just building a completely naive system, it's just going to be a gigantic big ball of mud monolith, right? Which like, you know, there are lots of businesses that run on that, right? But then if you start thinking like, no, we got to go microservices, right? So we're going to build.

So I think it'd be great if we had microservices. think it'd be great if none of them had to be able to, there was no central coordinator, right? But then this is where kind of understanding your business domain becomes so critical because it's like, well, wait a minute, if we have really atomic needs for transactions, right? If we have a lot of like, this cannot happen until this has happened, or if this failed to happen, then, you know, then I've got to roll back this, then you might wind up here.

Right? Although I guess like, okay, so then we talk about the opposite, which is not the opposite, but just the same thing, but the anthology saga, which is where you have ⁓ asynchronous eventual consistency. So what are you doing differently there then? Because there are a lot of systems that work like this in the sense that like, I can't process the order until the address has been given.

Nathan Toups (29:28)

Yeah, so.

Carter Morgan (29:35)

Right.

Nathan Toups (29:36)

This is where things are kind of interesting. So two of the best sagas that they talk about are the horror saga slightly tweaked. So the parallel saga and the anthology saga, I think both of these are worth talking about. So parallel saga, I'm asynchronous in communication. I'm using eventual consistency. Actually, so this one's got two knobs turned. I'm asynchronous in communication, so I'm the same as horror. I'm...

Carter Morgan (29:47)

Right.

Mm-hmm. Yeah.

Nathan Toups (30:04)

I moved to eventual consistency and I'm orchestrated, right? ⁓ And so what this allows me to do is I don't have to have atomic feedback, but I do have a central sort of authority that says, hey, everything's cool. So I can like, can send these things out to a bunch of like child, you know, or downstream pieces and I'm orchestrated in a central way. So I kind of understand the state of the system, but I also understand that like,

the way I'm communicating and the consistency that the entire system has is eventual and asynchronous. And so you end up getting, you know, what you end up getting there is low complexity. It's very easy to reason about the system. You understand the trade-offs and you get high scalability and high responsiveness, right? If I don't have to guarantee that a stale read isn't a problem, well, then I can respond really quickly from the parts of the system that are there. And there's a lot of natural

stuff that goes to this. I remember earlier in the book, they talked about, let's say you said you wanted to delete a user account. And so with the delete of a user account, immediately what you want to do is have them log out of all sessions. They can't log in if they try to, you know, these things show up. But maybe you don't really care if, like, every piece of their PII is deleted from the system instantaneously. Right? Like you don't have to delete all

Carter Morgan (31:15)

All

Nathan Toups (31:29)

personally identifiable information at the moment that the account's deleted. The next account cleanup cycle, let's say it's within the next seven hours or something, is when it needs to be processed, right? Well, now I get this really clean, I get a highly responsive, highly scalable system that immediately taps into the auth system, but it's orchestrated in that I have this, I understand exactly how a user delete step behaves.

Carter Morgan (31:33)

right.

Nathan Toups (31:57)

But the communication is asynchronous, and the consistency is eventual across all the systems that might tie into a user. ⁓ It's a nice pattern. If I had to guarantee that when I said delete my account, every ounce of evidence anywhere in the system, anywhere in the logs was deleted before I could come back and say that you're deleted, imagine being Facebook having to do that. There's no way. ⁓ Yeah. Yeah, the anthology saga, though.

Carter Morgan (32:20)

Right, Yeah.
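
The account-deletion example can be sketched roughly as a parallel saga: one orchestrator owns the workflow state, the urgent step (revoking sessions) finishes before the caller gets a reply, and the PII cleanup runs asynchronously and becomes consistent later. This is only a minimal illustration; `DeleteUserOrchestrator`, `revoke_sessions`, `scrub_pii`, and the state keys are hypothetical names, not anything from the book.

```python
import asyncio

# Hypothetical downstream services; the names are illustrative assumptions.
async def revoke_sessions(user_id, state):
    # Must be done before we answer the caller: the user is locked out now.
    state["sessions_revoked"] = True

async def scrub_pii(user_id, state):
    # Stands in for a slow cleanup cycle that may run much later.
    await asyncio.sleep(0.01)
    state["pii_deleted"] = True

class DeleteUserOrchestrator:
    """Central coordinator: it knows every step and tracks overall state."""

    def __init__(self):
        self.state = {"sessions_revoked": False, "pii_deleted": False}
        self.background = []

    async def delete_user(self, user_id):
        await revoke_sessions(user_id, self.state)
        # Fire-and-forget: scheduled now, completed eventually.
        task = asyncio.create_task(scrub_pii(user_id, self.state))
        self.background.append(task)
        return "accepted"

    async def settle(self):
        # Wait for the eventually consistent steps to finish.
        await asyncio.gather(*self.background)

async def demo():
    orchestrator = DeleteUserOrchestrator()
    reply = await orchestrator.delete_user("u-123")
    at_reply = dict(orchestrator.state)   # state when the caller hears back
    await orchestrator.settle()
    settled = dict(orchestrator.state)    # state once cleanup has run
    return reply, at_reply, settled
```

Running `asyncio.run(demo())` shows the caller getting an answer while the PII flag is still false, which is the stale-read window the saga deliberately accepts.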

Nathan Toups (32:27)

So this is the other one. So I think this one actually is one knob turn. So we're asynchronous. We are choreographed. So it's process-oriented, where we have these state transitions. But it's eventually consistent. The one difference is that instead of saying we have to atomically agree, we let the thing flow through the system. Eventually, it's consistent. ⁓ I think that in, and again, I'm embellishing a bit with some of my understanding. ⁓

Carter Morgan (32:29)

That's just one knob turned.

Mm-hmm.

Nathan Toups (32:57)

Things like directed acyclic graphs give us some natural ways to do this. If I have this feedback loop, and I'm mutating state and I can't reason about the system, you probably need atomic changes to that to make sure that you don't violate something. But if I have a directed acyclic graph where I know that the things can only flow in one direction, I can now use things like eventual consistency in those changes. And so therefore, I don't have to have a central orchestrator. I know that if it gets to the next state change, I can choreograph.

Carter Morgan (33:00)

Right.

Nathan Toups (33:27)

these through the system.
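
One way to picture choreography over a one-directional flow is a tiny in-memory event bus where each step only knows which event it reacts to and which event it emits next. Nothing coordinates the whole flow. This is a sketch under assumptions: `EventBus`, `wire_order_flow`, and the event names are invented for illustration, and a real system would use a message broker rather than a local queue.

```python
from collections import defaultdict, deque

class EventBus:
    """Minimal in-memory stand-in for a message broker (an assumption)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # order in which events were processed

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def drain(self):
        # Deliver queued events until the flow runs dry.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)

def wire_order_flow(bus):
    # Edges only point forward: order_placed -> payment_taken -> order_shipped.
    # Because nothing publishes an earlier event, the flow is acyclic.
    bus.subscribe(
        "order_placed",
        lambda order: bus.publish("payment_taken", {**order, "paid": True}),
    )
    bus.subscribe(
        "payment_taken",
        lambda order: bus.publish("order_shipped", {**order, "shipped": True}),
    )
    bus.subscribe("order_shipped", lambda order: None)  # terminal step
```

Publishing a single `order_placed` event and calling `drain()` walks the whole path without any component holding the full workflow, which is also why the overall behavior is harder to see in one place.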

Carter Morgan (33:29)

This makes me think of Will Larson in crafting engineering strategy, because this idea of orchestration versus choreography, ⁓ I think it is the exact kind of, this falls right in that idea of high level engineering strategy, because as I'm thinking about this, you're like, okay, well, what trade-off do I wanna make? Do I want to put some team or some service in charge of everything and have that be the owner? Or do I want to really lock in

that state management and make sure that it is completely understood across multiple teams, right? There's a little bit of like the second one of like, you teach correct principles and let people govern themselves, right? But at the same time, like you need to be, there's something nice with like a central orchestrator that like code is the best config. And it's like, if it's there in code and it's operating, it's executing, then everything just has to agree with it.

Nathan Toups (34:02)

Mm-hmm.

Carter Morgan (34:26)

Whereas if instead you're saying like, okay, no, this is the theoretical state machine flow, you're buying yourself more flexibility and autonomy amongst the teams, but like, you've really got to lock in on that theoretical state machine. But interesting trade-offs.

Nathan Toups (34:41)

And yeah, so here's the other kind of cool thing. The difference between the parallel saga and the anthology saga, right, both of them are asynchronous, both of them are eventual. One option is orchestrated and one option is choreographed. And so if you actually look at parallel, it says that the complexity is low, the coupling is low, responsiveness is high, and the scale is high. So that's great. That's a pretty nice thing. Again, you're using orchestration. I would say

it's easy to reason about that system. Anthology, though, and you'll see this in super highly scalable systems, is that the coupling is very low, right? I'm describing these paths of the state machines and they operate independently, but the complexity is high. So meaning I really have to understand how my anthology choreography stuff works.

Right? For it to be asynchronous and eventually consistent, the process that things do and the reconciliation in that choreography, really you just have to do a lot of upfront thinking and the complexity that it has. But you get super high scale and elasticity, right? So I get high responsiveness and really high scale and elasticity. And this is why you'll see the FAANG companies, you'll see some crazy service or

thing that they're doing, they'll start flowing into these sort of anthology patterns because they have to. You know that you can't deal with tens of thousands of requests per second if you're having to have a central orchestrator making sure that there's no side effects or something.

Carter Morgan (36:22)

Right, right.
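
The "knobs" in this discussion are the three dimensions the sagas are described along: communication, consistency, and coordination. A small lookup makes the comparison concrete for the three sagas mentioned here (the one the hosts call "horror", plus parallel and anthology). The `Saga` class and `knobs_turned` helper are illustrative names, not from the book.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Saga:
    communication: str  # "synchronous" or "asynchronous"
    consistency: str    # "atomic" or "eventual"
    coordination: str   # "orchestrated" or "choreographed"

# Only the three sagas discussed in this part of the conversation.
SAGAS = {
    "horror story": Saga("asynchronous", "atomic", "choreographed"),
    "parallel": Saga("asynchronous", "eventual", "orchestrated"),
    "anthology": Saga("asynchronous", "eventual", "choreographed"),
}

DIMENSIONS = ("communication", "consistency", "coordination")

def knobs_turned(start, end):
    """List the dimensions that differ between two sagas."""
    a, b = SAGAS[start], SAGAS[end]
    return [d for d in DIMENSIONS if getattr(a, d) != getattr(b, d)]
```

Here `knobs_turned("horror story", "parallel")` reports two changed dimensions and `knobs_turned("horror story", "anthology")` reports one, matching the "two knobs" and "one knob" remarks above.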

Let's talk just a bit about ⁓ contracts. That's what chapter 13 is about, right? Obviously, if you're going to have any systems communicating with each other, you have to have a contract between those two systems, right? ⁓ And there's roughly two types of contracts. It's strict versus loose. A strict contract is going to be something like gRPC, where you are literally like, your contract is almost like a piece of code in and of itself. And that if any service consumes you,

They also have to have that same contract. And so I've seen this where like, basically like when you build and ship and deploy your service, you're also building and shipping in a separate module, you know, like an NPM package or whatever, that gRPC definition, and then other services consume that. And then they can use that kind of gRPC shape in whatever language they're in.

And then there's loose contracts, which is like just key value pairs. This is JSON basically, right? JSON, YAML, mostly JSON, which is like, you know what? I don't care how you consume this. Like we're just going to give you kind of plain text and key value pairs. And you can take as much of it as you want, or you can leave as much of it as you want. Do you have any preference between these two, Nathan?

Nathan Toups (37:50)

So it's funny, in the same world that microservices emerged in, we really started seeing things like gRPC pop up. And it became all the hotness because they were super compact. They were correct. You could even have method and function calls that were passed over the RPC layer. You understood exactly the data and the shape of the data. Of course, the trade-off is you can't interpret a gRPC

binary payload unless you have the correct consumer for it. And actually, until Neil Ford came on the podcast, it didn't click for me how JSON is a superior sort of loose coupling mechanism. I was like, yeah, if you can create these types and you have gRPC passed around, it's so nice because you just have correctness at compile time and all these things.

Carter Morgan (38:23)

Yeah.

Nathan Toups (38:49)

But it's like, yeah, well, you're tightly coupling these two parts of the system. And that might be what you want to do. But if you actually just, if your contract's JSON, ⁓ you can add more JSON fields. As long as you're not violating the old JSON fields, if you keep those in the same shape, I can add more JSON fields. And my consumer just parses the JSON that makes sense to it. I take a JSON object, and I have, let's say, five fields that I'm pulling out of it.

and I'm putting them in a struct in Go or putting them in some data structure in TypeScript. I can just disregard the other fields. I don't even have to worry about them. And that gives me loose coupling because as the producer, I can come up with new fields, and then I can go out and say, hey, we actually have some extra enrichments in our JSON, and you should start using it. And we'll support this moving forward. This is a contract, a loosely coupled contract. If I'm doing gRPC,

and I make a new version of my gRPC producer, I either have to have something like a monorepo that goes out and automatically updates the consumer, runs it through my fitness functions and tests, and I find out at build time whether I've broken something or not, which is one way of doing it, but it's tightly coupled, right? I have a tightly coupled system. Or we just make sure that we have good contracts that says, hey, I never break your expectations ⁓ on the other side.
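
The "parse only the fields that make sense to you" idea might look something like the following: a consumer that reads two fields from a JSON payload and ignores anything else, so the producer can add fields without breaking it. The field names, `OrderView`, and the two sample payloads are made up for illustration.

```python
import json
from dataclasses import dataclass

@dataclass
class OrderView:
    """The only fields this particular consumer cares about."""
    order_id: str
    total_cents: int

def parse_order(raw):
    doc = json.loads(raw)
    # Pull out what we need; any extra keys are simply never touched.
    return OrderView(order_id=doc["order_id"], total_cents=doc["total_cents"])

# Original payload shape (hypothetical).
V1 = '{"order_id": "o-1", "total_cents": 1250}'

# Producer later adds enrichments without changing the old fields.
V2 = (
    '{"order_id": "o-1", "total_cents": 1250, '
    '"currency": "USD", "loyalty_tier": "gold"}'
)
```

Both payloads parse to the same `OrderView`, so the consumer keeps working unchanged. The flip side is that nothing warns this consumer if the producer renames or retypes `total_cents`; that guarantee has to come from the contract, not the format.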

Carter Morgan (39:57)

Right.

Nathan Toups (40:16)

I've kind of come around to being like, you know what, I don't love JSON, and there's a lot of reasons to not love JSON. But JSON's also kind of great in that you can, there's a reason companies like Stripe or public APIs use JSON by default, right? JSON payload, a lot of REST servers use JSON for this, this just structure, sort of like structured system. And so I...

Carter Morgan (40:34)

Thre-

Nathan Toups (40:45)

I guess it really is the trade-off is if you have no control of the consumer of your data, you probably want something that's loose. And if you want full control and it's really important that nothing is ever out of order, you might have a strict contract on the data that's in place.

Carter Morgan (41:05)

I freaking love JSON, which like, and again, like, you don't want to be like, this one is just better. But in general, I like loose contracts. I like this idea of a system just saying like, this is all the data I give you, you know, do with it what you want. Right? We were looking at that, like, I guess there's something called tRPC, which is like a TypeScript implementation of RPC and

Nathan Toups (41:07)

Yeah.

Carter Morgan (41:30)

we had a junior engineer who's a little hot on it. And you know what, for our use cases, it probably would have been fine just because we're just communicating with ourselves. I'm not really sure what this is. It's one thing where because of our use case, the pros and cons of either approach are kind of all diminished. Because tRPC gives you that really strict coupling. I can confirm that the data I'm receiving is in the shape

that I expect, but it's also a little like we're one team, right? Like it's fine. Like we know what we're getting. It'll be interesting to see, you know, as we evolve what that turns into.

Nathan Toups (42:13)

I think that there's two sides to this too, which especially if you start using the ThoughtWorks, Neil Ford and team methodology here, a lot of this can be addressed with good fitness functions, good contracts. And I would actually argue that one of the jobs of a software architect is to enforce the standards that you've declared.

Actually, I haven't built this yet, but it's something I've been kind of toying with in the back of my mind. We've had this issue where we're building spec-driven development, or you're writing ADRs. I really want a code reviewer that's not just a code quality reviewer. I would like an architect agent whose whole job is to enforce what the declared arguments in the ADRs are. Literally, it goes through all the ADRs and puts them in the context window. Now that we have million-token context windows, this would not be hard.

Carter Morgan (42:56)

Right.

Nathan Toups (43:10)

go through a technical documentation declaring what we value. We like loose coupling. We should default to JSON, right, or whatever, and then do a code review as the architect saying, you've introduced tRPC, but that actually violates our value of we use loose coupling, and this is why, right, and put that in the code review. I think that would be so valuable. It'd be like a really interesting way of having, again, it's a way of kind of having this sort of perspective of,

you've declared that we should either write a new ADR saying, in this case, tRPC is appropriate, and say why. Or we should reject the tRPC introduction because it adds complexity and we say that we like simple JSON. And that's really what contra- the best engineering team I've ever worked on, I've talked about this before, that Fintech company, which is really excellent talent. We used, it wasn't microservices, but it was a service-oriented architecture that was

Carter Morgan (43:52)

Right, right.

Nathan Toups (44:10)

microservice-like. And we had the strongest contracts I've ever seen in any organization. And it allowed us to move so quickly. Yeah, ⁓ one of them was that you never break your SDK. So we had an API. So we did some kind of interesting things. We wrote the APIs for a service. And that same team owned an SDK for that service. Because our user was a data scientist.

Carter Morgan (44:17)

really?

Nathan Toups (44:39)

And so they didn't need to know every aspect of the API. What we did want to give them is a Python library that helped them think about the data in the way that they're thinking about it. So we would write these helper functions and all these other things as the SDK. What was cool about it is that we had full control over the API and the SDK, so we could introduce a change to the API pretty aggressively because we also owned the SDK change. Now, we had to support backwards compatibility.

meaning if they didn't update the SDK in their code base yet, the API change could not break an older version of the SDK that was still running somewhere. And so we had these tests that basically said, you are responsible for all actively running SDKs. Now, we weren't building stuff for the public. again, we could take control over deprecating an old version of the SDK, but it was the API team, this job was to move them into the new version of the SDK, make sure things aren't breaking. They would kind of enable that team.

to move forward, even writing code sometimes. And because of this, we just had this really amazing hyper stability, hyper let's think about the structure of things upfront so that we could minimize, oops, let me change this, let me change this. We would introduce new features that were not breaking changes. And then we had, I mean, was just like, there was, when I realized there was this contract first sort of approach to engineering.

Carter Morgan (45:55)

Right, right.

Nathan Toups (46:05)

There was just peace of mind. We had these high demand systems that were trading millions and millions of dollars worth of ⁓ portfolio stuff for clients. So like our on-call schedule really mattered, the data pipelines, how the service availability was in place. And yet I think I can count on one hand the number of times I got a call in the middle of the night, right, from being on call. ⁓ It's just that the code quality was super high. We get these huge debates about

you know, how a query should be optimized and things like this. And we had that because it was just the quality of the contracts were so high. So I'm a huge fan. It actually, think it's worth being pedantic about like arguing what is production ready or what is provided by this API service and what is, what do I guarantee will never change? What do I guarantee? What do I say is not stable? So that's actually at a risk to the user, ⁓ you know,

Carter Morgan (46:53)

Right.

Nathan Toups (47:04)

And then how do we write fitness functions that say, ⁓ or detect that I violated a contract, right?
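
A fitness function that detects a violated contract could be as simple as a test that checks every API response against the fields each still-supported SDK version depends on. The sketch below assumes a hypothetical registry, `REQUIRED_BY_SDK`, with invented version numbers and field names; it is not the setup described at the fintech company, just one plausible shape for such a check.

```python
# Hypothetical registry: which response fields each live SDK version reads.
REQUIRED_BY_SDK = {
    "1.0": {"portfolio_id": str, "value": float},
    "1.1": {"portfolio_id": str, "value": float, "as_of": str},
}

def contract_violations(response, contracts=REQUIRED_BY_SDK):
    """Return a list of human-readable problems; empty means compatible."""
    problems = []
    for version, fields in contracts.items():
        for name, expected_type in fields.items():
            if name not in response:
                problems.append(f"sdk {version}: missing field {name!r}")
            elif not isinstance(response[name], expected_type):
                problems.append(
                    f"sdk {version}: field {name!r} is not {expected_type.__name__}"
                )
    return problems

# Adding a new field is fine; every old field is still present and typed.
compatible = {
    "portfolio_id": "p-1",
    "value": 10.5,
    "as_of": "2026-01-01",
    "risk_score": 3,
}

# Changing a type and dropping a field breaks older SDKs.
breaking = {"portfolio_id": "p-1", "value": "10.5"}
```

Wired into a build, a non-empty result from `contract_violations` would fail the pipeline before an API change reaches an SDK that is still running somewhere.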

Carter Morgan (47:11)

Well, let's take a quick break. And when we come back, we're going to talk a bit about data and data warehouses and all that. OK, we're back. Let's talk about just managing data. This chapter 14 is called Managing Analytical Data. Long time listeners of the podcast will know that data engineering is not my forte, nor is it especially my interest. But Nathan, you said you've actually been doing stuff like this lately.

Nathan Toups (47:39)

Yeah, so ⁓ my biggest client right now, we're actually in the middle of doing a pretty big Databricks ⁓ deployment. the big goal here is to have a self-service data for our analytics team and some of the other folks on the business and development side of the company. ⁓ things have changed a lot.

Like the expectations, number one, of consumers have changed a lot, but also the patterns available to us. We used to do this thing called the data warehouse, ⁓ which, you know, it's really funny. This book just like craps all over. It's like, this is the worst thing ever. And it's funny because like, I guarantee you that ThoughtWorks was implementing data warehouse patterns like, you know, 15 years ago. And then we get into what's called a data lake, which is still a very popular

Carter Morgan (48:19)

Hahaha.

Nathan Toups (48:34)

pattern, and of course, then they move us into this idea of a data mesh. And I guess I can kind of breeze over it: the data warehouse was this kind of like strong place that you would hold analytic data, typically analytic data, probably columnar data, probably stuff where you're running big reports. And when we're talking about analytics, you're also talking about like what they call denormalizing the data, right?

Normalization, if you've ever taken a database class, I always want a single source of truth for anything, right? So I map out my one to many relationships, and I always am like, OK, here's the ID, and here's all the joins, and here's how I build out my big queries. You denormalize the data. I'm sorry, you normalize the data so that you have these complex structures that you can ask the system itself, and it can construct whatever view or whatever

of the data that you want. A lot of times in data warehousing, and really in data mesh type stuff, you will hear this thing called the medallion. Have you heard of the medallion approach? This was something I was not super familiar with until relatively recently. So different, like Snowflake has their own names for some of this stuff. Databricks has their own. They will talk about bronze, silver, and gold. Bronze is like

Carter Morgan (49:46)

I haven't. No, no.

Nathan Toups (50:01)

structured data, it's super close to the source. And then silver maybe has some data cleanup and some enrichments and some maybe more strict enforcement of data types and things like this. And then gold is sort of like what you would imagine a denormalized view. So it's like, you can just use that table itself. And it's got exactly what you would need for like a report or some time series stuff or some other things. And there's these methods, methodologies to like how we

pipeline this data from one place to the other. ⁓ It used to be that you have to use these big databases. Now everything's moved towards typically blob storage, right? So we throw a bunch of data, maybe probably even unstructured data into S3. And then we have all these really cool serverless technologies that can go in and query across huge data sets, right? We're talking about terabytes and terabytes or petabytes even of data and do big analysis. ⁓ And it's kind of...

It used to be that you spent all this time and energy making sure your data warehouse was perfectly shaped and optimized and things like this. And it turns out that that didn't scale very well, especially as your teams get all over the place. And this department and that department and that department need stuff moving back and forth. And so we've now moved to this pattern called ⁓ the data mesh. And the data mesh, the purpose of it is to have a self-service data platform.

⁓ that you end up having these like data as a product. I might like, for instance, we're doing insight stuff. we have like, we have an internal CRM, like a content management system, the customer relations management system. ⁓ There's a lot of stuff inside of the business that people are interacting with that system every day that we need to do analytical insights on. What's a trend? How quickly is something happening from this thing to that thing, right? We need to know the event in time. We turn these things into sort of like,

with the whole temporal data sets where we actually, one, record everything, but we record it as a change over time. And then in the analytics warehouse, I might ask questions, these temporal sort of questions. But I also might have public data sets. Like, it'd be really nice if I had Fannie Mae's data set or Realtor.com's data set. And so we actually have these ingest pipelines that pull those in and kind of keep it in that structure. And then maybe we have some

ways that we like to think of that data that we might put into a silver or a gold layer, where it's like, OK, yes, you can always get to the raw data. Maybe an analyst wants to do that. But we actually want a refined version of the Realtor.com data. And we're going to produce this as a data product. And this data product sits in our warehouse. And people can use this. And they don't have to do 15 joins across 25 tables and pull a bunch of stuff in.

just to do this thing in the shape that they're looking for. But if I create a gold, that's a product now. There's now a contract where we say, OK, we're producing this on this interval. You can rely on it with this level of reliability. And the team that builds that thing owns that thing. We now have this domain ownership of data. And so that's this idea that I have a data product. We produce this thing. And you can think of it a lot like,

kind of like building an API, except what we're doing is building some deep, rich data set. ⁓ And the technology's fundamentally changed. I was doing a bunch of Airflow-based data pipelining stuff, I'm trying to think now, 12 years ago, I guess? Yeah. No, more than that. 14 years ago. I'm an old man. So this is 14 years ago. I think we were some of the early users of Airflow ⁓ at this ⁓

research project that I was working on that turned into that quantitative hedge fund. ⁓ we were just trying to figure out, you know, we were using Redshift and big columnar data warehouse technologies, and we saw the trends changing over time and the expectations changing. you know, there are technologies that exist now that didn't exist before. ⁓ Some of it is a madhouse, but it's kind of weird. It's like we now have systems where anyone can actually pile in data and make the transformations

self-service, like in their own thing. And then if you come up with these quality standards, you then can make this data as a product idea. you have this centralized way of thinking about how do we store data and transform it. And then because we all agree with how this kind of feels, ⁓ you can self-service. can go look through. You literally have, in our case, we have a catalog.

of, and we have these naming conventions, and you can just go, ⁓ that's a cool data pattern. And then you go, who owns this? And you're like, this team over here. And I can go talk to them and be like, hey, ⁓ is this a reliable data set? I see this in our staging environment. yeah, cool. Great. I'm going to start building my own data product based off of the outputs of these other data products. And so it actually fundamentally changes what was possible before. You don't have to spend months planning.

you can kind of do a lot more ad hoc stuff. ⁓ yeah, it actually takes a lot of the principles. Like anybody who's a software engineer and has been building RESTful APIs, this might feel really familiar. You're like, why are you acting like this is like a big exciting thing? But this is not how data warehouse has worked for a long time. Yeah.

Carter Morgan (55:38)

Do you think there's, do you think we've arrived at kind of the final paradigm or as you've been working in this is there like, this is a pain point.

Nathan Toups (55:44)

I hope not. I mean, like, you know, when we invented the cassette tape, did we arrive at the final paradigm of like consumer audio? Like, no. I do think that it is allowing folks. We're in a world full of data and we're in a world full of data that like using traditional tools was just overwhelming. You could not safely.

Carter Morgan (55:51)

Right, right.

Nathan Toups (56:11)

enable all of the ways that people need to process data in these big data systems. And these new patterns, think we're continuously seeing them evolve and change. I'll say even with like, again, with Databricks and with Snowflake, they're reacting to, well, now people want agentic stuff sitting on top of their data, right? How do we make it so that we can run agents that are going in and doing deeper work? so like Databricks has this entire thing where you can have like,

large language model runtimes that sit on top of your data warehouse and can do, you know, RAG models and, you know, all the other stuff that's like very like domain specific understanding. Well, that's going to change how you structure data over time, right? Like the optimizations that are in place there. But of course, they keep building on top of stuff. Like I do think that data mesh is, it'll be a foundational thing, right? Like I don't think that we're going to go away from that. In this idea of data as a product,

It's super cool. We've already seen it. I've only been on this project for, I guess, a little over two months now. And it's transforming how we even have conversations about stuff in the organization, which has been cool because they've been burned a few times. They've had a couple of initiatives in the past where they were like hoping. And it's kind of exactly what they bring up. They talk about some of the like data warehouse and data lake stuff ⁓ where it kind of solves some problems, but then it causes a bunch of new problems.

And I think that the data mesh methodology has actually given us, because again, we're working with a lot of partners. We have a bunch of departments that have their own needs. And what I'm always constantly looking for is like, what is a data product that's generally usable? Can we think of the boundary? And again, we pull in, there's like a real estate component to this. So we'll pull in Zillow data and we have

Carter Morgan (57:40)

Yeah

Nathan Toups (58:09)

the raw data that's straight up the queries that come from that and we store it because storage is cheap. Then we do bronze, which is the tabular format that we care about. But actually, most of the columns are like strings. Like we don't even do validation because we want it as close to the raw as possible. And then we do a silver layer, which again, does typecasting and data integrity stuff. But because we have all of these things, we get

Carter Morgan (58:33)

Right, right.

Nathan Toups (58:38)

reproducibility, we can see exactly what we thought and when. If we've corrected some data, we don't actually overwrite it. We actually have an as of column. We just put another row in there and we say, OK, what was the truth as of this date? We have these really nice data structures in place so that if they go back and go, why did we make this business decision six months ago? And we go, it's actually because the data sucked.

Carter Morgan (58:46)

Yeah, it's just another layer, right?

Yeah.

Nathan Toups (59:04)

this is how we thought reality was, and we actually fixed this moving forward, but we still need reproducibility. If you don't have the ability to go back in time and say, what did we think was the truth on January 1st, 2026, again, you start running into these "can't reason about the system" problems, right? Yeah.
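
The "as of" column idea can be illustrated with an append-only list of rows: corrections are added as new rows rather than overwriting old ones, and a query picks the latest row that existed on the date you ask about. The listing, prices, and dates below are invented sample data, and `value_as_of` is a hypothetical helper.

```python
from datetime import date

# Append-only history: the second row is a correction, not an overwrite.
ROWS = [
    {"listing_id": "a1", "price": 450000, "as_of": date(2025, 12, 15)},
    {"listing_id": "a1", "price": 405000, "as_of": date(2026, 2, 1)},
]

def value_as_of(rows, listing_id, when):
    """Return the price we believed on `when`, or None if we had no data yet."""
    known = [
        row for row in rows
        if row["listing_id"] == listing_id and row["as_of"] <= when
    ]
    if not known:
        return None
    # The most recent row at or before `when` is what we thought was true.
    return max(known, key=lambda row: row["as_of"])["price"]
```

Asking for January 1st, 2026 returns the original 450000, while asking for a date after the correction returns 405000, so a past decision can be replayed against the data as it looked at the time.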

Carter Morgan (59:25)

What a, I think that, you know, like I said, this is not my forte. I really enjoyed hearing that. That bronze, silver, gold concept, I think is really, really interesting. Right, right.

Nathan Toups (59:36)

Yeah, they call it the medallion model. Yeah, it's a rabbit hole that I've gone down. And again, I'm not a data engineer at the caliber of some of the data engineers that I know, but I've done enough software architecture with data engineers, and I've been keeping track of the newer trends. It's been really exciting, actually. The world of data engineering,

and where applications are developed and how we solve business problems are like blurring more and more for sure.
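
As a toy illustration of the bronze, silver, and gold layers described earlier, the pipeline below keeps bronze as all-string rows close to the source, casts and validates types in silver, and produces a report-ready aggregate in gold. It uses plain Python with made-up listing data; real platforms would do this with tables and scheduled jobs, and the function names are assumptions.

```python
from collections import defaultdict
from datetime import date

# Made-up raw records as they might arrive from an external source.
RAW = [
    {"zip": "78701", "price": "450000", "listed": "2026-01-03"},
    {"zip": "78701", "price": "n/a", "listed": "2026-01-04"},
    {"zip": "78702", "price": "390000", "listed": "2026-01-05"},
    {"zip": "78701", "price": "550000", "listed": "2026-01-06"},
]

def to_bronze(raw_rows):
    # Tabular, but every column stays a string: no validation yet.
    return [{key: str(value) for key, value in row.items()} for row in raw_rows]

def to_silver(bronze_rows):
    # Typecasting and basic integrity checks; bad rows are dropped here.
    silver = []
    for row in bronze_rows:
        try:
            price = int(row["price"])
        except ValueError:
            continue
        silver.append({
            "zip": row["zip"],
            "price": price,
            "listed": date.fromisoformat(row["listed"]),
        })
    return silver

def to_gold(silver_rows):
    # Denormalized, report-ready view: average price per zip code.
    prices = defaultdict(list)
    for row in silver_rows:
        prices[row["zip"]].append(row["price"])
    return {zip_code: sum(p) / len(p) for zip_code, p in prices.items()}
```

Because bronze is preserved untouched, the silver rules can be changed later and the gold table rebuilt from the same source rows.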

Carter Morgan (1:00:10)

Yeah. Well, let's talk, we can wrap up with kind of their wrap up, chapter 15. And that's kind of where your quote that you shared at the beginning, like, there's no secret, you know, your architecture is going to be unique enough that you have to do the trade-off analysis yourself. I liked the part about avoiding snake oil evangelism. And Nathan, you said that this resonated strongly with you. So why don't you introduce the concept of what snake oil evangelism is?

Nathan Toups (1:00:16)

Yeah.

⁓

Yeah, snake oil and evangelism, right? The idea, and I think we talked a little bit before this episode started. I'm really kind of, I'm excited where Carter wants to take this conversation here. So it's really easy if you get excited about technology and especially if it's new and you're like, man, the old way of doing stuff was terrible. This new way is way better.

Carter Morgan (1:00:41)

Yeah.

Nathan Toups (1:01:05)

It's very easy if you have a persuasive personality to evangelize some new thing. All the jokes are on MongoDB, replacing MySQL or Postgres. There were some evangelists who came in and were like, imagine a database where you don't have to think about schemas and we can just move super fast. They push all these cool ideas like, yes, schema migrations suck, and we should just be able to write JavaScript and shove it in somewhere and query it out and bring it in.

Carter Morgan (1:01:14)

WebScale.

Nathan Toups (1:01:35)

And they evangelize this and what is missing? A conversation around trade-offs, right? There's no conversation around why did this new thing get invented and what are we losing by not having the old thing? And if you come in and you use your clout as an architect or you use your clout as like a top software engineer, evangelizing some technology,

Carter Morgan (1:01:40)

Right, right.

Nathan Toups (1:02:02)

you really need to take on the burden of owning the trade-offs, right? Otherwise, you can really get yourself into a bind or the business, even worse, right? The business in a bind because they've adopted something without doing a sober analysis of the trade-offs and you get stuck. You get in a really bad situation.

Carter Morgan (1:02:20)

This is just something I believe in general about life. Like I mentioned on the podcast, I'm regrettably a bit of a politics junkie, but I feel that with public policy too. I'm always skeptical of any pundit who does not acknowledge the trade-off of whatever they're proposing. And, you know, just like with all trade-offs, you might believe that the trade-off of what you're proposing is entirely worth it. But there's got to be a trade-off, right? And I think we see a lot of this

Nathan Toups (1:02:43)

Yeah. Right.

Carter Morgan (1:02:50)

in the industry, and it's just with any hype cycle, and the current hype cycle is large language models, right? And I feel like people are not talking enough about the trade-offs of using large language models, right? Which is like, yes, you can write code and you can ship it faster. There are trade-offs that come with that, right? You are trading a more detailed understanding of your system. You're trading a more detailed understanding from your engineers. I mean, I just... And these trade-offs might be worth it, right? But like...

I was in a lot of front-end work yesterday. And front-end is, I know enough front-end, right? But I would not say I'm particularly gifted at it. I have been pleased with the code that ChatGPT, or that Claude Code has been generating. And it certainly is writing better front-end code than I think I could have written. But with it comes this trade-off that, like, I don't understand the system as well. There's something about kind of beating your head against the wall until you figure out, oh, now that's how you do that thing, right?

Maybe it's a trade-off worth making, I don't know, but it's a trade-off. And, I don't know, again, I'm just so skeptical of any voice that is not acknowledging the trade-off.

Nathan Toups (1:04:05)

Yeah, the,

and I will say that, like, you know, with overzealous evangelism, I think you still have purity of intention, but you are underselling the trade-offs, right? So I would say that there's a purity in the sense that, like, your intent is good. But the other part of this is the snake oil side. And I think this is the part where you're actually a con man, or you're a charlatan, and you're also using your

Carter Morgan (1:04:28)

Right, right.

Nathan Toups (1:04:34)

of evangelism, but you're intending to mislead. And that's the part where I do think that there's a lot of that going on in the AI LLM space right now. ⁓ We saw this with serverless. Remember when everybody was like, yeah, you don't need to run any servers anymore. Everything will be serverless. It'll scale to zero, and you'll save a bunch of money. Turned out, some workloads are great for that.

Carter Morgan (1:04:40)

Yes.

Right, right.

Nathan Toups (1:05:02)

Cron jobs, for example, or something that's sparse, where I really don't need to run it, but every once in a while I burst and do a bunch of stuff. There's trade-offs, like cold boot times and these other things. But there's also the large language model, which is like,

Carter Morgan (1:05:04)

Right, right.

Nathan Toups (1:05:22)

we're being oversold and I think it's really a problem with leadership where leadership is being fed snake oil. Hey, if your engineers use this tool and they don't become 10x more productive than they were before, then you hired the wrong people, right? Well, that's snake oil. I don't know anyone who is consistently 10x, this idea that I could do a year's worth of work in ⁓

Carter Morgan (1:05:42)

right.

Right, right.

Nathan Toups (1:05:52)

you know, in three days, right? Like, and that's, you know, that's 100x. Yeah. A month's worth of work in three days, right? That's crazy. And so, like, I... And I will say there's certain types of tasks. Like, I was recently doing something where it was really nice to think through some Terraform, where I needed to do something that I was, like, not looking forward to on my own, and I wanted to see if the subagent model could handle it. And it did

Carter Morgan (1:05:54)

Right, right.

That's 100X, but still a month's worth of work in, or a year's worth of work in a month, roughly. Yeah, or yeah, exactly.

Nathan Toups (1:06:21)

a phenomenal job. And again, it was very specific constraints. I had really good fitness function testing to go around it. And it just saved toil on my side. It was something that would have been pretty hairy to do. It was greenfield Terraform, so I didn't have to worry about production going down or anything. And I was like, this is so nice. But then I've also had it where I'm getting it to update some documentation. And I think there's a report that recently came out. It was Microsoft, their research team had a report, and

it's full of euphemisms, so they're trying to make it not sound that bad. But Microsoft basically said that large language models are just completely screwing up about 25% of documentation when you have long-running context windows. And I think we've all experienced this, where I'll get some documentation generated from some work that I've done, and I'm like, this is pretty good. And then I'm like, OK, we did a bunch more work. I get it to update the documentation. I don't really look at it as closely that time.

Carter Morgan (1:06:53)

Right.

Dang.

Nathan Toups (1:07:20)

And I do it again, update documentation. Then I look at the documentation later that day and I'm like, what are you describing right now? This is not what we built. Or it'll even reference the crappy documentation and be like, yeah, well, we're adding underscore flag to all of our Boolean values. And I'm like, I didn't even ask for that. What are you

Carter Morgan (1:07:20)

Right, right.

Yay.

Yeah.

Nathan Toups (1:07:38)

my mistake. And I'm like, get out of here.

Carter Morgan (1:07:40)

Yeah.

I, like, beat it into submission the other day. It was going back and forth with me, and I was convinced my approach was right. Like, we should really do it this way. And it said, like, fine, we'll do it your way. Capitulating. Like, what is wrong with you?

Nathan Toups (1:07:44)

you

Yeah,

I've noticed that Opus 4.7 is way sassier and I was like, I don't like your tone. I feel like an idiot anytime I'm talking.

Carter Morgan (1:08:00)

Yeah, I know. It's funny. Yeah.

But it goes back to Carl Brown's point, which is, like, this is not hardware, it's software. And I think what Anthropic is doing is, because I do like this about Claude, like, ChatGPT is just so sycophantic, right? And I think people do like that Claude will push back a bit, but I think they kind of dialed in a little too much to that. But also, to talk about it in terms of trade-offs, like,

Nathan Toups (1:08:13)

Right

⁓ I hate it.

Carter Morgan (1:08:30)

Anthropic is all about, like, we're just shipping so fast. We don't write code anymore. And I'll just say this again: I hate when people say, like, I don't write code anymore. I'm like, well, that's like saying, I don't walk anywhere, I drive most places. But, like, I'm driving. And so, like, talk to me about that. Talk to me about how much code is being produced completely autonomously. Because that is a completely different thing from a software engineer directing

Claude Code, because I hardly write any code by hand these days, right? It's almost all generated, but it's me doing the generation. Anyhow, so Anthropic, they're like, oh yeah, and we're shipping so fast. We don't even review our code. And then it's like, they just published this postmortem yesterday about how they completely screwed up Claude Code. Did you see this, Nathan?

Nathan Toups (1:09:07)

Yeah.

I saw that it was out there. didn't get to take a look at it closely yet.

Carter Morgan (1:09:27)

Yeah,

it's like a three, it's worth bringing up like Anthropic.

Nathan Toups (1:09:32)

Yeah, I'm

sorry. We vibe coded everything, and now we don't understand how anything works anymore, right? And it's basically that. right. Yeah, well, again, because I think this is that regression to the mean. This is, great. ⁓

Carter Morgan (1:09:37)

That is basically what it was, right?

And now I can't read the report because their website doesn't... Okay, here, it loaded

finally. Yeah, it's worth just bringing up. Basically, they say they switched reasoning effort from high to medium to reduce long latency, but they didn't tell you as they were doing that. They added a system prompt to reduce verbosity, but that hurt coding quality, right? But then also, this is the big one.

We shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resume those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. So basically, like, if you've been idle for over an hour, they thought, OK, we'll clear that history. Even then, I'm not exactly sure why you'd want to do that.
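
To make the shape of that bug concrete, here is a minimal Python sketch. It is not Anthropic's code, and the mechanism is an assumption: the `Session` class, the `stale` flag, and both `turn_*` functions are hypothetical names invented for illustration. Only the one-hour idle threshold and the "every turn instead of just once" behavior come from the description above.

```python
from dataclasses import dataclass, field

IDLE_LIMIT_SECONDS = 3600  # "idle for over an hour", per the discussion


@dataclass
class Session:
    """Hypothetical session state; not a real Claude Code structure."""
    last_active: float
    thinking: list = field(default_factory=list)
    stale: bool = False  # assumed flag, used only by the buggy variant


def turn_buggy(session, now, thought):
    # Assumed bug: the stale flag is set on resume but never reset,
    # so older thinking is cleared on every later turn as well.
    if now - session.last_active > IDLE_LIMIT_SECONDS:
        session.stale = True
    if session.stale:
        session.thinking.clear()
    session.thinking.append(thought)
    session.last_active = now


def turn_fixed(session, now, thought):
    # Intended behavior: clear once, on the first turn after a long idle gap.
    if now - session.last_active > IDLE_LIMIT_SECONDS:
        session.thinking.clear()
    session.thinking.append(thought)
    session.last_active = now


def run(turn_fn):
    """Replay five turns, with a gap of over an hour before the third."""
    session = Session(last_active=0.0)
    turns = [(10, "a"), (20, "b"), (4000, "c"), (4010, "d"), (4020, "e")]
    for now, thought in turns:
        turn_fn(session, now, thought)
    return session.thinking
```

With the buggy variant, every turn after the idle gap starts from an empty history, which would look like the "forgetful and repetitive" behavior described; the fixed variant keeps everything from the resume point onward.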

Nathan Toups (1:10:29)

Yeah, and I definitely got

burned by this because I, you know, maybe it's my undiagnosed ADHD, but I will have these long-lived sessions where I'm just like, I've got things that I'll queue up that are appropriate for Claude Code across several different domains. And then I've got, like, my primary task that I try to stay focused on. And I kind of just, like, you know, play traffic cop on some other things that are happening in the background.

Carter Morgan (1:10:40)

Right.

Right, right.

Nathan Toups (1:10:59)

But I think I've noticed the same exact thing where I'm like, how did you forget what we're talking about? So defensively, I've had to be very aggressive about writing markdown files of stuff that's in process. I've also been using, I think I've mentioned this in Discord, I started using a plugin called Superpowers. I don't use that many plugins ⁓ or skills. Have you used Superpowers? Have you looked at that one?

Carter Morgan (1:11:28)

No, remember

my team is really hot on them.

Nathan Toups (1:11:32)

It's impressed me. So it's like a lighter-weight version of spec-driven development. And it actually does a good job of breaking things out into subagents. It's in the official plugin repository. For certain types of work, I found it really nice. But again, I'm very apprehensive about putting in too many plugins, because I like to have

Carter Morgan (1:11:35)

Okay.

Nathan Toups (1:12:01)

pretty close to bare-bones Claude. I have a user-level AGENTS.md or CLAUDE.md, and then I try to keep a decent CLAUDE.md per project. But I try not to do too much. I'm really kind of sensitive to making sure that my context window doesn't get too big, and I feel like if you have too many MCPs and too many skills and too many rules, it just...

Carter Morgan (1:12:03)

That's me too.

Right, right.

Nathan Toups (1:12:30)

I don't know, the thing that drives me crazy, every once in a while it's gonna be like, hey, just like John Ousterhout said, you should use deep modules or something. And I'm like, you're freaking me out right now. I wasn't even talking about it. It's like somewhere in its memory, it's heard that I liked John Ousterhout.

Carter Morgan (1:12:37)

Yeah. Yeah. Yeah. Yeah.

Well, but again, kind of like with Anthropic, it's just a little, like, I'm sorry, but some of these bugs you're talking about shipping, like, this is unacceptable for a company that is rumored to be worth 800 billion dollars and that pretty much every professional software engineer uses. Like, that's insane. And so when we talk about trade-offs, it's like

Yeah, you're shipping faster, but whoops, you accidentally broke your premium product for all your users for a month,

right? And so, again, I appreciate this book's theme of everything is a trade-off. You have to evaluate those trade-offs. And I think this is the most important concept you need to really internalize to become a senior or principal level engineer, is that everything's a trade-off.

And yeah, you know, like, vibe coding is a trade-off. If you're just a hotshot who's like, I found the one true way to program, the one true way for architecture, like, I'm sorry. I just think there's no easier way to betray yourself as a non-serious thinker. So, you know, if you're listening to this and you're thinking, like, that sounds like me, like, it's time to change, right? And reading this book is kind of a great...

Nathan Toups (1:13:39)

Vibe coding is a trade-off, right?

Carter Morgan (1:14:03)

First step, again, maybe we can just jump to book recommendations first. Just because I'd say read this book. Again, I really recommend Fundamentals of Software Architecture before this one. It's an excellent, excellent book. But aside from that, I just think, yeah, anyone who's kind of trying to become a senior or principal level and you're feeling like, I don't just want to take tickets. I don't just want to implement features. I want to be kind of in the room where it happens and making these bigger decisions. Or if you're already in that room, like,

Nathan Toups (1:14:11)

Mmm, for sure.

Carter Morgan (1:14:31)

This and Fundamentals, like, great books, and really, I think they're the first stops to learn how to make those decisions. How about you, Nathan, who would you recommend it to?

Nathan Toups (1:14:42)

Yeah, I'm in the same boat. It seems like you and I are in complete agreement, especially when it's Thoughtworks, Neal Ford, Mark Richards type stuff. I think this is a super strong book. It's absolutely advanced topics. I think you definitely need Fundamentals of Software Architecture under your belt and probably some time spent building systems and screwing up before you kind of appreciate

Carter Morgan (1:14:51)

Right, right.

Nathan Toups (1:15:12)

why these hard parts are the hard parts. I think if you're just like, ⁓ sync versus async and what the trade-offs are, doesn't make a lot of sense unless you've built something that's sync versus async and you're like, yeah, I really got in this like nitty gritty and it didn't behave the way I thought it would or things went sideways. ⁓ The Hard Parts, I think is a really great book for if you have a little bit of lived experience and you need a better vocabulary for expressing like,

How do I talk to my team and let them know that this really is something we should spend some time thinking about, right? It's exactly what this book equips you for. And yeah, I think it's super strong on that side.

Carter Morgan (1:15:52)

Well, how about you? What are you going to do differently in your career? For me, I just feel like before we kind of settled on an event-driven approach, I did not give a serious look at orchestration versus choreography. And even now, I just want to look at kind of our pattern and be like, okay, just kind of look at the saga patterns and lock in and be like, this is the saga we're using. It's not going to change anything, but I just feel like as a responsible

architect, I should understand, I should be able to explain the philosophy of our system.

Nathan Toups (1:16:23)

Yeah. And I'll tell you,

parts of your system might use choreography, and parts of your system might use orchestration. You might find that there's certain critical components that aren't going to scale to a crazy number of concurrent pieces. There, orchestration will get you very far along. I mean, it's much easier to reason about. And you can always move to a coordination system, I mean, a choreographed system later. That is totally possible.
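
As a rough illustration of the two coordination styles being contrasted here, the sketch below runs the same three-step workflow both ways. Everything in it is hypothetical: the order workflow, the step functions, the `EventBus` class, and the event names are made up for the example and are not taken from the book or from either host's system.

```python
from collections import defaultdict


# Three hypothetical workflow steps; each records what it did in a shared log.
def reserve_stock(order, log):
    log.append(f"stock reserved for {order}")


def charge_payment(order, log):
    log.append(f"payment charged for {order}")


def ship(order, log):
    log.append(f"shipped {order}")


def orchestrate(order):
    """Orchestration: one coordinator knows the whole workflow and calls each step."""
    log = []
    for step in (reserve_stock, charge_payment, ship):
        step(order, log)
    return log


class EventBus:
    """A tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, order, log):
        for handler in self.handlers[event]:
            handler(order, log)


def choreograph(order):
    """Choreography: no coordinator; each step reacts to an event and emits the next."""
    bus = EventBus()
    log = []

    def on_order_placed(o, l):
        reserve_stock(o, l)
        bus.publish("stock_reserved", o, l)

    def on_stock_reserved(o, l):
        charge_payment(o, l)
        bus.publish("payment_charged", o, l)

    def on_payment_charged(o, l):
        ship(o, l)

    bus.subscribe("order_placed", on_order_placed)
    bus.subscribe("stock_reserved", on_stock_reserved)
    bus.subscribe("payment_charged", on_payment_charged)
    bus.publish("order_placed", order, log)
    return log
```

Both functions produce the same result; the difference is where the knowledge of the workflow lives. In `orchestrate` the sequence can be read in one place, which is the "easier to reason about" point, while in `choreograph` the sequence is only implied by which handler listens to which event.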

Carter Morgan (1:16:53)

How about you?

Nathan Toups (1:16:53)

I'm

gonna, what I'm changing is, I need to really keep evangelism in check. Mainly that if I feel very strongly about a set of technologies, I really do that sober trade-off analysis first. And I say, hey, I think that this thing will actually be huge. I think it's a really important thing. I think that this is the way that we're approaching the problem incorrectly. I think if we introduce this new technology, this would actually be, you know, this is a

10x or 20x return on investment. Here are the trade-offs, right? Like, that should be the very next thing that comes out of my mouth, and debate both the benefits and the trade-offs as well, right? Negotiate this with the rest of my engineering team. I already kind of go down that direction, but I think being more intentional, making sure that I have this up front, mostly because I also want other engineers to have the same expectations, you know, lead by example, so that if other evangelists come to the team, I say, cool.

Carter Morgan (1:17:29)

Right, right.

Nathan Toups (1:17:52)

What are the trade-offs? Let's go through this, right? And they don't point back at me and go, you didn't do that, right?

Carter Morgan (1:18:01)

I think that about wraps it up for this week. I'm excited. Next week we are doing Build: An Unorthodox Guide to Making Things Worth Making. Have you read this? Okay.

Nathan Toups (1:18:10)

This is a great book. ⁓ The guy who invented, I read it in the past. I think when it

first came out, ⁓ he is the, the author is the creator of Nest thermostats. And he was the designer of the, he was on the team that built the original iPod.

Carter Morgan (1:18:23)

very cool, yeah.

Yeah, I know that's why he made Nest, because he just freaking hated thermostats.

Nathan Toups (1:18:32)

Yeah, so he's incredible

industrial design meets hardware meets constraints, figuring out how to build valuable products. We haven't done a business-y book in a while. ⁓ And he's a true engineer. yeah, I think this would be a fun book to read.

Carter Morgan (1:18:35)

Yeah, well this is great.

Yeah, it's been a couple months.

Very cool.

I'm really

excited. Bye bye. I'm going to do this one on audiobook. I have a feeling this one will probably translate pretty well to audiobook. Yeah. Nice.

Nathan Toups (1:18:53)

It's a great audio book. That's how I originally listened to it. And it's like,

yeah, it's just full of good wisdom.

Carter Morgan (1:19:00)

Well, thanks for tuning in, everyone. As always, you know, if you like this content, stick around. We're going to keep producing it until, I don't know, we're 80, I guess. And then you can always look at our old content. And you can always, you know, follow us on Twitter at BookOverflowPod. I'm on Twitter at Carter Morgan. You can go to BookOverflow.io. That's going to show all our past episodes, the upcoming reading. It even has, like, Nathan, you got all of our transcripts up there for, like, easy searchability. Like, it's really, really cool.

And Nathan's work with his consulting agency is at rojoroboto.com, and he has a newsletter there at rojoroboto.com/newsletter. Thanks for listening, everyone. We will see you later.