Ep. 69 - Monday, July 7, 2025

Is DevOps a Silver Bullet? - The DevOps Handbook

Book Covered

The DevOps Handbook, 2nd Edition: How to Create World-Class Agility, Reliability, & Security in Technology Organizations

by Gene Kim, Jez Humble, Patrick Debois, John Willis, Nicole Forsgren

Authors

Gene Kim

Jez Humble

Patrick Debois

John Willis

Nicole Forsgren

Hosts

Carter Morgan, Host

Nathan Toups, Host

Transcript

This transcript was auto-generated by our recording software and may contain errors.

Nathan Toups (00:00)

Caring about the process matters. Yes, engineers have to be responsible for some aspects of how you ship code, thinking about the stuff, but we don't want to overload them with the cognitive burden. And I think that's really the kind of thesis of these first two parts of the book.

Carter Morgan (00:23)

Hey there, welcome to Book Overflow, the podcast for software engineers, by software engineers, where every week we read one of the best technical books in the world in an effort to improve our craft. I'm Carter Morgan and I'm joined here as always by my co-host, Nathan Toups. How are you doing, Nathan?

Nathan Toups (00:35)

Doing great. Hey, everybody.

Carter Morgan (00:37)

Well, thanks for tuning in, everyone. As always, like, comment, subscribe wherever you're at, and share the podcast with your friends and coworkers on LinkedIn or on Slack. And yeah, keep listening. We don't have anything specific to announce just yet, but we do have new author interviews working through the pipeline, so stay tuned for those. We're excited to announce those as they come along. And we're excited, we're doing a new book, moving on to a big one. This is, I don't know, Nathan, what'd you say? Is this a very well-known book, The DevOps Handbook?

Nathan Toups (01:11)

Yeah, I mean, I think it's a complement to The Unicorn Project and The Phoenix Project. I think it kind of sums up those things, but yeah, I mean, it's in the list that people recommend for sure.

Carter Morgan (01:23)

Absolutely. Yeah, we're doing The DevOps Handbook. Like you said, it's very much a complement to The Unicorn Project and The Phoenix Project. It's written by, I know Gene Kim, because it's got four authors. We've got Gene Kim, Jez Humble, Patrick Debois, John Willis, and Nicole Forsgren, who wrote the foreword. What's the overlap with The Phoenix Project there? I'm looking it up right now. Who wrote that? Okay.

Nathan Toups (01:50)

These are all heavy hitters in the DevOps community. And of course, that's my background, I think probably even more than you, Carter. So I've read a lot of this material and seen a lot of talks. There's DevOpsDays, there's the Accelerate work. Nicole also worked on the DORA and SPACE frameworks, which are really important ways of measuring

Carter Morgan (01:59)

Yeah.

Nathan Toups (02:15)

things. Obviously Jez Humble and Gene Kim are prolific voices in the community as well. Yeah, there's a lot of wisdom in this book.

Carter Morgan (02:28)

I'm seeing Gene Kim is the only overlap, at least with The Unicorn Project. But let's go ahead and give some quick background on each of these authors. Gene Kim is a technology researcher and author who founded IT Revolution and is best known for The Phoenix Project and his work studying high-performing technology organizations. We've got Jez Humble, who's a software engineer, researcher, and co-author of Continuous Delivery, who has worked at companies like Google and currently teaches at UC Berkeley. Patrick Debois is considered the father of DevOps for organizing the first DevOpsDays conference in 2009, and is a consultant who helps organizations with their DevOps transformations. John Willis is a DevOps evangelist and consultant with over 40 years of IT experience who has worked at Docker, Red Hat, and Chef and co-hosts the DevOps Cafe podcast. And Nicole Forsgren is a researcher and expert in DevOps who led the State of DevOps reports, co-authored Accelerate, and was VP of Research and Strategy at GitHub after her startup, DORA, was acquired by Google.

And then just the introduction for the book. The DevOps Handbook is the definitive guide for applying DevOps principles across entire organizations, not just IT departments, recognizing that technology is now core to every business. The second edition adds 15 new case studies from organizations like Adidas, American Airlines, and the US Air Force, plus new research and insights from Dr. Nicole Forsgren. With over 100 pages of new content, it provides tools and practices for anyone working with technology to create organizational success. So we read the first two parts this week. There are six parts, and we're gonna cover it over three weeks. Nathan, give us your thoughts on the first two parts of The DevOps Handbook.

Nathan Toups (03:59)

Yeah, this book is a great, more business-centric text if you don't want to read an allegorical story like The Phoenix Project or The Unicorn Project, and you really want to get all the major plot points in the form of something that's more of a traditional software engineering, software business type book.

So there's a lot of wisdom, there's a lot of really good case studies. This book was also right up my alley. I mean, I'm from this world. It's been really cool to see this transformation. I will say I do have some concerns. I think DevOps has a bad habit of overselling what it can do, and I think this book is no exception. I mean, every example here is: look, as soon as people apply these principles, everything magically gets better. And I don't think that DevOps can overcome certain organizational problems. I think that's why books like Team Topologies and other things exist.

And I would like to address those concerns as we go through it. I don't want to just rubber-stamp a book, especially if it's a book I'm a big fan of. I think if applied correctly, this is really cool. And again, this is a nice playbook. This is called a DevOps handbook, and a handbook to me feels like something I can hold in my hand. This is more like a huge volume, right? This is going to take several weeks to get through.

Carter Morgan (05:25)

Yeah, yeah.

For me, we picked up this book because I'm at a new company. It's a startup, and so we have some DevOps stuff we want to get done. Like, we're not doing CI/CD quite yet. And so I was like, you know what? DevOps would be a great thing to look at. This book was written in 2016, and I think, again, it suffers a little from that whole "Seinfeld isn't funny" thing. This was, I'm sure, very revolutionary at the time, but it's been 10 years and a lot of companies work like this, especially a lot of the bigger companies I've worked at. And then...

Nathan Toups (06:09)

Actually, I think that the second edition came out in the early 2020s, like 2021 or so.

Carter Morgan (06:13)

The second, yeah.

Yeah, but I wonder how different the first and second editions are from each other. They say there are 15 new case studies and over a hundred pages of new content. I guess at least the first third of this book, and I'm curious how it goes on, is really trying to sell the idea of DevOps, and in particular sell this idea of having feature teams instead of functional teams, right? We've talked about

Nathan Toups (06:38)

Hmm.

Carter Morgan (06:49)

the idea that like you should have a team dedicated to a service and that team should have everything you need on it. Like your database.

Nathan Toups (06:56)

This is one of my critiques, and I know I've interrupted your initial thoughts, but this is one of my criticisms of this book. It's incredibly enterprise-focused. At least the first third of this book is not, "Hey, you're a startup. You want to do things right from day one. Here's how you transition from being scrappy founder code into awesome production." This is like, American Airlines had an eight-month deployment cycle and now they're down to multiple deploys per day. And here's

Carter Morgan (06:58)

Now go for it.

Yes. Yes.

Mm-mm.

Nathan Toups (07:26)

how they did it with their multi-million-dollar budget. And you're like, that doesn't help me at all when I'm a Y Combinator startup. You know, I think that's one critique: this feels like a consulting descendant of the Agile Manifesto consulting groups, right?

Carter Morgan (07:33)

Yes.

Nathan Toups (07:47)

I'm sure that Gene Kim and some of these other folks have probably gotten some pretty sweet Fortune 500 consulting gigs out of this, where really smart engineers who maybe have taken the wrong approach have come in and completely revolutionized how they ship so that they can be competitive. And if you think about this with Phoenix Project and Unicorn Project, it's on brand, right? The whole point of that was this whole automobile parts manufacturing company becoming relevant, and that everybody's really a tech company, and all these other things. But if you're a scrappy startup and you're like, okay, wait, I don't want to be those guys. I want to be David, not Goliath. How do I do this from day one? This isn't that.

Carter Morgan (08:26)

Yeah.

I had that same thought too. Like, it's saying great things. And if you've worked at a big company, again, kind of a classic Phoenix Project or Unicorn Project type company, there's a lot of value, at least in the first third of this book. But it's trying to sell me on this idea of, this is why you shouldn't have a database administration team and a separate, you know, whatever, server provisioning team, and this idea of feature teams, not functional teams. And I'm kind of like, so at my company, we have eight engineers, right? We couldn't even have those teams if we wanted. And so is it great to know this stuff as the company grows and not fall into these patterns? Absolutely. But I am a little like, yeah, it's very, very enterprise-focused.

Nathan Toups (09:23)

I will say where I do think it would be useful, and it's too much for just this use case, I think this could be a blog post, but it's a cautionary tale against cargo-culting process. So one of the things as your team grows is you might go, oh, well, you know, so-and-so puts in all these quality checks, or we need a QA person, or these other things. And you're like, you really don't. And this is a good example of why having a separate QA is terrible.

Right? Why? Because right now you're in, like, the coolest place ever. Your entire engineering organization is a two-pizza team. Right? Y'all can have a hackathon and feed pizza to the team and do things and unblock each other and, you know, include the CEO and all these things that you can't do in these larger organizations, at least not unless you do, you know, Team Topologies stuff where you're

Carter Morgan (09:51)

Yeah.

Nathan Toups (10:17)

doing X-as-a-service and enablement and all these other kind of big structural things. Y'all have that. Like, you are DevOps, if you wanna be DevOps. I will say I accidentally read ahead a little bit, and I do think that starting in part three, there are gonna be more applicable things. So there is a silver lining. But yeah, in this early part, it really does seem like you're trying to convince, you know, Eeyore that he should cheer up a little bit, kind of a thing.

Carter Morgan (10:50)

Yeah, and I'm trying to find the balance at my current place. When you work at really big companies, the product is generally considered established in whatever team you join. There are exceptions. Maybe you're joining a new initiative, but at any rate, if you show up to a project at a really big company, you can kind of look at any place in the code base, and if there's anything that isn't following best practices or sensible defaults, you know, as Martin Fowler calls them, you can point to that and say, "Hey, we're not doing that. We should do that." And everyone's going to be like, "Great. Yeah, we hadn't gotten to that yet, but that's a great thing for you to focus on." Right? I'm trying to be better at a startup. Like, for example,

This is a small one, but I noticed in React, if you're working with GraphQL, I think we're using Apollo as our GraphQL library, but anyway, how it works is, exactly right, you define your GraphQL schema in files within the app, and then you run a CLI command and that generates the TypeScript hooks for those GraphQL files that you created. And those are, you know, it's like,

Nathan Toups (11:50)

Pretty common, yeah.

Carter Morgan (12:10)

File.generated.ts. You're not supposed to commit the generated files. It just kind of clogs up the repo, right? For some reason we are. I don't know why that happened. But I was kind of like, an old me would have been like, oh my gosh, such low hanging fruit. Let's not commit the generated files. Whereas I'm trying to be better at this new company of kind of being like, well, wait a minute. Like, is there a good reason we're committing the generated files? And more importantly, is this the highest priority thing we could be working on right now? Right?

The company's not profitable yet, right? Like, when the company's not profitable, it's probably not a really great idea to focus a lot of your time on making sure that generated files don't get committed.

Nathan Toups (12:48)

So I love this. This is a good conversation. I think this is a good non sequitur before we get into more of the stuff here, because these are the kind of problems that you need to think through. I've been in organizations in which we do commit generated files, and I've been in organizations where we never commit generated files. They both can be OK. It depends. For instance, where it's common to commit generated files is, like, let's say you're using protobuf

Carter Morgan (13:14)

Mm-hmm.

Nathan Toups (13:15)

and you have the protobuf manifests, or the protobuf descriptions, and then I need to generate the Go or Rust or TypeScript or these types of things. I want this because I won't have the type signatures if I don't have them, and it's nice to have that, and they don't change very often. So when the generated stuff does happen, we can also put in our CI/CD

Carter Morgan (13:34)

Hmm.

Nathan Toups (13:40)

that it tries to generate it. And if there's a diff between the generated output and what's committed, you stop the CI process, meaning that's kind of like a user-generated thing. The same with linters and other stuff, right? You can go in and say, hey, wait a second. When I try to idempotently do everything, I'm seeing something different. And so therefore, there's something wrong with this commit.
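[Editor's sketch: the CI check Nathan describes (regenerate, then stop the build if the result differs from what is committed) might look roughly like the following. The `generate` function here is a hypothetical stand-in for a real generator such as a protobuf or GraphQL codegen step, and the schema format is invented for illustration.]

```python
import difflib


def generate(schema: str) -> str:
    """Hypothetical stand-in for a real code generator (protobuf, GraphQL codegen, ...)."""
    names = [line.strip() for line in schema.splitlines() if line.strip()]
    return "".join(f"export type {name} = unknown;\n" for name in names)


def check_generated(schema: str, committed: str) -> list[str]:
    """Regenerate from the schema and diff the result against the committed output.

    An empty list means the committed file is up to date.
    """
    fresh = generate(schema)
    return list(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            fresh.splitlines(keepends=True),
            fromfile="committed",
            tofile="regenerated",
        )
    )


def ci_exit_code(drift: list[str]) -> int:
    """Map the diff to a process exit code: non-zero stops the pipeline."""
    return 1 if drift else 0
```

[In a real pipeline the same idea is often expressed as "run the generator, then fail if the working tree changed"; the sketch only shows the comparison step.]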

Those little optimizations, I actually do think it's worth making space for them. And I think they talk about this in the, what do they call it? There's this process improvement where they say you should carve out time, at regular intervals, to do things that maybe aren't delivering value to the customer directly, but do it indirectly. And even though you may decide not to change the generated files, you might make it so that the README explains why the generated files are in the code and why we may address this at some point in the future, so that the next Carter isn't confused, right? They can still assess, is this a good idea or not? And that's the part where I do think aspects of this book are applicable, in that you can still evaluate: are we maximizing flow?

One of the things I like to do in early-stage startups, and this is kind of my bag of tricks, because again, this is my background: how quickly can I get local dev up and running? And how similar is local dev to what goes through our CI/CD process? I like those to be equivalent, if possible. A very common DevOps thing is to abuse a Makefile as a job runner. Of course, you can use pnpm or something if you've got more native tools, but have something that literally just wraps test, test coverage, linter, Prettier, all the basic things. And I should be able to run all these on my machine. And those should be the exact same commands that my CI/CD pipeline runs. And what gives me equivalence there is that I shouldn't have some special local magical Docker container. I should hopefully be using the exact same container for local dev that I'm also going to be deploying, but maybe with some

Carter Morgan (15:41)

Mm-hmm.

Nathan Toups (15:58)

environment variable changes or something, because I want to reduce surprises, right? The closer I can get my dev environment to what's going to end up being in production, the fewer weird quirks come up, because what you don't want is, at three in the morning, after a big release, you're trying to reason about something that doesn't exist in your local environment, and it gets back to "it worked for me" syndrome. And there's a lot of little things that you can play with, you know, to kind of learn

Carter Morgan (16:22)

Yeah.

Nathan Toups (16:28)

along the way, and a lot of startups haven't figured that process out yet. And so it's a really nice learning experience.
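[Editor's sketch: one way to get the "same commands locally and in CI" property Nathan describes is a single task table that both a developer and the pipeline call. This is a minimal Python version of the Makefile-as-job-runner idea; the task names and the `ruff` and `pytest` commands are placeholder assumptions, not something from the episode.]

```python
import shlex
import subprocess
from typing import Callable, Sequence

# Hypothetical task table: swap in whatever your project actually runs.
TASKS: dict[str, list[str]] = {
    "lint": ["ruff check ."],
    "test": ["pytest -q"],
    "check": ["lint", "test"],  # composite task made of other task names
}


def expand(task: str) -> list[str]:
    """Flatten a task into the ordered list of shell commands it runs."""
    commands: list[str] = []
    for step in TASKS[task]:
        if step in TASKS:
            commands.extend(expand(step))
        else:
            commands.append(step)
    return commands


def run(task: str, execute: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run every command for a task, stopping at the first non-zero exit code."""
    for command in expand(task):
        code = execute(shlex.split(command))
        if code != 0:
            return code
    return 0
```

[A developer and the CI job would both call `run("check")`, so a change to the task table changes both at once. The `execute` parameter exists only so the runner can be exercised without invoking real tools.]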

Carter Morgan (16:36)

Well, let's take a quick break, and then we're going to get into our discussion of the first third of this book.

So we're going to try something a little different for this podcast. Sometimes we go through the chapters and try to give a holistic view of the book. To be honest, with this book, there are a lot of excellent case studies, and the book is worth it for those case studies alone. We've also talked about a lot of these themes before, again, because we covered The Unicorn Project, which Gene Kim also wrote.

So, you know, chapter one is Agile, continuous delivery, and the Three Ways. Then they talk about the different ways: the First Way, the principles of flow; then we've got the principles of feedback and the principles of continual learning. We've covered a lot of this before. And like you said, Nathan, you've got some concerns about maybe the way this book sells itself. So we're going to tackle the podcast from that route, through some of the concerns we have about this book. And with that, we'll be able to talk about all the great stuff this book has, but also, you know, our experiences and things like that. So Nathan, we read the first third of The DevOps Handbook. I mean, what's something, or some concerns, you have about it?

Nathan Toups (17:47)

Yeah, and I'll do some framing here, right? Which is that there's this classic story of ops and dev. And I think they even mention that, for a lot of folks who worked in this world where you were doing rack-and-stack servers and provisioning resources, we basically had these vertical teams, right? You'd have the storage allocation team and the security team. They'd be separate, and you'd get these really crazy processes as your team grew and you had hundreds of engineers, and...

Carter Morgan (17:55)

Mm-hmm.

Nathan Toups (18:17)

It didn't scale, and a new breed of company came out, like Google and Amazon, who flipped these ideas on their head and said, hey, how do we make more autonomous teams who can move faster, and how can we really trust our processes to do these things? And so when they started bringing the development cycles into the ops process, that's the whole point of DevOps. And so of course the thesis of this book is, hey, we've learned a lot of this stuff. We have all this data now. You know, these ideas started coming out in the early 2000s, but they had really crystallized and solidified by 2015. And now they're considered obvious by 2025, right? I think even if you choose not to do some of the things in this book, you're deciding to react against the standard knowledge of DevOps. You're not being like, what? I didn't know that you could do continuous delivery. It's really more like, we chose not to use Kubernetes in this workflow because we're a startup and that's too much overhead, right? And you go, cool.

Well, and that is a hot take that's been coming up more and more. I think if you've ever seen DHH and other folks who do these bootstrapped-type companies that aren't trying to do infinite growth, they don't want to use cloud and experimentation platforms and all these other things. And so I guess my first criticism or critique is this book is incredibly enterprise-centric.

Carter Morgan (19:31)

Hmm.

Yes.

Nathan Toups (19:43)

And so therefore the audience is relatively small, with deep pockets. Obviously enterprises can spend, you know, 100K, 200K, 300K on consultants to come in and help fix processes. But there's not a single case study of: we had this idea with two founders, we needed to hire a team of 15 people, and here's how we used these methodologies to go from founder code into, you know, two two-pizza teams without stepping on each other's toes. I wish that existed in this book, but it doesn't.

Carter Morgan (20:17)

Yeah. The other thing with this being so enterprise-centric is that this is really focused on enterprise leadership, I would say, right? Because when I was selecting a new company, what was really important to me, especially at a startup, was having absolute trust in the senior leadership. You know, again, I worked at an awful little startup a couple of years back, and

Nathan Toups (20:26)

Yeah.

Carter Morgan (20:45)

that was the biggest problem. I was just like, my gosh, the CTO and to some extent the CEO, I just don't trust them. And that was a really hard lesson: you can't manage up. It's a really tough thing to come into work every day and be trying to convince senior leadership, like, this is what we're supposed to do. And so on the one hand, they're talking about all these enterprise businesses, but I think if you read this book, you'd be, like, burdened with knowledge if you were a rank-and-file dev worker. I don't know. I mean, because you've kind of mostly, I guess that is a gap in our collective experience.

Nathan Toups (21:23)

Yeah, and I've been in this situation multiple times. And I will say, one of the other critiques is that, you know, I wrote down "DevOps as a silver bullet." So this idea that if I just come to the table with these DevOps ideas, I will revolutionize stuff. And the thing is, first of all, if you just read this book and then you try to convince senior leadership this is the right way to do it, and you mis-implement it,

Carter Morgan (21:27)

Yeah.

Nathan Toups (21:51)

you're going to give DevOps a bad rap, right? They're gonna go, we tried this, it was really complicated. This feels like Agile 2.0, we already don't wanna do this. And I've seen this in organizations where it's almost great, like it's almost awesome, but it's not quite there. And I think a lot of the frustration that exists right now with these smaller to medium-sized companies that have invested in DevOps is that they've invested in something that itself becomes overly complicated. And then there's a lot of tech debt, and then there's a lot of resentment. And you get back into this old dev-versus-ops situation, but now you have this very expensive DevOps team versus what you probably had before.

Carter Morgan (22:41)

Yeah, I'm just trying to think about that, again, that process of managing up, and I hate it. I hate it so much. And so with this book, if there's a criticism, again, it's that this really seems like it's geared towards decision makers. We've read a lot of books before, like, for example, A Philosophy of Software Design, that are not geared towards decision makers.

That's just a great book for you as an individual engineer to read. You can implement those patterns in your code base, and then you can become a leader in the code base, you know, amongst your team, because management isn't going to be in the weeds of every line of code you write. But this is really advocating for some big changes, right? Again, if you're working at a place with a lot of functional teams, to really implement DevOps, you've got to kind of blow up those functional teams and rebuild them all as feature teams. How are you going to do that as an individual contributor? You just can't. I'd be really curious about the breakdown of our audience between individual contributors and managers. I imagine we're mostly individual contributors, but who knows?

Nathan Toups (24:05)

Right, statistically, most businesses are small businesses. So while this does solve a very interesting problem for long-lived, profitable, large engineering organizations, and I'm not discounting that, we definitely,

Carter Morgan (24:08)

course.

Nathan Toups (24:27)

you know, we definitely have to think about the fact that there's a relatively small audience for this. Yeah, it's engineering leadership and people who've had some false starts, I guess, in the past with how effective they actually are. This gets me to my next concern, which is what I'm going to call survivorship bias. And what I mean by this is that all the case studies are kind of like Phoenix Project and Unicorn Project.

Carter Morgan (24:46)

Hehehe.

Nathan Toups (24:53)

And I get it, but all of the case studies are like, oh, we were having this problem, we applied this DevOps principle, everything was amazing afterwards. Right? I wish there were case studies in which we tried this thing and, you know, it was an epic failure because our organization wasn't set up to support it. Or we tried five things: three of them didn't work for our company, two of them worked great, and at least we still got improvements, but we wasted a bunch of money over six months trying to figure out which ones would stick. That feels a lot more like a case study where I'd be like, yeah, you know, I've struggled with that too. That would be really interesting. I wonder how we could overcome that. But instead it's almost like apocryphal stories about how DevOps always fixes every problem. And I'm like, okay, this seems to undermine credibility, in my opinion.

Carter Morgan (25:40)

Uh-huh.

Well, and I'll give credit where credit's due. A lot of these case studies take place over, like, two or three years. Right. And again, exactly. Like, I've got a great initiative, guys. It's going to take two years to implement. And granted, it wouldn't take a startup as long to implement, because they can move a little quicker. But again, I think it just comes back to that idea that it's really, really enterprise-focused.

Nathan Toups (25:50)

Yeah, right. Which again, startups can't do that. ⁓

Carter Morgan (26:14)

Okay, here's a question I have for you: the Andon cord philosophy. For listeners who are not familiar, it comes from the Toyota Production System, and it's the idea that there's a cord anyone in the factory can pull, and if you pull it, that stops production in the factory. The idea is that the sooner you encounter an issue and stop, the sooner you can fix it, before it gets further down the line; that by pulling it and stopping everything, all attention can be directed to that issue, and so the entire organization learns from the issue; and that the people who have the most knowledge about when to stop the production process are those closest to the process. So that's why anyone has the ability to pull the cord. Two thoughts about this. One, I believe in this principle. I think it's a smart principle.

Nathan Toups (27:02)

Yeah.

Carter Morgan (27:12)

I cannot imagine what it was like for the first guy. In fact, I was just looking this up, and this is ChatGPT knowledge, so if it's wrong, blame the computer. Yeah, exactly. Taiichi Ohno, who's one of the architects of the Toyota Production System, is credited, according to ChatGPT, with coming up with the Andon cord. I cannot even imagine pitching this in, like, the 1950s, right? Like, I've got a great idea. We're going to make a cord, and

Nathan Toups (27:21)

If it's wrong, it still sounds great.

Carter Morgan (27:42)

anyone can pull it, and it stops the entire factory when it gets pulled. Like, that's insane. I mean, I think the logic checks out, but I'm so curious about the genesis of that idea, because that is so counterintuitive. But also, there was a case study in this book where they talk about a team that implemented an Andon cord system where, you know, if you pull the, I think they had a Slackbot set up where they could type in, like, "andon," and basically that stopped what everyone on the team was working on, and everyone would swarm the problem. And again, the reason I was thinking the original idea seems insane is because that idea seems insane to me, right? I'm in the office these days, and so the fun thing about working at a startup is that I was thinking, could we set up a Slackbot that talks to a Raspberry Pi, so that when you

Nathan Toups (28:29)

Yeah.

Carter Morgan (28:38)

trigger it, you know, it's connected to a red flashing light that goes off, and that could be our Andon cord thing. But I'm just trying to imagine, okay, let's stop every engineer from what they're working on. All eight of us gather around and fix my problem. I don't know. Have you ever worked at a place that does something similar to that?
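[Editor's sketch: stripped of the Slackbot and the flashing light, the Andon cord Carter describes is a small piece of shared state: anyone on the team can pull it, and a pull tells everyone else to stop and swarm. The toy model below assumes nothing about Slack or Raspberry Pi APIs; the class name and the team member names are purely illustrative.]

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AndonCord:
    """Toy model of a team-wide stop signal; no Slack or hardware involved."""

    team: set[str]
    pulled_by: Optional[str] = None
    reason: Optional[str] = None
    history: list[tuple[str, str]] = field(default_factory=list)

    @property
    def stopped(self) -> bool:
        return self.pulled_by is not None

    def pull(self, who: str, reason: str) -> list[str]:
        """Record the stop and return everyone who should swarm the problem."""
        if who not in self.team:
            raise ValueError(f"{who} is not on the team")
        if self.pulled_by is None:  # a second pull while stopped changes nothing
            self.pulled_by, self.reason = who, reason
            self.history.append((who, reason))
        return sorted(self.team - {self.pulled_by})

    def release(self) -> None:
        """Resume normal work once the problem is resolved."""
        self.pulled_by = None
        self.reason = None
```

[The `history` list is there because the point of the practice, as described in the episode, is that the whole team learns from each stop; it gives a later review something to work from.]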

Nathan Toups (28:44)

That's hilarious.

Yeah.

So, you know, you like to talk about that one little startup that you were in before. I was on one engineering team that was the highest-functioning engineering team I've ever been on. And I've mentioned it: I was at this fintech company. There were eight of us total, including an engineering manager who was, like, head of engineering, you know, more of an honorary title for a company so small, but he was amazing. It was the most productive group that I've ever been in.

Carter Morgan (28:59)

Yeah.

Nice.

Nathan Toups (29:23)

And we did exactly, actually, what this book describes, as a startup, but with these principles, and they were all from larger companies. I mean, folks had been at, like, Bazaarvoice, which was a big deal in Austin. I was in Austin, Texas at the time. And then at some other tech companies that had done really cool DevOps stuff. All of us were software engineers, except for me. I was the one SRE on the team.

Carter Morgan (29:29)

Yeah.

Nathan Toups (29:45)

But we all shared these responsibilities. We all could stop the business processes if we had to. And we did a few times. The interesting thing, I think, about having something like that is, it's like the mutually assured destruction piece, right? Like, everyone having the atom bomb prevents the atom bomb from going off. The fact that I have the power to push the button makes it very rare that I ever do, because I really respect the rest of my team.

Carter Morgan (30:04)

Ha ha ha ha.

Nathan Toups (30:16)

And if I ever do press the button, it's because it really, really is important. And because I respect my team, I know that I should stop whatever I'm doing. I'm not going to roll my eyes. I'm going to say, hey, you know what? Brian over here pushed the button. We need to go and, you know, sort this out as a mob. And typically, in our case, it might be that we can't trade tomorrow because we found some production issue.

And I think I can count on half of one hand the number of times we ever had to hit a button like that. And we had really good postmortems and we never had that problem again. We never repeated these kinds of things. We were able to divvy up tasks to make sure that our processes got more resilient. It's literally what I would, you know, I'm being a little hard on this book, but I actually do believe in maximizing for flow, limiting the number of open tasks you can have. We were really good, like,

I've gotten really bad about it, but I used to be really good at this company of stop starting, start finishing. This is like a really big topic that comes up in the first part of this book. And this idea is that like, if you start something and you let it fester, it's blocked, people can't do it, and it sits there for months, the odds of you ever finishing it become very low. And it's this huge burden in cost because like you have this half implemented thing, it probably shows up in some board slide deck.

Carter Morgan (31:21)

Mm-hmm.

Nathan Toups (31:40)

the board, maybe somebody from the board comes around and goes, wait a minute, didn't you say you were working on this feature? Did you ever ship it? And you're like, eh, we got busy with this other thing. We never did it. And their whole point is, optimize towards, and this gets back to agile stuff, optimize towards having some finishable product on some regular interval, even if it's not feature complete from where you envision it. And make sure that every time you do this as a team, the thing can run. It can go into production.

Carter Morgan (31:47)

huh.

Nathan Toups (32:09)

maybe you hide it behind a feature flag. But that was the big thing for me is that like, yes, we did have this. It worked on our small team and we would mob. I can remember a couple of times where we had to mob a problem. It was hard to reason about. It kind of touched on maybe two or three teams that were, or two or three services that were hard to think about unless you had the whole group there. And it also helped us learn like, ⁓ yeah.

this is really bad. If you have to reason about service B or service C and you're in service A, we don't have a good contract. And so we were obsessed with what we called our social contracts across services, where you should be able to consume any service without any knowledge of how it worked internally. And you should not hit some red flags, right? Your service should be able to just consume that thing. And you should also know that we're not gonna break it for you in the future. ⁓
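A minimal sketch of what checking such a cross-service contract could look like, written in Python for brevity. The `ACCOUNT_CONTRACT` fields and the `contract_violations` helper are hypothetical and not from the conversation; the idea is only that a consumer can verify the fields it relies on without knowing anything about the provider's internals, and that extra fields from the provider never count as a break.

```python
from typing import Any, Mapping

# Hypothetical contract for an imaginary "accounts" service:
# field name -> type the consumer relies on.
ACCOUNT_CONTRACT: Mapping[str, type] = {
    "id": str,
    "balance_cents": int,
    "currency": str,
}


def contract_violations(
    payload: Mapping[str, Any], contract: Mapping[str, type]
) -> list[str]:
    """Return readable violations; an empty list means the contract holds.

    Fields in the payload that the contract does not mention are ignored,
    so a provider can add fields without breaking existing consumers.
    """
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```

Run in CI against a recorded or staged response from the provider, a check like this would fail the build when a field is removed or changes type.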

This is the only team, regardless of size of institution I've ever been in, that took that blood oath so seriously. You know, I've never been at Google or Meta or something like that. I think maybe those kinds of places are more mature. They probably also have CI processes that would prevent you from doing something like that. But we had a very mature process. We had incredible CI stuff. We were so extreme. And we haven't gotten to this part in the book yet.

But traditionally, a DevOps person would be the person who presses the button to actually do a deploy. I had to build our systems such that any engineer on the team, if it was in the main branch, so we did trunk-based development, any engineer on the team could promote to production unsupervised by me. So everybody had complete autonomy for their services. They could roll out and roll back,

do whatever they needed to. And so we had to build tools in a way that everyone could wrap their head around the deploy process, that they owned it. We could do multiple deploys per day without any interruptions. And that's just a very, it's a different mindset of building things in so that you can empower the engineers so that they can hit the button and say, look, I did this thing, it was supposed to work this way, things broke, right?

Carter Morgan (34:28)

you

Nathan Toups (34:31)

What about you? So would you think about doing this at your new startup? Like what would you imagine if you had given this button to your team?

Carter Morgan (34:43)

That's what I'm trying to figure out, right? What I want to be careful with. It's funny just because like...

As we've been reading all these books, I've been taking kind of, you know, chunks of them, little pieces, and applying them to my daily work. But again, I've worked at established places this whole time. And so there haven't been any really big, drastic improvements that needed to be made. But all of a sudden I'm at this startup and I can point in any direction and say, this is a problem. This could be better. You know, there's room for improvement here. But it's all about doing the right thing.

Right. And again, cargo culting. I don't want to cargo cult these things. And so this is one of those things that just kind of stood out to me as like, oh, wouldn't this be neat? So again, what if I rigged up the Slackbot that, you know, sounded a physical alarm in the office? Wouldn't that be fun and very startup-y? Right. But I just had to ask myself, do you need this? Does the team need this right now? And the thing is, I can't think of why we would

need this. Again, taking kind of the Andon cord philosophy, which is like, you do this because it's good to stop a problem now, right? You know, in the here and now, rather than to let it get further down the line. But I just don't think we're really struggling with that right now. Like, again, there's only eight of us, like,

Nathan Toups (36:17)

Great.

Carter Morgan (36:18)

There's not much of a line to get down. ⁓ We have a good

Nathan Toups (36:19)

I... I...

Carter Morgan (36:22)

culture of individuals helping each other out. I can't see why we need all eight of us to stop, right?

Nathan Toups (36:25)

Yeah.

Yeah, and again, to me, the Andon cord stuff really comes down to like critical things. Let's say the next log4j comes out and you realize that you're in a race against time to get some critical system patched before, you know, customer data gets exposed. That's an Andon cord.

Carter Morgan (36:35)

Yes, yes. Yeah, yeah.

Nathan Toups (36:49)

situation, if you don't have a mature process that's like two lines of YAML and it just patches it and automatically does it, if y'all have to have a little bit of care to make sure that you've really played whack-a-mole all across some infrastructure, maybe folks don't remember this that much, but log4j was this very embarrassing ⁓ exploit that was in the Java runtime and it could do this horrible ⁓

Carter Morgan (36:50)

Yes.

Nathan Toups (37:19)

shell, like that you could do a reverse shell into production machines and pull a bunch of stuff out. And it was like it actually when I was in graduate school, we had a lab ⁓ virtual machine. OK, cool. Yeah. So we had a lab virtual machine that actually had the active exploit in it. And you got to see it and you're like, boy, this was bad. This was really bad. ⁓ And so.

Carter Morgan (37:23)

Yeah, yeah, that was crazy.

Yeah, I had that same lab too.

Nathan Toups (37:46)

Those kinds of things, obviously security incidents, but there's also, like, we had a little one yesterday. We have a brand new greenfield project. I kind of got them set up with something that was really easy to get going for, like, a staging environment, but we don't have alerting and monitoring and a bunch of other stuff. And one of the little services was down. And we realized it was because they'd made some big changes and it wasn't well documented where one of the environment variables got injected into the system. And so it was kind of in this deploy fail loop.

And we, as a group, jumped on together, looked at the logs, scratched our heads. We realized we were missing a /v1 in the API path on some environment variable. But the thing was, that environment variable was not set in version control. It was injected into some secret manager on the deploy platform. And so it wasn't easy to reason about. And so that opened up a couple of tickets. Number one, we're going to have basic up-down alerting. And number two,

Carter Morgan (38:34)

Hmm.

Nathan Toups (38:44)

environment variables that don't have, like, passwords in them really should be in version control, right? Those are two learnings that we had from this new team. Those are very minor changes, but in the future, this will prevent an entire category of "how did this happen" problems. And I think that's part of this continuous learning, and it also gave me an opportunity to show basic DevOps methodology to a software engineer who's super smart, but has not spent as much time thinking about deploy cycles. Because again, I've set them up so that

it's trunk-based development, right? So as soon as they approve a PR, it goes into the main branch, and it automatically updates the staging environment, right? Really cool, because they get instant feedback on any peer-reviewed code, and they get to start thinking about CI/CD. We try to have it as close to production as possible. And we're doing this with a team that's like five engineers, four engineers, something like that, right?
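The missing `/v1` incident described above is the kind of thing a small startup check can catch before a deploy loop does. Here is a minimal sketch in Python, assuming a hypothetical `API_BASE_URL` setting and a hypothetical `config_errors` helper; none of these names come from the project discussed, and the example host is a placeholder.

```python
from urllib.parse import urlparse

# Hypothetical non-secret settings, the kind that could live in version control.
STAGING_CONFIG = {
    "API_BASE_URL": "https://api.staging.example.com/v1",
    "LOG_LEVEL": "info",
}


def config_errors(config: dict[str, str], required_version: str = "v1") -> list[str]:
    """Validate non-secret settings; an empty list means the config looks sane."""
    errors = []
    parsed = urlparse(config.get("API_BASE_URL", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("API_BASE_URL must be an absolute http(s) URL")
    else:
        # The first path segment must be the API version, e.g. "/v1/...".
        first_segment = parsed.path.strip("/").split("/")[0]
        if first_segment != required_version:
            errors.append(f"API_BASE_URL path must start with /{required_version}")
    return errors
```

Calling `config_errors` at service startup, or as a CI step over the committed config file, turns a confusing deploy fail loop into a one-line error message.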

Carter Morgan (39:37)

Yeah. Yeah.

Where, is your product, is it very customer facing?

Nathan Toups (39:44)

So this one is, so it's funny, is I'm working at this blockchain startup. We are always trying to experiment with ways that we can use consumer applications for a blockchain. And so this is actually like a social media experiment, kind of like a Twitter clone, but using some of the blockchain technology. And so this team is building a user, like what they call, and this is gonna sound obnoxious, Web3. Web3 is like the world of blockchain stuff. So this is like a Web2 looking interface.

but is backed by Web3 technologies. So we try to like, you don't need to become a crypto blockchain expert to use this, but we're doing like, you know, all the state of the machine and all this other stuff are on chain. And so, yeah, it's funny. A lot of the weird stuff I'm doing under the hood elsewhere is like peer to peer consensus and you know, all these other weird things that you do with blockchain, but here it's like a Postgres database, a... ⁓

JavaScript, node application, very traditional, like you would feel very comfortable. You could work on this code base and not know a thing about blockchain. And so yeah, this is something I could get up and running for them very quickly. Blockchain stuff, have to be a lot more careful with because a lot of that data is permanent. It's permanently written into the ledger on the chain. so you have to think about things. But in this one, we're consuming data from the chain, throwing it into a schema in a Postgres database, then showing that.

to a basic feed in a social media app. And that's all very familiar things that we've all looked at.

Carter Morgan (41:19)

I'm trying to figure out CI/CD. I've done a lot of work... I guess safety on the front end is what I'm trying to figure out right now, because, one, we don't have a ton of tests at the company, right? And even then, tests are easier on the backend. Not only are unit tests sometimes a little easier on the backend than the front end, but also you can write kind of end-to-end tests where you can assert your sanity a little more. Like, even if you have no unit tests in your backend application, you can write some end-to-end tests that are just like, okay,

Nathan Toups (41:37)

You

Carter Morgan (41:56)

Do the five main APIs work? And so when we deploy, we're just going to validate that those five things work end to end. And as long as those do, odds are the application is pretty good. The front end is trickier. And right now I'm trying to figure out if this is a good pattern or a bad pattern. We're doing, like, bug bashes. So, for example, we had this new feature. They changed the way that people could set

pricing on some of like the comprehensive coaching packages they offer. Right. And so that came out and we decided as a team, like, you know what, let's swarm it. And two days we got it all. I'm going to use quotes here done because we got it all merged into staging. And then we did a bug bash on it. It's like, okay, let's check it out. And we, as a team found 50 bugs on it. Right. Like that doesn't surprise me. Like we, we hustled to get it done, but like, uh, but yeah, like it was two days of kind of pretty frantic work. And so.

Nathan Toups (42:45)

Yeah.

Carter Morgan (42:54)

What does that look like in a CI/CD world, like a DevOps world? Because you can't test that end to end in every single use case and validate, of course, this works. But then if we just deployed it straight to production, we're introducing all these bugs to the user. I mean, I guess feature flagging is maybe the right answer there.
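The "do the five main APIs work?" check Carter describes for the backend can be sketched as a tiny post-deploy smoke test. This is a Python sketch under assumptions: the endpoint paths in `CRITICAL_PATHS` are invented for illustration, and `get_status` is an injected function so the logic can be exercised without a network.

```python
from typing import Callable

# Hypothetical critical endpoints; a real list would come from the team.
CRITICAL_PATHS = ["/health", "/login", "/packages", "/pricing", "/checkout"]


def failing_paths(paths: list[str], get_status: Callable[[str], int]) -> list[str]:
    """Return the paths whose HTTP status is not in the 2xx range."""
    return [path for path in paths if not 200 <= get_status(path) < 300]


def deploy_is_healthy(paths: list[str], get_status: Callable[[str], int]) -> bool:
    """A deploy passes the smoke test only if every critical path answers 2xx."""
    return not failing_paths(paths, get_status)
```

In a pipeline, `get_status` would wrap a real HTTP client pointed at the freshly deployed environment, and a `False` result would block promotion or trigger a rollback.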

Nathan Toups (43:12)

So.

Feature flags, one. There are, and again, you have to kind of go down the rabbit hole, there are automated QA frameworks

that give you that sort of end-to-end testing for UI stuff. ⁓ They typically have, you know, there's like commercial tools, like, I don't know if you've heard of Rainforest QA, they're like an interesting one. There's a few others ⁓ that have basically given like QA automation for UI stuff. But there's also, you know, there's a good number of them that have headless browsers that can do a bunch of rendering and stuff like that. They're a little more cumbersome to set up, but they can be really useful. ⁓

Carter Morgan (43:25)

Mm-hmm.

No.

Yeah, yeah.

Nathan Toups (43:55)

There's two paths, right? If it takes a long time for these tests to run, you can do it from a regression testing standpoint, or you basically say, I'm going to deploy this into this environment, but it can't be promoted to another environment until it's passed all of the heavier testing that takes a longer cycle.

We did this at an airline startup I was in. We had a lot of the same exact issues, but with a much larger team, where they just didn't have the discipline of doing end-to-end testing. Or they would have manual QA. The manual QA team, they were a super smart group, but it was manual. And, you know, they had runbooks for stuff, but they would miss things and, you know, stuff would happen. What's worse is the engineering team would rely on QA. And look, I played this game. If QA finds the problem,

Carter Morgan (44:33)

Yeah, yeah.

Nathan Toups (44:46)

we failed as software engineers, right? You really don't want it to get that far down the pipeline. A QA engineer should really find emergent, surprising behaviors that we hadn't thought about testing yet. ⁓ But yeah, CI pipelines, to me there's two parts. There's like the obvious stuff you do in a PR, right? I do a pull request. There's a bunch of fitness functions that should pass, right? If we're gonna get into software architecture stuff, those obvious ones.

Carter Morgan (44:48)

So yeah, yeah.

Nathan Toups (45:12)

And then, and those are unit tests. Of course, you know, I've noticed this, I don't write a ton of UI code, but to write JavaScript in a way that is unit testable is itself a skill. It's very easy to embed a bunch of logic in ⁓ your view that you have, which is bad. Obviously, if you can put heavy business logic that has to be correct in the JavaScript library and all it does is the pure thing,

Carter Morgan (45:26)

Yeah, yeah.

Mm-hmm.

Nathan Toups (45:41)

pure business processes where I can inject a mock of the database. I can do all these other things. And then I happen to use this as a thing I bring into my view. Now I have this unit testable thing. If I have it in the view, the only way I can test it is in end-to-end testing or integration testing or these other things, which is not impossible. And there are some folks for whom that's the only way they want to test. I find that unfortunate because I'm like, if I can describe

Carter Morgan (45:59)

Hmm.

Nathan Toups (46:11)

creating a user in a simple function that has no opinion about how the UI works, I should rip that out into its own user's library, have a bunch of test coverage, and then have this well-test covered thing that I know how it behaves inside of my view. And then any bugs that come out of that for integration testing, it's like UI bugs, right? It's like, ⁓ we're not displaying the JavaScript that comes out of this properly, or I mean, the JSON that comes out of this properly. ⁓
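A minimal sketch of the pattern Nathan describes, pulling "create a user" out of the view into a function with no opinion about the UI and an injected store. It is written in Python for brevity even though the conversation is about JavaScript; `User`, `UserStore`, `create_user`, and `InMemoryStore` are hypothetical names, and the validation rules are placeholders.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class User:
    email: str
    display_name: str


class UserStore(Protocol):
    """What the business logic needs from storage; a real database or a fake."""

    def exists(self, email: str) -> bool: ...
    def save(self, user: User) -> None: ...


def create_user(store: UserStore, email: str, display_name: str) -> User:
    """Business rules only: no rendering, no HTTP, no knowledge of the view."""
    normalized = email.strip().lower()
    if "@" not in normalized:
        raise ValueError("invalid email")
    if store.exists(normalized):
        raise ValueError("email already registered")
    user = User(email=normalized, display_name=display_name.strip())
    store.save(user)
    return user


class InMemoryStore:
    """Test double standing in for the database."""

    def __init__(self) -> None:
        self.users: dict[str, User] = {}

    def exists(self, email: str) -> bool:
        return email in self.users

    def save(self, user: User) -> None:
        self.users[user.email] = user
```

With the logic isolated like this, the view only calls `create_user` and renders the result, so bugs found later in integration testing are more likely to be display bugs than business-rule bugs.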

Carter Morgan (46:21)

Yeah, yeah.

Nathan Toups (46:41)

It's tricky and UI does make things hard. I will say, for any of us who have done API work, one nice thing about doing API work is we have very mature test frameworks. In the JavaScript world, it's a lot harder. Luckily, there are some really nice React testing frameworks and some other ones. But you have to prioritize it. And I would say what I like to do is what I call a gateway drug approach, which is like,

Basically, I will put a stub in my CI process. I'll literally put a test stage that's a TBD. And so the test will pass because there's nothing in it. Then I will do the most obvious and simple tests that I can do. Like, you know, does the thing compile, or not compile, but, you know, some really basic interface, just to get it so that the code coverage tool then shows up, right? And it shows the sad, what I call the sad path where it's like, it's, you know,

Carter Morgan (47:21)

Yeah, man.

Nathan Toups (47:42)

you know, it's only got 20 % code coverage or 5 % code coverage. Now I have a target. Now I can say, hey, we want to increase code coverage on these hot code paths over this period of time. And it reduces. So one of things that this book brings up really well is reducing cognitive load, right? And the idea of reducing cognitive load is that if my software engineer wants to solve a problem, but it requires them to understand five other business processes or five other abstract ideas, they won't do it.

And so they'll go, even getting testing set up is horrible. It's going to slow me down. I'm not going to be able to ship this feature that our CEO cares about. So I'm just not going to do it. Or we'll do it in QA. That's just the conclusion you'll reach. But if you come in and you say, you know what? Our test coverage is going to be awful. But at least I will have it so that you get feedback. And then do a quick little internal demo and say, hey, we have this test coverage tool. Here's how easy it is to write a test.

I think we should start actually doing this so that we can add more coverage over time. And that maybe we can make a new rule that says if you write a new function or new class or whatever, it should at least have a test to cover it so that we can extend that over time, right? That gives me like a very like approachable way. And it also gives me a conversation point in my PR to say, hey, as a team, we agreed that we're gonna start writing tests for new code. I don't see a test here. So like go ahead and add a test and then I'd be happy to.

Carter Morgan (48:52)

Yeah, yeah.

Nathan Toups (49:08)

The code looks good, I would like to see a test to cover it. It looks good to me, a conversation point at that point.
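One way to turn "start at 5% and increase it over time" into something a CI stage can enforce is a coverage ratchet: the build fails only if coverage drops below the best value seen so far. This is a sketch of that idea in Python, not something described in the episode; `coverage_gate`, its `tolerance` parameter, and the percentages are all assumptions.

```python
def coverage_gate(
    current: float, baseline: float, tolerance: float = 0.1
) -> tuple[bool, float]:
    """Compare current coverage (percent) with the stored baseline.

    Returns (passed, new_baseline). The baseline only ever moves up, so a
    team starting from almost nothing is never blocked for being low, only
    for going backwards by more than the tolerance.
    """
    passed = current + tolerance >= baseline
    new_baseline = max(baseline, current) if passed else baseline
    return passed, new_baseline
```

The starting baseline can be zero, which matches the stub stage described above: the first run always passes, and every later run has a concrete target.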

Carter Morgan (49:15)

Yeah, I did something similar to that where like we didn't even have any testing framework set up at all. And so I was like, you know, I'm just going to get this set up right. And I'm to write kind of like some trivial tests for this feature that doesn't really need them just so that the next time an engineer says, or if I tell an engineer, Hey, you should write some tests or if they feel like they should write some tests, like it's there and we can at least do it. I like the idea of. We need to introduce into the CICD process a stub for the testing section just to.

And it cannot even run the test at this point, but we will have it there. ⁓

Nathan Toups (49:48)

To me, is like, and this got back to that great engineering team, ⁓ make your bed, clean your room, right? Like if you, or if we put it into, think actually what some, an industry that actually has a lot to do with us, even though we're not overlapping, is if you're a chef and you're on a team making food at let's say an excellent restaurant.

Carter Morgan (49:55)

Yeah. Yeah.

Nathan Toups (50:12)

If you ever went back to the back of that restaurant, I can tell you it's impeccably clean. A good cooking staff is not only really good at food prep, they're not only really good at presentation and cooking the food and cleaning, they do all of these things and they're expected to play their part. How the food's stored in the refrigerator, all the process. If you have a busy kitchen at a high functioning restaurant, the team approaches

Carter Morgan (50:17)

yeah.

Nathan Toups (50:41)

the process of delivering the food to the customer, right? Which is all the customer cares about. The customer wants an amazing dining experience, especially if you go into like a Michelin rated restaurant, Amazing dining experience, maybe surprise and delight them, give them a quality of food and a craftsmanship of creating that food. But if you actually look at the high functioning team that deals with the flow of all the orders that are coming in every night.

They have all these processes in place. Everything from how they receive the food and approve whether the fish that they're going to pick for that day is meeting their standards to how they pre-process it and store it.

engineering teams have to do the same thing. You don't get to just be a star chef. You also have to be like, you know what, I need to learn on my culinary skills of how to use a knife. And how do I store these things? And how do I decide this food doesn't meet our standards and we should get rid of it, right? ⁓ And how do we introduce this to our engineering teams is this is DevOps.

Right? Caring about the process, this gets us back to the book. Caring about the process matters. Yes, engineers have to be responsible for some aspects of how you ship code, thinking about the stuff, but we don't want to overload them with the cognitive burden. And I think that's really the kind of thesis of these first two parts of the book.

Carter Morgan (51:56)

Absolutely, and that takes us to a good wrapping up point. So we the first two parts, we're gonna read the rest of the book over the next couple weeks. But Nathan, what are you gonna do differently in your career as a result of having read the first two parts?

Nathan Toups (52:08)

I'm so guilty of this, but I know that I should do it, and I need to put conscious effort into it. I'm going to institute "stop starting, start finishing." My team has this right now. Actually, it inspired me this week. I actually wrote a guideline of how we're going to finish three projects that we have open. And we're not taking on any new work until we get these to the finish line. And I'm pretty excited about it.

Carter Morgan (52:18)

Yeah, mine is similar. We, as a team, are actually really good about that. We say, keep WIP low, with WIP being work in progress. So I want to, you know, I have a project right now that I started and that got put on the back burner for these new kind of package pricing updates, but I just, I want to stay good about that. And we're spinning up kind of a new, we're not calling them teams, we're calling them squads, we're just dividing the team kind of in half. And that's

Nathan Toups (53:02)

Nice.

Carter Morgan (53:05)

That's my squad. have the chance to kind of. ⁓

set some principles for how we operate, and that's one I want to keep consistent. And then who would you recommend this book to?

Nathan Toups (53:19)

Yeah, so if you're in a big company and you're struggling, like if you're the same kind of person that would be reading refactoring legacy code or you're navigating a larger team that you know has this initiative and you've heard that there's a DevOps transformation about to go on, this is a must read, absolutely. ⁓ If you're in... ⁓

if you happen to be in the DevOps world and you feel like your organization maybe doesn't define things properly or there's too much expectations or you haven't hit the outcomes that you want, it's a great book. I would skip it if you're a startup. I think there's other books you could start with.

Carter Morgan (53:59)

Yeah, I would say if you are a decision maker in an enterprise world and you are finding that your flow is awful, that it's taking a long time to go from feature conception to feature delivery, then yeah, absolutely, check out this book. This is a huge help to you. I agree, for startups it's maybe not targeted at you. But yeah, we'll find out as we continue reading this book what is applicable and what's not applicable, but

Yeah, as always find us on Twitter at BookoverflowPod, me on Twitter at Carter Morgan. Our email is contact at Bookoverflow.io and Nathan's newsletter is functionallyimperative.com. Thanks a lot, everyone. We will see you next week.

Nathan Toups (54:40)

See you.

Episodes in This Series

Ep. 69: Is DevOps a Silver Bullet? - The DevOps Handbook (This episode)

Jul 7, 2025

Ep. 71: Deployment Strategies for Success - The DevOps Handbook

Jul 14, 2025

Ep. 73: Shifting Left on Security - The DevOps Handbook

Jul 21, 2025

Ep. 87: Patrick Debois Reflects on The DevOps Handbook

Oct 30, 2025