Ep. 73, Monday, July 21, 2025

Shifting Left on Security - The DevOps Handbook


Book Covered

The DevOps Handbook, 2nd Edition: How to Create World-Class Agility, Reliability, & Security in Technology Organizations

by Gene Kim, Jez Humble, Patrick Debois, John Willis, Nicole Forsgren


Book links are affiliate links. We earn from qualifying purchases.

Authors

Gene Kim

Jez Humble

Patrick Debois

John Willis

Nicole Forsgren

Hosts

Carter Morgan, Host

Nathan Toups, Host

Transcript

This transcript was auto-generated by our recording software and may contain errors.

Nathan Toups (00:00)

The big idea in the whole security posture piece is that we should always make time to think about security, and they call it shifting left on security. It used to be that you'd write a whole bunch of code, and at the very end your security team would be like, yes, I'm gonna give it my blessing and you can go out. But that's way too late. The idea is that you should actually have

Carter Morgan (00:19)

Mm-hmm.

Nathan Toups (00:21)

security best practices in from day one.

Carter Morgan (00:31)

Hey there, welcome to Book Overflow, the podcast for software engineers, by software engineers, where every week we read one of the best technical books in the world in an effort to improve our craft. I am Carter Morgan, and I'm joined here as always by my co-host, Nathan Toups. How are you doing, Nathan?

Nathan Toups (00:42)

Doing great.

Carter Morgan (00:44)

Well, thanks for listening, everyone. Like, comment, and subscribe wherever you're at. Make sure to share the podcast with your friends and share it on LinkedIn. You can tag us on LinkedIn if you want. We're excited to have you here, and while you're here, make sure to check out our interview with Kirill Bobrov, author of Grokking Concurrency. That was a fun one. I believe we were his first podcast interview ever, which is unusual for our authors, but pretty neat, huh?

Nathan Toups (01:10)

Yeah.

So cool. And yeah, he's a great all-around person and had some cool things to say about his experience on LinkedIn and stuff too. So that was cool. I love engaging with the authors.

Carter Morgan (01:24)

Yeah. Yeah. This podcast started out, as we like to joke, as a book club masquerading as a podcast, but it's quickly turned into a vehicle for Nathan and me to meet cool people. So we're very happy about that as well. And yeah, we're excited about the things we've got working through the pipeline. We always try to interview as many authors as we can, and when they happen, they happen. But you're always entitled, I guess I would say, to our weekly episode discussion.

And we've got that this week: finishing up The DevOps Handbook. This has been quite an endeavor, huh, Nathan?

Nathan Toups (02:03)

Yeah, it's funny too. I think this might be the first book where the further I get into it, the more I'm like, yeah, this is a great book. If there was a version three, I wish they would maybe make it more periodic: give me some examples, get into the nitty-gritty, do some philosophy on it, and then...

Carter Morgan (02:13)

Okay

Nathan Toups (02:30)

rinse and repeat, because I feel like what we saw was a bunch of abstract framing, then some ideas on how you implement, and then really getting into the nitty-gritty at the end here, which of course I loved. Actually, I think there's a type of reader, and I'll talk about this at the end of the episode, that could really just read maybe parts three, four, five, and six, and that's all they need out of the whole book, right?

Carter Morgan (02:59)


Yeah, well, you know, I guess we can get right into our general thoughts. This is week three of covering the book, so, I mean, it's The DevOps Handbook. We won't introduce each author again. We've got Gene Kim, Jez Humble, Patrick Debois, and John Willis. And yeah, it's just the definitive guide for applying DevOps principles across entire organizations. I think it's interesting you say you liked it more as it went on. I had kind of, not the opposite impression, but we talked in the first episode about parts one and two being like

trying to really sell you on the principle of DevOps. I think if anything, the DevOps movement has been a victim of its own success, in that there's not a whole lot of selling needed at this point. And so, for people like us who are very familiar with it, parts one and two are not entirely necessary. I loved parts three and four. Those really got into the nitty-gritty of, okay, well, what does it actually look like to implement DevOps

in an organization? Parts five and six, and this is just a me thing, but anytime we start talking about security, my eyes glaze over.

Nathan Toups (04:09)

Wow, okay. This will be a good episode then because

anytime we talk about security, well, that was literally every elective I took at Georgia Tech, right? So I'm security obsessed, and this whole conversation of shifting left on security... This is a good experience for me, because I've seen engineers' eyes glaze over when I get excited about something at work. So Carter and I will have a good back-and-forth on this. Great.

Carter Morgan (04:18)

Okay. Okay.

Well,

I need to be sold. I am convinced of the importance of security, but it's not something I naturally gravitate towards. I like building things, and anytime security comes along, it's like, you have to slow down, and I'm like, grr. So yeah. But we'll get into all of it. Part six is really about security. Part five is about continual learning and experimentation.

I guess we can just get into it right now. Let's start with part five, continual learning and experimentation. I mean, does anything really stand out to you in this part, Nathan? I did like a lot of the case studies here.

Nathan Toups (05:19)

Yeah, you know, I really hate this sort of fuzzy stuff where people are like, oh yeah, we have to create a safe space, and it gets too touchy-feely. Except I do believe that you can't have good experimentation, and you can't have teams taking risks, unless you make a safe space. And by safety, I don't mean that you'll never hurt someone's feelings. I think you need to have radical candor and things like that, which we've talked about.

Carter Morgan (05:26)

Sure.

Nathan Toups (05:48)

To me, a safe space is one where, if something blows up or an idea turns out to be a failure, we as a group can identify what the failure was. And maybe we can figure out a way to take a risk with a smaller experimentation window. Or maybe this team and that team had experiments, but our information is so siloed that I don't know that another team already tried three of the five things that I tried.

And I think this section was really good as far as like reminding us that it's our duty to share our findings, whether they're good or bad, right?

Carter Morgan (06:26)

Yeah, I think...

Like you said, when we talk about creating a safe space and a safe culture, that's not the idea that no one will ever misspeak, or no one will ever hurt your feelings, or no one will ever accidentally offend someone. It's the idea of being able to pursue risks, both big and small. I've seen mistakes on that front in a macro sense and a micro sense. The micro sense is something I think a lot of us are familiar with, the idea that

you brought down prod, and that's a big deal, and you're punished.

Both my examples come from Amazon, one in the macro sense and one in the micro sense. The micro one was funny. Amazon was trying to do the right thing, but it just cracked me up. If there was an incident, generally you were responsible for writing the COE, the correction of errors. And they insisted so much

that writing a COE was not a punishment. For what it's worth, I believed them the first time. But they insisted so much that it wasn't a punishment that I started to think it might be a punishment. Right? And I don't know, maybe that was just a poor culture fit thing. I tend to take people at their word, and I know some engineers can be skittish people who need lots of reassurance, but I just started to get suspicious.

Nathan Toups (07:59)

You know, it's funny, I think it's kind of like the tactics my parents used to make me eat my vegetables, right? They'd say, this isn't a punishment, right? But also, you're not finished with dinner unless you eat your vegetables. We use a different tactic with our daughter. First of all, we try to make vegetables absolutely delicious, you know, lots of good oils and salt and heat. But the other one is that we teach her to listen to her body and say, hey, look, if you eat all the sugary sweet stuff first,

Carter Morgan (08:06)

Yeah.

Yeah, yeah.

Nathan Toups (08:29)

and then you try to eat the healthy stuff second, your tummy could hurt, you're actually not gonna get good nutrition, and it could affect your sleep. And we get her to be more mindful about why we eat vegetables, right? That they're nutrient dense and actually help with digestion. As she's gotten older, I think this has been really helpful. She has a very healthy relationship with food and with thinking about these things. I think the same is true with blameless postmortems,

Carter Morgan (08:37)

Yeah, yeah.

Nathan Toups (08:55)

and other sorts of corrective measures. If you're showing someone why eating your vegetables is good, that it's just part of a balanced diet, then the organization needs it, it's part of healthy operation. But if you frame it as, hey, by the way, you must eat your vegetables before you can do more fun stuff... Yeah, I think there's a dysfunctional way to enforce a healthy behavior that makes it feel like a punishment. And yeah.

One of the things I loved about this, and I think I've only been in one organization that was really good about it, was in chapter 19, where they talk about learning from near misses. A lot of times, when you're responding to an incident, it's like, well, my boss is going to look bad, and his or her boss is going to look bad, if we don't write a good postmortem and do these other things. But they actually said, hey, let's say we caught something before it became a problem.

Carter Morgan (09:35)

Okay, yeah.

Nathan Toups (09:51)

Something showed up in the logs, or somebody was randomly doing something and realized there was a fundamental error that wasn't user facing yet. Treat that as if it were just as serious an incident, a sev1 or sev2 or something. And by sev, I mean severity rating: sev0 means the whole website went down, sev1 is a major outage, and then there are these other levels. But a pretty serious severity score. If you treat it that way internally, it gives you...

Carter Morgan (10:09)

Yeah, yeah.

Nathan Toups (10:18)

something else they bring up later in the book: what they call game days. And it turns out the same organization that was really good about near misses was also really good at game days. Again, for the uninitiated, and as the book brings up, a game day is kind of like a scrimmage a team would do before the real game, right? In a scrimmage, they try to make the event, the outage, as real as possible.

Carter Morgan (10:23)

Yeah, I am.

Nathan Toups (10:46)

So maybe you do something to the staging server, and we all treat it like it's a real incident. Or I've even seen organizations that actually cause minor outages in production, kind of like chaos engineering, to see what corrective measures come into place. And I think it's a huge deal because it gives you buy-in, right?

Carter Morgan (10:57)

Yeah, I am.

I've done game days before, and I want to talk about that. But first, about risk-taking: I've seen it in a macro sense, too. Amazon previously had a great culture where, if you signed up for a team that was taking a big risk, like maybe trying to build the next generation of Alexa or, you know, Amazon Auto or whatever, I don't know, and it failed, that was fine. Amazon would just shuffle you to another team. No harm, no foul for taking a big swing.

But in one of their more recent rounds of layoffs, they laid off a lot of those big-swing teams. And so it really chilled the culture: well, I'm just going to sign up for the projects that are for sure making a lot of money. It's a tough thing not to screw up. As far as game days go, I mean, game days are hugely valuable, but they're also tough to simulate. At my last job, we were having that problem where,

Nathan Toups (11:44)

Yeah.

Carter Morgan (12:06)

Theoretically, we were supposed to have a game day, but we didn't have a staging environment. So what are you supposed to do, aside from actually breaking production? I've seen breaking production work if your production is huge. For example, at a big company like Meta, as we talked about in the last episode, they'll roll changes out to just 1% of users or something like that. Or I've seen it like,

Nathan Toups (12:34)

Mm-hmm.

Carter Morgan (12:36)

We're just gonna break Zimbabwe, right? Because we don't have a ton of traffic in Zimbabwe. It's harder if you're only operating out of one region. How have you seen people break production in the past?

Nathan Toups (12:50)

So it's interesting: I have not been in an organization mature enough to do the production stuff. I've read about it from the sidelines, and I have a friend who worked at a place that made HR software, one of these rapid-scale companies. They did full end-to-end game days. The book actually talks about this, and it was cool because I'd heard his personal experience with it, where communication tools turned out to be a critical thing to test.

Carter Morgan (12:54)

Interesting.

Mm-hmm.

Nathan Toups (13:18)

Right. We don't think about it, but what if the outage means you can no longer use Slack, or it takes your email down? A lot of times you talk about all this HA and load balancing and other cool stuff you have in your infrastructure. But if you go one layer of the onion further out, all of your assumptions about how you're going to coordinate or communicate just completely fall apart. And I thought it was cool: Google, they talked about at one point, actually,

Carter Morgan (13:23)

Yeah, yeah.

Nathan Toups (13:48)

they would do wild game day scenarios like an alien invasion, or an earthquake that takes out the entire Bay Area so that it's cut off from the rest of Google's infrastructure. Obviously, you have to be a large organization to pull those kinds of things off. But to a lesser extent, I have seen it. Most of the smaller organizations I've been in have a staging environment. We don't do the

Carter Morgan (13:58)

Yeah, yeah.

Nathan Toups (14:16)

version of trunk-based development where you have all these canary deploys in a single deployment environment, because at Google scale, you can't make a server-for-server replica of what production looks like. Your production is larger than the entire compute power of parts of Europe and Asia combined, and things like that.
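Carter's mention of Meta rolling changes out to 1% of users, and the canary deploys Nathan describes, usually depend on assigning each user to a stable bucket so the same person keeps seeing the same version. Below is a minimal Python sketch of one common way to do that, assuming hash-based bucketing; the function names and the `new-checkout` feature name are hypothetical and are not taken from the book or from any specific company's tooling.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range [0, 100) for one feature."""
    # Hashing the feature name together with the user ID gives each feature
    # its own independent slice of users.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """Return True if the user falls inside the first `percent` buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(user_id, feature) < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    exposed = sum(is_enabled(u, "new-checkout", 1) for u in users)
    # Roughly 1% of users; the exact count depends on the hash values.
    print(f"{exposed} of {len(users)} users see the change")
```

Because the bucket depends only on the feature name and user ID, raising `percent` from 1 to 5 only adds users: everyone already in the rollout stays in it, which is what makes a gradual canary widen smoothly.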

Carter Morgan (14:21)

Yeah, yeah.

Yeah.

Yeah. I mean, as far as game days go, I think it's funny that you talk about communication during game days. The reason I ultimately left Amazon is that they had hired me remote, and then they wanted me to move to Seattle, and that was not something we really wanted to do. They were really selling all the benefits of in-office work, but in a weird, vague way, right? Like, they...

Nathan Toups (15:02)

Hmm.

Carter Morgan (15:12)

Amazon really prided themselves on being data driven. So when the decision was made, a lot of employees were asking, can you show us the data on how we've improved? And they never would. Management would say, we don't have the data, but we know that it's working. That was another thing that really broke a lot of trust with employees. Yeah, exactly. Right. At my current place, at least, we're in office, and no one is trying to say it's justified by data. They're just saying, I'm the CEO and I like this. And I'm like, you know what?

Nathan Toups (15:25)

Trust me, bro. Yeah.

Carter Morgan (15:42)

I respect it. But I thought that was funny, because I think that's actually a huge advantage remote teams have over in-person teams when incidents happen. A lot of times an incident is going to happen at 9 or 10 PM, and for a remote team, the default mode of operation is already working physically alone, debugging things, and then getting on Slack or Zoom or what have you and solving it.

I also found that interesting at Amazon. I think by far the moments I'm proudest of from my time at Amazon are big incidents, when we would all have to get on a call and work hard to resolve something. And even after they'd moved back to in office, a lot of those incidents happened after hours. And so I thought,

to me, that was one of the best examples of a team functioning really well, and that was a completely distributed team resolving some pretty big incidents. So I don't know yet how things will go if we have a big incident at my current place and we have to resolve it in person. As far as staging environments go, that's something I've been trying to work on now that I'm at a much smaller company. Anywhere you look at a small company,

if you're familiar with good development practices, you can point to something they're doing and say, well, that could be better, or this could be better, or why aren't we doing it this way? And I'm trying to be careful not to just say we should do X because X is best practice. I want to make sure there's a solid business case for each practice we recommend. And I actually had a cool opportunity recently to bring in a best practice that solved a business use case, which was infrastructure as code: Terraform.

We haven't been doing any of that. It's all just ClickOps right now, because, you know... I know. Right. Right.

Nathan Toups (17:37)

Oh, yeah. Woo.

Oh boy. Well,

that's what I would call low-hanging fruit, once you've seen the light, right? Once you realize this is what chaos feels like and you see it in the room, you're like, there's a better way, and you can show this absolutely provably correct way of doing things. It's beautiful. Beautiful.

Carter Morgan (17:50)

Yes.

Yes, absolutely.

I'm working with Cloudflare, and their Terraform support is not as great as I'd like. There are legitimate bugs in their Terraform where it creates the resources as it should, but then it doesn't forward along some random variable. I don't know, it's weird. So we're getting errors even though it's working correctly. So, Terraform fun.

I think I mentioned this on the podcast, but I'm convinced you can't really join a company to change it unless you've specifically been brought in to change it. And even then, you're not really changing it. You can't change the values a company's leadership has. That's what I like about this place I've joined: I talk to senior leadership and say, Terraform, what are our thoughts? And they say, we want it. We haven't had time to do it, but we want it. That gives you the license to really pursue it. And so I...

Nathan Toups (18:50)

Right.

Carter Morgan (18:53)

We'd been using Cloudflare for DNS management, and now we've just started using it for our CDN. With that, we decided to store a lot of the assets for one part of the company in R2, Cloudflare's version of S3, because R2 doesn't charge egress fees, so there are no egress costs going from R2 to the CDN. And then, for streaming video, Cloudflare has a cool product called Cloudflare Stream,

Nathan Toups (19:13)

Right.

Carter Morgan (19:21)

where, if you upload a video to it, they just handle everything streaming-wise for you. You don't have to do the encoding. You don't have to convert it to the HLS format. It's all there. Really, really neat product. So.

Nathan Toups (19:36)

I'm consistently impressed with Cloudflare. They freak me out a little bit, because they're kind of seeping into everything, and they do things like SSL termination and some other stuff that someone who's into crypto-sovereign stuff might have some issues with. But I will say their engineering blog is one of the best around. The way they problem-solve is really interesting, and they really understand their customers.

Carter Morgan (19:45)

Yeah,

Hehehe.

Yes.

Nathan Toups (20:03)

Especially as you start growing with Cloudflare: not only do they have good egress pricing with their own services, but if you're also on a cloud provider that works well with Cloudflare, like Google Cloud or Azure, they've actually negotiated discounts on egress. Unlike AWS, which doesn't care because it's the big dog in town. So at my last startup,

Carter Morgan (20:23)

Yeah.

Nathan Toups (20:31)

we used Cloudflare plus Google Cloud for a lot of stuff. Cloudflare was basically our entire DNS infrastructure for everything we did. They also have some really cool edge security stuff. Maybe this will be a segue into stuff later, but they do a lot of good shift-left-on-security things. And they've been doing really interesting stuff with AI models for detection. They just recently announced, right, that they have really advanced

Carter Morgan (20:39)

Yeah, yeah, I remember being impressed with their DNS stuff.

Yeah.

Nathan Toups (21:00)

large language model crawler bot detection. Yeah, they have AI Labyrinth, and now they have an option where you can offer to give large language models access to your data if they pay. So it's like a paywall for large language models, and we'll see if that's actually successful or not. But it's interesting that they're trying to figure out whether there's a way to have an arrangement, because, you know, these large language models are incredibly

Carter Morgan (21:04)

I saw that. Have you seen their AI Labyrinth?

Interesting. Yeah.

Nathan Toups (21:30)

hungry. But that's awesome. I think this actually gets into Chapter 21, which talks about reserving time to create organizational learning. They talk a lot about things like 20% time, hack days, internal conferences, and improvement blitzes. And I feel like this Terraform thing is kind of an improvement blitz, right? You're carving out some time to do this.

Carter Morgan (21:53)

Yeah, a little bit, right? I think as I've gotten more senior in my career, you can't just say, okay, tell me what the ticket is and I'll get it done. At a certain point, you have to start saying, I'm going to start working on this, I'm going to do this. So this Cloudflare stuff is a great opportunity, because I was just kind of clicking around to get things set up, and someone asked, what about staging? And I was like, well, we don't have staging.

We're all just kind of writing to one bucket. And I thought, that's not great. We should have a staging environment, right? And then I thought, Terraform. Cloudflare is a completely new provider for us as far as object storage goes, so we have a fresh start here. I'll just get this all in Terraform. And yeah, it was a little more work up front, but then it was awesome. I used it to create the staging resources, and I built my Terraform in such a way that you just pass a top-level environment variable, you know, production or staging.

Then I ran it again, and it created every single resource, but instead of appending "staging" to all of them, it appended "prod." That was really cool because not only was it a best practice, there was a real business case for it: now we have identical, so to speak, staging and production environments, and we can point to exactly how they're supposed to look, because we have it all implemented in a declarative way.

I mean, yeah, improvement blitzes. It's interesting, because that wasn't a mandated thing, like, okay, as leadership, we're saying that now is the time for an improvement blitz. That was just something I, as an employee, took ownership of. I'm trying to think. I don't know if I've worked at places that have done a ton of improvement blitzes. Is that something you've seen?
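Carter's setup, where one top-level environment value decides whether every resource is suffixed with staging or prod, is typically expressed in Terraform as an input variable interpolated into each resource name. The actual Terraform isn't shown in the episode, so here is a language-neutral sketch of the same idea in Python, assuming hypothetical resource base names: every name is derived from a single `env` value, which is what keeps the two environments structurally identical.

```python
# Hypothetical base names; the episode doesn't list the actual resources.
BASE_RESOURCES = ("assets-bucket", "video-bucket", "cdn-config")
ALLOWED_ENVS = ("staging", "prod")


def resource_names(env: str) -> list[str]:
    """Derive every resource name from one top-level environment value."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return [f"{base}-{env}" for base in BASE_RESOURCES]


def base_names(env: str) -> list[str]:
    """Strip the environment suffix to recover the underlying resource set."""
    suffix = f"-{env}"
    return [name.removesuffix(suffix) for name in resource_names(env)]


if __name__ == "__main__":
    print(resource_names("staging"))
    print(resource_names("prod"))
    # Same resources in both environments; only the suffix differs.
    print(base_names("staging") == base_names("prod"))  # True
```

Rejecting unknown environment values up front is a deliberate choice in this sketch: a typo like `stagign` fails loudly instead of quietly creating a third, half-configured environment.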

Nathan Toups (23:44)

I've definitely seen this with companies that had legacy code and needed a cadence for doing this. You know, again, this book is very enterprise-focused, and it sounds like they actually shifted on this. After reading the afterword at the end of the book, it seemed like they were talking about the expansion of DevOps into the enterprise. I'd actually be really curious to see how enterprise-focused the first version of the book was versus the second. Though I do think

Carter Morgan (23:48)

Yeah.

Yeah, yeah.

Nathan Toups (24:14)

it's really interesting, because it still gives us this idea that we should make time, right? Even if we're talking about 20% time, even if it's informal, I think it's worth figuring out a consistent way of asking, how do I improve this process? Especially when you're at an early-stage startup, where you have a disproportionate amount of weight in shaping the future of good processes.

Carter Morgan (24:41)

yeah, yeah.

Nathan Toups (24:42)

You know, you have a lot of autonomy and there's a lot of trust, right? If you deliver this Terraform project, and it makes engineers' lives easier and makes spinning up an environment simpler, then the next time you come up with an idea that seems a little outlandish, there's going to be more credibility behind it. They'll go, you know what, Carter comes to us with some left-field ideas sometimes, but he's got this great batting average, right? I think that's the part here where

the safety part comes in, which is: hey, maybe not every idea is going to work out. And again, I talked about this in the other episodes: this book disproportionately talks about amazing success stories. But I do think it needs to be safe enough that you might come in and go, you know what, we did all this Terraform stuff, but there are a couple of parts of our process that are not compatible with infrastructure as code, and it's actually slowing things down. And we need to have a deeper conversation on why...

Carter Morgan (25:23)

Yeah, yeah.

Nathan Toups (25:41)

are we doing it this way? If we can invert our process a little bit to be more Terraform friendly, then we get all the cool stuff Terraform gets us, right? And sometimes there are these social contract negotiations that have to come out of it, like, hey, I realized that, you know, Jill does this one ClickOps step that I literally can't model inside of Terraform. And I think maybe we don't do that anymore, and we just have a container with a fixed

hash at the end of its name, which is the commit it's coming from, and that pushes through the pipeline. And you might have to have these weird conversations where, again, their eyes might glaze over. The CEO is like, why would I care about hashes at the end of a tag or something? And you're like, no, you don't understand. If we have a hash at the end of the tag, I can write a program that will always detect the right place to take this asset and do these magical things, right? And maybe you don't even get into the weeds that much. You just go, hey,

this is a way to ship artifacts, and we want to care about artifacts because the security team wants it. We'll need it for SOC 2 Type 2 when we eventually go down that road. You can kind of make these business cases. A good example: eventually, and I don't know if you're on this path yet, have you all started talking about SOC 2 Type 2 or no? Yeah. So at some point you're going to try to partner with some other B2B business. It's going to happen, and they're SOC 2 Type 2 compliant,

Carter Morgan (26:54)

No, my brother's in security and I hear about SOC a lot.

Nathan Toups (27:06)

and there's this entire web of trust, and you're pressured to have every vendor in your chain also be SOC 2 Type 2. You actually have to write a report on why you're allowing a vendor that's not SOC 2 Type 2 certified into your web of trust. No one likes to do it. Once you've gotten into that little walled garden, you don't want to go down that path, because it's just not a fun thing.

And so you get these social pressures. It used to be really hard, super expensive, and take a long time. There are now organizations like Vanta and a lot of other cool players that make this a lot more streamlined. But this is another one of those things where most customers won't care that you're SOC 2 Type 2. Nobody's going to go, you know what, I would have used this mentorship platform, but they're not SOC 2 Type 2, so I'm not doing it.

Carter Morgan (27:57)

Sure,

Nathan Toups (27:59)

But

you will miss certain business opportunities if you don't. So there's this weighing of value: partnerships are now a new type of customer, and you have to think about balancing feature development with security and compliance. It gets a little more complicated. But if you have these great CI/CD practices, and you've started doing good hygiene like being able to blow things away and spin them back up, and really understanding who owns what bucket, and all these other things,

when you start going through the SOC 2 Type 2 checklist, a lot of that stuff's in there. The expectations are already there. I was able to get a company through SOC 2 Type 2 because we wanted to work with a bank that required it. Absolutely: if you were not SOC 2 Type 2 certified, you could not work with them. And we did the whole thing from zero, what I call zero to hero, right? Zero to passing SOC 2 Type 2 within seven months. That was nuts. We went super crazy fast.

I would say that's pretty close to as fast as you can do it, because auditors need at least a three-month observation window, where they just watch that you're actually following the processes and procedures. But yeah, it was interesting. I think, again, this is where these kinds of hack days or improvement blitzes really pay off long term. Don't do it just because

you look at the DevOps checklist and you're like, I've got to do these things and win all the awards. You do it because ⁓ when the system gets more complex, you double or triple your head count. ⁓ The default thing is the correct thing, right? That's like the big thing for startups.

Carter Morgan (29:37)

Well,

I mean, then talk about it, sell us on security, Nathan. I think we all get why we should care about security, right? But what in The DevOps Handbook makes security easier on developers, or makes it an interesting challenge for developers to tackle? Help us understand this.

Nathan Toups (29:42)

Why? Why should I care about security?

Right.

I'll put it this way. I always like using the kitchen analogy for software stuff, because we're shipping things on a regular interval, right? In a commercial kitchen, a kitchen inside a restaurant, there's an expectation of operational excellence. You're shipping meals to people who expect those meals on a regular basis, so the chefs can't just make one great meal and then, you know, cut loose. They actually have to make consistently great meals

and keep the kitchen clean and keep the inventory full and make sure there aren't rodents or bugs in the kitchen, right? There are all these things you have to keep up from an operational standpoint. So a high-functioning kitchen is a great way of thinking about software development. There are also trends, right? Like, you know, broccolini is super in right now. So do you hop on the broccolini trend, or do you stick with whatever you're doing? I think with security, the idea here is that

security is this cat-and-mouse game. Folks who are trying to exploit systems are always looking for creative ways to use a system that you weren't thinking about. The whole question is: is the system so complex that if I break five layers of assumptions, I can get root access to your Linux host, and then start scraping all the data, or compromise people's machines with exploits? Or...

can I do really wild things? They talk about this in the book. For instance, probably one of the least scrutinized areas in security is your CI/CD pipeline, right? Pipelines typically have very privileged access. They have access to security keys. If I could inject some code there, I could probably do a lot of damage. And so the idea here is,

Carter Morgan (31:37)

Mm-hmm.

Nathan Toups (31:51)

you should treat your security posture like you would treat your reliability posture, right? If I have a shopping cart online, I want people to be able to check out. If there's a bug right before they hit the ship-me-this-order button and they bail because they get frustrated, they can't get their credit card in, they leave, that's a huge impact on the business. On the security side, if I can't trust

our deployment processes, or there's a supply chain attack, that's just as big an impact. We've talked about this, and it's become an even bigger deal: a lot of companies YOLO their dependencies, especially if they're JavaScript-heavy. The maintainer of some JavaScript library gets burned out, and, you know, state actors like North Korea or China come in and start hot-swapping stuff. And if you're just

pulling in the latest version of the code, you can get compromised through what they call the supply chain, even along a trusted path. The whole security-posture piece of the book is this idea that we should always make time to think about security, and they call it shifting left on security. It used to be that you'd write a whole bunch of code and at the very end, your security team would say, yes, I'm going to give it my blessing and you can go out. But that's way too late. The idea is that you should actually have

Carter Morgan (33:11)

Mm-hmm.

Nathan Toups (33:14)

security best practices in from day one.
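Nathan's supply-chain point is that pulling in "the latest version" lets whatever was just published flow straight into your build. A minimal sketch of a CI check for that, assuming a `package.json`-style map from package name to version specifier (the package names and the `unpinnedDeps` function are hypothetical): it flags anything that is not an exact version. In practice, a committed lockfile installed with `npm ci` (or `go.sum` in Go) does much of this job; the sketch just makes the rule visible.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// exactVersion matches a fully pinned version such as "1.2.3". Range
// specifiers like "^1.2.3", "~1.2.3", "*", "latest" or ">=1.0.0" do not match.
// (This simplified pattern would also flag pre-release tags like "1.2.3-rc.1".)
var exactVersion = regexp.MustCompile(`^\d+\.\d+\.\d+$`)

// unpinnedDeps returns the names of dependencies whose version specifier
// would let the package manager quietly pull in a newer release.
func unpinnedDeps(deps map[string]string) []string {
	var flagged []string
	for name, spec := range deps {
		if !exactVersion.MatchString(spec) {
			flagged = append(flagged, name)
		}
	}
	sort.Strings(flagged) // map iteration order is random; sort for stable output
	return flagged
}

func main() {
	// Hypothetical manifest entries in the style of package.json "dependencies".
	deps := map[string]string{
		"left-pad":  "1.3.0",
		"some-lib":  "^2.0.0",
		"other-lib": "latest",
	}
	if bad := unpinnedDeps(deps); len(bad) > 0 {
		fmt.Println("unpinned dependencies:", bad)
	}
}
```

Failing the build on a non-empty result turns "don't YOLO your dependencies" into a quality gate rather than a convention.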

For example, I just rolled this out with my team, which has a new code base, and we use a lot of Go. There's this tool called GoSec. It's a static analysis tool, and it catches a bunch of security flubs that happen in your code. It's not going to catch everything, but we now run it in our pipeline: we do linting, we do GoSec, we do our

test suite. These are the quality gates that we put in. It's caught a few things: some really minor bugs on my side, but also some bugs where I'm like, oh yeah, I hadn't thought of the unintended consequences of this. For other hygiene, one easy win I love to go after when I first get involved with a code base is checking whether our containers are too permissive. A lot of times we'll use build containers where we build all the stuff.
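To give a flavor of what a static analyzer like GoSec (the `gosec` tool) looks for, here is a small sketch of generating a session token the safe way. `gosec` reports use of `math/rand` for values like this (rule G404), because its output is predictable, and it also reports unchecked errors (G104). The `newSessionToken` name and the token length are illustrative assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSessionToken returns n random bytes, hex-encoded (2*n characters).
// It uses crypto/rand, the operating system's cryptographically secure
// source; the same function built on math/rand is what gosec would flag.
func newSessionToken(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		// Handle the error instead of discarding it; ignored errors are
		// another category gosec reports.
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	tok, err := newSessionToken(32)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tok), tok)
}
```

Running `gosec ./...` in the pipeline next to linting and tests is what turns checks like these into a gate rather than a code-review habit.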

And then we ship those into production, right? That's a naive way of doing it, where I literally copy my files in, maybe pull in my dependencies, and then ship that to production. But most of the time, what you really want is what they call a multi-stage build, where I do all my building and then pull in only exactly the files that are needed at runtime, on top of a super minimalist, locked-down container. You get two things out of this. Number one, your containers get way smaller. Instead of 500 megabytes or something, you might have a container that's like

23 megabytes. So not only is it easier to deploy stuff, but you also get what they call a smaller attack surface. A smaller attack surface just means there are fewer things for a security researcher to fiddle with. For Go, I'll have a single static binary running in a Chainguard container, which is a super paranoid container image. So even if there's something wrong with my Go code and they exploit it,

it's running unprivileged. It's a single binary inside the container, with no extra access to the system. It's isolated. It has, as Amazon would call it, a small blast radius. And again, you could come in with a dev spike and say, hey, I'm going to make our containers load faster, make them smaller, and reduce latency when we deploy. And it's also a security posture thing. You kind of get both. You get a more efficient

Carter Morgan (35:19)

Hmm.

Nathan Toups (35:39)

deployment cycle and a better security posture. And that's what I typically look for in security stuff. You don't have to fix everything, but you should look at it the same way you look at infrastructure as code: hey, are we really building our containers the right way? Are we really validating...? Here's another good example, and this one is actually hard for a lot of engineers to wrap their heads around.

A lot of times you'll do branch-based builds. So I'll build everything in my PR, or have it in my staging branch, and then when I deploy to the production branch, the pipeline will rebuild the container. The funny thing is that it's actually very hard to get what they call hermetic, reproducible builds, which means proving that the container that worked in my staging environment is identical to the one that's built for production.

It's actually much better to build an artifact once and promote that artifact. Build it once and prove that it hasn't changed. If your runtime code is the same between branches, I shouldn't rebuild the one that's been merged into main. I should just say this asset is being promoted to this new environment. I build it one time, because that container asset is what's gone through the rest of my pipeline. And again, this is

Carter Morgan (36:43)

Mm-hmm.

Nathan Toups (37:05)

one of the many ways you can improve your security posture: we think in terms of artifacts, and these artifacts are moved through the pipelines, rather than re-running build processes where, again, surprising behavior can come out.
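The "build once, promote the artifact" idea can be sketched as a check on content digests. This is a minimal sketch, assuming each environment records the SHA-256 digest of what it is running; the `Environment` type and `promote` function are hypothetical, and a real pipeline would compare image digests in a container registry rather than hash bytes in memory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the SHA-256 of an artifact's bytes, written in the
// "sha256:<hex>" notation that container registries use for image digests.
func digest(artifact []byte) string {
	sum := sha256.Sum256(artifact)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Environment records which artifact digest an environment is running.
type Environment struct {
	Name    string
	Running string
}

// promote deploys to `to` only the exact artifact that was validated in
// `from`. Anything with a different digest, such as a rebuild, is refused.
func promote(artifact []byte, from, to *Environment) error {
	d := digest(artifact)
	if d != from.Running {
		return fmt.Errorf("digest %s was never validated in %s", d[:19], from.Name)
	}
	to.Running = d
	return nil
}

func main() {
	built := []byte("compiled binary, build #1") // stands in for a container image
	staging := &Environment{Name: "staging", Running: digest(built)}
	prod := &Environment{Name: "production"}

	// Promoting the same bytes that passed staging succeeds.
	if err := promote(built, staging, prod); err != nil {
		fmt.Println("promotion failed:", err)
	} else {
		fmt.Println("production now runs", prod.Running[:19])
	}

	// A rebuild can produce different bytes, so it is rejected.
	rebuilt := []byte("compiled binary, build #2")
	if err := promote(rebuilt, staging, prod); err != nil {
		fmt.Println("promotion refused:", err)
	}
}
```

The design choice is that production never asks "does this branch build?" but only "is this the thing staging already proved?", which sidesteps the reproducibility problem instead of solving it.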

Carter Morgan (37:22)

This is tangentially related to security, but is there any point to having a staging environment? I guess the point of having one in a CI/CD world, right, is that theoretically I develop locally, I push to staging, there's validation done there, and then if it passes all of that validation, it goes to prod. But

in a CI/CD world, if it's good enough to pass the validation and make it into staging, isn't it by nature good enough to get into prod? What is the purpose of the staging environment at that point? Is there a purpose to a staging environment, or should we think of staging as more akin to a dev environment, just up in the cloud?

Nathan Toups (38:13)

Yeah.

This one's contentious. I also think that staging and production, and these separations, are organizationally driven. They're not driven by what's the correct answer for you. I'm working on a little side hack project right now, and I'm purposefully choosing to have only a single production environment. I'm doing that because I'll have the ability to do things like

dynamic routing with headers to do alpha releases and basically have a virtual staging environment inside my production environment, right? I can see alpha and beta features through feature flags, but I have one continuous environment. I think that's a sign of maturity if you can get into that world. Now, that being said, having one production environment doesn't necessarily mean that everything's touching the same backing database. You might end up having a service

Carter Morgan (39:00)

Yeah.

Nathan Toups (39:12)

that has an alpha version with an alpha version of its database, a microservice type of thing, that only touches that database while it's in alpha, and then it hops over to the production one. There are all kinds of ways you can manifest those things. I think some environments have regulatory rules where they must have a staging environment. That's another good example. Or folks with a high risk where

Carter Morgan (39:33)

Mm-hmm.

Nathan Toups (39:40)

they're not comfortable having feature flags be the only thing that blocks a change. They really need complete separation, or at least that's their perception of it. A good staging environment should look as identical to production as possible. That's one of the big ideas. But it's more expensive to operate an end-to-end staging environment, and to figure out where those barriers and delineations are.

But where I've landed is that I always try to push us toward a world where the pure version of trunk-based development happens, where everything can be in the same infrastructural environment and you don't have to have staging versus production. But, again, there might be regulatory reasons, or separation-of-duties reasons. For instance,

Carter Morgan (40:18)

Yeah.

Nathan Toups (40:37)

I'll put it this way. If you have a software product where you're constantly pushing out updates and nobody knows, you don't think about which version of Facebook you're running, right? Facebook is just constantly updating. Versus if you're building an operating system, or something that has a cadence or release cycle. Of course you're going to have nightly builds

Carter Morgan (40:47)

Yeah.

Nathan Toups (41:01)

and alpha testing, and then a stable release, right? You're going to have these environments that are very separated from one another. And I think that's fine, because you're shipping a product that has a cadence to it, right? People say, okay, I'm going to update my operating system every time a long-term support version comes out, right? Or the Node runtime: most of the time you don't use the nightly build of Node.js. Most of the time you're saying, okay,

What's the most practical long-term support version? Okay, we'll use that for the next year and a half, and when the next long-term support version comes out, we'll get ready and do the upgrade. And again, depending on your cadence, and on who relies on your code to do other things, I think there are really good cases for separate environments. We had the same thing with FDA regulatory stuff. We actually had to keep an FDA environment

that held what we would release to production after FDA approval, and it was basically treated with white gloves. We did continuous delivery in our staging environment; that was like our nightly build. Then at some point we would say, this commit hash is what we're going to deploy into the FDA environment. That's what got the scrutiny. And once that got blessed, all of the assets, all of the artifacts in it, became the production release.

And again, we had to do that because it fit the FDA's expectations, right? We couldn't do continuous delivery. I don't really want my heart defibrillator to get an update that could go sideways on a nightly basis, right? Right, right.

Carter Morgan (42:34)

Yeah.

I just test in prod. Test

your heart defibrillator in prod. So in a pure trunk-based environment, is there no concept of a staging database?

Nathan Toups (42:54)

Yes, and again, this is where, if we really talk about decoupled services, the teams have autonomy. Think back to Team Topologies. Most of the time, if you have a trunk-based environment, you also have microservices, or at least a service-oriented architecture, and I have a team that controls its own deploy cycle for some piece of the puzzle. That team gets to decide what happens in there. A lot of times,

what you'll find is that staging becomes an ephemeral part of the pipeline. During the pipeline, during regression testing and all these other things, your pipeline will have some fast paths, like what you'll do for a PR, right? You don't want your pipeline running for hours and hours. But there might be a part of your pipeline that runs regression tests against a bunch of other,

larger parts of the system, and you might want to run that for hours or even days sometimes. Fuzzing is a good example: maybe I want to fuzz some really important code for eight, 12, or 24 hours. You run those parts of your pipeline, and the change doesn't get promoted to production until the blessing comes from that quality gate, right? So...

it's ephemeral, in that you probably spun up a database and did all these other things, and it might even interact with some long-lived regression testing tooling that you have. But there's no staging environment that a human hits a button and deploys to, right? They touch on this a little bit in the book. And again, it really depends on what you have. At the fintech company I was at, we did have a staging environment. Basically, you had a local dev environment.

You could spin up all or part of the microservices infrastructure. You could spin up your application for dev and hook it into the staging environment if you wanted to. You could do all these kinds of interesting things. Staging was what we called pre-prod: it was production-ready. The idea of staging is that at any moment somebody should be able to hit the promote button and it goes to production, right? So you should write production-quality code to go into staging. Staging is not a dev environment where you can have a bunch of stuff that breaks.

Carter Morgan (44:56)

Yeah, I am.

Yes, yes.

Nathan Toups (45:10)

Right. And then you hold your whole team accountable. You say, hey, I've noticed that your service has been really flaky, or I've noticed that when I try to update my dependencies on your stuff, I've had to go back and refactor a lot of my code, so you're not honoring the contract. Not breaking the API is one of our cardinal rules, right? And that would be when we'd pull the andon cord and say, hey, look, let's update our social

agreement and say that this should never happen. Otherwise I have the ability to roll back your service, right? We would do these kinds of things. And so what we ended up with was the ability to make pretty huge changes without side effects. We all got really comfortable with the deployment process. And we measured ourselves by saying that we wanted the diff between staging and production to be as small as possible.

Right? So we wanted a service to spend as little time as possible sitting in staging before it was promoted to production. And so we would say, okay, how do I meet my principles? My principles being: don't break stuff, honor the social contract. So we got a bunch of automated testing in place, and then we would run regression testing inside of staging continuously. And so

we had, for instance, a very regular cadence for rebalancing our portfolios, which was one of the things we cared about. We would basically say, okay, if it gets into staging and makes it through two or three of those regression testing cycles, it gets the promotion blessing and it can go up. Or if we could prove that the code path didn't impact regression testing, like maybe it was just improving the Swagger docs or something, you could just push that up, push it on, push it through.
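The promotion rule from that fintech example can be written down as a small gate function. This is a minimal sketch under stated assumptions: the `readyToPromote` name, the pass/fail history slice, and the "skips regression" flag are hypothetical; the rule itself (pass the most recent N regression cycles, or show the change doesn't touch those code paths) is the one described above.

```go
package main

import "fmt"

// readyToPromote reports whether a service may leave staging. It requires the
// most recent `required` regression cycles (oldest first in `cycles`) to have
// all passed, unless the change is known not to touch regression-tested code
// paths (for example, a docs-only change).
func readyToPromote(cycles []bool, required int, skipsRegression bool) bool {
	if skipsRegression {
		return true
	}
	if len(cycles) < required {
		return false
	}
	for _, passed := range cycles[len(cycles)-required:] {
		if !passed {
			return false
		}
	}
	return true
}

func main() {
	history := []bool{false, true, true}           // oldest first
	fmt.Println(readyToPromote(history, 2, false)) // last two cycles passed
	fmt.Println(readyToPromote(history, 3, false)) // an older failure is inside the window
	fmt.Println(readyToPromote(nil, 2, true))      // docs-only change
}
```

Encoding the rule this way is what lets "as little time as possible in staging" be measured and automated instead of waiting for someone to press the promote button.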

But again, a lot of this stuff is like organizational, it's the Conway's law stuff, right? It's like, how does it help our team ship better? And so yeah.

Carter Morgan (47:10)

Yeah, I am.

So then, talking about security: the big idea this book is trying to sell is that security does not have to be a barrier to flow. Security can be part of your flow. Have you worked in the past with dedicated security engineers embedded in a team, or is this just something where the engineers on the team have naturally adopted security procedures? Because, like I said, I've mostly worked at big companies.

And when I have worked at those big companies, the security team has always been a barrier to flow. It's always been: fill out this 80-question form, or submit to these extensive reviews before you can publish a feature. And so if the idea is, okay, let's get it in the pipeline, let's have security be part of our DevOps process, and we can just

Nathan Toups (48:00)

Yeah.

Carter Morgan (48:17)

be sure that every change that's rolling out is secure. Again, thinking about it from a Conway's Law perspective, I would think what you'd need to do is embed security engineers in every team, because if you have a separate security organization, of course your system's going to evolve to have a separate security part of the process. So what do we do here?

Nathan Toups (48:39)

Yeah, so there's

always that dysfunction. And the book brings this up really well: hey, if you have this multi-part form and this other team that you have to engage with every time, you're violating some of the DevOps principles. One of my favorite case studies, and I've kind of breezed over most of them, is, I think it's called, the Capital One No Fear Releases case study. And basically,

it was a partnership between the security team and the engineering team, and they eventually got to 10+ releases per day on their banking website. And you're like, well, how do you do that when you have to do all the security and compliance stuff? One of the big things they advocate here is that, yeah, if you're doing something brand new, you probably do need a quality gate, a stop-the-world kind of thing.

Carter Morgan (49:18)

Yeah, yeah.

Nathan Toups (49:31)

But what a good security team should be doing is providing the correct defaults, right? If you're going to use a new SSL or encryption library, you probably need a lot of scrutiny. But there should also be officially blessed versions of OpenSSL, and default configurations that the security team weighs in on, maintains, and provides to all of the teams, saying, hey, if you do this, you don't have to fill out the form. We have already blessed this operation. We've already blessed this tool.

That's a good example of how it should work. We do the same thing with Chainguard. We're not sponsored by them or anything, but Chainguard's really cool. Chainguard is a really great community-driven way of getting high-security-posture container base images. They're optimized for all the major languages. You can kind of have it as...

Carter Morgan (50:10)

Chainguard, if you're out there, you can sponsor us.

Nathan Toups (50:30)

is lightweight. A good security team should come in and say, if you're deploying Go code, these are the blessed base images, and here's what the hygiene has to be. Maybe have automated security scans to look for suspicious behavior. I think it was Neal Ford, when we were talking to him, who called this orthogonal coupling, right? Orthogonal coupling is where we come up with a blessed way of doing certain things, like:

How do I issue logs? How do I do security stuff? Even though all of our teams have autonomy in how they build things, if you use a blessed way of doing it, you take the easy road, and your team doesn't have to build all these things from scratch. I think you should come up with these, but at the same time, you must give teams a way to deviate if it's needed. So yes, you might put up some extra barriers and say, hey, fill out this form, or show us, why

Carter Morgan (51:12)

Yeah, yeah.

Nathan Toups (51:29)

does our out-of-the-box tool not work for you? Because one of two things comes out of that conversation. Either they come back and go, you know what? You all are being super weird, and we're going to have to do some extra scrutiny, and that's fine. Or they go, you know what? I can add one feature flag to our core security tool. Give us a two-week dev cycle and we'll have it for you. Use your own thing for now, and I'll give you the blessed way of doing it in two weeks. It's just negotiating

these security best practices that go on top of it. To answer the other part of your question: again, I've not been on big enterprise teams, but I have been on teams where we had dedicated security folks. And the ones I've worked best with take the same approach that an SRE or a platform engineer does, right? They do enablement, as we talked about with Team Topologies. A lot of times there's some learning your engineering team has to do as to why we're here.

Like, there are these three extra steps you have to go through, because here are the exploits that happen if you don't, right? And you're like, oh, okay, now I get it. Instead of just saying, no, you can't do that, no more fun. You know, that's dysfunctional. I think that's defensive, and that's the traditional ops path on things. I always try to dismantle that if I can.

Carter Morgan (52:34)

Hmm.

Yeah.

I think it's important. One of my beefs with security folks is that they seem to take great pleasure in saying no and in denying. I think part of that is natural, because it's kind of like, if I caught this thing that was dangerous, then I'm doing my job. So the more I say no to things, the more I'm doing my job. So I think there is a real

Nathan Toups (52:59)

Yes, right.

Carter Morgan (53:16)

mindset shift that may need to happen there, which is that a security team's job is not "how do we catch as many things as we can" but more "how do we enable development teams to get to yes as quickly as possible?" So, you know, if you have a separate security team, maybe that looks like developing a lot of tools, plugins, and building blocks that development teams can use. Salesforce did this in one of the case studies, where

they classified low-risk changes as standard changes, and those bypassed their whole approval process. So I can imagine a security team looking and saying, okay, what changes can we call standard if you do them in X way, so you don't have to go through review? Maybe there are some bigger things where we say, we don't want you to handle this on your own, and if you do handle it on your own, you're going to have to submit to an extensive review. But if you use X component that we've built, that's the blessed path. And so

Now that's an automatic approval, right? Yeah.
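The "standard change" idea Carter describes from the Salesforce case study boils down to an automated classification rule. This is a minimal sketch with hypothetical path prefixes and a hypothetical "uses the blessed auth component" flag; a real system would derive both from the diff and the dependency graph.

```go
package main

import (
	"fmt"
	"strings"
)

// Change is a hypothetical summary of a pull request.
type Change struct {
	Files              []string
	UsesBlessedAuthLib bool
}

// reviewPaths are hypothetical areas the security team wants to see by hand.
var reviewPaths = []string{"auth/", "crypto/", "deploy/iam/"}

// IsStandard reports whether a change can skip manual security review:
// it must use the blessed auth component and stay out of the review paths.
func IsStandard(c Change) bool {
	if !c.UsesBlessedAuthLib {
		return false
	}
	for _, f := range c.Files {
		for _, p := range reviewPaths {
			if strings.HasPrefix(f, p) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(IsStandard(Change{Files: []string{"api/orders.go"}, UsesBlessedAuthLib: true}))
	fmt.Println(IsStandard(Change{Files: []string{"auth/session.go"}, UsesBlessedAuthLib: true}))
}
```

Everything that returns true gets the automatic approval; everything else goes to the heavier review, which keeps the security team's attention on the changes that actually need it.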

Nathan Toups (54:17)

Yep.

And I think this goes beyond DevSecOps, you know, which is kind of the term that folks will use. The whole idea, though, is to make the correct behavior the default behavior, where correct is defined by the teams that are involved. And a security team should always be doing two things. Number one, setting security standards that are easy to implement, or as easy as possible to implement, I should say.

The other one, though, is that you should really be red teaming internally in your organization, right? Let's say you told a team, this is a bad practice and you're going to get in bad shape, and they deploy it into their, you know, dev or staging environment anyway. If you're a security team, you should take an adversarial approach and always be trying to hack your own infrastructure, right? So if you know that a team deviated from some auth library that you have officially blessed and they're using some other pattern,

and you can show an exploit, the data is way better than a no, right? Hey, this is why you can't do this, and this is why we can't have nice things if this went into production, right? Boom, here's the data. If you had used our auth library like we asked you to, this whole category of exploits would be gone. That's a great way to talk to an engineer, because nobody wants to see a nasty report with a bunch of red text on it showing that they did something dumb, right?

Carter Morgan (55:41)

Yeah.

Nathan Toups (55:43)

And that's way better than just, no, that's not best practice. And you're like, okay, but show me what I did. Show me how this could be exploited, right? That "show me" is important.

Carter Morgan (55:53)

Well, I think that brings us to a good stopping point here. Yeah, so that takes us all the way through The DevOps Handbook. I was really glad to be able to read this one, because it's one of the more recent seminal texts in the industry, and I've really enjoyed it. So what are you going to do differently in your career, Nathan, now that we've finished the entirety of The DevOps Handbook?

Nathan Toups (56:20)

Yeah, chapter, I mean, sorry, part five really spoke to me. I think this idea of internal learning, the learning organization. I've been in organizations where we had open-topic Fridays or something where anybody could share. We did a lunch and learn where you literally could say, I built a fence over the weekend and here's what I learned about building a fence, right? Or, we just used this new auth library.

Carter Morgan (56:45)

Mm.

Nathan Toups (56:49)

So I want to do more things around improvement blitzes and internal conferences. I'm in the blockchain space, and there's way too much changing in the industry all the time. It'd be really nice if we had some of our category authorities do an internal conference to make sure that all the engineers understood what was possible. We don't have that right now, so I'm going to be pushing for something like that.

Carter Morgan (57:13)

Yeah, I had a very similar thought. Mine is kind of a big mushy answer: just, how do we improve team learning processes? We do demo days, which is awesome; everyone shows off what they've worked on. But one of our new engineers, and it's funny, there are three of us who all used to work at Amazon, and we call ourselves the Amazon refugees.

But he's a really bright, talented guy. So we did all our demos of the features and stuff we'd been working on, and then he asked at the end, do we do code demos? I'm like, what do you mean? He's like, you know, just neat pieces of code that we've rewritten. Like, yeah, sure, why not? You can do that. And he showed us something he'd found in the code base. It was Java, a constructor for an object that took like 19 parameters. So the first six are like, you know,

value, value, value, value. And then it's just null, null, null, null, null, null, null, null, null, false, null, null, null, null. And it was just like, what the heck? So he rewrote the object to use a builder pattern, right? Now you just build it instead of passing all those nulls. And I thought that was, one, just great practice, and two, really neat of him to take the initiative to show it to the team, because that reinforced a good team culture of, hey, look at this small improvement to the code I made. And now

every other engineer, and we've got more junior engineers, right, knows that when they run into that giant constructor with a bunch of nulls, they can say, wait, didn't Sham do something like that? And they can implement the builder pattern. So I want to do more things like that. I'm trying to figure out what that looks like. Is it a lunch and learn? For example, Terraform: when I started doing Terraform work, well,

our company is about 35% engineering, and then there's 65% that's sales and business ops and things like that. But I'm pretty adamant that we're a tech company, so if you're a non-technical employee here, that doesn't absolve you from understanding the technical parts. And so I was explaining Terraform over lunch to several non-technical people, to say, this is the value we're getting out of this.

Again, I don't know what that looks like necessarily, but I would like a better internal culture around learning: almost internal conferences, internal talks, lunch and learns. I think that could be really valuable for the team.
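The refactor Carter describes, replacing a constructor call full of `null` placeholders with a builder, looks roughly like this. His example was Java; this sketch uses Go to match the other examples here, and the `Order` type, its fields, and the `USD` default are hypothetical. (In Go, "functional options" are an equally common idiom for the same problem.)

```go
package main

import "fmt"

// Order stands in for the many-parameter object from the story; the fields
// are hypothetical.
type Order struct {
	ID       string
	Customer string
	Amount   int // in cents
	Currency string
	Gift     bool
	GiftNote string
}

// OrderBuilder lets callers set only the fields they care about, instead of
// passing a long run of nil/false placeholders to a constructor.
type OrderBuilder struct {
	order Order
}

// NewOrderBuilder takes the required fields and applies defaults.
func NewOrderBuilder(id, customer string) *OrderBuilder {
	return &OrderBuilder{order: Order{ID: id, Customer: customer, Currency: "USD"}}
}

// Amount sets the order total in cents.
func (b *OrderBuilder) Amount(cents int) *OrderBuilder {
	b.order.Amount = cents
	return b
}

// Currency overrides the default currency code.
func (b *OrderBuilder) Currency(code string) *OrderBuilder {
	b.order.Currency = code
	return b
}

// Gift marks the order as a gift with the given note.
func (b *OrderBuilder) Gift(note string) *OrderBuilder {
	b.order.Gift = true
	b.order.GiftNote = note
	return b
}

// Build returns a copy of the configured Order.
func (b *OrderBuilder) Build() Order {
	return b.order
}

func main() {
	o := NewOrderBuilder("order-1", "cust-9").Amount(1999).Gift("Happy birthday").Build()
	fmt.Printf("%+v\n", o)
}
```

Each call site now names exactly the fields it sets, which is what makes the pattern easy for a junior engineer to spot and copy the next time they hit a wall of nulls.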

Nathan Toups (59:40)

Terraform.

Yeah,

one of the interesting things: at one of the startups I was at, they weren't using Terraform, they were using something called Troposphere. I had never seen it before, but it's sort of a precursor to something like Terraform or Pulumi: you wrote Python, it generated AWS CloudFormation templates, and that spun up a bunch of infrastructure. Of course there were some messy abstractions and some other things they had done, since they weren't DevOps people. There were some

Carter Morgan (59:59)

Okay.

Nathan Toups (1:00:19)

things like idempotency that were not there. But one of the glaring problems I found was that they would hard-code IAM user credentials into the servers they deployed, which is a super old-school way of doing it. We're talking about a 2010-era way of doing things, before IAM roles existed, or at least before people understood IAM roles.

Carter Morgan (1:00:33)

Hmm. Whoops.

Mm-hmm.

Nathan Toups (1:00:48)

So I gave a little lunch and learn on why ephemeral credentials are awesome and why IAM roles are the way to do things. It was just a really basic primer on an interesting piece of security, and not only did it make our security posture better, it actually made their lives easier, because you didn't have to keep sensitive credentials in build pipelines, which you never want. There were a couple of engineers who had just never thought about this domain of problem before, and it was really appreciated. And again, it was a small, 15-minute lightning talk, right? It opened up a whole other world of really cool conversations about improving our processes. I ended up being able to eliminate a bunch of build code, give us a more reliable pipeline, and check some boxes in Vanta, like "no long-lived credentials in production," right? Boom, we knocked out three things in one, and it was awesome.
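Nathan's point is that role-based, ephemeral credentials beat long-lived IAM user keys baked into servers. As a small "shift left" illustration (our own sketch, not something from the episode or the book), a check like this could run in CI to flag long-lived AWS access key IDs before they ship. It relies on AWS's documented key ID prefixes (`AKIA` for long-term IAM user keys, `ASIA` for temporary STS keys); the function name is hypothetical, and it only catches key IDs, not secret keys, so dedicated secret scanners such as git-secrets go much further:

```python
import re
from typing import List, Tuple

# AWS access key IDs are 20 characters. The prefix tells you what kind of
# credential it is: "AKIA" marks a long-term IAM user key, while "ASIA"
# marks a temporary key issued by AWS STS (for example, via an IAM role).
ACCESS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, kind, key_id) for every access key ID in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            kind = "long-lived" if match.group(1) == "AKIA" else "temporary"
            findings.append((line_number, kind, match.group(0)))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS's own docs.
    sample_user_data = """#!/bin/bash
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_DEFAULT_REGION=us-east-1
"""
    for line_number, kind, key_id in find_access_key_ids(sample_user_data):
        print(f"line {line_number}: {kind} key {key_id}")
```

With an IAM role attached to the instance, the AWS SDKs pick up temporary credentials automatically through the default credential provider chain, so there is nothing to hardcode in the first place.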

Carter Morgan (1:01:49)

Well, yeah, that sounds nice. I think we're both on the same page about wanting to level up team learning. We've now finished the entire book. Who would you recommend it to?

Nathan Toups (1:02:03)

You know, I'll take what I said last week and expand on it a bit. Obviously, this has a real enterprise focus. If you're struggling in the enterprise, say in some sort of legacy environment, and you've realized, hey, we're actually a tech company because that's the future, and you really need to level up the thinking and the posture here, this book is for you. But even though it's very enterprise-focused, there is some meat.

There's some meat to this book that I think would be useful to a broader audience: again, DevOps folks who are new to the industry, and engineers who have maybe found themselves thrown into DevOps. I really do think parts three, four, and five, which get into the implementation details of the book's three major goals (the Three Ways: flow, feedback, and continual learning), and part six, which covers the security thinking, are worth a read. So again, if you're not in the enterprise but you still need some understanding of how DevOps practices can succeed and why they're useful in a business of basically any size, I think you could skip or skim parts one and two, jump right into three, four, five, and six, and there's a lot to gain.

Carter Morgan (1:03:17)

Yeah, I think this book is obviously most useful for senior engineers or leaders who have the ability to make cultural changes. But I think it's still good for junior engineers who want an idea of what the ideal looks like. That's something I've found really helpful throughout the podcast: reading all of these books gives you a North Star to work toward. Anywhere you work, you're not going to have everything perfect all the time, but you can tell when you're veering onto the wrong path. And I just think anyone who needs that vision could really benefit from this book.

I actually did some consulting work recently with someone from a private equity firm. They had acquired a small company with a development team, and he's managing it, but he's not a developer, so he wanted my advice on what everything should look like. He was asking about team dynamics and things like that, and he was talking about their SVP of tech, the CTO for all intents and purposes. He was saying, "He's really, really good technically, we can't afford to lose him, and we really value his technical expertise." But then I asked about their development processes, and there were some weird things. Like, "Well, we release every two weeks." I'm like, that's not good, right? You're not doing any sort of CI/CD. They were talking about PR approval: there are 12 engineers on the team, and there are, like,

Nathan Toups (1:04:54)

All right.

Carter Morgan (1:05:01)

three who are blessed to approve PRs. That's weird. Even a junior engineer should be able to approve a PR. And with that SVP, they say they really value his technical expertise, and I'm sure he's a very smart guy, right? But does he even know that this world exists? Does he know there's a way to release more often than every two weeks? I've seen this happen before: you have people with a lot of technical IQ and horsepower, but they're kind of set in their ways. They're living in the stone age a bit, and they're applying all of that horsepower to nailing the two-week release. They have all these processes and firefighting and judo knowledge to make that two-week release work. But what if you applied all of that horsepower to DevOps, to getting rid of the two-week release problem, and to building this culture instead?

If you're still operating in a world where releases are a big nightmare, read this book, because it'll give you the vision for what an alternative looks like.

Nathan Toups (1:06:14)

Yep, I agree. Sometimes learning how to get out of your own way is so hard, right? It's very, very easy to talk about what we need to do, but unless you really live these principles, it's easy to get sucked into the minutiae of dysfunction. I think books like this are a good reminder. And like I said before, this is a technical companion to The Unicorn Project. So if you read The Unicorn Project and you're like,

Carter Morgan (1:06:19)

Yeah, yeah.

Uh-huh.

Mm-hmm.

Nathan Toups (1:06:43)

"This speaks to me. I see my organization here. I feel like these challenges are ahead of me," this will give you the deep technical grounding to communicate with leadership. Hopefully you're also in a position of influence. If you have both of those things, this is really valuable. It should be eye-opening.

Carter Morgan (1:07:00)

It's... we might wind up reading all of the Gene Kim books backwards. He wrote The Phoenix Project, and then he wanted to write the prescriptive version of it, which is The DevOps Handbook. Then he wrote The Unicorn Project. We've done The Unicorn Project and The DevOps Handbook, but we haven't tried The Phoenix Project yet. We'll probably have to read it eventually.

Nathan Toups (1:07:06)

Mm-hmm.

Also, apparently he was inspired to write The Phoenix Project by a book called The Goal. I have it in my notes, and I'm adding it to our backlog. Apparently it's about how to get things done within constraints, operational excellence type stuff. I think it'll be one of those where, if we get into a phase of doing the more business-type books, like The 7 Habits of Highly Effective People and things like that, I think...

Carter Morgan (1:07:23)

Interesting.

Yeah. I'd love to read How to Win Friends and Influence People one day. Okay, I think that wraps us up. As always, you can find us on Twitter at Carter Morgan or at Book Overflow Pod. You can find Nathan's newsletter, Functionally Imperative, at functionallyimperative.com. And you can contact us at contact@bookoverflow.io. Thanks, folks. We're excited for what's to come. Stick around.

Nathan Toups (1:07:46)

Yeah, there you go. There you go.

Episodes in This Series

Ep. 69: Is DevOps a Silver Bullet? - The DevOps Handbook

Jul 7, 2025

Ep. 71: Deployment Strategies for Success - The DevOps Handbook

Jul 14, 2025

Ep. 73: Shifting Left on Security - The DevOps Handbook (This episode)

Jul 21, 2025

Ep. 87: Patrick Debois Reflects on The DevOps Handbook

Oct 30, 2025