VIDEO
Security vs speed: A culture that chooses both
[Larry Maccherone at TechStrong DevOps Experience October 2024]
There is a persistent myth that there is a tradeoff between speed and security or quality — that going faster means less security and lower quality. This is just not true. All the data says that teams that are shipping to production multiple times per day have much lower security risk than those that take a couple of weeks (or longer) to get a new release out.
Why? The simple answer is that they've automated all of their quality and security checks. But a better answer is hinted at by this quote from Gene Kim's book, The Phoenix Project: “Improving daily work is even more important than doing daily work.” True DevOps teams have taken this to heart and are constantly learning/changing to do even better. The most wonderful thing about this is that developers, unlike typical QA and security folks, love to capture that learning with more robust code for both the product itself and everything necessary to deploy and run that product in production. Anytime a problem is found, we don't just fix it, but rather we make a change that means we'll never have a similar problem again!
Unfortunately, a lot of nominal DevOps development teams are not getting this speed + quality and security benefit. The difference is in the details of their DevOps approach. This talk illustrates very clearly the difference between "cargo-cult" DevOps (copying code without understanding how it works or whether it's actually required) and "true" DevOps. It also provides a simple framework for getting the promised value out of your DevOps cultural shift.
This video is from the DevOps Experience 2024 event presented by Techstrong Group.
Full video transcript
Hi. I'm Larry Maccherone, and I'm here to talk to you about the title of this talk: security versus speed, a culture that chooses both. So we normally think of security and speed as tradeoffs, but the data does not back that up. The teams that are able to move the fastest are actually the ones with the highest security. And I'm not gonna spend a ton of time explaining why that's true, but I'm gonna describe to you how you get to a situation where that's the case.
And then you'll just have to try it to see how you experience both going up at the same time. So I'm gonna give you a little provocative idea to sort of drive home this point of what I really mean. And the provocative idea here is that the way you do service level agreements for vulnerabilities, about ten days for criticals, thirty days for highs, a hundred and eighty days for mediums, or whatever the heck your SLAs are, is actually harmful. I think that if you have any criticals, you shouldn't work on any highs.
If you have any highs or criticals, you shouldn't work on any mediums. And you should focus all of the attention on the criticals first and get to completely clean. And not just completely clean: get into a blocking mode such that you never have criticals get into production again after that point. And so that's where the less than one day SLA comes from. You basically don't have an SLA on anything other than the part you're working on, and it gets resolved and stays resolved, stays clean, and then you start to work on the other areas.
And that is a much more effective way of doing risk reduction.
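To make that prioritization rule concrete, here is a minimal sketch of it in Python. The finding fields and severity labels are hypothetical, not taken from any particular scanner; the point is simply that the work queue only ever contains the most severe tier that still has anything open.

```python
# Hypothetical sketch of the "work only the highest open severity" rule.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]

def next_work_queue(open_findings):
    """Return only the findings in the most severe tier that still has anything
    open; every tier below it waits until this one is completely clean."""
    for severity in SEVERITY_ORDER:
        tier = [f for f in open_findings if f["severity"] == severity]
        if tier:
            return tier
    return []  # fully clean: nothing to work on

findings = [
    {"id": "CVE-2024-0001", "severity": "high"},
    {"id": "APP-42", "severity": "critical"},
    {"id": "APP-77", "severity": "medium"},
]
print(next_work_queue(findings))  # -> only APP-42, the critical
```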
So, before I get too far into this, a little bit about my background so you know where I'm coming from. There are some logos that are gonna come up there that sort of describe where I'm from and what I've done. But there are really two things I would like you to know about me. First of all, I was the head of application security at Comcast, and most people, in the States at least, know Comcast as a cable company. But there are a lot of properties and a lot of product development. Ten thousand people working on development teams, six hundred different development teams, spread across all sorts of business units that you maybe forget are part of the Comcast family, like NBC and Universal and DreamWorks and Peloton and the Hulu back end, and all the cable company systems in Canada are back-ended by Comcast products, etcetera.
So, all the ad placement. There's a lot to that environment. Very diverse, a lot of acquisitions, and I had to put in place a system that got developers to take more ownership of security. Over the course of the five years it took me, I got pretty much all the development teams to at least commit to doing that.
I got about halfway there before I left, but then the commitment from management was that we would do the rest. And it was highly successful. Lower cost of running the AppSec program, and much higher risk reduction: six x better risk reduction than the prior way of doing it. And the system I'm gonna talk to you about today is essentially how you get to that culture, how you make that culture actually happen.
And, the second thing I'd like you to know about me is that I'm an active developer.
I write code almost every day. I'm the primary author of a dozen open source projects, one of which gets a million downloads a month.
And it is used by, you know, all the cryptocurrency exchanges and every cloud vendor, and it's considered critical infrastructure by the US government because they don't want it to be a vector for supply chain attacks. And so it's gotta be highly secure. And everything I'm gonna talk to you about here is the way I run that project. There are twenty or so contributors to that project, and I run it exactly like the system that I'm about to describe to you here in this talk.
Okay. So I'm gonna start by sort of setting the scene a little bit here. App and API security is fundamentally broken today. This chart is pretty typical. It's a cumulative flow diagram, which is pretty easy to read even if you've never seen one before.
Basically, this orange line goes up when new vulnerabilities are detected. This green line goes up when they're either marked as resolved or marked as false positives, and then this blue one represents the portion that is resolved by being marked as false positives. And you can see here that this gap in the open vulnerabilities never really gets lower. In fact, it's increasing dramatically.
And the only place where they reduced it dramatically is when they marked a bunch of stuff as false positives here. And, you know, you could argue that they weren't really false positives. They just sort of said, oh, we're gonna declare the risk accepted and move on. So you're left with this widening gap.
This is pretty typical. In fact, I see a lot worse than this at times, where the findings just run away from the resolution. And then you are stuck with this inventory management problem, and that's where the SLA tendency comes into play, that hundred-and-eighty-day SLA.
Of course, that just gives them permission to wait the full hundred and eighty days before they even think about it. And they're still not gonna think about it once it's a hundred and eighty days past unless you ding them for being past a hundred and eighty days. But you shouldn't ding them for that if they've got some criticals or even highs that are open. So this is fundamentally unhealthy. This is what a healthy cumulative flow diagram looks like.
It took them a little while to get going with resolution, about three months after the tools started detecting vulnerabilities.
But once they got going, they resolved them pretty rapidly. And, you know, there's a little story about this spike in false positives for this team, and this resolution ramp is essentially a representation of that, because we marked those findings as false positives when it turned out the tool was configured wrong. And when we did that, we did it right here. Boom. All those findings that came from that bad rule got resolved. And so this way of running the program, where you actively are trying to reduce false positives, leads to better trust between the engineers and the security folks who are running the tools, and it leads to this sort of rapid resolution. And they're pretty much staying even with it evermore.
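For readers who want to build this kind of chart for their own backlog, here is a rough sketch of how the three cumulative-flow series described above could be computed from a log of findings. The field names (detected_on, resolved_on, disposition) are invented for illustration and are not any specific scanner's export format.

```python
# Rough sketch: build cumulative-flow rows (detected, resolved, false positives,
# open gap) per day from a list of findings with hypothetical field names.
from datetime import date, timedelta

def cumulative_flow(findings, start, end):
    """Return one row per day: (day, detected, resolved, false_positives, open_gap)."""
    day, rows = start, []
    while day <= end:
        detected = sum(1 for f in findings if f["detected_on"] <= day)
        closed = [f for f in findings
                  if f.get("resolved_on") and f["resolved_on"] <= day]
        false_pos = sum(1 for f in closed if f.get("disposition") == "false_positive")
        rows.append((day, detected, len(closed), false_pos, detected - len(closed)))
        day += timedelta(days=1)
    return rows

example = [
    {"detected_on": date(2024, 1, 2), "resolved_on": date(2024, 1, 20),
     "disposition": "fixed"},
    {"detected_on": date(2024, 1, 5), "resolved_on": None, "disposition": None},
]
for row in cumulative_flow(example, date(2024, 1, 1), date(2024, 1, 31)):
    print(row)
```

A healthy chart, in the sense described in the talk, is one where the last column (the open gap) flattens out and stays small rather than growing.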
And you could look at this curve for just criticals, and then it would just shift to the left a little bit, because the emphasis is to just resolve the criticals, don't even think about the highs. And they didn't really start working on the highs till this steep curve here. So this one has both criticals and highs being shown, and there are no mediums shown here. This team never got to mediums before I left Comcast.
This is data from Comcast I have permission to show because I've shown it publicly a lot. So I keep saying: don't work on the highs until the criticals are resolved. Don't work on the mediums till the highs are resolved. Let's back up theoretically why that's the case.
The theory is called the theory of constraints. And the idea here, very similar to the weakest-link concept, is that every human process has bottlenecks, and resolving vulnerabilities is a human process.
And if you make an improvement anywhere besides the bottleneck, it's just waste.
The weakest link is the same concept. If you improve the strength of this link, the chain doesn't get any stronger. The only link where improving its strength makes the whole chain stronger is the weakest one. And so I contend that finding vulnerabilities is not the bottleneck; resolving them is. And yet we spend a lot more energy buying new tools to find more stuff, or find stuff better, or find stuff more easily, and rolling them out far and wide without worrying about the resolution curves. We wanna get the tools spread across the environment.
And then we think about resolving it as a later exercise. When you'd be much better off taking a depth-first approach: depth first in terms of you install a tool in one team for one product, and then you expand it to a second product, and then you expand it to another team, then to a whole business unit, then to another business unit, and deploy it that way. But every time you deploy it, you also focus on resolution, not just deploying it. You focus on just the criticals first, and then you focus on just the highs after that, and then you focus on the mediums after that. So it's a depth-first approach rather than a breadth-first approach, and we tend to take a breadth-first approach, to our detriment. And that's one of the key insights to the whole program that I implemented at Comcast that greatly reduced risk.
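The two rollout orders are easy to see side by side in a short sketch. The team names and the deploy_tool / drive_to_clean helpers below are made up purely to illustrate the ordering difference, not to describe any real tooling.

```python
# Illustrative contrast between breadth-first and depth-first rollout.
def deploy_tool(team):
    print(f"deploy scanner for {team}")

def drive_to_clean(team, severity):
    print(f"{team}: get and stay {severity}-clean")

SEVERITIES = ("critical", "high", "medium")

def breadth_first(teams):
    # Tool everywhere first; resolution treated as a later exercise.
    for team in teams:
        deploy_tool(team)
    for severity in SEVERITIES:
        for team in teams:
            drive_to_clean(team, severity)

def depth_first(teams):
    # One team/product at a time: deploy, then resolve criticals, then highs,
    # then mediums, before expanding to the next team or business unit.
    for team in teams:
        deploy_tool(team)
        for severity in SEVERITIES:
            drive_to_clean(team, severity)

depth_first(["team-a", "team-b"])
```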
So about the time I launched the program at Comcast, I wrote this thing called the DevSecOps manifesto, the original one. There was another one that came later.
And, basically, I've kinda drifted away from that a little bit. And I've even drifted away from the term DevSecOps a little bit because it's gotten overloaded and it's, you know, misused. And a lot of people have basically misused it by saying, let's slap some DevOps lipstick on a traditional security pig and call it DevSecOps.
And so I don't actually use the phrase DevSecOps anymore.
Even though my title at Contrast, where I work now and help companies sort of implement this culture, has DevSecOps in it: DevSecOps Transformation Architect.
So even though it's in my title, I still think of it more as developer-centric, or shift left, or shift smart, or, you know, there's not a great term for it, unfortunately. It's all of these things. But all of these things mean three things to me.
One is empowered engineering teams taking ownership of the security of the products that they are building. So, you build it, you run it. You've probably heard that in the DevOps world. You build it, you run it, you secure it: that's DevSecOps to me.
Or you build it, you secure it, you run it maybe if you wanna get the order right.
So they own it. They don't own all of it, though, and they get a lot of help from the security group, just like they get a lot of help from ops people. And SecOps in particular is still gonna be a separate thing for the foreseeable future.
But they own as much of it as possible, and they are worthy of being trusted with that ownership. So that's one. Two is you do it in a DevOps way, and I don't just mean you slap some DevOps lipstick on it. And I don't just mean you bought a CI tool and you started, quote, using the CI tool. You basically follow the concepts that DevOps has put forward, the three ways of DevOps, they're called: flow, which has actually been renamed.
It was originally systems thinking, but flow was shorter, so I think Gene Kim went with flow. But, basically, flow and systems thinking are to think holistically about the risk of the overall system and the work you could do in the overall system. Then feedback: rapid feedback, in-context feedback, rich feedback. And then a culture of experimentation and learning.
So try things and measure how effective they were and then adjust based on that. And don't just go with the policy manual as a dead document that is out there, or trust some third-party list like OpenSAMM or the OWASP framework or PCI or whatever. Basically learn and adapt and evolve. If you do this, you come up with what I call practices.
Oh, by the way, before I move on, the third thing here on this page is never forget that you're building software. You know, and that's the value that the software engineering folks provide to the organization. And that's the bottom line.
It's DevSecOps. It's not SecDevOps. It's not Ops sec dev. It's dev sec ops.
You really have to produce a product. And anything you do that slows that down, it better be slowing it down temporarily and speeding it up later. And this is the way to do it. So these are the practices.
This is an example list of practices that a company engaged with me to help them develop this program, to adopt this transformation blueprint, if you will, might come up with. I'm using an example because I don't want you to simply adopt this one, but I'm gonna use it to describe what a really good set of practices actually looks like, what the characteristics of that are, and even some of the specifics of the way the practices are defined here. I'm gonna talk through it a little bit.
So when I do these workshops to help teams develop their own list, they come up roughly similar, maybe two thirds, three quarters the same in terms of the things that are on the list. And the weightings can be more different than that.
So this is a good, representative list. Let's talk about the characteristics that make this list important. It starts with non-security engineering practices, and there are a couple of reasons for this. First of all, it's not all of the SDLC defined here. It's just four practices from a robust DevOps SDLC.
They're the ones that will make it easier to efficiently and effectively do the security things later, which is why I'm highlighting them here. And they're the ones that, if you're missing them, you can't do this the optimal way. The tendency of security leaders is to say, I need a least-common-denominator policy or set of practices that will work even for the teams that aren't doing great engineering.
But my argument is you can't have great security without great engineering, and you gotta advocate for some minimally great engineering if you're ever really gonna do security effectively. That's the first reason. The second reason is that if you come out as an advocate for great engineering, you put yourself in a different light. Security people put themselves in a different light with the engineering people, and that builds the relationship. A lot of this is psychology and sociology, and that's the key to the difference between success and failure in rolling out a program like this. And so this is one of those things you do that sort of helps with that psychology and sociology aspect of it.
So I've got, you know, working agreements. I've got ephemeral build and test infrastructure. A lot of people have that. They bought it.
How well are they using it? For instance, can they stand up a database in the test infrastructure ephemerally and populate it with enough data to run automated tests? If they can't do that, they're not really effectively gonna get the DevOps benefits that are advertised from DevOps. It’s this whole idea of cloud native providing you with this ability to just instantly stand up a virtual environment.
Now, there's a lot of engineering work that has to go into making this transition from having a dedicated test environment to having an ephemeral one, including databases and message buses and data in those databases. And that's what I'm calling out here: the work that needs to get done to do that. And then, are you running tests in this ephemeral environment? A single test that gates on the pull request is worth a ton, because as soon as you get a single test, then you start to get more tests.
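As one possible illustration of what "ephemeral test infrastructure including a database with data in it" can look like, here is a small pytest sketch using the testcontainers library. The talk doesn't name any specific tooling, so treat the library choice, the table, and the seed data as assumptions; it also requires Docker, SQLAlchemy, and a Postgres driver installed.

```python
# Sketch: spin up a throwaway Postgres for the test run, seed it, and gate on a test.
# Assumes: pip install pytest sqlalchemy psycopg2-binary testcontainers, plus Docker.
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def db_engine():
    # The container exists only for this test session, then is thrown away.
    with PostgresContainer("postgres:16") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        with engine.begin() as conn:
            # Seed just enough data for the automated tests to be meaningful.
            conn.execute(sqlalchemy.text(
                "CREATE TABLE accounts (id int PRIMARY KEY, email text)"))
            conn.execute(sqlalchemy.text(
                "INSERT INTO accounts VALUES (1, 'test@example.com')"))
        yield engine

def test_accounts_seeded(db_engine):
    with db_engine.connect() as conn:
        count = conn.execute(
            sqlalchemy.text("SELECT count(*) FROM accounts")).scalar()
    assert count == 1
```

Wiring a test like this into the pull-request check is what turns it into the "single test that gates on the pull request" mentioned above.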
And anytime we helped someone implement this effectively at Comcast,
within six months, they had eighty percent test coverage running in the pipeline, most of the time. I mean, not every time.
And getting to that eighty percent level is important as well. So those are the four that I call out for prerequisite engineering practices.
Then there's this prioritization, which enables gamification.
So there's a weight on these things. The order is generally the order of dependencies.
Like, you have to do this one before you do that one.
And then the weighting is based on sort of the value, the risk reduction value it might provide.
Or maybe it's the portion of the value that's sort of prework versus the later risk reduction value. It's not science. It's sort of like psychology. We're gonna put this many points on doing this thing, and then you're gonna gamify it.
And by gamify it, I mean you have a leaderboard for each development team. You know, when they adopt a practice, they get that many points, improvement points. The teams with the most improved scores in the last ninety days are on one leaderboard, and the ones with the absolute highest overall scores are on a different leaderboard. And you emphasize the improvement leaderboard, at least at first, but maybe even indefinitely.
And then the absolute one is just to sort of reward people who got there and finished the program and are continuing to slightly improve after that.
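A tiny sketch of the scoring and the two leaderboards might look like the following. The practice names and point values loosely mirror the example list discussed in this talk but are not the actual Comcast scheme.

```python
# Hypothetical practice weights and the improvement-vs-absolute leaderboards.
PRACTICE_POINTS = {
    "critical_clean_third_party": 12,   # open-source (SCA) criticals resolved and blocked
    "critical_clean_first_party": 12,   # code-you-write criticals resolved and blocked
    "high_clean_third_party": 8,
    "high_clean_first_party": 8,
    "ephemeral_test_infra": 6,
    "pr_gating_test": 4,
}

def score(adopted_practices):
    return sum(PRACTICE_POINTS[p] for p in adopted_practices)

def leaderboards(teams):
    """teams maps a team name to the sets of practices adopted now and 90 days ago."""
    improvement = sorted(
        teams,
        key=lambda t: score(teams[t]["now"]) - score(teams[t]["ninety_days_ago"]),
        reverse=True)
    absolute = sorted(teams, key=lambda t: score(teams[t]["now"]), reverse=True)
    return improvement, absolute

teams = {
    "billing": {"now": {"critical_clean_third_party", "pr_gating_test"},
                "ninety_days_ago": {"pr_gating_test"}},
    "video":   {"now": {"critical_clean_third_party", "critical_clean_first_party"},
                "ninety_days_ago": {"critical_clean_third_party",
                                    "critical_clean_first_party"}},
}
print(leaderboards(teams))  # billing tops improvement; video tops absolute
```

Note that nothing in the table awards points for running scanners or finding vulnerabilities; only getting a slice to clean scores, which is the point made above.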
Gamification, there's a lot more to that. I don't have time to go into all that today, but it's really key, and it's really important to do it.
You get no points for running scanning tools. You get no points for finding vulnerabilities.
That is of zero value. In fact, it's probably of net negative value. You only get points if you get to clean for some small slice of the findings, and the small slices are risk prioritized. So we have here critical clean for third party code you import.
So this is SCA. This is open source vulnerabilities. It's just open source vulnerabilities, not SQL injections that your own developers wrote, and it's just the criticals. And that's worth twelve points.
And it's worth twelve points for critical-clean for the code-you-write vulnerabilities, the first-party code vulnerabilities, the SQL injections and cross-site scriptings that your own developers have injected in there. And then you start to work on the highs, and they're worth fewer points. I don't even list the mediums on this example here, but you could, and assign points to them.
So it's this idea that you get and stay clean, and clean involves putting a blocker in place so that you can't ever release it with criticals. Once you get criticals clean, you never release with criticals ever again after that. And that's how you get to this less than one day MTTR that I spoke about earlier.
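One minimal way to picture the "blocking mode" is a pipeline step that fails the build whenever the scan report contains an open finding at a blocked severity. The JSON report shape below is invented for illustration; real scanners each have their own output format and most also have native gating options.

```python
# Minimal release-gate sketch: exit nonzero if any blocked-severity finding is open.
import json
import sys

def gate(report_path, blocked_severities=("critical",)):
    """Fail the build if the scan report contains any finding at a blocked severity."""
    with open(report_path) as f:
        findings = json.load(f)  # expected shape: [{"id": ..., "severity": ...}, ...]
    blocked = [f for f in findings if f["severity"] in blocked_severities]
    if blocked:
        print(f"Blocking release: {len(blocked)} open finding(s) at {blocked_severities}")
        sys.exit(1)  # break the build
    print("Gate passed: clean at", ", ".join(blocked_severities))

if __name__ == "__main__":
    gate(sys.argv[1])
```

Once a team is staying critical-clean, widening blocked_severities to include "high" mirrors the get-and-stay-clean progression described here.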
So how do you sort of organize this? Well, it's gotta be sliced pretty thin. It's gotta have a shallow on-ramp. That's why I separate criticals from highs, and that's why I separate first-party code and third-party code: because I wanna give people something they can completely accomplish in ninety days, consider done, and never fall back on again.
And it has to be small enough that they don't feel like it's too daunting, and then we move on. In fact, in practice, you might slice this even smaller. If there are a hundred highs and they don't think they can get it done in ninety days, then you might say, okay, that's fine.
You've got all the criticals. You have less than one day MTTR for all the criticals going forward. You take the highs for just the first five of the OWASP Top Ten, or the first twelve of the SANS Top Twenty, and you slice it even narrower than that. That shallow on-ramp is really important.
Okay. So how do you get this list? Well, here are the critical elements to hosting a workshop, and I host these. At my job at Contrast, I do these a lot. I do these publicly. I do these with just your organization.
But the critical elements are this. First of all, you have to have the engineering leaders in the room, the three to five most respected engineering leaders. It can't just be the CTO or the VP of engineering. If they aren't actively working with code every day, they're maybe out of touch.
Maybe they're invited, but you gotta also have some of the hands-on folks as well. In fact, it should be dominated by those hands-on folks. It's typically the team leads of the hottest products, the crown jewel products at your organization. These are the ones that have the best tools and the best teams, and everyone wishes they could be like these guys and listens to them, looks up to them.
Why do you have these people in the room? Two reasons. First of all, you'll come out with a better list of practices, this way. You know?
So that's the obvious reason. But the more important and a little more subtle reason is that you're starting the sales process here. If you were to just come out as a security leadership group with a new policy or a new set of practices and say, here, you have to do this, they're likely to ignore it. And the decision to ignore it is gonna be made by going to these three to five most respected people.
They'll ask, hey, you know, is this just another one of those things we can sort of let die and not actually pay much attention to unless we get harassed about it? Or do we really adopt this? And these three to five leaders are gonna say, like I said, I was in the room when we created that.
And it is really good. It is really the right way to do security: the developer or engineering way to do security, not the security way to do security. And, man, I was there to help make sure that was the case. Now, if you pre-draft it, that doesn't happen.
And so you gotta enter the room with a blank slate. You can have in your head, if you're a security leader, what you think should be on the list. But you can't bring a draft of the practice list into the room.
You have to create it with post-it notes. And the reason for that is that you get more out of everyone, and you don't get groupthink, where one person says something and everyone goes, yeah, that's pretty good, I don't really feel like arguing why it's not perfect, I'll just roll with it.
You don't want that to happen. You get everyone to work independently, each person coming up with their own five favorite practices, and then you have them all put them up on the board and you organize them. That way you don't get any groupthink, everyone's wording is different, the terminology usage is different, and you get the conversations as you start to do the grouping, and that's where all the magic happens.
That's where the great practice list comes out of that.
You end up with mindset shifting. These conversations lead to mindset shifting and blind-spot revealing and alignment, and that is hugely valuable. That's probably the most valuable aspect of hosting this workshop. You also get this weighted list of practices.
And then it's written in language that is acceptable to, well understood by, and unambiguous to a developer. I remember, you know, there was a policy that talked about known vulnerabilities.
And I asked people in the security group who wrote the policy manual what was meant by known vulnerabilities, and they had different answers. And so then you went to the developers and you asked them, and they had different answers. And so how do you actually enforce a policy if you have ambiguity of terms?
So you don't use that phrase, or if you do, you define it clearly there. That's one example, but there are a lot of things like break the build. That one comes up a lot. Like, what does break the build actually mean? So you have to actually define these things more carefully and use language that's explicit and really gets it accurately right.
You don't just say the workshop's over and we're done. This is a living and breathing thing. In particular, the first couple weeks afterwards, you're testing this list out by coaching teams, with real folks. We'll talk about coaching briefly in a minute because I'm running out of time here.
And then you keep having to sort of tweak it over time. Maybe you get to the point where you only change the list itself or the weightings about once a year, which is the point we got to at Comcast. But you're tweaking the documentation behind the bulleted list of practices all the time, you know, with examples and links to architecture, security architecture, working code, libraries, etcetera. All of that gets built out over time and constantly gets tweaked. So you have this list of practices because you hosted this workshop.
You tested it out briefly. How do you actually roll it out? You roll it out with coaching.
And coaches are not necessarily security experts just like Ted Lasso was not a soccer expert when he went to coach a soccer team. He was an American football expert, expected to fail. But Ted didn't fail because he knew about getting more out of people, getting them to work well together. And that's the role of the coach.
It's very hard to get people who have been doing vulnerability management to step into this role effectively. They're used to calling someone's baby ugly all day, and the people you're speaking to are used to being told by them, all day, that their baby is ugly. It's very hard to get that trusting coaching relationship going there.
I hired scrum masters to fill the first few of these roles. I later actually hired an auditor and a few other different types from across the organization, some of whom you could argue were doing vulnerability management. But that was later in the program, when it became clear, and they even realized, that the way they had been doing vulnerability management was destructive and not productive, and the new way I was pushing was taking its place. As their old jobs were going away, some of those people did come over and become coaches at Comcast.
One of the principles of coaching is that there are no red marks for the current state of maturity, the current adoption rate. The only thing you get dinged for is failing to improve, not even trying.
And you get this commitment from engineering leadership upfront when you start to roll out the program.
We are gonna ask every team to adopt one to three of these practices every ninety days. So it takes about a year and a half to adopt all of it starting from zero.
But are you okay with that? Will you help us advertise that? Will you set that expectation that, you know, even if it takes away a little velocity from feature work, they are to adopt one to three of these practices every ninety days?
And that's all you get dinged for: failing to improve above a certain threshold in a given ninety-day period. Once you get to, you know, eighty percent of the points, then it stops being something you would even get dinged for, because the last twenty percent is harder to get and less valuable than the first eighty.
Okay. So, I don't have time now to go into the coaching philosophy in detail, but it is well thought out. And I just moved through these slides quickly.
Notice there are no red marks for the adoption maturity. It's just shades of green. We had red, amber, green, and we found that people tended to shade the truth when something would show up as amber or red on the board. So we just shifted it to shades of green.
You host workshops as coaches to do this, and there are some key people that have to be in the room. The business people have to be in the room to do them, and there's a process for hosting the workshops. There's some tooling you can use. One of my open source projects is called transformation dot dev blueprint. You can go to transformation dot dev and sign up for the beta if you want.
But it has this way to put in this list of practices in a way to sort of host the coaching sessions and a way to visualize the output. And this is sort of example screenshots for an earlier version of that.
You have to coach each team individually. You can't just say, all of engineering, we're gonna declare that all of you have adopted this practice, because it doesn't work that way. Even within a single business unit, you know, two sister teams that work closely together can have very different maturity and tech stacks and the whole nine yards.
And so the first answer I get to this is that it doesn't scale. It scales beautifully, because this coaching model is very much workshop driven, and they're ninety minutes for the first one and sixty minutes for the follow-on ones, and you only have to host them once every ninety days. So a single coach can handle, theoretically, a hundred teams. At Comcast, typically once a coach got above seventy-five teams, we started hiring new coaches to get that back down again.
So we never had anyone consistently above a hundred teams in their domain. But we got, you know, in the seventy-five to a hundred range for everybody. We tried to stay in that sort of range, and it worked. They had enough time to pay attention to those teams and keep them going.
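A quick back-of-the-envelope check makes the capacity claim plausible. The numbers below are assumptions taken from the figures mentioned above (roughly one hour of workshop per team per quarter), not an exact accounting of a coach's calendar.

```python
# Rough arithmetic on coach capacity, under assumed numbers.
teams_per_coach = 100           # the theoretical ceiling mentioned above
hours_per_team_per_quarter = 1  # one ~60-minute workshop per team per quarter
working_hours_per_quarter = 13 * 40

workshop_hours = teams_per_coach * hours_per_team_per_quarter
print(workshop_hours / working_hours_per_quarter)
# ~0.19: roughly a fifth of a coach's quarter in workshops, leaving the rest
# for prep, follow-up, and tweaking the practice documentation.
```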
And different teams are at different stages in the process. So, you know, when they're a year into the program and they've done four quarterly workshops, they pretty much know the routine and can do a lot of it on their own. I mentioned the transformation blueprint. Now I'm gonna mention Contrast.
When I left Comcast, I had a choice of where to go. We had a dozen or so tool vendors, all the top-name SAST vendors you can think of, like Checkmarx and Veracode and AppScan and Coverity, you name it. We had them at Comcast.
And I chose Contrast. I got job offers from most of them. I chose Contrast because the teams at Comcast that had been using Contrast were the most successful.
And so I wanted to come to a company that basically fit with that vibe. I don't have too much time to go into what Contrast is, but it's basically one agent-based tool, very much like an APM agent, except you use it pre-prod for most people most of the time. So it's not just runtime in production. It's runtime during testing.
And that's why automated testing is so important to have, not just for quality reasons, but for security reasons. So you can replace your SAST and DAST and SCA tools. But it also has production-oriented things. We can block attacks, and we can give you, you know, the blast radius for an attack.
And we can also give you a reverse-engineered security blueprint for an attack that's going on: what databases are involved and what kind of data is in those databases. So we have all these different things in our product. It's a great fit for this model that I just described.
So that's all I have for you today. Please, hit me up for questions. Connect with me on LinkedIn and send them directly.
You can also ask for a demo there, or even ask to schedule a transformation workshop. There's no charge for that first workshop.
It's half a day, or spread out over a week. And I do these all the time for people that maybe don't even buy Contrast in the end, although a lot of them do. That's why Contrast continues to offer the first workshop for free to prospects, and they continue to pay me to work there. So thank you.
About the author:
Larry Maccherone is a thought leader on DevSecOps, Agile, and Analytics. At Comcast, Larry launched and scaled the DevSecOps Transformation program over five years. In his DevSecOps Transformation role at Contrast, he's now looking to apply what he learned to guide organizations with a framework for safely empowering development teams to take ownership of the security of their products. Larry was a founding Director at Carnegie Mellon's CyLab, researching cybersecurity and software engineering. While there, he co-led the launch of the DHS-funded Build-Security-In initiative. Larry has also served as Principal Investigator for the NSA's Code Assessment Methodology Project which wrote the book on how to evaluate application security tools, and received the Department of Energy's Los Alamos National Labs Fellow award.
Secure your apps and APIs from within
Schedule a one-to-one demo to see what Contrast Runtime Security can do for you.