When I'm interviewing potential candidates for our Developer jobs at Wildfire, one of my favorite discussions is the question "How do you define DevSecOps?" A quick search of Hacker News will show you myriad definitions; almost every company has its own unique set of responsibilities and roles. This question helps me ensure a common vision between my leadership goals and my teams' understanding of their mandate.
Personally, I define DevSecOps as the intersection of three pillars that support the greater goals of the organization as it grows and builds.
Pillar 1: Efficiency
Developers can be a finicky bunch. They want tools that are simple to use but perform incredibly complex tasks. I often describe my team as a JIT (Just In Time) application for our developers. We should be accomplishing our tasks immediately before the devs need them, ideally identifying these needs before the devs even notice them. Proactive problem solving makes the development experience a joy, rather than a chore. This is the difference between a team meeting where developers say they are blocked by tooling and one where DevSecOps brings new functionality to solve a problem the developers had not yet anticipated.
Just In Time
To avoid wasted effort by either the core development team or our team, DevSecOps must be deeply involved in planning, particularly for greenfield or large projects. Visibility into these major changes allows the necessary tooling to be built in time to avoid blocking development. However, this work must be balanced against other DevSecOps team goals, and the tooling should be delivered when the development team is ready to accept it.
If we deliver too early, we risk slow adoption or pre-judgement by the team. There is also a risk that we misunderstood the need and built the wrong tool. If we deliver too late, development will be blocked waiting on tooling.
Timing in software is notoriously hard and no team will ever get this perfect. As our primary customers are internal, we have a tremendous advantage in planning and scheduling: we can talk directly to our customers and get feedback quickly, all while potentially delivering incomplete solutions that are "good enough."
The DevSecOps team can be thought of almost as a startup within the organization that will be rapidly iterating prototypes to best serve their customers. We should consistently strive to be as precise as possible when delivering our products and ensure that they meet the acceptance criteria, but they can often have only a "happy path" and tremendously simple (or ugly) user experiences, especially for first delivery.
In addition to JIT delivery, DevSecOps should strive to be force-multipliers within the organization. Every hour of effort we exert should return multiple hours to our development teams. When we automate a manual build process, that results in many hours returned to devs over the coming months. When we write a tool to take five steps that a developer might always perform in reaction to an application error, this allows a developer to focus on the core research of root cause analysis instead of toiling on tasks that do not add useful information.
It is easy for DevSecOps to become viewed as a cost sink within the organization, as we do not deliver user-facing products that can be sold. In reality, a properly functioning DevSecOps team will be a major driver of productivity within the engineering group as a whole. It removes inefficient processes and replaces them with automation or tooling, giving developers back time and thought cycles that are better spent creating new, exciting features for the organization's customer base.
Pillar 2: Security
As I am sure you noticed, I inserted "Sec" into the more traditional DevOps role title. Security in modern software stacks is critically important as supply chain attacks, vulnerabilities, and data breaches become more common. A core function of our job, then, is defining security guardrails and access control to ensure safety of our data, our customers' data and our software stack.
As we build toward a zero trust architecture, we continue to improve the security of our systems every day. The examples below only scratch the surface of current security best practices, but they give some context to the concerns we balance.
Modern infrastructure-as-code tooling allows DevSecOps teams to manage many of the roles and access privileges that would historically have been handled by IT teams. It also provides a significantly more auditable and reliable configuration: manual clicking in web interfaces is replaced with code definitions that can be approved before they are applied, and the change itself is carried out by automation.
Deeply integrated into a secure software stack is automated and manual vulnerability detection. From code scanning and dependency checking with repository tooling like GitHub's Dependabot to container image scans on upload to cloud providers, the tooling to proactively detect known vulnerabilities has never been better. Taking advantage of these tools can mean catching a vulnerability before it can be leveraged into a massive data breach, which could be catastrophic for the organization.
A team that is in tune with current news about security vulnerabilities is also critical to successful security practices. Knowing which CVEs are applicable and which parts of the software stack must be adjusted to mitigate them is crucial.
Rather than waiting on a data ransom request or serious breach report, we must continually monitor logging and other tooling to ensure threat actors are detected as soon as they enter a system they are not authorized to access. A violation can be as simple as a developer whose permissions are too broad when compared to policy or as complex as a talented hacker breaking through multiple defense layers to gain access to sensitive protected data. Each violation of security boundaries must be thoroughly investigated and the hole must be swiftly closed.
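To make the "permissions too broad when compared to policy" case concrete, here is a minimal sketch of what such a check could look like. The role names, permission strings and the audit_grants function are all hypothetical and invented for illustration; a real audit would pull grants and policy from your identity provider or infrastructure-as-code definitions.

```python
# Hypothetical policy: the permissions each role is allowed to hold.
POLICY = {
    "developer": {"repo:read", "repo:write", "logs:read"},
    "analyst": {"logs:read"},
}

# Hypothetical current state: user -> (role, permissions actually granted).
GRANTS = {
    "alice": ("developer", {"repo:read", "repo:write"}),
    "bob": ("developer", {"repo:read", "prod-db:admin"}),
    "carol": ("analyst", {"logs:read"}),
}


def audit_grants(
    grants: dict[str, tuple[str, set[str]]],
    policy: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per user, any granted permissions that the policy does not allow."""
    findings = {}
    for user, (role, granted) in grants.items():
        # An unknown role is allowed nothing, so every grant is flagged.
        allowed = policy.get(role, set())
        excess = granted - allowed
        if excess:
            findings[user] = excess
    return findings


if __name__ == "__main__":
    for user, excess in audit_grants(GRANTS, POLICY).items():
        print(f"{user} exceeds policy: {sorted(excess)}")
```

Run on a schedule, a check like this turns an overly broad grant into a routine finding instead of something discovered during an incident.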
Pillar 3: Reliability
The biggest concern of any cloud-first business is often reliability. While this goes hand-in-hand with security in many ways, it is also a separate concern encompassing downtime for maintenance, user-impacting bugs and capacity planning. No system is 100% reliable, which is why many SaaS platforms promise "3 9's" or "5 9's" of uptime (99.9% or 99.999%, respectively) as an indicator of high reliability, while still allowing for service outages when no other alternative exists.
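It can help to translate those percentages into an actual downtime budget. The sketch below is only arithmetic, assuming a 365-day year and a helper name (downtime_budget_minutes) of my own choosing; real SLAs often measure over a month or quarter and exclude scheduled maintenance.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_budget_minutes(
    availability_percent: float,
    period_minutes: float = MINUTES_PER_YEAR,
) -> float:
    """Minutes of downtime allowed in the period at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    for label, target in (("three nines", 99.9), ("five nines", 99.999)):
        budget = downtime_budget_minutes(target)
        print(f"{label} ({target}%): {budget:.2f} minutes per year")
```

Three nines works out to roughly 8.76 hours of allowable downtime per year, while five nines leaves only a little over five minutes.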
Alerting & Monitoring
To keep availability high, we must continuously monitor critical infrastructure pieces and alert the right group(s) when there is an issue in need of human intervention. Many modern software stacks are self-healing and don't need to fire an alert every time a memory bug is hit, for instance, but we still have a strong need for investigation and mitigation in the event of a fundamental drop in capacity or unrecoverable bug leading to continuously crashing software pieces.
When alerts of a certain importance fire, the incident response plan is integral to a common and predictable mitigation effort. Well-defined playbooks and clear lines of communication, along with the ability to "break the glass" and take high-impact action outside of normal access control rules when required, enable response teams to react to emergencies quickly and effectively, with the proper authority and plans.
As the organization grows, one of the primary concerns is maintaining proper system capacity while balancing cost. When capacity cannot keep up with load, reliability suffers for some subset of users at best, or for all users at worst in the event of a cascading failure. As the primary team tasked with reliability, DevSecOps must consistently monitor capacity thresholds and provision additional resources before users experience issues. On the flip side, when capacity demands drop, we must ensure that we are not paying for idle resources and that they are decommissioned and removed from the stack.
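The capacity balancing act can be sketched as a simple threshold rule. Everything here is an assumption for illustration: the CapacityPolicy type, the 75% and 30% utilization thresholds, and the one-instance-at-a-time step are placeholders, not recommendations, and most cloud providers offer autoscaling that does this for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPolicy:
    """Hypothetical thresholds; tune these to your own workload."""
    scale_up_above: float = 0.75    # add capacity above 75% utilization
    scale_down_below: float = 0.30  # remove capacity below 30% utilization
    min_instances: int = 2          # never drop below this floor


def recommend_instances(
    current: int,
    utilization: float,
    policy: CapacityPolicy = CapacityPolicy(),
) -> int:
    """Return the recommended instance count for the observed utilization (0.0 to 1.0)."""
    if utilization > policy.scale_up_above:
        # Provision before users feel the load.
        return current + 1
    if utilization < policy.scale_down_below and current > policy.min_instances:
        # Decommission idle capacity, but keep the minimum for redundancy.
        return current - 1
    return current


if __name__ == "__main__":
    for count, load in ((4, 0.90), (4, 0.50), (4, 0.10), (2, 0.10)):
        print(f"{count} instances at {load:.0%} load -> {recommend_instances(count, load)}")
```

The gap between the two thresholds is deliberate: without it, a system hovering near a single cutoff would add and remove capacity on every check.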
Wrapping Up: Balancing the Weight
This article has been a verbose way of saying DevSecOps needs to have significant expertise in improving efficiency, maintaining security, and ensuring reliability. However, there's a bigger picture to see here, and that is the need to properly balance each of these pillars.
If you focus too much on one pillar, the others will topple. You can imagine a scenario where you lock down everything from a security standpoint: no machines can be accessed from the outside, they cannot talk to each other and no one can log in. Even better, turn all the computers off. This system has perfect security. However, developer efficiency and system reliability will immediately drop to zero.
This is obviously a contrived example, but it illustrates how intertwined the three pillars are. If a team spends too much time focusing on developer efficiency, they may lose sight of the need to maintain security. DevSecOps' position within the organization as force-multipliers also extends in the negative direction. If we fail to do our job and lose focus on the balance, our errors will be magnified many times over.
If striking this balance is something you're passionate about, check out our open jobs.