DevOps quality metrics
As a DevOps engineer or manager, you already know the tremendous potential of DevOps and Agile methodologies. They streamline the software development process and make it more efficient, allowing your team to deliver high-quality software quickly. However, to truly unlock the power of DevOps, tracking the right metrics and continuously monitoring your team's and product's performance is crucial.
In this blog post, we discuss two sets of DevOps metrics, informed mainly by the book "Accelerate: The Science of Lean Software and DevOps" and the DORA (DevOps Research and Assessment) program, with reference to other well-regarded resources where necessary. These metrics will help you monitor various aspects of your DevOps process, including cost, code quality, and technical debt, so that you can make data-driven decisions and improve your team's performance.
Accelerate and DORA-inspired Metrics
The book "Accelerate" and the DORA program inspired the first set of metrics. These metrics focus on the core aspects of the DevOps process, such as deployment frequency, lead time, and incident management.
- Deployment Frequency: Monitoring deployment frequency allows your team to identify bottlenecks in the workflow and optimise the delivery process. Frequent deployments can result in faster feedback loops and reduced lead time.
- Lead Time: Measuring lead time helps you evaluate the efficiency of your development process from idea to production. Shorter lead times indicate faster response to user feedback and reduced time-to-market.
- Mean Time to Recovery (MTTR): Tracking MTTR lets you measure your team's ability to recover quickly from incidents or failures. A shorter MTTR indicates better incident management and a more resilient system.
- Change Failure Rate: Monitoring the change failure rate helps you assess the stability of your application. A low change failure rate suggests that your team can deploy quickly and regularly without compromising application stability.
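As a minimal sketch, the four metrics above could be computed from a list of deployment records along the following lines. The `Deployment` record shape, its field names, and the `dora_metrics` function are hypothetical and only serve as an illustration; in practice the timestamps would come from your CI/CD and incident tooling, and lead time here is measured from commit to production.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: float) -> dict:
    """Compute the four metrics over a reporting period of `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Recovery time is counted from the failed deployment to service restoration.
    recoveries = [d.recovered_at - d.deployed_at
                  for d in failures if d.recovered_at is not None]

    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mttr": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }


# Illustrative data: two deployments in one day, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(committed_at=t0, deployed_at=t0 + timedelta(hours=4)),
    Deployment(committed_at=t0, deployed_at=t0 + timedelta(hours=8),
               failed=True, recovered_at=t0 + timedelta(hours=9)),
]
metrics = dora_metrics(sample, period_days=1)
```

Returning `None` for MTTR when no failure has been recovered yet avoids reporting a misleading zero.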
Additional DevOps KPIs
The second set of KPIs offers a broader perspective on your DevOps process, covering aspects like code quality, infrastructure efficiency, and user experience.
- Code Churn: Tracking code churn helps identify planning, code quality, and team stability issues. High code churn may indicate a need for better code review processes or more effective communication within the team.
- Code Coverage: Monitoring code coverage shows how much of your codebase is exercised by automated tests. Higher coverage can help catch defects earlier, although coverage alone does not guarantee that the tests are meaningful.
- Infrastructure Utilisation: Measuring infrastructure utilisation helps you optimise your infrastructure costs by identifying over-provisioned or under-utilised resources.
- System Usability Scale (SUS): Tracking the SUS score provides insights into the perceived usability of your software, which can directly impact user satisfaction and adoption.
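To make the SUS entry concrete, here is a minimal sketch of the standard SUS scoring rule, which this article does not spell out: each respondent answers ten items on a 1 to 5 scale, odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function names are hypothetical.

```python
def sus_score(responses: list[int]) -> float:
    """Score one respondent's ten SUS answers (each 1..5) on a 0..100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly ten items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each answer must be between 1 and 5")

    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1   # odd (positively worded) items
        else:
            total += 5 - answer   # even (negatively worded) items
    return total * 2.5


def average_sus(all_responses: list[list[int]]) -> float:
    """Average SUS score across respondents, as tracked in the KPI."""
    if not all_responses:
        raise ValueError("need at least one respondent")
    return sum(sus_score(r) for r in all_responses) / len(all_responses)


best_case = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])   # 100.0
neutral = sus_score([3] * 10)                           # 50.0
```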
Core metrics
Metric | Category | Description | Estimation |
---|---|---|---|
Deployment Frequency | Deployment & Change | Measures how often new features or capabilities are launched. | \(\text{DF} = \frac{\text{Number of Deployments}}{\text{Time Period}}\) |
Lead Time for Changes | Deployment & Change | Measures the time it takes for a change to go from code committed to code successfully running in production. | Time(production) - Time(commit) |
Change Failure Rate | Deployment & Change | Refers to the extent to which releases lead to unexpected outages or other unplanned failures. | Failed changes/total changes |
Mean Time to Recovery (MTTR) | Detection & Recovery | Measures the time it takes to address the problem and get back on track once a failed deployment or change is detected. | Sum(recovery times)/number of incidents |
Defect Volume | Quality & Defects | Focuses on the actual volume of defects. | Total number of defects |
Availability | Availability & Compliance | Highlights the extent of downtime for a given application, measured as complete (read/write) or partial (read-only) availability. | Uptime/(Uptime + Downtime) |
Service-Level Agreement (SLA) Compliance | Availability & Compliance | Measures compliance with service-level agreements (SLAs) between providers and clients. | SLA goals met/total SLA goals |
Unplanned Work Rate (UWR) | Work Management | Measures the time dedicated to unexpected efforts in relation to time spent on planned work. | (Unplanned Work Time/Total Work Time)*100 |
Rework Rate (RWR) | Work Management | Measures the share of tasks that had to be reworked to address issues brought up in tickets. | Reworked tasks/total tasks |
Customer Satisfaction | Customer Focus | Measures customers' satisfaction level with the software product or service. | Average customer satisfaction score |
Employee Satisfaction/Engagement | Culture & People | Measures the level of satisfaction and engagement of the employees involved in the development, deployment, and maintenance processes. | Average employee satisfaction score |
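Several of the estimations in the table above are simple ratios. The sketch below shows how a few of them might be expressed; the function names and the choice of hours as the time unit are assumptions made for illustration.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Divide safely, rejecting an empty or negative denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Uptime / (Uptime + Downtime), as a fraction between 0 and 1."""
    return ratio(uptime_hours, uptime_hours + downtime_hours)


def sla_compliance(goals_met: int, goals_total: int) -> float:
    """SLA goals met / total SLA goals, as a fraction between 0 and 1."""
    return ratio(goals_met, goals_total)


def unplanned_work_rate(unplanned_hours: float, total_hours: float) -> float:
    """(Unplanned Work Time / Total Work Time) * 100, as a percentage."""
    return 100 * unplanned_hours / total_hours if total_hours > 0 else ratio(0, total_hours)


def rework_rate(reworked_tasks: int, total_tasks: int) -> float:
    """Reworked tasks / total tasks, as a fraction between 0 and 1."""
    return ratio(reworked_tasks, total_tasks)


# Illustrative numbers only.
month_availability = availability(uptime_hours=719, downtime_hours=1)
sprint_uwr = unplanned_work_rate(unplanned_hours=8, total_hours=40)   # 20.0
sprint_rework = rework_rate(reworked_tasks=3, total_tasks=12)         # 0.25
```

Note that the table mixes fractions and percentages; whichever you choose, reporting it consistently makes trends easier to compare.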
KPIs that can be useful for a DevOps team to measure
Additional DevOps metrics, KPIs (Key Performance Indicators), and OKRs (Objectives and Key Results) that can be useful for a team to measure:
Metric | Category | Description | Estimation |
---|---|---|---|
Work in Progress (WIP) | Work Management | Measures the number of tasks or items currently progressing in development. Helps identify bottlenecks and optimise workflow. | Number of tasks in progress |
Code Churn | Code Quality | Measures the amount of code that is added, modified, or deleted over a specific period. High code churn may indicate planning, code quality, or team stability issues. | (Lines added + Lines modified + Lines deleted)/time period |
Code Coverage | Code Quality | Measures the percentage of code that is executed by automated tests. Coverage reports help identify areas that need more testing. | (Lines covered by tests/total lines of code) * 100 |
Test Execution Time | Testing Efficiency | Measures the total time taken to execute a complete set of tests. It helps identify slow tests and areas where test optimisation is needed. | Total time for test execution |
Automated Test Pass Rate | Testing Efficiency | Measures the percentage of automated tests that pass during a test run. A high pass rate indicates higher test reliability and quality. | (Number of tests passed/total tests) * 100 |
Mean Time Between Failures (MTBF) | Reliability | Measures the average time between system or application failures. A higher MTBF indicates greater system reliability. | Sum(time between failures)/number of failures |
Incident Response Time | Incident Management | Measures the time it takes for the team to respond to an incident or issue reported. A shorter response time indicates better incident management. | Time(response) - Time(incident reported) |
Incident Resolution Time | Incident Management | Measures the time it takes for the team to resolve an incident or issue after it has been reported. A shorter resolution time indicates better incident management. | Time(resolution) - Time(incident reported) |
Deployment Rollback Rate | Deployment & Change | Measures the percentage of deployments that need to be rolled back due to issues or failures. A lower rollback rate indicates better deployment quality. | (Number of rollbacks/total deployments) * 100 |
System Usability Scale (SUS) | User Experience | Measures the perceived usability of the software or application. A higher SUS score indicates a better user experience. | Average SUS score |
Infrastructure Utilisation | Infrastructure Efficiency | Measures the utilisation of infrastructure resources, such as CPU, memory, and storage. Helps identify over-provisioned or under-utilised resources and optimise infrastructure costs. | (Resource usage/total available resources) * 100 |
These additional metrics can provide valuable insights into the performance of a DevOps team and help identify areas for improvement. Keep in mind that the relevance of each metric may vary depending on your organisation's specific goals and context. It's important to focus on the metrics that are most relevant to your team and use them to drive continuous improvement in your DevOps processes.
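As a rough illustration of the time-based and percentage-based estimations in the table above, the following sketch computes MTBF, incident response and resolution times, and a generic percentage that fits rollback rate, test pass rate, code coverage, and infrastructure utilisation. All function names are hypothetical. One assumption to flag: MTBF is averaged over the gaps between consecutive observed failures, which is one common reading of "Sum(time between failures)/number of failures".

```python
from datetime import datetime, timedelta


def percentage(part: float, whole: float) -> float:
    """(part / whole) * 100, e.g. rollbacks per total deployments."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


def mtbf(failure_times: list[datetime]) -> timedelta:
    """Mean gap between consecutive failures (assumes at least two failures)."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def incident_times(reported_at: datetime,
                   responded_at: datetime,
                   resolved_at: datetime) -> tuple[timedelta, timedelta]:
    """Return (response time, resolution time), both measured from the report."""
    return responded_at - reported_at, resolved_at - reported_at


# Illustrative data only.
start = datetime(2024, 1, 1)
failures = [start, start + timedelta(days=10), start + timedelta(days=30)]
mean_gap = mtbf(failures)                                     # 15 days
rollback_rate = percentage(part=3, whole=60)                  # 5.0
response, resolution = incident_times(
    reported_at=start,
    responded_at=start + timedelta(minutes=15),
    resolved_at=start + timedelta(hours=2),
)
```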
Conclusion
Tracking these DevOps metrics will give you valuable insights into your team's performance, allowing you to make data-driven decisions and continuously improve your processes. By monitoring aspects like cost, code quality, and technical debt, you can ensure that your team delivers high-quality software rapidly, leading to increased customer satisfaction and a competitive edge in the market.
So, go ahead and start measuring the metrics that fit your environment today to improve your DevOps performance and unlock your team's and product's full potential.