
Understanding "Lead Time for Changes" Correctly


Introduction

There is a book called "Accelerate: The Science of Lean Software and DevOps". This book introduces metrics for measuring software delivery performance, which are often referred to on the internet as the "Four Keys"[1].

  • Lead time for changes
  • Deployment frequency
  • Time to restore service
  • Change failure rate

These are the four metrics (I have listed them here using the terms from the book). In this article, I will focus on "Lead time for changes".

I am choosing this because interpretations of "Lead time for changes" seem to vary widely on the internet. By understanding what "Lead time for changes" means in the original source, "Accelerate," let's move past the state of "I don't really get it; we're just measuring the Four Keys because that's what everyone does."

What is Lead Time for Changes?

It is the "time it takes from when the development of a feature is completed[2] until it is deployed to the production environment."

In other words, it is the lead time taken for "delivery." This lead time becomes longer if, for example, there are manual tests or manual deployments.
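As a minimal sketch of what this measurement looks like, suppose (as footnote 2 suggests) that "development complete" means the moment a Pull Request is merged. All timestamps below are made up for illustration:

```python
from datetime import datetime
from statistics import median

# Hypothetical records: merged_at = when the change's Pull Request was
# merged (development considered complete), deployed_at = when the
# change reached the production environment.
changes = [
    {"merged_at": datetime(2024, 5, 1, 10, 0), "deployed_at": datetime(2024, 5, 1, 14, 30)},
    {"merged_at": datetime(2024, 5, 2, 9, 15), "deployed_at": datetime(2024, 5, 3, 11, 0)},
    {"merged_at": datetime(2024, 5, 4, 16, 40), "deployed_at": datetime(2024, 5, 4, 17, 5)},
]

# Lead time for changes covers delivery only; development time never enters.
lead_times = [c["deployed_at"] - c["merged_at"] for c in changes]

# Report the typical (median) value so one slow deploy does not dominate.
print(median(lead_times))  # -> 4:30:00
```

Manual testing or manual deployment steps show up directly as a larger gap between the two timestamps.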

The discussion here is along the lines of "let's automate testing" and "let's make deployment a one-click, fully automated process," because otherwise performance will be low.

It does not include the "time spent on development."

Why "Time Spent on Development" is Not Measured

On page 21 of the book, it is written as follows:

In terms of the time it takes to design and validate products or features, in many cases, it is not clear when measurement of the time required should begin, and there is also the problem of very high variability.

What it says makes a lot of sense.

I can strongly relate to the point that "it is not clear when measurement of the time required should begin."

Thinking simply, the commit date and time comes to mind, but that alone excludes the time spent before the actual commit thinking about how to implement the feature, which can feel insufficient.

There may also be opinions that the time spent considering functional requirements (what kind of feature it should be) should be included, or that measurement should start from the point when a request is received from a customer.

It is stating that it is "not clear" in those respects.

I can also agree with the point regarding "very high variability."

The time it takes for coding varies greatly depending on the difficulty of the feature and the skill of the developer. Therefore, simply measuring time is highly likely not to lead to a meaningful evaluation.

If this were included, a team building a very simple app (for example, a simple ToDo list app) would easily get a high evaluation, and it would not necessarily be possible to say that the team is excellent.

If the development difficulty of all features were the same, it might be possible to evaluate the excellence of engineers by measuring coding time, but realistically, such a situation is impossible.

"Accelerate" surveyed a large number of companies and teams, over 2,000 in total, and its analysis of that data shows that teams with specific capabilities have higher software delivery performance.

And it explains that those capabilities are practicable in many companies and teams. Therefore, metrics that depend heavily on the difficulty of the software or the excellence of the team's engineers are not appropriate as performance evaluations.

For the above reasons, "time spent on development" is not included in performance measurement.

Does Measuring "Time Spent on Development" Have No Meaning?

I don't think it goes that far.

For example, when introducing AI like GitHub Copilot, there should be a need to understand how much time was saved. In such cases, measuring development time can be a useful metric.

However, as mentioned earlier, it is important to note that it is heavily influenced by the difficulty of the feature being developed and the skill of the engineer. Without a proper understanding of these factors, it could lead to incorrect evaluations.

Why are the Interpretations of Lead Time Fragmented?

I believe the cause lies in the explanation provided by DORA.

I myself used to wonder what the "Four Keys" were after hearing about them frequently. Apparently, development productivity[3] can be measured using the Four Keys. And the Four Keys were supposedly proposed by Google's DORA. If that's the case, I thought it would be best to refer to the original materials if I wanted to understand them accurately.

Then I came across an article titled "Using the Four Keys to measure your DevOps performance", which explains it as follows:

Lead time for changes — the amount of time it takes a commit to get into production

Since it says "a commit," you might wonder, "Which commit, exactly?" I thought so too.

Since I wasn't sure, I searched the internet for the correct understanding and noticed that even among articles explaining the Four Keys, interpretations of which commit is meant vary, and many articles do not explain it clearly.

The term "First commit"[4] also appears frequently. Perhaps influenced by it, some articles explain the metric as "the time from the first commit to deployment."

DORA assumes trunk-based development. Under that assumption, a deployment flow automatically runs for commits made to the repository, and if it passes unit tests and automated acceptance tests, it is deployed to production. In other words, DORA assumes a state where "the CI/CD pipeline runs for every commit and is deployed as soon as it passes," which is why they express it as "the amount of time it takes a commit to get into production."

However, many development teams likely follow a different process: cut a feature branch, accumulate commits on it over several days, have them reviewed in a Pull Request, merge, and only then run the deployment flow. For such teams, DORA's explanation is hard to picture.
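To make the gap between the two readings concrete, here is a sketch with made-up timestamps for a single change developed on a feature branch. "First commit to deploy" folds days of development time into the number, while merge-to-deploy isolates the delivery portion that "Accelerate" intends to measure:

```python
from datetime import datetime

# Hypothetical feature-branch timeline for one change.
first_commit_at = datetime(2024, 5, 1, 9, 0)   # first commit on the feature branch
merged_at       = datetime(2024, 5, 3, 15, 0)  # Pull Request merged (dev complete)
deployed_at     = datetime(2024, 5, 3, 16, 0)  # pipeline finishes; change in production

# Interpretation seen in some articles: measure from the first commit.
print(deployed_at - first_commit_at)  # -> 2 days, 7:00:00

# Interpretation in "Accelerate": measure from "development complete".
print(deployed_at - merged_at)        # -> 1:00:00
```

Under trunk-based development, the two readings coincide (every commit is "development complete"), which is presumably why DORA's one-line definition does not distinguish them.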

I speculate that this discrepancy in understanding is one of the reasons why the interpretations of lead time have become fragmented.

Summary

I have explained "Lead time for changes," one of the four metrics for measuring the software delivery performance of development organizations introduced in "Accelerate: The Science of Lean Software and DevOps."

In "Accelerate," "Lead time for changes" refers to the time it takes from when the development of a feature is completed until it is deployed to the production environment. It does not include the "time spent on development."

Also, let's not forget that the Four Keys are, first and foremost, metrics for measuring software delivery performance and are not intended to measure development productivity.

Footnotes
  1. Incidentally, the term "Four Keys" does not appear in the book. However, since it is commonly used in Japan, I will refer to it as the "Four Keys" here as well.

  2. For example, when developing on GitHub, it is reasonable to consider the point at which the Pull Request implementing a single feature is merged.

  3. Care is also needed with this term. The Four Keys proposed by DORA measure software delivery performance, first and foremost.

  4. I don't know where this "First commit" came from. There is no such term in the original DORA documentation or the fourkeys repository.
