Chaos costs money. In my experience working with various organisations, I’ve seen firsthand how they struggle to manage the chaos that often surrounds the delivery of usable, working products to their customers. The result? Substandard work, a barrage of bugs hitting production, and a frustrating cycle of rework because things simply don’t meet the minimum standards we expect. It’s a familiar tale, and one that I’ve encountered time and again.
Understanding the Shift
When I’m building products, I usually start from a problem I need to solve: I identify the challenge, envision a solution, and set off in that direction. However, as time passes, the destination itself can shift dramatically, whether because new market opportunities emerge or because the organisation needs to scale.
The solution that once seemed perfect can quickly become inadequate. I’ve witnessed this in various sectors, particularly in industries like airlines and car rentals, where technical debt accumulates over time. Choices made years ago can lead to a tangled web of outdated systems that no longer serve the organisation’s needs.
The Dangers of Technical Debt
Take, for instance, a customer I worked with who built their own source control system three decades ago. At that time, the options available were limited and did not fit their needs, so they created a bespoke solution. Fast forward to today, and it’s clear that this approach no longer makes sense. With robust source control systems like Git readily available, clinging to outdated technology becomes a liability.
Investing time and money to transition from legacy systems to modern solutions is crucial. This is where technical leadership and engineering excellence come into play. We must focus on ensuring that our systems and processes are as effective as possible, making our teams’ jobs smoother and more efficient.
The Value of Optimisation
Satya Nadella at Microsoft exemplifies this approach. He prioritises optimising systems so that the best engineers can focus on delivering features rather than wrestling with outdated processes. This investment in system optimisation is not merely a cost; it’s a value centre that supports our ability to innovate and deliver new features.
A prime example of this is the Azure DevOps team. Back in 2012, they were delivering around 25 features to production each year with a workforce of 650 people. Through a commitment to technical excellence and leadership, they raised their output to over 600 features annually. This remarkable increase was achieved by addressing both technical debt and the accumulation of what I like to call “technical cruft”: the sediment that builds up in systems over time.
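To put those numbers in proportion, here is a rough back-of-the-envelope sketch in Python. The feature counts and the 650-person figure come from the story above; treating the headcount as unchanged after the improvement is purely my assumption for illustration, and the function names are hypothetical.

```python
def throughput_gain(before: float, after: float) -> float:
    """Ratio of features shipped per year after vs. before the improvement."""
    return after / before


def features_per_person(features_per_year: float, people: int) -> float:
    """Average features shipped per person per year."""
    return features_per_year / people


FEATURES_2012 = 25    # "around 25 features" per year, as quoted
FEATURES_LATER = 600  # "over 600 features" per year, as quoted (a lower bound)
TEAM_SIZE = 650       # quoted for 2012; ASSUMED unchanged afterwards

if __name__ == "__main__":
    gain = throughput_gain(FEATURES_2012, FEATURES_LATER)
    print(f"throughput gain: at least {gain:.0f}x")
    print(f"per person, 2012: {features_per_person(FEATURES_2012, TEAM_SIZE):.3f}")
    print(f"per person, later: {features_per_person(FEATURES_LATER, TEAM_SIZE):.3f}")
```

Under that assumption, the same team goes from roughly one feature per 26 people per year to almost one feature per person per year.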
The Impact of Testing
One of the pivotal changes they made was shifting from long-running system tests to unit tests. This transition took four years of dedicated effort, but the results were staggering. They reduced the time it took to verify changes from 48 hours to just 3.5 minutes. Imagine the productivity boost if you could know within minutes whether a change was successful rather than waiting two days.
This rapid feedback loop allows teams to make smaller, more frequent changes, leading to a better product that is scalable and less reliant on quick fixes. Instead of patching over problems with Band-Aids, teams can focus on building robust solutions that stand the test of time.
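The size of that difference is easy to underestimate, so here is a minimal sketch of the arithmetic in Python. The 48-hour and 3.5-minute figures are the ones quoted above; the eight-hour working day and the function names are my own assumptions for illustration.

```python
SLOW_FEEDBACK_MIN = 48 * 60  # 48 hours, expressed in minutes
FAST_FEEDBACK_MIN = 3.5      # 3.5 minutes


def speedup(slow_min: float, fast_min: float) -> float:
    """How many times faster the fast feedback loop is."""
    return slow_min / fast_min


def runs_per_day(feedback_min: float, working_min: float = 8 * 60) -> int:
    """Whole verification runs that fit back to back in one working day.

    The 8-hour day is an assumption, not a figure from the story.
    """
    return int(working_min // feedback_min)


if __name__ == "__main__":
    print(f"speed-up: about {speedup(SLOW_FEEDBACK_MIN, FAST_FEEDBACK_MIN):.0f}x")
    print(f"runs per day at 3.5 minutes: {runs_per_day(FAST_FEEDBACK_MIN)}")
    print(f"runs per day at 48 hours: {runs_per_day(SLOW_FEEDBACK_MIN)}")
```

With a 48-hour loop, not even one verification completes within a working day, which is exactly why people batch up bigger and riskier changes.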
Conclusion: The Cost of Chaos
Ultimately, the cost of chaos stems from poor technical leadership and a culture of engineering mediocrity. By empowering your best engineers to focus on optimising systems, you enable every team member to add value more easily. This shift not only enhances productivity but also fosters a culture of excellence that can propel your organisation forward.
In my journey, I’ve learned that addressing chaos is not just about managing the present; it’s about preparing for the future. By investing in the right systems and processes today, we can ensure that we’re not just surviving the chaos but thriving in it.
Chaos costs money. Most organisations that I work with and have worked with really struggle with controlling the chaos within the context of delivering usable working product to their customers. They find that the work is substandard. We’ve got a lot of bugs hitting production. We’ve perhaps got a lot of rework happening because things aren’t quite what we expect them to be or don’t meet the minimum standard for our organisation, and we end up going down dead ends.
I’m going to need to explain that, but a lot of the time, and I do this as well when I’m building products, you try and figure out a solution to the problem you’ve got. Right? You’ve got a problem; you know where you want to get to just now, and you come up with a solution that gets you there. But over time, where you want to get to does in fact shift. Right? It could shift because of different market opportunities that arise, or it could shift because we’re scaling.
We came up with a solution that was on par at the time, but now it’s subpar. It doesn’t, you know, it’s slow. We’re running into problems. We’re having a lot of support calls because of technology choices that we made that are no longer valid. You see this a lot in the extreme in the airlines and car rental companies. Right? They have a lot of, I would call it, technical debt. The choices that were made at the time that they’ve never gone back around and refactored.
Right? So they still have mainframe systems, and they don’t have people who understand the mainframe systems anymore. So they’ve got a double problem. Not only do they still have those mainframes, but nobody in the organisation knows how they work, how to reimplement them, how to manage them, or where to kick them when something goes wrong. And that happens all the time in technology. Right? The technology moves forward.
I work with lots of organisations who have solved problems in ways that made perfect sense at the time because there was no solution on the shelf out there that you could go get. I’ve got one customer that built their own source control system because they started developing software 30 years ago, and 30 years ago, building software at scale, there weren’t very many options for source control systems. The ones that were out there maybe didn’t fit their needs, so they created their own one. But today, that doesn’t make any sense whatsoever. Right? There are plenty of good source control systems out there. Git is the de facto standard, and they should have all of their code, all of their systems on Git.
But you need to invest time and money to move from what you had before to what you need now, and that’s part of this story of technical leadership and engineering excellence. We need to be focused on ensuring that the systems and processes our people are using are as effective as possible so that their job is as slick and easy as possible. Right? Satya does good work on this at Microsoft. Right? He would rather the best engineers, the most skilled people, spent time on optimising the systems that we have, because then we can come back around and build all the features we want because we have these really slick systems.
And it seems like it’s a cost, right? But actually, the slickness of these systems that control the chaos is a value centre, because it is the thing that supports our ability to deliver new features. A great example is the Azure DevOps team. Back in 2012, before they improved their systems, they were delivering something like 24 or 25 features to production each year with 650 people. With an application of technical excellence and effort (this is not for free) and great technical leadership within the organisation, they took that from 25 features to production each year to over 600 features to production each year. That difference was because they paid back a lot of the, I’m going to say, technical debt. Some of it was technical debt; other stuff was technical cruft, just a buildup of, I don’t know, sediment, a buildup of rust in your system because things get old.
Technology moves really fast, and we need to keep up with it. And because they made those changes, just one simple thing, they flipped from lots of long-running system tests to unit tests. It took them four years of effort to get there, but they took their engineering team’s ability to see whether what they’ve done works from 48 hours down to 3 and a half minutes. Think what that would do to your productivity if you had to wait 48 hours to find out if the simplest change had been successful. You’d push bigger and bigger changes through the pipe.
So when you do have a problem, it’s harder and harder to figure out what that problem is, whereas if you’re able to find out if it works in 3 and a half minutes, you’ll be running that all the time on the smallest changes you make, and you will build a better product that’s more scalable and has fewer Band-Aids. Right? You’re not out with the super glue and sticky tape trying to seal over the gaps for bad choices that you made because you didn’t know that there was a problem. You didn’t know it wasn’t going to meet the standard.
So that cost of chaos is because of poor technical leadership and low engineering excellence. You might see engineering mediocrity within an organisation. You can solve these problems by having your best engineers, your best people, focus on delivering the systems, the optimisations to the systems that you need so everybody on every team has the easiest possible job adding value in your product.