When organizations engage with us for DevOps consulting, it’s rarely by chance. Typically, they’ve identified a problem—a problem that’s grown too big to ignore, a problem that requires expertise beyond their current capabilities. As a DevOps consultant, I’ve seen this scenario play out many times. One of the most significant engagements I’ve experienced involved a large organization in the oil and gas industry. This case illustrates the complexity and challenges organizations face when they try to implement DevOps, and the transformational impact that a well-executed DevOps strategy can have.
The Root Cause: Technical Debt in a Large-Scale System
Identifying the Problem
The company I worked with had a product that sold for $50,000 per license—a desktop product with significant complexity. They had 90 teams spread across 13 locations in nine different countries, all working on this product. Each team had its own long-term branch, leading to divergent development paths that could last as long as a year. The organization attempted to mitigate these divergences by applying force—a common but flawed approach.
They spun up a dedicated DevOps team, centralizing DevOps engineers in one area. However, this team faced a monumental challenge: they couldn’t directly influence what the 90 teams were doing but were responsible for ensuring continuous delivery and daily builds of the product.
The Complexity of the System
To achieve their goal, the DevOps team built an infrastructure that facilitated about 11,000 build executions per day—1.2 million a year—just to keep the product functioning daily. The system had to manage code from 90 different teams using different source control systems, including Git, Team Foundation Version Control, Subversion, and even custom in-house systems. Each platform had its own branching and merging capabilities, which the DevOps system had to integrate and unify.
The complexity of the product was staggering. Developers needed workstations with 128GB of RAM and 24-core processors just to build the product. It was an obscene setup, but necessary due to the product’s massive and complicated architecture.
The Lesson Learned: The Impact of Technical Debt
Assessing the Situation
We conducted a DevOps assessment for this organization, identifying key areas of concern:
Diverse Source Control Systems: Teams were using different systems, creating unnecessary complexity.
Distributed Teams: With 90 teams in 13 locations, coordination was a nightmare.
Multiple Funding Routes: The product had 13 different funding sources, limiting control over the entire system.
The product had been in development for 25 years, during which the company acquired competitors and integrated their technologies, often without addressing technical debt. Instead of migrating new acquisitions to their systems, they kept legacy systems intact, adding layers of complexity.
The Road to Simplification
Over four years, we worked to align the organization’s development practices. We consolidated everything into a single source control system, which allowed us to perform a unified build that produced one version of the product. We also reduced the number of branches and moved towards mainline development.
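One way to make the move towards mainline development measurable is to list the branches that have drifted furthest from mainline and fold those down first. The following is a minimal sketch under stated assumptions, not the tooling used on this engagement: the `Branch` record, the `long_lived` helper, the 30-day threshold, and the sample inventory are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Branch:
    team: str
    diverged_on: date  # when the branch last shared history with mainline


def long_lived(branches, today, max_age_days=30):
    """Return (branch, age_in_days) pairs older than max_age_days, oldest first."""
    aged = [(b, (today - b.diverged_on).days) for b in branches]
    stale = [(b, age) for b, age in aged if age > max_age_days]
    return sorted(stale, key=lambda pair: pair[1], reverse=True)


# Hypothetical inventory, invented for illustration only.
branches = [
    Branch("team-a", date(2023, 1, 10)),
    Branch("team-b", date(2023, 11, 20)),
    Branch("team-c", date(2023, 6, 1)),
]
today = date(2023, 12, 1)

for branch, age in long_lived(branches, today):
    print(f"{branch.team}: {age} days since diverging from mainline")
```

Sorting oldest first puts the branches with the most accumulated divergence at the top of the list, which is usually where the merge pain is concentrated.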
This process wasn’t easy. It required patience, collaboration across different locations, and a deep understanding of both the product and the organizational structure. The teams involved were not within the same reporting structure, making it impossible to simply dictate changes. Instead, we had to influence, persuade, and gradually bring everyone on board.
The Value of Refactoring
A significant part of this transformation was helping the organization understand the importance of refactoring. Over time, they had accumulated technical debt by integrating new systems without proper refactoring. This debt had to be paid back to simplify the product and make future development more manageable.
The organization had been integrating rather than refactoring, patching systems together rather than taking the time to rebuild them properly. This approach made the system increasingly unwieldy, until it was hard to see the forest for the trees. The key lesson here is the importance of regular refactoring and simplifying as you go.
The Outcome: Simplification, Efficiency, and Happiness
What Organizations Can Expect
The outcomes of a successful DevOps transformation are profound:
Reduced Cost and Time: Streamlining the development process reduces both the cost and time required to deliver new features.
Improved Team Happiness: Developers spend less time grappling with unnecessary complexity and more time solving meaningful business problems.
Increased Capability: With a focus on refactoring and clean code, teams become more effective and can deliver more value over time.
A great example of this is the Azure DevOps team at Microsoft. Before they embraced DevOps, they delivered about 25 new features to production each year, with 600 people working on the product. After investing in reducing build times—from 72 hours to just 3.5 minutes—they scaled up to nearly 300 features per year with the same number of people.
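The size of those multipliers is easier to appreciate when worked through. The short sketch below uses only the figures quoted above; the variable names are illustrative, and "nearly 300" is treated as 300, so the results are upper bounds.

```python
# Figures quoted in the text; "nearly 300" is rounded up to 300.
features_before = 25   # new features per year before the change
features_after = 300   # new features per year afterwards
people = 600           # team size, unchanged across both periods

build_before_min = 72 * 60  # 72 hours expressed in minutes
build_after_min = 3.5

feature_multiplier = features_after / features_before
build_speedup = build_before_min / build_after_min
people_per_feature_before = people / features_before
people_per_feature_after = people / features_after

print(f"Feature throughput: about {feature_multiplier:.0f}x")
print(f"Build time: about {build_speedup:.0f}x faster")
print(f"People per feature per year: {people_per_feature_before:.0f} -> "
      f"{people_per_feature_after:.0f}")
```

On these numbers, a build that is roughly a thousand times faster went hand in hand with roughly twelve times the feature throughput from the same team.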
The Importance of Skill and Ownership
One critical point to understand is that DevOps is not something you can simply “install” in an organization. It requires skill, dedication, and a willingness to embrace change. If a consultancy promises to do all the work for you, your people won’t learn anything. They won’t go through the necessary pain of cleaning up their technical debt and will likely repeat the same mistakes.
Think of it like cleaning your teenager’s room. If you do it for them, they won’t learn the importance of keeping their space tidy. But if they do it themselves—especially if they have to clean up a particularly nasty mess—they’re more likely to avoid making the same mess in the future. The same principle applies to DevOps and engineering practices.
A Final Thought
Bringing DevOps into an organization is about upskilling and taking ownership of the problems you’ve created over time. It’s about dealing with your own “crap” so that you create less of it in the future. This approach enables teams to be slicker, more effective, and ultimately, more successful in delivering value to the business.
Conclusion
The journey of DevOps transformation is not easy, but it is immensely rewarding. By focusing on simplification, refactoring, and ownership, organizations can achieve significant improvements in efficiency, cost, and team satisfaction. The story of the oil and gas company is a powerful reminder that while the road may be long and complex, the destination is well worth the effort. Remember, the key to successful DevOps is not just in the tools or the processes but in the mindset and skills of the people involved. Embrace the challenge, learn from the experience, and watch as your organization transforms into a lean, agile, and highly effective machine. 🚀
So when customers engage with us, they’re quite often in a place where they’ve identified a problem themselves. They don’t call a DevOps consulting service randomly; they have some kind of problem that they’ve identified, and they want help figuring it out and working out what to do next.
Probably the biggest engagement I saw was with a really large organisation in the oil and gas world. They had a product that was, I think, $50,000 a licence. It’s a desktop product, and what they thought their main problem was and what the main problem actually was might have been different things. I’m saying this in retrospect, right, because I’ve been through it. One of the main problems that they had was that they had 90 teams in 13 locations in nine different countries working on this product. Each of those teams had their own long-term branch that they worked on, so things would diverge. They could diverge for quite some time; it could be as much as a year. They were trying to mitigate that with an application of force. That’s probably the way I would think about it.
So what they did was they spun up a DevOps team, right? A dedicated DevOps team. Here are our DevOps engineers; they’re in this central area. Some of you have already spotted the problem. They can’t change what all of these teams are doing, but they have to figure out how they solved the problem of wanting continuous delivery. They wanted to have a daily build of their product; that was their goal. So they ended up creating a bunch of infrastructure to allow them to do that and managing that infrastructure. They would manage it, right? The teams didn’t have to do anything, and if the teams changed something, that was their problem to go fix.
You had this team, I think it was eight or nine people, and their whole job was building this product. In order to do that, they ended up building a system that facilitated about 11,000 build executions a day. That’s 1.2 million build executions a year, and that was just to have a working product every day. They would have these temporary branches where they brought the code together and automated the merging of code from all of these different branches. All of those different teams, these 90 teams, were on different source control systems, and not only different systems but different platforms as well. They might have had 10 teams on Git, 10 teams on Team Foundation Version Control, another 10 teams on Subversion, and another 10 teams doing something else.
So they all had different capabilities, different branching capabilities, different merging capabilities, and their build system, their build engine, had to go reach out to all of these systems, pull together a version of the code, and then build it to create this unified version of the product. It was insanely complicated and super expensive. I think in order for a developer to work on the product on their workstation, and in fact to run it in production as well, you had to have like 100 cores. I can’t remember exactly what, but it was 128 GB RAM and lots and lots of cores, a 24-core machine type of thing. It was just obscene, the speed of the machine required, because it was a massive, complicated product and you couldn’t just build parts of it. You had to build the whole thing to make sure it worked: the whole platform and all the things that were built on top of it.
It was just hugely unwieldy. So we did a DevOps assessment, a state of DevOps assessment, for them. We identified those key areas: everybody’s on different source control systems, everybody’s doing everything differently. They had some limitations because I think there were 13 different funding routes for this product, so they didn’t have full control of the whole thing. What we were able to do took a number of years, because it’s a really big product and a really big organisation. I think there were something like 600 to 700 people working on it in different locations, and they weren’t in the same reporting structure. You can’t just tell them to do stuff.
It took four years to get everything aligned into a single source control system, right? So that we could do one build out of this source control system and get one version of the product. Then how do we fold down the branches so that there aren’t so many of them as we work towards that model? How do we work towards this idea of mainline development? It’s these types of ideas, these types of outcomes, that vastly simplify the problem that companies are trying to solve. People don’t get into these positions out of nefarious intent.
In this particular example, this product had been built and worked on for 25 years. They had bought anybody who tried to compete with them. So bring that piece into the puzzle, right, and you end up with a massive product with a very complicated architecture that’s very difficult and time-consuming to work on. You need lots of people; it’s very distributed because all of these different parts of the puzzle were brought in from different entities that now all work for the same company. They hadn’t taken the time to pay back their technical debt. They’d accrued debt. You know, we use Git; we take on a company that uses Subversion, and instead of helping them migrate all of their stuff into Git and fix all of the stuff that we need to fix in order to bring it into our system, we’re just going to leave them as is and plug into them and pull their stuff out and integrate it into our system.
That work seemed insurmountable at the time, so they didn’t do it. They just pulled the stuff in. On top of that, some things didn’t exist 25 years ago, so they had a lot of teams that were on their own custom, in-house built source control system. When they started working on this product, there were no large-scale source control systems. You were probably talking about Visual SourceSafe at the time, but Visual SourceSafe was developed at the time of small networks and had a max size of about 5 GB. This thing was ginormous, right? So how do you manage that? Well, you have to build your own systems. When you go to try and create an automated build, perhaps there isn’t a commercial automated build system, so you build your own. Then once a commercial automated build system becomes available, you adapt to that, but you’re actually just calling out to your existing thing because you’ve not taken the time to rewrite everything, because that’s an inordinate task.
So we integrate it rather than refactor, right? We’re missing all of those refactors over time, and it just gets bigger and bigger and bigger and more unwieldy. Sometimes it’s difficult—that’s a terrible expression—but difficult to see the forest for the trees, right? All of these things, like what should we go fix? Where should we go look? Where should we start? What’s our biggest bang for our buck that we can go fix and figure out? Perhaps lots of little things we need to fix around the edge.
So what I would expect an organisation to get—what’s like the outcome and improvement that our customers can expect? It’s a more effective process, right? We’re talking about taking what the organisation is doing right now, usually within the context of a product when you’re talking about DevOps, but it could be holistically across an organisation and figuring out how do we eliminate waste? How do we ensure that we have automation, that that automation is effective? Because you can have ineffective automation, like that massive build system I was talking about.
How do we simplify, simplify, simplify? What I would expect the outcomes to be would be reduced cost to deliver new features, reduced time to deliver new features, improved happiness of the people that are building the product because they’re spending less time struggling with the complexity that we’ve created over time and more time focusing on solving the business problem. These are multipliers for your capability to deliver.
A great example of those multipliers is actually the Azure DevOps team at Microsoft, right? They create a product called Azure DevOps; it used to be Team Foundation Server when it was local. Back in 2010, before they started doing much more frequent deliveries and all the automation and focusing on these things, before they brought DevOps into their story, they were delivering about 25 features to production each year, and that was with 600 people working on it. They were delivering about 25 new features, lots of bug fixes, lots of little tweaks, but 25 new features to production each year.
Fast forward five years: they’d spent a huge amount of time investing in reducing the time it takes to build their product from, I think, three or four days (72 hours plus) down to three and a half minutes. Those types of capabilities enabled them to go from 25 features to production each year to nearly 300 features to production each year with the same number of people, and the same people. It’s not different people; it’s the same people, right?
So what we’re talking about is holistically skilling up all of the people that we have within our context so that everybody understands DevOps, everybody understands refactoring and clean code and what the impact is on all of these things, so that we can make them more effective over time. You will be faster, you’ll be slicker, and it will be cheaper, right? That’s what we’re talking about when we talk about bringing DevOps and the DevOps philosophy into an organisation. But it takes skill. We can’t install DevOps in your organisation; you still have to do the work. If you can find a consultancy that says they will do all the work for you, your people are not going to learn anything. They’re not going to have gone through the pain of actually fixing their problems.
It’s like getting a cleaner in to clean your teenager’s room, right? Has your teenager learned to clean up their room? No, they haven’t. They’re just going to do the same thing over and over and over again. When you do those big rewrites of your product and you take the same people and get them to rewrite your product, what do you think is going to be the outcome except a rewritten product that’s in exactly the same state over time as your existing product? Because they’ve not gone through that crucible of learning the pain of actually cleaning up—the pain of pulling that mouldy plate out from under the bed and having to deal with this mouldy plate. Maybe next time I won’t leave the plate because I don’t want to deal with that mouldy plate, right?
That’s what we’re talking about with engineering practices. We’re talking about upskilling, dealing with our own crap, right? Dealing with it ourselves so that we make less of that stuff in the future, that we do things in a better way that enables us to be slicker and more effective.