Rethinking Capacity Planning
Explores how effective capacity planning shifts focus from individual hours to system-level flow, using Lean and Agile principles to improve …
TL;DR: Focusing on estimation accuracy as a performance metric leads to fear, gaming, and a culture of compliance rather than real improvement, which undermines trust, innovation, and actual value delivery. Research shows that when teams are judged on how closely they meet estimates, they pad numbers, hide risks, and avoid complex work, resulting in false success and missed opportunities for learning. Instead, shift attention to evidence-based metrics that reflect customer value, system health, and delivery flow, and use estimates only to support learning and informed conversations, not as tools for control.
In many software organisations, estimation accuracy is mistaken for predictability and control. Leadership asks teams to compare original estimates to actuals in hopes of improving forecasts. But this creates a false sense of certainty, one that undermines trust, distorts priorities, and derails delivery.
Metrics are never neutral. Once teams are judged by how closely they meet estimated timelines or planned outputs, those metrics stop reflecting the truth. The more visible and enforced the target becomes, the more teams adapt, not to improve outcomes, but to survive the system. What follows is a cascade of distorted behaviours: silence replaces honesty, delivery becomes performance theatre, and metrics become tools of compliance rather than learning. The following patterns are not outliers; they are systemic symptoms of measurement misuse.
When systems overemphasise compliance, teams don’t rebel; they comply. Maliciously. They log the hours. They meet the metrics. They do exactly what’s asked, but no more. They stop asking questions. They stop raising concerns. They stop caring.
This kind of mechanical compliance doesn’t improve delivery; it undermines it. Developers fill in timesheets at the end of the week with whatever gets approved. They make up hours to satisfy reporting tools. What ends up in the system looks clean and green, but it’s fiction.
And what gets lost is far worse: safety, curiosity, technical excellence, and any sense of pride in the outcome. A culture of malicious compliance breeds disengagement, risk blindness, and degraded quality. If you’re measuring in six-minute increments, you’re not managing for value. You’re auditing obedience.
Once metrics become the focus, honesty becomes optional. Teams under pressure to hit targets will show green status until the very moment they can’t hide red anymore. This phenomenon, sometimes called “green shifting,” isn’t a failure of individuals. It’s the predictable result of a system that rewards status optics over empirical feedback.
When the dashboard matters more than the work, risk gets buried. Quality is sidelined. And problems that could’ve been solved early are deferred until they explode. This isn’t management; it’s theatre.
When performance is judged by how closely estimates match actuals, teams shift into survival mode. Psychological safety evaporates. People stop flagging problems, bugs, and risks. It’s not due to apathy, but fear of missing the number. Defects get buried. Safety is deferred. Risk is hidden.
The focus moves from building the right thing to defending the wrong metric.
When you penalise unpredictability, you don’t get more predictability. You get fear, silence, and a culture optimised for hiding reality. This is how delivery becomes theatre.
Comparing estimates to actuals can be useful for learning, but when it becomes a performance metric, it changes behaviour. Teams are no longer incentivised to improve forecasting; they’re incentivised to look predictable.
What happens next is entirely predictable: estimates get padded, risks go unreported, complex and exploratory work is avoided, and effort shifts from being effective to looking predictable.
In one large organisation, teams were told they could deliver no more than five points per story, and no more than 24 points per sprint. The result? Teams padded everything to hit exactly 24 points. Story sizes gravitated to five points regardless of complexity. Innovation vanished, curiosity died, and delivery became a game of maximising perceived output. They met the metric perfectly and completely undermined the point of estimation. This is what happens when the system is designed for optics, not outcomes.
These aren’t edge cases; they’re rational adaptations to a distorted system. The result is a culture of compliance, not curiosity.
Thurlow’s Law of Metric Distortion: “Any metric you measure will appear to improve in the short term. This doesn’t mean the system improved, only that people adjusted their behaviour to game the metric.”
This principle highlights a broader risk. Once teams realise they’re being judged on metric performance, they start optimising for appearances. They stop focusing on delivery, learning, and value. The metric becomes a distraction from what really matters, reinforcing behaviours that prioritise green dashboards over working software. This is how green shifting starts: status reports stay green until the moment they can no longer hide the red. It’s not deceit. It’s self-preservation. In a system optimised for appearances, truth is delayed until failure is unavoidable.
Studies from Lederer & Prasad, Jørgensen, and others show that using estimation accuracy as an evaluation criterion strongly influences behaviour, and often negatively. When estimation accuracy becomes a KPI, it reshapes incentives across the system, often with unintended results. One experimental study (Lorko et al., 2022) found that when participants were rewarded solely for estimation accuracy, they systematically overestimated and deliberately slowed down to “finish on schedule.” The appearance of control was preserved, but efficiency was lost.
Another study (Jørgensen & Grimstad, 2008) showed that people who knew they’d be judged on their estimates produced more biased and less realistic figures. They weren’t aiming for truth; they were aiming for safety.
This is a textbook example of Goodhart’s Law. When a measure becomes a target, it stops being useful as a measure and starts driving the wrong behaviours.
Goodhart’s law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
If you treat your engineers like they’re untrusted contractors who need to account for every six-minute increment, don’t be surprised when morale tanks. One developer put it bluntly: “If you’re going to track me like a machine, don’t expect me to act like an innovator.” Research shows employees who feel trusted are more engaged and productive. Conversely, heavy time tracking breeds a culture of micromanagement and mistrust. More than half of knowledge workers say time tracking actually prevents them from doing their best work. When people feel every minute is under a microscope, they’re less likely to ask questions or offer improvements. You’re starving your team of psychological safety, and with it, the conditions for innovation, quality, and honesty.

When people are punished for missing estimates, they stop raising risks. They stop discussing trade-offs. Data becomes performative. The real work gets buried under ritual. The system becomes more predictable on paper, but more brittle in reality.
Software development is creative problem-solving. No two tasks are truly alike. You can’t reliably predict how long it will take to untangle a thorny bug or integrate a library. Sometimes, a “quick” fix can turn into a two-day rabbit hole. So why beat people up when they miss an arbitrary prediction? Estimating in hours assumes everyone is equally experienced and works at a constant pace. They don’t. Pressuring developers to “improve” their guesses assumes effort and duration are predictable. In knowledge work, they’re not. It only creates stress and encourages padding or sandbagging. It’s a game with no winners.
When management’s only lever is the schedule, quality suffers. Tom DeMarco and Tim Lister, in Peopleware, warn that unreachable deadlines force developers to cut corners: “Workers kept under extreme time pressure will begin to sacrifice quality… deliver products that are unstable and not really complete.” Lab studies back this up. Developers under tight time pressure work faster, not better, and quality drops. And when shortcuts pile up, the cost isn’t just bugs, it’s fragile systems, frustrated customers, and eroded trust.
Customers don’t buy effort, hours, or estimation accuracy. They buy working software that solves their problems. A day spent cleaning up architecture might look unproductive on a timesheet, but it delivers enormous long-term value. Optimising for logged time only encourages burnout, presenteeism, and a celebration of busyness over outcomes.
Metrics like velocity or hours measure output, but they don’t measure the value customers care about. It’s better to track what matters: how frequently you can deliver features, how quickly you recover from failures, and whether you’re improving the user experience. These metrics help track how fast you’re learning (Time to Market), how much waste exists in your delivery process (Ability to Innovate), and whether users are sticking around and benefiting from what you’ve delivered (Current Value).
If you’re serious about improving delivery outcomes, stop obsessing over time. Time-based metrics show what happened, not what mattered. They miss the nuance of complexity, cognition, and asynchronous problem-solving. When you treat delivery like stopwatch management, you reward appearances over insight.
Evidence-Based Management (EBM) is a way of managing with data that reflects actual outcomes and system capability. It helps leaders move beyond speculation by focusing on what is observable and valuable.
Good decisions start with real data, not guesses. EBM helps teams and leaders focus on what actually delivers value, not what was forecast, promised, or imagined.
In large-scale systems, direct customer contact is rare. That makes feedback loops even more critical. We must rely on proxy signals like usage trends, satisfaction scores, defect trends, and change failure rates to know if we’re on track. Not every team needs direct access to the customer, but every team needs access to evidence that what they shipped is working.
EBM encourages decisions based on what is actually happening, rather than what was predicted. Forecasts can support decision-making, but only when used transparently to explore assumptions, not when turned into compliance targets. When forecast accuracy becomes a performance metric, it violates empiricism by rewarding appearances rather than actual outcomes.
Leadership must create transparency around outcomes, not intentions. This means embracing metrics that reflect customer value, system health, and delivery capability, even when they challenge the status quo.
Let’s be clear: in complex, knowledge-based work, there is no meaningful diagnostic value in “estimate vs actual.” Take, for example, a cross-functional team building an internal developer platform. In the first quarter, leadership tracked the estimated vs actual across epics to improve forecasting. Developers quickly learned to overestimate tasks, avoided exploratory work, and padded estimates to match targets. The numbers looked better, but progress slowed, innovation stalled, and valuable refactoring work vanished from the backlog. By the time leadership realised the disconnect, technical debt had doubled. The team hadn’t become more predictable; it had simply become more cautious and less effective. This is the cost of measuring the wrong thing. It leads to the wrong conclusions.
“Estimate vs actual measures the work, but the waste lives in the gaps: the wait, the handoff, the delay. So you’re optimising the wrong thing.” - Nigel Thurlow
This is a clear example of Systems Thinking, as outlined in The Flow System (Thurlow et al., 2020). The true constraint rarely lies in the task. It lies in the system: the queues, context switching, blocked dependencies, or fragmented communication paths that hinder the delivery of value. In most cases, the constraint lies in the workflow, rather than in the functions themselves.
Even when used “diagnostically”, estimate vs actual misleads: it measures the task, not the system where the waste and delay actually live. And recall Thurlow’s Law of Metric Distortion above: once the comparison carries consequences, the numbers will appear to improve whether or not the system does.
EBM organises improvement around four Key Value Areas (KVAs):

- Current Value: how much value does the product deliver to customers today?
- Unrealised Value: how much potential value remains uncaptured?
- Ability to Innovate: how able is the organisation to deliver new capabilities?
- Time to Market: how quickly can the organisation deliver and learn?
The metrics we use should support these questions, not distract from them. Here’s how EBM-oriented alternatives compare:
| Instead of… | Try… |
| --- | --- |
| Estimate vs Actual | End-to-end lead time from commitment to usable customer delivery (Time to Market) |
| Story points completed | Customer satisfaction (Current Value) |
| On-time delivery rate | Quality trends, or % of effort on new vs sustaining work (Ability to Innovate) |
| Headcount-based planning | Opportunity backlog delta (Unrealised Value) |
Time-based metrics must be contextualised. Without insight into value, complexity, and customer outcomes, they risk becoming another distorted proxy.
To understand and improve delivery, stop obsessing over how close your guesses were. Instead, measure how your system behaves across the value stream and under varying flow loads. EBM encourages the use of actionable, outcome-aligned metrics that reflect actual system health, rather than projected compliance.
Cycle time only tracks how long one piece of work took. It says nothing about what the customer waited for or whether the system is flowing well. Lead time tells you how long the customer waits, starting from the moment a request is made until they receive something usable. Always measure from the outside in.
If you must discuss estimates, use them to explore assumptions and complexity, not to evaluate people. The ultimate goal is to deliver meaningful outcomes to customers. That requires embracing uncertainty, surfacing impediments, and improving system capability. The aim is not to enforce forecast compliance. Value lies in understanding, not accuracy.
Estimation should support informed conversations about uncertainty. It should not become a tool used to force predictability.
Most metrics in delivery are quantitative, including lead time, flow efficiency, and throughput. But numbers don’t tell the whole story. If you want to know whether you’re building the right thing, you need qualitative feedback: real customer conversations, issue sentiment, satisfaction narratives, and behavioural observations.
Quantitative data tells you what happened. Qualitative insight helps you understand why.
No chart or trendline can replace a conversation with a frustrated user or a support ticket that describes unmet needs. The most resilient teams blend data with dialogue, metrics with meaning.
This isn’t about shielding teams from accountability. It’s about holding ourselves accountable to a higher standard of leadership. Framing estimate accuracy as a condition for trust is a failure of leadership: it signals a lack of psychological safety and a misunderstanding of how complex work unfolds. Holding people to their guesses doesn’t help them grow; it punishes them for the unpredictability inherent in complex work. True leadership fosters environments where learning is safe, discovery is encouraged, and performance is judged by value, not conformity to expectations. Radical candour means caring personally and challenging directly. The challenge here is to stop clinging to false certainty and instead focus on the outcomes that matter for your business and your customers.
Don’t replace one flawed proxy with another. Metrics like cycle time, throughput, or flow efficiency are helpful, but only as part of a broader conversation about value, quality, and improvement. Alone, they tell you nothing about whether you’re solving the correct problems or improving customer outcomes. Consider adopting Evidence-Based Management and DORA to shift focus toward empiricism and value flow across the organisation. Talk with your team about impediments and improvements rather than the hours they logged. When you remove the spotlight from the clock, you’ll find your people deliver better software, enjoy their work more, and build trust along the way.
The Estimation Trap looks like a process improvement effort. Underneath, it creates a fear-based culture that rewards gaming and punishes uncertainty. It distorts delivery and kills innovation in the name of control.
All quantitative measures can do is indicate system efficiency; they cannot tell you whether the system is effective!
Instead of asking, “Why didn’t we match our original estimate?” ask, “What did we learn, how did we adapt, and are we improving the outcomes that matter?”
Real progress starts when people feel safe enough to tell the truth about complexity, risk, and what it actually takes to deliver. That’s the objective measure of a team delivering meaningful outcomes, improving their system, and creating value for customers.