
The Estimation Trap: How Tracking Accuracy Undermines Trust, Flow, and Value in Software Delivery

TL;DR: Focusing on estimation accuracy as a performance metric leads to fear, gaming, and a culture of compliance rather than real improvement, which undermines trust, innovation, and actual value delivery. Research shows that when teams are judged on how closely they meet estimates, they pad numbers, hide risks, and avoid complex work, resulting in false success and missed opportunities for learning. Instead, shift attention to evidence-based metrics that reflect customer value, system health, and delivery flow, and use estimates only to support learning and informed conversations, not as tools for control.

Written by Martin Hinshelwood, with contributions from Ralph Jocham and Nigel Thurlow

In many software organisations, estimation accuracy is mistaken for predictability and control. Leadership asks teams to compare original estimates to actuals in the hope of improving forecasts. But this creates a false sense of certainty, one that undermines trust, distorts priorities, and derails delivery.

When the Metric Becomes the Target

Metrics are never neutral. Once teams are judged by how closely they meet estimated timelines or planned outputs, those metrics stop reflecting the truth. The more visible and enforced the target becomes, the more teams adapt, not to improve outcomes, but to survive the system. What follows is a cascade of distorted behaviours: silence replaces honesty, delivery becomes performance theatre, and metrics become tools of compliance rather than learning. The following patterns are not outliers; they are systemic symptoms of measurement misuse.

Malicious Compliance: When Teams Give Up on Caring

When systems overemphasise compliance, teams don’t rebel; they comply. Maliciously. They log the hours. They meet the metrics. They do exactly what’s asked, but no more. They stop asking questions. They stop raising concerns. They stop caring.

This kind of mechanical compliance doesn’t improve delivery; it undermines it. Developers fill in timesheets at the end of the week with whatever gets approved. They make up hours to satisfy reporting tools. What ends up in the system looks clean and green, but it’s fiction.

And what gets lost is far worse: safety, curiosity, technical excellence, and any sense of pride in the outcome. A culture of malicious compliance breeds disengagement, risk blindness, and degraded quality. If you’re measuring in six-minute increments, you’re not managing for value. You’re auditing obedience.

Green Shifting: When Metrics Replace Truth

Once metrics become the focus, honesty becomes optional. Teams under pressure to hit targets will show green status until the very moment they can’t hide red anymore. This phenomenon, sometimes called “green shifting,” isn’t a failure of individuals. It’s the predictable result of a system that rewards status optics over empirical feedback.

When the dashboard matters more than the work, risk gets buried. Quality is sidelined. And problems that could’ve been solved early are deferred until they explode. This isn’t management; it’s theatre.

Fear-Driven Delivery

When performance is judged by how closely estimates match actuals, teams shift into survival mode. Psychological safety evaporates. People stop flagging problems, bugs, and risks. It’s not due to apathy, but fear of missing the number. Defects get buried. Safety is deferred. Risk is hidden.

The focus moves from building the right thing to defending the wrong metric.

When you penalise unpredictability, you don’t get more predictability. You get fear, silence, and a culture optimised for hiding reality. This is how delivery becomes theatre.

Distorted Behaviours and False Success

Comparing estimates to actuals can be useful for learning, but when it becomes a performance metric, it changes behaviour. Teams are no longer incentivised to improve forecasting; they’re incentivised to look predictable.

What happens next is entirely predictable:

In one large organisation, teams were told they could deliver no more than five points per story, and no more than 24 points per sprint. The result? Teams padded everything to hit exactly 24 points. Story sizes gravitated to five points regardless of complexity. Innovation vanished, curiosity died, and delivery became a game of maximising perceived output. They met the metric perfectly and completely undermined the point of estimation. This is what happens when the system is designed for optics, not outcomes.
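To make the distortion concrete, here is a minimal, entirely hypothetical simulation (every number is invented for illustration): once a hard cap pushes every story toward the same size, estimates stop carrying any information about the underlying work.

```python
import random
import statistics

random.seed(1)

# Hypothetical "true" effort for 200 stories, in points (long-tailed, like real work).
true_effort = [min(13, max(1, round(random.lognormvariate(1.2, 0.6)))) for _ in range(200)]

# Honest estimates: noisy, but still correlated with the real work.
honest = [max(1, e + random.choice([-1, 0, 0, 1])) for e in true_effort]

# Gamed estimates under a "no story above 5 points" rule: everything drifts to the cap.
gamed = [min(5, max(4, e)) for e in true_effort]

print("honest vs true correlation:", round(statistics.correlation(honest, true_effort), 2))
print("gamed  vs true correlation:", round(statistics.correlation(gamed, true_effort), 2))
```

The second number collapses: the estimates still exist, but they no longer tell you anything about the work.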

These aren’t edge cases; they’re rational adaptations to a distorted system. The result is a culture of compliance, not curiosity.

Thurlow’s Law of Metric Distortion: “Any metric you measure will appear to improve in the short term. This doesn’t mean the system improved, only that people adjusted their behaviour to game the metric.”

The System Learns to Lie

This principle highlights a broader risk. Once teams realise they’re being judged on metric performance, they start optimising for appearances. The metric becomes a distraction from what really matters, reinforcing behaviours that prioritise green dashboards over working software. This is how green shifting starts: status reports stay green until the moment they can no longer hide the red. It’s not deceit; it’s self-preservation. In a system optimised for appearances, truth is delayed until failure is unavoidable, and the focus shifts away from delivery, learning, and value.

The Evidence Behind the Trap

Studies from Lederer & Prasad, Jørgensen, and others show that using estimation accuracy as an evaluation criterion strongly influences behaviour, and often negatively. When estimation accuracy becomes a KPI, it reshapes incentives across the system, often with unintended results. One experimental study (Lorko et al., 2022) found that when participants were rewarded solely for estimation accuracy, they systematically overestimated and deliberately slowed down to “finish on schedule.” The appearance of control was preserved, but efficiency was lost.
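A toy model of that dynamic (purely illustrative; this is not the study’s method, and every number below is invented) shows how rewarding accuracy alone makes the schedule look perfect while total delivery time grows:

```python
import random

random.seed(7)

# Hypothetical true durations for 100 tasks, in days.
true_days = [random.uniform(2, 8) for _ in range(100)]

# Honest team: unbiased estimates; actuals vary around the truth.
honest_est = list(true_days)
honest_act = [t * random.uniform(0.7, 1.4) for t in true_days]

# Accuracy-rewarded team: pad by 50%, then stretch the work to fill the
# estimate (Parkinson's law), so the reported actual lands on the estimate.
padded_est = [t * 1.5 for t in true_days]
padded_act = [max(t * random.uniform(0.7, 1.4), e) for t, e in zip(true_days, padded_est)]

def hit_rate(actuals, estimates):
    # Share of tasks that finished "on schedule" (actual <= estimate).
    return sum(a <= e for a, e in zip(actuals, estimates)) / len(actuals)

print(f"honest: {hit_rate(honest_act, honest_est):.0%} on schedule, {sum(honest_act):.0f} total days")
print(f"padded: {hit_rate(padded_act, padded_est):.0%} on schedule, {sum(padded_act):.0f} total days")
```

The padded team reports a 100% hit rate and is rewarded for it, yet it consumes over 40% more days to deliver the same work.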

Another study (Jørgensen & Grimstad, 2008) showed that people who knew they’d be judged on their estimates produced more biased and less realistic figures. They weren’t aiming for truth; they were aiming for safety.

This is a textbook example of Goodhart’s Law. When a measure becomes a target, it stops being useful as a measure and starts driving the wrong behaviours.

Goodhart’s Law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Trust Is a Two-Way Street

If you treat your engineers like they’re untrusted contractors who need to account for every six-minute increment, don’t be surprised when morale tanks. One developer put it bluntly: “If you’re going to track me like a machine, don’t expect me to act like an innovator.” Research shows employees who feel trusted are more engaged and productive. Conversely, heavy time tracking breeds a culture of micromanagement and mistrust. More than half of knowledge workers say time tracking actually prevents them from doing their best work. When people feel every minute is under a microscope, they’re less likely to ask questions or offer improvements. You’re starving your team of psychological safety, and with it, the conditions for innovation, quality, and honesty.

When people are punished for missing estimates, they stop raising risks. They stop discussing trade-offs. Data becomes performative. The real work gets buried under ritual. The system becomes more predictable on paper, but more brittle in reality.

Bad Estimates Don’t Make You a Bad Developer

Software development is creative problem-solving. No two tasks are truly alike. You can’t reliably predict how long it will take to untangle a thorny bug or integrate a library. Sometimes, a “quick” fix can turn into a two-day rabbit hole. So why beat people up when they miss an arbitrary prediction? Estimating in hours assumes everyone is equally experienced and works at a constant pace. They don’t. Pressuring developers to “improve” their guesses assumes effort and duration are predictable. In knowledge work, they’re not. It only creates stress and encourages padding or sandbagging. It’s a game with no winners.

Time Pressure Kills Quality

When management’s only lever is the schedule, quality suffers. Tom DeMarco and Tim Lister, in Peopleware, warn that unreachable deadlines force developers to cut corners: “Workers kept under extreme time pressure will begin to sacrifice quality… deliver products that are unstable and not really complete.” Lab studies back this up. Developers under tight time pressure work faster, not better, and quality drops. And when shortcuts pile up, the cost isn’t just bugs, it’s fragile systems, frustrated customers, and eroded trust.

Hours Worked Do Not Equal Value Delivered

Customers don’t buy effort, hours, or estimation accuracy. They buy working software that solves their problems. A day spent cleaning up architecture might look unproductive on a timesheet, but it delivers enormous long-term value. Optimising for logged time only encourages burnout, presenteeism, and a celebration of busyness over outcomes.

Metrics like velocity or hours measure output, but they don’t measure the value customers care about. It’s better to track what matters: how frequently you can deliver features, how quickly you recover from failures, and whether you’re improving the user experience. These metrics help track how fast you’re learning (Time to Market), how much waste exists in your delivery process (Ability to Innovate), and whether users are sticking around and benefiting from what you’ve delivered (Current Value).
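As a concrete, hypothetical illustration, signals like these can be computed from records most teams already keep; the event data and field names below are invented for the sketch:

```python
from datetime import datetime

# Hypothetical delivery records: when work was committed and when it reached users.
deliveries = [
    {"committed": datetime(2024, 3, 1), "live": datetime(2024, 3, 9)},
    {"committed": datetime(2024, 3, 4), "live": datetime(2024, 3, 11)},
    {"committed": datetime(2024, 3, 8), "live": datetime(2024, 3, 20)},
]

# Hypothetical incident records: when failures were detected and recovered.
incidents = [
    {"detected": datetime(2024, 3, 10, 9), "recovered": datetime(2024, 3, 10, 11)},
    {"detected": datetime(2024, 3, 15, 14), "recovered": datetime(2024, 3, 15, 15)},
]

# Time to Market signal: days from commitment to usable delivery.
lead_times = sorted((d["live"] - d["committed"]).days for d in deliveries)
print("median lead time (days):", lead_times[len(lead_times) // 2])

# A recovery signal: how quickly the system heals when it breaks.
to_recover = [(i["recovered"] - i["detected"]).total_seconds() / 3600 for i in incidents]
print("mean time to recover (hours):", sum(to_recover) / len(to_recover))
```

None of this requires anyone to explain why a guess was wrong; it describes what the system actually does.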

What to Do Instead

If you’re serious about improving delivery outcomes, stop obsessing over time. Time-based metrics show what happened, not what mattered. They miss the nuance of complexity, cognition, and asynchronous problem-solving. When you treat delivery like stopwatch management, you reward appearances over insight.

Evidence-Based Management (EBM) is a way of managing with data that reflects actual outcomes and system capability. It helps leaders move beyond speculation by focusing on what is observable and valuable.

Good decisions start with real data, not guesses. EBM helps teams and leaders focus on what actually delivers value, not what was forecast, promised, or imagined.

In large-scale systems, direct customer contact is rare. That makes feedback loops even more critical. We must rely on proxy signals like usage trends, satisfaction scores, defect trends, and change failure rates to know if we’re on track. Not every team needs direct access to the customer, but every team needs access to evidence that what they shipped is working.

EBM encourages decisions based on what is actually happening, rather than what was predicted. Forecasts can support decision-making, but only when used transparently to explore assumptions, not when turned into compliance targets. When forecast accuracy becomes a performance metric, it violates empiricism by rewarding appearances rather than actual outcomes.

Leadership must create transparency around outcomes, not intentions. This means embracing metrics that reflect customer value, system health, and delivery capability, even when they challenge the status quo.

Let’s be clear: in complex, knowledge-based work, there is no meaningful diagnostic value in “estimate vs actual.” Take, for example, a cross-functional team building an internal developer platform. In the first quarter, leadership tracked estimate vs actual across epics to improve forecasting. Developers quickly learned to overestimate tasks, avoided exploratory work, and padded estimates to match targets. The numbers looked better, but progress slowed, innovation stalled, and valuable refactoring work vanished from the backlog. By the time leadership realised the disconnect, technical debt had doubled. The team hadn’t become more predictable; it had simply become more cautious and less effective. This is the cost of measuring the wrong thing: it leads to the wrong conclusions.

“Estimate vs actual measures the work, but the waste lives in the gaps: the wait, the handoff, the delay. So you’re optimising the wrong thing.” - Nigel Thurlow

This is a clear example of Systems Thinking, as outlined in The Flow System (Thurlow et al., 2020). The true constraint rarely lies in the task. It lies in the system: the queues, context switching, blocked dependencies, or fragmented communication paths that hinder the delivery of value. In most cases, the constraint lies in the workflow, rather than in the functions themselves.
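You can see this in your own data by separating the time an item is actively worked from the time it waits between steps. A minimal sketch, with hypothetical stage names and durations:

```python
# Hypothetical timeline for one work item: (stage, days), work and waiting interleaved.
item_timeline = [
    ("in development", 3), ("waiting for review", 4),
    ("in review", 1), ("waiting for test environment", 5),
    ("testing", 1), ("waiting for release window", 6),
]

touch = sum(days for stage, days in item_timeline if not stage.startswith("waiting"))
wait = sum(days for stage, days in item_timeline if stage.startswith("waiting"))

print(f"touch time: {touch} days, wait time: {wait} days")
# Flow efficiency: share of elapsed time the item was actually being worked on.
print(f"flow efficiency: {touch / (touch + wait):.0%}")
```

Here the item took 20 days to cross the board, but only 5 of them were work. Estimate vs actual would interrogate the 5 days and never see the 15.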

Even when used “diagnostically”, estimate vs actual as a metric misleads. Recall Thurlow’s Law of Metric Distortion above: the metric will appear to improve whether or not the system has.

A Better Path Forward with Evidence-Based Management

EBM organises improvement around four Key Value Areas (KVAs):

Current Value: the value the product delivers to customers today.
Unrealised Value: the potential value that could be captured by meeting needs not yet met.
Time to Market: how quickly the organisation can deliver new value.
Ability to Innovate: how effectively the organisation can deliver new capabilities that might better serve customers.

The metrics we use should illuminate these areas, not distract from them. Here’s how EBM-oriented alternatives compare:

Instead of… | Try…
Estimate vs actual | End-to-end lead time from commitment to usable customer delivery (Time to Market)
Story points completed | Customer satisfaction (Current Value)
On-time delivery rate | Quality trends, or % of effort on new vs sustaining work (Ability to Innovate)
Headcount-based planning | Opportunity backlog delta (Unrealised Value)

To understand and improve delivery, stop obsessing over how close your guesses were. Instead, measure how your system behaves across the value stream and under varying flow loads. EBM encourages the use of actionable, outcome-aligned metrics that reflect actual system health, rather than projected compliance.

If you must discuss estimates, use them to explore assumptions and complexity, not to evaluate people. The ultimate goal is to deliver meaningful outcomes to customers. That requires embracing uncertainty, surfacing impediments, and improving system capability. The aim is not to enforce forecast compliance. Value lies in understanding, not accuracy.

Estimation should support informed conversations about uncertainty. It should not become a tool used to force predictability.
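If a forecast is genuinely needed, one hedged alternative is to project from observed throughput and present a range rather than a promise. A sketch using Monte Carlo simulation over invented weekly throughput figures:

```python
import random

random.seed(42)

# Hypothetical observed throughput: items finished per week in recent history.
weekly_throughput = [3, 5, 2, 6, 4, 3, 5, 4, 2, 6]
backlog = 40  # items remaining

def weeks_to_finish():
    # Resample history to simulate one possible future for the backlog.
    done, weeks = 0, 0
    while done < backlog:
        done += random.choice(weekly_throughput)
        weeks += 1
    return weeks

runs = sorted(weeks_to_finish() for _ in range(10_000))
for pct in (50, 85, 95):
    print(f"{pct}% of simulated futures finish within {runs[len(runs) * pct // 100 - 1]} weeks")
```

The output is a conversation about uncertainty (“85% of simulated futures finish within N weeks”), not a single number anyone can be punished for missing.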

Quantitative vs Qualitative

Most metrics in delivery are quantitative, including lead time, flow efficiency, and throughput. But numbers don’t tell the whole story. If you want to know whether you’re building the right thing, you need qualitative feedback: real customer conversations, issue sentiment, satisfaction narratives, and behavioural observations.

Quantitative data tells you what happened. Qualitative insight helps you understand why.

No chart or trendline can replace a conversation with a frustrated user or a support ticket that describes unmet needs. The most resilient teams blend data with dialogue, metrics with meaning.

Radical Candour: Have the Courage to Stop

This isn’t about shielding teams from accountability. It’s about holding ourselves accountable to a higher standard of leadership. Framing time-estimate accuracy as a condition for trust is a failure of leadership: it signals a lack of psychological safety and a misunderstanding of how complex work unfolds. It doesn’t help people grow; it punishes them for the unpredictability inherent in complex work. True leadership fosters environments where learning is safe, discovery is encouraged, and performance is judged by value, not conformity to expectations. Radical candour means caring personally and challenging directly. The challenge here is to stop clinging to false certainty and instead focus on the outcomes that matter for your business and your customers.

Don’t replace one flawed proxy with another. Metrics like cycle time, throughput, or flow efficiency are helpful, but only as part of a broader conversation about value, quality, and improvement. Alone, they tell you nothing about whether you’re solving the correct problems or improving customer outcomes. Consider adopting Evidence-Based Management and DORA to shift focus toward empiricism and value flow across the organisation. Talk with your team about impediments and improvements rather than the hours they logged. When you remove the spotlight from the clock, you’ll find your people deliver better software, enjoy their work more, and build trust along the way.

In Summary

The Estimation Trap looks like a process improvement effort. Underneath, it creates a fear-based culture that rewards gaming and punishes uncertainty. It distorts delivery and kills innovation in the name of control.

All quantitative measures can do is indicate system efficiency. They cannot tell you whether the system is effective!

Instead of asking, “Why didn’t we match our original estimate?” ask, “What did we learn, how did we adapt, and are we improving the outcomes that matter?”

Real progress starts when people feel safe enough to tell the truth about complexity, risk, and what it actually takes to deliver. That’s the objective measure of a team delivering meaningful outcomes, improving their system, and creating value for customers.


References

  1. Thurlow, Nigel; Turner, John; Rivera, Brian. The Flow System: The Evolution of Agile and Lean Thinking in an Age of Complexity (2020)
  2. Lederer & Prasad (1998). “A causal model for software cost estimating error”
  3. Lorko et al. (2022). “Hidden Inefficiency: Strategic Inflation of Project Schedules”
  4. Jørgensen & Grimstad (2008). “The impact of irrelevant and misleading information on software development effort estimates”
  5. Jørgensen (2004). “A review of studies on expert estimation of software development effort”
  6. Abdel-Hamid et al. (1999). “The dynamics of software project performance”
  7. Peopleware Book Summary
  8. Impact of time pressure on software quality
  9. Why Managers Should Focus on Outcomes, Not Hours
  10. Accelerate: The Science of Lean Software and DevOps (Forsgren, Humble, Kim)
  11. SPACE Framework Whitepaper (GitHub)
  12. Why Leading Agile Teams Focus on Customer Value