The Estimation Trap: How Tracking Accuracy Undermines Trust, Flow, and Value in Software Delivery

In many software organisations, estimation accuracy is mistaken for predictability and control. Leadership asks teams to compare original estimates to actuals in hopes of improving forecasts. But this creates a false sense of certainty , one that undermines trust, distorts priorities, and derails delivery.

When the Metric Becomes the Target

Metrics are never neutral. Once teams are judged by how closely they meet estimated timelines or planned outputs, those metrics stop reflecting the truth. The more visible and enforced the target becomes, the more teams adapt, not to improve outcomes, but to survive the system. What follows is a cascade of distorted behaviours: silence replaces honesty, delivery becomes performance theatre, and metrics become tools of compliance rather than learning. The following patterns are not outliers; they are systemic symptoms of measurement misuse.

Malicious Compliance: When Teams Give Up on Caring

When systems overemphasise compliance, teams don’t rebel; they comply. Maliciously. They log the hours. They meet the metrics. They do exactly what’s asked; but no more. They stop asking questions. They stop raising concerns. They stop caring.

This kind of mechanical compliance doesn’t improve delivery; it undermines it. Developers fill in timesheets at the end of the week with whatever gets approved. They make up hours to satisfy reporting tools. What ends up in the system looks clean and green, but it’s fiction.

And what gets lost is far worse: safety, curiosity, technical excellence, and any sense of pride in the outcome. A culture of malicious compliance breeds disengagement, risk blindness, and degraded quality. If you’re measuring in six-minute increments, you’re not managing for value. You’re auditing obedience.

Green Shifting: When Metrics Replace Truth

Once metrics become the focus, honesty becomes optional. Teams under pressure to hit targets will show green status until the very moment they can’t hide red anymore. This phenomenon, sometimes called “green shifting,” isn’t a failure of individuals. It’s the predictable result of a system that rewards status optics over empirical feedback.

When the dashboard matters more than the work, risk gets buried. Quality is sidelined. And problems that could’ve been solved early are deferred until they explode. This isn’t management; it’s theatre.

Fear-Driven Delivery

When performance is judged by how closely estimates match actuals, teams shift into survival mode. Psychological safety evaporates. People stop flagging problems, bugs, and risks. It’s not due to apathy, but fear of missing the number. Defects get buried. Safety is deferred. Risk is hidden.

The focus moves from building the right thing to defending the wrong metric.

When you penalise unpredictability, you don’t get more predictability. You get fear, silence, and a culture optimised for hiding reality. This is how delivery becomes theatre.

Distorted Behaviours and False Success

Comparing estimates to actuals can be useful for learning, but when it becomes a performance metric, it changes behaviour. Teams are no longer incentivised to improve forecasting; they’re incentivised to look predictable.

What happens next is entirely predictable:

Padding: Teams inflate estimates to guarantee hitting the target.

In one large organisation, teams were told they could deliver no more than five points per story, and no more than 24 points per sprint. The result? Teams padded everything to hit exactly 24 points. Story sizes gravitated to five points regardless of complexity. Innovation vanished, curiosity died, and delivery became a game of maximising perceived output. They met the metric perfectly and completely undermined the point of estimation. This is what happens when the system is designed for optics, not outcomes.

Risk aversion: Complex and innovative work is avoided because it’s difficult to estimate.
Scope distortion: Work is redefined midstream to match the estimate.
False success: Projects finish “on time” and “on budget,” but deliver little value. Many organisations never validate whether the promised benefits were actually realised. One team might deliver a multimillion-dollar system, only for no one to ever measure its usage or customer impact. The project is declared a success, but no one checks if it made a difference. In some cases, even the most advanced organisations fall into this trap. The cost of not checking actual outcomes is hidden until it’s too late.

These aren’t edge cases; they’re rational adaptations to a distorted system. The result is a culture of compliance, not curiosity.

Thurlow’s Law of Metric Distortion: “Any metric you measure will appear to improve in the short term. This doesn’t mean the system improved, only that people adjusted their behaviour to game the metric.”

The System Learns to Lie

This principle highlights a broader risk. Once teams realise they’re being judged on metric performance, they start optimising for appearances. They stop focusing on delivery, learning, and value. The metric becomes a distraction from what really matters. It reinforces behaviours that prioritise green dashboards over working software. This is how green shifting starts. Status reports stay green until the moment they can no longer hide the red. It’s not deceit. It’s self-preservation. In a system optimised for appearances, truth is delayed until failure is unavoidable. The focus shifts away from delivery, learning, and value.

The Evidence Behind the Trap

Studies from Lederer & Prasad, Jørgensen, and others show that using estimation accuracy as an evaluation criterion strongly influences behaviour, and often negatively. When estimation accuracy becomes a KPI, it reshapes incentives across the system, often with unintended results. One experimental study (Lorko et al., 2022) found that when participants were rewarded solely for estimation accuracy, they systematically overestimated and deliberately slowed down to “finish on schedule.” The appearance of control was preserved, but efficiency was lost.

Another study (Jørgensen & Grimstad, 2008) showed that people who knew they’d be judged on their estimates produced more biased and less realistic figures. They weren’t aiming for truth; they were aiming for safety.

This is a textbook example of Goodhart’s Law. When a measure becomes a target, it stops being useful as a measure and starts driving the wrong behaviours.

Goodhart’s law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Trust Is a Two-Way Street

If you treat your engineers like they’re untrusted contractors who need to account for every six-minute increment, don’t be surprised when morale tanks. One developer put it bluntly: “If you’re going to track me like a machine, don’t expect me to act like an innovator.” Research shows employees who feel trusted are more engaged and productive. Conversely, heavy time tracking breeds a culture of micromanagement and mistrust. More than half of knowledge workers say time tracking actually prevents them from doing their best work. When people feel every minute is under a microscope, they’re less likely to ask questions or offer improvements. You’re starving your team of psychological safety, and with it, the conditions for innovation, quality, and honesty. When people are punished for missing estimates, they stop raising risks. They stop discussing trade-offs. Data becomes performative. The real work gets buried under ritual. The system becomes more predictable on paper, but more brittle in reality.

Bad Estimates Don’t Make You a Bad Developer

Software development is creative problem-solving. No two tasks are truly alike. You can’t reliably predict how long it will take to untangle a thorny bug or integrate a library. Sometimes, a “quick” fix can turn into a two-day rabbit hole. So why beat people up when they miss an arbitrary prediction? Estimating in hours assumes everyone is equally experienced and works at a constant pace. They don’t. Pressuring developers to “improve” their guesses assumes effort and duration are predictable. In knowledge work, they’re not. It only creates stress and encourages padding or sandbagging. It’s a game with no winners.

Time Pressure Kills Quality

When management’s only lever is the schedule, quality suffers. Tom DeMarco and Tim Lister, in Peopleware, warn that unreachable deadlines force developers to cut corners: “Workers kept under extreme time pressure will begin to sacrifice quality… deliver products that are unstable and not really complete.” Lab studies back this up. Developers under tight time pressure work faster, not better, and quality drops. And when shortcuts pile up, the cost isn’t just bugs, it’s fragile systems, frustrated customers, and eroded trust.

Hours Worked Do Not Equal Value Delivered

Customers don’t buy effort, hours, or estimation accuracy. They buy working software that solves their problems. A day spent cleaning up architecture might look unproductive on a timesheet, but it delivers enormous long-term value. Optimising for logged time only encourages burnout, presenteeism, and a celebration of busyness over outcomes.

Metrics like velocity or hours measure output, but they don’t measure the value customers care about. It’s better to track what matters: how frequently you can deliver features, how quickly you recover from failures, and whether you’re improving the user experience. These metrics help track how fast you’re learning (Time to Market), how much waste exists in your delivery process (Ability to Innovate), and whether users are sticking around and benefiting from what you’ve delivered (Current Value).

What to Do Instead

If you’re serious about improving delivery outcomes, stop obsessing over time. Time-based metrics show what happened, not what mattered. They miss the nuance of complexity, cognition, and asynchronous problem-solving. When you treat delivery like stopwatch management, you reward appearances over insight.

Evidence-Based Management (EBM) is a way of managing with data that reflects actual outcomes and system capability. It helps leaders move beyond speculation by focusing on what is observable and valuable.

Good decisions start with real data, not guesses. EBM helps teams and leaders focus on what actually delivers value, not what was forecast, promised, or imagined.

In large-scale systems, direct customer contact is rare. That makes feedback loops even more critical. We must rely on proxy signals like usage trends, satisfaction scores, defect trends, and change failure rates to know if we’re on track. Not every team needs direct access to the customer, but every team needs access to evidence that what they shipped is working.

EBM encourages decisions based on what is actually happening, rather than what was predicted. Forecasts can support decision-making, but only when used transparently to explore assumptions, not when turned into compliance targets. When forecast accuracy becomes a performance metric, it violates empiricism by rewarding appearances rather than actual outcomes.

Leadership must create transparency around outcomes, not intentions. This means embracing metrics that reflect customer value, system health, and delivery capability, even when they challenge the status quo.

Let’s be clear: in complex, knowledge-based work, there is no meaningful diagnostic value in “estimate vs actual.” Take, for example, a cross-functional team building an internal developer platform. In the first quarter, leadership tracked the estimated vs actual across epics to improve forecasting. Developers quickly learned to overestimate tasks, avoided exploratory work, and padded estimates to match targets. The numbers looked better, but progress slowed, innovation stalled, and valuable refactoring work vanished from the backlog. By the time leadership realised the disconnect, technical debt had doubled. The team hadn’t become more predictable; it had simply become more cautious and less effective. This is the cost of measuring the wrong thing. It leads to the wrong conclusions.

“Estimate vs actual measures the work, but the waste lives in the gaps , the wait, the handoff, the delay. So you’re optimising the wrong thing.” - Nigel Thurlow

This is a clear example of Systems Thinking, as outlined in The Flow System (Thurlow et al., 2020). The true constraint rarely lies in the task. It lies in the system: the queues, context switching, blocked dependencies, or fragmented communication paths that hinder the delivery of value. In most cases, the constraint lies in the workflow, rather than in the functions themselves.

Even when used “diagnostically”, estimate vs actual as a metric misleads:

It ignores queues, rework, and dependencies, which are often the actual sources of delay. Lean thinking teaches us that to improve flow we must visualise queues, limit work in progress (WIP), and actively manage handoffs; none of which are addressed by focusing on task-level estimate variance.
It reinforces the illusion that better estimation leads to better outcomes.
It promotes local optimisation over systemic improvement.

A reminder of Thurlow’s Principle of Estimation Distortion above!

A Better Path Forward with Evidence-Based Management

EBM organises improvement around four Key Value Areas (KVAs):

Current Value - Are We Delivering Value to Customers and Stakeholders Today?
Unrealised Value - What additional value could we deliver in the future?
Time to Market - How quickly can we learn, respond, and deliver?
Ability to Innovate - How effectively can we change and adapt the product?

The metrics we use should support these questions, not distract from them. Here’s how EBM-oriented alternatives compare:

Instead of…	Try…
Estimate vs Actual	End-to-end lead time from commitment to usable customer delivery (Time to Market)
Story points completed	Customer satisfaction (Current Value)
On-time delivery rate	Quality Trends, or % of effort on new vs sustaining work (Ability to Innovate)
Headcount-based planning	Opportunity backlog delta (Unrealised Value)

⚠️ Warning

Time-based metrics must be contextualised. Without insight into value, complexity, and customer outcomes, they risk becoming another distorted proxy.

To understand and improve delivery, stop obsessing over how close your guesses were. Instead, measure how your system behaves across the value stream and under varying flow loads. EBM encourages the use of actionable, outcome-aligned metrics that reflect actual system health, rather than projected compliance.

Cycle time trends can reveal delivery latency across the value stream, but must be interpreted with caution. Without understanding the nature and complexity of the work, as well as its value, these trends are just noise. Measure flow to inspect how the system behaves, not how long individual items take.
ℹ️ Note
Cycle time only tracks how long one piece of work took. It says nothing about what the customer waited for or whether the system is flowing well. Lead time tells you how long the customer waits, starting from the moment a request is made until they receive something usable. Always measure from the outside in.
Work item ageing reveals stuck or neglected work, or requirements that were added and then discarded.
Flow efficiency indicates the proportion of total time spent progressing work versus waiting. It’s a measure of delay, not value. But beware: systems often mask latency by moving queued work into “in progress” prematurely. High flow efficiency with unchanged lead time may signal gaming.
Throughput variance only tells you something if your work items are roughly the same size. If not, throughput becomes noise. Teams that right-size work can use this as a stability signal. Otherwise, avoid using it as an indicator of performance.

If you must discuss estimates, use them to explore assumptions and complexity, not to evaluate people. The ultimate goal is to deliver meaningful outcomes to customers. That requires embracing uncertainty, surfacing impediments, and improving system capability. The aim is not to enforce forecast compliance. Value lies in understanding, not accuracy.

De-emphasise ’estimate vs actual’ entirely. It is a false signal in complex domains.
Reward flow mastery, not forecasting tricks.
Focus on learning, adaptability, and real customer outcomes.

Estimation should support informed conversations about uncertainty. It should not become a tool used to force predictability.

Quantitative vs Qualitative

Most metrics in delivery are quantitative, including lead time, flow efficiency, and throughput. But numbers don’t tell the whole story. If you want to know whether you’re building the right thing, you need qualitative feedback: real customer conversations, issue sentiment, satisfaction narratives, and behavioural observations.

Quantitative data tells you what happened. Qualitative insight helps you understand why.

No chart or trendline can replace a conversation with a frustrated user or a support ticket that describes unmet needs. The most resilient teams blend data with dialogue, metrics with meaning.

Radical Candour: Have the Courage to Stop

This isn’t about shielding teams from accountability. It’s about holding ourselves accountable to a higher standard of leadership. Framing time estimate accuracy as a condition for trust is a failure of leadership. It signals a lack of psychological safety and a misunderstanding of how complex work unfolds. True leadership fosters environments where learning is safe, discovery is encouraged, and performance is judged by value, not conformity to expectations. It’s not helping them grow; it’s punishing them for unpredictability inherent in complex work. Radical candour means caring personally and challenging directly. The challenge here is to stop clinging to false certainty and instead focus on the outcomes that matter for your business and your customers.

Don’t replace one flawed proxy with another. Metrics like cycle time, throughput, or flow efficiency are helpful, but only as part of a broader conversation about value, quality, and improvement. Alone, they tell you nothing about whether you’re solving the correct problems or improving customer outcomes. Consider adopting Evidence-Based Management and DORA to shift focus toward empiricism and value flow across the organisation. Talk with your team about impediments and improvements rather than the hours they logged. When you remove the spotlight from the clock, you’ll find your people deliver better software, enjoy their work more, and build trust along the way.

In Summary

The Estimation Trap appears to be a process improvement effort. But underneath it creates a fear-based culture that rewards gaming and punishes uncertainty. It distorts delivery and kills innovation in the name of control.

All quantitative measures can do, is inform of system efficiency. They cannot inform of system effectiveness!

Instead of asking, “Why didn’t we match our original estimate?” ask, “What did we learn, how did we adapt, and are we improving the outcomes that matter?”

Real progress starts when people feel safe enough to tell the truth about complexity, risk, and what it actually takes to deliver. That’s the objective measure of a team delivering meaningful outcomes, improving their system, and creating value for customers.

a·gen·tic a·gil·i·ty

The Estimation Trap: How Tracking Accuracy Undermines Trust, Flow, and Value in Software Delivery

When the Metric Becomes the Target

Malicious Compliance: When Teams Give Up on Caring

Green Shifting: When Metrics Replace Truth

Fear-Driven Delivery

Distorted Behaviours and False Success

The System Learns to Lie

The Evidence Behind the Trap

Trust Is a Two-Way Street

Bad Estimates Don’t Make You a Bad Developer

Time Pressure Kills Quality

Hours Worked Do Not Equal Value Delivered

What to Do Instead

A Better Path Forward with Evidence-Based Management

⚠️ Warning

ℹ️ Note

Quantitative vs Qualitative

Radical Candour: Have the Courage to Stop

In Summary

References

Metrics that matter with evidence-based management

Rethinking Capacity Planning

Prioritising Value Over Estimates in Agile

Leadership Is System Design, Not Command

Story Points and Velocity Signal Team Immaturity

Say-Do Metrics: Risks and Outcome Focus in Agile

Applying Metrics for Predictability

Evidence-Based Management

Agile Kata Professional

Faster, Predictable Software Delivery

Engineering Excellence

Culture Transformation & Team Enablement

Technical Leadership

Business Agility Consulting

Engineering Excellence

Connect with Martin Hinshelwood

Our Happy Clients

What you get?

What We Do?

Who are we?

Company Pages

Principle Consultants

Consultants & Trainers

Staff

Associates

What we think?

Theme

Text Font

Currency

a·gen·tic a·gil·i·ty

The Estimation Trap: How Tracking Accuracy Undermines Trust, Flow, and Value in Software Delivery

Table of Contents

When the Metric Becomes the Target

Malicious Compliance: When Teams Give Up on Caring

Green Shifting: When Metrics Replace Truth

Fear-Driven Delivery

Distorted Behaviours and False Success

The System Learns to Lie

The Evidence Behind the Trap

Trust Is a Two-Way Street

Bad Estimates Don’t Make You a Bad Developer

Time Pressure Kills Quality

Hours Worked Do Not Equal Value Delivered

What to Do Instead

A Better Path Forward with Evidence-Based Management

⚠️ Warning

ℹ️ Note

Quantitative vs Qualitative

Radical Candour: Have the Courage to Stop

In Summary

References

Smart Classifications

Table of Contents

What to read next?

Programs that align

Outcomes and Capabilities

Connect with Martin Hinshelwood

Our Happy Clients​

Our Happy Clients