Problem.Cockpit

Technical Debt Is a Symptom, Not the Problem

technical-debt, root-cause-analysis, engineering-leadership, first-principles

Everyone talks about technical debt like it's a mortgage. You borrowed speed, now you owe quality. Pay it down incrementally and eventually you're clean.

That metaphor is comfortable. It's also wrong.

The debt metaphor is too comfortable

"Technical debt" implies a rational borrowing decision. A team weighs the trade-offs, consciously takes on the debt, and plans to repay it. That's the story we tell ourselves.

But look at the tech debt in your codebase right now. How much of it was a deliberate decision? How much was a decision at all?

Most tech debt isn't borrowed. It's accumulated. Nobody chose it. It grew in the gaps between sprints, in the handoffs between teams, in the silent consensus that "we'll clean this up later" -- which is another way of saying "nobody owns this."

The mortgage metaphor lets everyone off the hook. It frames debt as a financial instrument -- rational, manageable, temporary. But the thing growing in your codebase isn't rational. It's the residue of a system doing exactly what it was designed to do.

And that's the real problem. As long as you frame it as debt, you keep the conversation on the code. The real cause is almost never in the code.

Why tech debt keeps growing despite "paying it down"

Here's a pattern I've seen at a dozen companies. The VP of Engineering announces: 20% of sprint capacity goes to tech debt. The team cheers. They refactor the worst module. They upgrade the dependency. They rewrite the flaky integration test.

Six months later, the debt is worse.

Not because the team didn't do the work. They did. But they were fixing symptoms. Refactoring a messy module doesn't fix the incentive that created the mess. The next module will be messy for the same reason. The one after that, too.

Tech debt reduction without root cause analysis is symptom management on a schedule. It's the engineering equivalent of mopping the floor while the pipe is still leaking. The 20% allocation makes everyone feel responsible while changing nothing about the system that produces the debt.
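The leaking-pipe point can be made concrete with a toy stock-and-flow model. This is a minimal sketch, not a measurement: "debt units" and the production and repayment rates below are hypothetical numbers chosen only to show the shape of the argument. Debt is a stock that grows by whatever the system produces each sprint and shrinks by whatever the cleanup allocation removes.

```python
def simulate_debt(sprints: int, produced_per_sprint: float, repaid_per_sprint: float,
                  initial_debt: float = 0.0) -> list[float]:
    """Return the debt level after each sprint.

    Each sprint the system adds `produced_per_sprint` units and the cleanup
    allocation removes `repaid_per_sprint` units; debt never drops below zero.
    """
    debt = float(initial_debt)
    history = []
    for _ in range(sprints):
        debt = max(0.0, debt + produced_per_sprint - repaid_per_sprint)
        history.append(debt)
    return history


# Hypothetical rates: the process produces 30 units per sprint and the
# 20% allocation removes 20 units per sprint.
symptom_only = simulate_debt(sprints=12, produced_per_sprint=30, repaid_per_sprint=20,
                             initial_debt=100)

# Same 20% allocation, but an upstream change cuts production to 10 units per sprint.
upstream_fix = simulate_debt(sprints=12, produced_per_sprint=10, repaid_per_sprint=20,
                             initial_debt=100)

print(symptom_only[-1], upstream_fix[-1])  # 220.0 0.0
```

With these numbers the symptom-only team ends twelve sprints with more debt than it started with, even though it never skipped its allocation. The same allocation paired with a lower production rate clears the backlog by sprint 10. The effort is identical; only the rate at which the system produces debt changed.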

The question to ask isn't "which module should we refactor?" It's "why does every module end up needing refactoring?"

The upstream causes nobody names

When you actually excavate tech debt -- when you follow the thread past the code and into the system that produced it -- you find the same upstream causes over and over.

Incentive structure. Shipping features is visible. Code quality is invisible. Promotions, performance reviews, and stakeholder approval all reward what you can demo, not what you can maintain. Engineers respond rationally to this signal. They ship. The debt accumulates not because they're careless, but because the system rewards exactly the behavior that produces it.

Problem framing. "Ship fast" vs. "ship sustainably" is a false choice. But most organizations never decide explicitly how to balance the two. They default to speed because speed is the path of least resistance. The debt isn't a decision. It's the absence of a decision.

Process design. If code review is a bottleneck, people skip it. If testing infrastructure is slow, people write fewer tests. If the deploy process is painful, people batch changes into risky big releases. None of this is a character flaw. It's a rational response to a broken process. The debt is a consequence of the process working as designed -- badly.

These are the causes that never make it into a retro. They're structural, uncomfortable, and nobody's job to fix. Problem.Cockpit helps engineering leaders find these upstream causes -- start your own excavation and follow the thread to where the debt is actually coming from.

The physics of tech debt

I use the word "physics" to describe the irreducible truth beneath a problem -- the thing that, once you see it, explains everything above it. Every codebase has a physics that explains its debt.

"We need to refactor the auth module" is a symptom. "The team's incentive structure rewards the appearance of quality over actual quality" is a physics. Change the physics, and debt stops accumulating. Leave the physics in place, and every refactoring sprint is a temporary reprieve before the next wave.

The physics of tech debt is almost always about incentives or organizational structure, not engineering discipline. Telling a team to "write better code" while the system punishes them for taking time to do so is not a strategy. It's a contradiction.

This is why first-principles thinking matters for technical leaders. The refactoring backlog is a surface-level artifact. The physics -- the incentive structure, the process design, the organizational defaults -- is what's actually producing it.

How to excavate your tech debt

Don't start by fixing. Start by understanding. Pick the worst area of your codebase -- the one that makes every engineer groan -- and excavate it.

STATE. What consequence does this debt actually cause? Not "the code is messy." What happens because of the mess? Slow onboarding? Frequent production incidents? Features that take 3x longer than they should? Name the real cost.

SURFACE. What assumptions are you making about why it's like this? "The original team was in a rush" is an assumption. "Nobody cared about quality" is an assumption. Write them down so you can test them.

DRILL. Follow the thread past the comfortable answer. Why was the original team in a rush? What was the pressure? Who set the timeline? What would have happened if they'd pushed back? Each question takes you closer to the system that produced the debt.

CHALLENGE. If your root cause is "we need more time" or "we need more engineers," you stopped too early. Those are resource complaints, not root causes. The system keeps producing the same outcome regardless of how much time or headcount you throw at it. Push deeper.

When you hit the physics -- the irreducible truth that explains the pattern -- you'll know it because it will be uncomfortable. It will implicate a process, an incentive, or a decision that nobody wants to revisit.

That's how you know you're in the right place.
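One way to make sure an excavation leaves a written trail is to record each step as you go. This is a minimal sketch under stated assumptions: the `Excavation` class, its fields, the `RESOURCE_COMPLAINTS` phrase list, and the billing example are all hypothetical illustrations, not Problem.Cockpit's data model. Keyword matching is only a crude stand-in for the human judgment the CHALLENGE step actually requires.

```python
from dataclasses import dataclass, field

# Hypothetical heuristic: phrases that mark a resource complaint rather than a root cause.
# Keyword matching is a crude stand-in for the judgment the CHALLENGE step needs.
RESOURCE_COMPLAINTS = ("more time", "more engineers", "more headcount", "more people")


@dataclass
class Excavation:
    """Hypothetical record of one excavation, one field per step."""
    state: str                                                  # STATE: the real cost, not "the code is messy"
    surface: list[str] = field(default_factory=list)            # SURFACE: assumptions to test
    drill: list[tuple[str, str]] = field(default_factory=list)  # DRILL: (why-question, answer) pairs
    root_cause: str = ""                                        # where the drilling currently stops

    def challenge(self) -> list[str]:
        """CHALLENGE: return reasons to keep drilling; an empty list means no objection was found."""
        if not self.root_cause.strip():
            return ["no candidate root cause yet"]
        lowered = self.root_cause.lower()
        if any(phrase in lowered for phrase in RESOURCE_COMPLAINTS):
            return ["resource complaint, not a root cause: push deeper"]
        return []


# Invented example for illustration only.
dig = Excavation(
    state="Changes to the billing module take about 3x longer than estimated",
    surface=["The original team was in a rush"],
    drill=[
        ("Why was the team in a rush?", "The launch date was fixed before the work was scoped"),
        ("Why was the date fixed first?", "Roadmap reviews reward committed dates, not maintainability"),
    ],
    root_cause="We need more engineers",
)
print(dig.challenge())  # ['resource complaint, not a root cause: push deeper']

dig.root_cause = "Promotion criteria reward shipped features and ignore maintenance cost"
print(dig.challenge())  # []
```

Note that the answer worth keeping is already sitting in the DRILL trail; the record simply makes it harder to stop at "we need more engineers" without noticing.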

The real ROI of going upstream

Fixing physics is harder than refactoring. It means having conversations about incentive structures, promotion criteria, process design, and organizational priorities. These conversations are uncomfortable and slow.

But they prevent the next round.

A team that changes its incentives spends less time on tech debt reduction because less debt accumulates. A team that fixes its review process doesn't need to allocate 20% of sprint capacity to cleaning up the mess that bad reviews produced. A team that makes quality visible alongside velocity stops producing invisible debt.

The math is straightforward. Refactoring one module costs X hours. Changing the incentive that produced the mess costs 10X in political capital and difficult conversations. But refactoring is recurring. You'll do it again next quarter, and the quarter after that. Fixing the physics is a one-time cost with compounding returns.
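Here is that comparison as a small break-even calculation. It is a sketch under assumptions the argument leaves implicit: one refactor per quarter at cost X, the upstream change costing 10X once, and political capital treated as if it could be priced in the same units as engineering hours. The `residual_share` parameter is hypothetical and covers the case where the upstream fix only partly stops the recurrence.

```python
from typing import Optional


def break_even_quarter(upstream_multiplier: float = 10.0,
                       residual_share: float = 0.0,
                       max_quarters: int = 100) -> Optional[int]:
    """Return the first quarter in which refactoring-only has cost at least as much
    as fixing upstream, or None if that never happens within max_quarters.

    All costs are in units of X (one module refactor). Assumptions: one refactor
    per quarter, and the upstream fix costs upstream_multiplier * X once.
    residual_share is the fraction of refactoring that still recurs after the
    fix: 0.0 means the fix stops it completely.
    """
    for quarter in range(1, max_quarters + 1):
        refactor_only = 1.0 * quarter                                  # X every quarter
        fix_upstream = upstream_multiplier + residual_share * quarter  # 10X once, plus leftovers
        if refactor_only >= fix_upstream:
            return quarter
    return None


print(break_even_quarter())                    # 10
print(break_even_quarter(residual_share=0.2))  # 13
print(break_even_quarter(residual_share=1.0))  # None
```

Under these assumptions the one-time fix pays for itself in quarter 10, and every quarter after that is return. A fix that leaves a fifth of the refactoring in place still pays off, just later. A "fix" that changes nothing upstream (`residual_share=1.0`) never does, which is the arithmetic behind leaving the physics in place.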

The question isn't "how do we pay down our tech debt?" The question is "what system is producing it, and how do we change that system?"


Try it yourself

The gallery has real excavation sessions where engineering leaders traced their tech debt back to its actual root cause. See the pattern in action, then start your own excavation.
