Do no harm: Proofs of concept cause technical debt

Aram Panasenco - Aug 5 - Dev Community

Just last month, CrowdStrike developers caused the largest outage in IT history by pushing a malformed update to millions of critical Windows machines. In the aftermath, the company lost 32% of its share price. Engineers often feel powerless, but this is one example of the incredible destructive power engineers have over their organizations. It's also not the only one. In this article, I'll show that well-meaning engineers building proofs of concept can cause just as much harm to the company over the long term.

The effects of technical debt

The results of the 2024 Stack Overflow Developer Survey came out two weeks ago. This year's #1 frustration of developers is "Amount of technical debt". Wikipedia defines technical debt as "the implied cost of future reworking because a solution prioritizes expedience over long-term design." For developers to be so riled up about it, technical debt must be eating heavily into their productivity.

Stack Overflow Developer Survey: Most common frustrations
Source: Stack Overflow Developer Survey 2024.

Imagine two organizations, identical except for one detail. For the first, technical debt is a major frustration, causing longer development time, lower quality, and higher turnover. For the second, technical debt is a minor issue. It's not hard to imagine the second organization's stock being worth 30% more than the first in the long term. Technical debt could have the same impact on your organization as the July IT outage had on CrowdStrike, just not as flashy.

Wikipedia's definition above implies that the responsibility for creating technical debt lies with the engineers and managers who made the original decisions, as if it were some innate quality of the code as originally written. However, a codebase that was originally considered well written very often comes to be perceived as technical debt over time. Why is that?

Think about the last piece of code that you considered technical debt. Why is it technical debt if it presumably worked just fine in the past? An engineer can come up with any number of things to blame. The original developers and their managers made myopic decisions. The users' demands have changed. The management direction has changed. The scale of the organization has changed. Competition is deploying faster. And so on. However, the majority of technical debt I've seen has been caused by proofs of concept.

Exploring the appearance of technical debt

Consider the following story. You're an engineer whose job it is to implement a new feature in project A. The project has an established infrastructure and codebase. During your prep work, you realize that a utility module X used in the project could be replaced by a much more elegant, modern, and performant module Y.

Upon your realization, what can you do?

  • Doing nothing and sticking to module X feels contrary to the spirit of continuous improvement and contrary to your growth as an engineer.
  • Immediately replacing X with Y everywhere feels reckless. You can't be 100% sure that Y will actually work before trying it.
  • Therefore, implementing just the feature you're working on using Y instead of X feels like the most prudent course of action. It's a proof of concept - even if it doesn't work, existing production features are not affected. You get approval from your manager in advance to cover your bases.

You set off on your implementation. It's harder than expected, but you eventually get the feature working using module Y instead of module X. You submit the change for review, anxious to hear your colleagues' feedback. Will they accept your change, allowing the team to start a discussion about whether module X should be replaced by module Y everywhere? Or will they reject it, forcing you to rewrite your feature using module X again? What you may not realize is that before the review process even starts, you've already created technical debt.

  • If your colleagues reject your change, the code you wrote for your feature using module Y becomes technical debt. You'll have to pay the debt off by rewriting your feature to use module X again. Even if you want to do that, your project manager might insist that you move on to the next feature, and that this feature using module Y be accepted into production 'temporarily' until you have bandwidth to clean it up.
  • If your colleagues accept the change, all instances of module X in the codebase become technical debt. Ironically, the 'good' outcome is the one that creates much more technical debt. When will your team get the time to do this cleanup effort? Will it require an outage window? How many people will be affected? How confident can you make your team that Y can replace X in all cases? As long as a big investment into cleanup doesn't happen, modules X and Y will coexist in the codebase side by side. At best, there'll be a directive to implement new features in Y.
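One way to start answering the confidence question is to run both modules on the same inputs and list every place they disagree. Here's a minimal sketch in Python. The names `slugify_x`, `slugify_y`, and `find_mismatches` are hypothetical stand-ins I made up for illustration. Your real X and Y will have different interfaces, and a handful of sample inputs is not proof that Y is safe everywhere.

```python
from typing import Any, Callable, Iterable, List, Tuple


def slugify_x(title: str) -> str:
    """Hypothetical stand-in for the legacy module X."""
    # Splits on any whitespace, so tabs and repeated spaces collapse.
    return "-".join(title.lower().split())


def slugify_y(title: str) -> str:
    """Hypothetical stand-in for the proposed module Y."""
    # Only replaces literal space characters.
    return title.strip().lower().replace(" ", "-")


def find_mismatches(
    old: Callable[[Any], Any],
    new: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Return (input, old_result, new_result) for every input where the two differ."""
    mismatches = []
    for value in inputs:
        expected = old(value)
        actual = new(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches


# Assumed sample inputs; in practice these would come from production data.
sample_inputs = ["Hello World", "  padded  ", "two\tspaces"]
mismatches = find_mismatches(slugify_x, slugify_y, sample_inputs)

for value, expected, actual in mismatches:
    print(f"{value!r}: X gave {expected!r}, Y gave {actual!r}")
```

An empty list only tells you that X and Y agree on the inputs you thought to try. Each mismatch is a behavior difference that someone has to either fix in Y or consciously accept before X can be retired.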

The snowballing of technical debt

Unless the cleanup effort described above is carried out, project A now has modules X and Y side by side. As the years go on, more well-intentioned and capable people introduce 'better' components into the codebase. Some problems are solved in two, three, four different ways. The codebase slowly becomes unreadable. The original design intent is lost, and no new voice has fully replaced it.

With each passing year, there are more and more bugs, misunderstandings, and unintended conflicts in project A. Implementing changes in A slowly takes longer and longer, frustrating both managers and engineers. Finally, a lead engineer working on a project that would normally be implemented as part of A suggests doing a proof-of-concept. She will implement the project using a brand new framework, with none of A's technical debt. Management and fellow engineers eagerly approve Project B, and the lead engineer completes it in record time. There's a light at the end of the tunnel!

Now project A should really be decommissioned and its years of work migrated to project B. Of course, no one wants to irresponsibly copy and paste the spaghetti mess of project A into project B. They'll find the time to do it right later, and for now projects A and B will run in parallel, with all new features getting implemented in project B...

xkcd: Standards
Source: xkcd

The road to hell is paved with good intentions

Attempts at small-scale improvements cause small-scale technical debt, which accumulates into large technical debt, which prompts attempts at large-scale improvements, which in turn cause huge technical debt. Each well-intentioned improvement effort becomes another strand in a web that traps the whole organization's productivity.

Does this process sound familiar to you? Does it explain the sprawl of redundant AWS accounts, Snowflake databases, Git repos, functions, and modules in your organization? Could the second and third developer frustrations, "Complexity of tech stack for build" and "Complexity of tech stack for deployment", also stem from the same root? Are our tech stacks actually complex or just incomprehensible?

It's neither the original developers nor external factors that create the majority of technical debt, but the well-intentioned developers who are trying to make things better.

POCs are not inherently harmful

Proofs of concept (POCs) can be incredibly valuable. They allow engineers to experiment, learn, and innovate without breaking existing code. I'm not advocating to stop using them.

Communication around POCs definitely needs to be improved. Under-the-radar POCs are not a good thing, no matter how well-intentioned. The hidden costs of POCs need to be communicated to all stakeholders. We should continue to innovate on and improve our applications, but also be honest about the resulting technical debt and the effort needed to pay it off. Hopefully this article helps engineers be more understanding when their organization pushes back on well-intentioned innovation, and helps managers feel less blindsided by the side effects of POCs they approve.

Reasons for POCs should also be examined carefully. Is the original component that the POC aims to replace really beyond help? Is there a path to slowly repair and improve the component? As we've seen, that approach might actually be faster and less painful than a 'quick' replacement.

Conclusion

There is a bigger issue than proofs of concept and technical debt, and that is the mindset with which we approach our work. We computer engineers are not construction workers. We are surgeons of our organizations. A careless stroke of the scalpel can cause immediate enormous harm, as in the CrowdStrike example. But a careless introduction of new organs can cause just as much harm in the long term. We engineers can't just blindly hide behind our managers and established processes. We have to carefully consider the long-term impact all our actions (however benign they seem) have on the organizations we're in. To quote the maxim popularly attributed to the Hippocratic Oath:

First, do no harm.


Cover image credit: PixArt Sigma 900M on HuggingFace
