Refactoring vs Rewriting: How to Make the Right Call
Refactoring vs rewrite: how to decide between incremental improvement and full replacement, and why most big rewrites fail to deliver their promised benefits.
In this article:
- The Question Every Engineering Team Eventually Faces
- Why Software Rewrites Fail: The Predictable Failure Modes
- When Refactoring Is the Right Answer
- When a Rewrite Is Genuinely Justified
- The Middle Path: Targeted Replacement
- Conclusion
The refactoring vs rewrite debate is one that every engineering organization faces at some point. The stakes are high: a full rewrite that fails can cost years of engineering time and millions in opportunity cost. A refactoring effort that is too conservative can leave the organization trapped with an unmaintainable system indefinitely. The right answer depends on factors that are rarely analyzed with sufficient rigor before the decision is made.
The Question Every Engineering Team Eventually Faces
The conversation usually starts with a symptom: deployments take three days, every new feature requires touching fifteen files, incidents happen weekly, and engineers are demoralized because the code is difficult to work with. The conclusion the team reaches is that the system is “too broken to fix” and needs to be rebuilt from scratch.
This conclusion is often wrong, but it is understandable. The cost of working with the existing system is visible and daily. The cost of a rewrite is distributed over months and is easy to underestimate.
The refactoring vs rewrite decision requires answering several specific questions. Is the problem architectural (the fundamental structure cannot support what you need) or is it code quality (the structure could work but is implemented poorly)? Is the technology genuinely obsolete, or is the problem maintainability? Do you have the behavioral specification, either as tests or as documentation, that a rewrite would need to reproduce? And critically: can the business afford the velocity reduction during a rewrite?
These questions are not rhetorical. Each has an answer that materially affects the correct choice.
Why Software Rewrites Fail: The Predictable Failure Modes
Full software rewrites fail at a high rate. The failure modes are well-documented and repeat consistently across organizations.
Second system syndrome. The new system is designed to solve all the problems of the old system plus add the features that were always wanted. The scope expands during design. By the time implementation starts, the new system is more complex than the old one.
Incomplete specification. The legacy system’s behavior is the specification. It includes thousands of edge cases, workarounds for customer-specific issues, implicit behaviors that callers depend on, and bugs that have been in production long enough to become features. A rewrite team discovers these incrementally, and each discovery extends the timeline.
The moving target. While the rewrite is in progress, the business continues. New requirements arrive. The legacy system gets patches. After 12 months of rewriting, the new system is still catching up to the current state of the legacy system, which has continued to evolve.
The big bang launch. The rewrite produces a new system that must replace the old one at a single cutover point. All undetected differences between the two systems become incidents on launch day. The pressure to launch means the team accepts technical debt in the new system to meet the deadline, recreating the original problem.
The teams most likely to advocate for a full rewrite are teams that have not yet experienced a failed rewrite. Teams that have lived through one are significantly more cautious.
When Refactoring Is the Right Answer
Refactoring is the right answer when the system’s fundamental architecture can support what you need, but the implementation is poor. The architecture is the structure of how the system is organized. Poor implementation means: tangled dependencies, missing abstractions, inconsistent patterns, inadequate test coverage, undocumented behavior.
These problems are fixable incrementally. They do not require starting over. They require sustained, disciplined effort over time.
Refactoring is also the right answer when the behavioral specification is unavailable or expensive to reconstruct. If you cannot reliably characterize what the system does, you cannot verify that a rewrite is correct. Refactoring with characterization tests lets you improve the system while building the specification at the same time.
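A minimal sketch of what a characterization test looks like in practice. The function and its quirks are hypothetical, invented for illustration; the point is that the assertions record what the legacy code *actually* does today, including behavior that looks like a bug, so a refactoring (or eventual rewrite) can be verified against it.

```python
# Hypothetical legacy function whose behavior we want to pin down
# before touching it. The quirks below are assumed for illustration.
def normalize_price(raw):
    # Strips currency symbols, treats empty input as zero, and silently
    # truncates (not rounds) to two decimal places.
    cleaned = raw.strip().lstrip("$")
    if not cleaned:
        return 0.0
    value = float(cleaned)
    return int(value * 100) / 100

def test_characterize_normalize_price():
    # These assertions record CURRENT behavior, not desired behavior.
    assert normalize_price("$19.999") == 19.99  # truncates, does not round
    assert normalize_price("") == 0.0           # a "bug" callers may rely on
    assert normalize_price(" 5 ") == 5.0

test_characterize_normalize_price()
```

Once a body of tests like this exists, it serves double duty: a safety net for incremental refactoring now, and the missing behavioral specification if a targeted rewrite is ever justified later.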
A tech-debt engagement typically starts with an assessment of whether the system's problems are architectural or implementation-level. In most cases they are implementation-level, and refactoring is the correct approach. Incremental refactoring, combined with improved test coverage and CI, produces measurably better systems over 6-12 months without the risk profile of a rewrite.
Refactoring is harder to sell internally than a rewrite. A rewrite sounds like a clean break. Refactoring sounds like more work on a broken system. But the engineering reality is that incremental improvement on a working system is lower risk and more predictable than building a replacement under time pressure.
When a Rewrite Is Genuinely Justified
A rewrite is justified in a narrow set of circumstances. The technology is so obsolete that no viable path exists to run it in modern infrastructure: the runtime is no longer supported, security vulnerabilities cannot be patched, the hardware it requires is unavailable. Or the architecture is genuinely incompatible with what the business needs: a single-tenant system that must become multi-tenant, a synchronous system that must become event-driven at scale.
Even in these cases, a full rewrite of the entire system is rarely necessary. The correct approach is to rewrite the specific components where the constraint is binding, using the strangler fig pattern to replace them incrementally while the rest of the system continues operating.
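The strangler fig pattern mentioned above can be sketched as a thin routing facade. All names here are hypothetical: the facade sends calls for migrated operations to the new implementation and lets everything else fall through to the legacy code, so components are replaced one route at a time while both systems run.

```python
# Hypothetical legacy component, left running untouched during migration.
def legacy_invoice_total(order):
    return sum(item["price"] * item["qty"] for item in order["items"])

# Rewritten component; must match legacy behavior before its route cuts over.
def new_invoice_total(order):
    return sum(item["price"] * item["qty"] for item in order["items"])

# Operations migrate one at a time; the set grows as the "fig" strangles
# the legacy system.
MIGRATED = {"invoice_total"}

def facade(operation, order):
    # Callers only ever talk to the facade, so cutover per operation is
    # a one-line change here, and rollback is equally cheap.
    if operation in MIGRATED:
        return new_invoice_total(order)
    return legacy_invoice_total(order)

order = {"items": [{"price": 10, "qty": 2}]}
print(facade("invoice_total", order))  # routed to the new implementation
```

The design property that matters is reversibility: removing an operation from `MIGRATED` instantly restores legacy behavior, which a big-bang cutover cannot offer.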
A true rewrite, where the entire system is replaced at once, is justified only when the system is small enough to reproduce quickly and completely, and when the behavioral specification is thorough enough to verify the replacement. Systems that meet both criteria are usually simple enough that refactoring would also work.
The honest assessment of most “rewrite” decisions: the team wants relief from the pain of the current system, and a rewrite feels like relief. A disciplined refactoring program also provides relief, more gradually, but without the existential risk that a failed rewrite carries.
The Middle Path: Targeted Replacement
Between full refactoring and full rewriting, there is a middle path that works well for most situations: targeted replacement of specific subsystems using incremental migration patterns.
You identify the parts of the system that are genuinely blocking progress. Typically this is a small fraction of the total codebase, often the 20-30% that accounts for 80% of the maintenance burden. You apply the strangler fig or branch by abstraction pattern to those specific components. The rest of the system remains in place and continues operating.
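Branch by abstraction, the second pattern named above, can be sketched like this (all class names are hypothetical): introduce an interface that both the legacy and replacement implementations satisfy, switch callers over one at a time via configuration, and delete the legacy implementation once nothing depends on it.

```python
from abc import ABC, abstractmethod

# Step 1: introduce an abstraction over the component being replaced.
class ReportStore(ABC):
    @abstractmethod
    def save(self, report_id: str, body: str) -> None: ...

# Step 2: the legacy implementation moves behind the abstraction unchanged.
class LegacyFileStore(ReportStore):
    def __init__(self):
        self.disk = {}  # stands in for the old file-based storage
    def save(self, report_id, body):
        self.disk[report_id] = body

# Step 3: the replacement is built behind the same abstraction.
class NewObjectStore(ReportStore):
    def __init__(self):
        self.bucket = {}  # stands in for the replacement backend
    def save(self, report_id, body):
        self.bucket[report_id] = body

def make_store(use_new: bool) -> ReportStore:
    # Step 4: callers flip via configuration, one at a time; when all use
    # the new store, LegacyFileStore is deleted along with the flag.
    return NewObjectStore() if use_new else LegacyFileStore()
```

Unlike a long-lived rewrite branch, both implementations live on trunk the whole time, so the migration ships in small, individually reversible steps.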
This approach combines the speed of starting fresh on bounded problems with the safety of incremental migration. It avoids both the conservatism of purely incremental refactoring in high-severity areas and the risk of a full-system replacement.
For a legacy modernization engagement, this typically means: assess the full system, identify the highest-cost subsystems, extract them incrementally using migration patterns, leave the stable parts of the legacy system running. The system improves where it matters most, and the risk is concentrated in small, reversible steps.
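The "identify the highest-cost subsystems" step is often done with a hotspot heuristic: rank modules by change frequency (churn) multiplied by size or complexity, since code that is both large and frequently touched dominates maintenance cost. A hedged sketch, with made-up numbers for illustration:

```python
# Hypothetical churn and size data, e.g. extracted from git log and a
# line counter. All figures below are invented for illustration.
modules = [
    {"name": "billing", "commits_last_year": 240, "loc": 18000},
    {"name": "reports", "commits_last_year": 30,  "loc": 22000},
    {"name": "auth",    "commits_last_year": 95,  "loc": 4000},
]

def hotspot_score(m):
    # Large AND frequently changed modules are the migration candidates;
    # large-but-stable code (like "reports" here) can be left alone.
    return m["commits_last_year"] * m["loc"]

ranked = sorted(modules, key=hotspot_score, reverse=True)
for m in ranked:
    print(m["name"], hotspot_score(m))
```

The output ranks `billing` first, which matches the intuition in this section: the extraction effort goes to the few modules where pain concentrates, while stable legacy code keeps running.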
Conclusion
Refactoring vs rewrite is not a binary choice. The realistic options are: incremental refactoring, targeted replacement of specific subsystems, and full rewrite. Full rewrites are justified in narrow technical circumstances and fail at high rates when used as a general solution to accumulated technical debt. Incremental refactoring is lower risk but requires sustained commitment. Targeted replacement is the pragmatic middle path for most systems that have severe problems in bounded areas. The decision should be based on specific technical analysis, not on how painful the current system feels to work with.
Does your codebase have these problems? Let's talk about your system.