The 20-Year-Old Distribution System — Rescued Without a Rewrite
The Challenge
A mid-sized manufacturing distributor had run their entire operation — orders, inventory, purchasing, shipping, accounts — on the same custom system since 2004. The original developer had retired. Two replacements had come and gone, neither willing to make changes without risking a cascade of failures. The system was crashing weekly, sometimes mid-transaction, corrupting records and triggering a manual recovery process that consumed the operations director's afternoon every time it happened. Three vendors had assessed the situation and quoted full rewrites at $380,000–$640,000. The client couldn't afford the cost, couldn't afford the 18–24 month timeline, and couldn't afford to be wrong again.
Our Solution
We started where every engagement starts: reading the code.
Not skimming it. Not asking the client to explain it. Reading it — line by line, tracing the data flow, mapping the business logic that 20 years of real-world use had embedded in a codebase nobody had fully understood in years.
The assessment took three days. What we found was not a system that needed to be replaced. It was a system with two specific, fixable problems that nobody had taken the time to diagnose properly.
Finding the real cause of the crashes
The first problem was a concurrency issue in a specific transaction sequence — a purchase order creation immediately followed by a partial shipment update on the same record. When both operations happened within a narrow time window, the data layer's locking mechanism produced a conflict that corrupted the record and halted the process. The original developer had known about this and documented a manual workaround — a specific order of operations that prevented the conflict. That documentation had been lost when he retired. His successors had never known the workaround existed, and as transaction volumes grew over the years, the crash rate grew with them.
The fix was targeted: we rewrote the affected transaction sequence to use explicit record locking that matched the actual behaviour of the data layer. The conflict became impossible by design. We also embedded the original workaround logic directly into the application, so it could never be accidentally omitted by a future developer.
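The pattern behind the fix is general: take the write lock explicitly before either operation touches the record, so the purchase-order creation and the partial-shipment update can never interleave. A minimal sketch of that pattern, using Python and SQLite as a stand-in data layer (the client's actual technology, table names, and columns are not shown here; `purchase_orders` and its fields are illustrative only):

```python
import sqlite3

def create_po_and_ship(conn, po_id, qty_ordered, qty_shipped):
    """Create a purchase order and apply a partial shipment as one
    atomic unit. conn must be in autocommit mode (isolation_level=None)
    so the explicit BEGIN below is not nested inside an implicit one."""
    # BEGIN IMMEDIATE acquires the write lock up front: a concurrent
    # writer blocks until we finish, instead of interleaving mid-sequence
    # and corrupting the record.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO purchase_orders (id, qty_ordered, qty_shipped)"
            " VALUES (?, ?, 0)",
            (po_id, qty_ordered),
        )
        # The shipment update runs inside the same transaction, so no
        # other writer can observe or modify the half-updated record.
        conn.execute(
            "UPDATE purchase_orders SET qty_shipped = ? WHERE id = ?",
            (qty_shipped, po_id),
        )
        conn.execute("COMMIT")    # both operations land together...
    except Exception:
        conn.execute("ROLLBACK")  # ...or neither does
        raise
```

Because the lock is taken by the code itself rather than by a documented order of operations, the safety no longer depends on any developer remembering a workaround.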
Fixing the data integrity problem
The second problem was at the storage layer. The underlying data format had a documented incompatibility with the version of Windows Server the business had upgraded to two years earlier — an incompatibility that caused slow, progressive index corruption. Queries would begin returning incorrect results before the system finally halted. Approximately 30% of the crashes traced back to this corruption, which had been accumulating silently since the upgrade.
We migrated the primary data storage to SQL Server. The application itself — the interface, the business logic, every screen the users worked with every day — was not touched. It continued running exactly as before. The only change was at the data layer, where the corrupting format was replaced with one that didn't have the incompatibility. Users noticed faster query performance. They noticed nothing else.
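A storage-layer migration like this typically follows a copy-then-verify pattern: each table is bulk-copied to the new store, then row counts are checked before cutover is considered safe. A hypothetical sketch of that pattern, again using Python and SQLite connections as stand-ins (the real migration targeted SQL Server, and the table and column names here are illustrative, not the client's schema):

```python
import sqlite3

def migrate_table(src, dst, table, columns):
    """Copy one table from the old store to the new one, then verify
    the row counts match before cutover is considered safe."""
    col_list = ", ".join(columns)
    rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()

    # Bulk-insert into the new store in a single transaction.
    placeholders = ", ".join("?" for _ in columns)
    dst.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    dst.commit()

    # Verification step: refuse to cut over if any rows went missing.
    src_count = src.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dst_count = dst.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if src_count != dst_count:
        raise RuntimeError(f"{table}: copied {dst_count} of {src_count} rows")
    return dst_count
```

Running a verification pass per table is what makes a weekend-window cutover defensible: the old store stays authoritative until every table has been confirmed complete in the new one.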
Documenting what had never been written down
After stabilising the system, we spent two weeks doing something that had never been done for this codebase: documenting it. We produced a technical reference covering the system architecture, the data structures, the non-obvious workflows, and the business rules embedded in the code. The client's current developer can now open that document, understand what the system does and why, and make changes safely.
The outcome
The system has run crash-free since the engagement ended. The manual recovery process no longer exists. The operations director's afternoons are her own again.
The client's estimate of savings versus a full rewrite: over $500,000 — the gap between the quotes they received and what the rescue actually cost, not counting the 18 to 24 months of disruption a rewrite would have caused. The data migration was completed during a single planned weekend window with the application running throughout. Zero downtime.
The system is now documented, stable, and maintainable. The business has a clear picture of what it does, and a roadmap for incremental modernisation over the coming years — on their timeline, with budget and preparation, rather than under emergency pressure.