5 Signs Your Legacy System Is Failing

TL;DR: Five specific patterns predict a legacy system's collapse: (1) recovery time after crashes is lengthening, (2) only one person can fix it, (3) small changes break unrelated things, (4) the "do not touch" list is growing, (5) the "we need to rewrite this" conversation has gone nowhere for years. Two or more firing = book an assessment now.

If your business runs on software that was built between 1998 and 2010, you probably know it has problems: slow pages, the occasional crash, reports that look slightly off, a feature someone meant to fix six years ago and never did. The instinct is to live with it, because it still works. The risk is that "still works" is not a stable state. Older systems rarely fail gradually. They run, and run, and run, and then they fall over inside a single bad week.

The five patterns below are what we see in the weeks before a long-running legacy system goes from "still works" to "we cannot ship orders today." None of them is dramatic on its own. Each is the kind of thing operations teams quietly absorb. But three or more of them firing at the same time is a leading indicator that the system is moving from "ageing" to "actively failing."

This is a non-technical checklist. You should be able to walk through it with your operations director or whoever holds the keys to the software, and answer each one yes/no. If your count of yes answers is two or more, the next step is a written look at the system before something forces the conversation. Book a free 30-minute consultation if you want a second opinion; the recommendations don't come with an invoice.

Sign 1 — Recovery time after a normal crash is getting longer

A healthy legacy system, when it crashes, comes back inside minutes. Someone restarts a service, or reboots the application server, or re-runs a stuck job, and the business is moving again. The team has a routine for it. The annoyance is mild.

The pattern that should worry you is when recovery time starts creeping up. A crash that used to take ten minutes to clear now takes forty. The team has to try two or three different fixes before one of them works. The "who do we call?" conversation involves more people. Some recoveries now require pulling someone in after hours.

What this usually means: the system has accumulated more state — more data, more files, more queued jobs, more integrations — than it was designed for, and the recovery procedures that worked at the original scale are now too slow at the current scale. It can also mean the original developers wrote workarounds that depended on knowledge that has since left the building. Either way, the lengthening recovery time is the signal. Track it. If your average time-to-recovery has doubled in the last twelve months, the underlying issue will not fix itself.
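For teams that want to make "track it" concrete, here is a minimal sketch, assuming you keep a simple log of each crash and how many minutes recovery took. The incident data and function names are hypothetical, and comparing two rolling 365-day windows is one reasonable way to test "doubled in the last twelve months", not the only one.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical incident log: (date of the crash, minutes until the business was moving again).
# Real figures would come from your own incident notes or ticketing system.
incidents = [
    (date(2023, 3, 2), 10),
    (date(2023, 6, 14), 12),
    (date(2023, 9, 30), 15),
    (date(2024, 1, 8), 25),
    (date(2024, 4, 19), 35),
    (date(2024, 7, 3), 40),
]


def average_recovery(incidents, start, end):
    """Mean recovery minutes for crashes dated start <= day < end, or None if there were none."""
    durations = [minutes for day, minutes in incidents if start <= day < end]
    return mean(durations) if durations else None


def recovery_has_doubled(incidents, today):
    """Compare the last 365 days against the 365 days before that.

    Returns True or False, or None when either window has no crashes to compare.
    """
    one_year_ago = today - timedelta(days=365)
    two_years_ago = today - timedelta(days=730)
    previous = average_recovery(incidents, two_years_ago, one_year_ago)
    recent = average_recovery(incidents, one_year_ago, today)
    if previous is None or recent is None:
        return None
    return recent >= 2 * previous


# Previous window averages 11 minutes, recent window 28.75 minutes.
print(recovery_has_doubled(incidents, date(2024, 8, 1)))  # True
```

Averages can hide a single very long outage, so it is worth also noting the worst recovery in each window alongside the mean.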

The window to act is before the next crash, not during it. Once a system enters the phase where recovery itself is unreliable, downtime starts compounding: each crash is longer than the last, the team gets more tired, and decisions are made in a hurry. That is when the genuinely bad outcomes (corrupted data, lost orders, missed deadlines) happen. Emergency technical support is the right call once you are in that phase; a written assessment is the right call before you reach it.

Sign 2 — Only one person can fix it

The single-point-of-failure pattern is the most common one we see, and the one business owners are least likely to recognise as a risk because the person in question is usually still there, still answering the phone, still cheerful about it. They have been with the company for fifteen years. They wrote half the code, or all of it. They are the one who knows why that report is structured the way it is, why the night job runs at 2:47 a.m. and not 3 a.m., why two of the database columns have names that don't match their content.

The risk does not go away because the person is still employed. It goes away when their knowledge is written down. If your operation depends on one person's tacit memory, you do not have a maintained system. You have a future succession crisis.

There are a few specific signs of this. The person never takes consecutive weeks off. When they do, things break and have to wait. They have never produced documentation, because every time someone asks for it, there is something more urgent to do. They are the only one with the production database password. They are the only one who knows where the source code is kept. When they explain a problem, the explanation includes phrases like "and only I would know that."

You don't need to remove that person from the business. You need to extract the knowledge from their head before the business needs it independently of them. A rescue engagement of this shape is mostly documentation and conversation; the code work is small. The healthcare accreditation rescue we ran in 2025 was the same problem at a different scale, except that in that case the single point of failure left without warning, and the business had to recover the knowledge with no one to call. The work cost a fraction of a rewrite and saved every pending accreditation. You don't want to be in the version of this situation where the only person is no longer reachable.

Sign 3 — Every small change ships a regression somewhere unrelated

When developers say "this part of the codebase is brittle," they mean: changing one line of code in one file consistently breaks something in another file that has nothing obvious to do with it. The connection is implicit, undocumented, and discovered only when something fails in production.

For a non-technical reader, the visible pattern is this. The team agrees to make a small change — say, a new field on an order form. Two weeks later, the change ships. The order form is fine. But suddenly the daily inventory report is missing rows, or the customer email template has the wrong subject line, or the night billing job stops running. Nobody can explain why. The team works late, fixes the second thing, ships the change again. Now a third thing is broken.

This is the architectural shape of a system in which, for fifteen years, every developer has added a small workaround and none of them has refactored or documented it. By now the internal coupling is invisible, and nobody alive understands all the connections. The system runs because the connections happen to balance out, but every change risks pulling on a thread that nobody knew was load-bearing.

The fix is not to keep applying changes and praying. The fix is to map the system: to write down, in plain English, what each component does, what depends on what, and where the implicit links live. That map is the single deliverable most legacy systems are missing, and it gives every subsequent change a fighting chance of being predictable. Mighty Advancement's Legacy System Rescue service is built around producing that map first and modifying code second, because doing it the other way around is how the regressions keep happening.
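To show why a written map makes changes predictable, here is a minimal sketch of "what depends on what" kept in a form a program can query, assuming the map is a plain dictionary. The component names are hypothetical, loosely echoing the order-form example in Sign 3; a real map would be built by reading the code, not guessed.

```python
from collections import deque

# Hypothetical system map: each component lists what it depends on.
depends_on = {
    "order_form": ["orders_table"],
    "inventory_report": ["orders_table", "stock_table"],
    "customer_email": ["order_form"],
    "billing_job": ["orders_table", "customer_email"],
    "stock_table": [],
    "orders_table": [],
}


def reverse_map(depends_on):
    """Invert 'X depends on Y' into 'Y is used by X'."""
    used_by = {name: [] for name in depends_on}
    for component, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(component)
    return used_by


def affected_by_change(changed, depends_on):
    """Every component that directly or indirectly depends on `changed` (breadth-first walk)."""
    used_by = reverse_map(depends_on)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(affected_by_change("orders_table", depends_on))
# ['billing_job', 'customer_email', 'inventory_report', 'order_form']
```

The map records dependencies in the direction people naturally write them down (X uses Y), and the program inverts it to answer the question that matters before a change ships: if Y changes, what else should be retested?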

Sign 4 — The number of "do not touch" areas is growing

Every legacy system has a few do-not-touch areas — code paths nobody understands, where the policy is to leave them alone because the consequences of meddling are unknown. A healthy system has maybe one or two. The pattern to watch is the number growing over time.

Operationally, this shows up as: the team agrees not to fix a known bug because "we don't know what else it would break." Or a needed feature is deferred indefinitely because the part of the system it touches is one of the don't-touch areas. Or new staff are warned, during onboarding, about modules they're not allowed near. Or a quote for a small change comes back with an estimate ten times what it should be, because most of the cost is the developer protecting themselves against unknown coupling in a don't-touch zone.

The growth pattern matters more than the absolute number. Two don't-touch areas, stable over five years, mean the team has worked out how to live with them. Four areas this year, six next year, ten the year after — that is a system trending toward "we cannot change anything anymore." The destination of that trend is a system that is technically running but functionally frozen, while the business that depends on it has continued to evolve. The gap between what the business needs and what the system can do widens every quarter.

This sign is the cleanest one to fix because the work is bounded. You map the don't-touch areas, put a competent set of eyes on each one (separately, two days at a time), and produce a "we now understand this" report for each. Once an area is understood, it stops being a don't-touch area. The cost is small relative to the cost of any one of those areas later forcing a rewrite. This is built into our legacy rescue work: the first deliverable of any rescue is a list of the don't-touch zones, identified and ranked.

Sign 5 — The "we need to rewrite this" conversation has been happening for three years and nothing has shipped

This is the last and most telling sign. Somebody — usually a developer who joined two years ago, or a vendor who took an exploratory meeting — has been saying for three or four years that the system "really needs to be rewritten." Quotes have been gathered. One or two attempts have been started, sometimes spending six figures, and have been quietly shelved. The current state of play is that everyone agrees a rewrite is necessary, nobody has been able to get one finished, and the system is still running.

The reason rewrites fail in this pattern is documented in the industry and reflected in every failed-migration case we have rescued. A rewrite started cold — without first understanding the existing system in detail — has to re-derive twenty years of accumulated business logic from scratch, while the original system continues to evolve. The cost overruns and the scope creep are not failures of project management. They are inherent to the approach.

If you are in a "rewriting it" conversation that has been going for years, the real problem is not that the rewrite is hard. The real problem is that the rewrite is the wrong answer. The system you have probably needs to be repaired, documented, and modernised in place, not replaced. We have written about this in detail in What to Do When Your Software Vendor Disappears, which covers the rescue-vs-rewrite calculation. Rescuing a failed rewrite is a specific engagement we run; see the failed migration case study, a project we took over after the original vendor had spent eighteen months without delivering a working system. The rescue completed in four months. The lesson generalises.

If you have a years-old "we should rewrite this" conversation that has not produced a deliverable, stop pursuing the rewrite. Get an honest read on what you actually have. The read may still recommend a rewrite for parts of the system, but it will identify the parts that can be repaired in place — and those are usually the majority. A legacy software rescue company reads the code first, then proposes.

Putting it together: what to do if two or more of these signs are firing

Walk through the list with the person who runs operations, not the person who writes the software. Mark each yes or no honestly.

Recovery time after a normal crash is getting longer.
Only one person can fix the system.
Small changes consistently ship regressions in unrelated areas.
The number of "do not touch" areas is growing.
There has been a multi-year "we need to rewrite this" conversation that has produced nothing shippable.

If your count is one, you have time. Watch the trend; act when it tips to two.

If your count is two or three, schedule an honest look at the system in the next thirty days. The point is not to commit to work — it is to know, in writing, where the system actually is. Most clients we see in this category are surprised by what comes out of that read, in both directions: some don't-touch areas turn out to be straightforward when read carefully, and a smaller number of "stable" components turn out to be hanging on by a thread. Knowing which is which is what protects the operation.

If your count is four or five, the system is at active risk and the conversation should happen this week. The system can keep running while the read is under way; the goal is to have a clear picture before a forced decision arrives in the form of a crash that doesn't resolve.
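For readers who want the scoring rule in an unambiguous form, here is a minimal sketch in Python. The thresholds are the ones stated in the checklist; the function name, the wording of the returned messages, and the zero-count branch are illustrative assumptions.

```python
# The five signs, in the order the checklist lists them.
SIGNS = (
    "Recovery time after a normal crash is getting longer",
    "Only one person can fix the system",
    "Small changes consistently ship regressions in unrelated areas",
    'The number of "do not touch" areas is growing',
    'A multi-year "we need to rewrite this" conversation has produced nothing shippable',
)


def next_step(answers):
    """Turn one yes/no answer per sign (in SIGNS order) into the recommended next step."""
    if len(answers) != len(SIGNS):
        raise ValueError(f"expected {len(SIGNS)} answers, got {len(answers)}")
    count = sum(1 for answer in answers if answer)
    if count >= 4:
        return "Active risk: have the conversation this week."
    if count >= 2:
        return "Schedule an honest look at the system within thirty days."
    if count == 1:
        return "You have time: watch the trend and act when it tips to two."
    # The checklist does not cover a count of zero; this branch is an assumption.
    return "No signs firing: re-run the checklist periodically."


print(next_step([True, True, False, False, False]))
# Schedule an honest look at the system within thirty days.
```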

Book a free 30-minute consultation to start. We use the call to understand which of the five signs match your situation, which combinations matter most, and whether the next step is a deeper engagement, an emergency intervention, or something else. The consultation is free and produces no commitment to proceed.

A note on the technologies these signs apply to

The pattern described in this article is most common in business software built on:

.NET Framework (any version) with ASP.NET WebForms or WinForms
Classic ASP with VBScript or JScript
Visual Basic 6 desktop applications
FoxPro and Visual FoxPro
Microsoft Access with heavy VBA
dBase or Clipper systems still in production

If your system is on one of those, you are not unusual. There are tens of thousands of businesses worldwide running operations on exactly these stacks. The mistake is treating "old" as the same thing as "broken." Old software that has been read, documented, and selectively repaired is one of the cheapest, lowest-risk options available, provided you have someone who will read it. The pool of specialists who do that reading is small but not empty.

If you'd like a second opinion before committing to either a repair plan or a replacement plan, that is what the free consultation is for. We listen first, then propose only what makes sense. Schedule it here.