There's a version of the legacy modernization pitch that goes like this: before you translate your COBOL, use an AI to document it. Let the LLM read your code and generate a plain-English description of what it does. Then you hand that documentation to a development team and have them build the modern system from there.
It sounds reasonable. It sounds like progress. I want to explain why, in most cases, it isn't.
The Documentation Step Doesn't Escape the Core Problem — It Just Moves It
When an LLM documents your COBOL, it reads the source code and infers intent. That's the same thing it does when it translates COBOL directly into Java. The failure modes are identical.
Legacy COBOL programs — especially the ones that actually matter, the ones running payroll or processing insurance claims or calculating pension benefits — routinely encode business logic that cannot be inferred from the code alone. Edge cases baked in to handle a specific regulatory requirement from 1991. Rounding conventions that were negotiated with an auditor. Compensating transactions that exist because of a bug in a third-party system that was fixed fifteen years ago but the workaround was never removed.
These things don't announce themselves. An LLM reading the code can't distinguish between "this looks unusual but is intentional" and "this looks unusual and is probably a bug." So the documentation it generates inherits exactly the same blind spots as direct translation would — missed edge cases, invented logic, confident-sounding descriptions of behavior that doesn't quite match reality.
The problem with hallucination in legacy modernization isn't that it's random. It's that it's plausible. The LLM will describe what a function probably does, based on patterns it's seen before. In most cases it will be mostly right. But "mostly right" in a payroll system or a regulatory compliance engine is not a standard anyone can ship to production.
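To make "mostly right" concrete, here is a minimal Python sketch of the kind of divergence a missed rounding convention produces. The amounts, the rate, and the function names are invented for illustration; the point is only that two systems agreeing on almost every input can still disagree on the ones that matter:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def pension_legacy(salary: Decimal, rate: Decimal) -> Decimal:
    # Hypothetical legacy convention: half-cents round up, as agreed with an auditor.
    return (salary * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def pension_rewrite(salary: Decimal, rate: Decimal) -> Decimal:
    # A rewrite from "mostly right" documentation that omitted the convention
    # and fell back to banker's rounding (round half to even).
    return (salary * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

salary, rate = Decimal("6675.00"), Decimal("0.015")
# 6675.00 * 0.015 = 100.125, exactly a half-cent, where the two conventions disagree.
print(pension_legacy(salary, rate))   # 100.13
print(pension_rewrite(salary, rate))  # 100.12, off by one cent, invisible on most inputs
```

On any input that doesn't land exactly on a half-cent, the two functions agree perfectly. That is what "mostly right" looks like in practice.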
And Then You've Added a Step
Here's the other problem with the documentation approach: you haven't eliminated the hard part. You've inserted an unverified intermediate artifact into the critical path.
Now a human has to read the AI-generated documentation and validate it against the original COBOL. But if your team could read and validate COBOL, you probably wouldn't need AI documentation in the first place. And if they can't — if they're trusting the documentation at face value — you've just handed them a document that may contain confident errors, with no reliable way to catch them.
After that validation (however thorough it is), someone still has to write new code from the requirements. That's a full development effort, from scratch, in a modern language, with all the testing and integration work that entails. The documentation step hasn't shortened that. It's added a pre-step that carries its own risk of introducing inaccuracies upstream of everything else.
Documentation Answers the Wrong Question
What enterprises actually need from a legacy migration isn't a prose description of what the code does. It's a guarantee that the new code does the same thing. Those are very different requirements.
An LLM-generated description of a COBOL payroll program can look thorough and professional while quietly omitting the specific rounding rule that determines whether a pension calculation rounds up or down. It might describe a file-processing loop accurately at a high level while missing the implicit assumption about record ordering that makes the logic work. It might explain what a section does without capturing why it does it that way — and the "why" is often exactly what matters.
Documentation gives you a narrative. What you need is equivalence — provable, auditable, testable confirmation that the new system behaves identically to the old one across all the inputs that actually matter. No amount of documentation gets you there on its own.
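What "testable confirmation" means mechanically can be sketched in a few lines. In this toy Python harness the two functions stand in for the legacy and rewritten calculations (all names and the 1.5% rate are invented; in a real migration the legacy side would be the COBOL system itself, exercised over a systematically chosen input set):

```python
from decimal import Decimal, ROUND_HALF_UP

def legacy_calc(cents: int) -> int:
    # Stand-in for the legacy behavior: 1.5% benefit, half-units rounded up.
    amount = Decimal(cents) * Decimal("0.015")
    return int(amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def rewritten_calc(cents: int) -> int:
    # Stand-in for the rewritten system under test.
    amount = Decimal(cents) * Decimal("0.015")
    return int(amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def check_equivalence(inputs) -> list[int]:
    # Return every input on which the two systems disagree. An empty list over
    # a well-chosen input set is an auditable artifact, not a narrative.
    return [x for x in inputs if legacy_calc(x) != rewritten_calc(x)]

print(check_equivalence(range(10_000)))  # [] means no divergence on these inputs
```

A prose description can't fail this check, because it never runs. That asymmetry is the whole argument.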
Where LLM Documentation Is Actually Useful
I want to be precise about this, because I'm not arguing that LLM-generated documentation has no value. It does, in the right context.
If you're onboarding a Java developer who has never seen COBOL and needs a working understanding of what a particular module is supposed to accomplish before they start reviewing transpiled output, an LLM summary can be a useful orientation tool. It's faster than reading thousands of lines of unfamiliar syntax cold.
That's a developer productivity tool. It's not a migration methodology. The distinction matters, because vendors selling the documentation approach are often implying the latter while delivering the former.
There Is Such a Thing as Trustworthy Documentation — It Just Has to Be Derived, Not Inferred
The argument above isn't that documentation of legacy code is inherently useless. It's that documentation produced by inference is. The distinction matters, because there's a different way to do this. To see why, you first need to understand precisely how inference fails.
When an LLM reads COBOL and generates documentation, it is doing pattern matching at scale. It has been trained on vast amounts of code and text, and it produces descriptions that match the statistical patterns of how similar code has been described before. When the code is straightforward and common, it often does this well. When the code is unusual — which legacy enterprise COBOL frequently is — it fills the gap with whatever is most plausible, not whatever is most accurate. It has no mechanism to distinguish between the two.
The specific failure modes are predictable. An LLM will correctly describe what a variable is called and roughly what it holds, then silently omit the level-88 condition names that define its valid states — because those require reading the full data division carefully rather than summarizing the apparent intent. It will describe a COMPUTE statement accurately in isolation, then miss that it only executes when two separate enclosing IF conditions are both true — because resolving nested conditions across a large program requires tracking scope state precisely, not reading for meaning. It will document a file's purpose correctly but omit the key field and access mode — because those details are structural, not semantic, and inference is better at semantics than structure. In each case, the output looks credible. The gaps are invisible unless you already know the source well enough to spot them.
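The structural facts inference glosses over are exactly the ones that can be computed mechanically. As a toy illustration, here is a Python sketch that decomposes a COBOL PIC clause into its components and storage size. It handles only a tiny subset of real PICTURE syntax (signed or unsigned 9(n) with an optional V fraction, DISPLAY or COMP-3 usage); real clauses are far richer, and a production parser would cover them:

```python
import re

def parse_pic(pic: str, usage: str = "DISPLAY") -> dict:
    """Decompose a simple PIC clause like 'S9(7)V99' into structural facts."""
    m = re.fullmatch(r"(S?)9\((\d+)\)(?:V(9+|9\(\d+\)))?", pic.strip())
    if not m:
        raise ValueError(f"unsupported PIC in this sketch: {pic}")
    signed = m.group(1) == "S"
    int_digits = int(m.group(2))
    frac = m.group(3) or ""
    # Fractional part is either '99...' or '9(n)' in this simplified grammar.
    frac_digits = int(frac[2:-1]) if frac.startswith("9(") else len(frac)
    digits = int_digits + frac_digits
    if usage == "COMP-3":
        # Packed decimal: two digits per byte plus a sign nibble.
        size = (digits + 2) // 2
    else:
        # DISPLAY: one byte per digit (sign carried as an overpunch).
        size = digits
    return {"signed": signed, "digits": digits,
            "scale": frac_digits, "usage": usage, "bytes": size}

print(parse_pic("S9(7)V99", "COMP-3"))
```

Note what the output contains: digit counts, scale, sign, byte size. None of it is a guess about intent; all of it is recoverable from the clause itself, which is the point of the next section.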
Derivation works differently because it doesn't read for meaning at all. A parser doesn't summarize — it traverses. It reads the actual syntax, resolves the actual structure, and records what is provably present in the source. Every field comes with its level, its parent, its PIC specification broken into components, its byte offset, its REDEFINES target if applicable. Every operation records not just the verb and the target, but the complete logical condition under which that operation executes — resolved across all enclosing IF and EVALUATE scopes, not just the immediate one. The condition field in the output isn't an inference about what the code probably intends. It's the logical predicate, fully resolved, as it actually appears in the source.
The result is documentation that cannot hallucinate, because it has no generative step. It can only record what the parser finds. If something is in the source, it's in the output. If something isn't in the source, it isn't in the output. That's a different epistemic category from anything an LLM produces.
And critically, it isn't just descriptive. When documentation is derived by the same engine that drives structural equivalence testing, it becomes the basis for verification — not a narrative about the old system, but the specification against which the new one is measured. The documentation and the proof are the same artifact.
What We're Building
This is exactly the direction RecodeX is moving. We are currently developing a deterministic documentation engine that parses legacy code (starting with COBOL, with other source languages to follow) and produces a structured, auditable record of every program's complete instruction stream. Same source in, same documentation out, every time. No inference, no hallucination, no probabilistic gap-filling.
The output feeds directly into our structural equivalence test, so the documentation and the verification aren't separate processes — they're two faces of the same analysis. You get a complete picture of what the COBOL program does, and a systematic way to confirm that the Java (and eventually, other target language) output does the same thing.
We'll have more to share on this soon. In the meantime, if you're evaluating legacy modernization approaches and the documentation question is part of that conversation, it's worth asking whether the documentation you're being offered was inferred or derived — and whether it connects to anything verifiable on the other side.