
How Tertiary Language Transpilers Refactor Legacy Code

January 7, 2025
Introduction

As mentioned in our post on the challenges of refactoring legacy code, refactoring can be automated if you have:

  • Rules to map data types and functions where a 1:1 relationship exists, and to fill capability gaps with equivalent constructs or libraries.
  • A mechanism to decompose the source code into its logical structure, ensuring that variable handling, control flow, and interdependencies are preserved during the transformation.
  • A capability that abstracts logical constructs, enabling better mapping when implementation styles differ drastically between the source and target languages.
  • Context-aware processing that goes beyond static rules, leveraging analysis tools to infer intent, variable scope, and control flow relationships, ensuring functional equivalence is maintained across paradigms.

The even better news: these capabilities already exist today. They're called transpilers, though probably not the transpilers you're thinking of.

When most developers think of transpilers, they think of static, rules-based transpilers, which struggle for two reasons: (1) implementation differences create interdependencies that aren't linear, so the volume of rules needed to resolve them becomes exorbitant, and (2) rules have a hard time keeping up with the pace of change of modern languages.

But there is another class of transpilers: those that use a tertiary language as an intermediary. Tertiary languages abstract logical constructs and decouple source and target languages, creating a bridge that allows for more generalized mappings and dynamic handling of implementation differences. Transpilers built around a tertiary language not only avoid the issues caused by static rules but, by the very nature of their structure, perform more accurately and improve more quickly. It is for this reason that many researchers in the Generative AI space are adopting this technique to enhance their models (which still don't outperform transpilers).

In this post, we’ll explain in more detail how tertiary language-based transpilers work, and why their very structure makes them so well-suited to refactoring legacy code to modern code.

 

Decompose the Legacy Source Code Into Its Logical Structure

The first step for most transpilers is to convert the source code into an Abstract Syntax Tree (AST), which represents the structure of source code in a hierarchical, language-neutral format.

Here’s an example of a simple Java program:

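    // A minimal sketch consistent with the AST described below; the class
    // name and the starting value of 1 are illustrative.
    public class Example {
        public static void main(String[] args) {
            for (int i = 1; i <= 10; i++) {
                System.out.println("Value of i: " + i);
            }
        }
    }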

And here is the AST that represents the structure and logic of the program in a hierarchical format:

    Program
    └── ForStatement
        ├── Initialization (int i = 1)
        ├── Condition (i <= 10)
        ├── Increment (i++)
        └── Body
            └── MethodCall: System.out.println("Value of i: " + i)
  1. Program Node: Represents the entire program.
  2. ForStatement Node: Encapsulates the Java for loop structure.
  3. Initialization Node: Describes the declaration of the loop variable (i) and its initial value.
  4. Condition Node: Represents the loop condition (i <= 10) as a binary expression.
  5. Increment Node: Defines the loop increment (i++) as a unary expression.
  6. Body Node: Contains the logic executed during each iteration.
  7. MethodCall Node: Represents the System.out.println call, including its arguments:
  • A static string ("Value of i: ") concatenated with the loop variable (i).

The key takeaway: the AST preserves the functional intent of the original code throughout legacy code refactoring. This is a critical benefit of transpilers that Generative AI misses (at least for now), and one whose absence creates real risk in manual refactoring when code is not fully documented.

Now, here is an equivalent program in COBOL:

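       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXAMPLE.
      * A minimal sketch consistent with the AST described below; the
      * program name and starting value are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  I     PIC 99 VALUE 1.
       PROCEDURE DIVISION.
           PERFORM VARYING I FROM 1 BY 1 UNTIL I > 10
               DISPLAY "Value of i: " I
           END-PERFORM.
           STOP RUN.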

And here is the AST that represents the structure and logic of the COBOL program in a hierarchical format:

    Program
    └── PerformStatement
        ├── LoopControl (variable i, initial value, increment step, exit condition)
        └── Body
            └── DisplayStatement ("Value of i: " plus the variable i)
  1. Program Node: The root of the AST, representing the entire program.
  2. PerformStatement Node: Represents the loop construct, which is central to the COBOL snippet.
  3. LoopControl Node: Encapsulates details about the loop, such as the variable (i), its initial value, increment step, and the condition to exit the loop.
  4. Body Node: Contains the actions performed within the loop.
  5. DisplayStatement Node: Represents the DISPLAY statement, including the static text and the variable to output.

While the COBOL and Java source code look very different, their ASTs look much more similar, sharing the same logical structure:

  • Loop initialization, condition, and increment are preserved in their respective nodes.
  • The loop body and its display/print functionality are represented in a similar hierarchical format.

This consistency enables the intermediary representation (AST) to serve as a bridge between COBOL and Java, facilitating accurate refactoring of legacy code.

While an Abstract Syntax Tree (AST) effectively addresses the challenge of decomposing legacy source code into its logical structure, a tertiary language goes a step further by solving four additional challenges:

  1. Replicating the functionality of a static, rules-based transpiler, with added flexibility and customization, when 1:1 mappings do exist
  2. Mapping capabilities when a direct 1:1 relationship doesn’t exist
  3. Bridging drastic differences in implementation style
  4. Leveraging context-aware processing for intent inference

 

How the Tertiary Language Solves Key Challenges

1. When 1:1 Data Type and Function Mappings Exist

When legacy source code includes constructs that map directly to equivalent features in modern languages, a tertiary language operates similarly to a static, rules-based transpiler. While the tertiary language ensures that functionality is preserved, it also offers the opportunity to incorporate additional layers of logic. This can include handling unique requirements, accommodating differences in language capabilities, or optimizing for specific runtime behaviors. This added flexibility is critical for handling edge cases, optimizing performance, and addressing specific project requirements that fall outside the scope of simple mappings. Here are a few examples:

Handling Edge Cases

A static transpiler might fail to account for edge cases, such as variable type mismatches or nuanced error handling. A tertiary language allows for incorporating these considerations, ensuring robust output.

  • Example: A COBOL DIVIDE operation might encounter scenarios where division by zero needs explicit handling. A tertiary language can include metadata for runtime error checks.
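A minimal sketch of what that emitted guard might look like in Java (the helper name and fallback behavior are assumptions, not a fixed mapping):

    // A COBOL DIVIDE raises a size-error condition on division by zero;
    // the emitted Java makes that check explicit.
    static long safeDivide(long dividend, long divisor, long fallback) {
        if (divisor == 0) {
            return fallback; // stands in for the ON SIZE ERROR branch
        }
        return dividend / divisor;
    }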

Customizable Mappings

The tertiary language can adapt mappings to project-specific requirements, such as using alternative libraries or constructs for better performance or maintainability.

  • Example: Instead of mapping a COBOL arithmetic operation to a simple Java statement, the tertiary language might translate it into a library-based implementation for higher precision or compatibility (e.g., BigDecimal).
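A sketch of such a library-based emission (the values and variable names are illustrative):

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class PreciseTotal {
        public static void main(String[] args) {
            // Emitted with BigDecimal rather than a lossy double
            // expression, keeping exact decimal semantics.
            BigDecimal price = new BigDecimal("19.99");
            BigDecimal quantity = new BigDecimal("3");
            BigDecimal total = price.multiply(quantity)
                                    .setScale(2, RoundingMode.HALF_UP);
            System.out.println(total); // 59.97
        }
    }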

 

2. Addressing Capability Gaps

Legacy source code often includes features or constructs that lack a direct equivalent in modern languages. These capability gaps can arise from differences in data types, native functions, or built-in language features. A tertiary language bridges these gaps by abstracting the functionality into a neutral, logical representation, enabling the transpiler to map these features to equivalent or custom implementations in the target language.

This approach ensures that the original intent and functionality are preserved, while providing flexibility to accommodate differences in language capabilities or project-specific needs. Here are a few examples:

Handling Capability Gaps in Legacy Data Types

A tertiary language enables abstraction of unique data types from legacy languages, such as COBOL’s packed decimals (COMP-3), which are not natively supported in modern languages like Java.

  • Example:
    • COBOL: COMP-3 is a binary-coded decimal format optimized for precise arithmetic in financial systems.
    • Tertiary Language: Abstracts COMP-3 as a high-precision numeric type with metadata for scale and precision.
    • Java: Maps the abstract type to a BigDecimal in java.math, ensuring equivalent functionality.

This abstraction also allows the transpiler to handle variations, such as regional differences in number formatting or the need for rounding rules, without adding complexity to the source or target code.
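Sketched end to end (the field name and picture clause are illustrative):

    COBOL declaration:      01 AMOUNT PIC S9(7)V99 COMP-3.
    Tertiary abstraction:   numeric node with metadata: precision 9, scale 2, signed
    Java emission:          BigDecimal amount = new BigDecimal("0.00");   // scale carried as 2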

Customizing Functionality for Project-Specific Needs

Capability gaps in native functions, like COBOL’s built-in SORT statement, can be bridged through a tertiary language. Instead of relying on boilerplate code to replicate the function, the tertiary language provides a framework to map the intent of the operation.

  • Example:
    • COBOL: The SORT statement processes file-based records directly.
    • Tertiary Language: Represents the sorting operation as a generic Sort Node with attributes like input source, sort key, and output destination.
    • Java: The transpiler can generate an implementation using Java’s Collections.sort() for in-memory sorting or a custom library for large-scale file sorting.
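A minimal sketch of the in-memory path (the record layout and names are illustrative):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class SortNodeDemo {
        public static void main(String[] args) {
            // Sort Node attributes mapped to Java: input source = records,
            // sort key = customer ID (index 0), output = the sorted list.
            List<String[]> records = new ArrayList<>();
            records.add(new String[] { "C002", "Reyes" });
            records.add(new String[] { "C001", "Adams" });
            records.sort(Comparator.comparing(r -> r[0]));
            records.forEach(r -> System.out.println(r[0] + " " + r[1]));
        }
    }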

This customization ensures the refactored code performs optimally for the specific use case, whether it’s handling small datasets in memory or scaling to large, file-based operations.

Scalability and Maintainability

Static transpilers often require extensive rule sets to handle capability gaps, which can become unmanageable as languages and projects evolve. A tertiary language simplifies this by centralizing the abstraction of these features.

  • Example:
    Instead of creating specific rules for every legacy function, the tertiary language defines generic nodes (e.g., ArithmeticOperation, Sort, FileRead) that can be extended to handle variations across languages. This reduces rule complexity and makes the transpiler easier to maintain and scale.
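As a sketch, one such generic node might look like this internally (the type and field names are assumptions, not the actual tertiary language):

    import java.util.Map;

    // A single ArithmeticOperation node covers ADD, SUBTRACT, MULTIPLY,
    // DIVIDE, and COMPUTE across source languages; language-specific
    // quirks travel as metadata rather than as new rules.
    record ArithmeticOperation(String operator,
                               String leftOperand,
                               String rightOperand,
                               Map<String, String> metadata) {}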

 

3. Bridging Drastic Differences in Implementation Styles

Legacy and modern programming languages often approach the same tasks—such as looping, memory management, or error handling—in vastly different ways. These implementation style differences arise from the distinct paradigms of procedural languages like COBOL and object-oriented or functional languages like Java. A tertiary language bridges these differences by abstracting the logical intent of the implementation, enabling the transpiler to map it to the most appropriate construct in the target language.

This abstraction ensures that the underlying functionality is preserved, while allowing the refactored code to align with the design principles and best practices of the modern language. Here are a few examples:

Aligning Control Flow Constructs

Legacy languages like COBOL often rely on procedural constructs such as PERFORM loops or GO TO statements, which can result in less structured and harder-to-maintain code. Modern languages like Java, however, emphasize structured programming with scoped loops and conditionals.

  • Example:
    • COBOL: The PERFORM statement executes a block of code iteratively, with implicit control variables and conditions.
    • Tertiary Language: Abstracts the loop as a Loop Node, capturing its initialization, condition, increment, and body in a language-neutral format.
    • Java: Maps the Loop Node to a structured for or while loop, explicitly defining the control variables and ensuring modularity.
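A sketch of the Loop Node itself (the field names are illustrative):

    import java.util.List;

    // Language-neutral loop: COBOL's PERFORM VARYING and Java's for loop
    // both project onto the same four control elements plus a body.
    record LoopNode(String variable,
                    String initialValue,
                    String exitCondition,
                    String increment,
                    List<Object> body) {}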

This approach avoids translating legacy constructs into unstructured or error-prone patterns, making the refactored code cleaner and easier to maintain.

Adapting Memory Management Approaches

Legacy languages often rely on static memory allocation, with variables persisting throughout the program's execution unless explicitly reset. Modern languages, by contrast, use dynamic memory allocation and garbage collection, requiring scoped variables and objects.

  • Example:
    • COBOL: A global variable in WORKING-STORAGE is reused and cleared manually (e.g., MOVE SPACES).
    • Tertiary Language: Abstracts the variable lifecycle into a Scoped Variable Node, identifying where it is initialized, used, and cleared.
    • Java: Maps the scoped variable to a local or instance-level variable, relying on garbage collection to manage memory automatically.
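A minimal sketch of the emitted Java (the names are illustrative):

    public class OrderProcessor {
        void processOrder(String orderId) {
            // Scoped per call: no MOVE SPACES reset is needed, and the
            // buffer becomes garbage-collectible when the method returns.
            StringBuilder buffer = new StringBuilder();
            buffer.append("Processing ").append(orderId);
            System.out.println(buffer);
        }
    }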

By bridging these differences, the tertiary language ensures memory handling is refactored in a way that fits the modern language's paradigm without introducing memory leaks or unnecessary complexity.

Whether it's unstructured control flows, manual memory management, or outdated error-handling patterns, a tertiary language enables the transformation of legacy implementations into clean, modern equivalents. This makes it an essential tool for bridging the gap between old and new programming paradigms.

 

4. Leveraging Context-Aware Processing for Intent Inference

One of the most complex challenges in refactoring legacy code is understanding the intent behind the original implementation. Legacy systems often lack proper documentation, rely on implicit behaviors, and contain tightly coupled logic. Static, rules-based transpilers struggle to infer this intent, as they operate purely on syntax without understanding the broader context of the program.

A tertiary language, combined with context-aware processing, goes beyond static mappings by embedding metadata and semantic analysis to infer the functional intent of the code. This approach ensures that the refactored system preserves not only functionality but also aligns with the original purpose of the code, even in complex, undocumented cases.

How Context-Aware Processing Works

Context-aware processing analyzes the relationships and dependencies within the source code, enabling the transpiler to:

  • Identify how variables are used across multiple procedures.
  • Determine the logical flow of operations and their dependencies.
  • Understand implicit behaviors that are not explicitly defined in the code.

This deeper understanding allows the tertiary language to generate accurate, intent-preserving mappings in the target language.

Examples of Context-Aware Processing

  1. Resolving Global Variable Dependencies

Legacy languages like COBOL frequently use global variables that are shared and modified across multiple sections of the program. Refactoring these variables into a modern language requires understanding their scope, lifecycle, and dependencies.

  • Example:
    • COBOL:
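      *> Sketch (field and paragraph names are illustrative): a global
      *> total updated and read from multiple paragraphs.
       WORKING-STORAGE SECTION.
       01  TOTAL-SALES   PIC 9(7)V99 VALUE ZERO.
       ...
       RECORD-SALE.
           ADD SALE-AMOUNT TO TOTAL-SALES.
       PRINT-REPORT.
           DISPLAY "TOTAL SALES: " TOTAL-SALES.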
    • Tertiary Language: The transpiler analyzes where TOTAL-SALES is modified and accessed, tagging it with metadata about its lifecycle and dependencies.
    • Java: Maps TOTAL-SALES to a class-level variable with controlled access, ensuring modularity and thread safety:
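    import java.math.BigDecimal;

    // Sketch (names are illustrative): the global COBOL field becomes a
    // private class-level variable with synchronized, controlled access.
    public class SalesLedger {
        private BigDecimal totalSales = BigDecimal.ZERO;

        public synchronized void addSale(BigDecimal amount) {
            totalSales = totalSales.add(amount);
        }

        public synchronized BigDecimal totalSales() {
            return totalSales;
        }
    }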
  2. Disambiguating Overlapping Logic

In legacy code, it’s common for multiple operations to rely on shared resources or tightly coupled workflows. Context-aware processing disambiguates these overlaps, identifying separate logical units and translating them into modular constructs.

  • Example:
    • COBOL: A single procedure handles both file reading and processing logic.
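      *> Sketch (names are illustrative): one paragraph mixes file
      *> reading with processing logic.
       READ-AND-PROCESS.
           READ SALES-FILE INTO SALES-RECORD
               AT END MOVE "Y" TO EOF-FLAG
           END-READ
           IF EOF-FLAG NOT = "Y"
               ADD SALE-AMOUNT TO TOTAL-SALES
           END-IF.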
    • Tertiary Language: Separates file handling and processing into distinct nodes with contextual metadata about their relationship.
    • Java:
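    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Sketch (names are illustrative): file handling and processing
    // become two separate, independently testable units.
    class SalesFileReader {
        List<String> readAll(Path path) throws IOException {
            return Files.readAllLines(path); // file handling only
        }
    }

    class SalesTotaler {
        double total(List<String> amounts) { // processing logic only
            return amounts.stream().mapToDouble(Double::parseDouble).sum();
        }
    }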

Context-aware processing, powered by a tertiary language, enables legacy code refactoring to move beyond syntax and semantics into true intent preservation. This capability ensures that even the most complex and undocumented legacy systems can be transformed into clean, modern code without losing their original purpose or functionality.

 

Conclusion

Tertiary languages start from Abstract Syntax Trees, which decompose source code into a logical structure, and build on that foundation to provide:

  • 1:1 mappings with added flexibility for edge cases and customization.
  • Solutions to capability gaps, abstracting unique constructs like COBOL’s packed decimals into modern equivalents.
  • Bridges for implementation style differences, ensuring that modern principles like scoped variables and structured control flows replace legacy paradigms.
  • Context-aware processing to infer and preserve the intent behind undocumented or implicit logic, creating clean, functional code in the target language.

These capabilities not only simplify the transition from legacy systems but also produce code that is modular, maintainable, and aligned with modern best practices.

The challenge of legacy code refactoring is no longer a question of “if” but “how effectively.” Tertiary language transpilers provide a clear path forward, enabling organizations to modernize their systems while preserving critical functionality and intent. By adopting this approach, CTOs and development teams can unlock the potential of their legacy systems and position themselves for a more agile, future-ready technology stack.

Are you ready to transform your legacy code? Explore the power of our tertiary language transpilers and take the first step toward seamless modernization.

 
