Experience migrating legacy Cobol/PL1 to Java

ORIGINAL Q: I'm wondering if anyone has had experience of migrating a large Cobol/PL1 codebase to Java?

How automated was the process and how maintainable was the output?

How did the move from transactional to OO work out?

Any lessons learned along the way or resources/white papers that may be of benefit would be appreciated.


EDIT 7/7: Certainly the NACA approach is interesting; the ability to continue making your BAU changes to the COBOL code right up to the point of releasing the Java version has merit for any organization.

The argument for procedural Java in the same layout as the COBOL, to give the coders a sense of comfort while they familiarize themselves with the Java language, is a valid one for a large organisation with a large code base. As @Didier points out, the $3 million annual saving gives scope for generous padding on any BAU changes going forward, allowing the code to be refactored on an ongoing basis. As he puts it, if you care about your people you find a way to keep them happy while gradually challenging them.

The problem as I see it with the suggestion from @duffymo to

Best to try and really understand the problem at its roots and re-express it as an object-oriented system

is that if you have any BAU changes ongoing, then during the LONG project lifetime of coding your new OO system you end up coding and testing every change twice. Avoiding that duplication is a major benefit of the NACA approach. I've had some experience of migrating client-server applications to a web implementation, and this was one of the major issues we encountered: constantly shifting requirements due to BAU changes. It made PM and scheduling a real challenge.

Thanks to @hhafez, whose experience is nicely put as "similar but slightly different" and who has had a reasonably satisfactory experience of an automatic code migration from Ada to Java.

Thanks @Didier for contributing, I'm still studying your approach and if I have any Q's I'll drop you a line.

Margaretemargaretha answered 22/6, 2009 at 23:53 Comment(9)
NACA seems interesting, with a detailed presentation at the JAZOON conference. See my update below.Previdi
Definitely interesting, been reading it this morning, thanks @Jim. I'll leave this Q open for another few days in case it attracts anyone who's actually done it and has practical experience.Margaretemargaretha
NACA seems terrible. See answer below.Cumulous
@Ira & @MadMurf: What we're seeing here is evidence of how hard it really is to write a source code translator between two radically different languages. A translator has to preserve correctness via the original semantics, and it has to enable traceability from the new code back to the old, which means it shouldn't do refactoring. It can't magically induce an OO structure from a procedural one. Their Java devs do seem quite good (e.g., in their use of a Builder pattern to handle Cobol data type declarations). This is just the nature of automatic translation.Previdi
@Jim: OO-ness is hard; I won't argue the point about getting an OO-result from non-OO code because domain modelling is very hard. But you can argue for a good conversion of the underlying technologies. Procedural code written in java will still run and still be maintainable, but doesn't have to have any hints of the source language (e.g., COBOL) in it. I have some experience here: www.semanticdesigns.com/Products/Services/NorthropGrummanB2.html where the requirements were "keep the business rules in the hands of the JOVIAL engineers (sound familiar?)".Cumulous
@Ira: Your point is very appealing: it would be great if NACA had options that downplayed the Cobol traceability and produced better Java. I'm just very impressed that it apparently works, and that it's open source, allowing us to perhaps add such improvements. ;-)Previdi
@Jim: I think the NACA tool is built around the assumption that it generates a COBOL simulator. I strongly doubt that you'll be able to change that fundamental assumption, so I don't think you'll get a better result that way.Cumulous
8 years gone by. What did you finally do with this system?Cumulous
@IraBaxter Budgets changed, resulting in a "pick around the edges" solution being implemented. The old code is still in place and in use; as changes have been required over the years, Java RESTful services have been implemented, but 75% of the legacy code is still in use.Margaretemargaretha

Update 6/25: A friend just ran across the NACA Cobol-to-Java converter. It looks quite interesting; it was used to translate 4 million lines of Cobol with 100% accuracy. Here's the NACA open source project page. The other converters I've seen were proprietary, and their materials were conspicuously lacking in success stories and detailed example code. NACA is worth a long look.

Update 7/4: @Ira Baxter reports that the Java output looks very Cobol-esque, which it absolutely does. To me, this is the natural result of automatic translation. I doubt we'll ever find a much better translator. This perhaps argues for a gradual re-write approach.

Update 2/7/11: @spgennard points out that there are some Cobol compilers on the JVM, for example Veryant's isCobol Evolve. These could be used to help gradually transition the code base, though I think the OP was more interested in automated source conversion.


I'd be very cautious about this. (I used to work for a company that automatically corrected Cobol and PL/I programs for Y2K, and did the front end compiler that converted many dialects of Cobol into our intermediate analytic form, and also a code generator.) My sense is that you'd wind up with a Java code base that still would be inelegant and unsatisfying to work with. You may wind up with performance problems, dependencies on vendor-supplied libraries, generated code that's buggy, and so on. You'll certainly incur a huge testing bill.

Starting from scratch with a new object-oriented design can be the right approach, but you also have to carefully consider the decades of stored knowledge represented by the code base. Often there are many subtleties that your new code may miss. On the other hand, if you're having a hard time finding staff to maintain the legacy system, you may not have a choice.

One gradual approach would be to first upgrade to Cobol 97. This adds object-orientation, so you can rewrite and refactor subsystems individually when you add new functionality. Or you could replace individual subsystems with freshly-written Java.
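As a rough sketch of that kind of incremental replacement (the names and the bridge mechanism below are purely illustrative, not from any particular product): put a plain Java interface in front of one subsystem so a legacy-backed implementation and a freshly written one can be swapped without touching the callers.

public interface PremiumCalculator {
    long premiumInCents(String policyId);
}

// Adapter that delegates to the existing legacy routine, however it is reached
// (a JVM Cobol runtime, a message queue to the mainframe, a JNI bridge, ...).
class LegacyPremiumCalculator implements PremiumCalculator {
    public long premiumInCents(String policyId) {
        return LegacyBridge.call("PREMCALC", policyId); // hypothetical bridge call
    }
}

// Freshly written Java replacement, introduced subsystem by subsystem.
class JavaPremiumCalculator implements PremiumCalculator {
    public long premiumInCents(String policyId) {
        return 0L; // new business logic goes here
    }
}

// Stand-in for whatever mechanism actually reaches the legacy code.
class LegacyBridge {
    static long call(String program, String arg) {
        throw new UnsupportedOperationException("wire this to the real legacy entry point");
    }
}

Callers depend only on PremiumCalculator, so which implementation is used becomes a deployment choice rather than a code change.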

Sometimes you'll be able to replace components with off-the-shelf software: we helped one very large insurance company that still had 2 million lines of code in a legacy language it had created in the 1950s. We converted half of it to a Y2K-compliant version of that legacy language, and they replaced the other half with a modern payroll system bought from an outside vendor.

Previdi answered 23/6, 2009 at 5:37 Comment(3)
The NACA output looks COBOL-esque because they designed the translator to be trivial, that is, they're simply implementing COBOL semantics using Java code. There's no reason you can't produce much better translations. For instance, the NACA implementation likely keeps the decimal data type from the mainframe and implements decimal arithmetic, etc.; thus ADD A TO B in COBOL gets translated to something like B.COBOLDECIMALADDTO(A) (NACA says "one statement per statement"). It should be straightforward to infer when a datatype is only ever used as a pure integer and generate B += A; as one expects.Cumulous
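To make that contrast concrete, here is a minimal sketch (the simulated-decimal call shown is illustrative, not NACA's actual API): a statement-by-statement transcoder keeps the Cobol decimal machinery, while a translator that infers "this field is only ever an integer" can emit plain arithmetic.

import java.math.BigDecimal;

public class AddAToB {
    public static void main(String[] args) {
        // Transcoder style: B stays a simulated Cobol decimal, ADD A TO B becomes a library call.
        BigDecimal a = new BigDecimal("3");
        BigDecimal b = new BigDecimal("4");
        b = b.add(a); // stands in for something like B.COBOLDECIMALADDTO(A)

        // Inferred-type style: if B is only ever used as a pure integer, emit native arithmetic.
        long aInt = 3;
        long bInt = 4;
        bInt += aInt; // ADD A TO B becomes B += A

        System.out.println(b + " " + bInt); // prints "7 7"
    }
}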
If all you want to move is the COBOL verbs, this might work. As a general rule, there are a large number of other application dependencies you have to simulate in the new world: interaction management (CICS), file access methods (ISAM? DB2? ...), screen I/O (3270? 24x80?), job management, ... that the "COBOL translator" won't handle. You can't "just recompile"; you must translate the idioms that represent legacy dependencies into something equivalent. This is generally nontrivial. See #3455956Cumulous
Rather a late addition: the NACA results are not a "natural result of translation". OP said, "I doubt we'll find a better translator". My company makes one we think is much better. You can check its output here: semanticdesigns.com/Products/Services/…Cumulous

It was clearly our intent to obtain initial Java code that was very close to the original Cobol, in order to facilitate the migration of people: they find the good old app they wrote in Cobol in the exact same structure.

One of our most important goals was to keep the initial developers on board: that's the way we found to achieve it. Once the application has migrated to Java, those people can start making it more OO as they further develop and refactor it.

If you don't care about migrating people, you can use another strategy.

This 1-to-1 conversion also made 100% automated conversion simpler and faster; the good consequence is that we realized our recurring savings (3 million euros per year) much sooner, in an estimated 12-18 months. Those early savings can clearly be reinvested in OO refactoring.

feel free to contact me: [email protected] or [email protected]

didier

Linwoodlinz answered 5/7, 2009 at 4:7 Comment(0)

I just looked at the NACA page and docs. From their documentation:

"The generated java uses a Cobol-like syntax. It's as close as possible from original Cobol syntax, within of course the limits of the Java language. Generated code doesn't look like classical native java and is not object oriented from the application point of view. This is a by design strong choice, to enable a smooth migration of Cobol developers to the Java environment. The goal is to keep business knowledge in the hand of people who wrote the original Cobol programs."

I didn't see an example, but the quote gives a strong flavor of the result. It's COBOL coded in Java.

You can always build a "translator" from one language to another by simply coding an interpreter in the target language. That's, IMHO, an absolutely terrible way to translate a language, as you end up with the worst of both worlds: you don't get the value of the new language, and you still have to have knowledge of the old one to keep the result alive. (No wonder this thing is called a "Transcoder"; I'd never heard the term before.)

The argument for this stunt is to dump the costs of the mainframe. Where's the evidence that the costs of working on the converted program don't swamp the savings? I suspect the truth is that the operations people lowered their cost by dumping the mainframe, and they couldn't care less that the maintenance tasks got more expensive. While that may be rational for the operations guys, it's a stupid choice for the organization as a whole.

Heaven help people that are a victim of this tool.

EDIT May 2010: I found an example of NACA's output, one of their test cases. This is absolutely magnificent JOBOL. It's a good thing they are keeping their COBOL programmers and don't want to hire any Java programmers. As you read this, be sure to remember this is Java code.

/*
 * NacaRTTests - Naca Tests for NacaRT support.
 *
 * Copyright (c) 2005, 2006, 2007, 2008 Publicitas SA.
 * Licensed under GPL (GPL-LICENSE.txt) license.
 */

import idea.onlinePrgEnv.OnlineProgram;
import nacaLib.varEx.*;

public class TestLong extends OnlineProgram
{
  DataSection WorkingStorage = declare.workingStorageSection();

  Var W3 = declare.level(1).occurs(10).var();
  Var V9Comp010 = declare.level(5).pic9(10).var();
  Var V9Comp014V4 = declare.level(5).pic9(14, 4).var();
  Var VX10 = declare.level(5).picX(10).var();

  public void procedureDivision()
  {
    setAssertActive(true);

    move("9876543210", VX10);
    assertIfDifferent("9876543210", VX10);

    move(VX10, V9Comp010);
    long l = V9Comp010.getLong();
    assertIfFalse(l == 9876543210L);

    multiply(1000, V9Comp010).to(V9Comp014V4);
    assertIfFalse(9876543210000L == V9Comp014V4.getLong());

    String cs = V9Comp010.toString();
    cs = V9Comp014V4.toString();
    assertIfDifferent("9876543210000.0000", V9Comp014V4);

    inc(V9Comp010);
    assertIfFalse(9876543211L == V9Comp010.getLong());

    CESM.returnTrans();
  }
}

Kids: This is only done by professionals. Do not attempt this at home.
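For contrast, here is a minimal hand-written sketch of roughly the same checks using plain Java types. This is illustrative only, not anything NACA produces, and it assumes assertions are enabled with -ea.

import java.math.BigDecimal;

public class TestLongIdiomatic
{
  public static void main(String[] args)
  {
    String vx10 = "9876543210";            // PIC X(10) becomes a String
    long v9Comp010 = Long.parseLong(vx10); // PIC 9(10) becomes a long
    assert v9Comp010 == 9876543210L;

    // PIC 9(14)V4 becomes a BigDecimal with four decimal places
    BigDecimal v9Comp014V4 = BigDecimal.valueOf(v9Comp010)
        .multiply(BigDecimal.valueOf(1000))
        .setScale(4);
    assert v9Comp014V4.compareTo(new BigDecimal("9876543210000.0000")) == 0;

    v9Comp010++;                           // inc(...) becomes ++
    assert v9Comp010 == 9876543211L;

    System.out.println("all checks passed");
  }
}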

Cumulous answered 30/6, 2009 at 5:7 Comment(3)
In addition to this, embedded SQL processing in COBOL/PL1 is quite readable; when it gets converted, the Java variant is not readable and requires more processing. Embedded SQL frameworks for Java, such as SQLJ, are not very popular and seem not to be usable either.Houlihan
@cobp: You seem to be objecting that any translation of mainframe languages to Java will have more clumsily implemented SQL statements. I don't happen to believe that is necessary. However, this particular answer was in response to a specifically-proposed "transcoder" whose output for the procedural part of the program is beyond ugly; in this case, it doesn't matter whether the translated SQL is ugly or not.Cumulous
@IraBaxter The SQL part is not ugly or unmaintainable. I would say it has more syntactic sugar compared to Java. Additionally, if it had originally been developed in Java, there could be other ways to implement it, using ORM frameworks for example.Houlihan

My experience is similar but slightly different. We have a large, old code base in Ada (0.5 million LOC over 15+ years) that was recently converted to Java. The work was outsourced to a company that provided a combination of automated and manual conversion. They also did testing to verify that the Ada and Java systems behaved the same.

Some parts of it were written in Ada 95 (i.e., had the possibility of OOP) but most of it wasn't.

Now, yes, the code is not up to the same standard as code written in Java in the first place, but we have been using it successfully since then (18 months now) with no major issues. The major advantage we got is that we can now find more developers to maintain our code base who have the skills to produce maintainable code. (Anyone can develop in Ada, but like any other language, if you don't have experience in it you can end up with unmaintainable code.)

Oxa answered 24/6, 2009 at 2:44 Comment(1)
What was the reason for the conversion? Ada seems like a good language.Sadomasochism

From a risk-avoidance point of view, the NACA approach absolutely makes sense. Reusing their tools might not. They used the development of the tools to get their people up to speed in Java and Linux.

The result of the NACA conversion is not going to be good enough, or even OO, and makes it difficult to hire new people. But it is testable, can be refactored, and you can plug in better translators.
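To make "it is testable" concrete: one common way to gain confidence in such a conversion is a characterization (golden-master) harness that replays recorded inputs through the translated code and diffs the results against outputs captured from the original system. A minimal sketch, assuming a simple file layout and a hypothetical entry point (neither is anything NACA ships):

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class GoldenMasterHarness {
    public static void main(String[] args) throws Exception {
        Path inputDir = Path.of("captured/inputs");       // transactions recorded on the old system
        Path expectedDir = Path.of("captured/expected");  // matching outputs captured there
        int mismatches = 0;

        try (DirectoryStream<Path> cases = Files.newDirectoryStream(inputDir)) {
            for (Path input : cases) {
                String actual = runTranslatedProgram(Files.readString(input));
                String expected = Files.readString(expectedDir.resolve(input.getFileName()));
                if (!expected.equals(actual)) {
                    mismatches++;
                    System.err.println("MISMATCH: " + input.getFileName());
                }
            }
        }
        System.out.println(mismatches == 0 ? "all cases match" : mismatches + " mismatches");
    }

    // Stand-in for invoking the translated code; a real harness would call the
    // generated Java entry point with the same transaction record.
    private static String runTranslatedProgram(String transaction) {
        return transaction; // placeholder
    }
}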

[edit] Ira, you don't seem to be very risk-aware.

Sending the Cobol programmers to a Java course is not going to make them write usable object-oriented code. That takes a few years. During that time their productivity will be very low, and you can basically throw away all the code they write in the first year. In addition, you'll lose 10-20% of your programmers who are not willing or able to make the transition. Lots of people do not like going back to beginner status, and it is going to affect the pecking order, as some programmers pick up the new language a lot faster than others.

The NACA approach allows the business to continue working and puts no unneeded pressure on the organisation. The time schedule for the conversion is independent. Having a separate translator, in Java, written by OO experts, allows a gradual exposure to Java for the old team. Writing the test cases increases domain knowledge in the new Java team.

The real OO system is the translator, and that is the place to plug in better translators. Make it easy to do that, and you do not have to touch the generated code. If the generated code is ugly enough, that is what will happen automatically anyway :)

  • the old programmers will change the Cobol input;
  • the new Java ones will change the translator.

[running the translator once] is a bad strategy. Don't do that. And if you need to edit the generated code, maintain a mapping back. That can be automated. And should be. It is a lot easier to do these kinds of things in a Smalltalk image, but you can do it with files. There are people with a lot of experience maintaining different views on the same artifact: chip designers come to mind.

The translator should be instrumented, so you can create daily counts of, for example, the following (a small sketch follows the list):

  • Cobol input components;
  • OO Java input components;
  • Cobol-style output components;
  • OO-style output components.
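A tiny sketch of such counters (the class and category names are illustrative); the point is only that the translator can report, each day, how much of the system is still Cobol-style and how much has been refactored:

import java.util.EnumMap;
import java.util.Map;

public class MigrationMetrics {
    enum Kind { COBOL_INPUT, OO_JAVA_INPUT, COBOL_STYLE_OUTPUT, OO_STYLE_OUTPUT }

    private final Map<Kind, Integer> counts = new EnumMap<>(Kind.class);

    void record(Kind kind) {
        counts.merge(kind, 1, Integer::sum); // bump the count for this component kind
    }

    void printDailyReport() {
        counts.forEach((kind, n) -> System.out.println(kind + ": " + n));
    }
}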

You might want to read: Peter van den Hamer & Kees Lepoeter (1996) Managing Design Data: The Five Dimensions of CAD Frameworks, Configuration Management, and Data Management, Proceedings of the IEEE, Vol. 84, No. 1, January 1996

[moving Cobol platforms] Moving from Cobol on the mainframe to Cobol on Windows/Linux could have been a viable strategy for the NACA team, but the question was about moving to Java. If the long-term goal is to have a modern OO system, and to get there with as little operational risk as possible, the NACA approach is sound. It is only step one, though. A lot of refactoring is going to follow.

Breakneck answered 2/7, 2009 at 14:0 Comment(3)
The only risk that got avoided was whether their existing COBOL programmers could work on the converted system. If you assume that COBOL programmers are not brain dead, this doesn't seem like much of a risk, and for the amount of money likely to be wasted working with COBOL-as-java, they could have likely sent all the COBOL programmers to a Java training class. Regarding "plug in better translators:" once you've converted, you can't do this. People will change the converted code, and now you can't go back.Cumulous
"The real oo system is the translator..." So what? Once the code is all converted, maintenance will almost surely happen on the translated code. You'll never get to run the translator again, so it is irrelevant. Now, you may be suggesting that the scheme insists the original COBOL is kept, and the COBOL programmers work on that; that's a model, but given that the mainframe supporting that COBOL code is gone, I think it unlikely to be this case. I can't imagine the value of trying to maintain the old COBOL code once the translation completes.Cumulous
If you're gonna keep the application code in COBOL so that you can run the translator repeatedly, in fact what you are doing is keeping the application in COBOL. In this case, the right answer is to translate the COBOL from the mainframe to COBOL on the target system (e.g., COBOL.net for MS, or MicroFocus COBOL). Why on earth you want to keep a translator alive to support mainframe COBOL programming when you want to live on another platform is simply beyond me. And if you aren't going to maintain in COBOL, then keeping a COBOL to whatever translator is equally silly.Cumulous

I'm surprised nobody has mentioned Semantic Designs' DMS Software Reengineering Toolkit. I looked into COBOL conversion in the past. I was working on "automatic programming" back then. Before writing a translator, I looked up a bunch of previous efforts and products in that area. Semantic Designs' GLR-based tool was the best of the bunch.

That was many years ago. At the time, the tool translated COBOL to a modern language, refactored it, pretty printed it, etc. Here's the link to it now.

http://www.semdesigns.com/Products/DMS/DMSToolkit.html

They're still around. They've expanded the tool. It's more general. It might help people doing automated conversions or customizing a conversion tool. It's designed to be expandable and tweakable similarly to what Stephan pointed out. Thanks to Cyrus also for mentioning SoftwareMining. I'll look into them too if I run into a COBOL migration in the future.

Apyretic answered 2/4, 2013 at 18:58 Comment(0)

You are speaking of reengineering. The good thing is that a lot of people worldwide are trying to do this. The bad thing is that there are a lot of problems in reengineering legacy applications, ranging from missing sources to complex algorithms from the compiler-construction and graph-theory fields.

The idea of automatic translation is very popular until you try to convert something. Usually the result is awful and unmaintainable; it is even less maintainable than the original, complicated application. From my point of view, every tool that promises automatic translation from a legacy language to a modern one is very marketing-oriented: it says exactly what people want to hear ("translate your application from ... to Java once, and forget!"), then you buy a contract, and then you realize that you depend very tightly on the tool (because you can't make any change to your application without it!).

An alternative approach is "understanding": a tool that gives you a very detailed understanding of your legacy application. You can use it for maintenance, for documentation, or for reimplementing on a new platform.

I know a little about the history of Modernization Workbench before Micro Focus bought it last year and moved development to another country. There were a great number of complex analysis tools and a number of supported target languages (including Java). But no client really used the automatic code generation, so development of the generation part was frozen. As far as I know, PL/I support was mostly implemented but never finished. Still, you can try; maybe this is what you are looking for.

Tramway answered 8/7, 2009 at 15:26 Comment(1)
"then you understand you very tightly depends on the tool"? Not generally. Normally translation tools are used once to convert the code, and the tool is discarded; further development work is on the translated code. A more reasonable objection is the quality of the generated code (this is the objection to NACA), and poorly implemented tools do have this problem. Good translations are possible with tools. It helps enormously if the tool isn't constructed specifically for the single migration being contemplated, because long-term tools have evolving foundations strengthen over time.Cumulous
