Managing highly repetitive code and documentation in Java

9

72

Highly repetitive code is generally a bad thing, and there are design patterns that can help minimize this. However, sometimes it's simply inevitable due to the constraints of the language itself. Take the following example from java.util.Arrays:

/**
 * Assigns the specified long value to each element of the specified
 * range of the specified array of longs.  The range to be filled
 * extends from index <tt>fromIndex</tt>, inclusive, to index
 * <tt>toIndex</tt>, exclusive.  (If <tt>fromIndex==toIndex</tt>, the
 * range to be filled is empty.)
 *
 * @param a the array to be filled
 * @param fromIndex the index of the first element (inclusive) to be
 *        filled with the specified value
 * @param toIndex the index of the last element (exclusive) to be
 *        filled with the specified value
 * @param val the value to be stored in all elements of the array
 * @throws IllegalArgumentException if <tt>fromIndex &gt; toIndex</tt>
 * @throws ArrayIndexOutOfBoundsException if <tt>fromIndex &lt; 0</tt> or
 *         <tt>toIndex &gt; a.length</tt>
 */
public static void fill(long[] a, int fromIndex, int toIndex, long val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i=fromIndex; i<toIndex; i++)
        a[i] = val;
}

The above snippet appears in the source code 8 more times, with very little variation in the documentation and method signature but exactly the same method body, once for each of the other root array types: int[], short[], char[], byte[], boolean[], double[], float[], and Object[].

I believe that unless one resorts to reflection (which is an entirely different subject in itself), this repetition is inevitable. I understand that, as a utility class, java.util.Arrays has an atypically high concentration of repetitive Java code, but even with best practices, repetition does happen! Refactoring doesn't always work because it's not always possible (the obvious case is when the repetition is in the documentation).

Obviously, maintaining this source code is a nightmare. A slight typo in the documentation, or a minor bug in the implementation, is multiplied by however many repetitions were made. In fact, the best example happens to involve this exact class:

Google Research Blog - Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken (by Joshua Bloch, Software Engineer)

The bug is a surprisingly subtle one, occurring in what many thought to be just a simple and straightforward algorithm.

    // int mid = (low + high) / 2; // the bug
    int mid = (low + high) >>> 1; // the fix

The above line appears 11 times in the source code!
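
To make the overflow concrete, here is a minimal sketch (the class and method names are made up for illustration; they are not from java.util.Arrays). It assumes both indices are non-negative ints whose sum exceeds Integer.MAX_VALUE:

```java
// Illustrative only: MidpointDemo and its method names are invented for this sketch.
final class MidpointDemo {
    // The original computation: low + high can exceed Integer.MAX_VALUE and wrap negative.
    static int brokenMid(int low, int high) {
        return (low + high) / 2;
    }

    // The fix: the unsigned shift reads the wrapped sum as an unsigned 32-bit value.
    static int fixedMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 1 << 30;        // 1073741824
        int high = (1 << 30) + 2; // 1073741826
        System.out.println(brokenMid(low, high)); // prints -1073741823
        System.out.println(fixedMid(low, high));  // prints 1073741825
    }
}
```

With small indices both versions agree, which is why the bug went unnoticed for so long and why every one of the copies needed the same fix.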

So my questions are:

  • How are these kinds of repetitive Java code/documentation handled in practice? How are they developed, maintained, and tested?
    • Do you start with "the original", and make it as mature as possible, and then copy and paste as necessary and hope you didn't make a mistake?
    • And if you did make a mistake in the original, then just fix it everywhere, unless you're comfortable with deleting the copies and repeating the whole replication process?
    • And you apply this same process for the testing code as well?
  • Would Java benefit from some sort of limited-use source code preprocessing for this kind of thing?
    • Perhaps Sun has their own preprocessor to help write, maintain, document and test this kind of repetitive library code?

A comment requested another example, so I pulled this one from Google Collections: com.google.common.base.Predicates lines 276-310 (AndPredicate) vs lines 312-346 (OrPredicate).

The source for these two classes is identical, except for:

  • AndPredicate vs OrPredicate (each appears 5 times in its class)
  • "And(" vs "Or(" (in the respective toString() methods)
  • #and vs #or (in the @see Javadoc comments)
  • true vs false (in apply; ! can be rewritten out of the expression)
  • -1 /* all bits on */ vs 0 /* all bits off */ in hashCode()
  • &= vs |= in hashCode()
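
For illustration only, here is a rough sketch of how differences like these could in principle be folded into one class parameterized by a mode. This is not the Google Collections source: CompositePredicate, Mode, and of(...) are hypothetical names, and it uses java.util.function.Predicate (Java 8+) rather than the library's own Predicate interface. It also leaves out hashCode(), equals(), and serialization, which is where keeping two distinct class names may actually matter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not the library's code: one class covering both And and Or.
final class CompositePredicate<T> implements Predicate<T> {
    enum Mode { AND, OR }

    private final Mode mode;
    private final List<Predicate<? super T>> components;

    private CompositePredicate(Mode mode, List<Predicate<? super T>> components) {
        this.mode = mode;
        this.components = components;
    }

    @SafeVarargs
    static <T> CompositePredicate<T> of(Mode mode, Predicate<? super T>... components) {
        return new CompositePredicate<T>(mode, Arrays.asList(components));
    }

    @Override
    public boolean test(T t) {
        // OR short-circuits on the first true; AND short-circuits on the first false.
        boolean stopOn = (mode == Mode.OR);
        for (Predicate<? super T> p : components) {
            if (p.test(t) == stopOn) {
                return stopOn;
            }
        }
        // Nothing short-circuited: AND yields true, OR yields false.
        return !stopOn;
    }

    @Override
    public String toString() {
        return (mode == Mode.AND ? "And(" : "Or(") + components + ")";
    }
}
```

Whether this is better than two near-identical classes is debatable: the duplication is gone, but every method now branches on the mode.
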
Williemaewillies answered 25/2, 2010 at 19:59 Comment(6)
It sounds like you're specifically concerned about repetition due to code for handling primitive arrays. Personally, I just avoid this kind of repetition (and encourage others to do the same) by using generic collections and autoboxing, avoiding both arrays and primitives unless absolutely necessary. Do you have any examples of this repetition that don't involve primitive arrays?Marnamarne
+1 for the good post and netlib.bell-labs.com/cm/cs/pearlsKanaka
Repetitiveness due to providing complete overloads is just an example. I've seen this kind of repetition in non-overloading and non-primitive array handling scenarios as well.Williemaewillies
I had a look at that java.util.Arrays class and found the following javadoc snippet pretty amusing: "The code for each of the seven primitive types is largely identical. C'est la vie.".Pacifica
The Sun/Oracle Java library uses a manky templating system for NIO buffers.Locally
It would be nice if Java had Scheme's hygienic macros.Olivo
R
32

For people that absolutely need performance, boxing and unboxing and generified collections and whatnot are big no-no's.

The same problem happens in performance computing where you need the same complex code to work for both float and double (say, some of the methods shown in Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" paper).

There's a reason why Trove's TIntIntHashMap runs circles around Java's HashMap<Integer,Integer> when working with a similar amount of data.

Now how is the source code of Trove's collections written?

By using source code instrumentation of course :)

There are several Java libraries for higher performance (much higher than the default Java ones) that use code generators to create the repeated source code.

We all know that "source code instrumentation" is evil and that code generation is crap, but still that's how people who really know what they're doing (i.e. the kind of people that write stuff like Trove) do it :)

For what it is worth we generate source code that contains big warnings like:

/*
 * This .java source file has been auto-generated from the template xxxxx
 * 
 * DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN
 * 
 */
Rathenau answered 26/2, 2010 at 4:6 Comment(5)
Can you provide more details on what code generators they use, etc? I'm not familiar with Trove.Williemaewillies
It's explained in the Trove FAQ, basically they have an Ant target that calls a script that does the modification (if I remember correctly): trove4j.sourceforge.net/html/faq.html (I'm into Java high performance computing and I've seen the technique used several times... We use it here, we have our own Java proprietary code generating more Java code :)Rathenau
@polygenelubricants: btw Trove is a wonderful replacement for the default Java API if you need to work with primitives. For regular collections, you'll want to look into Javolution or the Google collections etc. The default Java collections are really pretty bad from a lot of standpoints. They work for simple projects but show their limits quite fast once you start to manipulate large amounts of data.Rathenau
I happen to like code generation... it needn't be nasty at all. But I would be generating byte code rather than Java source. What if you need to generate at run time, are you going to force end-users to install the JDK?Expediential
@CurtainDog: there's a reason why projects like Trove generate source code and not bytecode. There are cases where bytecode instrumentation is fine and cases where source code instrumentation is better. For what it's worth, in the project I'm currently working on we do both so... Another option if you really want source code instrumentation at runtime (instead of bytecode) is simply to generate the .java server side, compile it, and send it down the wire. I'm not saying you should do this in that latter case: I'm saying that not only do both have their uses, but both are commonly used.Rathenau
H
16

If you absolutely must duplicate code, follow the great examples you've given and group all of that code in one place where it's easy to find and fix when you have to make a change. Document the duplication and, more importantly, the reason for the duplication so that everyone who comes after you is aware of both.

Heartsome answered 25/2, 2010 at 20:10 Comment(1)
+1 Increasing the length of the duplicated documentation by documenting the duplication seems like it might be a bad idea at first, but it's really much worse to have duplicated stuff that needs to be modified and no documentation about the duplication.Cytherea
K
6

From Wikipedia, on Don't Repeat Yourself (DRY) or Duplication is Evil (DIE):

In some contexts, the effort required to enforce the DRY philosophy may be greater than the effort to maintain separate copies of the data. In some other contexts, duplicated information is immutable or kept under a control tight enough to make DRY not required.

There is probably no answer or technique to prevent problems like that.

Kanaka answered 25/2, 2010 at 20:20 Comment(0)
O
4

Even fancy pants languages like Haskell have repetitive code (see my post on Haskell and serialization).

It seems there are three options for this problem:

  1. Use reflection and lose performance
  2. Use preprocessing like Template Haskell or a Camlp4 equivalent for your language and live with the nastiness
  3. Or, my personal favorite, use macros if your language supports them (Scheme and Lisp)

I consider macros different from preprocessing because macros are usually written in the same language as the target, whereas preprocessing uses a different language.

I think Lisp/Scheme macros would solve many of these problems.

Olivo answered 25/2, 2010 at 19:59 Comment(0)
L
2

I get that Sun has to document like this for the Java SE library code and maybe other 3rd party library writers do as well.

However, I think it is an utter waste to copy and paste documentation throughout a file like this in code that is only used in-house. I know many people will disagree because it will make their in-house Javadocs look less clean. However, the trade-off is that it makes their code cleaner, which, in my opinion, is more important.

Livorno answered 25/2, 2010 at 20:23 Comment(1)
+1 you can write the documentation for each duplicated method once in the class Javadoc and use a short method Javadoc which says "see class Javadoc"Creese
S
2

Java primitive types screw you, especially when it comes to arrays. If you're specifically asking about code involving primitive types, then I would say just try to avoid them. The Object[] method is sufficient if you use the boxed types.

In general, you need lots of unit tests and there really isn't anything else to be done, other than resorting to reflection. Like you said, it's another subject entirely, but don't be too afraid of reflection. Write the DRYest code you can first, then profile it and determine if the reflection performance hit is really bad enough to warrant writing out and maintaining the extra code.
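
As a rough sketch of what the reflective route might look like, here is a single fill that works for any array type via java.lang.reflect.Array. ReflectiveFill is a hypothetical name, and the range checks only approximate what the rangeCheck call in the question does; treat it as a starting point to profile, not a drop-in replacement for Arrays.fill.

```java
import java.lang.reflect.Array;

// Hypothetical sketch: one reflective fill instead of nine typed overloads.
final class ReflectiveFill {
    // Fills array[fromIndex, toIndex) with val, for primitive and object arrays alike.
    static void fill(Object array, int fromIndex, int toIndex, Object val) {
        // Array.getLength throws IllegalArgumentException if 'array' is not an array.
        int length = Array.getLength(array);
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > length) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
        for (int i = fromIndex; i < toIndex; i++) {
            // For primitive arrays, Array.set unwraps the boxed value on every call.
            Array.set(array, i, val);
        }
    }

    public static void main(String[] args) {
        long[] longs = new long[5];
        fill(longs, 1, 4, 7L);
        System.out.println(java.util.Arrays.toString(longs)); // prints [0, 7, 7, 7, 0]
    }
}
```

The cost is visible in the signature: the array and value are both plain Object, so a mismatched value type (say, a String into a long[]) is only caught at run time, and each element goes through boxing plus a reflective call.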

Semitic answered 25/2, 2010 at 20:26 Comment(0)
M
2

You could use a code generator to construct variations of the code using a template. In that case, the java source is a product of the generator and the real code is the template.
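
A minimal sketch of the idea, assuming nothing more than plain string substitution: the @TYPE@ placeholder, the FillGenerator class, and the header wording are all invented here and are not how Trove or the JDK build actually does it. A real setup would read the template from a file and write one .java file per run from a build step.

```java
// Hypothetical sketch: the template is the "real" code, the Java source is its product.
final class FillGenerator {
    // Warning header stamped on top of the generated output.
    private static final String HEADER =
        "/*\n"
      + " * This source has been auto-generated from a template.\n"
      + " * DO NOT MODIFY: changes will be overwritten.\n"
      + " */\n";

    // The single copy that gets maintained; @TYPE@ is our own placeholder convention.
    private static final String TEMPLATE =
        "public static void fill(@TYPE@[] a, int fromIndex, int toIndex, @TYPE@ val) {\n"
      + "    rangeCheck(a.length, fromIndex, toIndex);\n"
      + "    for (int i = fromIndex; i < toIndex; i++)\n"
      + "        a[i] = val;\n"
      + "}\n";

    // Instantiates the template for one element type.
    static String generate(String type) {
        return TEMPLATE.replace("@TYPE@", type);
    }

    // Emits the header followed by one method per element type.
    static String generateAll(String... types) {
        StringBuilder out = new StringBuilder(HEADER);
        for (String type : types) {
            out.append('\n').append(generate(type));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(generateAll("long", "int", "short", "char", "byte",
                                     "boolean", "double", "float", "Object"));
    }
}
```

A fix to the template (for example to the loop or the range check) then reaches every variant on the next regeneration, instead of being applied by hand nine times.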

Mindful answered 25/2, 2010 at 20:55 Comment(3)
Yes, this is what I was alluding to when I said that perhaps Sun has its own preprocessor, etc.Williemaewillies
The officially sanctioned way to do this would be to use an annotation and an annotation processor so that when you compiled the code, javac would call your annotation processor which would in turn generate source code on the fly to be compiled by the compiler. The unofficial way to do it is to have your annotation processor modify internal compiler data structures when it is called. The only free java source generation library I've found is CodeModel.Mindful
This seems reasonable for larger snippets, but a little duplication can be the lesser of two evils compared to adding yet another layer of complexity to the build process.Connacht
B
2

Given two code fragments that are claimed to be similar, most languages have limited facilities for constructing abstractions that unify the code fragments into a monolith. To abstract when your language can't do it, you have to step outside the language :-{

The most general "abstraction" mechanism is a full macro processor which can apply arbitrary computations to the "macro body" while instantiating it (think Post or string-rewriting system, which is Turing capable). M4 and GPM are quintessential examples. The C preprocessor isn't one of these.

If you have such a macro processor, you can construct an "abstraction" as a macro, and run the macro processor on your "abstracted" source text to produce the actual source code you compile and run.

You can also use more limited versions of the ideas, often called "code generators". These are usually not Turing capable, but in many cases they work well enough. It depends on how sophisticated your "macro instantiation" needs to be. (The reason people are enamored with the C++ template mechanism is that, despite its ugliness, it is Turing capable, and so people can do truly ugly but astonishing code generation tasks with it.) Another answer here mentions Trove, which is apparently in the more limited but still very useful category.

Really general macro processors (like M4) manipulate just text; that makes them powerful, but they don't handle the structure of programming languages well, and it is really awkward to write a generator in such a macro processor that can not only produce code but also optimize the generated result. Most code generators that I encounter are "plug this string into this string template" and so cannot do any optimization of a generated result. If you want generation of arbitrary code and high performance to boot, you need something that is Turing capable but understands the structure of the generated code so it can easily manipulate (e.g., optimize) it.

Such a tool is called a Program Transformation System. It parses the source text just like a compiler does, and then carries out analyses/transformations on it to achieve a desired effect. If you can put markers in the source text of your program (e.g., structured comments, or annotations in languages that have them) directing the program transformation tool what to do, then you can use it to carry out such abstraction instantiation, code generation, and/or code optimization. (One poster's suggestion of hooking into the Java compiler is a variation on this idea.) Using a general-purpose transformation system (such as the DMS Software Reengineering Toolkit) means you can do this for essentially any language.

Brockbrocken answered 6/3, 2010 at 16:30 Comment(0)
M
1

A lot of this kind of repetition can now be avoided thanks to generics. They're a godsend when writing the same code where only the types change.

Sadly though, I think generic arrays are still not very well supported. For now at least, use containers that allow you to take advantage of generics. Polymorphism is also a useful tool to reduce this kind of code duplication.
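
A small sketch of where generics help and where they stop (GenericFill is a made-up name for illustration): one generic method covers every reference-type array, but a type parameter can never stand for a primitive, so long[], int[] and friends still need their own overloads.

```java
// Hypothetical sketch: a single generic fill for all arrays of reference types.
final class GenericFill {
    static <T> void fill(T[] a, int fromIndex, int toIndex, T val) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        // Out-of-range indices surface as ArrayIndexOutOfBoundsException from the access.
        for (int i = fromIndex; i < toIndex; i++) {
            a[i] = val;
        }
    }

    public static void main(String[] args) {
        Integer[] boxed = new Integer[4];
        fill(boxed, 1, 3, 9); // T is inferred as Integer; 9 is autoboxed
        System.out.println(java.util.Arrays.toString(boxed)); // prints [null, 9, 9, null]

        // long[] raw = new long[4];
        // fill(raw, 1, 3, 9L); // does not compile: long[] is not a T[] for any T
    }
}
```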

To answer your question about how to handle code that absolutely must be duplicated... Tag each instance with easily searchable comments. There are some Java preprocessors out there that add C-style macros. I think I remember NetBeans having one.

Macaque answered 25/2, 2010 at 20:8 Comment(3)
"Sadly though, I think generic arrays are still not very well supported." -- I'm not sure how you can support generic arrays in Java with type-erasure. I think it's impossible.Williemaewillies
I've seen workarounds: casting an array of Object, or using reflection. Neither one is pretty, but they apparently work.Macaque
Generally, it's best to avoid using arrays in Java. ArrayLists provide a lot more functionality and usually have a negligible performance cost compared to an array.Ut
