Keeping track of utility classes

H

9

19

I've recently been more and more frustrated with a problem I see emerging in my projects code-base.

I'm working on a large scale java project that has >1M lines of code. The interfaces and class structure are designed very well and the engineers writing the code are very proficient. The problem is that in an attempt to make the code cleaner people write Utility classes whenever they need to reuse some functionality, as a result over time and as the project grows more and more utility methods crop up. However, when the next engineer comes across the need for the same functionality he has no way of knowing that someone had already implemented a utility class (or method) somewhere in the code and implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.

Are there any tools or any design principles which we as a team can implement in order to prevent the duplication and low visibility of the utility classes?

Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats including String, so he writes a utility class called SerializationUtil and has a static method called serialize(Document) which returns a String.

Note that this is more than just code-duplication as it is quite possible that the 2 implementations of the above example are different (say one uses transformer API and the other uses Xerces2-J) so this can be seen as a "best-practices" problem as well...

Update: I guess I better describe the current environment we develop in. We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis. We use agile development including daily talks and (perhaps insufficient) code reviews. We define all our utility classes in a .util which due to it's size now has 13 sub-packages and about 60 classes under the root (.util) class. We also use 3rd party libraries such as most of the apache commons jars and some of the jars that make up Guava.

I'm positive that we can reduce the amount of utilities by half if we put someone on the task of refactoring that entire package, I was wondering if there are any tools which can make that operation less costly, and if there are any methodologies which can delay as much as possible the problem from recurring.

Harville answered 11/4, 2011 at 17:52 Comment(2)

See #4189419 – Reynaldoreynard 15/4, 2011 at 20:27

I imagine you can also adapt that solution to list all methods with duplicate type signatures. – Reynaldoreynard 15/4, 2011 at 20:29

T

6

Your problem is a very common one. And a real problem too, because there is no good solution.

We are in the same situation here, well I'd say worse, with 13 millions line of code, turnover and more than 800 developers working on the code. We often discuss about the very same problem that you describe.

The first idea - that your developers have already used - is to refactor common code in some utility classes. Our problem with that solution, even with pair programming, mentoring and discussion, is that we are simply too many for this to be effective. In fact we grow in subteams, with people sharing knowledge in their subteam, but the knowledge doesn't transit between subteams. Maybe we are wrong but I think that even pair programming and talks can't help in this case.

We also have an architecture team. This team is responsible to deal with design and architecture concerns and to make common utilities that we might need. This team in fact produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. This team is also responsible to push best practices and to raise awareness of what should be done or not, what is available or what is not.

Good core Java API design is one of the reason for Java success. Good third party open sources libraries count a lot too. Even a small well crafted API allows to offer a really useful abstraction and can help reduce code size a lot. But you know, making framework and public API is not the same thing at all as just coding an utility class in 2 hours. It has a really high cost. An utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code on big projects/teams, you really make an API. You must ensure perfect documentation then, really readable and maintainable code. When you release new version of this code, you must stay backward compatible. You have to promote it company wide (or at least team wide). From 2 days for your small utility class you grow to 10 days, 20 days or even 50 days for a full-fledged API.

And your API design may not be so great. Well, it is not that your engineers are not bright - indeed they are. But are you willing to let them work 50 days on a small utility class that just help parsing number in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also have you noticed how the brightest engineers in the word make APIs that will never be popular or will fade slowly? You see, the first web project we made used only internal frameworks or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...

All I want to say is that there is no good solution, the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be done carefully, you must think of all potential integration problems and everybody must be trained of the new specificities and features.

But the main productivity point in a software company is not to gain 10 or even 50 lines of code when parsing XML. A generic code to do this will grow to a thousand lines of code anyway and recreates a complex API that will be layered by utility classes. When the guy make an utility class for parsing XML, it is good abstraction. He give a name to one dozen or even one hundred lines of specialized code. This code is useful because it is specialized. The common API allows to work on streams, URL, strings, whatever. It has a factory so you can choose you parser implementation. The utility class is good because it work only with this parser and with strings. And because you need one line of code to call it. But of course, this utility code is of limited use. It works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class for it in the first place.

In conclusion, what I would consider instead of trying to consolidate the code for the whole codebase is to split code responsibility as the teams grow:

transform your big team that work on one big project into small teams that work on several subprojects;
ensure that interfacing is good to minimize integration problems, but let team have their own code;
inside theses teams and corresponding codebases, ensure you have the best practices. No duplicate code, good abstractions. Use existing proven APIs from the community. Use pair programming, strong API documentation, wikis... But you should really let different teams make their choices, build their own code, even if this means duplicate code across teams or different design decisions. You know, if the design decisions are different this may be because the needs are different.

What you are really managing is complexity. In the end if you make one monolithic codebase, a very generic and advanced one, you increase the time for newcomers to ramp up, you increase the risk that developers will not use your common code at all, and you slow down everybody because any change has far greater chances to break existing functionality.

Tallulah answered 18/4, 2011 at 9:36 Comment(3)

thanks for your answer, I think you are right. The next logical step for us is breaking up our code into subprojects and defining interfaces where the shared code is minimal. We have been talking about it for a while and I feel that it will provide the compromise we search for. Since I'm not sure any tool exists which can help me solve this I'm awarding the bounty to you, as you are probably closest to defining a way to mitigate the problem – Harville 18/4, 2011 at 10:57

50 days split over 800 potential users is not that much of a overhead – Garnierite 14/4, 2013 at 10:0

Nobody want to pay 50 days for a thing that could be done in 2 days. And if you have to do it because you have a big team, then this big team is in fact decreasing productivity. – Tallulah 14/4, 2013 at 11:17

L

9

A good solution to this problem is to start adding more object-orientation. To use your example:

Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it

The solution is to stop using primitive types or types provided by the JVM (String, Integer, java.util.Date, java.w3c.Document) and wrap them in your own project-specific classes. Then your XmlDocument class can provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, etc.

This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.

Libb answered 14/4, 2013 at 6:54 Comment(1)

Yep, definitely move towards more of a domain driven model/ hexagonal architecture – Cleavland 14/4, 2013 at 10:5