Dealing with "Xerces hell" in Java/Maven?
H

11

815

In my office, the mere mention of the word Xerces is enough to incite murderous rage from developers. A cursory glance at the other Xerces questions on SO seems to indicate that almost all Maven users are "touched" by this problem at some point. Unfortunately, understanding the problem requires a bit of knowledge about the history of Xerces...

History

  • Xerces is the most widely used XML parser in the Java ecosystem. Almost every library or framework written in Java uses Xerces in some capacity (transitively, if not directly).

  • The Xerces jars included in the official binaries are, to this day, not versioned. For example, the Xerces 2.11.0 implementation jar is named xercesImpl.jar and not xercesImpl-2.11.0.jar.

  • The Xerces team does not use Maven, which means they do not upload an official release to Maven Central.

  • Xerces used to be released as a single jar (xerces.jar), but was split into two jars, one containing the API (xml-apis.jar) and one containing the implementations of those APIs (xercesImpl.jar). Many older Maven POMs still declare a dependency on xerces.jar. At some point in the past, Xerces was also released as xmlParserAPIs.jar, which some older POMs also depend on.

  • The versions assigned to the xml-apis and xercesImpl jars by those who deploy their jars to Maven repositories are often different. For example, xml-apis might be given version 1.3.03 and xercesImpl might be given version 2.8.0, even though both are from Xerces 2.8.0. This is because people often tag the xml-apis jar with the version of the specifications that it implements. There is a very nice, but incomplete breakdown of this here.

  • To complicate matters, Xerces is the XML parser used in the reference implementation of the Java API for XML Processing (JAXP), included in the JRE. The implementation classes are repackaged under the com.sun.* namespace, which makes it dangerous to access them directly, as they may not be available in some JREs. However, not all of the Xerces functionality is exposed via the java.* and javax.* APIs; for example, there is no API that exposes Xerces serialization.

  • Adding to the confusing mess, almost all servlet containers (JBoss, Jetty, Glassfish, Tomcat, etc.), ship with Xerces in one or more of their /lib folders.

Problems

Conflict Resolution

For some -- or perhaps all -- of the reasons above, many organizations publish and consume custom builds of Xerces in their POMs. This is not really a problem if you have a small application and are only using Maven Central, but it quickly becomes an issue for enterprise software where Artifactory or Nexus is proxying multiple repositories (JBoss, Hibernate, etc.):

[Screenshot: xml-apis proxied by Artifactory]

For example, organization A might publish xml-apis as:

<groupId>org.apache.xerces</groupId>
<artifactId>xml-apis</artifactId>
<version>2.9.1</version>

Meanwhile, organization B might publish the same jar as:

<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.3.04</version>

Although B's jar is a lower version than A's jar, Maven does not know that they are the same artifact because they have different groupIds. Thus, it cannot perform conflict resolution and both jars will be included as resolved dependencies:

[Screenshot: resolved dependencies with multiple xml-apis]

Classloader Hell

As mentioned above, the JRE ships with Xerces in the JAXP RI. While it would be nice to mark all Xerces Maven dependencies as <exclusion>s or with <scope>provided</scope>, the third-party code you depend on may or may not work with the version provided in the JAXP of the JDK you're using. In addition, you have the Xerces jars shipped in your servlet container to contend with. This leaves you with a number of choices: Do you delete the servlet version and hope that your container runs on the JAXP version? Is it better to leave the servlet version, and hope that your application frameworks run on the servlet version? If one or two of the unresolved conflicts outlined above manage to slip into your product (easy to happen in a large organization), you quickly find yourself in classloader hell, wondering which version of Xerces the classloader is picking at runtime and whether or not it will pick the same jar in Windows and Linux (probably not).

Solutions?

We've tried marking all Xerces Maven dependencies with <scope>provided</scope> or as an <exclusion>, but this is difficult to enforce (especially with a large team) given that the artifacts have so many aliases (xml-apis, xerces, xercesImpl, xmlParserAPIs, etc.). Additionally, our third-party libs/frameworks may not run on the JAXP version or the version provided by a servlet container.
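
For reference, a minimal sketch of what marking one alias as provided looks like, assuming the xerces:xercesImpl coordinates and the 2.11.0 version that appear elsewhere on this page. The same declaration would have to be repeated for every alias (xml-apis, xerces, xmlParserAPIs), which is the part that is hard to enforce:

```xml
<dependency>
  <!-- Assumed coordinates; a repackaged copy under another groupId is NOT covered by this entry -->
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
  <!-- provided: on the compile classpath, but expected from the JDK/container at runtime and not packaged -->
  <scope>provided</scope>
</dependency>
```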

How can we best address this problem with Maven? Do we have to exercise such fine-grained control over our dependencies, and then rely on tiered classloading? Is there some way to globally exclude all Xerces dependencies, and force all of our frameworks/libs to use the JAXP version?


UPDATE: Joshua Spiewak has uploaded a patched version of the Xerces build scripts to XERCESJ-1454 that allows for upload to Maven Central. Vote/watch/contribute to this issue and let's fix this problem once and for all.

Helfand answered 26/7, 2012 at 20:32 Comment(10)
Thanks for this detailed question. I do not understand the motivation of the Xerces team. I would imagine they are proud of their product and take pleasure in others using it, but the current state of Xerces and Maven is disgraceful. Even so, they can do what they want even if it makes no sense to me. I wonder if the Sonatype guys have any suggestions.Vital
This may be off topic, but this is probably the best post I have ever seen. More related to the question, what you describe is one of the most painful issues that we can encounter. Great initiative!Grout
@TravisSchneeberger Much of the complexity is because Sun chose to use Xerces in the JRE itself. You can hardly blame the Xerces folks for that.Isatin
Usually we try to find a version of Xerces that satisfies all dependent libraries by trial and error. If that's not possible, we refactor to split the application into separate WARs (separate class loaders). This tool (I wrote it) helps in understanding what is going on: jhades.org. It lets you query the classpath for jars and classes, and it also works when the server doesn't start yet.Projectile
Just a quick comment if you're getting this error while starting servicemix from git bash in windows: start it from "normal" cmd instead.Robeson
There is a Maven plugin that checks for duplicate classes in the classpath set up with maven dependencies. I do not know its name but it should catch multiple copies of Xerces.Isatin
How will "upload to Maven Central" solve the transitivity problem, where this JAR is used in multiple other modules that are in turn used by another project?Mckinnie
@Mckinnie If the project itself does the uploads, nobody else will be tempted to upload their own renamed, rebranded, recompiled, incompetently patched, incompetently packaged etc. version to Maven Central. Which means that other projects won't pick these up, and if you use their project, you won't inherit these projects' decisions to pick up a defective copy of Xerces.Mameluke
It's nice to look back on questions like this and reflect on the problems we don't have to deal with anymoreUs
What's crazy is that one person's decision to basically refuse to play well with everyone else caused this chaos. This would all be fairly straightforward if Xerces just simply adopted mavenCentral as their distribution and published official libs just like every other well behaved lib. It's hard to reverse the car because so many libs are tangled up with their own "special" versions, but the overall experience was ruined due to this decision. Maybe maven could ban unofficial xerces builds and anoint 1 accepted distributor, and tell dependent libs they must convert or suffer being banned.Timmons
O
125

There are 2.11.0 JARs (and source JARs!) of Xerces in Maven Central since 20th February 2013! See Xerces in Maven Central. I wonder why they haven't resolved https://issues.apache.org/jira/browse/XERCESJ-1454...

I've used:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

and all dependencies have resolved fine - even proper xml-apis-1.4.01!

And what's most important (and what wasn't obvious in the past) - the JAR in Maven Central is the same JAR as in the official Xerces-J-bin.2.11.0.zip distribution.

I couldn't however find xml-schema-1.1-beta version - it can't be a Maven classifier-ed version because of additional dependencies.

Odey answered 7/3, 2013 at 7:30 Comment(4)
Although it's very confusing that xml-apis:xml-apis:1.4.01 is newer than xml-apis:xml-apis:2.0.2 ?? see search.maven.org/…Ganglion
It is confusing, but it's due to the third party uploads of non-versioned Xerces jars, like justingarrik was saying in his post. xml-apis 2.9.1 is the same as 1.3.04, so in that sense, 1.4.01 is newer (and numerically larger) than 1.3.04.Reminisce
If you have both xercesImpl and xml-apis in your pom.xml be sure to delete the xml-apis dependency! Otherwise 2.0.2 rears its ugly head.Godevil
XERCESJ-1454 has been resolved since 1st May 2013.Collimator
L
68

Frankly, pretty much everything that we've encountered works just fine w/ the JAXP version, so we always exclude xml-apis and xercesImpl.
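
A minimal sketch of such an exclusion, assuming a hypothetical dependency com.example:some-framework that pulls in Xerces transitively. The groupIds shown are the common ones on Maven Central; a third-party repackaging under another groupId would need its own <exclusion> entry:

```xml
<dependency>
  <!-- Hypothetical dependency that drags in Xerces transitively -->
  <groupId>com.example</groupId>
  <artifactId>some-framework</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- Drop the Xerces implementation so the JAXP version in the JRE is used -->
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
    </exclusion>
    <!-- Drop the separately packaged XML APIs as well -->
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Exclusions are declared per dependency, so this block has to be repeated on each dependency that brings Xerces in.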

Livraison answered 26/7, 2012 at 22:18 Comment(6)
Could you add a pom.xml snippet for that?Cyclostyle
When I try this, I get JavaMelody and Spring throwing java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at runtime.Enlarge
To add to David Moles's response -- I have seen a half dozen transitive dependencies need ElementTraversal. Various things in Spring and Hadoop most commonly.Kenny
Yeah, looks like that is a newer xml api.Livraison
If you get java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal try adding xml-apis 1.4.01 to your pom (and exclude all other dependent versions)Pettifog
ElementTraversal is a new class added in Xerces 2.11 and available in the xml-apis:xml-apis:1.4.01 dependency. So you may need to copy the class manually into your project or use the whole dependency, which causes duplicated classes in the classloader. But this class was included in JDK 9, so in the future you may need to remove the dependency.Elum
V
46

You could use the maven enforcer plugin with the banned dependency rule. This would allow you to ban all the aliases that you don't want and allow only the one you do want. These rules will fail the maven build of your project when violated. Furthermore, if this rule applies to all projects in an enterprise you could put the plugin configuration in a corporate parent pom.
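
The rule is called bannedDependencies in the maven-enforcer-plugin. A minimal sketch, assuming you want to allow only xerces:xercesImpl:2.11.0 and xml-apis:xml-apis:1.4.01 (the versions mentioned elsewhere on this page) and ban every other alias. The plugin version and the execution id are assumptions, so adjust them to your build:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <!-- Assumed version; use whatever your organization has standardized on -->
  <version>3.4.1</version>
  <executions>
    <execution>
      <!-- Hypothetical execution id -->
      <id>ban-xerces-aliases</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <!-- Ban every alias regardless of groupId -->
              <exclude>*:xerces</exclude>
              <exclude>*:xercesImpl</exclude>
              <exclude>*:xml-apis</exclude>
              <exclude>*:xmlParserAPIs</exclude>
            </excludes>
            <includes>
              <!-- Exceptions to the bans above: the only coordinates allowed -->
              <include>xerces:xercesImpl:2.11.0</include>
              <include>xml-apis:xml-apis:1.4.01</include>
            </includes>
          </bannedDependencies>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

If you prefer the JAXP version only, drop the <includes> section so that every Xerces alias fails the build.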

Vital answered 27/7, 2012 at 16:28 Comment(0)
G
42

I know this doesn't answer the question exactly, but for ppl coming in from google that happen to use Gradle for their dependency management:

I managed to get rid of all xerces/Java8 issues with Gradle like this:

configurations {
    all*.exclude group: 'xml-apis'
    all*.exclude group: 'xerces'
}
Galang answered 24/4, 2015 at 6:54 Comment(4)
nice, with maven you need about 4000 lines of XML to do that.Winnow
that didn't solve the problem. any other hints for Android-Gradle people?Nicolenicolea
@Winnow XML is used purely for configuration. Groovy is a high level programming language. Sometimes you might want to use XML for its explicitness instead of groovy for its magic.Indoeuropean
Gradle Kotlin DSL: https://mcmap.net/q/55134/-how-to-exclude-library-from-all-dependencies-in-kotlin-dsl-build-gradle configurations { all { exclude(group="xml-apis") } }Tuxedo
K
17

I guess there is one question you need to answer:

Does there exist a xerces*.jar that everything in your application can live with?

If not you are basically screwed and would have to use something like OSGI, which allows you to have different versions of a library loaded at the same time. Be warned that it basically replaces jar version issues with classloader issues ...

If there exists such a version you could make your repository return that version for all kinds of dependencies. It's an ugly hack and would end up with the same xerces implementation in your classpath multiple times but better than having multiple different versions of xerces.

You could exclude every dependency on xerces and add one on the version you want to use.
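
If such a version exists, one way to pin it is a <dependencyManagement> section in a parent POM. This is only a sketch: the versions are the ones mentioned elsewhere on this page, not a recommendation, and the pin applies per groupId:artifactId, so it does not unify copies published under other groupIds (those still need exclusions):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Whenever these exact coordinates appear, directly or transitively, use this version -->
    <dependency>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
      <version>2.11.0</version>
    </dependency>
    <dependency>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
      <version>1.4.01</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```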

I wonder if you can write some kind of version resolution strategy as a plugin for Maven. This would probably be the nicest solution, but if it is feasible at all it needs some research and coding.

For the version contained in your runtime environment, you'll have to make sure it either gets removed from the application classpath or the application jars get considered for classloading before the lib folder of the server gets considered.

So to wrap it up: It's a mess and that won't change.

Kindliness answered 26/7, 2012 at 20:49 Comment(2)
The same class from the same jar loaded by different ClassLoaders is still a ClassCastException (in all standard containers)Vlf
Exactly. That's why I wrote: Be warned that it basically replaces jar version issues with classloader issuesKindliness
E
11

You should debug first, to help identify your level of XML hell. In my opinion, the first step is to add

-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

to the command line. If that works, then start excluding libraries. If not, then add

-Djaxp.debug=1

to the command-line.
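
If the failure shows up in tests run by Maven rather than in a server, the same flags can be passed to the forked test JVM. A minimal sketch, assuming the maven-surefire-plugin runs your tests and that its version is managed elsewhere (for example in a parent POM); it has no effect on the JVM of your servlet container:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- JVM arguments for the forked test process; add the -Djavax.xml... flags here as needed -->
    <argLine>-Djaxp.debug=1</argLine>
  </configuration>
</plugin>
```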

Etka answered 14/6, 2016 at 13:51 Comment(0)
D
8

There is another option that hasn't been explored here: declaring Xerces dependencies in Maven as optional:

<dependency>
   <groupId>xerces</groupId>
   <artifactId>xercesImpl</artifactId>
   <version>...</version>
   <optional>true</optional>
</dependency>

Basically what this does is to force all dependents to declare their version of Xerces or their project won't compile. If they want to override this dependency, they are welcome to do so, but then they will own the potential problem.

This creates a strong incentive for downstream projects to:

  • Make an active decision. Do they go with the same version of Xerces or use something else?
  • Actually test their parsing (e.g. through unit testing) and classloading as well as not to clutter up their classpath.

Not all developers keep track of newly introduced dependencies (e.g. with mvn dependency:tree). This approach will immediately bring the matter to their attention.

It works quite well at our organization. Before its introduction, we used to live in the same hell the OP is describing.

Dill answered 2/10, 2015 at 9:52 Comment(2)
Should I literally use dot-dot-dot within the version element, or do I need to use a real version like 2.6.2?Pietism
@Pietism The real version.Dill
W
6

Every Maven project should stop depending on Xerces; they probably don't really depend on it anyway. XML APIs and an implementation have been part of Java since 1.4. There is no need to depend on Xerces or the XML APIs; it's like saying you depend on Java or Swing. This is implicit.

If I was boss of a maven repo I'd write a script to recursively remove xerces dependencies and write a read me that says this repo requires Java 1.4.

Anything that actually breaks because it references Xerces directly via org.apache imports needs a code fix to bring it up to Java 1.4 level (and has done since 2002) or solution at JVM level via endorsed libs, not in maven.

Winnow answered 5/2, 2016 at 11:58 Comment(2)
When performing the refactor that you detailed, you also need to search for the package and class names in the text of your Java files and config. You will find that developers have put the FQN of the Impl classes in constant strings that get used by Class.forName and the similar constructs.Etka
This assumes all SAX implementations do the same thing, which is not true. The xercesImpl library allows for configuration options that the javax.xml.parsers libraries lack.Niedersachsen
C
2

What would help, except for excluding, is modular dependencies.

With one flat classloading (standalone app), or semi-hierarchical (JBoss AS/EAP 5.x) this was a problem.

But with modular frameworks like OSGi and JBoss Modules, this is not so much pain anymore. The libraries may use whichever library they want, independently.

Of course, it's still most recommendable to stick with just a single implementation and version, but if there's no other way (using extra features from more libs), then modularizing might save you.

A good example of JBoss Modules in action is, naturally, JBoss AS 7 / EAP 6 / WildFly 8, for which it was primarily developed.

Example module definition:

<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.1" name="org.jboss.msc">
    <main-class name="org.jboss.msc.Version"/>
    <properties>
        <property name="my.property" value="foo"/>
    </properties>
    <resources>
        <resource-root path="jboss-msc-1.0.1.GA.jar"/>
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="org.jboss.logging"/>
        <module name="org.jboss.modules"/>
        <!-- Optional deps -->
        <module name="javax.inject.api" optional="true"/>
        <module name="org.jboss.threads" optional="true"/>
    </dependencies>
</module>

In comparison with OSGi, JBoss Modules is simpler and faster. While missing certain features, it's sufficient for most projects which are (mostly) under the control of one vendor, and it allows a stunningly fast boot (due to parallelized dependency resolution).

Note that there's a modularization effort underway for Java 8, but AFAIK that's primarily to modularize the JRE itself, not sure whether it will be applicable to apps.

Cashandcarry answered 22/6, 2013 at 2:59 Comment(2)
jboss modules is about static modularization. It has little to do with runtime modularization OSGi has to offer - I would say they compliment each other. It's a nice system though.Alcinia
*complement instead of complimentCelibate
C
2

Apparently xerces:xml-apis:1.4.01 is no longer in maven central, which is however what xerces:xercesImpl:2.11.0 references.

This works for me:

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
  <exclusions>
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>xml-apis</groupId>
  <artifactId>xml-apis</artifactId>
  <version>1.4.01</version>
</dependency>
Collision answered 4/10, 2016 at 16:45 Comment(2)
Looks like central to me: repo1.maven.org/maven2/xml-apis/xml-apis/1.4.01/… Last modified 2011-08-20?Barboza
Sure, with the id xml-apis/xml-apis, but the transitive dependency is xerces/xml-apis, which is why my config excludes xerces/xml-apis explicitly and instead uses the one which you correctly pointed out is in Central.Collision
T
2

My friend, that's very simple. Here's an example:

<dependency>
    <groupId>xalan</groupId>
    <artifactId>xalan</artifactId>
    <version>2.7.2</version>
    <scope>${my-scope}</scope>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

And if you want to check in the terminal (Windows console for this example) that your Maven tree has no problems:

mvn dependency:tree -Dverbose | grep --color=always '(.* conflict\|^' | less -r
Tarweed answered 18/5, 2017 at 14:8 Comment(0)
