How can I fix a messy set of Java .class files, or generate a proper .jar archive from them?

Background

I have to work with different kinds of Java projects that use various build systems. Sometimes the directory structure of the compiled output does not match the package hierarchy, which makes packaging difficult.

Build systems such as Maven and Gradle can create a .jar themselves, but they need a good internet connection and a large local repository. Hence I usually build the libraries I need on the desktop in my office. However, I spend more time on my laptop, which has neither a good connection nor much storage.

Question

Is there a stable, compatible and secure way to create a .jar package, or to move the .class files into the appropriate directories? (An official and mature open-source tool would be preferable.)

I am a beginner in Java, so any pointers would be helpful.

To make this clearer, here are some examples.

Case 1: A tool that moves (or copies) the .class files into the correct directories should behave as follows:

Before operation:

from_root/
├── a.class
├── b.class
├── c.class
├── d.class
├── e.class
└── f.class

Note that the source file (.java) of each .class file contains a package declaration (such as package org.hello.world; or package org.hello;).

After typing the_tool ./from_root ./to_root in the shell,

the result should be:

to_root/
└── org
    └── hello
        ├── a.class
        ├── b.class
        └── world
            ├── c.class
            ├── d.class
            ├── e.class
            └── f.class

Case 2: A tool that packs a correct .jar file from a messy directory of .class files should behave as follows:

Before operation:

from_root/
├── a.class
├── b.class
├── c.class
├── d.class
├── e.class
└── f.class

After typing pack_tool ./from_root ./to_root/pack.jar in the shell,

a pack.jar should be generated in ./to_root.

When it is added to the Java Build Path in Eclipse, it should be imported and usable with the proper package hierarchy instead of a wrong one.

What I have tried

For a well-known third-party library (e.g. Apache Tika):

It is built with Maven, so I typed cd /path/to/tika-1.18-src/tika-1.18/ and then mvn package.

Unfortunately, that failed.

However, mvn compile ran without problems.

Then, to build the .jar package, I typed jar cvf ./build/tika-1.18.jar $(find ./ -name org | grep target) in the terminal.

However, the structure inside the .jar package was completely wrong, so I couldn't use it in my project.

Then I tried find ./ -name org | grep target | parallel cp {} ./build/ -R -f, then cd ./build, and finally jar cvf ./tika-1.18.jar org.

It worked. However, this method has some disadvantages.

Shortcomings:

  1. If the name of some source file (.java file) includes 'target', it will cause big trouble.

  2. If different subprojects contain the same file, it causes a conflict. For example, here is the conflict output: cp: cannot create directory './build/org/apache/tika/batch': File exists cp: cannot create directory './build/org/apache/tika/language/translate': File exists

  3. This method only handles the situation where the paths of the .class files are already partly correct. If all the .class files are located flat in the same directory, it fails.
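
Shortcomings 1 and 2 can be reduced by matching the exact path suffix target/classes instead of grepping for 'target', and by reporting duplicate files instead of letting cp fail halfway. The following is only a minimal sketch under those assumptions: the class name ClassDirMerger and its two methods are hypothetical, and target/classes is assumed to be the output directory that Maven uses by default. It does not help with shortcoming 3.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of any build tool.
final class ClassDirMerger {

    /** Finds every directory below root whose path ends in target/classes. */
    static List<Path> findClassDirs(Path root) throws IOException {
        Path suffix = Paths.get("target", "classes");
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isDirectory)
                       .filter(p -> p.endsWith(suffix)) // compares whole name elements
                       .collect(Collectors.toList());
        }
    }

    /**
     * Copies all files of each class directory into dest, keeping their
     * relative paths. Returns the relative paths that already existed in
     * dest; those files are left untouched rather than overwritten.
     */
    static List<Path> merge(List<Path> classDirs, Path dest) throws IOException {
        List<Path> conflicts = new ArrayList<>();
        for (Path dir : classDirs) {
            List<Path> files;
            try (Stream<Path> walk = Files.walk(dir)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path src : files) {
                Path rel = dir.relativize(src);
                Path target = dest.resolve(rel);
                Files.createDirectories(target.getParent());
                try {
                    Files.copy(src, target); // no REPLACE_EXISTING, so duplicates throw
                } catch (FileAlreadyExistsException e) {
                    conflicts.add(rel);
                }
            }
        }
        return conflicts;
    }
}
```

Calling merge(findClassDirs(Paths.get(".")), Paths.get("build")) would leave a single package tree under build, and the returned list shows which files appeared in more than one subproject.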

Another attempt

I tried to put each .class file in the correct directory according to its binary content.

For example, org.apache.tika.detect.EncodingDetector:

$ hexdump -C /path/to/here/EncodingDetector.class
00000000  ca fe ba be 00 00 00 33  00 0e 07 00 0a 07 00 0b  |.......3........|
00000010  07 00 0c 01 00 06 64 65  74 65 63 74 01 00 54 28  |......detect..T(|
00000020  4c 6a 61 76 61 2f 69 6f  2f 49 6e 70 75 74 53 74  |Ljava/io/InputSt|
00000030  72 65 61 6d 3b 4c 6f 72  67 2f 61 70 61 63 68 65  |ream;Lorg/apache|
00000040  2f 74 69 6b 61 2f 6d 65  74 61 64 61 74 61 2f 4d  |/tika/metadata/M|
00000050  65 74 61 64 61 74 61 3b  29 4c 6a 61 76 61 2f 6e  |etadata;)Ljava/n|
00000060  69 6f 2f 63 68 61 72 73  65 74 2f 43 68 61 72 73  |io/charset/Chars|
00000070  65 74 3b 01 00 0a 45 78  63 65 70 74 69 6f 6e 73  |et;...Exceptions|
00000080  07 00 0d 01 00 0a 53 6f  75 72 63 65 46 69 6c 65  |......SourceFile|
00000090  01 00 15 45 6e 63 6f 64  69 6e 67 44 65 74 65 63  |...EncodingDetec|
000000a0  74 6f 72 2e 6a 61 76 61  01 00 27 6f 72 67 2f 61  |tor.java..'org/a|
000000b0  70 61 63 68 65 2f 74 69  6b 61 2f 64 65 74 65 63  |pache/tika/detec|
000000c0  74 2f 45 6e 63 6f 64 69  6e 67 44 65 74 65 63 74  |t/EncodingDetect|
000000d0  6f 72 01 00 10 6a 61 76  61 2f 6c 61 6e 67 2f 4f  |or...java/lang/O|
000000e0  62 6a 65 63 74 01 00 14  6a 61 76 61 2f 69 6f 2f  |bject...java/io/|
000000f0  53 65 72 69 61 6c 69 7a  61 62 6c 65 01 00 13 6a  |Serializable...j|
00000100  61 76 61 2f 69 6f 2f 49  4f 45 78 63 65 70 74 69  |ava/io/IOExcepti|
00000110  6f 6e 06 01 00 01 00 02  00 01 00 03 00 00 00 01  |on..............|
00000120  04 01 00 04 00 05 00 01  00 06 00 00 00 04 00 01  |................|
00000130  00 07 00 01 00 08 00 00  00 02 00 09              |............|
0000013c

For comparison, I used another .class file, org.apache.tika.embedder.Embedder:

$ hexdump -C ./Embedder.class 
00000000  ca fe ba be 00 00 00 33  00 14 07 00 0f 07 00 10  |.......3........|
00000010  07 00 11 01 00 16 67 65  74 53 75 70 70 6f 72 74  |......getSupport|
00000020  65 64 45 6d 62 65 64 54  79 70 65 73 01 00 36 28  |edEmbedTypes..6(|
00000030  4c 6f 72 67 2f 61 70 61  63 68 65 2f 74 69 6b 61  |Lorg/apache/tika|
00000040  2f 70 61 72 73 65 72 2f  50 61 72 73 65 43 6f 6e  |/parser/ParseCon|
00000050  74 65 78 74 3b 29 4c 6a  61 76 61 2f 75 74 69 6c  |text;)Ljava/util|
00000060  2f 53 65 74 3b 01 00 09  53 69 67 6e 61 74 75 72  |/Set;...Signatur|
00000070  65 01 00 58 28 4c 6f 72  67 2f 61 70 61 63 68 65  |e..X(Lorg/apache|
00000080  2f 74 69 6b 61 2f 70 61  72 73 65 72 2f 50 61 72  |/tika/parser/Par|
00000090  73 65 43 6f 6e 74 65 78  74 3b 29 4c 6a 61 76 61  |seContext;)Ljava|
000000a0  2f 75 74 69 6c 2f 53 65  74 3c 4c 6f 72 67 2f 61  |/util/Set<Lorg/a|
000000b0  70 61 63 68 65 2f 74 69  6b 61 2f 6d 69 6d 65 2f  |pache/tika/mime/|
000000c0  4d 65 64 69 61 54 79 70  65 3b 3e 3b 01 00 05 65  |MediaType;>;...e|
000000d0  6d 62 65 64 01 00 76 28  4c 6f 72 67 2f 61 70 61  |mbed..v(Lorg/apa|
000000e0  63 68 65 2f 74 69 6b 61  2f 6d 65 74 61 64 61 74  |che/tika/metadat|
000000f0  61 2f 4d 65 74 61 64 61  74 61 3b 4c 6a 61 76 61  |a/Metadata;Ljava|
00000100  2f 69 6f 2f 49 6e 70 75  74 53 74 72 65 61 6d 3b  |/io/InputStream;|
00000110  4c 6a 61 76 61 2f 69 6f  2f 4f 75 74 70 75 74 53  |Ljava/io/OutputS|
00000120  74 72 65 61 6d 3b 4c 6f  72 67 2f 61 70 61 63 68  |tream;Lorg/apach|
00000130  65 2f 74 69 6b 61 2f 70  61 72 73 65 72 2f 50 61  |e/tika/parser/Pa|
00000140  72 73 65 43 6f 6e 74 65  78 74 3b 29 56 01 00 0a  |rseContext;)V...|
00000150  45 78 63 65 70 74 69 6f  6e 73 07 00 12 07 00 13  |Exceptions......|
00000160  01 00 0a 53 6f 75 72 63  65 46 69 6c 65 01 00 0d  |...SourceFile...|
00000170  45 6d 62 65 64 64 65 72  2e 6a 61 76 61 01 00 21  |Embedder.java..!|
00000180  6f 72 67 2f 61 70 61 63  68 65 2f 74 69 6b 61 2f  |org/apache/tika/|
00000190  65 6d 62 65 64 64 65 72  2f 45 6d 62 65 64 64 65  |embedder/Embedde|
000001a0  72 01 00 10 6a 61 76 61  2f 6c 61 6e 67 2f 4f 62  |r...java/lang/Ob|
000001b0  6a 65 63 74 01 00 14 6a  61 76 61 2f 69 6f 2f 53  |ject...java/io/S|
000001c0  65 72 69 61 6c 69 7a 61  62 6c 65 01 00 13 6a 61  |erializable...ja|
000001d0  76 61 2f 69 6f 2f 49 4f  45 78 63 65 70 74 69 6f  |va/io/IOExceptio|
000001e0  6e 01 00 27 6f 72 67 2f  61 70 61 63 68 65 2f 74  |n..'org/apache/t|
000001f0  69 6b 61 2f 65 78 63 65  70 74 69 6f 6e 2f 54 69  |ika/exception/Ti|
00000200  6b 61 45 78 63 65 70 74  69 6f 6e 06 01 00 01 00  |kaException.....|
00000210  02 00 01 00 03 00 00 00  02 04 01 00 04 00 05 00  |................|
00000220  01 00 06 00 00 00 02 00  07 04 01 00 08 00 09 00  |................|
00000230  01 00 0a 00 00 00 06 00  02 00 0b 00 0c 00 01 00  |................|
00000240  0d 00 00 00 02 00 0e                              |.......|
00000247

What's amazing is that the content shortly after "...SourceFile..." (right after the source file name) is the package location of the class.

It is possible to write a program that scans every .class file and determines its location in the directory tree. However, Java 9 and Java 10 are already here and Java 11 is coming soon. Different Java versions may produce different binary content in .class files, and it might also differ between the JDK and OpenJDK. So this approach may not be compatible enough. On the other hand, it shows that it is possible to determine the package location of a given .class file without any other information.
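
On the compatibility worry: the start of a class file (magic number, minor version, major version) has a fixed layout defined by the Java Virtual Machine Specification, so it can be read the same way whichever conforming compiler produced the file. Both dumps above contain 00 00 00 33 right after ca fe ba be, which is minor version 0 and major version 51 (Java 7). The following is a minimal sketch of reading that header. The class name ClassVersion is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that reads only the fixed 8-byte class file header.
final class ClassVersion {

    /** Returns {major, minor} as stored in the class file header. */
    static int[] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // big-endian, like the class file
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new int[] { major, minor };
    }
}
```

A tool that checks the major version first can refuse files newer than the format it knows, instead of misreading them.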

I hope someone can provide some ideas. Thank you sincerely!

Glycogen answered 28/7, 2018 at 9:47 Comment(19)
To be honest I don't understand what your exact problem is and what you like to achieve? Or what you understand by a "chaos class file set". Furthermore that you will get issues while trying to compile tika with Java 10 is what I expect cause you seemed to be mixing build and runtime ? Furthermore what do you understand by So it is not compatible enough. .. ?Avilla
@khmarbaise, sorry for being unclear; I will try to correct the post. I have no trouble compiling Tika with Java 10, only packaging it. So I gave up on mvn package and tried to pack it manually.Glycogen
Fix your problem with the maven build - that will cause you the least pain in the long run. Try with Java 8 instead of java 10.Windburn
@ThorbjørnRavnAndersen, thanks for the advice. However, I only mentioned Apache Tika as an example. I have solved this single problem with the shell commands I present in the post, but I need a more universal way to solve the more general problem. By the way, Java 8 doesn't have long before its EOL.Glycogen
@khmarbaise, So it is not compatible enough means the binary format of .class files from different version of javac might be different.Glycogen
Do not work against Maven design decisions - it will just bring you more pain in the long run. Live with the downloads, e.g. by having a local mirror server, and fix any mvn clean install issues.Windburn
What are you doing to get the class files into a flat directory? That should not happen in the first place.Languid
@khmarbaise, what is the meaning of "mixing build and runtime"?Glycogen
@Henry, I just presented an extreme example. Different build systems have their own way of building, and a project may contain several subprojects (like Tika). I want to merge them and create one.jar. Or sometimes I am lazy and just use a text editor instead of an IDE to create a project.Glycogen
I'm not sure what your real problem is. Based on your example you have a flat directory structure in Java, which means no packages, and afterwards you would like to have one? If you use the plain command line and the javac tool you should structure your packages into directories, which is basically a foundation of Java packaging. Furthermore, if you build a jar file with Maven, the structure in src/main/java is compiled and packaged into a jar with the same structure, so I don't understand your comments/questions on that. If you use an editor I don't think it makes sense (maybe I misunderstand your point)Avilla
So you should also keep a structured directory layout, which helps you find the components of your projects apart from making it easier to use tools to build your jar file...and related to the build issue with Tika I would recommend reading the docs about building Tika..The question is why you want to build it yourself and don't consume it via existing repositories..I recommend installing a repository manager...and class files differ between JDK versions but they run on a JRE without issues..So my question is the same as before: What kind of problem are you trying to solve? I don't get it....Avilla
@khmarbaise, I'm sorry that I didn't describe it correctly. I will add some description.Glycogen
@khmarbaise, I just compile Tika to .class files instead of .jar files. As for "why I build Tika myself": I am not an Android developer or web application developer, I just want to use the Tika library to handle a small part of a job. It is easy to deploy a headless JRE on every machine, but a full-scale repository is almost impossible. After that, I need to add Tika and the library I developed which uses Tika, and use a shell script as an interface to the other parts of my project.Glycogen
Hm..At build time you need the repository manager and your local cache to improve build time..but for runtime you can combine the libs you use in a ueber jar (or a shaded jar file maven terminology) which contains all the libs you need. and that will run with headless-jre ? ..you can create a single jar command line app ?Avilla
@khmarbaise, oops, I missed some words. it is "add tika and the library I developed which use tika to a specific directory in final product"Glycogen
@khmarbaise, this is a very powerful technology (shaded jar), thank you so much for telling me about it! By the way, is it possible to use this technology in other Java projects, such as Gradle and Eclipse projects?Glycogen
Maybe I misunderstand your idea, but you say you would like to put tika in a specific directory ? That sounds to me as if you might not understand the Java package system and how things work at runtime...and only the classpath is the thing, apart from that this is only valid until Java 8 ...starting with 9 you could define modules ? Furthermore an ueber-jar etc. should be doable in Gradle (I don't know)...In Eclipse? I use Eclipse as an IDE but never for building a project...Avilla
As far as I remember, in Java 8 a user-defined classpath can be added through an environment variable. However, since Java 9, this mechanism has been revoked. How do I set the classpath in Java 9 and 10?Glycogen
@khmarbaise, maybe I have to build an independent JVM into my project. I have a long way to go in learning Java. Thank you so much for providing those helpful points!Glycogen

Generally, it is better to fix the build system issues so that the correct directory structure is generated in the first place, rather than trying to fix it after the fact. One problem I see is that classes from different packages may have the same simple name, so if their class files are written to the same flat directory, one of them will overwrite the other, and this data loss cannot be fixed afterwards.

Generally, the constant pool at the beginning of the class file contains the qualified class name, so it is possible to extract it, but you need to understand the class file structure to pick the right string. The following method will parse a class file and extract the name (in its internal form):

// Requires java.nio.ByteBuffer and java.nio.ByteOrder.
// Returns the internal name, e.g. "org/apache/tika/detect/EncodingDetector".
static String getClassName(ByteBuffer buf) {
    if(buf.order(ByteOrder.BIG_ENDIAN).getInt()!=0xCAFEBABE) {
        throw new IllegalArgumentException("not a valid class file");
    }
    int minor=buf.getChar(), ver=buf.getChar(), poolSize=buf.getChar();
    // pool[ix] holds the buffer position of a Utf8 entry,
    // or the name index of a Class-like entry
    int[] pool = new int[poolSize];
    //System.out.println("version "+ver+'.'+minor);
    for(int ix=1; ix<poolSize; ix++) {
        byte tag = buf.get();
        switch(tag) {
            default: throw new UnsupportedOperationException(
                    "unknown pool item type "+tag);
            case CONSTANT_Utf8:
                buf.position((pool[ix]=buf.position())+buf.getChar()+2); continue;
            case CONSTANT_Module: case CONSTANT_Package: case CONSTANT_Class:
            case CONSTANT_String: case CONSTANT_MethodType:
                pool[ix]=buf.getChar(); break;
            case CONSTANT_FieldRef: case CONSTANT_MethodRef:
            case CONSTANT_InterfaceMethodRef: case CONSTANT_NameAndType:
            case CONSTANT_InvokeDynamic: case CONSTANT_Dynamic:
            case CONSTANT_Integer: case CONSTANT_Float:
                buf.position(buf.position()+4); break;
            case CONSTANT_Double: case CONSTANT_Long:
                buf.position(buf.position()+8); ix++; break;
            case CONSTANT_MethodHandle: buf.position(buf.position()+3); break;
        }
    }
    int access = buf.getChar(), thisClass = buf.getChar();
    buf.position(pool[pool[thisClass]]);
    return decodeString(buf);
}
private static String decodeString(ByteBuffer buf) {
    int size=buf.getChar(), oldLimit=buf.limit();
    buf.limit(buf.position()+size);
    StringBuilder sb=new StringBuilder(size+(size>>1));
    while(buf.hasRemaining()) {
        byte b=buf.get();
        if(b>0) sb.append((char)b);
        else {
            int b2 = buf.get();
            if((b&0xf0)!=0xe0)
                sb.append((char)((b&0x1F)<<6 | b2&0x3F));
            else {
                int b3 = buf.get();
                sb.append((char)((b&0x0F)<<12 | (b2&0x3F)<<6 | b3&0x3F));
            }
        }
    }
    buf.limit(oldLimit);
    return sb.toString();
}
private static final byte CONSTANT_Utf8 = 1, CONSTANT_Integer = 3,
    CONSTANT_Float = 4, CONSTANT_Long = 5, CONSTANT_Double = 6,
    CONSTANT_Class = 7, CONSTANT_String = 8, CONSTANT_FieldRef = 9,
    CONSTANT_MethodRef = 10, CONSTANT_InterfaceMethodRef = 11,
    CONSTANT_NameAndType = 12, CONSTANT_MethodHandle = 15,
    CONSTANT_MethodType = 16, CONSTANT_Dynamic = 17, CONSTANT_InvokeDynamic = 18,
    CONSTANT_Module = 19, CONSTANT_Package = 20;

This can be used to fix a wrong file location like this:

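// Requires java.io.IOException, java.nio.ByteBuffer, java.nio.channels.FileChannel
// and java.nio.file.{Files, Path, StandardOpenOption}.
// Assumes the directory containing the file is the package root (flat layout).
static void checkAndMoveClassFile(Path path) throws IOException {
    ByteBuffer bb;
    try(FileChannel ch=FileChannel.open(path, StandardOpenOption.READ)) {
        bb=ByteBuffer.allocate((int)ch.size());
        // stop at end of file so a short read cannot loop forever
        while(bb.hasRemaining() && ch.read(bb)>=0);
        bb.flip();
    }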
    String name = getClassName(bb);
    Path newPath = path.resolveSibling(name+".class");
    if(!path.equals(newPath)) {
        System.out.println("moving "+path+" to "+newPath);
        Files.createDirectories(newPath.getParent());
        Files.move(path, newPath);
    }
}

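which you can run over a directory easily:

// Requires java.io.UncheckedIOException and java.util.stream.Stream.
// Files.list keeps the directory open, so close it with try-with-resources.
try(Stream<Path> files = Files.list(dirPath)) {
    files.filter(p -> p.getFileName().toString().endsWith(".class"))
         .forEach(p -> {
             try { checkAndMoveClassFile(p); }
             catch (IOException ex) { throw new UncheckedIOException(ex); }
         });
}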
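
Once the class files sit in their package directories, the remaining step from the question (Case 2) is to write a jar whose entry names start at the package root. The first jar attempt in the question was passed paths that still had their target prefix, which would explain the wrong structure. With the JDK's own tool, jar cf pack.jar -C to_root . avoids that. The same thing can be done with java.util.jar, as in the following minimal sketch. The class name JarPacker and its pack method are hypothetical, and the manifest contains nothing but the version attribute.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; roughly what "jar cf pack.jar -C classRoot ." does.
final class JarPacker {

    static void pack(Path classRoot, Path jarFile) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect the files first, so a jar written below classRoot is not picked up.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classRoot)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                // Entry names are relative to the package root and always use '/'.
                String name = classRoot.relativize(file).toString()
                                       .replace(File.separatorChar, '/');
                if (name.equals(JarFile.MANIFEST_NAME)) {
                    continue; // the manifest entry was already written by the constructor
                }
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }
}
```

Because the entry names come from classRoot.relativize(file), a class stored at to_root/org/hello/a.class ends up in the jar as org/hello/a.class, which is the layout a class loader or the Eclipse build path expects.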
Erastian answered 14/9, 2018 at 12:31 Comment(2)
your answers sometimes scare me, seriously. This is one of those examplesYenyenisei
@Yenyenisei this is one of the answers where I use dense formatting to minimize the necessary scrolling. It becomes less scary when you hit <Format> in your IDE after copying. Parsing the constant pool is not that hard if you know the format. Further, this answer may help understanding. I just stripped the unnecessary parts for this task and lifted the code to handle newer features up to Java 11.Erastian
