Correct use of Apache Tika MediaType

I want to use Apache Tika's MediaType class to compare media types.

I first use Tika to detect the MediaType. Then I want to start an action according to the MediaType.

So if the MediaType is an XML type I want to perform one action, and if it is a compressed file I want to perform another.

My problem is that there are many XML types, so how do I check whether something is XML using the MediaType?

Here is my previous (before Tika) implementation:

if (contentType.contains("text/xml") || 
    contentType.contains("application/xml") || 
    contentType.contains("application/x-xml") || 
    contentType.contains("application/atom+xml") || 
    contentType.contains("application/rss+xml")) {
        processXML();
}

else if (contentType.contains("application/gzip") || 
    contentType.contains("application/x-gzip") || 
    contentType.contains("application/x-gunzip") || 
    contentType.contains("application/gzipped") || 
    contentType.contains("application/gzip-compressed") || 
    contentType.contains("application/x-compress") || 
    contentType.contains("gzip/document") || 
    contentType.contains("application/octet-stream")) {
        processGzip();
}

I want to switch it to use Tika, something like the following:

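MediaType mediaType = MediaType.parse(contentType);
// Compare with equals(): == only checks object identity
if (MediaType.APPLICATION_XML.equals(mediaType)) {
    return processXml();
} else if (MediaType.APPLICATION_ZIP.equals(mediaType) || MediaType.OCTET_STREAM.equals(mediaType)) {
    return processGzip();
}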

But the problem is that Tika.detect(...) returns many different types which don't have a MediaType constant.

How can I tell whether a MediaType is some kind of XML, or some kind of compressed file? I need a "parent" type that covers all of its children, or perhaps a method such as "boolean isXML()" that covers application/xml, text/xml and application/x-xml, or "boolean isCompress()" that covers all of the zip and gzip types, etc.

Lahdidah answered 20/4, 2014 at 6:51 Comment(2)
Can you clarify what your problem is? Matching the media type? Creating a media type object? Working out what types could come back? Handling type parent/child relationships? Something else? – Seismography
I have edited the question and added a last section: how can I tell whether a MediaType is some kind of XML, or some kind of compressed file? I need a "parent" type that covers all of its children, or perhaps a method such as "boolean isXML()" that covers application/xml, text/xml and application/x-xml, or "boolean isCompress()" that covers all of the zip and gzip types, etc. – Lahdidah

What you'll need to do is walk up the type hierarchy until you either find what you want or run out of things to check. That can be done with recursion or with a loop.

The key method you need is MediaTypeRegistry.getSupertype(MediaType)

Your code would look something like this:

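import org.apache.tika.config.TikaConfig;
import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MediaTypeRegistry;

// Define your media type constants here
MediaType FOO = MediaType.parse("application/foo");

// Get the registry from the Tika config rather than creating one yourself
MediaTypeRegistry registry = TikaConfig.getDefaultConfig().getMediaTypeRegistry();

// Work out the file's type
MediaType type = detector.detect(stream, metadata);

// Walk up the hierarchy until we reach a type we handle
while (type != null && !type.equals(MediaType.OCTET_STREAM)) {
   if (type.equals(MediaType.APPLICATION_XML)) {
       doThingForXML();
       break; // handled, stop walking
   } else if (type.equals(MediaType.APPLICATION_ZIP)) {
       doThingForZip();
       break;
   } else if (type.equals(FOO)) {
       doThingForFoo();
       break;
   } else {
       // No match yet, so check the parent
       type = registry.getSupertype(type);
   }
}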
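To show the shape of an "isXML()"-style helper built on the same walk, here is a minimal sketch in plain Java with no Tika dependency, so it can be run on its own. The `PARENT` table, the `MediaTypeWalk` class and its helper methods are hypothetical stand-ins for the registry lookup, and the entries in the table are illustrative assumptions rather than Tika's actual hierarchy.

```java
import java.util.HashMap;
import java.util.Map;

class MediaTypeWalk {
    // Hypothetical child -> parent table; stands in for a registry's supertype lookup.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("application/rss+xml", "application/xml");
        PARENT.put("application/atom+xml", "application/xml");
        PARENT.put("application/xml", "text/plain");
        PARENT.put("text/plain", "application/octet-stream");
        PARENT.put("application/epub+zip", "application/zip");
        PARENT.put("application/zip", "application/octet-stream");
    }

    // Walks from the given type up through its parents; true if the ancestor is met.
    static boolean isInstanceOf(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false; // ran out of parents without a match
    }

    static boolean isXml(String type) {
        return isInstanceOf(type, "application/xml");
    }

    static boolean isZip(String type) {
        return isInstanceOf(type, "application/zip");
    }

    public static void main(String[] args) {
        System.out.println(isXml("application/rss+xml"));
        System.out.println(isZip("application/epub+zip"));
        System.out.println(isXml("application/zip"));
    }
}
```

With Tika itself, the `PARENT.get(t)` step would be replaced by `registry.getSupertype(type)` on `MediaType` objects. `MediaTypeRegistry` also has an `isInstanceOf(MediaType, MediaType)` method that looks like it covers this kind of check directly; it is worth confirming its behaviour in the JavaDocs for your Tika version.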
Seismography answered 23/4, 2014 at 9:23 Comment(3)
Your answer is helpful but does not include instantiation of org.apache.tika.mime.MediaTypeRegistry – Monocle
You shouldn't be instantiating one yourself; you get it from a TikaConfig object – Seismography
@EricUrban As the linked JavaDocs hopefully show, just TikaConfig.getDefaultConfig().getMediaTypeRegistry() – Seismography
