How to get a file's Media Type (MIME type)?
M

28

384

How do you get a Media Type (MIME type) from a file using Java? So far I've tried JMimeMagic & Mime-Util. The first gave me memory exceptions, the second doesn't close its streams properly.

How would you probe the file to determine its actual type (not merely based on the extension)?

Minuscule answered 9/9, 2008 at 9:11 Comment(4)
A good overview on available libraries is given at rgagnon.com/javadetails/java-0487.htmlQuartz
I used the class that was posted as an answer here: https://mcmap.net/q/56672/-java-library-to-find-the-mime-type-from-file-content-duplicateGeophagy
Tika should be the answer now. The other answers below make light of many dependencies with Tika, but I see none with tika-core.Mutism
@Mutism When we use Tika, it converts the file and it's no longer usable: String contentType = tika.detect(is).Still
M
354

Since Java 7 you can use Files.probeContentType(path). It returns null when the content type cannot be determined.
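
A minimal sketch of how this call might be wrapped, assuming you want a fallback value instead of null. The class name ProbeExample, the method probeOrDefault, and the application/octet-stream fallback are illustrative choices, not part of the JDK. The outcome depends on the file type detectors installed on the platform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ProbeExample {
    // Hypothetical fallback for files that no installed detector recognizes.
    static final String FALLBACK = "application/octet-stream";

    // Returns the probed content type, or FALLBACK when probing yields null or fails.
    static String probeOrDefault(Path path) {
        try {
            String type = Files.probeContentType(path);
            return type != null ? type : FALLBACK;
        } catch (IOException e) {
            // An implementation may read the file, so I/O errors are possible.
            return FALLBACK;
        }
    }
}
```

Usage would look like ProbeExample.probeOrDefault(Paths.get("report.pdf")). Whether the extension or the content decides the answer is up to the platform's detectors.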

Moue answered 23/1, 2012 at 14:49 Comment(17)
This was very helpful since the mime-util website appears to be down and I can't tell if the library is being maintained at all!Caton
This works well, however I have not found a way to add more file types that I understand. For example an ISO image returns null, as does a .zip archive and even an ini configuration file.Percept
@james.garriss and it's earned me more points than any other answer I've ever given! Crazy, huh? :)Moue
Be aware that Files.probeContentType(Path) is buggy on several OSes and a lot of bug reports have been filed. I have had a problem with software working on ubuntu but failing on windows. It seemed that on windows Files.probeContentType(Path) always returned null. It was not my system so I didn't check the JRE or windows version. It was windows 7 or 8 probably with oracle JRE for java 7.Fusty
I'm running on OS X 10.9 and I get null out for .xml, .png, and .xhtml files. I don't know if I'm just doing something horribly wrong, but that seems rather terrible.Heintz
I have not been able to get this to work successfully if the file does not have an extension.Junction
It appears that at least on *nix-like systems, the default file type detector just returns null and one has to manually add one or more detector implementations, which doesn't appear to be all too straight-forward. So at least for the use case I had, which is a simple method to map a file extension to a mime type, this solution does not work.Slap
A major limitation with this is that the file must exist on the file system. This does not work with a stream or a byte array, etc.Undercast
Even weirder, I've got two windows 8.1 laptops, one of which gets application/x-zip-compressed and the other gets null as a result of calling this on a zip file. Completely unreliable :\. So, given that I want my application to switch on the encoding scheme of a file (lets say my application takes both XML and JSON configuration), and the file is simply called 'configuration' (without an extension), whats the most reliable way to determine the type of that file, sort of cheating and reading a few bytes?Leclair
This method cannot return the MIME type when I remove the extension from the name. For example, if the name is test.mp4 and I change it to "test", the method returns null. Also, if I change the movie's extension to png etc., it returns the png MIME type.Hbomb
This is useless if the file has a missing or wrong extension.Moseley
The Linux-based implementation appears to use Linux /usr/bin/file, which is good, unless there's an extension, which it just believes without looking deeper, which is bad. If you rename an XML file to .json, this will tell you it's JSON. Garbage in, garbage out. You just don't really want to trust this approach unless you're sure of your file data.Rosebay
@RussBateman unless there's an extension, which it just believes without looking deeper, which is bad. Do nginx/apache etc. do more than just look at the extension?Moonset
This method is poor, it goes off extension, doesn't bother with magic number, and returns null on different systems... even with same OS. Was fooled into using it and cannot recommend. You may as well just compare the file extension.Octave
And to get the Path from a String, use Paths.get(str)Strangeness
On Windows it uses only the extension to determine the file type. On Linux, up to Java 8, it used a bunch of detectors: content-based detectors built on the Gnome I/O, Gnome VFS and libmagic libraries, as well as an extension-based one implemented over /etc/mime.types. Starting with Java 9, however, all content-based detectors were removed from the JDK (1 2) and only extension-based detectors were left for Linux. So if your file does not have an extension, this method will always return null :(Pb
The default implementation in the JDK probes by file extension. If you add github.com/overview/mime-types as a dependency, it is picked up via SPI and probing then uses the magic number.Herrle
S
241

Unfortunately,

mimeType = file.toURL().openConnection().getContentType();

does not work, since this use of URL leaves a file locked, so that, for example, it is undeletable.

However, you have this:

mimeType= URLConnection.guessContentTypeFromName(file.getName());

and also the following, which has the advantage of going beyond mere use of file extension, and takes a peek at content

InputStream is = new BufferedInputStream(new FileInputStream(file));
mimeType = URLConnection.guessContentTypeFromStream(is);
 //...close stream

However, as suggested by the comment above, the built-in table of mime-types is quite limited, not including, for example, MSWord and PDF. So, if you want to generalize, you'll need to go beyond the built-in libraries, using, e.g., Mime-Util (which is a great library, using both file extension and content).
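
A minimal sketch combining the two built-in calls above, assuming you want to try the content first and fall back to the file name. The class and method names are made up for illustration. The result can still be null, since the built-in table is limited.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

final class UrlConnectionGuess {
    // Tries the content first, then falls back to the file name; may return null.
    static String guess(InputStream in, String fileName) throws IOException {
        // guessContentTypeFromStream needs mark/reset support, hence the buffer.
        InputStream buffered = in.markSupported() ? in : new BufferedInputStream(in);
        String type = URLConnection.guessContentTypeFromStream(buffered);
        return type != null ? type : URLConnection.guessContentTypeFromName(fileName);
    }

    // Convenience overload for data that is already in memory.
    static String guess(byte[] data, String fileName) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return guess(in, fileName);
        } catch (IOException e) {
            return null;
        }
    }
}
```

For a file on disk, opening the stream in a try-with-resources block, e.g. try (InputStream in = new FileInputStream(file)) { ... }, takes care of closing it.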

Snort answered 11/5, 2009 at 12:21 Comment(9)
Perfect solution - helped me a lot! Wrapping FileInputStream into BufferedInputStream is crucial part - otherwise guessContentTypeFromStream returns null (passed InputStream instance should support marks)Aimee
However, URLConnection recognizes only a very limited set of content types. For example, it is not able to detect application/pdf.Sd
@Sd it detects pdf for me. But it doesn't detect office files, e.g. *.docErmina
It only leaves it locked because you've left yourself no way to close it. Disconnecting the URLConnection would unlock it.Savior
Neither guessContentTypeFromStream nor guessContentTypeFromName recognizes e.g. mp4Galasyn
guessContentTypeFromName() uses default $JAVA_HOME/lib/content-types.properties file. you can add your own extended file by changing system property System.setProperty("content.types.user.table","/lib/path/to/your/property/file");Normand
It is not detecting .js, .css files. Is there any other method to detect these files too?Longstanding
Any links to Mime-Util ??? I have found project in github but is does not contain any description :(Robbinrobbins
guessContentTypeFromName uses this synchronized FileNameMap getFileNameMap good luck in multithreadingBread
C
69

With Apache Tika you need only three lines of code:

File file = new File("/path/to/file");
Tika tika = new Tika();
System.out.println(tika.detect(file));

If you have a groovy console, just paste and run this code to play with it:

@Grab('org.apache.tika:tika-core:1.14')
import org.apache.tika.Tika;

def tika = new Tika()
def file = new File("/path/to/file")
println tika.detect(file)

Keep in mind that its API is rich; it can parse "anything". As of tika-core 1.14, you have:

String  detect(byte[] prefix)
String  detect(byte[] prefix, String name)
String  detect(File file)
String  detect(InputStream stream)
String  detect(InputStream stream, Metadata metadata)
String  detect(InputStream stream, String name)
String  detect(Path path)
String  detect(String name)
String  detect(URL url)

See the apidocs for more information.

Conservator answered 14/2, 2017 at 15:24 Comment(4)
One bad thing about Tika, lots of dependency bloat. It increased the size of my jar by 54MB!!!Stratton
@helmyTika 1.17 is standalone and only 648 KB big.Bartizan
... or just new Tika().detect(file.toPath()) for the file's extension based detection rather than detection based on the file's contentBrambling
@Lu55 docs say that still uses the document content. I think you mean new Tika().detect(file.getPath()), which only uses the file extensionSelfrealization
T
54

The JAF API is part of JDK 6. Look at javax.activation package.

The most interesting classes are javax.activation.MimeType, which holds an actual MIME type, and javax.activation.MimetypesFileTypeMap, whose instances can resolve the MIME type of a file as a String:

String fileName = "/path/to/file";
MimetypesFileTypeMap mimeTypesMap = new MimetypesFileTypeMap();

// only by file name
String mimeType = mimeTypesMap.getContentType(fileName);

// or by actual File instance
File file = new File(fileName);
mimeType = mimeTypesMap.getContentType(file);
Thelma answered 14/12, 2009 at 17:2 Comment(5)
Unfortunately, as the javadoc for getContentType(File) states: Returns the MIME type of the file object.The implementation in this class calls getContentType(f.getName()).Christianly
And remember you can extend this functionality with META-INF/mime.types file so it is perfect if you are forced to use Java 6. docs.oracle.com/javaee/5/api/javax/activation/…Peggiepeggir
you can skip creating a new object by MimetypesFileTypeMap.getDefaultFileTypeMap().getContentType(file)Gorey
But it still return content type only based on the filename. And this is especially dangerous for files uploaded by users.Bayles
This does not work, for example, for pdf files (application/octet-stream is returned).Matisse
Q
40

Apache Tika offers, in tika-core, MIME type detection based on magic markers in the stream prefix. tika-core does not pull in other dependencies, which makes it as lightweight as the currently unmaintained Mime Type Detection Utility.

Simple code example (Java 7), using the variables theInputStream and theFileName:

try (InputStream is = theInputStream;
        BufferedInputStream bis = new BufferedInputStream(is);) {
    AutoDetectParser parser = new AutoDetectParser();
    Detector detector = parser.getDetector();
    Metadata md = new Metadata();
    md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
    MediaType mediaType = detector.detect(bis, md);
    return mediaType.toString();
}

Please note that MediaType.detect(...) cannot be used directly (TIKA-1120). More hints are provided at https://tika.apache.org/1.24/detection.html.

Quartz answered 18/5, 2013 at 16:14 Comment(2)
+1 Also Metadata.RESOURCE_NAME_KEY can be omitted (if you don't have any or cannot rely on original name), but in that case you will get wrong result in some cases (office documents for example).Pyrometallurgy
It has some problems detecting XLSX if there's no extension on filename... but this solution is simple and elegant.Reductive
F
26

If you're an Android developer, you can use a utility class android.webkit.MimeTypeMap which maps MIME-types to file extensions and vice versa.

Following code snippet may help you.

private static String getMimeType(String fileUrl) {
    String extension = MimeTypeMap.getFileExtensionFromUrl(fileUrl);
    return MimeTypeMap.getSingleton().getMimeTypeFromExtension(extension);
}
Fritz answered 15/12, 2012 at 6:23 Comment(1)
This also works with local file paths such as "/sdcard/path/to/video.extension". The problem is that if the local file path contains a space, it always returns null.Caw
T
20

From roseindia:

FileNameMap fileNameMap = URLConnection.getFileNameMap();
String mimeType = fileNameMap.getContentTypeFor("alert.gif");
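
A minimal sketch of the same lookup with a fallback, assuming you prefer a default value to null for unknown extensions. The class name, method name, and fallback parameter are illustrative only. The lookup is based purely on the file name.

```java
import java.net.FileNameMap;
import java.net.URLConnection;

final class FileNameMapExample {
    // Looks up a type by file name only; returns the fallback for unknown extensions.
    static String typeFor(String fileName, String fallback) {
        FileNameMap map = URLConnection.getFileNameMap();
        String type = map.getContentTypeFor(fileName);
        return type != null ? type : fallback;
    }
}
```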
Tractable answered 1/8, 2011 at 13:34 Comment(5)
Whoever down-voted the answer, please add a comment so I (and others) may learn to post better answers.Tractable
I didn't vote you down, but getFileNameMap doesn't work for many basic file types, for example 'bmp'. Also, URLConnection.guessContentTypeFromName returns the same thing.Point
Very incomplete function. As of Java 7, html, pdf and jpeg extensions return the correct mime-type but js and css return null!Frump
I tested with 'webm' and it returned null.Smriti
To clarify, Files.probeContentType(Path.of("my-file.css")) is a much better way to handle (I tested in java11) as it supports more file types.Glycoside
K
19

I was just wondering how most people fetch a mime type from a file in Java?

I've published my SimpleMagic Java package which allows content-type (mime-type) determination from files and byte arrays. It is designed to read and run the Unix file(1) command magic files that are a part of most ~Unix OS configurations.

I tried Apache Tika but it is huge with tons of dependencies, URLConnection doesn't use the bytes of the files, and MimetypesFileTypeMap also just looks at files names.

With SimpleMagic you can do something like:

// create a magic utility using the internal magic file
ContentInfoUtil util = new ContentInfoUtil();
// if you want to use a different config file(s), you can load them by hand:
// ContentInfoUtil util = new ContentInfoUtil("/etc/magic");
...
ContentInfo info = util.findMatch("/tmp/upload.tmp");
// or
ContentInfo info = util.findMatch(inputStream);
// or
ContentInfo info = util.findMatch(contentByteArray);

// null if no match
if (info != null) {
   String mimeType = info.getMimeType();
}
Kaleidoscope answered 25/6, 2013 at 16:7 Comment(6)
Tested it on multiple image files. All had extension renamed. Your awesome library handled it properly. Ofcourse its light too :).Unbalanced
Yes, this works well. And for those needing to use this solution within Android, you can simply include the following in the build.gradle file: compile('com.j256.simplemagic:simplemagic:1.10')Aguiar
This library works with all files. better than all other libraries since it works for documents such as PDF, XLS, XLSX, DOC and DOCX. it doesn't work for XLS properly but you can check it from other methods of ContentInfo like getMessage()Shier
Can you file an issue with it @keivanshirkoubian with a sample xls that isn't done correctly? github.com/j256/simplemagic/issuesKaleidoscope
ok I will do it @Kaleidoscope when I have time.Shier
@Kaleidoscope I've sent an issue in your repository about the old excel files. issue linkShier
P
17

If you are stuck with Java 5 or 6, you can use this utility class from the open-source Servoy product.

You only need this function

public static String getContentType(byte[] data, String name)

It probes the first bytes of the content and returns the content types based on that content and not by file extension.

Point answered 5/9, 2013 at 15:20 Comment(0)
K
8

To chip in with my 5 cents:

TL;DR

I use MimetypesFileTypeMap and add any MIME type that is missing and that I specifically need to the mime.types file.

And now, the long read:

First of all, MIME types list is huge, see here: https://www.iana.org/assignments/media-types/media-types.xhtml

I like to use standard facilities provided by JDK first, and if that doesn't work, I'll go and look for something else.

Determine file type from file extension

Since 1.6, Java has MimetypesFileTypeMap, as pointed out in one of the answers above, and it is the simplest way to determine the MIME type:

new MimetypesFileTypeMap().getContentType( fileName );

In its vanilla implementation this does not do much (i.e. it works for .html but it doesn't for .png). It is, however, super simple to add any content type you may need:

  1. Create file named 'mime.types' in META-INF folder in your project
  2. Add a line for every mime type you need and default implementation doesn't provide (there are hundreds of mime types and list grows as time goes by).

Example entries for png and js files would be:

image/png png PNG
application/javascript js

For mime.types file format, see more details here: https://docs.oracle.com/javase/7/docs/api/javax/activation/MimetypesFileTypeMap.html

Determine file type from file content

Since 1.7, Java has java.nio.file.spi.FileTypeDetector, which defines a standard API for determining a file type in implementation specific way.

To fetch mime type for a file, you would simply use Files and do this in your code:

Files.probeContentType(Paths.get("either file name or full path goes here"));

The API allows implementations to determine a file's MIME type either from its name or from its content (magic bytes). That is why probeContentType() declares IOException: an implementation may use the Path it is given to actually open the file.

Again, vanilla implementation of this (the one that comes with JDK) leaves a lot to be desired.

In some ideal world in a galaxy far, far away, all these libraries which try to solve this file-to-mime-type problem would simply implement java.nio.file.spi.FileTypeDetector, you would drop in the preferred implementing library's jar file into your classpath and that would be it.
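
A minimal sketch of what such an implementation could look like, here a hypothetical detector that only knows the 8-byte PNG signature. The class name is made up. To have Files.probeContentType pick it up, the class would need to be public and listed by its fully qualified name in a META-INF/services/java.nio.file.spi.FileTypeDetector file on the classpath. It is kept package-private here only to stay a single snippet, and it assumes Java 9+ for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

// Hypothetical detector: recognizes PNG by content, returns null for everything else.
final class PngFileTypeDetector extends FileTypeDetector {
    private static final byte[] PNG =
        {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    @Override
    public String probeContentType(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return null; // not recognized; other detectors get a chance
        }
        byte[] head = new byte[PNG.length];
        try (InputStream in = Files.newInputStream(path)) {
            // readNBytes keeps reading until the buffer is full or the stream ends.
            int n = in.readNBytes(head, 0, head.length);
            return n == PNG.length && Arrays.equals(head, PNG) ? "image/png" : null;
        }
    }
}
```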

In the real world, the one where you need a TL;DR section, you should find the library with the most stars next to its name and use it. For this particular case, I don't need one (yet ;) ).

Kester answered 25/9, 2017 at 14:3 Comment(0)
B
4

I tried several ways to do it, including the first ones suggested by @Joshua Fox. But some don't recognize common MIME types such as PDF, and others cannot be trusted with fake files (I tried a RAR file with its extension changed to TIF). The solution I found, which @Joshua Fox also mentions in passing, is to use MimeUtil2, like this:

MimeUtil2 mimeUtil = new MimeUtil2();
mimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
String mimeType = MimeUtil2.getMostSpecificMimeType(mimeUtil.getMimeTypes(file)).toString();
Bowling answered 21/9, 2012 at 17:3 Comment(2)
I had no success at all with MimeUtil2 - almost everything came back as application/octet-stream. I used MimeUtil.getMimeTypes() with much more success after initializing with ` MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector"); MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.ExtensionMimeDetector"); MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.OpendesktopMimeDetector"); `Thee
Thanks for the working solution. The documentation of mime-util is not very clear about how to instantiate the utility class. Finally got it up and running, but replaced the classname string with the actual class. MimeUtil.registerMimeDetector(ExtensionMimeDetector.class.getName()); String mimeType = MimeUtil.getMostSpecificMimeType(MimeUtil.getMimeTypes(filename)).toString();Lollapalooza
I
4

This is the simplest way I found for doing this:

byte[] byteArray = ...
InputStream is = new BufferedInputStream(new ByteArrayInputStream(byteArray));
String mimeType = URLConnection.guessContentTypeFromStream(is);
Ifc answered 23/2, 2017 at 14:16 Comment(0)
P
4

If you are working with a Servlet and if the servlet context is available to you, you can use :

getServletContext().getMimeType( fileName );
Pantelleria answered 11/11, 2019 at 6:13 Comment(2)
What is getServletContext?Dustidustie
A method in the HttpServlet-class.Ajmer
G
4

Apache Tika.

<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.24</version>
</dependency>

and two lines of code:

Tika tika=new Tika();
tika.detect(inputStream);

Gordan answered 10/5, 2020 at 18:14 Comment(0)
B
3

It is better to use two-layer validation for file uploads.

First, check the declared MIME type and validate it.

Second, convert the first 4 bytes of the file to hexadecimal and compare them with the known magic numbers. Together, the two checks are much more reliable than either one alone.
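
A minimal sketch of the second layer, assuming a small hand-picked table of 4-byte signatures. The class name, method names, and the choice of types in the table are illustrative, and a real table would need many more entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UploadCheck {
    // A few well-known 4-byte signatures as uppercase hex; extend as needed.
    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("image/png", "89504E47");       // \x89 P N G
        MAGIC.put("image/gif", "47494638");       // G I F 8
        MAGIC.put("application/pdf", "25504446"); // % P D F
        MAGIC.put("application/zip", "504B0304"); // P K \x03 \x04
    }

    // Renders up to 'count' leading bytes as uppercase hex.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // True only if the declared type is in the table and the content starts with its signature.
    static boolean matchesDeclaredType(String declaredType, byte[] content) {
        String expected = MAGIC.get(declaredType);
        return expected != null && hexPrefix(content, 4).equals(expected);
    }
}
```

Rejecting unknown declared types outright, as this sketch does, is one possible policy. Whether that is appropriate depends on which uploads the application accepts.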

Babu answered 19/8, 2014 at 9:26 Comment(0)
O
3

You can do it with just one line: new MimetypesFileTypeMap().getContentType(new File("filename.ext")). Here is the complete test code (Java 7):

import java.io.File;
import javax.activation.MimetypesFileTypeMap;
public class MimeTest {
    public static void main(String a[]){
         System.out.println(new MimetypesFileTypeMap().getContentType(
           new File("/path/filename.txt")));
    }
}

This code produces the following output: text/plain

Overcareful answered 3/2, 2018 at 1:44 Comment(0)
A
3

I couldn't find anything to check for the video/mp4 MIME type, so I made my own solution. I happened to observe that Wikipedia was wrong and that the 00 00 00 18 66 74 79 70 69 73 6F 6D file signature is not correct. The fourth byte (18) and everything after 70 vary quite a lot among otherwise valid mp4 files.

This code is essentially a copy/paste of URLConnection.guessContentTypeFromStream code but tailored to video/mp4.

BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(content));
String mimeType = URLConnection.guessContentTypeFromStream(bis);

// Goes full barbaric and processes the bytes manually
if (mimeType == null){
    // In hex, the signature listed for .mp4 files at
    // https://www.wikiwand.com/en/List_of_file_signatures (ctrl+f "mp4") is:
    // 00 00 00 18 66 74 79 70 69 73 6F 6D
    // Only the first 8 bytes are compared here.
    int[] mp4_sig = {0, 0, 0, 24, 102, 116, 121, 112};

    bis.reset();
    bis.mark(16);
    int[] firstBytes = new int[8];
    for (int i = 0; i < 8; i++) {
        firstBytes[i] = bis.read();
    }
    // The fourth byte varies between files, so take it from the input.
    // Using the value from read() avoids sign issues with raw bytes.
    mp4_sig[3] = firstBytes[3];

    bis.reset();
    if (Arrays.equals(firstBytes, mp4_sig)){
        mimeType = "video/mp4";
    }
}

Tested successfully against 10 different .mp4 files.
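
A minimal sketch of a looser variant of the same idea, assuming it is enough to look for the ASCII marker "ftyp" at offsets 4 to 7 and then read the four-character brand that follows. The class and method names are made up. Other ISO base media formats also start with an ftyp box, so which brands to accept as video/mp4 is left to the caller.

```java
import java.nio.charset.StandardCharsets;

final class Mp4Sniffer {
    // Checks for "ftyp" at offsets 4..7, ignoring the four size bytes before it.
    static boolean hasFtypBox(byte[] head) {
        return head != null && head.length >= 8
            && head[4] == 'f' && head[5] == 't' && head[6] == 'y' && head[7] == 'p';
    }

    // Returns the 4-character brand after "ftyp" (e.g. "isom"), or null if absent.
    static String majorBrand(byte[] head) {
        if (!hasFtypBox(head) || head.length < 12) {
            return null;
        }
        return new String(head, 8, 4, StandardCharsets.US_ASCII);
    }
}
```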

EDIT: Here is a useful link (if it is still online) where you can find samples of many types. I don't own those videos, don't know who does either, but they're useful for testing the above code.

Altercation answered 3/11, 2020 at 21:44 Comment(0)
B
2

A solution to detecting a file's Media Type has the following parts:

Please remember to give credit if you copy the code.

StreamMediaType.java

In the following code -1 means skip comparing the byte at that index; a -2 denotes end of file type signature. This detects binary formats, primarily images, and a few plain text format variations (HTML, SVG, XML). The code uses up to the first 11 "magic" bytes from the data source's header. Optimizations and improvements that shorten the logic are welcome.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

import static com.keenwrite.io.MediaType.*;
import static java.lang.System.arraycopy;

public class StreamMediaType {
  private static final int FORMAT_LENGTH = 11;
  private static final int END_OF_DATA = -2;

  private static final Map<int[], MediaType> FORMAT = new LinkedHashMap<>();

  static {
    //@formatter:off
    FORMAT.put( ints( 0x3C, 0x73, 0x76, 0x67, 0x20 ), IMAGE_SVG_XML );
    FORMAT.put( ints( 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A ), IMAGE_PNG );
    FORMAT.put( ints( 0xFF, 0xD8, 0xFF, 0xE0 ), IMAGE_JPEG );
    FORMAT.put( ints( 0xFF, 0xD8, 0xFF, 0xEE ), IMAGE_JPEG );
    FORMAT.put( ints( 0xFF, 0xD8, 0xFF, 0xE1, -1, -1, 0x45, 0x78, 0x69, 0x66, 0x00 ), IMAGE_JPEG );
    FORMAT.put( ints( 0x49, 0x49, 0x2A, 0x00 ), IMAGE_TIFF );
    FORMAT.put( ints( 0x4D, 0x4D, 0x00, 0x2A ), IMAGE_TIFF );
    FORMAT.put( ints( 0x47, 0x49, 0x46, 0x38 ), IMAGE_GIF );
    FORMAT.put( ints( 0x8A, 0x4D, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A ), VIDEO_MNG );
    FORMAT.put( ints( 0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E ), APP_PDF );
    FORMAT.put( ints( 0x38, 0x42, 0x50, 0x53, 0x00, 0x01 ), IMAGE_PHOTOSHOP );
    FORMAT.put( ints( 0x25, 0x21, 0x50, 0x53, 0x2D, 0x41, 0x64, 0x6F, 0x62, 0x65, 0x2D ), APP_EPS );
    FORMAT.put( ints( 0x25, 0x21, 0x50, 0x53 ), APP_PS );
    FORMAT.put( ints( 0xFF, 0xFB, 0x30 ), AUDIO_MP3 );
    FORMAT.put( ints( 0x49, 0x44, 0x33 ), AUDIO_MP3 );
    FORMAT.put( ints( 0x3C, 0x21 ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x68, 0x74, 0x6D, 0x6C ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x68, 0x65, 0x61, 0x64 ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x62, 0x6F, 0x64, 0x79 ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x48, 0x54, 0x4D, 0x4C ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x48, 0x45, 0x41, 0x44 ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x42, 0x4F, 0x44, 0x59 ), TEXT_HTML );
    FORMAT.put( ints( 0x3C, 0x3F, 0x78, 0x6D, 0x6C, 0x20 ), TEXT_XML );
    FORMAT.put( ints( 0xFE, 0xFF, 0x00, 0x3C, 0x00, 0x3f, 0x00, 0x78 ), TEXT_XML );
    FORMAT.put( ints( 0xFF, 0xFE, 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00 ), TEXT_XML );
    FORMAT.put( ints( 0x42, 0x4D ), IMAGE_BMP );
    FORMAT.put( ints( 0x23, 0x64, 0x65, 0x66 ), IMAGE_X_BITMAP );
    FORMAT.put( ints( 0x21, 0x20, 0x58, 0x50, 0x4D, 0x32 ), IMAGE_X_PIXMAP );
    FORMAT.put( ints( 0x2E, 0x73, 0x6E, 0x64 ), AUDIO_BASIC );
    FORMAT.put( ints( 0x64, 0x6E, 0x73, 0x2E ), AUDIO_BASIC );
    FORMAT.put( ints( 0x52, 0x49, 0x46, 0x46 ), AUDIO_WAV );
    FORMAT.put( ints( 0x50, 0x4B ), APP_ZIP );
    FORMAT.put( ints( 0x41, 0x43, -1, -1, -1, -1, 0x00, 0x00, 0x00, 0x00, 0x00 ), APP_ACAD );
    FORMAT.put( ints( 0xCA, 0xFE, 0xBA, 0xBE ), APP_JAVA );
    FORMAT.put( ints( 0xAC, 0xED ), APP_JAVA_OBJECT );
    //@formatter:on
  }

  private StreamMediaType() {
  }

  public static MediaType getMediaType( final Path path ) throws IOException {
    return getMediaType( path.toFile() );
  }

  public static MediaType getMediaType( final java.io.File file )
    throws IOException {
    try( final var fis = new FileInputStream( file ) ) {
      return getMediaType( fis );
    }
  }

  public static MediaType getMediaType( final InputStream is )
    throws IOException {
    final var input = new byte[ FORMAT_LENGTH ];
    final var count = is.read( input, 0, FORMAT_LENGTH );

    if( count > 1 ) {
      final var available = new byte[ count ];
      arraycopy( input, 0, available, 0, count );
      return getMediaType( available );
    }

    return UNDEFINED;
  }

  public static MediaType getMediaType( final byte[] data ) {
    assert data != null;

    final var source = new int[]{
      0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

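    // Copy only the bytes that were actually read; data may be shorter than source.
    for( int i = 0; i < Math.min( data.length, source.length ); i++ ) {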
      source[ i ] = data[ i ] & 0xFF;
    }

    for( final var key : FORMAT.keySet() ) {
      int i = -1;
      boolean matches = true;

      while( ++i < FORMAT_LENGTH && key[ i ] != END_OF_DATA && matches ) {
        matches = key[ i ] == source[ i ] || key[ i ] == -1;
      }

      if( matches ) {
        return FORMAT.get( key );
      }
    }

    return UNDEFINED;
  }

  private static int[] ints( final int... data ) {
    final var magic = new int[ FORMAT_LENGTH ];
    int i = -1;
    while( ++i < data.length ) {
      magic[ i ] = data[ i ];
    }

    while( i < FORMAT_LENGTH ) {
      magic[ i++ ] = END_OF_DATA;
    }

    return magic;
  }
}
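
The wildcard convention described above can be seen in isolation in the following sketch. It is not the code from this answer, just a stripped-down illustration under the assumption that -1 means "skip this position". The class and method names are made up.

```java
final class SignatureMatch {
    static final int ANY = -1; // wildcard: do not compare this position

    // True if every non-wildcard pattern entry equals the unsigned byte at the same index.
    static boolean startsWith(byte[] data, int... pattern) {
        if (data.length < pattern.length) {
            return false; // not enough bytes to decide
        }
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != ANY && pattern[i] != (data[i] & 0xFF)) {
                return false;
            }
        }
        return true;
    }
}
```

With this, the Exif JPEG entry from the table would read SignatureMatch.startsWith(data, 0xFF, 0xD8, 0xFF, 0xE1, -1, -1, 0x45, 0x78, 0x69, 0x66, 0x00).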

MediaType.java

Define the file formats according to the IANA Media Type list. Notice that the file name extensions are mapped in MediaTypeExtension. There's a dependency on Apache's FilenameUtils class for its getExtension function.

import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

import static com.keenwrite.io.MediaType.TypeName.*;
import static com.keenwrite.io.MediaTypeExtension.getMediaType;
import static org.apache.commons.io.FilenameUtils.getExtension;

public enum MediaType {
  APP_ACAD( APPLICATION, "acad" ),
  APP_JAVA_OBJECT( APPLICATION, "x-java-serialized-object" ),
  APP_JAVA( APPLICATION, "java" ),
  APP_PS( APPLICATION, "postscript" ),
  APP_EPS( APPLICATION, "eps" ),
  APP_PDF( APPLICATION, "pdf" ),
  APP_ZIP( APPLICATION, "zip" ),
  FONT_OTF( "otf" ),
  FONT_TTF( "ttf" ),
  IMAGE_APNG( "apng" ),
  IMAGE_ACES( "aces" ),
  IMAGE_AVCI( "avci" ),
  IMAGE_AVCS( "avcs" ),
  IMAGE_BMP( "bmp" ),
  IMAGE_CGM( "cgm" ),
  IMAGE_DICOM_RLE( "dicom_rle" ),
  IMAGE_EMF( "emf" ),
  IMAGE_EXAMPLE( "example" ),
  IMAGE_FITS( "fits" ),
  IMAGE_G3FAX( "g3fax" ),
  IMAGE_GIF( "gif" ),
  IMAGE_HEIC( "heic" ),
  IMAGE_HEIF( "heif" ),
  IMAGE_HEJ2K( "hej2k" ),
  IMAGE_HSJ2( "hsj2" ),
  IMAGE_X_ICON( "x-icon" ),
  IMAGE_JLS( "jls" ),
  IMAGE_JP2( "jp2" ),
  IMAGE_JPEG( "jpeg" ),
  IMAGE_JPH( "jph" ),
  IMAGE_JPHC( "jphc" ),
  IMAGE_JPM( "jpm" ),
  IMAGE_JPX( "jpx" ),
  IMAGE_JXR( "jxr" ),
  IMAGE_JXRA( "jxrA" ),
  IMAGE_JXRS( "jxrS" ),
  IMAGE_JXS( "jxs" ),
  IMAGE_JXSC( "jxsc" ),
  IMAGE_JXSI( "jxsi" ),
  IMAGE_JXSS( "jxss" ),
  IMAGE_KTX( "ktx" ),
  IMAGE_KTX2( "ktx2" ),
  IMAGE_NAPLPS( "naplps" ),
  IMAGE_PNG( "png" ),
  IMAGE_PHOTOSHOP( "photoshop" ),
  IMAGE_SVG_XML( "svg+xml" ),
  IMAGE_T38( "t38" ),
  IMAGE_TIFF( "tiff" ),
  IMAGE_WEBP( "webp" ),
  IMAGE_WMF( "wmf" ),
  IMAGE_X_BITMAP( "x-xbitmap" ),
  IMAGE_X_PIXMAP( "x-xpixmap" ),
  AUDIO_BASIC( AUDIO, "basic" ),
  AUDIO_MP3( AUDIO, "mp3" ),
  AUDIO_WAV( AUDIO, "x-wav" ),
  VIDEO_MNG( VIDEO, "x-mng" ),
  TEXT_HTML( TEXT, "html" ),
  TEXT_MARKDOWN( TEXT, "markdown" ),
  TEXT_PLAIN( TEXT, "plain" ),
  TEXT_XHTML( TEXT, "xhtml+xml" ),
  TEXT_XML( TEXT, "xml" ),
  TEXT_YAML( TEXT, "yaml" ),

  /*
   * When all other lights go out.
   */
  UNDEFINED( TypeName.UNDEFINED, "undefined" );

  public enum TypeName {
    APPLICATION,
    AUDIO,
    IMAGE,
    TEXT,
    UNDEFINED,
    VIDEO
  }

  private final String mMediaType;
  private final TypeName mTypeName;
  private final String mSubtype;

  MediaType( final String subtype ) {
    this( IMAGE, subtype );
  }

  MediaType( final TypeName typeName, final String subtype ) {
    mTypeName = typeName;
    mSubtype = subtype;
    mMediaType = typeName.toString().toLowerCase() + '/' + subtype;
  }

  public static MediaType valueFrom( final File file ) {
    assert file != null;
    return fromFilename( file.getName() );
  }

  public static MediaType fromFilename( final String filename ) {
    assert filename != null;
    return getMediaType( getExtension( filename ) );
  }

  public static MediaType valueFrom( final Path path ) {
    assert path != null;
    return valueFrom( path.toFile() );
  }

  public static MediaType valueFrom( String contentType ) {
    if( contentType == null || contentType.isBlank() ) {
      return UNDEFINED;
    }

    var i = contentType.indexOf( ';' );
    contentType = contentType.substring(
      0, i == -1 ? contentType.length() : i );

    i = contentType.indexOf( '/' );
    i = i == -1 ? contentType.length() : i;
    final var type = contentType.substring( 0, i );
    final var subtype = contentType.substring( i + 1 );

    return valueFrom( type, subtype );
  }

  public static MediaType valueFrom(
    final String type, final String subtype ) {
    assert type != null;
    assert subtype != null;

    for( final var mediaType : values() ) {
      if( mediaType.equals( type, subtype ) ) {
        return mediaType;
      }
    }

    return UNDEFINED;
  }

  public boolean equals( final String type, final String subtype ) {
    assert type != null;
    assert subtype != null;

    return mTypeName.name().equalsIgnoreCase( type ) &&
      mSubtype.equalsIgnoreCase( subtype );
  }

  public boolean isType( final TypeName typeName ) {
    return mTypeName == typeName;
  }

  public String getSubtype() {
    return mSubtype;
  }
   
  @Override
  public String toString() {
    return mMediaType;
  }
}

MediaTypeExtension.java

The last piece of the puzzle is a mapping from MediaTypes to their known and common/popular file name extensions. This allows bidirectional lookup based on file name extensions.

import java.util.List;

import static com.keenwrite.io.MediaType.*;
import static java.util.List.of;

public enum MediaTypeExtension {
  MEDIA_APP_ACAD( APP_ACAD, of( "dwg" ) ),
  MEDIA_APP_PDF( APP_PDF ),
  MEDIA_APP_PS( APP_PS, of( "ps" ) ),
  MEDIA_APP_EPS( APP_EPS ),
  MEDIA_APP_ZIP( APP_ZIP ),

  MEDIA_AUDIO_MP3( AUDIO_MP3 ),
  MEDIA_AUDIO_BASIC( AUDIO_BASIC, of( "au" ) ),
  MEDIA_AUDIO_WAV( AUDIO_WAV, of( "wav" ) ),

  MEDIA_FONT_OTF( FONT_OTF ),
  MEDIA_FONT_TTF( FONT_TTF ),

  MEDIA_IMAGE_APNG( IMAGE_APNG ),
  MEDIA_IMAGE_BMP( IMAGE_BMP ),
  MEDIA_IMAGE_GIF( IMAGE_GIF ),
  MEDIA_IMAGE_JPEG( IMAGE_JPEG,
                    of( "jpg", "jpe", "jpeg", "jfif", "pjpeg", "pjp" ) ),
  MEDIA_IMAGE_PNG( IMAGE_PNG ),
  MEDIA_IMAGE_PSD( IMAGE_PHOTOSHOP, of( "psd" ) ),
  MEDIA_IMAGE_SVG( IMAGE_SVG_XML, of( "svg" ) ),
  MEDIA_IMAGE_TIFF( IMAGE_TIFF, of( "tiff", "tif" ) ),
  MEDIA_IMAGE_WEBP( IMAGE_WEBP ),
  MEDIA_IMAGE_X_BITMAP( IMAGE_X_BITMAP, of( "xbm" ) ),
  MEDIA_IMAGE_X_PIXMAP( IMAGE_X_PIXMAP, of( "xpm" ) ),

  MEDIA_VIDEO_MNG( VIDEO_MNG, of( "mng" ) ),

  MEDIA_TEXT_MARKDOWN( TEXT_MARKDOWN, of(
    "md", "markdown", "mdown", "mdtxt", "mdtext", "mdwn", "mkd", "mkdown",
    "mkdn" ) ),
  MEDIA_TEXT_PLAIN( TEXT_PLAIN, of( "txt", "asc", "ascii", "text", "utxt" ) ),
  MEDIA_TEXT_R_MARKDOWN( TEXT_R_MARKDOWN, of( "Rmd" ) ),
  MEDIA_TEXT_R_XML( TEXT_R_XML, of( "Rxml" ) ),
  MEDIA_TEXT_XHTML( TEXT_XHTML, of( "xhtml" ) ),
  MEDIA_TEXT_XML( TEXT_XML ),
  MEDIA_TEXT_YAML( TEXT_YAML, of( "yaml", "yml" ) ),

  MEDIA_UNDEFINED( UNDEFINED, of( "undefined" ) );

  private final MediaType mMediaType;
  private final List<String> mExtensions;

  MediaTypeExtension( final MediaType mediaType ) {
    this( mediaType, of( mediaType.getSubtype() ) );
  }

  MediaTypeExtension(
    final MediaType mediaType, final List<String> extensions ) {
    assert mediaType != null;
    assert extensions != null;
    assert !extensions.isEmpty();

    mMediaType = mediaType;
    mExtensions = extensions;
  }

  public String getExtension() {
    return mExtensions.get( 0 );
  }

  public static MediaTypeExtension valueFrom( final MediaType mediaType ) {
    for( final var type : values() ) {
      if( type.isMediaType( mediaType ) ) {
        return type;
      }
    }

    return MEDIA_UNDEFINED;
  }

  boolean isMediaType( final MediaType mediaType ) {
    return mMediaType == mediaType;
  }

  static MediaType getMediaType( final String extension ) {
    final var sanitized = sanitize( extension );

    for( final var mediaType : MediaTypeExtension.values() ) {
      if( mediaType.isType( sanitized ) ) {
        return mediaType.getMediaType();
      }
    }

    return UNDEFINED;
  }

  private boolean isType( final String sanitized ) {
    for( final var extension : mExtensions ) {
      if( extension.equalsIgnoreCase( sanitized ) ) {
        return true;
      }
    }

    return false;
  }

  private static String sanitize( final String extension ) {
    return extension == null ? "" : extension.toLowerCase();
  }

  private MediaType getMediaType() {
    return mMediaType;
  }
}

Usages:

// EXAMPLE -- Detect media type
//
final File image = new File( "filename.jpg" );
final MediaType mt = StreamMediaType.getMediaType( image );

// Tricky! The JPG could be a PNG in disguise.
if( mt.isType( MediaType.TypeName.IMAGE ) ) {

  if( mt == MediaType.IMAGE_PNG ) {
    // Nice try! Sneaky sneak.
  }
}

// EXAMPLE -- Get typical media type file name extension
//
final String ext = MediaTypeExtension.valueFrom( MediaType.IMAGE_SVG_XML ).getExtension();

// EXAMPLE -- Get media type from HTTP request
//
final var url = new URL( "https://localhost/path/file.ext" );
final var conn = (HttpURLConnection) url.openConnection();
final var contentType = conn.getContentType();
MediaType mediaType = valueFrom( contentType );

// Fall back to stream detection probe
if( mediaType == UNDEFINED ) {
  mediaType = StreamMediaType.getMediaType( conn.getInputStream() );
}

conn.disconnect();

You get the idea.


Short library review:


Sample audio, video, and image files for testing:


Note that nearly all XML documents will begin the same way:

<?xml version="1.0" standalone="no"?>

Since SVG documents are XML documents, many SVG documents will contain that XML declaration and may also contain:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">

Detecting the SVG doctype would be possible by bumping the magic bytes from 11 to 13. The doctype is optional, however, so an SVG document could also begin immediately after the XML declaration as follows:

<svg xmlns="http://www.w3.org/2000/svg">

In short, this code cannot reliably detect SVG files, so use it with caution. Consider using the HTTP Content-Type header or the file name extension instead.

Compounding the issue is that comments of arbitrary length can be inserted before the <svg tag, making detection extra-difficult.
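As a rough sketch of one way around this, you could skip the XML declaration, any comments, and a simple doctype, then check whether the first element is named svg. The class name SvgSniffer and the 64 KiB scan limit are my own assumptions and are not part of the classes above. A doctype with an internal subset and a namespace-prefixed root element are not handled.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper for illustration; not part of the classes above. */
final class SvgSniffer {
  /** Assumed upper bound on how many characters to scan. */
  private static final int MAX_CHARS = 64 * 1024;

  private SvgSniffer() {}

  /** Reads roughly MAX_CHARS characters, then checks the first element. */
  static boolean isSvg( final Reader reader ) throws IOException {
    final var text = new StringBuilder();
    final var buffer = new char[ 4096 ];
    int count;

    while( text.length() < MAX_CHARS &&
      (count = reader.read( buffer )) != -1 ) {
      text.append( buffer, 0, count );
    }

    return isSvg( text.toString() );
  }

  /** Returns true if the first element's name is "svg", ignoring case. */
  static boolean isSvg( final String text ) {
    int i = 0;

    while( i < text.length() ) {
      final char c = text.charAt( i );

      if( Character.isWhitespace( c ) || c == '\uFEFF' ) {
        i++; // Skip whitespace and a leading byte order mark.
      }
      else if( text.startsWith( "<?", i ) ) {
        i = skipPast( text, "?>", i ); // XML declaration or instruction
      }
      else if( text.startsWith( "<!--", i ) ) {
        i = skipPast( text, "-->", i ); // Comment of arbitrary length
      }
      else if( text.startsWith( "<!", i ) ) {
        i = skipPast( text, ">", i ); // Doctype without an internal subset
      }
      else {
        // First element (or stray text): decide here.
        return c == '<' && text.regionMatches( true, i + 1, "svg", 0, 3 )
          && isNameEnd( text, i + 4 );
      }
    }

    return false;
  }

  /** Returns the index just past the terminator, or the text length. */
  private static int skipPast(
    final String text, final String end, final int from ) {
    final int i = text.indexOf( end, from );
    return i == -1 ? text.length() : i + end.length();
  }

  /** Ensures that "svg" is the whole element name, not a prefix of it. */
  private static boolean isNameEnd( final String text, final int i ) {
    if( i >= text.length() ) {
      return true;
    }

    final char c = text.charAt( i );
    return Character.isWhitespace( c ) || c == '>' || c == '/';
  }
}
```

Calling SvgSniffer.isSvg( reader ) on a document that starts with an XML declaration followed by a long generator comment should still find the svg root, provided the root appears within the scan limit.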


1 "MIME type" is a deprecated term.

Beall answered 4/4, 2021 at 18:16 Comment(4)
I'll save you some effort: There's no easy way to capture SVG/XML if the XML declaration is sent because the first 11 bytes are all the same for every type of XML file. One solution would be to check for XML, then gobble up to the first opening element to check for <svg case insensitively. In practice, I ended up using the Content-Type retrieved from the HTTP connection for the SVG resource.Beall
My derivative version: gist.github.com/erizzo/deb52e8f74f937d2ea2aa74f494dcb9fSiddur
The challenge with checking for <?xml... and skipping it, is that there can be comments before the <svg root node. In fact, Adobe Illustrator inserts a lengthy comment of varying length. :-(Siddur
Thanks for posting the derivative. Remember to cite your sources as per the terms of the StackOverflow license agreement.Beall
F
2

Apache Tika's detector, Tika.detect(File), is arguably the best option; it is more accurate than Files.probeContentType(path).

Check this quick reference, which contains examples and code samples.

Fathometer answered 27/10, 2021 at 11:2 Comment(0)
S
1

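In Spring, given a MultipartFile file:

org.springframework.web.multipart.MultipartFile

file.getContentType();

This returns the content type declared in the upload request rather than one probed from the file's contents.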

Subspecies answered 26/4, 2016 at 11:33 Comment(0)
T
0

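If you work on Linux, you can call the file command with the --mime-type option:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

static String mimetype( final String file ) throws IOException {
  // 1. Run the command; separate arguments avoid shell quoting problems.
  final Process process =
    new ProcessBuilder( "file", "--mime-type", file ).start();

  // 2. Read the output, which looks like "/path/to/file: type/subtype".
  try( final var reader = new BufferedReader(
    new InputStreamReader( process.getInputStream() ) ) ) {
    final String output = reader.readLine();

    // 3. Parse the MIME type, which follows the last colon.
    return output == null
      ? "" : output.substring( output.lastIndexOf( ':' ) + 1 ).trim();
  }
}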

Then

mimetype("/home/nyapp.war") //  'application/zip'

mimetype("/var/www/ggg/au.mp3") //  'audio/mp3'
Tuxedo answered 7/12, 2014 at 11:17 Comment(2)
This will work, but is IMO a bad practice as it ties your code to a specific OS and requires the external utility to be present at the system running it. Don't get me wrong; it's a fully valid solution, but breaks portability - which is one of the main reasons to use Java in the first place...Tufts
@ToVine: For the record, I'm going to respectfully disagree. Not every Java program is required to be portable. Let context and the programmer make that decision. en.wikipedia.org/wiki/Java_Native_InterfaceTremble
C
0

After trying various other libraries, I settled on mime-util.

<dependency>
      <groupId>eu.medsea.mimeutil</groupId>
      <artifactId>mime-util</artifactId>
      <version>2.1.3</version>
</dependency>

File file = new File("D:/test.tif");
MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
Collection<?> mimeTypes = MimeUtil.getMimeTypes(file);
System.out.println(mimeTypes);
Cowfish answered 26/5, 2016 at 12:35 Comment(0)
I
0
public String getFileContentType(String fileName) {
    String fileType = "Undetermined";
    final File file = new File(fileName);
    try
    {
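        // probeContentType returns null when the type cannot be determined.
        final String probed = Files.probeContentType(file.toPath());
        if (probed != null)
        {
            fileType = probed;
        }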
    }
    catch (IOException ioException)
    {
        System.out.println(
                "ERROR: Unable to determine file type for " + fileName
                        + " due to exception " + ioException);
    }
    return fileType;
}
Inobservance answered 23/11, 2016 at 13:26 Comment(1)
This method, Files.probeContentType(Path), has been available since JDK 1.7, and it works very well for me.Coating
I
0
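File file = new File(PropertiesReader.FILE_PATH);

// Option 1: look up the type by file name extension (javax.activation).
MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
String mimeType = fileTypeMap.getContentType(file);

// Option 2: ask URLConnection; File.toURL() is deprecated, so use toURI().
URLConnection uconnection = file.toURI().toURL().openConnection();
mimeType = uconnection.getContentType();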
Illyes answered 17/5, 2019 at 9:9 Comment(1)
While this code may solve the question, including an explanation really helps to improve the quality of your post.Baumgartner
B
0

Check the magic bytes of the stream or file:

https://mcmap.net/q/56674/-how-to-get-the-magic-number-from-file-in-java

It uses pure Java, but requires you to define an enum of the types you want to detect.
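A minimal sketch of what such an enum might look like follows. The enum name MagicNumber, its method names, and the handful of signatures are my own choices for illustration and are not taken from the linked answer. A real list would need many more entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical enum for illustration; extend with the types you need. */
enum MagicNumber {
  PNG( "image/png", 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A ),
  GIF( "image/gif", 'G', 'I', 'F', '8' ),
  JPEG( "image/jpeg", 0xFF, 0xD8, 0xFF ),
  PDF( "application/pdf", '%', 'P', 'D', 'F', '-' ),
  ZIP( "application/zip", 'P', 'K', 0x03, 0x04 );

  /** Length of the longest signature above (PNG). */
  private static final int MAX_LENGTH = 8;

  private final String mMediaType;
  private final byte[] mSignature;

  MagicNumber( final String mediaType, final int... signature ) {
    mMediaType = mediaType;
    mSignature = new byte[ signature.length ];

    for( int i = 0; i < signature.length; i++ ) {
      mSignature[ i ] = (byte) signature[ i ];
    }
  }

  /** Consumes up to MAX_LENGTH bytes from the stream; null if unknown. */
  static String detect( final InputStream in ) throws IOException {
    return detect( in.readNBytes( MAX_LENGTH ) );
  }

  /** Returns the media type whose signature starts the header, or null. */
  static String detect( final byte[] header ) {
    for( final var magic : values() ) {
      if( magic.matches( header ) ) {
        return magic.mMediaType;
      }
    }

    return null;
  }

  private boolean matches( final byte[] header ) {
    return header.length >= mSignature.length && Arrays.equals(
      header, 0, mSignature.length, mSignature, 0, mSignature.length );
  }
}
```

Keep in mind that container formats share signatures: JAR, WAR, and DOCX files all begin with the ZIP signature, so a match on the leading bytes alone only narrows the type down.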

Balmacaan answered 13/1, 2021 at 5:51 Comment(0)
H
0

If you want a reliable (i.e., consistent) way of mapping file extensions to MIME types, here is what I use:

https://github.com/jjYBdx4IL/misc/blob/master/text-utils/src/main/java/com/github/jjYBdx4IL/utils/text/MimeType.java

It includes a bundled mime types database and basically inverts the logic of javax.activation's MimetypesFileTypeMap class by using the database to initialize the "programmatic" entries. That way the library-defined types always have precedence over what may be defined in unbundled resources.

Humber answered 18/5, 2021 at 12:35 Comment(0)
D
0

In Java, the URLConnection class has a static method, guessContentTypeFromName(String fileName), that guesses the media type (also known as the MIME type or content type) of a file from its file name. The method relies solely on the file name's extension, so it does not inspect the file's contents.

String fileName = "image.jpg";
String contentType = URLConnection.guessContentTypeFromName(fileName);
System.out.println(contentType); // "image/jpeg"

To learn more, read this article.
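Since the question asks about probing the actual contents, it may be worth noting that URLConnection also offers guessContentTypeFromStream(InputStream), which inspects the first few bytes of a stream that supports mark and reset. The sketch below tries the content first and falls back to the name. The class name ContentTypeGuesser and its guess method are hypothetical names of mine, and the stream-based guess only recognizes a small set of formats.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

/** Hypothetical helper for illustration. */
final class ContentTypeGuesser {
  private ContentTypeGuesser() {}

  /**
   * Guesses from the leading bytes first, then from the file name.
   * Returns null when neither approach yields a guess.
   */
  static String guess( final InputStream in, final String fileName )
    throws IOException {
    // guessContentTypeFromStream needs mark/reset support; wrap if absent.
    final InputStream markable =
      in.markSupported() ? in : new BufferedInputStream( in );
    final String fromContent =
      URLConnection.guessContentTypeFromStream( markable );

    return fromContent != null
      ? fromContent
      : URLConnection.guessContentTypeFromName( fileName );
  }
}
```

With this ordering, a GIF that has been renamed to image.jpg is reported by its contents rather than by its misleading extension.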

Dairy answered 28/1, 2023 at 8:17 Comment(0)
I
-1

I did it with the following code.

import java.net.HttpURLConnection;
import java.net.URL;

public class MimeFileType {

    public static void main(String[] args) {
        try {
            URL url = new URL("https://www.url.com.pdf");

            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");

            // Reading a header field sends the request and waits for the response.
            System.out.println("Content-Type " + connection.getHeaderField("Content-Type"));

            connection.disconnect();
        } catch (Exception e) {
            // Report the failure rather than silently swallowing it.
            e.printStackTrace();
        }
    }
}
Ivories answered 31/7, 2019 at 20:33 Comment(0)
