Alternative to the FindMimeFromData method in Urlmon.dll that recognizes more MIME types

The FindMimeFromData method, exposed by the Windows DLL Urlmon.dll, can determine the MIME type of data held in memory by examining the first 256 bytes of the byte array that stores it.

However, after reading its documentation, I was led to MIME Type Detection in Windows Internet Explorer, which lists the MIME types this method is able to recognize. As that list shows, the method is limited to 26 MIME types.

So I was wondering if anyone could point me to another method with more MIME types, or alternatively another method / class where I would be able to include the MIME types I see fit.

Hyperbole answered 8/3, 2013 at 18:13 Comment(5)
I am not sure this is what you want, but you can get a list of the major MIME types from IIS.Serena
But the FindMimeFromData method is hard-coded to 26 MIME types, and I cannot modify it to accept more MIME types.Specialize
Then in that case, you would probably need to find another way to do your task. If you can find the "extension" for the kind of data you want to read, you have a better chance of determining the MIME type; if you just want to know the MIME type from reading the binary data, then to my knowledge you are limited to the FindMimeFromData method.Serena
This is a security-sensitive issue (hence the fixed 26 hard-coded detections). In fact, this MIME detection can be, and is, disabled depending on the OS version and various configuration settings (Microsoft has had real problems with it in the past). I don't think you will find an alternative in the Windows API. You can write your own. This link can give you some inspiration: developer.mozilla.org/en-US/docs/…Noachian
@SimonMourier +1 That answers why Microsoft would limit its own MIME detection. I also didn't believe I would find another Windows API alternative; I guess the only way is writing my own. But I will wait and see if someone knows of any alternative to the Microsoft API.Specialize

UPDATE: @GetoX has taken this code and wrapped it in a NuGet package for .NET Core! See below, cheers!!

So I was wondering if anyone could point me to another method with more MIME types, or alternatively another method / class where I would be able to include the MIME types I see fit.

I use a hybrid of Winista and URLMon to detect the real format of uploaded files.

Winista MIME Detection

Say someone renames an exe with a jpg extension: you can still determine the "real" file format using binary analysis. It doesn't detect SWF or FLV files, but it handles pretty much every other well-known format, and you can use a hex editor to add signatures for more file types.

File Magic

Winista detects the real MIME type using an XML file, "mime-type.xml", that contains information about file types and the signatures used to identify the content type, e.g.:

<!--
 !   Audio primary type
 ! -->

<mime-type name="audio/basic"
           description="uLaw/AU Audio File">
    <ext>au</ext><ext>snd</ext>
    <magic offset="0" type="byte" value="2e736e64000000"/>
</mime-type>

<mime-type name="audio/midi"
           description="Musical Instrument Digital Interface MIDI-sequention Sound">
    <ext>mid</ext><ext>midi</ext><ext>kar</ext>
    <magic offset="0" value="MThd"/>
</mime-type>

<mime-type name="audio/mpeg"
           description="MPEG Audio Stream, Layer III">
    <ext>mp3</ext><ext>mp2</ext><ext>mpga</ext>
    <magic offset="0" value="ID3"/>
</mime-type>
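
As a rough illustration of what a magic entry like the ones above amounts to, here is a minimal sketch of checking a byte signature at an offset. This is not Winista's actual code: the MagicRule class and its FromHex and Matches helpers are hypothetical names made up for this example, and it only assumes that a hex value such as "2e736e64000000" stands for raw bytes while a plain value such as "MThd" stands for ASCII text.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Winista: compares bytes at an offset
// against a signature taken from a <magic> entry.
public static class MagicRule
{
    // Converts a hex string such as "2e736e64" into the bytes it represents.
    public static byte[] FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even length.", nameof(hex));

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Returns true when 'data' contains 'magic' starting exactly at 'offset'.
    public static bool Matches(byte[] data, int offset, byte[] magic)
    {
        // Reject negative offsets and data that is too short to hold the signature.
        if (offset < 0 || data.Length - offset < magic.Length)
            return false;

        for (int i = 0; i < magic.Length; i++)
        {
            if (data[offset + i] != magic[i])
                return false;
        }
        return true;
    }
}
```

With this sketch, the audio/midi entry would correspond to MagicRule.Matches(data, 0, Encoding.ASCII.GetBytes("MThd")), and the audio/basic entry to MagicRule.Matches(data, 0, MagicRule.FromHex("2e736e64000000")).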

When Winista fails to detect the real file format, I fall back to the URLMon method:

using System;
using System.IO;
using System.Runtime.InteropServices;

public class urlmonMimeDetect
{
    // pBC and ppwzMimeOut are pointers, so they are declared as IntPtr to stay valid in 64-bit processes.
    // The string parameters are wide strings (LPCWSTR), hence LPWStr rather than LPStr.
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
        uint cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        uint dwMimeFlags,
        out IntPtr ppwzMimeOut,
        uint dwReserved
    );

    public string GetMimeFromFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException(filename + " not found");

        // Only the first 256 bytes are passed to FindMimeFromData.
        byte[] buffer = new byte[256];
        int bytesRead;
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            bytesRead = fs.Read(buffer, 0, buffer.Length);
        }

        IntPtr mimeTypePtr = IntPtr.Zero;
        try
        {
            // Pass the number of bytes actually read, not the buffer size.
            int hr = FindMimeFromData(IntPtr.Zero, null, buffer, (uint)bytesRead, null, 0, out mimeTypePtr, 0);
            if (hr != 0 || mimeTypePtr == IntPtr.Zero)
                return "unknown/unknown";
            return Marshal.PtrToStringUni(mimeTypePtr);
        }
        catch (Exception)
        {
            return "unknown/unknown";
        }
        finally
        {
            if (mimeTypePtr != IntPtr.Zero)
                Marshal.FreeCoTaskMem(mimeTypePtr);
        }
    }
}

From inside the Winista method, I fall back on URLMon here:

    public MimeType GetMimeTypeFromFile(string filePath)
    {
        // Read the whole file; File.ReadAllBytes avoids a partial read.
        byte[] data = File.ReadAllBytes(filePath);
        sbyte[] fileData = Winista.Mime.SupportUtil.ToSByteArray(data);

        MimeType oMimeType = GetMimeType(fileData);
        if (oMimeType != null) return oMimeType;

        // The magic signatures did not identify the file (e.g. a text/plain file),
        // so use URLMon to try to get the file's format instead.
        Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect urlmonMimeDetect = new Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect();
        string urlmonMimeType = urlmonMimeDetect.GetMimeFromFile(filePath);
        if (!string.IsNullOrEmpty(urlmonMimeType))
        {
            foreach (MimeType mimeType in types)
            {
                if (mimeType.Name == urlmonMimeType)
                {
                    return mimeType;
                }
            }
        }

        // Still null at this point: neither method recognized the file.
        return oMimeType;
    }

Wayback Machine link to the Winista utility from netomatix. AFAIK they found some "mime reader utility classes in open source Nutch crawler system" and did a C# rewrite in the early 2000s.

I've hosted my MimeDetect project, which uses Winista with the URLMon fallback, here (please contribute new file types using a hex editor): https://github.com/MeaningOfLights/MimeDetect

You could also use the Registry method or .Net 4.5 method mentioned in this post linked to by Paul Zahra, but Winista is the best IMHO.

Enjoy knowing files on your systems are what they claim to be and not laden with malware!


UPDATE:

For desktop applications you may find the WindowsAPICodePack works better:

using Microsoft.WindowsAPICodePack.Shell;
using Microsoft.WindowsAPICodePack.Shell.PropertySystem;

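private static string GetFilePropertyItemTypeTextValueFromShellFile(string filePathWithExtension)
{
    // PItemTypeTextCanonical is not defined in this snippet; it is presumably a constant
    // holding the canonical property name (likely "System.ItemTypeText").
    var shellFile = ShellFile.FromFilePath(filePathWithExtension);
    var prop = shellFile.Properties.GetProperty(PItemTypeTextCanonical);
    return prop.FormatForDisplay(PropertyDescriptionFormatOptions.None);
}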
Sensate answered 24/3, 2013 at 5:40 Comment(5)
Thanks Jeremy. I like your answer; however, when it comes to relying on the FindMimeFromData method in Urlmon.dll I would be very careful, since I've read (if I remember right) that it may return an incorrect MIME type if the appropriate MIME types aren't defined in a given place in the Windows Registry. Furthermore, those values can be tampered with, which is a problem when shipping this to end users. Considering that, I will rely only on a detection method similar to the Winista approach you showed in your answer...Specialize
... Now thinking... Maybe I could also put together a small console application that would sniff through a large number of files of my own making with the correct extension (and known MIME type), and search for similarities in the first 256 bytes of each file with the same extension. That way I could build a sizeable list of MIME types. Well, something for my spare time. Thanks Jeremy.Specialize
I tried the last one, GetFilePropertyItemTypeTextValueFromShellFile. In my case it always returns a type of 'File', never what I want (like Microsoft Word Document). If the file path has the extension, I get the desired results; without the extension, however, this is useless.Borghese
Both Winista and WindowsAPICodePack seem to have disappeared. :(Alnico
@Alnico As I mentioned in my answer I have it hosted on GitHub as a fall back: github.com/MeaningOfLights/MimeDetect - also note that WindowsAPICodePack isn't completely gone: https://mcmap.net/q/56675/-windows-api-code-pack-where-is-it-closed - in future can you please read the answer and do research before posting comments. I've wasted 15mins on this for no good reason.Sensate

After a few hours of looking for a flexible solution, I took @JeremyThompson's solution, adapted it to .NET Core / .NET 4.5, and put it into a NuGet package.

   //init
   var mimeTypes = new MimeTypes();

   //usage by filepath
   var mimeType1 = mimeTypes.GetMimeTypeFromFile(filePath);

   //usage by bytearray
   var mimeType2 = mimeTypes.GetMimeTypeFromFile(bytes);
Sirup answered 16/12, 2019 at 22:45 Comment(2)
I will definitely check it out. However, I'm curious why there is no official package we can use for this purpose. It's obviously a requirement that every "security concerned" developer is looking for.Sternutation
@Sternutation not even Microsoft can be expected to cover everything. File specs change, and they did UrlMon and WindowsAPICodePack. There are also huge litigation ramifications if you get it wrong, and any developer can trick people with unknown extensions, e.g. pif. See my Q&A about it here: security.stackexchange.com/q/81677/10505Sensate

There are multiple possible solutions in this SO post which will at the very least give you some food for thought.

It seems that the only real way to do it is to read the file in binary and then do a comparison, whether the MIME types are hard-coded in some fashion or you rely on the machine's own available MIME types / Registry.
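
To make the hard-coded variant of that idea concrete, here is a minimal sketch of a signature table you can extend with whatever MIME types you see fit. The SignatureTable class and its Add and Detect methods are hypothetical names invented for this example, not an existing library; the PDF, PNG and ZIP signatures used in the usage note are the well-known leading bytes of those formats.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, user-extensible signature table (not an existing library).
public sealed class SignatureTable
{
    private readonly List<(int Offset, byte[] Magic, string Mime)> _entries =
        new List<(int Offset, byte[] Magic, string Mime)>();

    // Registers a signature: 'magic' must appear at 'offset' for 'mime' to be reported.
    public void Add(int offset, byte[] magic, string mime)
    {
        _entries.Add((offset, magic, mime));
    }

    // Returns the MIME type of the first registered signature that matches,
    // or 'fallback' when nothing matches. Register more specific signatures first.
    public string Detect(byte[] data, string fallback = "application/octet-stream")
    {
        foreach (var entry in _entries)
        {
            if (HasBytesAt(data, entry.Offset, entry.Magic))
                return entry.Mime;
        }
        return fallback;
    }

    private static bool HasBytesAt(byte[] data, int offset, byte[] magic)
    {
        if (offset < 0 || data.Length - offset < magic.Length)
            return false;

        for (int i = 0; i < magic.Length; i++)
        {
            if (data[offset + i] != magic[i])
                return false;
        }
        return true;
    }
}
```

For example, after table.Add(0, new byte[] { 0x25, 0x50, 0x44, 0x46 }, "application/pdf") and table.Add(0, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, "image/png"), calling table.Detect on the first bytes of a file reports the matching type regardless of the file's extension. Container formats are a known limit of this approach: a ZIP signature (0x50 0x4B 0x03 0x04) alone cannot tell a docx from a plain zip archive.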

Menchaca answered 15/3, 2013 at 10:8 Comment(1)
+1 Thanks for the link, the answer https://mcmap.net/q/54554/-using-net-how-can-you-find-the-mime-type-of-a-file-based-on-the-file-signature-not-the-extension that was posted to it should prove useful in the event I decide to build my own FindMimeFromData alternative.Specialize

Just found FileSignatures. It is actually a good alternative, that runs fine also on Linux-targeted applications.

Context

Urlmon.dll is not available on Linux, so it will not work for multi-platform applications. I found this article in Microsoft Docs. It references a File Signature Database, which is quite a good reference for file types (518 at the time of writing).

Digging a little more, I found this pretty good project: FileSignatures, NuGet here. It is also quite extensible, so you can, for example, get all the types you need from filesignatures.net and create your own type models.

Usage

You can either check for any defined type

var format = inspector.DetermineFileFormat(stream);

if(format is Pdf) {
  // Just matches Pdf
}

if(format is OfficeOpenXml) {
  // Matches Word, Excel, Powerpoint
}

if(format is Image) {
  // Matches any image format
}

or use some of the metadata it brings, based on matched file type

var fileFormat = _fileFormatInspector.DetermineFileFormat(stream);
var mime = fileFormat?.MediaType;

Extensibility

You can define any number of types that inherit from FileFormat and configure a FileFormatLocator to load them when needed

var assembly = typeof(CustomFileFormat).GetTypeInfo().Assembly;

// Just the formats defined in the assembly containing CustomFileFormat
var customFormats = FileFormatLocator.GetFormats(assembly);

// Formats defined in the assembly and all the defaults
var allFormats = FileFormatLocator.GetFormats(assembly, true);

More details in the project's Github

Forward answered 5/3, 2021 at 21:0 Comment(2)
Can confirm this works with docx and xlsx files - detects the correct MIME types unlike urlmon's FindMimeFromData.Brubaker
@Brubaker yes it does. Take a look at /formats directory in the project source code, and you'll see the default-supported formats. Just tested with a .xlsm and it returned "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". You can also define any custom format, the readme tells you how to do that.Forward

© 2022 - 2024 — McMap. All rights reserved.