PHP Upload File Validation

I am creating file upload script and I'm looking for the best techniques and practices to validate uploaded files.

Allowed extensions are:

$allowed_extensions = array('gif','jpg','png','swf','doc','docx','pdf','zip','rar','rtf','psd');

Here's the list of what I'm doing.

  1. Checking file extension

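    $path_info = pathinfo($filename);
    // 'extension' is absent when the name has no dot; compare case-insensitively.
    $extension = isset($path_info['extension']) ? strtolower($path_info['extension']) : '';
    if( !in_array($extension, $allowed_extensions, true) ) {
        die('File #'.$i.': Incorrect file extension.');
    }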
    
  2. Checking file mime type

    // $finfo is assumed to come from finfo_open(FILEINFO_MIME_TYPE).
    $allowed_mimes = array('image/jpeg','image/png','image/gif','text/richtext','multipart/x-zip','application/x-shockwave-flash','application/msword','application/pdf','application/x-rar-compressed','image/vnd.adobe.photoshop');
    if( !in_array(finfo_file($finfo, $file), $allowed_mimes) ) {
        die('File #'.$i.': Incorrect mime type.');
    }
    
  3. Checking file size.

What should I do to make sure uploaded files are valid? I noticed a strange thing: I changed a .jpg file's extension to .zip and it was still uploaded. I expected the MIME check to reject it, but then I realized that I'm not checking for the specific type that matches the extension, only whether the detected MIME type exists in the array. I'll fix that later; it's not a problem for me (of course, if you have a good solution or idea, please don't hesitate to share it).
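A rough sketch of how that fix might look: tie each extension to the MIME types it is allowed to have, then require the detected type to be listed for the file's own extension. The map `$allowed_types` and the helper `extension_matches_mime` are names made up for this sketch, and the MIME strings are assumptions, since different libmagic versions may report different values.

```php
<?php
// Hypothetical map: each allowed extension lists the MIME types accepted for it.
// The MIME strings are assumptions; verify what FileInfo reports on your server.
$allowed_types = array(
    'jpg' => array('image/jpeg'),
    'png' => array('image/png'),
    'gif' => array('image/gif'),
    'pdf' => array('application/pdf'),
    'zip' => array('application/zip', 'multipart/x-zip'),
);

// Returns true only when the detected MIME type is listed for this extension.
function extension_matches_mime($extension, $detected_mime, $allowed_types) {
    $extension = strtolower($extension);
    if (!isset($allowed_types[$extension])) {
        return false; // extension is not allowed at all
    }
    return in_array($detected_mime, $allowed_types[$extension], true);
}

// Possible usage ($file is the uploaded temp path, $filename the client name):
// $finfo = finfo_open(FILEINFO_MIME_TYPE);
// $mime  = finfo_file($finfo, $file);
// $ext   = pathinfo($filename, PATHINFO_EXTENSION);
// if (!extension_matches_mime($ext, $mime, $allowed_types)) {
//     die('File #'.$i.': Extension and MIME type do not match.');
// }
```

With this, a JPEG renamed to .zip would be rejected, because image/jpeg is not listed under 'zip'.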

I know what to do with images (try to resize, rotate, crop, etc.), but I have no idea how to validate the other file types.

Now for my questions.

  1. Do you know any good techniques to validate such files? Maybe I should unpack the archives for .zip/.rar files, but what about documents (doc, pdf)?
  2. Will rotating and resizing work for .psd files?
  3. I thought a .psd file had the MIME type application/octet-stream, but when I tried to upload one it was reported as image/vnd.adobe.photoshop. I'm a bit confused about this. Do files always have the same MIME type?

Also, I cannot get the code blocks to render properly. Does anyone have a guess as to why?

Slavonic answered 20/8, 2010 at 22:11 Comment(0)

Lots of file formats have a pretty standard set of starting bytes to indicate the format. If you do a binary read for the first several bytes and test them against the start bytes of known formats it should be a fairly reliable way to confirm the file type matches the extension.

For example, JPEG's start bytes are 0xFF, 0xD8; so something like:

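$fp = fopen("filename.jpg", "rb");
$startbytes = fread($fp, 8);
fclose($fp);
$chunked = str_split($startbytes, 1);
$exts = array();
// str_split() yields one-character strings; ord() converts them to byte values
// so they can be compared with 0xFF and 0xD8.
if (count($chunked) >= 2 && ord($chunked[0]) == 0xFF && ord($chunked[1]) == 0xD8) {
    $exts[] = "jpg";
    $exts[] = "jpeg";
}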

Then check the file's actual extension against $exts. That could work.
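To extend the same idea to more formats, a lookup table of signatures might look like the sketch below. The byte sequences are the commonly documented ones for these formats; the helper name `extensions_for_header` and the way extensions are grouped are assumptions made for illustration, not an exhaustive or authoritative list.

```php
<?php
// Leading byte signatures mapped to the extensions they may belong to.
// docx is listed with zip because a .docx file is a ZIP container.
$signatures = array(
    "\xFF\xD8"          => array('jpg', 'jpeg'),
    "\x89PNG\r\n\x1A\n" => array('png'),
    "GIF87a"            => array('gif'),
    "GIF89a"            => array('gif'),
    "%PDF-"             => array('pdf'),
    "PK\x03\x04"        => array('zip', 'docx'),
);

// Hypothetical helper: returns the extensions whose signature matches the
// start of $header, or an empty array when nothing matches.
function extensions_for_header($header, $signatures) {
    foreach ($signatures as $magic => $exts) {
        // strncmp() is binary safe, so raw bytes compare correctly.
        if (strncmp($header, $magic, strlen($magic)) === 0) {
            return $exts;
        }
    }
    return array();
}

// Possible usage:
// $fp = fopen($path, 'rb');
// $header = fread($fp, 8);
// fclose($fp);
// $ok = in_array($claimed_extension, extensions_for_header($header, $signatures), true);
```

A matching signature only shows that the file starts like the claimed format; it says nothing about the rest of the content.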

Pontoon answered 20/8, 2010 at 22:24 Comment(5)
So, correct me if I'm wrong, if JPEG's start bytes are different than 0xFF, 0xD8; it means file is invalid right? Is there any list of "starting bytes" out there? Or...how can I create it? - Slavonic
Here's a decent list: mikekunz.com/image_file_header.html It's missing PNG, though, but its header is pretty consistent from what I've seen. - Assurgent
@Collin Allen: Thanks a lot! Now I know what to search for. - Slavonic
No need to re-invent the wheel. The standard FileInfo extension (for >= PHP 5.3) and mime_content_type function (for <= PHP 5.2) do this already. - Nugent
@Nugent does PHP do this exactly, or is KeatsKelleher doing it in a more secure manner? - Skylark

If you want to validate images, a good approach is to call getimagesize() and check whether it returns a valid set of dimensions; it fails if the file is not a valid image. Or use a similar function for whatever file types you are trying to support.
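A minimal sketch of that check, assuming only GIF, JPEG and PNG should pass. The helper name `looks_like_image` is made up for this example; getimagesize() and the IMAGETYPE_* constants are standard PHP.

```php
<?php
// Hypothetical helper: true when getimagesize() can parse the file and
// reports one of the expected image types.
function looks_like_image($path, $allowed = array(IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG)) {
    // getimagesize() returns false on failure; @ hides the warning it may emit.
    $info = @getimagesize($path);
    if ($info === false) {
        return false;
    }
    // Index 0 is the width, index 1 the height, index 2 an IMAGETYPE_* constant.
    return $info[0] > 0 && $info[1] > 0 && in_array($info[2], $allowed, true);
}

// Possible usage:
// if (!looks_like_image($_FILES['upload']['tmp_name'])) {
//     die('Not a valid image.');
// }
```

Keep in mind that getimagesize() inspects the image header rather than decoding the whole file, so this is a sanity check, not proof that the file is harmless.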

The key is that the file name means absolutely nothing. The file extensions (.jpg, etc), the mime types... are for humans.

The only way you can guarantee that a file is of the correct type is to open it and evaluate it byte by byte. That is, obviously, a pretty daunting task if you want to try to validate a large number of file types. At the simplest level, you'd look at the first few bytes of the file to ensure that they match what is expected of a file of that type.

Planetstruck answered 20/8, 2010 at 22:19 Comment(1)
Do you know any manual or document for analyzing first bytes? - Slavonic