How does one use magic to verify file type in a Django form clean method?
E

5

6

I have written an email form class in Django with a FileField. I want to check the type of the uploaded file by checking its MIME type, and then limit the accepted file types to PDF, Word, and OpenOffice documents.

To this end, I have installed python-magic and would like to check file types as follows per the specs for python-magic:

mime = magic.Magic(mime=True)
file_mime_type = mime.from_file('address/of/file.txt')

However, recently uploaded files lack addresses on my server. I also do not know of any method of the mime object akin to "from_file_content" that checks for the mime type given the content of the file.

What is an effective way to use magic to verify file types of uploaded files in Django forms?

Earl answered 27/12, 2011 at 17:24 Comment(0)
R
5

Stan described a good approach using a buffer. Unfortunately, its weakness is that the whole file is read into memory. Another option is to use a temporary file:

import tempfile
import magic
with tempfile.NamedTemporaryFile() as tmp:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.flush()  # make sure buffered data is on disk before magic reads it
    print(magic.from_file(tmp.name, mime=True))

Also, you might want to check the file size:

if form.cleaned_data['file'].size < ...:
    print(magic.from_buffer(form.cleaned_data['file'].read(), mime=True))
else:
    # store to disk (the code above)
    pass

Additionally:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

So you might want to handle it like so:

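import os
import tempfile
import magic
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    # close first: this flushes the data and lets Windows reopen the file by name
    tmp.close()
    print(magic.from_file(tmp.name, mime=True))
finally:
    tmp.close()  # no-op if already closed
    os.unlink(tmp.name)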

Also, you might want to seek(0) after read():

if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
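
To illustrate why the rewind matters, here is a minimal sketch that uses io.BytesIO as a stand-in for the uploaded file (an assumption for illustration; the payload bytes are made up). Once read() has consumed the stream, any later reader, such as the code that saves the upload, gets nothing unless the position is reset:

```python
import io

# Stand-in for an uploaded file; the bytes are made up for illustration.
f = io.BytesIO(b"%PDF-1.4 example payload")

first = f.read()    # consumes the whole stream
second = f.read()   # nothing left to read, so this is empty

f.seek(0)           # rewind to the beginning
third = f.read()    # the full content is available again
```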

Where uploaded data is stored

Rollo answered 27/12, 2011 at 21:17 Comment(5)
Thanks, when I try this with cleaned_data, Django notes that the file /tmp/filename.doc is undefined. Do you have an idea why? - Earl
One way or another, your file will be loaded into memory. And I prefer to avoid playing directly with temporary paths. - Unfriended
@Unfriended Have you ever faced a problem where the server crashes due to lack of memory? I think this is why programmers try to avoid reading files into memory, and why Django's UploadedFile has a chunks() method. - Rollo
@AlexeySavanovich: m.from_buffer(request.FILES['my_file_field'].multiple_chunks()) should work in this case. - Unfriended
Thanks a lot! Indeed, multiple_chunks() seems to do the trick without uploading the entire file at once for verification. - Earl
U
5

Why not try something like this in your view:

m = magic.Magic()
m.from_buffer(request.FILES['my_file_field'].read())

Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.

Unfriended answered 27/12, 2011 at 19:46 Comment(2)
This second code is incorrect. multiple_chunks() does not return the chunks; it returns a boolean indicating whether the file is big enough to be split into chunks. docs.djangoproject.com/en/1.5/topics/http/file-uploads/… - Bluestocking
from_buffer expects a string buffer, not an iterator. AFAIK your new code will fail with an AttributeError, as an iterator has no len(). I don't see any nice solution here, apart from taking the first chunk manually. - Bluestocking
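
Taking the first chunk manually, as the comment above suggests, could look roughly like the sketch below. The helper names read_head and sniff_mime are hypothetical, and the 2048-byte default is an assumption based on the python-magic README's suggestion to pass at least that many bytes. The detect parameter is also an illustrative addition, so that the helper can be exercised without libmagic installed; by default it falls back to magic.from_buffer(..., mime=True).

```python
import io


def read_head(uploaded_file, size=2048):
    """Return up to `size` leading bytes and leave the file rewound."""
    uploaded_file.seek(0)
    head = uploaded_file.read(size)
    uploaded_file.seek(0)
    return head


def sniff_mime(uploaded_file, detect=None, size=2048):
    """Detect the MIME type from the leading bytes only."""
    if detect is None:
        # Imported lazily so the helper can be tested without libmagic.
        import magic

        def detect(data):
            return magic.from_buffer(data, mime=True)
    return detect(read_head(uploaded_file, size))
```

In a view or clean method this might be called as sniff_mime(request.FILES['my_file_field']). Only the head of the file is held in memory, although some container formats may need more than the first few kilobytes to be identified precisely.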
D
5
mime = magic.Magic(mime=True)

attachment = form.cleaned_data['attachment']

if hasattr(attachment, 'temporary_file_path'):
    # file is temporary on the disk, so we can get full path of it.
    mime_type = mime.from_file(attachment.temporary_file_path())
else:
    # file is on the memory
    mime_type = mime.from_buffer(attachment.read())

Also, you might want to seek(0) after read():

if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)

This follows an example from Django's own code, which does the same for image fields during validation.
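
To connect the detected type back to the question's restriction to PDF, Word, and OpenOffice documents, a minimal allow-list sketch follows. The names ALLOWED_MIME_TYPES and is_allowed are hypothetical, and the set only covers text documents (.doc, .docx, .odt), so it would need extending for spreadsheets or presentations:

```python
# Registered MIME types for the formats named in the question.
ALLOWED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/msword",  # legacy .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # .docx
    "application/vnd.oasis.opendocument.text",  # .odt
})


def is_allowed(mime_type, allowed=ALLOWED_MIME_TYPES):
    """Return True when the detected MIME type is on the allow-list."""
    return mime_type in allowed
```

Inside a clean_attachment method, one would pass the mime_type computed above to is_allowed and raise forms.ValidationError when it returns False.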

Dynamotor answered 17/10, 2014 at 6:45 Comment(0)
P
2

You can use the django-safe-filefield package to validate that an uploaded file's extension matches its MIME type.

from django import forms
from safe_filefield.forms import SafeFileField

class MyForm(forms.Form):
    attachment = SafeFileField(
        allowed_extensions=('xls', 'xlsx', 'csv')
    )
Product answered 7/2, 2018 at 9:43 Comment(0)
L
0

In case you're handling a file upload and concerned only about images, Django will set content_type for you (or rather for itself?):

from django.core.files import File
from django.db import models
from django.forms import ModelForm


class MyPhoto(models.Model):
    # photo_upload_to is assumed to be defined elsewhere
    photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)


class MyForm(ModelForm):
    class Meta:
        model = MyPhoto
        fields = ['photo']


photo = File(open('1.jpeg', 'rb'))
form = MyForm(files={'photo': photo})
if form.is_valid():
    print(form.instance.photo.file.content_type)

It doesn't rely on content type provided by the user. But django.db.models.fields.files.FieldFile.file is an undocumented property.

Actually, initially content_type is set from the request, but when the form gets validated, the value is updated.

Regarding non-images, doing request.FILES['name'].read() seems okay to me. First, that's what Django does. Second, by default, files larger than 2.5 MB are stored on disk. So let me point you at the other answer here.
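
The 2.5 MB threshold mentioned above is governed by Django's FILE_UPLOAD_MAX_MEMORY_SIZE setting. As a sketch, spelling out the documented default in a settings module would look like this; uploads above the limit are written to a temporary file and expose temporary_file_path(), which is what the other answer checks for:

```python
# settings.py (sketch): Django's documented default, written out explicitly.
# Uploads larger than this many bytes are streamed to a temporary file on disk.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # bytes, i.e. 2.5 * 1024 * 1024
```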


For the curious, here's the stack trace that leads to updating content_type:

django.forms.forms.BaseForm.is_valid: self.errors
django.forms.forms.BaseForm.errors: self.full_clean()
django.forms.forms.BaseForm.full_clean: self._clean_fields()
django.forms.forms.BaseForm._clean_fields: field.clean()
django.forms.fields.FileField.clean: super().clean()
django.forms.fields.Field.clean: self.to_python()
django.forms.fields.ImageField.to_python

Laaspere answered 6/4, 2019 at 19:17 Comment(0)
