I have a model with a FileField, which holds user-uploaded files. Since I want to save space, I would like to avoid duplicates.
What I'd like to achieve:
- Calculate the uploaded file's MD5 checksum
- Store the file with the file name based on its md5sum
- If a file with that name is already there (the new file's a duplicate), discard the uploaded file and use the existing file instead
The first two steps are already working, but how do I discard an uploaded duplicate and use the existing file instead?
Note that I'd like to keep the existing file and not overwrite it (mainly to keep the modified time the same - better for backup).
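To make the intended behaviour concrete, here is a minimal sketch of the three steps in plain Python, outside of Django. The helper names `md5_of_chunks` and `store_deduplicated` and the `root` argument are made up for illustration and are not part of my project or of Django; the path layout mirrors my `upload_to` function. What I'm missing is the equivalent of the "already exists" branch inside Django's upload and storage machinery.

```python
import hashlib
import os


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


def store_deduplicated(root, filename, chunks):
    """Store the content under an MD5-based name unless it already exists.

    Returns (relative_path, created). This is a hypothetical helper for
    illustration only, not a Django API.
    """
    chunks = list(chunks)  # materialize so the data can be hashed, then written
    h = md5_of_chunks(chunks)
    ext = os.path.splitext(filename)[1].lower()
    rel_path = os.path.join('mediafiles', h[0:1], h[1:2], h + ext)
    full_path = os.path.join(root, rel_path)

    if os.path.exists(full_path):
        # Duplicate: leave the existing file (and its mtime) untouched.
        return rel_path, False

    directory = os.path.dirname(full_path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    with open(full_path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
    return rel_path, True
```

Calling `store_deduplicated` twice with the same content should return the same path both times, with `created` being `False` on the second call and the file on disk left unmodified.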
Notes:
- I'm using Django 1.5
- The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler
Code:
import hashlib
import os

from django.db import models


def media_file_name(instance, filename):
    # Name the file after its checksum, sharded by the first two hex digits
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            # Hash the upload chunk by chunk instead of reading it all at once
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
Any help is appreciated!