Django Import-Export: IntegrityError when trying to insert a duplicate record into field(s) with unique or unique_together constraints
Update

I have filed a feature request. The idea is to silently skip over the IntegrityError produced by the database when a unique or unique_together constraint rejects a record that already exists in the database.


I have the following model:

class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
        )
    composition = models.CharField(
        max_length=383,
        )

    class Meta(object):
        unique_together = (('composer_key', 'composition'), )

Using django-import-export in the admin interface, without providing an id for each entry in the CSV file: if one (composer_key, composition) pair in the CSV file already exists in the database, the procedure is interrupted with an integrity error:

duplicate key value violates unique constraint "data_compositions_composer_key_id_12f91ce7dbac16bf_uniq"
DETAIL:  Key (composer_key_id, composition)=(2, Star Wars) already exists.

The CSV file is the following:

id  composer_key    composition
        1           Hot Stuff
        2           Star Wars

The idea was to use skip_row and implement it in the admin.

admin.py:

from django.contrib import admin
from import_export import resources
from import_export.admin import ImportExportModelAdmin

from data.models import Compositions


class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)

This does not cure the problem, however, because skip_row expects an id in the CSV file in order to check whether each row is identical to a specific existing database entry.

Considering that this check is already performed by the database when unique (or unique_together) is used, would it not be effective to catch this error and then return skip_row = True, or alternatively to silently skip the error and continue with the next row?
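A rough sketch of that idea, assuming the model above, the data.models import path used in the script later in the thread, and the older skip_row(self, instance, original) signature that also appears in the answers below, would be to look the unique_together pair up explicitly before each row is saved:

from import_export import resources

from data.models import Compositions  # adjust to your app's module path


class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def skip_row(self, instance, original):
        # Skip the row if the unique_together pair already exists in the
        # database, regardless of the missing id column in the CSV.
        if Compositions.objects.filter(
                composer_key=instance.composer_key,
                composition=instance.composition).exists():
            return True
        return super(CompositionsResource, self).skip_row(instance, original)

This costs one extra query per row; the answers below avoid that by catching the IntegrityError in save_instance or by tracking the pairs in memory.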

Paton answered 12/1, 2016 at 12:35 Comment(9)
The default implementation seems quite functional and should work out of the box. Can you show your CSV? Maybe, the pk attribute is what differs?Dandruff
@AlexMorozov I have updated my question with the relevant csv data. I wonder if removing the unique_together will make any difference, since the implementation is correct. I will give it a try after work.Paton
Is your id column empty? If so, when checking for a duplicate, the module might fool itself, because the new object's pk differs from that of the one already in the database. So it tries to add another record and, yes, fails because of the unique_together constraint. I'd recommend not removing the constraint, but first trying to assign permanent ids to your csv records. Either way, let me know how it went )Dandruff
@AlexMorozov Thank you for helping out! I have added the id: With the auto_now field, only the date field is updated. Without it, as in my example, it works as expected. In my case, it is unrealistic to provide the id: How can I make import_export pass on the integrity error raised?Paton
@Paton I read the issue on github and I'm facing the same. Did you maybe find a solution/workaround ?Brummell
@Brummell Actually, I did write a script that does the job. I am posting it below, although it is written 'on the fly'.Paton
@Brummell Check out the most recent answer on this issue, it is overwhelmingly cool!Paton
have you tried exclude = ('id',) ?Noble
@Noble The idea is not to import the duplicate value but to keep importing the next one without halting at the IntegrityError. Excluding the id will not help in this particular case because the database raises this error whenever a duplicate entry exists in the column, even with a different id.Paton
Only one change is needed, and you can keep using django-import-export.

models.py

    class Compositions(models.Model):
        composer_key = models.ForeignKey(
            Composer,
            )
        composition = models.CharField(
            max_length=383,
            unique=False
            )
        date_created = models.DateTimeField(default=timezone.now)

        class Meta(object):
            unique_together = (('composer_key','composition'),)

Override save_instance with a try/except and ignore the error when saving fails. admin.py:

        class CompositionsResource(resources.ModelResource):

            class Meta:
                model = Compositions
                skip_unchanged = True
                report_skipped = True

            def save_instance(self, instance, using_transactions=True, dry_run=False):
                try:
                    super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
                except IntegrityError:
                    pass

        class CompositionsAdmin(ImportExportModelAdmin):
            resource_class = CompositionsResource

        admin.site.register(Compositions, CompositionsAdmin)

and add this import at the top of admin.py:

from django.db import IntegrityError
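
Putting it together, the imports this admin.py relies on would look roughly like this (the data.models path follows the script later in the thread and is an assumption about your project layout):

from django.contrib import admin
from django.db import IntegrityError
from import_export import resources
from import_export.admin import ImportExportModelAdmin

from data.models import Compositions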
Siva answered 27/3, 2018 at 23:59 Comment(12)
How to write an answer. Add more details to justify your answer.Chalybeate
Thank you, great answer. It would be better to call the parent save_instance to be sure that you do not break something. Instead of instance.save() you may try super().save_instance(self,instance, using_transactions=True, dry_run=False) and if it works add it to the post. Like the code here.Paton
Using super().save_instance(self,instance, using_transactions=True, dry_run=False) does not work. See this: django-import-export.readthedocs.io/en/latest/…Siva
Bear in mind that general error catching should be avoided, and also you have a typo: it is not self,instance, it is self.instance. Except for those, there is an issue with this solution: if a dry run is on the go, it will not pass on saving the instance. This interferes with import-export core functioning. You have a good answer but it cannot be marked as a solution.Paton
OK! What about calling super()? Why did it not work?Paton
It does not save the data from the file to the model. It seems to save, but the changes are not in the model.Siva
Can you try this code after making sure that it is def save_instance(self.instance,...(a dot, not a comma): super().save_instance(self.instance, using_transactions, dry_run)Paton
its works using super(CompositionsResource,self).save_instance(instance, using_transactions, dry_run)Siva
Yes that is the proper way. We need to call super() using the defined variables (using_transactions, dry_run), without redefining those variables. Edit your post, because it is the solution now!Paton
Thank you too! Just edit the first attribute in both def and super() to read self.instance.Paton
super(CompositionsResource,self).save_instance(self.instance, using_transactions, dry_run) creates an error: AttributeError: 'Resource' object has no attribute 'instance'. It works without self.instance.Siva
OK, you are very right. I just saw the source. Then the comma between self and instance is not a typo. To justify that, you need a space between self, and instance.Paton
A note on the accepted answer: it will give the desired result, but with large files it will hammer disk usage and import time.

A more efficient approach I've been using (after spending a lot of time going through the docs) is to override skip_row and keep a class-level set of tuples as an in-memory unique constraint. I still override save_instance as the other answer suggests, of course, to handle any IntegrityErrors that get through.

Python sets don't create duplicate entries, so they seem appropriate for this kind of unique index.

from import_export import resources

from data.models import Compositions  # adjust to your project's app path


class CompositionsResource(resources.ModelResource):
    # Class-level set used to remember (composer_key, composition) pairs
    # seen so far in the current import.
    set_unique = set()

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Clear out anything that may be there from a dry_run,
        # such as the admin mixin preview
        self.set_unique = set()

    def skip_row(self, instance, original):
        composer_key = instance.composer_key  # Could also use composer_key_id
        composition = instance.composition
        tuple_unique = (composer_key, composition)

        if tuple_unique in self.set_unique:
            return True
        self.set_unique.add(tuple_unique)
        return super(CompositionsResource, self).skip_row(instance, original)

    # save_instance override should still go here to pass on IntegrityError

This approach will at least cut down on duplicates encountered within the same dataset. I used it to deal with multiple flat files that were ~60000 lines each, but had lots of repetitive/nested foreign keys. This made that initial data import way faster.
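
For reference, the save_instance override that the comment in the code points to is the one from the earlier answer; a sketch of how it slots into this resource, assuming the same older save_instance signature, looks like this:

from django.db import IntegrityError


class CompositionsResource(resources.ModelResource):
    # ... Meta, before_import and skip_row as shown above ...

    def save_instance(self, instance, using_transactions=True, dry_run=False):
        try:
            super(CompositionsResource, self).save_instance(
                instance, using_transactions, dry_run)
        except IntegrityError:
            # A duplicate that already exists in the database (rather than
            # in this dataset) still raises; swallow it so the import continues.
            pass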

Bradstreet answered 30/1, 2019 at 23:32 Comment(0)
models.py:

class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
        )
    composition = models.CharField(
        max_length=383,
        unique=False
        )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key','composition'),)

This is a script I wrote 'on the fly' for the above model in order to automatically discard duplicate entries. I saved it to ./project_name/csv.py and import it from the shell after filling the relevant columns of the file duc.csv with data. The columns should not contain headers, only data.

$./manage.py shell
>>> from project_name import csv

csv.py:

from data.models import Composer, Compositions
import csv
import sys, traceback
from django.db import IntegrityError
from django.utils import timezone

filename = '/path/to/duc.csv'

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        print (each_line)
        try:
            instance = Compositions(
                id=None,
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3]
            )
            instance.save()
            print ("Saved composition: {0}".format(each_line[3]))
        except IntegrityError:
            # The duplicate (composer_key, composition) pair is discarded.
            exc_type, exc_value, exc_traceback = sys.exc_info()  # debugging mostly
            print (exc_value)
Paton answered 22/7, 2016 at 19:5 Comment(0)
