Performance optimisation for connecting data in Django models

At my job, I often have two tables in my Django models and have to connect them, for example to return the combined data as a CSV. The data is not connected by a foreign key, but the rows share an identifier. This is because we import the data from two different sources, and sometimes the counterpart is missing, so I can't connect them when creating the entry.

My question is: what is the best way to connect this data in terms of performance, given that I have to return it often?

  1. Create a new model that connects the data (like an m2m through table), or a parent class with the identifier that both models are connected to (a sketch of a job that populates this connector follows after the second option):
from django.db import models

class OrderInvoiceConnector(models.Model):
    order_data = models.ForeignKey(Order, on_delete=models.CASCADE, related_name="invoice")
    invoice_data = models.ForeignKey(Invoice, on_delete=models.CASCADE, related_name="order")
  2. Create a new model that saves only the data that is needed for the CSV export. Something like:
class ConnectedData(models.Model):
    invoice_id = models.CharField(max_length=255)
    country_iso = models.CharField(max_length=255)
    invoice_date = models.CharField(max_length=255)
    tax = models.FloatField()  # FloatField takes no max_length argument
    price = models.FloatField()
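For either option, the expensive matching step can run in a scheduled job instead of at export time. Below is a minimal sketch of such a job for the first option; the model names come from the question, but the shared identifier field (called external_id here) and its presence on both models are assumptions.

def link_orders_to_invoices():
    # Assumes Order, Invoice and OrderInvoiceConnector live in the same models
    # module, and that both sources write the shared identifier into a field
    # named external_id (an assumed name).
    linked = OrderInvoiceConnector.objects.values_list("order_data_id", flat=True)
    # Index invoices by the shared identifier for O(1) lookups.
    invoices = {invoice.external_id: invoice for invoice in Invoice.objects.all()}
    connectors = [
        OrderInvoiceConnector(order_data=order, invoice_data=invoices[order.external_id])
        for order in Order.objects.exclude(pk__in=linked)
        if order.external_id in invoices  # skip rows whose counterpart is still missing
    ]
    OrderInvoiceConnector.objects.bulk_create(connectors)

Once the connector rows exist, the export query can fetch both sides in a single query with select_related("order_data", "invoice_data") instead of issuing one query per row.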
Timelag asked 14/12, 2015 at 13:5 Comment(4)
It's hard to give a proper answer, but I'll comment on what I'd do; it might be helpful, though it depends on a few things. If the user is waiting for a CSV file to download, I'd probably go with option 2, but this depends on how often the data changes and how long it takes to generate; a few times a day with small amounts of data is OK. Also, if the CSV is not customisable by the user in the Django project, I'd think about generating it before the user requests it. Otherwise, your first option. That said, this depends on a lot of things. – Shipley
Could you provide a bit more info on what the performance issue is? Does the DB join take a few seconds, minutes, or hours? Is the user waiting for this? How often does the data change? – Shipley
The data changes once a day right now. The user will be able to download the CSV from a normal website, with filtering options (e.g. filter by date). The join takes 2 hours for 150,000 records on each side. – Timelag
Yep, then the second option too. I wish I had enough orders to make a join last 2 hours. FYI, I would actually look at the query the ORM is creating and try to find a better structure (see the indexing sketch below). Even if you have millions of rows in each table you should get better performance than that. – Shipley
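Before denormalising, it is worth checking that the join columns are indexed, as the last comment suggests; an unindexed join over two 150,000-row tables is a plausible cause of a runtime like that. A minimal sketch, assuming the shared identifier is a CharField named external_id on both models:

from django.db import models

class Order(models.Model):
    # db_index=True makes the database build an index on the join column
    # (external_id is an assumed field name).
    external_id = models.CharField(max_length=255, db_index=True)

class Invoice(models.Model):
    external_id = models.CharField(max_length=255, db_index=True)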

I would go with the second variant: as you mentioned, the join is expensive, and the data changes on a daily basis. If you create a read-only model, you'll be packing all the data required for user consumption into a single-table query. You'd have to populate the data with some automated job, but that looks acceptable in the scenario you describe.
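A minimal sketch of such a populate job for the ConnectedData model from the question, assuming hypothetical field names on the source models (external_id, country_iso, date and tax on Invoice; external_id and price on Order):

def rebuild_connected_data():
    # Rebuild the flat export table from scratch, e.g. once a day from cron.
    invoices = {invoice.external_id: invoice for invoice in Invoice.objects.all()}
    rows = []
    for order in Order.objects.all():
        invoice = invoices.get(order.external_id)
        if invoice is None:
            continue  # counterpart not imported yet
        rows.append(ConnectedData(
            invoice_id=invoice.external_id,
            country_iso=invoice.country_iso,
            invoice_date=str(invoice.date),
            tax=invoice.tax,
            price=order.price,
        ))
    ConnectedData.objects.all().delete()
    ConnectedData.objects.bulk_create(rows, batch_size=1000)

The CSV view then reads from the single table; with values_list plus Python's csv module the export becomes one plain query:

import csv
from django.http import HttpResponse

def export_csv(request):
    response = HttpResponse(content_type="text/csv")
    writer = csv.writer(response)  # HttpResponse is file-like, so csv can write to it
    writer.writerow(["invoice_id", "country_iso", "invoice_date", "tax", "price"])
    for row in ConnectedData.objects.values_list(
            "invoice_id", "country_iso", "invoice_date", "tax", "price"):
        writer.writerow(row)
    return response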

Nocuous answered 14/12, 2015 at 16:38 Comment(0)
