I am trying to build a Django app that functions sort of like a store. Items are scraped from around the internet and continuously update the Django project's database over time (say, every few days). I am using the Scrapy framework to do the scraping, and while there is an experimental DjangoItem feature, I would rather stay away from it because it is unstable.
Right now my plan is to create XML files of crawled items with Scrapy's XmlItemExporter and load them into the Django project as XML fixtures with loaddata. This seems okay because if either of the two processes screws up, there is a file intermediary between them, and modularizing the application as a whole doesn't seem like a bad idea either.
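Here is roughly what I have in mind for the export side: a minimal item pipeline built around XmlItemExporter. This is just a sketch, assuming a recent Scrapy (older releases import the exporter from scrapy.contrib.exporter instead), and the output file name is made up:

```python
from scrapy.exporters import XmlItemExporter

class XmlExportPipeline(object):
    """Write every scraped item into a single XML file."""

    def open_spider(self, spider):
        # One output file per crawl; the name is arbitrary.
        self.file = open('scraped_items.xml', 'wb')
        self.exporter = XmlItemExporter(self.file,
                                        root_element='items',
                                        item_element='item')
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
```

The pipeline would then be enabled through ITEM_PIPELINES in the Scrapy settings.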
Some concerns are:
- That these files might be too large to read into memory for Django's loaddata.
- That I am spending too much time on this when there might be a better or easier solution, such as writing directly to the database (MySQL in this case) - I've sketched that alternative at the end of this list.
- No one seems to have written about this process online, which is strange considering that Scrapy, in my opinion, is an excellent framework to plug into a Django app.
- There is no definitive guide to manually creating Django fixtures in Django's docs - the documentation seems geared more towards dumping and reloading fixtures from the app itself. I've included below what I understand the expected fixture format to be.
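For reference, this is the XML fixture format I believe loaddata expects, pieced together from what dumpdata produces; the store.item model and its fields are hypothetical:

```xml
<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
  <object pk="1" model="store.item">
    <field type="CharField" name="name">Example widget</field>
    <field type="DecimalField" name="price">9.99</field>
  </object>
</django-objects>
```

This is clearly not what XmlItemExporter emits out of the box, so I assume I would have to either post-process its output into this shape or write files in this format directly before running python manage.py loaddata scraped_items.xml.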
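And here is the direct-to-database alternative mentioned above: running Django's ORM inside the Scrapy process. A minimal sketch, assuming Django 1.7+ for django.setup(); the settings module, the store.models.Item model, and the use of url as a natural key are all made up for illustration:

```python
import os
import django

# Point at the Django project before importing any models.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mystore.settings')
django.setup()

from store.models import Item  # hypothetical Django model


class DjangoWriterPipeline(object):
    """Save each scraped item straight into the MySQL database via the ORM."""

    def process_item(self, item, spider):
        # update_or_create keeps periodic re-crawls from duplicating rows.
        Item.objects.update_or_create(
            url=item['url'],
            defaults={'name': item['name'], 'price': item['price']},
        )
        return item
```

This skips the file intermediary entirely, which is exactly the trade-off I'm unsure about.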
The existence of the experimental DjangoItem suggests that Scrapy + Django is a popular enough combination for there to be a good solution here.
I would greatly appreciate any solutions, advice, or wisdom on this matter.