Is it possible to access my django models inside of a Scrapy pipeline, so that I can save my scraped data straight to my model?
I've seen this, but I don't really get how to set it up?
If anyone else is having the same problem, this is how I solved it.
I added this to my scrapy settings.py file:
def setup_django_env(path):
    import imp
    from django.core.management import setup_environ

    # Locate and load the Django project's settings module, then point
    # Django's environment at it.
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)
    setup_environ(project)

setup_django_env('/path/to/django/project/')
Note: the path above is to your django project folder, not the settings.py file.
Now you will have full access to your django models inside of your scrapy project.
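For instance, a pipeline can then write scraped items straight to a model. A minimal sketch, assuming a placeholder model myapp.models.MyModel with title and url fields (none of these names come from the original answer):

# pipelines.py in the scrapy project -- a sketch, not the answer's own code.
from myapp.models import MyModel

class DjangoWriterPipeline(object):
    def process_item(self, item, spider):
        # Persist each scraped item through the Django ORM.
        MyModel.objects.create(title=item['title'], url=item['url'])
        return item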
setup_django_env('/path/to/django/project/project/') – Overjoy

Import Error. Any chance you'd be willing to look at my [question](#14686723) and offer advice? – Sprite

Running sudo scrapy deploy default -p eScraper gives:
Building egg of eScraper-1370604165
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying eScraper-1370604165 to http://localhost:6800/addversion.json
Server response (200): {"status": "error", "message": "ImportError: Error loading object 'eScraper.pipelines.EscraperPipeline': No module named eScraperInterfaceApp.models"} – Renter

from django.core.management import setup_environ will not work... then how to do this? – Snowbird

The opposite solution (set up scrapy in a django management command):
# -*- coding: utf-8 -*-
# myapp/management/commands/scrapy.py
from __future__ import absolute_import
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def run_from_argv(self, argv):
        # Stash the raw argv so Scrapy can parse it itself, bypassing
        # Django's own option parsing.
        self._argv = argv
        self.execute()

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        # Hand everything after "manage.py" over to Scrapy's command line.
        execute(self._argv[1:])
and in django's settings.py:
import os
os.environ['SCRAPY_SETTINGS_MODULE'] = 'scrapy_project.settings'
Then instead of scrapy foo, run ./manage.py scrapy foo.
UPD: fixed the code to bypass django's options parsing.
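Because the wrapper hands the raw argv through to Scrapy, spider names and Scrapy's own options pass along unchanged. For example (the dmoz spider and the output options are just illustrations, not part of the original answer):

./manage.py scrapy list
./manage.py scrapy crawl dmoz -o scraped_data.json -t json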
When I run python manage.py scrapy list inside the Django project folder, I always get ImportError: No module named cmdline. However, the module named cmdline does exist, and the site-packages directory of my Python installation is on the PYTHONPATH as well. What am I doing wrong? Thanks in advance! – Infirmity

from __future__ import absolute_import wouldn't be necessary in Python 2.7. That's why I commented it out, but it only works with this line. Generally, I have some problems with understanding absolute and relative paths in Python; I definitely should read into this a bit more. Anyway, thanks for your help! – Infirmity

How do I pass options such as -o scraped_data.json -t json? I know how to add options to commands in general, but how do I link them to Scrapy's counterparts? – Infirmity

myapp.management.commands.scrapy.Command().run_from_argv(['', 'crawl', 'dmoz']) – Defection

myapp.management.commands.scrapy.Command().run_from_argv(['scrapy', '', 'crawl', 'dmoz']) – Liver

I'd like to run the scrapyd server. However, when I execute python manage.py scrapy server, I get scrapy.exceptions.NotConfigured: Unable to find scrapy.cfg file to infer project data dir. How do I resolve this? – Infirmity
Infirmity Add DJANGO_SETTINGS_MODULE env in your scrapy project's settings.py
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'your_django_project.settings'
Now you can use DjangoItem in your scrapy project.
Edit:
You have to make sure that the your_django_project project's settings.py is reachable on your PYTHONPATH.
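With the settings module importable, a DjangoItem can wrap a model directly. A minimal sketch, assuming a placeholder model myapp.models.Article (in newer Scrapy versions DjangoItem lives in the separate scrapy-djangoitem package rather than in Scrapy itself):

# items.py in the scrapy project -- Article is a placeholder model name.
from scrapy_djangoitem import DjangoItem
from myapp.models import Article

class ArticleItem(DjangoItem):
    # Item fields are derived from the model's fields automatically.
    django_model = Article

A pipeline can then call item.save() to persist the scraped data.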
For Django 1.4, the project layout has changed. Instead of /myproject/settings.py, the settings module is in /myproject/myproject/settings.py.
I also added the path's parent directory (/myproject) to sys.path to make it work correctly.
def setup_django_env(path):
    import imp, os, sys
    from django.core.management import setup_environ

    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)
    setup_environ(project)

    # Add path's parent directory to sys.path
    sys.path.append(os.path.abspath(os.path.join(path, os.path.pardir)))

setup_django_env('/path/to/django/myproject/myproject/')
setup_environ is deprecated starting from version 1.4. – Infirmity
Infirmity Check out django-dynamic-scraper, it integrates a Scrapy spider manager into a Django site.
Why not create an __init__.py file in the scrapy project folder and hook it up in INSTALLED_APPS? It worked for me; I was able to simply use:
from my_app.models import MyModel
Hope that helps.
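The hookup might look like this in Django's settings.py (a sketch; 'scrapy_project' stands in for whatever package now contains the __init__.py):

# Django settings.py -- 'scrapy_project' is a placeholder name.
INSTALLED_APPS = [
    # ... your existing apps ...
    'scrapy_project',
]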
setup_environ is deprecated. For newer versions of Django (1.7+, where django.setup() is available), you may need to do the following in scrapy's settings file:

def setup_django_env():
    import sys, os, django

    # Make the Django project importable and point at its settings
    # before initialising Django.
    sys.path.append('/path/to/django/myapp')
    os.environ['DJANGO_SETTINGS_MODULE'] = 'myapp.settings'
    django.setup()

setup_django_env()
Minor update to solve a KeyError. Python 3 / Django 1.10 / Scrapy 1.2.0:
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Scrapy commands. Accessible from: "Django manage.py".'

    def __init__(self, stdout=None, stderr=None, no_color=False):
        super().__init__(stdout=None, stderr=None, no_color=False)

        # Optional attribute declaration.
        self.no_color = no_color
        self.stderr = stderr
        self.stdout = stdout

        # Actual declaration of CLI command
        self._argv = None

    def run_from_argv(self, argv):
        self._argv = argv
        self.execute(stdout=None, stderr=None, no_color=False)

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        execute(self._argv[1:])
The SCRAPY_SETTINGS_MODULE declaration is still required.
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'scrapy_project.settings')