The best-known pluggable app for that is Django-Haystack, which lets you connect to several search backends:
- Solr / Lucene, the buzzword-compliant Apache Foundation project
- Whoosh, a pure-Python search library
- Xapian, another very good semantic search engine
Haystack gives you an API that looks like Django's own QuerySet syntax for querying these search engines directly (each of which happens to have its own API and dialect).
If you're just after scraping tools, then whichever one you use (BeautifulSoup or Scrapy), you'll be on your own: you write the Python code that parses what you want to parse and then populates your Django models.
This can even live in separate Python scripts, run as custom management commands (modules in your app's management/commands/ package).
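As a rough illustration of that do-it-yourself parsing step, here is a minimal sketch. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup or Scrapy so that it runs without extra dependencies. The sample HTML, the `LinkCollector` class and the `extract_links` helper are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and text of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per link found
        self._current = None  # the link currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._current = {"url": href, "title": ""}

    def handle_data(self, data):
        # Only accumulate text while we are inside an <a> tag
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.rows.append(self._current)
            self._current = None


def extract_links(html):
    """Return a list of {'url': ..., 'title': ...} dicts found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page fragment standing in for whatever you scrape
SAMPLE = (
    '<ul><li><a href="/a.pdf">First file</a></li>'
    '<li><a href="/b.pdf"> Second </a></li></ul>'
)
rows = extract_links(SAMPLE)
# Each row could then populate a model, e.g. Document.objects.create(**row),
# where Document is an assumed model with url and title fields.
```

In a real project the last step would run inside a management command, so that the Django settings and models are loaded before you save anything.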
If you have a lot of files to search, you will probably need an index that is rebuilt frequently and allows fast searches without hitting the Django ORM.
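To make the idea of an index concrete, here is a toy inverted index in plain Python. It is only a sketch of the principle (word to document ids lookup), not how Solr, Whoosh or Xapian are actually implemented, and the `documents` data and function names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical data: primary key -> text extracted from a file
documents = {
    1: "Annual report for the Django project",
    2: "Meeting notes about the search index",
    3: "Django search backends compared",
}
index = build_index(documents)
matches = search(index, "django search")
```

The lookup only touches the index, so the database is needed just to fetch the matching rows by primary key afterwards. The price is that the index has to be rebuilt or updated whenever the underlying data changes.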
Using a Solr index (for example) lets you create extra fields on the fly, such as virtual fields derived from your real model's fields (e.g. splitting the author's first and last name, or adding an uppercased file title field).
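The derived fields mentioned above amount to a small transformation applied to each record before it is sent to the index. A minimal sketch in plain Python follows; the `prepare_document` function and the field names are assumptions for illustration. With Haystack, this role is typically played by `prepare_<fieldname>` methods on a `SearchIndex` class.

```python
def prepare_document(record):
    """Build the dict sent to the index from a model-like record."""
    # Naive split on the first space; real names need more care
    first, _, last = record["author"].partition(" ")
    return {
        "id": record["id"],
        "title": record["title"],
        "author": record["author"],
        # Derived ("virtual") fields that exist only in the index
        "author_firstname": first,
        "author_lastname": last,
        "title_upper": record["title"].upper(),
    }


# Hypothetical record, standing in for a Django model instance
record = {"id": 7, "title": "Quarterly report", "author": "Ada Lovelace"}
doc = prepare_document(record)
```

The model itself stays unchanged: the extra fields live only in the search index and can be added or dropped by reindexing.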
Of course, if you don't need fast indexing, keyword boosting or semantic analysis, you can still do a classic full-text search over a couple of Django model fields: