You have neglected to talk about the most important part: the shape of your data. That's really what matters here. "Design patterns" are a distraction; many of these patterns exist to work around language limitations that Python doesn't have, and they introduce unnecessary rigidity.
- Look first at the shape of your data. E.g.:
  - First you have XML.
  - Then you have some collection of data extracted from the XML (a simple dict? A nested dict? What data do you need? Is it homogeneous or heterogeneous? This is the most important question, but you don't talk about it!).
  - Then you serialize/persist this data in an SQL backend.
- Then design "interfaces" (verbal descriptions) of the methods, properties, or even just items in a dict or tuple that facilitate operations on that data. If you keep it simple and stick to native Python types, you may not even need classes, just functions and dicts/tuples.
- Iterate until you have the level of abstraction your application needs.
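To make the shape progression concrete, here is a minimal sketch of XML → dict → SQL row using only the standard library (the element names and table schema are made up for illustration):

```python
import sqlite3
import xml.etree.ElementTree as ET

xml = '<item><field1>a</field1><field2>1</field2></item>'

# Shape 1 (XML) -> Shape 2 (a plain dict of native types)
root = ET.fromstring(xml)
data = {child.tag: child.text for child in root}

# Shape 3: a persisted row in an SQL backend
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (field1 TEXT, field2 TEXT)')
conn.execute('INSERT INTO items VALUES (?, ?)',
             (data['field1'], data['field2']))
row = conn.execute('SELECT * FROM items').fetchone()
```

No classes anywhere: each stage is a plain function call over plain data.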
For example, the interface for an "extractor" might be "an iterable that yields XML strings". Note that this could be either a generator or a class with `__iter__` and `__next__` methods! No need to define an abstract class and subclass it.
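Both forms satisfy the same "iterable that yields XML strings" interface without any shared base class (the names here are hypothetical):

```python
# Generator form of the "extractor" interface
def file_extractor(paths):
    for path in paths:
        with open(path) as f:
            yield f.read()

# Class form: __iter__/__next__ make it equally iterable
class ListExtractor:
    def __init__(self, xml_strings):
        self._strings = iter(xml_strings)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._strings)

docs = list(ListExtractor(['<a/>', '<b/>']))
```

Any code written against the interface (e.g. `for xml in extractor: ...`) works with either one unchanged.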
What kind of configurable polymorphism you add on top of your data depends on its exact shape. For example, you could use a naming convention:
```python
# persisters.py
def persist_foo(data):
    pass
```

```python
# main.py
import persisters

data = {'type': 'foo', 'values': {'field1': 'a', 'field2': [1, 2]}}
try:
    foo_persister = getattr(persisters, 'persist_' + data['type'])
except AttributeError:
    # no 'foo' persister is available!
    raise
```
Or, if you need further abstraction (maybe you need to support new modules you don't control), you could use a registry (which is just a dict) and a module convention:
```python
# registry.py
def register(registry, method, type_):
    """Return a decorator that registers a callable in a registry for the method and type."""
    def register_decorator(callable_):
        registry.setdefault(method, {})[type_] = callable_
        return callable_
    return register_decorator

def merge_registries(r1, r2):
    for method, types in r2.items():
        r1.setdefault(method, {}).update(types)

def get_callable(registry, method, type_):
    try:
        return registry[method][type_]
    except KeyError:
        raise KeyError('No {} method for type {} in registry'.format(method, type_))

def retrieve_registry(module):
    try:
        return module.get_registry()
    except AttributeError:
        return {}

def add_module_registry(yourregistry, *modules):
    for module in modules:
        merge_registries(yourregistry, retrieve_registry(module))
```
```python
# extractors.py
from registry import register

_REGISTRY = {}

def get_registry():
    return _REGISTRY

@register(_REGISTRY, 'extract', 'foo')
def foo_extractor(source):
    print('extracting_foo')
```
```python
# main.py
import extractors
import registry

my_registry = {}
registry.add_module_registry(my_registry, extractors)
foo_extractor = registry.get_callable(my_registry, 'extract', 'foo')
```
You can easily build a global registry on top of this structure if you want (although you should avoid global state, even if avoiding it is a little less convenient).
If you are building a public framework, need maximum extensibility and formalism, and are willing to pay in complexity, you can look at something like `zope.interface` (which is used by Pyramid).
Rather than rolling your own extract-transform-load app, have you considered Scrapy? Using Scrapy you would write a "Spider", which is given a string and returns sequences of Items (your data) or Requests (requests for more strings, e.g. URLs to fetch). The Items are sent down a configurable item pipeline, which does whatever it wants with the items it receives (e.g. persists them in a DB) before passing them along.
Even if you don't use Scrapy, you should adopt a data-centric pipeline-like design and prefer thinking in terms of abstract "callable" and "iterable" interfaces instead of concrete "classes" and "patterns".
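A data-centric pipeline in that spirit can be as small as a few chained generators, with the sink being any callable (all names here are illustrative):

```python
# Each stage consumes an iterable and yields plain dicts.
def extract(sources):
    """iterable of XML strings -> iterable of record dicts"""
    for xml in sources:
        yield {'type': 'foo', 'raw': xml}

def transform(records):
    """enrich each record in place"""
    for record in records:
        record['length'] = len(record['raw'])
        yield record

def persist(records, sink):
    """sink is any callable, e.g. a DB writer or list.append"""
    for record in records:
        sink(record)

results = []
persist(transform(extract(['<a/>', '<bb/>'])), results.append)
```

Swapping the SQL backend for a test double is then just passing a different `sink`; no class hierarchy needed.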