Good design pattern(s) for extensible program [closed]

I have a question about how to make a good design for my program. My program is quite simple but I want to have good architecture and make my program easily extensible in the future.

My program needs to fetch data from external data sources (XML), extract information from that data, and finally prepare SQL statements to import the information into a database. So for all the external data sources that exist now and will exist in the future, my application follows a simple 'flow': fetch, extract and load.

I was thinking about creating generic classes called: DataFetcher, DataExtractor and DataLoader and then write specific ones that will inherit from them. I suppose I will need some factory design pattern, but which? FactoryMethod or Abstract Factory?

I want also to not use code like this:

if data_source == 'X':
     fetcher = XDataFetcher()
elif data_source == 'Y':
     fetcher = YDataFetcher()
....

Ideally (I am not sure if this is easily possible), I would like to write a new 'data source processor', add one or two lines to the existing code, and have my program load data from the new data source.

How can I make use of design patterns to accomplish my goals? If you could provide some examples in Python, that would be great.

Affable answered 8/2, 2013 at 15:50 Comment(0)
C
10

If the fetchers all have the same interface, you can use a dictionary:

fetcher_dict = {'X':XDataFetcher,'Y':YDataFetcher}
data_source = ...
fetcher = fetcher_dict[data_source]()
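
A slightly fuller sketch of the same idea, assuming two hypothetical fetcher classes that share a fetch() method. The class bodies, the FETCHERS name and the make_fetcher helper are illustrative and not taken from the question:

```python
class XDataFetcher:
    """Hypothetical fetcher for data source 'X'."""
    def fetch(self):
        return "<data source='X'/>"


class YDataFetcher:
    """Hypothetical fetcher for data source 'Y'."""
    def fetch(self):
        return "<data source='Y'/>"


# Map each data source name to the class that handles it.
FETCHERS = {"X": XDataFetcher, "Y": YDataFetcher}


def make_fetcher(data_source):
    """Look up and instantiate the fetcher registered for data_source."""
    try:
        fetcher_class = FETCHERS[data_source]
    except KeyError:
        raise ValueError("unknown data source: {!r}".format(data_source))
    return fetcher_class()
```

Supporting a new data source then means writing one class and adding one entry to the dictionary.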

As far as keeping things flexible -- Just write clean idiomatic code. I tend to like the "You ain't gonna need it" (YAGNI) philosophy. If you spend too much time trying to look into the future to figure out what you're going to need, your code will end up too bloated and complex to make the simple adjustments when you find out what you actually need. If the code is clean up front, it should be easy enough to refactor later to suit your needs.

Chessy answered 8/2, 2013 at 15:55 Comment(13)
and you could even put fetcher_dict in a database so you don't have to modify your code (apart from the fetcher code of course)Tympanum
@mgilson reads my mind. Don't waste too much effort on design patterns. They are far too overrated in terms of code design. See them as pointers for fixing certain problems. Don't build your design around them; otherwise you will end up with an unreadable mess of code. I have seen that too many times.Tympanum
Maps are switch statements in disguise ;)Resplendence
This is a dict. Maps in python are another beast than they are in java.Tympanum
@RickyA: In Python all mapping types are currently variations on dict.Instill
I tend to disagree with the sentiment being expressed here that basically it's not worth it to put much effort into the design of your code. Getting it right often does require more up-front effort, but saves you time later. Too many people, IMHO, just want to jump in and start coding right away, especially with interpreted languages like Python.Instill
@Instill -- regarding the statement that "all mappings are variations of dict" -- that's not true -- Users could define mappings that aren't dict. As far as the other point, it's a matter of preference certainly. but I've seen it taken too far a couple too many times: def square(x): return x*x. That's a pretty useless function to have around "just in case I want to change square later". Same thing with making classes out of things that should really be regular functions, etc. Of course, I'm not saying "inline everything" either ... Just trying to be practical.Chessy
@Instill -- And of course, the design strategy needs to match the ultimate goals as well. If you're trying to create some super-powerful,intuitive library for performing some task (e.g. hdf5 file format) -- Then it makes a little more sense to spend extra effort making things a bit more flexible.Chessy
As for my comment about mapping, I meant the predefined ones mentioned (dict, collections.defaultdict, collections.OrderedDict and collections.Counter), not every conceivable one possible -- indeed many functions are mappings.Instill
As for my other point about design: def square(x) certainly isn't an abstraction, so using it as an example is a bit of a straw man. Something more like def mathematical_function(x) of which square might be one of several possibilities would be more demonstrative. Obviously you wouldn't do that for something unlikely to ever vary, but the important point is that you thought about it and made a decision before writing a bunch of code that depends on it and will now have to be changed because you didn't allow for it because you never thought about it.Instill
@atamanroman: I think you got it backwards, switch statements are maps (although Python doesn't formally have them).Instill
My statement had nothing to do with specific implementations. The concept of maps should be similar in almost any language. I just see this "oh so many if else stuff, let's use a map instead" quite often when it's about different implementations.Resplendence
Maybe I tend to complicate things, but I just want to write good code. I know that in two months' time there will be a new data source that will need to be processed, so I want to prepare. I found something like this: code.activestate.com/recipes/… Is it good? I forgot to mention that each fetcher will be called as a separate script, because each data source refreshes its info at different times. With this in mind, I don't think your code sample is good: I would have to change every script to update fetcher_dict...Affable
U
1

You have neglected to talk about the most important part i.e. the shape of your data. That's really the most important thing here. "Design Patterns" are a distraction--many of these patterns exist because of language limitations that Python doesn't have and introduce unnecessary rigidity.

  1. Look first at the shape of your data. E.g.:
    1. First you have XML
    2. Then you have some collection of data extracted from the XML (a simple dict? A nested dict? What data do you need? Is it homogeneous or heterogeneous? This is the most important thing but you don't talk about it!)
    3. Then you serialize/persist this data in an SQL backend.
  2. Then design "interfaces" (verbal descriptions) of methods, properties, or even just items in a dict or tuple to facilitate operations on that data. If you keep it simple and stick to native Python types, you may not even need classes, just functions and dicts/tuples.
  3. Iterate until you have the level of abstraction you need for your application.

For example, the interface for an "extractor" might be "an iterable that yields XML strings". Note that this could be either a generator or a class with an __iter__ method and a __next__() method (next() in Python 2). No need to define an abstract class and subclass it!
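
As a minimal sketch of that "iterable that yields XML strings" contract, here are a generator and a class that both satisfy it, plus a consumer that cannot tell them apart. All names are hypothetical and chosen only for illustration:

```python
def xml_strings_from_list(documents):
    """Generator version: yields XML strings one at a time."""
    for document in documents:
        yield document


class XmlStringList:
    """Class version: any object with __iter__ satisfies the same contract."""
    def __init__(self, documents):
        self.documents = documents

    def __iter__(self):
        return iter(self.documents)


def count_documents(xml_strings):
    """Consumer that relies only on being handed an iterable of strings."""
    return sum(1 for _ in xml_strings)
```

Neither producer inherits from a shared base class; the consumer only requires that it can loop over what it is given.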

What kind of configurable polymorphism you add to your data depends on the exact shape of your data. For example you could use convention:

# persisters.py

def persist_foo(data):
    pass

# main.py
import persisters

data = {'type': 'foo', 'values': {'field1': 'a', 'field2': [1, 2]}}
try:
    # look up persist_<type> by name on the persisters module
    foo_persister = getattr(persisters, 'persist_' + data['type'])
except AttributeError:
    # no 'foo' persister is available!
    foo_persister = None

Or if you need further abstraction (maybe you need to add new modules you can't control), you could use a registry (which is just a dict) and a module convention:

# registry.py
def register(registry, method, type_):
    """Returns a decorator that registers a callable in a registry for the method and type"""
    def register_decorator(callable_):
        registry.setdefault(method, {})[type_] = callable_
        return callable_
    return register_decorator

def merge_registries(r1, r2):
    """Merges every method -> {type: callable} entry of r2 into r1"""
    for method, callables in r2.items():
        r1.setdefault(method, {}).update(callables)

def get_callable(registry, method, type_):
    try:
        callable_ = registry[method][type_]
    except KeyError:
        raise KeyError(
            'No {} method for type {} in registry'.format(method, type_))
    return callable_

def retrieve_registry(module):
    try:
        return module.get_registry()
    except AttributeError:
        return {}

def add_module_registry(yourregistry, *modules):
    for module in modules:
        # each module exposes its registry through get_registry()
        merge_registries(yourregistry, retrieve_registry(module))

# extractors.py
from registry import register

_REGISTRY = {}

def get_registry():
    return _REGISTRY


@register(_REGISTRY, 'extract', 'foo')
def foo_extractor(abc):
    print('extracting_foo')

# main.py

import extractors, registry

my_registry = {}
registry.add_module_registry(my_registry, extractors)

foo_extracter = registry.get_callable(my_registry, 'extract', 'foo')

You can easily build a global registry on top of this structure if you want (although you should avoid global state even if it's a little less convenient.)

If you are building a public framework, need maximum extensibility and formalism, and are willing to pay for it in complexity, you can look at something like zope.interface (which is used by Pyramid).

Rather than roll your own extract-transform-load app, have you considered scrapy? Using scrapy you would write a "Spider" which is given a string and returns sequences of Items (your data) or Requests (requests for more strings, e.g. URLs to fetch). The Items are sent down a configurable item pipeline which does whatever it wants with the items it receives (e.g. persist in a DB) before passing them along.

Even if you don't use Scrapy, you should adopt a data-centric pipeline-like design and prefer thinking in terms of abstract "callable" and "iterable" interfaces instead of concrete "classes" and "patterns".
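
A minimal sketch of such a pipeline, assuming each stage is a plain generator function that consumes one iterable and yields another. The stage names, the XML layout and the 'items' table are hypothetical:

```python
import xml.etree.ElementTree as ET


def fetch(documents):
    """Stage 1: yield raw XML strings (here from an in-memory list)."""
    for document in documents:
        yield document


def extract(xml_strings):
    """Stage 2: turn each XML string into a plain dict."""
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        yield {"name": root.findtext("name"), "price": root.findtext("price")}


def load(records):
    """Stage 3: yield (sql, params) pairs for a hypothetical 'items' table."""
    for record in records:
        yield ("INSERT INTO items (name, price) VALUES (?, ?)",
               (record["name"], record["price"]))


def run_pipeline(documents):
    """Chain the stages; each one only needs an iterable from the previous."""
    return list(load(extract(fetch(documents))))
```

Swapping in a different data source means replacing one stage with another callable that yields the same kind of items; the other stages stay untouched.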

Ulysses answered 8/2, 2013 at 17:49 Comment(0)
C
0

What you're trying to do is dynamically import a module (one built on some base class), much like the C++ use case of dynamic DLL loading.

See the following SO question, as well as the Python docs for importlib.import_module (which is just a wrapper around __import__).

import importlib
moduleToImport = importlib.import_module("moduleName")
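
A hedged sketch of how this could serve as a small plugin loader, assuming every plugin module exposes a fetch() function. That convention, the load_fetcher helper and the x_source module name are assumptions made for illustration:

```python
import importlib
import sys
import types


def load_fetcher(module_name):
    """Import module_name and return its fetch() callable."""
    module = importlib.import_module(module_name)
    try:
        return module.fetch
    except AttributeError:
        raise ImportError(
            "plugin {!r} does not define fetch()".format(module_name))


# Stand-in for a real plugin file such as x_source.py (hypothetical name):
# build the module in memory so the example runs without extra files.
_demo = types.ModuleType("x_source")
_demo.fetch = lambda: "<data source='X'/>"
sys.modules["x_source"] = _demo

fetch = load_fetcher("x_source")
```

In a real script the module name could come from sys.argv[1], so adding a data source would only require dropping a new module onto the import path.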
Chesty answered 8/2, 2013 at 16:18 Comment(2)
Combine this for the fetcher with a database (e.g. a non-code solution) for the configuration and you have a nice plugin system. No code rewrites needed. Again: most likely overkill.Tympanum
Or just take the name of the module that you want loaded (basically a plugin) as a command-line argument (just as in the linked SO question) and you're finished. This way the only code that ever needs to be changed is in the plugins and not your 'caller' (unless the interface changes, but that's a different problem).Chesty
H
0

XML is structured, SQL inserts are tabular. Sounds simple enough, so don't over-engineer it. Your basic approach will probably be:

  1. Find a bunch of XML files in the file system
  2. Call parser on XML, get back a couple of trees
  3. Query the trees or recurse over trees to fill in a bunch of tabular data structures
  4. Serialize the data structures producing a couple of INSERTs

Your "business logic" is point 3, which will change from case to case. A well written example will help any successor much more than several layers of abstractions. The whole thing is probably too small to merit its own domain specific language. Besides, XSLT already exists.

The other points are candidates for reuse, but to me they sound more like well-written and well-documented functions than a factory pattern.
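
The four steps above can be sketched with the standard library alone. The catalog/book XML layout, the books table and the use of an in-memory SQLite database are assumptions made so the example runs as-is:

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE_XML = """<catalog>
  <book id="1"><title>Dune</title><year>1965</year></book>
  <book id="2"><title>Emma</title><year>1815</year></book>
</catalog>"""


def extract_rows(xml_text):
    """Steps 2-3: parse the XML, then walk the tree to build tabular rows."""
    root = ET.fromstring(xml_text)
    return [
        (int(book.get("id")), book.findtext("title"), int(book.findtext("year")))
        for book in root.iter("book")
    ]


def load_rows(connection, rows):
    """Step 4: insert the rows with a parameterized statement."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS books (id INTEGER, title TEXT, year INTEGER)")
    connection.executemany("INSERT INTO books VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load_rows(connection, extract_rows(SAMPLE_XML))
```

Only extract_rows changes from one data source to the next; load_rows is the kind of well-documented function that can be reused.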

Healion answered 8/2, 2013 at 17:0 Comment(0)
S
0

I would add the abstraction not at the level of the individual external source but above it: abstract how you interact with an external source. For example, SoapClient, HttpClient, XmlServiceClient, NativeObjectClient, etc. This way you only have to add a new class when you need a new way to call an external source, so you don't have to write new fetcher classes often. (Note: I am not a Python developer.)

Use a ServiceRegistry pattern to invoke the external resource. One entry in your service registry.xml would be

    <serviceregistry>
      <externaldatasource>
         <dsname>xmlservice</dsname>
         <endpointurl>some url</endpointurl>
         <invocationtype>Soap</invocationtype>
      </externaldatasource>
    </serviceregistry>

When a class needs to get to an external source, that class simply passes the Service Registry class the data source name. The Service Registry reads the XML file, invokes the external source and returns the data. Now a single class handles all your external calls, and there is not much code overhead.
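
A rough Python sketch of that lookup, assuming the registry entry is well-formed XML (the root element is written without a space, since XML names cannot contain one) and that the clients are simple placeholder functions. The endpoint URL, the CLIENTS mapping and both client functions are hypothetical:

```python
import xml.etree.ElementTree as ET

REGISTRY_XML = """<serviceregistry>
  <externaldatasource>
    <dsname>xmlservice</dsname>
    <endpointurl>http://example.invalid/feed</endpointurl>
    <invocationtype>Soap</invocationtype>
  </externaldatasource>
</serviceregistry>"""


def soap_client(url):
    """Placeholder: a real client would call the SOAP endpoint."""
    return "soap response from " + url


def http_client(url):
    """Placeholder: a real client would issue an HTTP GET."""
    return "http response from " + url


# One client per invocation type, not one per data source.
CLIENTS = {"Soap": soap_client, "Http": http_client}


class ServiceRegistry:
    """Resolves a data source name to its endpoint and invocation type."""

    def __init__(self, xml_text):
        self.sources = {}
        for node in ET.fromstring(xml_text).iter("externaldatasource"):
            name = node.findtext("dsname").strip()
            self.sources[name] = (
                node.findtext("endpointurl").strip(),
                node.findtext("invocationtype").strip(),
            )

    def fetch(self, dsname):
        """Invoke the client registered for the data source's invocation type."""
        url, invocation_type = self.sources[dsname]
        return CLIENTS[invocation_type](url)
```

Adding a data source that uses an existing invocation type then only needs a new entry in the XML file.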

Once you have the raw data from the client, you want to convert it to your data model. I am assuming you have your own XSD. Use XSLT to convert the incoming XML to your XSD format, and for validation too. I would suggest a factory pattern to handle non-XML data formats like JSON. You don't have to implement them now; just leave an opening for future extension.

This whole set of classes can be under Gateway package. Any external client dependent code will be within this package and will not seep to any other packages. This is called the gateway pattern. The input/output to the public classes in this package would be your domain model.

Then you have one single logic for loading into db which is independent of any external data source.

Shrive answered 14/2, 2013 at 17:28 Comment(0)
