How to write a flexible modular program with good interaction possibilities between modules?

Asked 28/5, 2010 at 7:16 Answered 28/5, 2010 at 9:42

I went through answers on similar topics here on SO but couldn't find a satisfying answer. Since I know this is a rather large topic, I will try to be more specific.

I want to write a program which processes files. The processing is nontrivial, so the best way is to split the different phases into standalone modules which would then be used as necessary (sometimes I will only be interested in the output of module A, sometimes I will need the output of five other modules, etc.). The thing is that I need the modules to cooperate, because the output of one might be the input of another. And I need it to be FAST. Moreover, I want to avoid doing certain processing more than once (if module A creates some data which then needs to be processed by modules B and C, I don't want to run module A twice to create the input for modules B and C).

The information the modules need to share would mostly be blocks of binary data and/or offsets into the processed files. The task of the main program would be quite simple - just parse arguments, run required modules (and perhaps give some output, or should this be the task of the modules?).

I don't need the modules to be loaded at runtime. It's perfectly fine to have libs with a .h file and recompile the program every time there is a new module or some module is updated. The idea of modules is here mainly for code readability and maintainability, and to allow more people to work on different modules without the need for some predefined interface or whatever (on the other hand, some "guidelines" on how to write the modules would probably be required, I know that). We can assume that the file processing is a read-only operation; the original file is not changed.

Could someone point me in a good direction on how to do this in C++? Any advice is welcome (links, tutorials, PDF books...).

Dormie answered 28/5, 2010 at 7:16 Comment(1)

This question is basically "how do I write modular code"? As all code should be modular, there is nothing specifically about C++ here, or about your particular problem domain. and the answer is "by applying skill, talent and experience". – Mirnamirror 28/5, 2010 at 7:55

This looks very similar to a plugin architecture. I recommend starting with an (informal) data flow chart to identify:

how these blocks process data
what data needs to be transferred
what results come back from one block to another (data/error codes/ exceptions)

With this information you can start to build generic interfaces, which allow binding to other interfaces at runtime. Then I would add a factory function to each module to request the real processing object from it. I don't recommend getting the processing objects directly out of the module interface, but rather returning a factory object from which the processing objects can be retrieved. These processing objects are then used to build the entire processing chain.

An oversimplified outline would look like this:

struct Processor
{
    void doSomething(Data);
};

struct Module
{
    std::string name();
    Processor* getProcessor(WhichDoIWant);  // factory function
    void deleteprocessor(Processor*);       // hand the processor back to its module
};

Off the top of my head, these patterns are likely to appear:

factory function: to get objects from modules
composite && decorator: forming the processing chain
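
As a rough illustration of how such a processing chain could be formed, here is a minimal sketch of the composite half of that idea. It assumes `Data` is simply a `std::string` and that a processor returns its output (the outline above does not say either), and `LowerCase`, `StripSpaces`, `Chain` and `chainExample` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumption: Data is plain text here; the outline leaves it unspecified.
typedef std::string Data;

struct Processor
{
    virtual ~Processor() {}
    virtual Data process(const Data& in) = 0;
};

// Leaf processor: converts every character to lower case.
struct LowerCase : Processor
{
    Data process(const Data& in) override
    {
        Data out(in);
        for (char& c : out)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return out;
    }
};

// Leaf processor: drops space characters.
struct StripSpaces : Processor
{
    Data process(const Data& in) override
    {
        Data out;
        for (char c : in)
            if (c != ' ') out.push_back(c);
        return out;
    }
};

// Composite: is itself a Processor and runs its stages in order.
struct Chain : Processor
{
    void add(std::unique_ptr<Processor> p) { stages.push_back(std::move(p)); }

    Data process(const Data& in) override
    {
        Data current(in);
        for (auto& stage : stages) current = stage->process(current);
        return current;
    }

    std::vector<std::unique_ptr<Processor>> stages;
};

// Usage: build a two-stage chain and push one block of data through it.
void chainExample()
{
    Chain chain;
    chain.add(std::unique_ptr<Processor>(new LowerCase()));
    chain.add(std::unique_ptr<Processor>(new StripSpaces()));
    assert(chain.process("Hello World") == "helloworld");
}
```

Because `Chain` is a `Processor` too, a chain can be nested inside another chain, which is what makes the composite pattern convenient for building pipelines of varying length.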

Lyceum answered 28/5, 2010 at 7:51 Comment(8)

Thank you for your answer, the factory pattern approach looks good! – Dormie 28/5, 2010 at 8:13

The implementation of the factory looks wrong though. Use RAII and stop asking the client to return its Processor to the Module: we know he'll forget! – Diane 28/5, 2010 at 8:52

@Matthieu M. even if there was no delete method, the client side must perform the deletion, since the objects can't be passed by value, only by pointer. So RAII does not prevent any damage at this point. The reason to have a deletion method is to have more freedom for the factory implementation, and not to be forced to use new for the object construction. I use this pattern in one project where some factories create objects upon demand, whereas others return pointers to singletons or objects from a pool. – Lyceum 28/5, 2010 at 12:6

Hum, I think I understand, the deleteprocessor method is in fact to ask the Module (factory) to remove an item from the "constructible" objects, is that it ? I usually use the "id" for that, so as not to ask the client to retrieve the object first. – Diane 28/5, 2010 at 12:58

@Matthieu M. My approach is that the factory returns the processor object, this processor object is bound by the requesting code into some processing context, then the processing happens and afterwards the processor object gets passed back to its factory for deletion. This way I can have more than one processor object alive at the same time. Say I have two pipelines where every char should be converted to lower case: my factory can return two independent lower-case processors (or one singleton instance), and whenever one of these pipes is done, it returns its lowercase-processor. – Lyceum 28/5, 2010 at 13:33

Ah, then I don't like your solution: why do you explicitly have to return the processor to the factory? Using the RAII concept, it would be automatically returned (if you don't want to simply delete it) when the handle it's bound to drops off the stack. That's much cleaner. – Diane 28/5, 2010 at 13:52

@Matthieu M. you are mixing RAII with smart pointers. RAII means that allocation of a resource is also the initialization, there is nothing said about the resource deallocation. – Lyceum 31/5, 2010 at 6:54

Well, technically RAII only speaks of initialization. However it is generally used for guaranteed deallocation (that is implied by the ownership bit). So I am not mixing it up, but I may not be expressing myself clearly enough... – Diane 31/5, 2010 at 11:51
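
The handle-based variant argued for in the comments above could be sketched roughly as follows. This is only an illustration under assumptions, not either commenter's actual code: it uses `std::unique_ptr` with a custom deleter as the handle, and `Returner`, `Handle`, `outstanding` and `handleExample` are hypothetical names. The module keeps its freedom to pool or delete, while the client can no longer forget to hand the processor back.

```cpp
#include <cassert>
#include <memory>

struct Processor
{
    int value;   // placeholder state for the example
};

// Hypothetical module that counts how many processors are checked out.
class Module
{
public:
    Module() : mOutstanding(0) {}

    // Deleter that gives the processor back to the module it came from.
    struct Returner
    {
        Module* owner;
        void operator()(Processor* p) const { owner->deleteprocessor(p); }
    };

    typedef std::unique_ptr<Processor, Returner> Handle;

    Handle getProcessor()
    {
        ++mOutstanding;
        return Handle(new Processor(), Returner{this});
    }

    int outstanding() const { return mOutstanding; }

private:
    // Only reachable through the handle's deleter.
    void deleteprocessor(Processor* p)
    {
        --mOutstanding;
        delete p;   // a pooling module could recycle p here instead
    }

    int mOutstanding;
};

// Usage: both processors are returned automatically at the end of the scope.
void handleExample()
{
    Module module;
    {
        Module::Handle a = module.getProcessor();
        Module::Handle b = module.getProcessor();
        assert(module.outstanding() == 2);
    }
    assert(module.outstanding() == 0);
}
```

One caveat of this sketch is that every handle must be destroyed before its `Module`, since the deleter holds a raw pointer to it.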

I am wondering if C++ is the right level to think about for this purpose. In my experience, it has always proven useful to have separate programs that are piped together, in the UNIX philosophy.

If your data is not overly large, there are many advantages in splitting. First, you gain the ability to test every phase of your processing independently: you run one program and redirect the output to a file, so you can easily check the result. Then, you take advantage of multi-core systems even if each of your programs is single-threaded, and thus much easier to create and debug. You also take advantage of the operating system's synchronization through the pipes between your programs. Maybe some of your programs could even be replaced by already existing utility programs?

Your final program will provide the glue that gathers all of your utilities into a single program, piping data from one program to another (no more files at this point), and replicating it as required for all your computations.
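
To make the idea of one phase per program concrete, here is a minimal sketch of a single stage written as a stream filter. The lower-casing transformation and the names `lowerCaseStage` and `stageExample` are assumptions chosen for illustration; a real stage would do the actual file processing.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>

// One pipeline stage: copy the input stream to the output stream,
// converting each character to lower case along the way.
void lowerCaseStage(std::istream& in, std::ostream& out)
{
    char c;
    while (in.get(c))
        out.put(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
}

// Usage: the stage can be tested in isolation with in-memory streams.
void stageExample()
{
    std::istringstream in("Hello PIPE");
    std::ostringstream out;
    lowerCaseStage(in, out);
    assert(out.str() == "hello pipe");
}
```

A stand-alone program for this stage would only need a `main` that calls `lowerCaseStage(std::cin, std::cout)`; the shell (or the glue program) then connects the stages with pipes.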

Linctus answered 28/5, 2010 at 7:39 Comment(3)

Forgot to say that I'm bound to Windows OS. And I really want just one program, not a set of programs which would work together (since it is quite possible that the modules I create won't be used only in my app, but also in others). Anyway, thanks for your answer. – Dormie 28/5, 2010 at 8:12

There are libraries for piping independent of the OS (or more precisely, abstracting it). – Diane 28/5, 2010 at 8:53

Being bound to Windows is not a show-stopper for creating several programs and piping them together. Even Windows can do this perfectly! – Linctus 28/5, 2010 at 11:44

This really seems quite trivial, so I suppose we miss some requirements.

Use Memoization to avoid computing the result more than once. This should be done in the framework.

You could use some flowchart to determine how to make the information pass from one module to another... but the simplest way is to have each module directly calling those they depend upon. With memoization it does not cost much since if it's already been computed, you're fine.
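
As a small illustration of the memoization idea on its own, here is a minimal sketch. It assumes results are plain `int`s and that each module's work is passed in as a callable; `ResultCache`, `get` and `cacheExample` are hypothetical names, not part of any framework.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical helper: caches one int result per module id.
class ResultCache
{
public:
    // Returns the cached value for id, computing it on first use only.
    int get(const std::string& id, const std::function<int()>& compute)
    {
        std::map<std::string, int>::const_iterator it = mCache.find(id);
        if (it != mCache.end()) return it->second;   // already computed

        int value = compute();
        mCache[id] = value;
        return value;
    }

private:
    std::map<std::string, int> mCache;
};

// Usage: modules B and C both ask for A's output, but A runs only once.
void cacheExample()
{
    ResultCache cache;
    int runs = 0;
    std::function<int()> moduleA = [&runs]() { ++runs; return 42; };

    assert(cache.get("A", moduleA) == 42);   // request on behalf of B
    assert(cache.get("A", moduleA) == 42);   // request on behalf of C
    assert(runs == 1);
}
```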

Since you need to be able to launch just about any module, you need to give them IDs and register them somewhere with a way to look them up at runtime. There are two ways to do this.

Exemplar: You get the unique exemplar of this kind of module and execute it.
Factory: You create a module of the kind requested, execute it and throw it away.

The downside of the Exemplar method is that if you execute the module twice, you'll not be starting from a clean state but from the state that the last (possibly failed) execution left it in. For memoization it might be seen as an advantage, but if it failed the result is not computed (urgh), so I would recommend against it.
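
The difference between the two registration styles can be seen in a minimal sketch. The `Module` below is a deliberately stateful stand-in (an assumption for illustration, unrelated to the interface developed later), and `Exemplars`, `Factories` and `registryExample` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Stand-in module whose state survives from one execution to the next.
struct Module
{
    Module() : runs(0) {}
    int Execute() { return ++runs; }
    int runs;
};

// Exemplar registry: one shared instance per id.
typedef std::map<std::string, std::shared_ptr<Module> > Exemplars;

// Factory registry: one creation function per id.
typedef std::map<std::string, std::function<std::unique_ptr<Module>()> > Factories;

void registryExample()
{
    Exemplars exemplars;
    exemplars["Foo"] = std::make_shared<Module>();
    assert(exemplars["Foo"]->Execute() == 1);
    assert(exemplars["Foo"]->Execute() == 2);    // second run sees old state

    Factories factories;
    factories["Foo"] = []() { return std::unique_ptr<Module>(new Module()); };
    assert(factories["Foo"]()->Execute() == 1);
    assert(factories["Foo"]()->Execute() == 1);  // always a clean instance
}
```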

So how do you ... ?

Let's begin with the factory.

class Module;
class Result;

class Organizer
{
public:
  void AddModule(std::string id, const Module& module);
  void RemoveModule(const std::string& id);

  const Result* GetResult(const std::string& id) const;

private:
  typedef std::map< std::string, std::shared_ptr<const Module> > ModulesType;
  typedef std::map< std::string, std::shared_ptr<const Result> > ResultsType;

  ModulesType mModules;
  mutable ResultsType mResults; // Memoization
};

It's a very basic interface really. However, since we want a new instance of the module each time we invoke the Organizer (to avoid reentrancy problems), we will need to work on our Module interface.

class Module
{
public:
  typedef std::auto_ptr<const Result> ResultPointer;

  virtual ~Module() {}               // it's a base class
  virtual Module* Clone() const = 0; // traditional cloning concept

  virtual ResultPointer Execute(const Organizer& organizer) = 0;
}; // class Module

And now, it's easy:

// Organizer implementation
const Result* Organizer::GetResult(const std::string& id) const
{
  ResultsType::const_iterator res = mResults.find(id);

  // Memoized ?
  if (res != mResults.end()) return res->second.get();

  // Need to compute it
  // Look module up
  ModulesType::const_iterator mod = mModules.find(id);
  if (mod == mModules.end()) return 0; // unknown module

  // Create a throw away clone
  std::auto_ptr<Module> module(mod->second->Clone());

  // Compute
  std::shared_ptr<const Result> result(module->Execute(*this).release());
  if (!result.get()) return 0;

  // Store result as part of the Memoization thingy
  mResults[id] = result;

  return result.get();
}

And a simple Module/Result example:

struct FooResult: Result { FooResult(int r): mResult(r) {} int mResult; };

struct FooModule: Module
{
  virtual FooModule* Clone() const { return new FooModule(*this); }

  virtual ResultPointer Execute(const Organizer& organizer)
  {
    // check that the file has the correct format
    if(!organizer.GetResult("CheckModule")) return ResultPointer();

    return ResultPointer(new FooResult(42));
  }
};

And from main:

#include <iostream>

#include "project/organizer.h"
#include "project/foo.h"
#include "project/bar.h"


int main(int argc, char* argv[])
{
  Organizer org;

  org.AddModule("FooModule", FooModule());
  org.AddModule("BarModule", BarModule());

  for (int i = 1; i < argc; ++i)
  {
    const Result* result = org.GetResult(argv[i]);
    if (result) result->print();
    else std::cout << "Error while playing: " << argv[i] << "\n";
  }
  return 0;
}

Diane answered 28/5, 2010 at 9:42 Comment(0)
