I've just moved into an environment where I am much freer to choose whatever approach I want for a project (meaning full access to CPAN and no module-approval-by-committee), but I'm a little out of touch with the new hotness, so I thought I'd solicit ideas here.
My project involves scraping multiple sources in varying formats (HTML, zipped text, CSV, etc.), normalizing the data, and then processing it into some sort of datastore. The pulls need to happen at programmable intervals, and I'd like to make the back-end modular so that similar sources can share the same codebase. It also needs to be able to report a simple status of the running processes over the web (nothing fancy). I was thinking POE might be a good fit, with several collector processes reporting to one master, but are there any specific modules in POE (or elsewhere) that anyone thinks I should have a look at?
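To make the "modular back-end" part a bit more concrete, here is a rough sketch of the shape I'm picturing, in plain core Perl with no POE involved yet. Everything named here (`%parser_for`, `collect`, `due_sources`, the source names and intervals) is made up for illustration, and the comma/tab splitting is deliberately naive; I assume a real version would use something like Text::CSV for parsing and an event loop for the scheduling.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one parser per source format. Each parser
# takes the raw content and returns a list of hashrefs keyed by column name.
# NOTE: naive splitting, no quoting support; illustration only.
sub make_delimited_parser {
    my ($sep) = @_;
    return sub {
        my ($raw) = @_;
        my ($header, @rows) = grep { length } split /\r?\n/, $raw;
        my @cols = split /$sep/, $header;
        return map {
            my %rec;
            @rec{@cols} = split /$sep/, $_, -1;
            \%rec;
        } @rows;
    };
}

my %parser_for = (
    csv => make_delimited_parser(','),
    tsv => make_delimited_parser("\t"),
);

# Hypothetical source definitions: similar sources share a parser and differ
# only in configuration (name, format, pull interval in seconds).
my @sources = (
    { name => 'prices', format => 'csv', interval => 300 },
    { name => 'stock',  format => 'tsv', interval => 3600 },
);

# Return the sources whose interval has elapsed since their last run.
sub due_sources {
    my ($now, $last_run, @candidates) = @_;
    return grep {
        $now - ($last_run->{ $_->{name} } // 0) >= $_->{interval}
    } @candidates;
}

# Normalize one pull: parse the raw content and tag each record with the
# name of the source it came from.
sub collect {
    my ($source, $raw) = @_;
    my $parser = $parser_for{ $source->{format} }
        or die "no parser for format '$source->{format}'\n";
    return map { +{ %$_, source => $source->{name} } } $parser->($raw);
}
```

The idea is that adding a new source of a known format is just another entry in `@sources`, and adding a new format is just another entry in `%parser_for`; the master process would only need to call something like `due_sources` on a timer and hand the results to collectors.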