Writing a modular aggregator and normalizer in Perl

I've just entered an environment where I'm much more free to choose whatever approach I want for a project (meaning full access to the CPAN and no module-approval-by-committee), but I'm a little out of touch with the new hotness, so I thought I'd solicit ideas here.

My project involves scraping multiple sources in varying formats (HTML, zipped text, CSV, etc.), normalizing them, and then processing them into some sort of datastore. The pulls need to happen at programmable intervals, and I'd like to make the back-end modular so that similar sources can share the same codebase. It also needs to be able to respond over the web with a simple status of running processes (nothing fancy). I was thinking POE might be a good fit, with several collector processes reporting to one master, but are there any specific modules in POE (or elsewhere) that anyone thinks I should have a look at?
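To make the idea concrete, here's roughly the shape I have in mind: one POE session per source, each re-scheduling itself with $kernel->delay. This is only a minimal sketch; the source names, intervals, and the fetch_and_normalize stub are placeholders.

    use strict;
    use warnings;
    use POE;

    # Poll intervals in seconds, keyed by (hypothetical) source name
    my %sources = (
        csv_feed  => 300,    # every 5 minutes
        html_feed => 900,    # every 15 minutes
    );

    # One collector session per source
    for my $name ( keys %sources ) {
        POE::Session->create(
            inline_states => {
                _start => sub { $_[KERNEL]->yield('poll') },
                poll   => sub {
                    my $kernel = $_[KERNEL];
                    fetch_and_normalize($name);                # placeholder worker
                    $kernel->delay( poll => $sources{$name} ); # re-arm the timer
                },
            },
        );
    }

    sub fetch_and_normalize {
        my ($source) = @_;
        print "[$source] polled at ", scalar localtime, "\n";
        # real code would dispatch to a per-format normalizer here
    }

    POE::Kernel->run();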

Scever answered 17/8/2011 at 18:44

WWW::Mechanize is a great module for getting information off web pages.
It lets you log in to websites by supplying a username and password, submit forms, and so on.

You can find more info at: http://metacpan.org/pod/WWW::Mechanize
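For example, logging in and submitting a form looks roughly like this (the URL and form field names below are hypothetical):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();

    # Fetch the login page (placeholder URL)
    $mech->get('https://example.com/login');

    # Fill in and submit the first form on the page;
    # the field names depend on the target site
    $mech->submit_form(
        form_number => 1,
        fields      => {
            username => 'me',
            password => 'secret',
        },
    );

    # Dump the page returned after logging in
    print $mech->content();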

Bulletproof answered 17/8/2011 at 18:54
I'm already familiar, actually; I'm more curious about event processing and architecture. I should have mentioned WWW::Mechanize, though. That's a great recommendation. – Scever
