I'm trying to build a specialised search engine web site that indexes a limited number of web sites. The solution I came up with is:
- using Nutch as the web crawler,
- using Solr as the search engine,
- using Wicket for the front end and the site logic.
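To make the front-end part concrete, here is roughly how I picture the Wicket layer asking Solr for results over HTTP. This is only a sketch under assumptions: the host, port, and the core name `sites` are placeholders I made up, and the class and method names are hypothetical. A real version would probably use the SolrJ client rather than building URLs by hand.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQueryUrl {
    // Assumption: a local Solr instance with a core named "sites" (placeholder values).
    static final String BASE = "http://localhost:8983/solr/sites";

    /** Builds a URL for Solr's /select handler from the text the user typed. */
    static String build(String userQuery, int rows) {
        // URL-encode the query so spaces and characters like '&' are safe in the URL.
        String q = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        return BASE + "/select?q=" + q + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(build("open source crawler", 10));
    }
}
```

The idea is that the front end only ever talks to Solr, so whatever does the crawling (Nutch or something simpler) can be swapped without touching the site code.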
The problem is that I find Nutch quite complex: it is a big piece of software to customise, and detailed documentation (books, recent tutorials, etc.) simply does not exist.
My questions:
- Any constructive criticism about the whole idea of the site?
- Is there a good yet simple alternative to Nutch (as the crawling part of the site)?
Thanks