Meta Search Engine Architecture

The question wasn't clear enough, I think, so here's an updated, straight-to-the-point version:

What are the common architectures used to build a meta search engine, and are there any libraries available for building that type of search engine?

I'm looking at building an "enterprise" type of search engine where the indexed data could come from proprietary search engines (like Autonomy or a Google Box) or public ones (like Google Web or Yahoo Web).

Realpolitik asked 17/5, 2010 at 15:08 Comment(3)
...depends on what you mean by "meta search". – Rubicund
I mean a search engine of search engines, like en.wikipedia.org/wiki/Metasearch_engine. The term "federated search" is also common. – Realpolitik
What aspects of the architecture are you interested in? I covered the basic Adapter idea you might need to use to talk to different search engines in my answer, but is there something else you want to find out about? Managing the in-flight requests (as I assume you'll be doing them in parallel), maybe? Or something else entirely? – Mulry

If you look at Garlic (PDF), you'll notice that its architecture is generic enough to be adapted to a meta-search engine.

UPDATE:

The rough architectural sketch is something like this:

   +---------------------------+
   |                           |
   |    Meta-Search Engine     |         +---------------+
   |                           |         |               |
   |   +-------------------+   |---------| Configuration |
   |   | Query Processor   |   |         |               |
   |   |                   |   |         +---------------+
   |   +-------------------+   |
   +-------------+-------------+
                 |
      +----------+---------------+
   +--+----------+-------------+ |
   |             |             | |
   |     +-------+-------+     | |
   |     |    Wrapper    |     | |
   |     |               |     | |
   |     +-------+-------+     | |
   |             |             | |
   |             |             | |
   |     +-------+--------+    | |
   |     |                |    | |
   |     | Search Engine  |    | |
   |     |                |    +-+
   |     +----------------+    |
   +---------------------------+

The parts depicted are:

  • Meta-Search Engine - the engine, orchestrates the whole thing.
  • Query Processor - part of the engine, resolves capabilities, sends requests and aggregates results of specific search engines (through the wrappers).
  • Wrapper - bridges the meta-search engine API to specific search engines. Each wrapper works with a specific search engine. Exposes the external search engine capabilities to the meta-search engine, accepts and responds to search requests.
  • Search engine - the external search engines to query; they're exposed to the meta-search engine through the wrappers.
  • Configuration - data that configures the meta-search engine, e.g., which wrappers to use, where to find more wrappers, etc. Can also configure the wrappers.
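A minimal Java sketch of the Query Processor and Wrapper roles listed above, assuming Java 16+ (for records). The names `SearchResult`, `SearchEngineWrapper` and `QueryProcessor` are hypothetical, not from Garlic or any library. The processor fans the query out to every wrapper in parallel with `ExecutorService.invokeAll` and a deadline, skips engines that fail or time out, and aggregates the rest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical common result type that every wrapper returns.
record SearchResult(String source, String url, String title, double score) {}

// Hypothetical wrapper contract: one implementation per external search engine.
interface SearchEngineWrapper {
    String name();
    List<SearchResult> search(String query, int maxResults) throws Exception;
}

// Query processor: sends the query to every configured wrapper in parallel,
// waits up to a deadline, and aggregates whatever results came back.
final class QueryProcessor {
    private final List<SearchEngineWrapper> wrappers; // would normally come from the Configuration
    private final ExecutorService pool;
    private final long timeoutMillis;

    QueryProcessor(List<SearchEngineWrapper> wrappers, long timeoutMillis) {
        this.wrappers = List.copyOf(wrappers);
        this.pool = Executors.newFixedThreadPool(Math.max(1, wrappers.size()));
        this.timeoutMillis = timeoutMillis;
    }

    List<SearchResult> search(String query, int maxResults) {
        List<Callable<List<SearchResult>>> tasks = new ArrayList<>();
        for (SearchEngineWrapper wrapper : wrappers) {
            tasks.add(() -> wrapper.search(query, maxResults));
        }
        List<SearchResult> merged = new ArrayList<>();
        try {
            // invokeAll cancels any task that has not finished when the timeout expires.
            for (Future<List<SearchResult>> future :
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    merged.addAll(future.get());
                } catch (CancellationException | ExecutionException e) {
                    // A slow or failing engine is skipped instead of failing the whole query.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and return what we have
        }
        // Assumes scores from different engines are already comparable.
        merged.sort(Comparator.comparingDouble(SearchResult::score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(maxResults, merged.size())));
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The timeout is one way to handle the in-flight requests: a single slow engine cannot hold up the whole result page. Sorting on raw scores is a simplification, since real engines rarely return comparable scores; some normalization or rank-based fusion would likely be needed in practice.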
Burnley answered 19/5, 2010 at 22:52 Comment(3)
Eh, be careful when linking to a PDF! – Realpolitik
+1: I am currently working on this, and this is pretty much what I ended up with. I have a Meta-Query, and the Wrapper translates it into the format of the actual search engine. The wrapper then translates the answer into a Meta-Result, and there you go. – Declarer
How was this sketch generated? With a website, or with some tool? – Clarance

Have a look at Lucene.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Swat answered 19/5, 2010 at 22:21 Comment(2)
Can Lucene/Solr/Nutch handle meta-searching (or federated searching)? – Realpolitik
Not directly. But Lucene's indexing capabilities are excellent, especially incremental index construction and merging multiple indexes. The feature list is at lucene.apache.org/java/docs/features.html. – Swat

Not exactly what you are looking for, but I'd still suggest checking out Compass; it might give you some ideas. Hibernate Search may also be worth a look.

Update: To clarify, Compass is not an ORM (nor is Hibernate Search); it's a search-oriented API. Because it tries to abstract the underlying search engine (Lucene), I was suggesting you look at some of the structures it uses: Analyzers, Analyzer Filter, Query Parser, etc.

Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search (...)

Ferrigno answered 19/5, 2010 at 22:10 Comment(0)

This page seems to list a few:

http://java-source.net/open-source/search-engines

I'd imagine the APIs will all be similar in that they take a query string and some options and return a collection of results. However, the exact types of the options and results are likely to differ, so I'd expect you'd need some sort of Adapter approach (for example) to unify access to the different backends.
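A small Java sketch of that Adapter approach, assuming Java 16+. `VendorXClient` is a hypothetical stand-in for a backend with its own option and result types (it is not a real product's API), and `Query`, `Result` and `SearchAdapter` are hypothetical common types. The adapter translates the common query into the backend's options and each backend hit back into a common result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Common types the meta-search engine works with (hypothetical names).
record Query(String text, int maxResults, String language) {}
record Result(String title, String url, double score) {}

interface SearchAdapter {
    List<Result> search(Query query);
}

// Hypothetical backend client with its own option map and result type.
final class VendorXClient {
    record Hit(String heading, String link, int rank) {}

    List<Hit> find(String queryString, Map<String, String> options) {
        int limit = Integer.parseInt(options.getOrDefault("limit", "10"));
        List<Hit> hits = new ArrayList<>();
        // Fake backend: returns at most three canned hits for any query.
        for (int i = 1; i <= Math.min(limit, 3); i++) {
            hits.add(new Hit(queryString + " #" + i, "http://vendorx.example/" + i, i));
        }
        return hits;
    }
}

// Adapter: common Query -> VendorX options, VendorX Hit -> common Result.
final class VendorXAdapter implements SearchAdapter {
    private final VendorXClient client;

    VendorXAdapter(VendorXClient client) {
        this.client = client;
    }

    @Override
    public List<Result> search(Query query) {
        Map<String, String> options = Map.of(
                "limit", Integer.toString(query.maxResults()),
                "lang", query.language());
        List<Result> results = new ArrayList<>();
        for (VendorXClient.Hit hit : client.find(query.text(), options)) {
            // VendorX only reports a rank, so derive a score: rank 1 -> 1.0, rank 2 -> 0.5, ...
            results.add(new Result(hit.heading(), hit.link(), 1.0 / hit.rank()));
        }
        return results;
    }
}
```

The rest of the meta-search engine only ever sees `SearchAdapter`, `Query` and `Result`, so supporting another backend means writing one more adapter class rather than touching the aggregation code.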

Mulry answered 18/5, 2010 at 20:20 Comment(0)

If you can read Objective-C and want to see a working example of something like a "meta-search engine", you might want to take a look at the source code for Google's Vermilion framework. It is the engine that backs the very popular Google Quick Search Box utility for OS X (which in turn is a lot like QuickSilver).

The framework lets you add plugin backends for the search process and handles merge-sorting the results from a number of sources, among other things. I would imagine that a federated search engine of any sort would follow a similar design.
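The merge-sorting step could look like the following Java sketch. It is not Vermilion's code; it is a generic k-way merge using a `PriorityQueue`, assuming each source already returns its results sorted by descending score and that scores are comparable across sources (in practice they usually need normalizing first). `ScoredResult` and `ResultMerger` are hypothetical names, and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical result type shared by all sources.
record ScoredResult(String source, String title, double score) {}

final class ResultMerger {
    // Cursor into one source's list: the list itself and the next index to take.
    private record Cursor(List<ScoredResult> list, int index) {
        ScoredResult head() {
            return list.get(index);
        }
    }

    // Merges per-source lists, each sorted by descending score, into one
    // descending list of at most `limit` items in O(n log k) for k sources.
    static List<ScoredResult> merge(List<List<ScoredResult>> sortedPerSource, int limit) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Cursor c) -> c.head().score()).reversed());
        for (List<ScoredResult> list : sortedPerSource) {
            if (!list.isEmpty()) {
                heap.add(new Cursor(list, 0));
            }
        }
        List<ScoredResult> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor top = heap.poll();          // source whose current head scores highest
            merged.add(top.head());
            if (top.index() + 1 < top.list().size()) {
                heap.add(new Cursor(top.list(), top.index() + 1)); // advance that source
            }
        }
        return merged;
    }
}
```

Because only the current head of each source sits in the heap, the merge can stop as soon as `limit` results are collected, which suits a results page that shows just the top few hits.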

Raseda answered 25/5, 2010 at 15:04 Comment(0)