How to implement an Enterprise Search
Asked Answered
A

4

3

We are searching disparate data sources in our company. We have information in multiple databases that need to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.

What we need:

  1. Column search
    • ability to search by column
    • we flag which columns in a table are searchable
  2. Keep some relation between db column and data
    • we provide advanced filtering on the results
    • facilitates (amazon style) filtering
    • filter provided by grouping of results and allowing user to filter them via a checkbox
    • this is a great feature, users like it very much
  3. Partial Word Match
    • we have a lot of unique identifiers (product id, etc).
    • the unique id's can have sub parts with meaning (location, etc)
    • or only a portion may be available (when the user is searching)
    • or (by a decidedly poor design decision) there may be white space in the id
    • this is a major feature that we've implemented now via CHARINDEX (MSSQL) and INSTR (ORACLE)
    • using the char index functions turned out to be equivalent performance(+/-) on MSSQL compared to full text
    • didn't test on Oracle
    • however searches against both types of db are very fast
    • We take advantage of Indexed (MSSQL) and Materialized (Oracle) views to increase speed
      • this is a huge win, Oracle Materialized views are better than MSSQL Indexed views
      • both provide speedups in read-only join situations (like a search combing company and product)
    • A search that matches user expectations of the paradigm CTRL-f -> enter text -> find matches
    • Full Text Search is not the best in this area (slow and inconsistent matching)
    • partial matching (see "Partial Word Match")

Nice to have:

  1. Search database in real time
    • skip the indexing skip, this is not a hard requirement
  2. Spelling suggestion

What we don't need:

  1. We don't need to index documents
    • at this point searching our data sources are the most important thing
    • even when we do search documents, we will be looking for partial word matching, etc
  2. Ranking
    • Our own simple ranking algorithm has proven much better than an FTS equivalent.
    • Users understand it, we understand it, it's almost always relevant.
  3. Stemming
    • Just don't need to get [run|ran|running]
  4. Advanced search operators
    • phrase matching, or/and, etc
    • according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
      • most users are using simple search phrases
      • very few use advanced searches (when it's available)
      • also in Information Architecture 3rd edition Page 185
      • "few users take advantage of them [advanced search functions]"
      • http://oreilly.com/catalog/9780596000356
      • our Amazon like filtering allows better filtering anyway (via user testing)
  5. Full Text Search
    • We've found that results don't always "make sense" to the user
    • Searching with FTS is hard to tune (which set of operators match the users expectations)
    • Advanced search operators are a no go
    • we don't need them because
    • users don't understand them
    • Performance has been very close (+/1) to the char index functions
    • but the results are sometimes just "weird"

The question: Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?

I'm open to any suggestion. I've wondered if a document/hash table nosql data store (MongoDB, et al) might be of use? ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.

Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?

You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."

Aromaticity answered 7/7, 2010 at 21:17 Comment(0)
A
1

I ended up coding my own.

The results are fantastic. Users like it, it works well with our existing technologies.

It really wasn't that hard. Just took some time.

Features:

  • Faceted search (amazon, walmart, etc)
  • Partial word search (the real stuff not full text)
  • Search databases (oracle, sql server, etc) and non database sources
  • Integrates well with our existing environment
  • Maintains relations, so I can have a n to n search and display
  • --> this means I can display child records of a master record in search results
  • --> also I can search any child field and return the master record

It's really amazing what you can do with dictionaries and a lot of memory.

Aromaticity answered 14/9, 2011 at 16:32 Comment(0)
F
0

I recommend looking into Solr, I believe it will meet you needs:

http://lucene.apache.org/solr/

Foreclose answered 7/7, 2010 at 21:21 Comment(2)
Any advice on how to do partial word matching with solr?Aromaticity
I believe it accepts wildcards with ? or * it has been a while since I last looked at it though.Foreclose
T
0

For an off-she-shelf solution: Have you checked out the Google Search Appliance?

Quote from the Google Mini/GSA site:

... If direct database indexing is a requirement for you, we encourage you to consider the Google Search Appliance, which has direct database connectivity.

And of course it indexes everything else in the Googly manner you'd expect it to.

Theosophy answered 7/7, 2010 at 21:22 Comment(2)
Very interesting. Any info on how the GSA searches databases?Aromaticity
@Chris: Here's the blurb on that question: google.com/support/gsa/bin/answer.py?hl=en&answer=15846 Not specific i know, but still very interesting. And really interesting is the fact that you can customize the search.Theosophy
A
0

Apache Solr is a good way to start your project with and it is open source . You can also try Elastic Search and there are a lot of off shelf products which offer good customization abilities and search features such as Coveo, SharePoint Fast, Google ...

Adar answered 10/11, 2015 at 23:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.