What is the best full text search open source project (.NET preferred)?

I've developed an indexing and search application with the Lucene library, but it has some limitations for custom ranking in my context. Aside from performance, I need scalability and access to all kinds of word-frequency statistics. Is there a powerful open source full-text search library available?

Zink answered 8/11, 2010 at 13:34 Comment(2)
I've found that performance with Lucene.net is incredible, so it's a surprise to hear someone say they've got problems with performance! (BTW, Lucene has a pretty good API for custom scoring etc. as well.) Institutionalize
I don't have any problem with the performance of Lucene, but custom ranking is very difficult. Zink

http://www.sphinxsearch.com

http://www.sphinxconnector.net/

Key Sphinx features are:

  • high indexing and searching performance;
  • advanced indexing and querying tools (flexible and feature-rich text tokenizer, querying language, several different ranking modes, etc);
  • advanced result set post-processing (SELECT with expressions, WHERE, ORDER BY, GROUP BY etc over text search results);
  • proven scalability up to billions of documents, terabytes of data, and thousands of queries per second;
  • easy integration with SQL and XML data sources, and SphinxAPI, SphinxQL, or SphinxSE search interfaces;
  • easy scaling with distributed searches.

To expand a bit, Sphinx:

  • has high indexing speed (up to 10-15 MB/sec per core on an internal benchmark);
  • has high search speed (up to 150-250 queries/sec per core against 1,000,000 documents, 1.2 GB of data on an internal benchmark);
  • has high scalability (biggest known cluster indexes over 3,000,000,000 documents, and busiest one peaks over 50,000,000 queries/day);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document excerpts (snippets) generation;
  • provides searching from within application with SphinxAPI or SphinxQL interfaces, and from within MySQL with pluggable SphinxSE storage engine;
  • supports boolean, phrase, word proximity and other types of queries;
  • supports multiple full-text fields per document (up to 32 by default);
  • supports multiple additional attributes per document (e.g. groups, timestamps, etc);
  • supports stopwords;
  • supports morphological word forms dictionaries;
  • supports tokenizing exceptions;
  • supports both single-byte encodings and UTF-8;
  • supports stemming (stemmers for English, Russian and Czech are built in; stemmers for French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Finnish and Hungarian are available by building the third-party libstemmer library);
  • supports MySQL natively (all types of tables, including MyISAM, InnoDB, NDB, Archive, etc are supported);
  • supports PostgreSQL natively;
  • supports ODBC compliant databases (MS SQL, Oracle, etc) natively;
  • ...has 50+ other features not listed here, refer to API and configuration manual!
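
Since the question is about custom ranking, it may help to spell out the statistical (BM25) component mentioned in the list above. What follows is the textbook Okapi BM25 form, given as a sketch only: Sphinx's built-in rankers may use a simplified variant, and the symbols are the conventional ones rather than anything taken from the Sphinx manual.

```latex
\[
\mathrm{score}(D, Q) = \sum_{i=1}^{m} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
\]
```

Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length in words, avgdl is the average document length in the collection, N is the number of documents, and n(q_i) is the number of documents containing q_i. The free parameters are commonly chosen as k_1 between 1.2 and 2.0 and b = 0.75. These per-term frequencies are the word statistics a custom ranking function needs access to.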
Weismannism answered 8/11, 2010 at 13:39 Comment(0)

You can use the Bsa.Search.Core library for full-text search in .NET.

The library contains 4 index types:

  • MemoryDocumentIndex - fast memory index
  • DiskDocumentIndex - stores the index on disk
  • FileDocumentIndex - indexes files
  • ShardDocumentIndex - stores large indexes (more than 3 million documents) on disk

Example of using the memory index:

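// Requires System, System.Collections.Generic and System.Threading,
// plus the Bsa.Search.Core namespaces (not shown in the original answer).
var field = "*";
var query = "one | two";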

var documentIndex = new MemoryDocumentIndex();
var content = "one two one two second try to welcome";
var title = "one first second four";

// Poll until the index reports that it is ready
while (!documentIndex.IsReady) 
{ 
    Thread.Sleep(500); 
}

var searchService = new SearchServiceEngine(documentIndex);

var doc = new IndexDocument("ExternalId");
doc.Add("content".GetField(content));
// filter fields
doc.Add("intValue".GetFilterField(10));
doc.Add("longValue".GetFilterField(20L));
doc.Add("dateValue".GetFilterField(DateTime.UtcNow));

searchService.Index(new IndexDocument[]
{
    doc
});

// query was already declared above; redeclaring it would not compile
var parsed = query.Parse("*");

var request = new SearchQueryRequest()
{
    Query = parsed,
    Field = field,
    ShowHighlight = true,
    OrderField = SortOrderFields.Relevance,
    Order = SortOrder.Desc,
    Size = 20,
    Fields = new List<string>()
    {
        "content","id"
    },
    Filter = new FilterClause()
    {
        Condition = FilterCondition.Equal,
        Value = "intValue".GetFilterField(10),
        Next = new FilterClause()
        {
            Condition = FilterCondition.Great,
            Value = "longValue".GetFilterField(21L)
        }
    }
};
var result = searchService.Search(request);
Underpay answered 11/1, 2022 at 11:43 Comment(0)
