I've developed an indexing and search application with the Lucene library, but the library has some limitations for custom ranking in my context. Aside from performance, I need scalability and access to all kinds of word frequencies and similar statistics. Is there any powerful open-source full-text search library available?
What is the best full text search open source project (.NET preferred)?
I've found that performance with Lucene.net is incredible, so it's a surprise to hear someone say they've got problems with performance! (BTW, Lucene has a pretty good API for custom scoring etc as well) –
Institutionalize
I don't have any problem with the performance of Lucene, but custom ranking is very difficult. –
Zink
http://www.sphinxconnector.net/
Key Sphinx features are:
- high indexing and searching performance;
- advanced indexing and querying tools (flexible and feature-rich text tokenizer, querying language, several different ranking modes, etc);
- advanced result set post-processing (SELECT with expressions, WHERE, ORDER BY, GROUP BY etc over text search results);
- proven scalability up to billions of documents, terabytes of data, and thousands of queries per second;
- easy integration with SQL and XML data sources, and SphinxAPI, SphinxQL, or SphinxSE search interfaces;
- easy scaling with distributed searches.
To expand a bit, Sphinx:
- has high indexing speed (up to 10-15 MB/sec per core on an internal benchmark);
- has high search speed (up to 150-250 queries/sec per core against 1,000,000 documents, 1.2 GB of data on an internal benchmark);
- has high scalability (biggest known cluster indexes over 3,000,000,000 documents, and busiest one peaks over 50,000,000 queries/day);
- provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking;
- provides distributed searching capabilities;
- provides document excerpts (snippets) generation;
- provides searching from within an application with the SphinxAPI or SphinxQL interfaces, and from within MySQL with the pluggable SphinxSE storage engine;
- supports boolean, phrase, word proximity and other types of queries;
- supports multiple full-text fields per document (up to 32 by default);
- supports multiple additional attributes per document (i.e. groups, timestamps, etc);
- supports stopwords;
- supports morphological word forms dictionaries;
- supports tokenizing exceptions;
- supports both single-byte encodings and UTF-8;
- supports stemming (stemmers for English, Russian and Czech are built in; stemmers for French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Finnish and Hungarian are available by building the third-party libstemmer library);
- supports MySQL natively (all types of tables, including MyISAM, InnoDB, NDB, Archive, etc are supported);
- supports PostgreSQL natively;
- supports ODBC compliant databases (MS SQL, Oracle, etc) natively;
- ...has 50+ other features not listed here, refer to API and configuration manual!
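Since the question is about custom ranking and access to word frequencies, it may help to see where those frequencies enter the statistical (BM25) part of the ranking mentioned above. The following is only a minimal sketch of the textbook Okapi BM25 formula over a toy in-memory corpus. It is not Sphinx's actual implementation (which, per the list above, also combines in phrase proximity), and the corpus, the whitespace tokenizer, the `Bm25` helper and the `k1`/`b` values are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Toy corpus (hypothetical); a real engine reads these statistics from its index.
var docs = new[]
{
    "one two one two second try to welcome",
    "one first second four",
    "three four five"
};

// Simplistic whitespace tokenizer, for illustration only.
var tokenized = docs
    .Select(d => d.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    .ToArray();
double avgLen = tokenized.Average(t => (double)t.Length);

// Textbook Okapi BM25 score of one document for a bag of query terms.
// k1 and b are free parameters; 1.2 and 0.75 are commonly used defaults.
double Bm25(string[] doc, string[] queryTerms, double k1 = 1.2, double b = 0.75)
{
    double score = 0.0;
    foreach (var term in queryTerms.Distinct())
    {
        int tf = doc.Count(w => w == term);                // term frequency in this document
        if (tf == 0) continue;
        int df = tokenized.Count(d => d.Contains(term));   // number of documents containing the term
        double idf = Math.Log(1.0 + (tokenized.Length - df + 0.5) / (df + 0.5));
        double norm = tf + k1 * (1.0 - b + b * doc.Length / avgLen);
        score += idf * tf * (k1 + 1.0) / norm;
    }
    return score;
}

var queryTerms = new[] { "one", "two" };
var scores = tokenized.Select(d => Bm25(d, queryTerms)).ToArray();

// Print each document with its score.
for (int i = 0; i < docs.Length; i++)
    Console.WriteLine($"{scores[i]:F3}  {docs[i]}");
```

With this corpus the first document, which contains both query terms twice, scores highest, and the third, which contains neither, scores zero. A custom ranker would typically start from per-term statistics like `tf` and `df` and adjust the weighting from there.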
You can use the Bsa.Search.Core library for full-text search in .NET.
The library contains 4 index types:
- MemoryDocumentIndex - fast in-memory index
- DiskDocumentIndex - stores the index on disk
- FileDocumentIndex - indexes files
- ShardDocumentIndex - stores large indexes (more than 3 million documents) on disk
Example of using the memory index:
// Requires System, System.Collections.Generic, System.Threading
// and the Bsa.Search.Core namespaces.
var field = "*";
var query = "one | two";
var documentIndex = new MemoryDocumentIndex();
var content = "one two one two second try to welcome";
var title = "one first second four";

// Wait until the index is ready.
while (!documentIndex.IsReady)
{
    Thread.Sleep(500);
}

var searchService = new SearchServiceEngine(documentIndex);
var doc = new IndexDocument("ExternalId");
doc.Add("content".GetField(content));

// Filter fields
doc.Add("intValue".GetFilterField(10));
doc.Add("longValue".GetFilterField(20L));
doc.Add("dateValue".GetFilterField(DateTime.UtcNow));

searchService.Index(new IndexDocument[]
{
    doc
});

// Parse the query string (declared above) against the chosen field.
var parsed = query.Parse(field);
var request = new SearchQueryRequest()
{
    Query = parsed,
    Field = field,
    ShowHighlight = true,
    OrderField = SortOrderFields.Relevance,
    Order = SortOrder.Desc,
    Size = 20,
    Fields = new List<string>()
    {
        "content", "id"
    },
    Filter = new FilterClause()
    {
        Condition = FilterCondition.Equal,
        Value = "intValue".GetFilterField(10),
        Next = new FilterClause()
        {
            Condition = FilterCondition.Great,
            Value = "longValue".GetFilterField(21L)
        }
    }
};
var result = searchService.Search(request);