How to build a search engine in C# [closed]
I am building a web application in ASP.NET MVC and need to build a fairly complex search feature. When a user enters a search term, I want to search a variety of data sources, including documents, database tables, web page URLs, and some APIs such as Facebook's. Any tips, tutorials, or hints would be greatly appreciated.

Grotto answered 29/5, 2010 at 2:4 Comment(4)
Where are you stuck? With index storage, with searching, or with query analysis? A search engine is quite a big topic. – Picrotoxin
What portion of that are you having difficulty with? If you're having trouble building a complex search engine, I would start with a simple one first. Build something that searches documents only, because you'll eventually need that part. Then move on to database searches. – Impacted
Point Google at it. Voilà, instant search. – Hatchway
Use a third party. You can actually buy a server from Google or install Lucene (or similar) on one of your machines and use that as the search engine. I have done this with Google: pretty simple, pretty efficient, pretty expensive ;-) – Turanian
Your question suggests that you're probably not planning to implement the whole feature from scratch, so here are some links you may find useful.

  • One option (the easiest) would be to use a third-party search engine, e.g. Google Custom Search (Bing probably has a similar API). This allows you to search (only) your own site using Google and display the results in a customized way. The limitation is that it searches only data displayed on some (linked) pages.

  • A more sophisticated approach is to use a .NET library that implements indexing for you, based on the data you give it; a popular choice is Lucene.Net. In this case you explicitly feed it the data you want to search (relevant content from web pages, database content, etc.), so you have more control over what is searched, but it is a bit more work.
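To illustrate the first option, here is a minimal sketch of building a request URL for the Google Custom Search JSON API. The API key and search-engine ID below are placeholders (real values come from your Google account); only the endpoint and the key/cx/q parameter names come from the actual API.

```csharp
using System;

class CustomSearchDemo
{
    // Builds a request URL for the Google Custom Search JSON API.
    // "YOUR_API_KEY" and "YOUR_ENGINE_ID" are placeholders, not real values.
    static string BuildSearchUrl(string apiKey, string engineId, string query)
    {
        return "https://www.googleapis.com/customsearch/v1"
             + "?key=" + Uri.EscapeDataString(apiKey)
             + "&cx=" + Uri.EscapeDataString(engineId)
             + "&q=" + Uri.EscapeDataString(query);
    }

    static void Main()
    {
        string url = BuildSearchUrl("YOUR_API_KEY", "YOUR_ENGINE_ID", "asp.net mvc search");
        Console.WriteLine(url);
        // The JSON response can then be fetched with HttpClient
        // and deserialized into your own result model.
    }
}
```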

Everything answered 29/5, 2010 at 2:47 Comment(2)
lucene.net is dead by the looks of it; are there other alternatives? – Badgett
@augustas I could be wrong, but there seems to be some activity: git-wip-us.apache.org/repos/asf?p=lucenenet.git (48 hours in git-wip-us.apache.org/repos/… ). It's an on-and-off "spare time" project for the people from Apache (and the occasional other big corp) who work on it. 3.0.3 is dead AFAIK; 4.8 was in beta two years ago. Dunno what happened there: code972.com/blog/2016/07/… So half dead, maybe? Like one of those really fast zombies Hollywood has now. – Incongruity
Building the actual search index structures and algorithms is no trivial feat. That's why people use Lucene, Sphinx, Solr, etc. Using google.com, as recommended in the comments, gives you no control and poor matching compared to what you'll get from one of these free search engines when properly configured and used.
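To give a feel for the core data structure these engines are built around, here is a minimal, hypothetical inverted-index sketch in plain C#. It maps each term to the set of documents containing it; real engines add tokenization, stemming, ranking, phrase queries, and on-disk storage on top of this idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class InvertedIndex
{
    // term -> set of document ids containing that term
    private readonly Dictionary<string, HashSet<int>> _postings =
        new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);

    public void Add(int docId, string text)
    {
        // Extremely naive tokenizer: split on whitespace and punctuation.
        foreach (string term in text.Split(
            " \t\r\n.,;:!?()".ToCharArray(),
            StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_postings.TryGetValue(term, out HashSet<int> docs))
                _postings[term] = docs = new HashSet<int>();
            docs.Add(docId);
        }
    }

    // AND query: return the ids of documents that contain every term.
    public IEnumerable<int> Search(params string[] terms)
    {
        IEnumerable<int> result = null;
        foreach (string term in terms)
        {
            if (!_postings.TryGetValue(term, out HashSet<int> docs))
                return Enumerable.Empty<int>();  // a missing term empties the AND
            result = result == null ? docs : result.Intersect(docs);
        }
        return result ?? Enumerable.Empty<int>();
    }
}
```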

I recommend taking a look at Solr: it gives you the power of Lucene but is much easier to use, and it adds several convenience features such as caching, sharding, faceting, etc.
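Since Solr is queried over HTTP, a sketch of building a query URL for its standard /select handler may help. The host/port and the "products" core name below are assumptions for illustration; the q and wt parameters are standard Solr query parameters.

```csharp
using System;

class SolrQueryDemo
{
    // Builds a query URL for Solr's standard /select request handler.
    // The base URL and core name ("products") are placeholder assumptions;
    // adjust them to your own Solr installation.
    static string BuildSelectUrl(string solrBase, string core, string query)
    {
        return solrBase.TrimEnd('/') + "/" + core + "/select"
             + "?q=" + Uri.EscapeDataString(query)
             + "&wt=json";
    }

    static void Main()
    {
        string url = BuildSelectUrl("http://localhost:8983/solr", "products", "title:phone");
        Console.WriteLine(url);
        // Fetch the URL with HttpClient and read the "response.docs"
        // array from the returned JSON.
    }
}
```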

SolrNet is a Solr client for .NET. It includes a sample ASP.NET MVC application that you can use to see how it works and as a base for your project.

Disclaimer: I'm the author of SolrNet.

Inharmonious answered 29/5, 2010 at 15:58 Comment(1)
Is it still in active development? The GitHub repo looks active, though. – Deitz
I wrote a custom search engine for my MVC 4 site. It walks the View directories and reads all the .cshtml files, matching the supplied terms with a regular expression. Here is the basic code:

    // requires System.Collections.Generic, System.IO, System.Linq,
    // and System.Text.RegularExpressions
    public List<string> SearchViews(IEnumerable<string> terms)
    {
        List<string> results = new List<string>();
        DirectoryInfo di = new DirectoryInfo(
            System.Configuration.ConfigurationManager.AppSettings["PathToSearchableViews"]);
        // get all view directories except Shared
        foreach (DirectoryInfo d in di.GetDirectories().Where(d => d.Name != "Shared"))
        {
            // get all the .cshtml files, excluding partial views (names starting with "_")
            foreach (FileInfo fi in d.GetFiles().Where(f => f.Extension == ".cshtml"
                                                         && !f.Name.StartsWith("_")))
            {
                bool foundMatch = false;
                int matchCount = 0;
                string file = File.ReadAllText(fi.FullName);
                foreach (string word in terms)
                {
                    // escape the term so user input can't break the pattern
                    Regex exp = new Regex(Regex.Escape(word.Trim()), RegexOptions.IgnoreCase);
                    MatchCollection matches = exp.Matches(file);
                    if (matches.Count > 0)
                    {
                        foundMatch = true;
                        matchCount += matches.Count; // accumulate across terms
                    }
                }
                // check match count and create links
                //
                //
            }
        }
        return results;
    }
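Before running a search like this, the raw query string still has to be turned into a list of terms, and characters such as ( or * must be escaped before a term is embedded in a regex pattern. A small hypothetical helper for that step:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class TermPrep
{
    // Splits a raw query into whitespace-separated terms and escapes
    // regex metacharacters so each term can be embedded in a pattern safely.
    public static string[] PrepareTerms(string query)
    {
        return query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                    .Select(Regex.Escape)
                    .ToArray();
    }
}
```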
Woolworth answered 4/6, 2013 at 4:49 Comment(1)
It seems this won't work if some of the data is read from a database, right? – Arman
