How to index source code with ElasticSearch

4

15

I need to provide full-text search on JavaScript source files, with highlighting of results.

Which combination of existing Elasticsearch tokenizers and analyzers would work best for this?

Leoleod answered 17/10, 2011 at 17:18 Comment(0)
5

Interesting question, but I'm not aware of an out-of-the-box solution. You can use the word_delimiter token filter (it is a token filter rather than a tokenizer). It lets you specify, for example, that the underscore should be handled as a digit, so that functions like hello_world (or helloWorld, if splitting on case changes is enabled) become searchable via hello or world.
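As a rough illustration of that idea, here is a minimal sketch of index settings that chain a whitespace tokenizer with a word_delimiter filter. The names `code_delimiter` and `code_analyzer` are made up for this example, and the character-type remapping mentioned above (`type_table`) is left out because the filter already splits on non-alphanumeric characters by default. Recent Elasticsearch versions recommend `word_delimiter_graph` instead, so treat this as a starting point rather than a tested configuration. The helper function is only a local approximation of the expected sub-words, not Lucene's actual algorithm.

```python
import json
import re

# Request body for index creation. "code_delimiter" and "code_analyzer"
# are hypothetical names chosen for this sketch.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "code_delimiter": {
                    "type": "word_delimiter",
                    "split_on_case_change": True,  # helloWorld -> hello, World
                    "preserve_original": True,     # also keep the full identifier
                }
            },
            "analyzer": {
                "code_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["code_delimiter", "lowercase"],
                }
            },
        }
    }
}


def approximate_subwords(identifier):
    """Rough local approximation of the sub-words the filter would emit.

    Splits on non-alphanumeric characters, then on case changes and
    letter/digit boundaries. This is NOT Lucene's implementation.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [p.lower() for p in parts]


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(approximate_subwords("helloWorld"))
    print(approximate_subwords("hello_world"))
```

With `preserve_original` enabled, the full identifier stays in the index alongside its parts, so a search for one exact spelling can still match.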

But I doubt that the results will be sufficient. You will probably have to implement a source code analyzer yourself, or use code that extracts the syntax tree so you can index method names and bodies into different fields.
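To make the "different fields" idea concrete, here is a small sketch that builds one document per file with function names in their own field. Note the swap: a crude regular expression stands in for a real JavaScript parser, so it only catches plain `function name(` declarations and misses methods, arrow functions and so on. The field names (`path`, `function_names`, `content`) and the mapping are assumptions for illustration, and the mapping layout assumes a recent Elasticsearch version.

```python
import re

# Stand-in for a real syntax-tree extractor: matches only declarations
# of the form "function name(".
FUNCTION_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")

# Hypothetical mapping: exact-match field for names, analyzed text for bodies.
MAPPING = {
    "mappings": {
        "properties": {
            "path": {"type": "keyword"},
            "function_names": {"type": "keyword"},
            "content": {"type": "text"},
        }
    }
}


def build_document(path, source):
    """Build the document to index for one JavaScript source file."""
    return {
        "path": path,
        "function_names": FUNCTION_DECL.findall(source),
        "content": source,
    }


if __name__ == "__main__":
    js = "function helloWorld() { return 1; }\nfunction hello_world(x) { return x; }\n"
    print(build_document("src/example.js", js))
```

Keeping the names in a `keyword` field allows searching for one exact spelling, while the `content` field remains available for full-text queries.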

Arletha answered 22/10, 2011 at 8:56 Comment(1)
As a developer searching source code, would you actually WANT to find hello_world or helloWorld with just "hello" or just "world"? In our case at least, we use Elasticsearch for all of our code repositories, and we usually search for a specific spelling of a specific method across all of them, for example when updating the core framework that they all use. - Amulet
1

You can use the attachment type plugin to load the files into Elasticsearch and let it index them. It can handle both the metadata and the content of the files.

The plugin's GitHub page includes information on how to highlight matches in the search results.
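For reference, highlighting is requested in the search body itself. The sketch below only builds that body; the field name `content` is an assumption and should be replaced by whichever field actually holds the indexed file text in your mapping.

```python
import json


def build_highlight_query(text, field="content"):
    """Build a search body that matches `text` and asks for highlighted fragments.

    `field` is a hypothetical field name; adjust it to your mapping.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_highlight_query("helloWorld"), indent=2))
```

The resulting JSON can be sent as the body of a search request; matching fragments then come back in a `highlight` section on each hit.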

Amylose answered 30/7, 2014 at 13:9 Comment(0)
0

Unless you want to expose this as a service to somebody, I would recommend installing the InstaSearch plugin in Eclipse; it creates a Lucene index and gives you instantaneous results.

Macleod answered 30/7, 2014 at 13:2 Comment(0)
0

This kind of indexing is part of the Elasticsearch configuration for MS Azure DevOps Server, although I haven't a clue how it's done :/

Aboral answered 18/5, 2022 at 7:58 Comment(0)
