How to speed up OpenGrok indexing

C

3

7

My boss recently asked me to explore what OpenGrok could do for the company I work for. I started with a few projects on a Lubuntu VirtualBox VM. It worked, but slowly, which I blamed on my laptop's mediocre specs.

Now I have a larger virtual machine and I'm indexing a larger volume of data: an SVN repository with 100 different projects, some of them with multiple branches, tags and trunk, about 100,000 files in total and a few GB in size. All files are checked out directly in the SRC_ROOT.

I was hoping for reasonably fast indexing, but it has been running for more than five days now. I can see multiple threads running in htop, but CPU usage is 0.5-2.5% and memory usage is 0.9%, so I guess computing power is not the issue. Unless the HDDs are terribly slow, I don't know what the problem is.

Furthermore, the indexing process seems to be slowing down. At the beginning it took approximately 1 sec/file; now it takes about 5 sec/file. Unfortunately I didn't enable the progress option, so I have no idea how much longer it will run.

Any ideas on how to make indexing faster, or how to use the resources more effectively? The current speed is simply unusable...
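To put those rates in perspective, here is a rough back-of-the-envelope estimate in Python. It only uses the figures from the question (about 100,000 files, 1 to 5 seconds per file); the function name is made up for illustration, and it assumes a constant rate, which the slowdown described above suggests is optimistic.

```python
def estimate_remaining_hours(total_files, files_done, seconds_per_file):
    """Estimated remaining indexing time in hours, assuming a constant rate."""
    remaining_files = max(total_files - files_done, 0)
    return remaining_files * seconds_per_file / 3600.0


# Figures from the question: ~100,000 files at 1 to 5 seconds per file.
for rate in (1, 5):
    hours = estimate_remaining_hours(100_000, 0, rate)
    print(f"{rate} s/file: {hours:.1f} h ({hours / 24:.1f} days)")
```

At 1 sec/file a full run would take a bit over a day; at 5 sec/file it would take close to six days, which matches a run that is still going after five.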

Caliber answered 1/9, 2014 at 13:58 Comment(0)

F

2

I think an easy way to improve performance is to run the OpenGrok indexer with JAVA_OPTS set and with a 64-bit Java. Using Derby to store the generated index data improves performance too. More info about how to use and set up OpenGrok
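As a minimal sketch of the JAVA_OPTS part, the Python helper below builds an environment with a larger JVM heap for whatever launches the indexer. It assumes the indexer wrapper reads the JAVA_OPTS environment variable, as this answer implies. The helper name and the 4g default are made up for illustration; `-Xmx` is the standard JVM maximum-heap option.

```python
import os


def java_opts_env(max_heap="4g", base_env=None):
    """Return a copy of the environment with -Xmx appended to JAVA_OPTS.

    ASSUMPTION: the indexer launcher honours JAVA_OPTS; the heap size is
    an arbitrary example and should be chosen to fit the machine.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("JAVA_OPTS", "").strip()
    env["JAVA_OPTS"] = f"{existing} -Xmx{max_heap}".strip()
    return env


# Example: inspect what would be passed to the indexer process.
print(java_opts_env("4g", {"JAVA_OPTS": "-server"})["JAVA_OPTS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the indexer.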

Frasier answered 3/9, 2014 at 9:58 Comment(0)

S

0

I think the problem is SVN. Try to debug and improve the speed of SVN access from your VM, or temporarily disable SVN altogether to get a fast index. You can add history to the index later, gradually and per project, even if that takes a few days (see the options for running the indexer per project). Alternatively, if you can mirror the SVN repository and make local svn calls, that should give you a boost too.

To conclude: OpenGrok can detect SVN, skip history creation (it can be enabled on the fly) and just index the checkout; you can then add history locally later, which avoids long waits for history to be generated on the fly. That said, Git and Mercurial seem to work well with OpenGrok in terms of history indexing.
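One way to check whether SVN access is the bottleneck is to time a single history query by hand. The sketch below is a small Python timing helper; the helper name and the checkout path in the comment are hypothetical, while `svn log --limit 1` is a regular Subversion command that has to contact the repository server.

```python
import subprocess
import time


def time_command(cmd, cwd=None, timeout=300):
    """Run a command, discard its output, return (seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.perf_counter() - start, result.returncode


# Example usage (hypothetical path, requires the svn client):
#   elapsed, rc = time_command(
#       ["svn", "log", "--limit", "1", "/path/to/SRC_ROOT/some-project"])
#   print(f"svn log took {elapsed:.2f} s (exit code {rc})")
```

If one such call takes several seconds, that alone would account for indexing rates in the range of seconds per file.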

Swabber answered 28/2, 2019 at 6:54 Comment(0)

D

-1

I've been running into this myself, and I've found that the indexer is spending most (>90%) of its time querying the source control systems.

That said, some of the projects I use do rely on Perforce and SVN, so I don't want to disable them entirely. What I've done instead is index twice: first with all the options that involve source control disabled, and then again with everything enabled.

That way, it still takes a long time (several days, in my case), but at least I have a usable index up and running in a few hours, and then it can spend days working out all the history.
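The two-pass approach could be scripted roughly as below. This is a sketch under stated assumptions, not the exact commands I ran: the jar path and directories are hypothetical, and it assumes an indexer where `-s` and `-d` set the source and data roots, `-S` searches for repositories and `-H` enables history. Flag meanings have changed between OpenGrok releases, so check the indexer's help output for your version.

```python
def indexer_command(jar, src_root, data_root, with_history):
    """Build the argument list for one indexer pass (nothing is executed)."""
    cmd = ["java", "-jar", jar, "-s", src_root, "-d", data_root]
    if with_history:
        # ASSUMPTION: -S scans for repositories and -H enables history
        # in the installed OpenGrok version; verify before relying on it.
        cmd += ["-S", "-H"]
    return cmd


# Hypothetical paths, for illustration only.
JAR = "/opt/opengrok/lib/opengrok.jar"
SRC = "/var/opengrok/src"
DATA = "/var/opengrok/data"

# Pass 1: fast, no source-control lookups. Pass 2: same roots, with history.
for with_history in (False, True):
    print(" ".join(indexer_command(JAR, SRC, DATA, with_history)))
```

Each list can be handed to `subprocess.run` once the flags have been confirmed for the installed version.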

Subsequent indexing runs should be faster, as I would expect the historycache to be updated only for files that are newer than the cached history.

(That said, it would be nice if I could update the historycache externally so it's all ready to go before I start the indexer at all, and have the indexer configured to not look up history information at all, but instead to just index what's cached)

Dioptric answered 23/12, 2019 at 19:57 Comment(0)
