no segments* file found

I need to access a Lucene index (created by crawling several web pages with Nutch), but opening it fails with the following error:

java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/<path>: files:
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
    at DictionaryGenerator.generateDict(DictionaryGenerator.java:24)
    at DictionaryGenerator.main(DictionaryGenerator.java:56)

I googled, but the causes I found did not match my situation. Since the message shows the path and a list of files, I assume the directory is not empty.
Thanks

Disequilibrium asked 27/9, 2010 at 8:06

Basically, the error message says that Lucene did not find the expected files in the index directory. I suggest checking the following:

  1. Verify that the path of the index directory is what you think it should be.
  2. Do the Nutch and Lucene versions match? The problem may stem from a version difference.
  3. Is there a permissions issue? Can you read the files in the directory?
  4. Try opening the index with Luke. If you cannot, the index is probably corrupted.

If none of these helps, please post the indexing part of the code.
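Checks 1 and 3 can be scripted. The following is a minimal, standard-library-only sketch (the class name `IndexDirCheck` is made up for illustration). It resolves the path the program actually uses, which matters because a relative path is resolved against the JVM's working directory. It then confirms that the path is a readable directory and looks for a regular file whose name starts with `segments`. Directories are deliberately ignored: a Nutch crawl directory contains a `segments` folder holding Nutch's own segment data, and that folder is not the Lucene `segments_N` / `segments.gen` file that `IndexReader` is looking for.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class IndexDirCheck {

    // Returns human-readable problems; an empty list means the basic checks passed.
    static List<String> check(Path dir) throws IOException {
        List<String> problems = new ArrayList<>();
        // Relative paths resolve against the JVM's working directory, so report the absolute one.
        Path abs = dir.toAbsolutePath().normalize();
        if (!Files.exists(abs)) {
            problems.add("does not exist: " + abs);
            return problems;
        }
        if (!Files.isDirectory(abs)) {
            problems.add("not a directory: " + abs);
            return problems;
        }
        if (!Files.isReadable(abs)) {
            problems.add("directory not readable: " + abs);
            return problems;
        }
        int commitFiles = 0;
        // The glob also matches a Nutch "segments" folder, so count regular files only.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(abs, "segments*")) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                commitFiles++;
                if (!Files.isReadable(p)) {
                    problems.add("file not readable: " + p);
                }
            }
        }
        if (commitFiles == 0) {
            problems.add("no segments* file in " + abs);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> problems = check(dir);
        if (problems.isEmpty()) {
            System.out.println("found a segments* file in " + dir.toAbsolutePath().normalize());
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it with the same path string you pass to `IndexReader.open`. If it reports "no segments* file" while the directory holds Nutch folders such as `segments`, the reader is probably pointed at the crawl root rather than at the Lucene index inside it.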

Consort answered 27/9, 2010 at 9:26
Disequilibrium: I did all of them except checking the Nutch and Lucene versions; I was not aware that Lucene and Nutch have to be compatible. If it helps, the Lucene version is 2.2. I can access the files; in fact, I am running the Java program in the same directory as the index. I also checked the index with Luke, and it is definitely fine. I have only just joined the project, and the index is the result of an extensive Nutch crawl, so I do not have any indexing code; it was just a crawl. I will still try to get the full picture.
Disequilibrium: One thing I have observed is that the newer version of Nutch (1.1) generates 5 folders after a crawl, while the data I have contains only 4 folders (one of which is segments). Can that be an issue?
Equivoque: Like Yuval said, make sure that the Java program you use to read the index uses the same version of Lucene that Nutch used to create the index.

Another hint: I was getting the same error and found that after creating the index I had not closed the IndexWriter, which proved very unforgiving. My index directory contained some .lock files but no segments or segments.gen file, which is what the reader looks for. See here (#3) for details.
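A minimal sketch of the fix. It assumes a modern Lucene (8.x or 9.x, where `ByteBuffersDirectory` and the no-argument `IndexWriterConfig` constructor exist) rather than the 2.2 API from the question, and the class, method and field names are illustrative. With the default configuration, `IndexWriter.close()` commits, and the commit is what writes the `segments_N` file. Opening the writer in try-with-resources guarantees that close runs even if indexing throws.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CloseWriterDemo {

    // True once the directory holds a commit point (a segments_N file).
    static boolean hasCommit(Directory dir) throws IOException {
        return DirectoryReader.indexExists(dir);
    }

    // Indexes one document, lets try-with-resources close (and so commit) the writer,
    // then opens a plain DirectoryReader on the directory and returns its document count.
    static int indexAndReopen(Directory dir) throws IOException {
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            Document doc = new Document();
            // Hypothetical field name and value, for illustration only.
            doc.add(new StringField("url", "http://example.com/", Field.Store.YES));
            writer.addDocument(doc);
        } // close() commits here with the default IndexWriterConfig

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory directory, so the demo leaves nothing on disk.
        try (Directory dir = new ByteBuffersDirectory()) {
            System.out.println("commit before indexing: " + hasCommit(dir));
            System.out.println("documents after reopen: " + indexAndReopen(dir));
            System.out.println("commit after closing:   " + hasCommit(dir));
        }
    }
}
```

If the writer must stay open for a long time, calling `writer.commit()` explicitly also writes a new `segments_N` file without closing it.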

Matlock answered 24/12, 2013 at 20:14
Aryl: Just for those who are wondering (as I did): even if you close your IndexWriter, the write.lock file will still exist in the folder, so don't worry if this file doesn't get deleted.
Minded: @nir, this was absolutely the problem for me! Thank you!

Stumbled upon this issue in 2020:

I had opened the IndexReader using the

org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.store.Directory)

method instead of

org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.index.IndexWriter)

The first one resulted in the error described above, while the latter worked fine on an empty directory and seems to be the way to go here.
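A minimal sketch contrasting the two calls. It assumes Lucene 8.x or 9.x on the classpath, and the class and method names are illustrative. `DirectoryReader.open(Directory)` reads only committed data, so on a directory without a commit it throws `IndexNotFoundException`, a subclass of `FileNotFoundException` whose message is the familiar "no segments* file found". `DirectoryReader.open(IndexWriter)` opens a near-real-time reader from the writer's in-memory state. It therefore works before anything has been committed, and it also sees documents that were added but not yet committed.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexNotFoundException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class NrtReaderDemo {

    // Returns true if opening a reader straight from the Directory fails
    // because no commit (segments_N file) exists yet.
    static boolean openFromDirectoryFails(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return false;
        } catch (IndexNotFoundException e) {
            return true; // message starts with "no segments* file found in"
        }
    }

    // Opens near-real-time readers from the writer: once on the empty index and
    // once after adding a document that has not been committed yet.
    // Returns {docsBeforeAdd, docsAfterAdd}.
    static int[] openFromWriter(Directory dir) throws IOException {
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            int before;
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                before = reader.numDocs();
            }
            Document doc = new Document();
            // Hypothetical field, for illustration only.
            doc.add(new StringField("id", "1", Field.Store.YES));
            writer.addDocument(doc);
            int after;
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                after = reader.numDocs();
            }
            return new int[] {before, after};
        } // closing the writer commits at this point
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            System.out.println("open(Directory) fails on empty dir: " + openFromDirectoryFails(dir));
            int[] counts = openFromWriter(dir);
            System.out.println("open(IndexWriter) docs before/after add: " + counts[0] + "/" + counts[1]);
        }
    }
}
```

Opening from the writer only helps when the same process owns the `IndexWriter`. A separate program reading an index built elsewhere, such as the Nutch-built index in the question, still needs that index to have been committed, i.e. to contain a `segments*` file.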

Occasion answered 31/8, 2020 at 15:31
Alundum: This one worked.
