Auto Suggestion not working in Lucene after first search iteration
Currently I am working on the auto-suggestion part of my application, using Lucene. The auto suggestion of words works fine in a console application, but now that I have integrated it into the web application it is not working the desired way.

When the documents are searched for the first time with some keyword, both search and auto suggestion work fine and show results. But when I search again, for some other keyword or the same one, neither the auto suggestion nor the search results show up. I am not able to figure out why this weird behaviour occurs.

The snippets for the auto suggestion as well as the search are as follows:

final int HITS_PER_PAGE = 20;

final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";

String searchText = request.getParameter("search_text");

BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;

try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}


Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);

IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);

try {
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }

    // Auto Suggestion of the data

    Dictionary dictionary = new LuceneDictionary(reader, "content");
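    // Note: this suggester is opened on `index`, the same FSDirectory that
    // holds the main search index, so build() writes the suggester's own
    // index files into that directory.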
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);

    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: "+lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
         System.out.println(lookupResult.key+" --- "+lookupResult.value);
    }

    analyzingSuggester.close();
    reader.close();

} catch (IOException e) {
    e.printStackTrace();
}

For example, in the first iteration, if I search for the word "sample":

  • Auto suggestion gives me the results: sample, samples, sampler etc. (these are words in the documents)
  • Search result: sample

But if I search again, with the same text or different text, it shows no results, and the LookupResult list size comes back zero.

I cannot figure out why this is happening. Please help.

Below is the updated code for the index creation from the set of documents.

final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();
List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();

String fileNames = (String)request.getAttribute("message");

File file = new File("F:\\Sample\\SampleRichDocuments"+fileNames);

ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);

Metadata metadata = new Metadata();

// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);

try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
} catch (TikaException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);

Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();

Date date = new Date();

while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();

    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();

    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");

    String[] fileNameArr = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }

    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }

    String document_id = UUID.randomUUID().toString();

    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);

    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);

    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);

    writer.addDocument(doc);

    analyzer.close();
}

writer.commit();
writer.deleteUnusedFiles();
long endTime = System.currentTimeMillis();

writer.close();

Also, I have observed that from the second search iteration onwards the files in the index directory are getting deleted, and only the file with the .segment suffix changes, like .segmenta, .segmentb, .segmentc etc.

I don't know why this weird situation is happening.
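
For reference, here is a minimal, self-contained sketch for inspecting the index directory between searches (the IndexInspector class name is just for illustration; it assumes the same INDEX_DIRECTORY path as above):

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexInspector {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("F:\\Sample\\LuceneIndexer"));

        // List every file currently present in the index directory
        for (String file : dir.listAll()) {
            System.out.println("file: " + file);
        }

        // List the commit points Lucene still references
        for (IndexCommit commit : DirectoryReader.listCommits(dir)) {
            System.out.println("commit: " + commit.getSegmentsFileName());
        }

        dir.close();
    }
}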

Jacks answered 4/9, 2016 at 18:39 Comment(1)
Can you check my answer and see if it works? – Voncile

Your code looks pretty straightforward, so I am sensing that something might be going wrong with your indexes. Providing information about how you are building the indexes might help diagnose the problem. But the exact code this time :)

Schulz answered 7/9, 2016 at 18:38 Comment(0)

I think your problem is with the writer.deleteUnusedFiles() call.

According to the JavaDocs, this call can "delete unreferenced index commits".

Which commits get deleted is driven by the IndexDeletionPolicy. However, "the default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2)".

The docs also talk about "delete on last close", which means that once an index is used and closed (e.g. during a search), that index will be deleted.

So all the indexes that matched your first search result will be deleted immediately.

Try this:

IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
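
For completeness, a minimal sketch of the writer setup with this policy applied (assuming the same INDEX_DIRECTORY from the question) would be:

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(Paths.get("F:\\Sample\\LuceneIndexer"));

IndexWriterConfig conf = new IndexWriterConfig(analyzer);
// Keep every commit point instead of deleting old ones on each new commit
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);

IndexWriter writer = new IndexWriter(directory, conf);

You could also try dropping the writer.deleteUnusedFiles() call entirely.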
Voncile answered 10/9, 2016 at 7:33 Comment(1)
I tried your snippet but it's still not working. It's still behaving the old way. – Jacks
