ShingleFilterFactory affects size of highlighted section in Solr

Adding ShingleFilterFactory to a field type in Solr (at index time) changes the behavior of highlighting at query time.

Sample Text: "in a ship a dragon was in a box"

Without ShingleFilterFactory, both "in" tokens are highlighted separately.

<em>in</em> a ship a dragon was <em>in</em> a box

With it, the whole segment between the two matches is returned as a single highlight.

<em>in a ship a dragon was in</em>

Why does using ShingleFilterFactory affect the highlighting?

EDIT:

Adding schema info as requested:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Using text_general, which includes the shingle filter in its index analyzer, results in unusually large highlighted sections, as described above.

Disarray answered 5/5, 2015 at 13:52 Comment(1)
When you refer to the sample text, is that the indexed text, the query, or both? Do you mind posting the schema of that field? (Socinus)

Maybe you can use this highlighter:

https://issues.apache.org/jira/browse/LUCENE-1522

The problem you are pointing out is a known issue, and some patches are available:

https://issues.apache.org/jira/browse/LUCENE-1489

Edit: The second link is the same one that Bereng sent.

Askew answered 13/5, 2015 at 10:9 Comment(0)

Won't help much but will shed some light:

https://issues.apache.org/jira/browse/LUCENE-1489
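
To make the effect easier to see, here is a rough Python sketch of what overlapping shingle tokens can do to a highlighter that merges overlapping tokens into one group. This is not Lucene's code. The helper names (`tokenize`, `shingles`, `highlight`) and the grouping rule are assumptions made for illustration, and the stop word and synonym filters from the schema are left out.

```python
import re

def tokenize(text):
    """Split into lowercase word tokens with (term, start, end) character offsets."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def shingles(tokens, max_size=2):
    """Emit unigrams plus word n-grams up to max_size, keeping character offsets."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n > len(tokens):
                break
            group = tokens[i:i + n]
            out.append((" ".join(t[0] for t in group), group[0][1], group[-1][2]))
    return out

def highlight(text, tokens, query_term):
    """Assumed model: tokens with overlapping offsets form one group, and a group
    is highlighted from its first matching token to its last matching token."""
    groups = []  # each entry: [group_start, group_end, match_start, match_end]
    for term, start, end in sorted(tokens, key=lambda t: (t[1], t[2])):
        if groups and start < groups[-1][1]:
            g = groups[-1]
            g[1] = max(g[1], end)          # extend the current overlap group
        else:
            g = [start, end, None, None]   # start a new group
            groups.append(g)
        if term == query_term:
            g[2] = start if g[2] is None else min(g[2], start)
            g[3] = end if g[3] is None else max(g[3], end)
    out, pos = [], 0
    for _, _, match_start, match_end in groups:
        if match_start is None:
            continue
        out.append(text[pos:match_start] + "<em>" + text[match_start:match_end] + "</em>")
        pos = match_end
    out.append(text[pos:])
    return "".join(out)

text = "in a ship a dragon was in a box"
plain = tokenize(text)
shingled = shingles(plain, max_size=2)

print(highlight(text, plain, "in"))
# <em>in</em> a ship a dragon was <em>in</em> a box
print(highlight(text, shingled, "in"))
# <em>in a ship a dragon was in</em> a box
```

With unigrams only, no two tokens overlap, so each "in" is its own group. With bigram shingles, every token overlaps the next one, so under this model the whole sentence collapses into one group and the highlight stretches from the first "in" to the last.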

Hawkes answered 13/5, 2015 at 9:26 Comment(0)
