ShingleFilterFactory affects size of highlighted section in Solr

Asked 5/5, 2015 at 13:52 Answered 13/5, 2015 at 10:9

Adding ShingleFilterFactory to a field type in Solr (at index time) changes the behavior when querying with highlighting.

Sample Text: "in a ship a dragon was in a box"

Without ShingleFilterFactory both "in" tokens will be highlighted separately.

<em>in</em> a ship a dragon was <em>in</em> a box

With it, the whole segment is returned as a single highlight.

<em>in a ship a dragon was in</em>

Why does using ShingleFilterFactory affect the highlighting?

EDIT:

Adding schema info as requested:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Using text_general, which contains the shingle filter, results in unusually large highlight fields as described above.
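To make it easier to see what the index-time shingle filter adds for the sample text, here is a rough Python sketch of the token texts a shingle step with maxShingleSize="2" and outputUnigrams="true" would emit. It is only an illustration: the `shingles` helper is hypothetical (it is not a Solr API), it splits on whitespace instead of using StandardTokenizer, and it ignores the stop word filter, the synonym filter, and token positions and offsets.

```python
def shingles(tokens, max_size=2, output_unigrams=True):
    """Return unigrams plus word n-grams up to max_size, in token order.

    Hypothetical helper that only mimics the token texts; real Solr tokens
    also carry positions and character offsets, which are not modelled here.
    """
    out = []
    smallest = 1 if output_unigrams else 2
    for i in range(len(tokens)):
        for n in range(smallest, max_size + 1):
            # Only emit a shingle if enough tokens remain to fill it.
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


sample = "in a ship a dragon was in a box"
tokens = sample.split()  # assumption: whitespace split stands in for the tokenizer
result = shingles(tokens)

for token in result:
    print(token)
```

Under these assumptions the index holds two-word tokens such as "in a" and "was in" alongside the single words, while the query analyzer in the schema above produces single words only. The actual tokens, with their positions and offsets, can be checked on the Analysis screen of the Solr admin UI.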

Disarray answered 5/5, 2015 at 13:52 Comment(1)

When you refer to sample text, is that the indexed text, the query, or both? Do you mind posting the schema of that field? – Socinus 12/5, 2015 at 23:28