Apache Solr string field or text field?

Asked 24/8, 2011 at 12:44 Answered 13/4, 2021 at 12:14

In Apache Solr, why should we always prefer a string field over a text field if both serve the purpose?

How does the choice of string or text affect parameters such as index size, index read, and index creation?

Pandich answered 24/8, 2011 at 12:44 Comment(0)

135

The string and text field types defined by default in the Solr schema behave very differently.

String stores a word/sentence as an exact string without performing tokenization etc. Commonly useful for storing exact matches, e.g., for faceting.

Text typically performs tokenization, and secondary processing (such as lower-casing etc.). Useful for all scenarios when we want to match part of a sentence.

If the following sample, "This is a sample sentence", is indexed to both fields, we must search for exactly the text This is a sample sentence to get a hit from the string field, while it may suffice to search for sample (or even samples with stemming enabled) to get a hit from the text field.
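To make the difference concrete, here is a minimal sketch in plain Python. It does not use Solr at all: `index_string_field`, `index_text_field`, and `matches` are hypothetical helpers, and the "analysis chain" is assumed to be nothing more than whitespace tokenization plus lower-casing (no stemming), which is far simpler than what a real text field type does.

```python
def index_string_field(value):
    # A string field keeps the whole value as a single, un-analyzed term.
    return {value}


def index_text_field(value):
    # Toy stand-in for a text analysis chain (assumption):
    # split on whitespace, then lower-case each token.
    return {token.lower() for token in value.split()}


def matches(indexed_terms, query_term):
    # A term query hits only if the exact term exists in the index.
    return query_term in indexed_terms


doc = "This is a sample sentence"
string_terms = index_string_field(doc)
text_terms = index_text_field(doc)

# The string field only matches the complete, exact value.
print(matches(string_terms, "This is a sample sentence"))
print(matches(string_terms, "sample"))

# The text field matches an individual (lower-cased) word.
print(matches(text_terms, "sample"))
```

In this toy model a query for samples would not hit the text field either, because no stemmer is applied; with stemming in the chain, both the indexed token and the query term would be reduced to the same root.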

Duval answered 25/8, 2011 at 8:44 Comment(4)

can you also comment on index size, index read, index creation? – Pandich 25/8, 2011 at 9:31

You will get a larger index size when tokenizing, how large depends on your processing chain. Index creation will also be marginally slower since there's more work. Index read/creation will be great either way, so don't worry about it unless approaching millions of documents. – Liquor 25/8, 2011 at 9:37

I am reading through millions of documents, so I hope that is not a problem. I am going for the string field since it seems efficient in all cases and I do not need tokenizers or full-text search – Pandich 25/8, 2011 at 10:47

@JohanSjöberg I understand the difference between String and Text as you have explained it, but what if I need to get hits for *tence? What is the correct choice of field type? – Quick 11/3, 2019 at 15:5

Adding to Johan Sjöberg's good answer:

You can sort on a String field, but not on a tokenized Text field.

Anthelion answered 13/4, 2021 at 12:14 Comment(0)
