As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.
Solr Schema 101
For the purposes of this discussion, Solr schemas consist mostly of two things: type definitions and field definitions.
A type definition sets up a type by specifying its name, the Java class for the type, and, in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.
A field definition allows you to define the name of a field, and the name of the type of value contained in that field. This lets Solr correlate the name of a field in a document with its type and a handful of other options, and thus determine how that field's value should be processed in your index.
Solr also supports a dynamicField definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
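To make the glob matching concrete, here is a minimal Ruby sketch of the idea. It is only an illustration: the `DYNAMIC_FIELDS` table and the `resolve_type` helper are hypothetical names I made up, and Solr's real matching and precedence rules differ in detail.

```ruby
# Hypothetical, simplified stand-in for the dynamicField entries in a schema:
# each glob pattern maps to the name of a type.
DYNAMIC_FIELDS = {
  "*_text"  => "text",
  "*_texts" => "text"
}.freeze

# Returns the type name for the first pattern that matches field_name,
# or nil when no pattern matches. (Illustration only, not Solr's algorithm.)
def resolve_type(field_name, dynamic_fields = DYNAMIC_FIELDS)
  _pattern, type = dynamic_fields.find do |pattern, _type|
    File.fnmatch(pattern, field_name)
  end
  type
end

resolve_type("body_text")  # => "text"
resolve_type("body")       # => nil
```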
Sunspot's conventional schema
Sunspot's schema has a handful of field definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of dynamicField definitions to establish naming conventions based on types.
This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.
For example, this simple configuration block in your model…
searchable do
  text :body
end
…will be used by Sunspot to create a field name of body_text. This field name is matched against the *_text pattern for the following dynamicField definition in the schema:
<dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>
This maps any field with the suffix _text to Sunspot's definition of the text type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The :stored => true option, for example, will typically add an s on that type's suffix (e.g., _texts).
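As a rough sketch of that convention (not Sunspot's actual implementation), the name generation can be thought of like this. The `TYPE_SUFFIXES` table and `indexed_field_name` helper are hypothetical; only the text suffix and the stored behaviour are taken from the description above.

```ruby
# Hypothetical suffix table. Only the :text entry comes from the answer;
# see Sunspot's schema.xml for the real conventions of other types.
TYPE_SUFFIXES = { text: "text" }.freeze

# Builds the indexed field name from the attribute name, its type, and
# whether the value is stored. (Simplified illustration of the convention.)
def indexed_field_name(name, type, stored: false)
  suffix = TYPE_SUFFIXES.fetch(type)
  suffix += "s" if stored  # :stored => true typically appends an "s"
  "#{name}_#{suffix}"
end

indexed_field_name(:body, :text)                # => "body_text"
indexed_field_name(:body, :text, stored: true)  # => "body_texts"
```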
Modifying Sunspot's schema in practice
In my experience, both with clients' projects and my own, there are two good cases for modifying Sunspot's schema. The first is changing the text field's analyzers to suit the features your application needs. The second is creating brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.
For example, widening search matches with "fuzzy" searches can be done with matches against a special text-based field that also uses linguistic stems, or NGrams. The tokens in the original text field may be used to populate spellcheck, or to boost exact matches. The tokens in a custom text_ngram or text_en field can then serve to broaden search results when the stricter matching fails.
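To see why NGram tokens broaden matching, here is a small plain-Ruby sketch of the idea. In practice the NGrams are produced by a Solr analyzer configured in the schema, not by application code; the `ngrams` helper and the gram sizes below are assumptions chosen for illustration.

```ruby
# Splits a token into all character n-grams of length min..max.
# (Illustration of the concept; Solr's own filter does this at index time.)
def ngrams(token, min, max)
  (min..max).flat_map do |n|
    token.chars.each_cons(n).map(&:join)
  end
end

ngrams("solr", 2, 3)
# => ["so", "ol", "lr", "sol", "olr"]

# A misspelled query still shares several grams with the indexed term,
# so it can match where an exact token comparison would fail.
shared = ngrams("solr", 2, 3) & ngrams("sollr", 2, 3)
# => ["so", "ol", "lr", "sol"]
```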
Sunspot's DSL provides one final feature for mapping your fields to these custom fields. Once you have set up the type and its corresponding dynamicField definition(s), you can use Sunspot's :as option to override the convention-based name generation.
For example, having added a custom ngram type for the above, we might process the body again with NGrams using the following Ruby code:
searchable do
  text :body
  text :body_ngram, :as => 'body_ngram'
end
text :body_ngram portion do? Is it fully overridden by the :as or does it also serve a purpose? – Ferretti