How does Sunspot modify Solr's schema.xml? Does it modify it at all?

Asked 25/8, 2011 at 19:11 Answered 25/8, 2011 at 21:3

Solved ruby-on-rails ruby-on-rails-3 solr sunspot sunspot-rails

Let me know if I'm wrong, but I think Solr only accepts fields that are already declared in schema.xml. So if I have a field called 'title', I need to declare it in the schema.

Sunspot's documentation doesn't mention modifying schema.xml. I'm wondering how Sunspot modifies schema.xml so that custom fields can be added to the index.

I also know that Sunspot uses RSolr under the hood. So if there is a way to modify the schema and reload data from the database into Solr using RSolr, please let me know.

Archaeornis asked 25/8, 2011 at 19:11

As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.

Solr Schema 101

For the purposes of this discussion, Solr schemas are mostly made up of two things: type definitions and field definitions.

A type definition sets up a type by specifying its name, the Java class for the type, and in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.

A field definition lets you declare the name of a field and the name of the type of value it contains. This lets Solr associate a field name in an incoming document with its type and a handful of other options, and therefore determine how that field's value is processed in your index.

Solr also supports a dynamicField definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
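To make the lookup order concrete, here is a minimal Ruby sketch of how a field name could be resolved to a type: an exact field definition wins, and otherwise the longest matching dynamicField pattern is used (Solr's documented precedence). The schema entries are made-up examples, not Sunspot's actual schema, and the helper names are hypothetical.

```ruby
# Hypothetical, simplified model of Solr's field-type lookup.
# Static <field> definitions: exact name => type name.
STATIC_FIELDS = {
  "id" => "string"
}.freeze

# <dynamicField> definitions: pattern => type name.
# Solr allows a single "*" only at the start or end of the pattern.
DYNAMIC_FIELDS = {
  "*_text" => "text",
  "*_s"    => "string"
}.freeze

def pattern_matches?(pattern, name)
  if pattern.start_with?("*")
    name.end_with?(pattern[1..])
  elsif pattern.end_with?("*")
    name.start_with?(pattern[0...-1])
  else
    pattern == name
  end
end

def resolve_type(name)
  # An exact field definition takes precedence over any pattern.
  return STATIC_FIELDS[name] if STATIC_FIELDS.key?(name)

  # Try longer patterns first; returns nil if nothing matches.
  match = DYNAMIC_FIELDS.keys
                        .sort_by { |pattern| -pattern.length }
                        .find { |pattern| pattern_matches?(pattern, name) }
  match && DYNAMIC_FIELDS[match]
end

puts resolve_type("id")         # prints string
puts resolve_type("body_text")  # prints text
p resolve_type("unknown_field") # prints nil
```

In real Solr, a field name that matches nothing causes the document to be rejected, which is why Sunspot's broad set of dynamicField patterns matters.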

Sunspot's conventional schema

Sunspot's schema has a handful of field definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of dynamicField definitions to establish naming conventions based on types.

This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.
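As a rough illustration of that mapping, the sketch below turns a model's attributes into a Solr document hash using a suffix convention. A hash like this is roughly the shape RSolr's `add` method accepts. The `Post` struct, the `"Post 1"` id format, and the `class_name` field are assumptions for illustration only; check Sunspot's schema.xml and indexer for the real internal fields.

```ruby
# Hypothetical stand-in for a model class.
Post = Struct.new(:id, :body)

# Hypothetical field configuration: [attribute name, dynamic-field suffix].
SEARCHABLE_FIELDS = [[:body, "text"]].freeze

def to_solr_document(model)
  # Assumed internal fields identifying the record and its class.
  doc = {
    "id"         => "#{model.class.name} #{model.id}",
    "class_name" => model.class.name
  }
  # Each searchable attribute becomes "<attribute>_<suffix>",
  # so Solr can match it against a dynamicField pattern.
  SEARCHABLE_FIELDS.each do |attr, suffix|
    doc["#{attr}_#{suffix}"] = model.public_send(attr)
  end
  doc
end

doc = to_solr_document(Post.new(1, "Hello Solr"))
puts doc["id"]        # prints Post 1
puts doc["body_text"] # prints Hello Solr
```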

For example, this simple configuration block in your model…

searchable do
  text :body
end

…will be used by Sunspot to create a field name of body_text. This field name is matched against the *_text pattern for the following dynamicField definition in the schema:

<dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>

This maps any field with the suffix _text to Sunspot's definition of the text type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The :stored => true option, for example, will typically add an s on that type's suffix (e.g., _texts).
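The naming convention for the text type can be sketched in a few lines. This is a simplified, hypothetical helper covering only the two suffixes mentioned here (`_text` and, with `:stored => true`, `_texts`); other types use their own suffixes in Sunspot's schema.xml, and each suffix needs its own dynamicField pattern.

```ruby
# Hypothetical helper mirroring Sunspot's text-field naming convention.
def dynamic_field_name(name, stored: false)
  # Stored text fields get an extra "s" on the suffix.
  suffix = stored ? "texts" : "text"
  "#{name}_#{suffix}"
end

puts dynamic_field_name(:body)               # prints body_text
puts dynamic_field_name(:body, stored: true) # prints body_texts

# "body_texts" does not end with "_text", so it needs a separate
# "*_texts" pattern rather than matching "*_text".
puts "body_texts".end_with?("_text")         # prints false
```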

Modifying Sunspot's schema in practice

In my experience with my own and my clients' projects, there are two good reasons to modify Sunspot's schema: first, to change the text field's analyzers to suit the features your application needs; and second, to create brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.

For example, you can widen search matches with "fuzzy" searches against a special text-based field that also uses linguistic stemming or NGrams. The tokens in the original text field can be used to populate spellcheck or to boost exact matches, while the tokens in a custom text_ngram or text_en field can broaden search results when the stricter matching fails.
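To see why NGrams broaden matches, the sketch below generates character n-grams for a token, similar in spirit to what an NGram filter emits. This is an illustration only, not Solr's NGramTokenFilter implementation (which has its own options and output order); the `char_ngrams` helper is hypothetical.

```ruby
# Hypothetical helper: all substrings of length min_size..max_size.
def char_ngrams(token, min_size, max_size)
  grams = []
  (min_size..max_size).each do |size|
    # Ranges like 0..-1 are empty, so tokens shorter than size add nothing.
    (0..token.length - size).each do |start|
      grams << token[start, size]
    end
  end
  grams
end

p char_ngrams("solr", 2, 3) # prints ["so", "ol", "lr", "sol", "olr"]

# A partial query such as "spo" can now match "sunspot",
# because "spo" is one of its indexed 3-grams.
puts char_ngrams("sunspot", 3, 3).include?("spo") # prints true
```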

Sunspot's DSL provides one final feature for mapping your fields to these custom types. Once you have set up the type and its corresponding dynamicField definition(s), you can use Sunspot's :as option to override the convention-based field name.

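For example, after adding a custom ngram type for the purpose above, we could index the body a second time with NGrams using the following Ruby code. The block supplies the body as the value of the extra field; without it, Sunspot would call a body_ngram method on the model.

searchable do
  text :body
  # Index the same content again under an explicit Solr field name,
  # so that it is analyzed by the custom ngram type.
  text :body_ngram, :as => 'body_ngram' do
    body
  end
end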
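The effect of :as on naming can be sketched as follows. This hypothetical helper is not Sunspot's implementation; it only mirrors the behavior described above: without :as the name follows the suffix convention, and with :as the given string is used verbatim, so it must match a field or dynamicField you defined for the custom type (for example a hypothetical `*_ngram` pattern).

```ruby
# Hypothetical helper: convention-based name unless :as overrides it.
def solr_field_name(name, type_suffix, as: nil)
  as ? as.to_s : "#{name}_#{type_suffix}"
end

puts solr_field_name(:body, "text")                         # prints body_text
puts solr_field_name(:body_ngram, "text", as: "body_ngram") # prints body_ngram
```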

Centroid answered 25/8, 2011 at 21:3

In your final example block what does the text :body_ngram portion do? Is it fully overridden by the :as or does it also serve a purpose? – Ferretti 6/11, 2013 at 14:46

Sunspot comes with a stock schema tuned for Sunspot integration that follows the principle of least surprise for the developer. For example, the stock solrconfig.xml leaves autocommit off, even though in production you'll want to turn it on. The schema really has more to do with types than fields; see the link below for an example of how to create a new field type. Indexing a field is trivial if it fits one of the existing types. For example:

# Sunspot::Rails adds `searchable` to ActiveRecord models.
class Blog < ActiveRecord::Base
  searchable do
    text :title
  end
end

And in the search process, you'd do something like this:

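class BlogSearch
  def self.search(options = {})
    Sunspot.search(Blog) do
      # :title is a text field, so query it with fulltext;
      # `with` only works on attribute fields (e.g. string :title).
      fulltext(options[:title]) { fields(:title) } if options[:title].present?
    end
  end
end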

Sunspot's wiki has a lot of additional documentation. Here's an example of adding a custom type to allow ngram searching:

https://github.com/outoftime/sunspot/wiki/Wildcard-searching-with-ngrams

Subjectify answered 25/8, 2011 at 20:38

"…set to turn autocommit off, even though in production you'll want to turn this on…" Actually, it's the opposite. Commits are on, even though they should be off in production, because developers expect the effects of the commit. – Centroid 25/8, 2011 at 20:44

@nick: technically, autocommit is turned off in Solr. It's Sunspot's autocommit support that's turned on, which you disable in sunspot.yml. I thought I had it backwards as well after reading your comment. The point I think we're both making is that Sunspot commits indexing changes automatically, even though you'll want to change that for production use. – Subjectify 26/8, 2011 at 0:3

@nick, et al: see github.com/outoftime/sunspot/blob/master/sunspot/solr/solr/conf/… – Subjectify 26/8, 2011 at 0:9

Ah, right. I was thinking of Sunspot's automatic commit in the controller after_filter. You are indeed correct that its solrconfig.xml ships with Solr's autoCommit commented out. Both should be reversed for production :) – Centroid 26/8, 2011 at 17:7
