Fields in Apache Solr response are multivalued when they should be singular

I'm experiencing a problem with Apache Solr where fields in JSON responses come back wrapped in lists when they should be single values. Here is an excerpt from schema.xml; two example fields giving me a problem are django_ct and django_id:

  <fields>
    <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>

Here is an example of how data is posted to Solr:

<doc>
    <field name="id">search.productcategory.3</field>
    <field name="gender">M</field>
    <field name="name">OBQYHSOQLWOUEHRMPSDI</field>
    <field name="text">M\nOBQYHSOQLWOUEHRMPSDI</field>
    <field name="django_id">3</field>
    <field name="django_ct">search.productcategory</field>
</doc>

And here is an example of a document as returned by Solr:

  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "django_ct": [
          "search.productcategory"
        ],
        "name": [
          "Example"
        ],
        "text": [
          "Male\nExample"
        ],
        "id": "search.productcategory.2",
        "gender": [
          "Male"
        ],
        "django_id": [
          2
        ],
        "_version_": 1502081283634757600
      }
    ]
  }

What is causing these fields to be wrapped in lists? In the schema, the multiValued attribute for these fields is set to false. Apart from creating the core and replacing schema.xml, everything else is straight out of the box. I'm accessing Solr using Haystack (a Django plugin); the code expects to receive single values for these fields and breaks completely when it gets lists. Tracing the problem back, it seems to be due to how Solr is configured.

Edit: Here are the complete contents of solr.log. All of this was logged after starting the server; running a couple of example queries produced no further output:

INFO  - 2015-05-27 08:38:12.563; [   ] org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312
INFO  - 2015-05-27 08:38:12.586; [   ] org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor /Users/sampeka/solr-5.1.0/server/contexts at interval 0
INFO  - 2015-05-27 08:38:12.593; [   ] org.eclipse.jetty.deploy.DeploymentManager; Deployable added: /Users/sampeka/solr-5.1.0/server/contexts/solr-jetty-context.xml
INFO  - 2015-05-27 08:38:13.629; [   ] org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
INFO  - 2015-05-27 08:38:13.682; [   ] org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()WebAppClassLoader=1121453612@42d8062c
Heredity answered 24/5, 2015 at 19:45 Comment(6)
Are you accessing solr from haystack using SearchQuerySet? If yes can you paste the filters used to access solr. Also look for solr.log and locate the log entries when you do the above search. It will show the exact parameters sent to Solr - which will give enough clue to debug this further.Twilley
The response I posted is taken straight from Solr using the admin interface and is consistent with the problem I was having accessing it through haystack. SearchQuerySets are broken because of this.Heredity
I've posted the contents of solr.log above as well.Heredity
Have you changed schema.xml after creating the initial index? What this is getting at: if documents were indexed before you changed multiValued="true" to "false", they are still stored in a multivalued way.Sungod
All I've done is: 1) auto-generated the schema using Haystack. 2) Created a core via the Apache command line. 3) Dropped schema.xml into server/solr/<core>/conf/. 4) Rebuilt the index (i.e. posted docs in). Using the Solr Admin interface, schema.xml does correctly appear under the core's files.Heredity
There were no docs in the index before I created the schema, but this may be a good point. Is there a way of forcing a Solr core to re-parse schema.xml?Heredity

Got to the root of the problem: solrconfig.xml wasn't configured correctly. By default, the schemaFactory class is set to ManagedIndexSchemaFactory, which overrides the use of schema.xml. Changing the schemaFactory class to ClassicIndexSchemaFactory forces Solr to use schema.xml and makes the schema immutable by API calls.
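
For reference, the change in solrconfig.xml looks roughly like this. This is a sketch, not a copy of my actual file: the commented-out element is what I'd expect the out-of-the-box managed-schema entry to look like in a Solr 5.x config, and its child settings are an assumption that may differ between config sets.

```xml
<!-- solrconfig.xml (sketch). Assumed default: the managed schema factory,
     which ignores schema.xml. Child elements may differ in your config set. -->
<!--
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
-->

<!-- Replacement: read the schema from conf/schema.xml and disallow
     schema changes through the API. -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

After changing this, restart Solr (or reload the core) and rebuild the index so that the documents are indexed against the field definitions in schema.xml.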

Heredity answered 27/5, 2015 at 13:31 Comment(0)
