Solr with normalized document structure

I have a Solr document like this, where all the fields are mapped as a single document.

<doc>
    <int name="Id">7</int>
    <str name="Name">PersonName</str>
    <str name="Address">Address Line 1, Address Line 2, City</str>
    <str name="Country">India</str>
    <str name="ImageURL">0000028415.jpeg</str>
    <arr name="Category">
      <str>Student</str>
      <str>Group A</str>
    </arr>
</doc>

We would like to normalize it and have separate doc types for Person, Country, and Category:

<doc>
    <int name="PId">7</int>
    <str name="Name">PersonName</str>
    <str name="Address">Address Line 1, Address Line 2, City</str>
    <str name="CountryId">91</str>
    <str name="ImageURL">0000028415.jpeg</str>
    <arr name="CategoryId">
      <str>2</str>
      <str>5</str>
    </arr>
</doc>

<doc>
    <int name="CId">91</int>
    <str name="CountryName">India</str>
</doc>

<doc>
    <int name="CatId">2</int>
    <str name="CategoryName">Student</str>
</doc>

Note that this example is simplified. The actual documents I work with are much more complex, and we have millions of documents in the index.

I would like to understand how to join and apply filter queries with this kind of document structure, and how it affects performance compared to the previous case, where all details are stored in a single document.

Update

Here is a sample query against the current structure, in the hope that it gives some idea of how things are done today. This one is a search with certain facets applied:

/select?indent=on&wt=json&facet.field={!ex%3DCategory}Category&facet.field=Manufacturer&facet.field=Vendor&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field=f_Screensize&facet.field=pa_OS&bf=&start=0&fq={!tag%3DCategory}Category:Notebooks&fq=Price:[0+TO+9999999999999]&rows=6&version=2.2&bq=&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=(laptop)&defType=edismax&spellcheck.q=(laptop)&qf=Name^7++ShortDescription^6++FullDescription^4+CategoryCopy^2+ManufacturerCopy^2+Sku^3+ChildSku^3+nGramContent+Attributes+ProductAttributes+Tag+ManufacturerPartNumber+CustomProperties&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price

And this is a filter query with facets:

select?indent=on&wt=json&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field={!ex%3Df_Screensize}f_Screensize&facet.field=pa_HDD&facet.field=pa_OS&facet.field={!ex%3Dpa_OS}pa_OS&facet.field=pa_OS&facet.field=pa_Processor&facet.field=pa_RAM&facet.field=pa_Software&facet.field=Vendor&facet.field={!ex%3DManufacturer}Manufacturer&facet.field=Category&start=0&fq=StockAvailability:(true)&fq={!tag%3Df_Screensize}f_Screensize:15.0%2527%2527\!!4!!&fq={!tag%3Dpa_OS}pa_OS:Apple\!!0!!&fq={!tag%3DPrice}Price:[594+TO+1800]&sort=CDO_1+asc&rows=6&version=2.2&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=CategoryID:(1+OR+2+OR+3+OR+4)&defType=edismax&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price

The only thing that comes to my mind is using the XSLTResponseWriter to transform the query response, with an XSLT file, into a more suitable one.

I don't know if that's what you wanted.

EDIT: I will add more info about this.

XSLT allows you to transform an XML file into another one (or into several). You can swap the place of your tags, create new ones, combine them, take info from other XML files and use it in the file you want to transform, etc. You can find more info about this here: https://www.w3schools.com/xml/xsl_intro.asp

Solr allows you to apply an XSLT transformation to your query result at query time. You just have to create your .xsl file and place it in the mySolrCollection/conf/xslt/ directory (create xslt/ if it doesn't exist). For example: mySolrCollection/conf/xslt/transformation.xsl

This file (transformation.xsl) will contain all the transformations you want to apply to the query response. I'm not going to go into how to write these transformations; it's not that hard to learn, so you can just check the web for examples and tutorials ;)

The last thing to do is to tell Solr that you want to apply a transformation to the response of the query, which you do by changing the query itself. You must add &wt=xslt&tr=transformation.xsl to your query: wt=xslt asks for a transformed response, and tr names the file that defines the transformation (transformation.xsl here).
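That said, as a rough starting point, here is a minimal sketch of what such a file could contain, wrapped in a small Python helper that saves it to the xslt/ directory. The field names PId and Name come from the Person doc in the question; the output elements <people>, <person> and <name>, and the helper write_stylesheet, are made up for illustration. It assumes the standard Solr XML response layout (response/result/doc), so adapt it to your own schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical stylesheet: <people>/<person>/<name> are made-up output names;
# PId and Name are the fields from the question's Person doc.
STYLESHEET = """\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Assumes the usual Solr XML layout: response, then result, then doc -->
  <xsl:template match="/">
    <people>
      <xsl:for-each select="response/result/doc">
        <person id="{int[@name='PId']}">
          <name><xsl:value-of select="str[@name='Name']"/></name>
        </person>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
"""


def write_stylesheet(conf_dir, filename="transformation.xsl"):
    """Check that the stylesheet is well-formed XML, then save it under <conf_dir>/xslt/."""
    ET.fromstring(STYLESHEET)  # raises ParseError if the XML is malformed
    xslt_dir = Path(conf_dir) / "xslt"
    xslt_dir.mkdir(parents=True, exist_ok=True)  # create xslt/ if it doesn't exist
    target = xslt_dir / filename
    target.write_text(STYLESHEET, encoding="utf-8")
    return target
```

Calling write_stylesheet("mySolrCollection/conf") would create mySolrCollection/conf/xslt/transformation.xsl. The helper only checks that the file is well-formed XML; it does not run the transformation, so you still need to test the result against a real Solr response.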

An example query would be:

http://<your_host>:<your_port>/solr/<your_collection>/select?q=*:*&wt=xslt&tr=transformation.xsl&rows=100&...
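If you build the query from code, a small sketch like the one below keeps the wt and tr parameters in one place and takes care of URL encoding. The function name build_xslt_query_url and the host, port and collection values are placeholders I made up, not anything Solr prescribes, and it assumes the XSLT response writer is available in your Solr version.

```python
from urllib.parse import urlencode


def build_xslt_query_url(host, port, collection, stylesheet, query="*:*", rows=100):
    """Build a /select URL that asks Solr to run the response through an XSLT file."""
    params = {
        "q": query,
        "wt": "xslt",      # use the XSLT response writer
        "tr": stylesheet,  # file name inside the collection's conf/xslt/ directory
        "rows": rows,
    }
    return f"http://{host}:{port}/solr/{collection}/select?{urlencode(params)}"


# Placeholder values: replace with your own host, port and collection.
url = build_xslt_query_url("localhost", 8983, "mySolrCollection", "transformation.xsl")
print(url)
```

Other parameters from your existing queries (fq, facet.field and so on) can be added to the params dict in the same way; pass a list of tuples to urlencode if a parameter has to be repeated.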

If your query is correct, you will have your response transformed as you specified in your .xsl file.

Hope this is enough.
