I have a Solr document like this, where all the fields are stored together in a single document.
<doc>
<int name="Id">7</int>
<str name="Name">PersonName</str>
<str name="Address">Address Line 1, Address Line 2, City</str>
<str name="Country">India</str>
<str name="ImageURL">0000028415.jpeg</str>
<arr name="Category">
<str>Student</str>
<str>Group A</str>
</arr>
</doc>
We would like to normalize it and have separate document types for Person, Country, and Category.
<doc>
<int name="PId">7</int>
<str name="Name">PersonName</str>
<str name="Address">Address Line 1, Address Line 2, City</str>
<str name="CountryId">91</str>
<str name="ImageURL">0000028415.jpeg</str>
<arr name="CategoryId">
<str>2</str>
<str>5</str>
</arr>
</doc>
<doc>
<int name="CId">91</int>
<str name="CountryName">India</str>
</doc>
<doc>
<int name="CatId">2</int>
<str name="CategoryName">Student</str>
</doc>
Note that this example is simplified; the actual documents I work with are much more complex, and we have millions of documents in the index.
I would like to understand how to join and apply filter queries with this kind of document structure, and how it affects performance compared to the previous case, where all details are stored in a single document.
Update
Here are sample queries against the current structure, in case they help show how this is done today.
This is a sample search query with certain facets applied:
/select?indent=on&wt=json&facet.field={!ex%3DCategory}Category&facet.field=Manufacturer&facet.field=Vendor&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field=f_Screensize&facet.field=pa_OS&bf=&start=0&fq={!tag%3DCategory}Category:Notebooks&fq=Price:[0+TO+9999999999999]&rows=6&version=2.2&bq=&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=(laptop)&defType=edismax&spellcheck.q=(laptop)&qf=Name^7++ShortDescription^6++FullDescription^4+CategoryCopy^2+ManufacturerCopy^2+Sku^3+ChildSku^3+nGramContent+Attributes+ProductAttributes+Tag+ManufacturerPartNumber+CustomProperties&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price
And this is a filter query with facets:
select?indent=on&wt=json&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field={!ex%3Df_Screensize}f_Screensize&facet.field=pa_HDD&facet.field=pa_OS&facet.field={!ex%3Dpa_OS}pa_OS&facet.field=pa_OS&facet.field=pa_Processor&facet.field=pa_RAM&facet.field=pa_Software&facet.field=Vendor&facet.field={!ex%3DManufacturer}Manufacturer&facet.field=Category&start=0&fq=StockAvailability:(true)&fq={!tag%3Df_Screensize}f_Screensize:15.0%2527%2527\!!4!!&fq={!tag%3Dpa_OS}pa_OS:Apple\!!0!!&fq={!tag%3DPrice}Price:[594+TO+1800]&sort=CDO_1+asc&rows=6&version=2.2&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=CategoryID:(1+OR+2+OR+3+OR+4)&defType=edismax&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price