Solr query for matching nested/relational data

I'm using Apache Solr for the matching functionality of my web app, and I ran into a problem with the following scenario:

I have three programmers. The skill field lists each one's skills, and "weight" indicates how proficient he or she is in that skill:

{
    name: "John",
    skill: [
        {name: "java", weight: 90},
        {name: "oracle", weight: 90},
        {name: "linux", weight: 70}
    ]
},
{
    name: "Sam",
    skill: [
        {name: "C#", weight: 98},
        {name: "java", weight: 75},
        {name: "oracle", weight: 70},
        {name: "tomcat", weight: 70}
    ]
},
{
    name: "Bob",
    skill: [
        {name: "oracle", weight: 90},
        {name: "java", weight: 85}
    ]
}

I also have a job that is looking for a programmer:

{
    name: "webapp development",
    skillRequired: [
        {name: "java", weight: 85},
        {name: "oracle", weight: 85}
    ]
}

I want to use the job's "skillRequired" to match those programmers (to find the best people for the job). In this case the matches should be John and Bob; Sam is excluded because his Java and Oracle skills are not good enough. John should score higher than Bob because he knows Java better (90 versus 85, with Oracle tied at 90).
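
To pin down the intended behaviour independently of Solr, here is a minimal Python sketch of the matching rule described above. The helper names (skill_map, qualifies, score) are hypothetical, and scoring by the sum of the matched weights is only an assumption; the question does not define a scoring formula.

```python
# Data copied from the question; the helper functions are hypothetical.
programmers = [
    {"name": "John", "skill": [
        {"name": "java", "weight": 90},
        {"name": "oracle", "weight": 90},
        {"name": "linux", "weight": 70},
    ]},
    {"name": "Sam", "skill": [
        {"name": "C#", "weight": 98},
        {"name": "java", "weight": 75},
        {"name": "oracle", "weight": 70},
        {"name": "tomcat", "weight": 70},
    ]},
    {"name": "Bob", "skill": [
        {"name": "oracle", "weight": 90},
        {"name": "java", "weight": 85},
    ]},
]
job = {
    "name": "webapp development",
    "skillRequired": [
        {"name": "java", "weight": 85},
        {"name": "oracle", "weight": 85},
    ],
}

def skill_map(programmer):
    """Map each skill name to its weight for one programmer."""
    return {s["name"]: s["weight"] for s in programmer["skill"]}

def qualifies(programmer, job):
    """True if every required skill is present with at least the required weight."""
    have = skill_map(programmer)
    return all(have.get(req["name"], 0) >= req["weight"]
               for req in job["skillRequired"])

def score(programmer, job):
    """Assumed scoring: sum of the programmer's weights on the required skills."""
    have = skill_map(programmer)
    return sum(have[req["name"]] for req in job["skillRequired"])

ranked = sorted((p for p in programmers if qualifies(p, job)),
                key=lambda p: score(p, job), reverse=True)
ranked_names = [p["name"] for p in ranked]
print(ranked_names)  # ['John', 'Bob']
```

Under that assumed scoring, John totals 180 and Bob 175, and Sam is filtered out because his Java weight (75) is below the required 85.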

The problem is that Solr can't index nested objects. The best format I think I can get is:

name: "John",
skill-name: ["java", "oracle", "linux"],
skill-weight: [90, 90, 70]

and so on, so I don't know whether it is possible to construct a query that makes this scenario work.
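
The difficulty with the flattened format is that two multi-valued fields do not remember which name goes with which weight. The Python sketch below only simulates that behaviour on Sam's record; flat_match and paired_match are hypothetical names, and flat_match stands in for a query such as skill-name:java AND skill-weight:[85 TO *], where each clause is satisfied by any value of its field.

```python
# Sam's record in the flattened format from the question.
sam = {
    "name": "Sam",
    "skill-name": ["C#", "java", "oracle", "tomcat"],
    "skill-weight": [98, 75, 70, 70],
}

def flat_match(doc, skill, min_weight):
    """Each condition is checked against its own field independently,
    so the name and the weight may come from different skills."""
    has_skill = skill in doc["skill-name"]
    has_weight = any(w >= min_weight for w in doc["skill-weight"])
    return has_skill and has_weight

def paired_match(doc, skill, min_weight):
    """The intended semantics: name and weight must belong to the same skill."""
    pairs = zip(doc["skill-name"], doc["skill-weight"])
    return any(n == skill and w >= min_weight for n, w in pairs)

false_positive = flat_match(sam, "java", 85)   # True: the 98 belongs to C#, not java
intended = paired_match(sam, "java", 85)       # False: Sam's java weight is 75
print(false_positive, intended)
```

Sam matches the flattened check only because his C# weight of 98 satisfies the range clause, which is the false positive a nested structure is meant to prevent.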

Is there a better schema structure for it? Or should I use index-time or query-time boosts?

I have read almost all of the Solr wiki and googled around with no luck. Any tips and workarounds are welcome.

Problem solved. I'm logging my solution here in case it helps others:

1st, my data is in JSON format, so I need solr-4.8.0, which supports indexing nested data with JSON. If the data were in XML format, solr-4.7.2 would still work.

2nd, solr-4.8.0 needs java7-u55 (officially recommended).

3rd, nested documents/objects should be submitted to Solr under the "_childDocuments_" key. To identify the type of each parent/child document, I added a "type" field. With the example above, it looks like this:

   {
        type: "programmer",
        name: "John",
        _childDocuments_: [
            {type:"skill", name: "java", weight: 90},
            {type:"skill", name: "oracle", weight: 90},
            {type:"skill", name: "linux", weight: 70}
        ]
    },
    {
        type: "programmer",
        name: "Sam",
        _childDocuments_: [
            {type:"skill",name: "C#", weight: 98},
            {type:"skill", name: "java", weight: 75},
            {type:"skill", name: "oracle", weight: 70},
            {type:"skill", name: "tomcat", weight: 70}
        ]
    },
    {
        type: "programmer",
        name: "Bob",
        _childDocuments_: [
            {type:"skill", name: "oracle", weight: 90},
            {type:"skill", name: "java", weight: 85}
        ]
    }
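
If the source records are in the original "skill" format, the conversion to the nested form can be sketched in Python as follows. The function name to_block is hypothetical. A real schema will usually also need a unique key field on parents and children, which the example above does not show, so none is invented here.

```python
import json

def to_block(programmer):
    """Convert one record from the original 'skill' format into a parent
    document with its skills nested under _childDocuments_."""
    return {
        "type": "programmer",
        "name": programmer["name"],
        "_childDocuments_": [
            {"type": "skill", "name": s["name"], "weight": s["weight"]}
            for s in programmer["skill"]
        ],
    }

# Bob's record, copied from the question.
bob = {
    "name": "Bob",
    "skill": [
        {"name": "oracle", "weight": 90},
        {"name": "java", "weight": 85},
    ],
}

# Strict JSON (quoted keys, no trailing commas) ready to send as an update body.
payload = json.dumps([to_block(bob)], indent=2)
print(payload)
```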

4th, after submitting and committing to Solr, I can match the job with a block join query (as filter queries):

fq={!parent which='type:programmer'}type:skill AND name:java AND weight:[85 TO *]&
fq={!parent which='type:programmer'}type:skill AND name:oracle AND weight:[85 TO *]
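
The filter queries above follow a fixed pattern, one per required skill, so they can be generated from the job record. This is a rough Python sketch: job_to_params is a hypothetical helper, and the main query q=*:* is an assumption, since the solution above only shows the fq parameters.

```python
from urllib.parse import urlencode

# The job record, copied from the question.
job = {
    "name": "webapp development",
    "skillRequired": [
        {"name": "java", "weight": 85},
        {"name": "oracle", "weight": 85},
    ],
}

def job_to_params(job, parent_type="programmer", child_type="skill"):
    """Build one block join filter query per required skill."""
    params = [("q", "*:*")]  # assumed main query; not given in the original
    for req in job["skillRequired"]:
        fq = (f"{{!parent which='type:{parent_type}'}}"
              f"type:{child_type} AND name:{req['name']} "
              f"AND weight:[{req['weight']} TO *]")
        params.append(("fq", fq))
    return params

params = job_to_params(job)
query_string = urlencode(params)  # URL-encoded, ready to append to a select URL
for key, value in params:
    print(key, "=", value)
```

Each fq keeps only the parents that have a child satisfying all three conditions at once, so using several of them intersects the requirements. Skill names containing special characters, such as C#, would need escaping or quoting, which this sketch does not handle.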
Loggerhead answered 11/5, 2014 at 15:21 Comment(4)
Could you please provide the schema.xml for this particular case? - Bhutan
Did you have to add the _root_ field to your schema? I was following the guidelines from yonik.com/solr-nested-objects, and before adding a nested document, I had to update the schema: $ curl localhost:8983/solr/nested_demo/schema -X POST -H 'Content-type:application/json' --data-binary '{ "add-field" : { "name":"_root_", "type":"string", "indexed":true, "stored":false } }' - Haldas
Can you please provide the schema? How did you declare this field in the schema? - Preconcerted
@PratikPatel Sorry, I left that company a long time ago, and all the knowledge stayed there. Maybe you can try Elasticsearch? It seems much more popular. - Loggerhead

You can try BlockJoinQuery. Refer here

Tectonic answered 12/5, 2014 at 10:17 Comment(4)
Nice! Very useful clue! I found it here, and it finally solved my problem: heliosearch.org/solr-4-8-features - Loggerhead
The site is not reachable! Can you please update your answer? @Hetfield - JoeVeneaux
@TimLong The link works fine for me. Please try again. You can also google for block join query. Another resource is yonik.com/solr-nested-objects - Tectonic
@TimLong As of today, you might want to take a look at the newer Solr 5.3 features: yonik.com/solr-nested-objects - Haldas