Full-text search within an embedded document

Here is my document model:

"translation" : {
        "en" : {
            "name" : "brown fox",
            "description" : "the quick brown fox jumps over a lazy dog"
        },
        "it" : {
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        "fr" : {
            "name" : "renard brun ",
            "description" : " le renard brun rapide saute par-dessus un chien paresseux"
        },
        "de" : {
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        "es" : {
            "name" : "brown fox ",
            "description" : " el rápido zorro marrón salta sobre un perro perezoso"
        }
    },

Now I have to add a text index for the above document. How can I achieve this? I already added a text index on translation, but that does not work, since name and description are nested under a language key (within an object). I also have to give a text weight (text score) to name and description separately, i.e. name has a weight of 5 and description a weight of 2, so I can't use the wildcard text index, i.e.

{'$**': 'text'}

I also tried 'translation.en.name': 'text', but that is not working either, and my languages are dynamic and may grow, so what is the best solution for that case?

Any help will be greatly appreciated.

Ca answered 16/7, 2016 at 7:46 Comment(2)
Consider changing your document structure and making the dynamic key a field's value: { "lang": "en", "name" : "brown fox", "description" : "the quick brown fox jumps over a lazy dog" }Checked
@SSDMS can you please elaborate on how to achieve this with more than one lang, i.e. an index for 'de' or 'ru'?Ca

Because the embedded fields are dynamic, the best approach is to modify your schema in such a way that the translation field becomes an array of embedded documents. An example of such a schema that maps the current structure follows:

"translation": [    
    {
        "lang": "en",
        "name" : "brown fox",
        "description" : "the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "it",
        "name" : "brown fox ",
        "description" : " the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "fr",
        "name" : "renard brun ",
        "description" : " le renard brun rapide saute par-dessus un chien paresseux"
    },
    {
        "lang": "de",
        "name" : "brown fox ",
        "description" : " the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "es",
        "name" : "brown fox ",
        "description" : " el rápido zorro marrón salta sobre un perro perezoso"
    }
]

With this schema, it's easy to apply the text index on the name and description fields:

db.collection.createIndex(
    {
        "translation.name": "text",
        "translation.description": "text"
    }
)
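
Since you also want name and description weighted differently (name 5, description 2), the weights option of createIndex can express that. A minimal sketch, assuming the array schema above:

db.collection.createIndex(
    {
        "translation.name": "text",
        "translation.description": "text"
    },
    {
        // name matches contribute more to the text score than description matches
        "weights": {
            "translation.name": 5,
            "translation.description": 2
        }
    }
)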

As for modifying the schema, you would need to use an API that allows you to update your collection in bulk, and the Bulk API does that for you. It offers better performance because the operations are sent to the server in batches of, say, 1000, rather than sending every single request to the server individually.

The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing the translation field to an array:

var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;

db.collection.find({ 
    "translation": { 
        "$exists": true, 
        "$not": { "$type": 4 } 
    } 
}).snapshot().forEach(function (doc) {
    var localization = Object.keys(doc.translation)
        .map(function (key){
            var obj = doc["translation"][key];
            obj["lang"] = key;
            return obj;
        });
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$set": { "translation": localization }
    });

    counter++;
    if (counter % 1000 === 0) {
        bulk.execute(); // Execute per 1000 operations 
        // re-initialize every 1000 update statements
        bulk = db.collection.initializeUnorderedBulkOp(); 
    }
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) { bulk.execute(); }

The next example applies to MongoDB version 3.2, which has since deprecated the Bulk API and provides a newer set of APIs using bulkWrite().

It uses the same cursor as above, but builds the array of bulk operations with the same forEach() cursor method, pushing each bulk write document onto the array. Because write commands can accept no more than 1000 operations, you need to group your operations into batches of at most 1000 and re-initialise the array when the loop hits the 1000th iteration:

var cursor = db.collection.find({ 
        "translation": { 
            "$exists": true, 
            "$not": { "$type": 4 } 
        } 
    }).snapshot(),
    bulkUpdateOps = [];

cursor.forEach(function(doc){ 
    var localization = Object.keys(doc.translation)
        .map(function (key){
            var obj = doc["translation"][key];
            obj["lang"] = key;
            return obj;
        });
    bulkUpdateOps.push({ 
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "translation": localization } }
         }
    });

    if (bulkUpdateOps.length === 1000) {
        db.collection.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});         

if (bulkUpdateOps.length > 0) { db.collection.bulkWrite(bulkUpdateOps); }
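
Once the documents are migrated and the text index above is created, a search can sort by the computed text score, which is where the per-field weights show up. A sketch, with an arbitrary search term:

db.collection.find(
    { "$text": { "$search": "renard brun" } },   // arbitrary example term
    { "score": { "$meta": "textScore" } }        // project the computed text score
).sort({ "score": { "$meta": "textScore" } })    // most relevant documents first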
Giamo answered 16/7, 2016 at 10:4 Comment(4)
Thanks for the detailed answer, I just want to confirm whether this is the only way, or the best way, to model a dynamic doc like that.Ca
I added .snapshot() to the cursor and replaced the evil equality operators. I hope you don't mind. ;)Checked
@Giamo I assume an index like this is correct? db.collection.createIndex({'translation.name': 'text', 'translation.description': 'text'})Ca
@AbdulHameed Thanks for the correction. As far as I know, yes, I believe that's the only way to index the fields, because dynamic keys are involved.Giamo

To create a text index on the name field, use it like this: db.collectionname.createIndex({ "name": "text" })

To verify that the index was created, list all existing indexes using this command:

db.collectionname.getIndexes()


EDIT

"The issue is not about the index creation method; the issue is how to achieve this with the above model for all languages."

I get it now: you cannot index the way you want for all languages with the existing document schema. Please change the schema; below is one way you can achieve it:

{
    "_id" : 1,
    "translation" : [
        {
            "language" : "en",
            "name" : "brown fox",
            "description" : "the quick brown fox jumps over a lazy dog"
        },
        {
            "language" : "it",
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        {
            "language" : "fr",
            "name" : "renard brun ",
            "description" : " le renard brun rapide saute par-dessus un chien paresseux"
        },
        {
            "language" : "de",
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        {
            "language" : "es",
            "name" : "brown fox ",
            "description" : " el rápido zorro marrón salta sobre un perro perezoso"
        }
    ]
}

Then create the index on the embedded fields as db.collectionname.createIndex({ "translation.name" : "text", "translation.description" : "text" });

"The above assumption is based on your suggested model, because name and description are keys within translation, not top-level objects. Isn't it?"

Nope, with the schema I have provided it is easier to have text indexes on the name and description fields and you can search based on languages.
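
For example, assuming the text index is created on translation.name and translation.description as above, a language-aware search could look like the following sketch. Note that both the language filter and the $text match are evaluated per document, not per array element:

db.collectionname.find({
    "translation.language": "es",               // document must contain an "es" entry
    "$text": { "$search": "zorro perezoso" }    // matched against the indexed name/description text
})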

Chryso answered 16/7, 2016 at 8:27 Comment(6)
I already mentioned the document. The issue is not about the index creation method; the issue is how to achieve this with the above model for all languages.Ca
I assume an index like this is correct? db.collection.createIndex({'translation.name': 'text', 'translation.description': 'text'})Ca
@AbdulHameed - You can have indexes like translation.name and translation.description, but that alone will not help you search based on the languages. Kindly restructure your schema as I have shown. The same approach is also shown in the answer provided by chridham.Chryso
The above assumption is based on your suggested model, because name and description are keys within translation, not top-level objects. Isn't it?Ca
Nope, with the schema I have provided it is easier to have text indexes on the name and description fields and you can search based on languages.Chryso
Let us continue this discussion in chat.Ca
