Multilingual data modeling on MongoDB

Asked 22/5, 2014 at 9:29 Answered 14/7, 2019 at 6:9

I am trying to model my objects in MongoDB and am not sure how to proceed. I am building a product catalog with these characteristics:

The product catalog changes infrequently. A bulk operation may run weekly or fortnightly.
Product information is in multiple languages (English, Spanish, French), and a new language may be added at any time.

Here is what I am trying to do: I need to model my product catalog to capture the multilingual functionality. Assume I have:

product : {
 _id: xxx,
 sku: "23456",
 name: "Name",
 description: "Product details",
 tags: ["x1","x2"],
 ...
}

Surely name, description, tags and possibly images will change according to language. So, how do I model it?

1. I can have a separate collection for each language, e.g. enProducts, esProducts, etc.

2. I can have a JSON representation in the product itself, with the individual languages, like:

product :{
   id: xxx,
   en: {
         name: "Name",
         description: "product details.."
       },
   es: {
         name: "Name",
         description: "product details.."
       },
   ...   
}

Or is there any other solution? Need help of MongoDB modeling experts here :)

Iranian answered 22/5, 2014 at 9:29 Comment(2)

Did anybody try option 1 here? We have our own CMS which requires frequent changes. Now, we want to add a multi language option to the existing project. – Quadricycle 3/1, 2020 at 9:48

@Abdel Raoof which solution did you pick for this problem? If it's the second one, how did you manage validations on language-specific data? I have a very similar scenario and need to know the right solution to go with. – Neckcloth 17/9, 2020 at 9:51

What about this approach:

product: {
  id: 1,
  name: 'Original Name',
  description: 'Original Description',
  price: 33,
  date: '2019-03-13',
  translations: {
    es: {
      name: 'Nombre Original',
      description: 'Descripción Original',
    }
  }
}

If the user selects a language other than the default and the translations key exists in the object, you only need to merge it in; any key without a translation keeps its original value.

Another advantage: if you need to remove the translation feature or add/remove a language, you only need to change or remove the translations key rather than refactor the entire schema.
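
The merge described above can be sketched in a few lines of application code. This is only a minimal illustration, not part of the original answer: the localize helper and the sample values are hypothetical, and it assumes the document has already been loaded as a plain object.

```javascript
// Hypothetical helper: overlay the translation for `lang` onto the default fields.
function localize(product, lang) {
  const { translations = {}, ...base } = product;
  // Fields missing from the translation keep their original (default-language) value.
  return { ...base, ...(translations[lang] || {}) };
}

const original = {
  id: 1,
  name: 'Original Name',
  description: 'Original Description',
  price: 33,
  translations: { es: { name: 'Nombre Original' } },
};

const es = localize(original, 'es'); // name is translated, description falls back
const fr = localize(original, 'fr'); // no French translation, so everything falls back
console.log(es.name, '|', es.description);
console.log(fr.name);
```

Because the merge happens after the read, a missing translations key or a missing language needs no special handling.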

Mentalist answered 3/5, 2019 at 0:18 Comment(1)

Can you please suggest how validations can be applied to translations if this schema is followed? For example, translations must have name as a String and description as a String, and only 'es' is allowed as a language. – Neckcloth 17/9, 2020 at 9:36

Another option would be to keep a separate value per language inside each field. This would probably make maintaining the schema much easier as well:

product : { 
 _id:xxx,
 sku: {
   und: "23456"
 },
 name: {
   en: "Fork",
   de: "Gabel"
 },
 description: {
   en: "A metal thingy with four spikes",
   de: "Eine Dinge aus metal der vier spitze hat"
 }  
}

und is short for "undetermined" (the ISO 639 code), used here to mean the value is the same for all languages. It could be used as a fallback, or you could always use "en" as the fallback if you'd prefer that.

The above example is roughly how Drupal CMS manages languages (albeit translated from SQL to Mongo).
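
A lookup with fallback for this layout might look like the following. It is a sketch under assumptions: the pick helper is hypothetical, and the fallback order (requested language, then "en", then "und") is just one reasonable choice.

```javascript
// Hypothetical helper: resolve one field for a language, falling back to
// a default language and then to the language-neutral "und" entry.
function pick(field, lang, fallback = 'en') {
  if (field === null || typeof field !== 'object') return field; // plain, untranslated value
  for (const code of [lang, fallback, 'und']) {
    if (Object.prototype.hasOwnProperty.call(field, code)) return field[code];
  }
  return undefined;
}

const fork = {
  sku: { und: '23456' },
  name: { en: 'Fork', de: 'Gabel' },
};

console.log(pick(fork.name, 'de')); // German value exists
console.log(pick(fork.name, 'fr')); // no French value, falls back to "en"
console.log(pick(fork.sku, 'de'));  // only "und" exists
```

The first branch also covers the case where a language-independent field such as sku is stored as a plain value without the language level.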

Blois answered 22/2, 2018 at 6:50 Comment(1)

This is the solution I prefer. Just a note: if the sku is language-independent, it may as well be represented without the language "level", rather than using a dummy language code. – Ambiversion 1/1 at 17:46

Both solutions are standard for this. The first is standard in RDBMS technologies as well (file-based translations are another method, but that is not possible here).

As for which is best right here, I am leaning towards the second considering your use.

Some of the reasons would be:

One single document load for all translations and product data, no JOINs
Making for a single contiguous read of your disk
Allowing for atomic updating and adding of new languages and changes etc to a single product

But creating some downsides:

Updating could (probably will) create fragmentation, which can be remedied to some extent (not completely) by usePowerOf2Sizes
All your ops will now go to one single part of your hard disk, which may create a bottleneck. However, your scenario is such that you do not update often, if at all, so this shouldn't be a problem.

As a side note: I am judging that fragmentation might not be too much of a problem for you. The reason is that you only really bulk import products, probably from a CSV, so your documents will probably not regularly outgrow the power-of-2 allocation they received at insertion. As such, this point might be moot.

So overall, if planned right, the second option is a good one. However, there are some considerations to take into account:

Could the multiple descriptions/fields push the document past the 16 MB limit?
How do you manually pad the document to use space efficiently and prevent fragmentation?

Those are your biggest concerns if you go with the second option.

Considering that you can fit all of the works of Shakespeare into 4 MB with room to spare, I am actually not sure you will reach the 16 MB limit. If you do, it would have to be some considerable text, and maybe images stored in binary inside the document.
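
To get a feel for how far a product document is from the limit, a rough estimate like the one below can help. It is only a sketch: it assumes Node.js (for Buffer), measures the UTF-8 size of the JSON form rather than the exact BSON size, and the sample dimensions (50 languages, 10 fields, 5,000 characters each) are invented for illustration.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's per-document size limit

// Rough estimate only: UTF-8 size of the JSON form, not the exact BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// Hypothetical worst case: 10 translated fields of 5,000 characters in 50 languages.
const sample = { sku: '23456' };
for (let i = 0; i < 50; i++) {
  const fields = {};
  for (let f = 0; f < 10; f++) fields['field' + f] = 'x'.repeat(5000);
  sample['lang' + i] = fields;
}

const size = approxBytes(sample);
const percent = (100 * size / MAX_BSON_BYTES).toFixed(1);
console.log(size + ' bytes, about ' + percent + '% of the limit');
```

Even this deliberately large sample stays well under the limit, which matches the point above that text alone is unlikely to reach 16 MB.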

Coming back to the first option, your largest concern will be duplication of certain data, i.e. price (France and Spain both have the Euro) unless you use two documents, one to house common data and the other a translation (this will make 4 documents actually but two queries).

Considering that this catalogue will never be updated except in bulk, duplicated data will not matter too much (though I would be cautious with a view to future expansion), so:

You can make it have one document per translation and not worry about updating prices atomically across all regions
You have one disk read without the fragmentation
No need to manually pad your documents

So both options are readily available but I am leaning towards the second case.

Khichabia answered 22/5, 2014 at 9:41 Comment(3)

Now, that's some interesting points to think about... I really need to evaluate your point about document sizes. – Iranian 22/5, 2014 at 9:47

@AbdelOlakara let me know if you want more info – Khichabia 22/5, 2014 at 9:53

@AbdelOlakara though considering document size, 4 MB can fit all of Shakespeare's work, so the descriptions would have to be quite large. With that in mind, maybe the first option is best, provided you account for document size fluctuations – Khichabia 22/5, 2014 at 10:0

I use the following pattern for keys and values, with an index on key:

{
  "id": "ObjectId",
  "key": "error1",
  "values": [
    {
      "lang": "en",
      "value": "Error Message 1"
    },
    {
      "lang": "fa",
      "value": "متن خطای شماره 1"
    }
  ]
}

and query it by key from the C# code (shown here in mongo shell syntax):

result = collection.find({"key": "error1"});

See the MongoDB manual page "Model One-to-Many Relationships with Embedded Documents".

Although answered 14/7, 2019 at 6:9 Comment(4)

1. This isn't valid json 2. How does the "key" tie to the values? For instance, I have multiple keys. Are you using ordinal positions to tie the two together? – Hildegardehildesheim 2/12, 2019 at 23:31

how you would query based on different languages ? – Beastly 3/4, 2020 at 17:42

@Hildegardehildesheim This JSON sample is compatible with MongoDB. I use this structure in MongoDB to handle multi-language error messages in the app. For the keys I use [class file name].[method name].[error name] from the C# code. You can also follow the key structure of the "Resource name" column in the nopCommerce project at the link below: admin-demo.nopcommerce.com/Admin/Language/Edit/1 – Although 9/1, 2022 at 14:4

@Al-HanashMoataz to query based on different languages there are 2 solutions: 1. A key-based query. 2. A query based on the class name at the beginning of the key name, for optimal information caching and performance. – Although 9/1, 2022 at 14:6

For a static list of languages I would go with @Zagorulkin Dmitry's solution, as it is easy to query.

For a dynamic list of languages, I would rather not change the schema, and would allow easy management of the data.

The downside is that querying is less trivial.

  {
    "product": {
      "id": "xxx",
      "languageDependentData": [
        {
          "language": "en",
          "name": "Name",
          "description": "product details.."
        },
        {
          "language": "es",
          "name": "Name",
          "description": "product details.."
        }
      ]
    }
  }
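
Once the document is loaded, picking the right entry from the array is straightforward in application code. The following is a minimal sketch, not a definitive implementation: entryFor is a hypothetical helper, and falling back to 'en' is an assumption.

```javascript
// Hypothetical helper: find the entry for `lang` in languageDependentData,
// falling back to `fallback` when that language is missing.
function entryFor(product, lang, fallback = 'en') {
  const entries = product.languageDependentData || [];
  return entries.find((e) => e.language === lang)
    || entries.find((e) => e.language === fallback)
    || null;
}

const item = {
  id: 'xxx',
  languageDependentData: [
    { language: 'en', name: 'Name', description: 'product details..' },
    { language: 'es', name: 'Nombre', description: 'detalles del producto..' },
  ],
};

console.log(entryFor(item, 'es').name);     // Spanish entry exists
console.log(entryFor(item, 'fr').language); // no French entry, falls back to English
```

Adding a language is then just another array element, with no schema change.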

Dirk answered 29/11, 2015 at 15:37 Comment(2)

Can you please suggest how validations would be applied if this schema is followed? For example, languageDependentData must have name as a String and description as a String, and the only languages allowed are 'en' and 'es'. – Neckcloth 17/9, 2020 at 9:39

@AvaniKhabiya I would use an ORM for that. For example, on Node.js I'm using mongoosejs.com. It lets you define a schema for MongoDB where you can easily declare attributes as String and even set an allowed enum for languages. – Dirk 25/9, 2020 at 14:32

This way will be the best:

product :{
       id: xxx,
       en: {
             name: "Name",
             description: "product details.."
           },
       es: {
             name: "Name",
             description: "product details.."
           },
       ...

  }

simply because you only have to look up one product, and afterwards you can choose any language.

Minny answered 22/5, 2014 at 9:32 Comment(4)

But what if multiple (say 3 more) new languages come up? It's not just name and description that are language-specific fields. I have around 8~10 fields that will be language specific – Iranian 22/5, 2014 at 9:35

You will have to include all the fields you want in the document. This way will be suitable if you want to design a high-load system. – Minny 22/5, 2014 at 10:4

How would you query and always get en by default if the value of es is not available? – Silvester 27/5, 2018 at 15:19

@Silvester this logic has to be declared in your app, while you are querying Firebase. – Djebel 15/11, 2022 at 9:43

Yet another option is to store your primary data in one language only and to have a separate text-resource translation collection where you map any text resource from your primary language to other target languages (no matter if your text resource comes from the primary data store or is just a translation of a system message on your system).

I.e. make no language specific adjustments to the schema and model at all.

The drawback I can see is maintenance: when a product is removed from the primary store, its entries must also be removed from the translation collection. As long as you can guarantee that the same resource is not used elsewhere, this is trivial, but it still needs to be programmed :)
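
The idea can be sketched with an in-memory stand-in for the translation collection. Everything here is an assumption for illustration: the Map plays the role of the collection, the primary-language text itself is used as the lookup key, and translate and removeTexts are hypothetical helpers.

```javascript
// In-memory stand-in for a hypothetical "translations" collection.
// Each entry maps a primary-language text to its target-language versions.
const translations = new Map([
  ['Fork', { de: 'Gabel', es: 'Tenedor' }],
  ['Error Message 1', { fa: 'متن خطای شماره 1' }],
]);

// Untranslated text simply stays in the primary language.
function translate(text, lang) {
  const entry = translations.get(text);
  return (entry && entry[lang]) || text;
}

// Cleanup when a product is removed. Only safe if no other
// record still uses the same text resource.
function removeTexts(texts) {
  for (const text of texts) translations.delete(text);
}

console.log(translate('Fork', 'de')); // translated
console.log(translate('Fork', 'fr')); // no French entry, primary text is returned
```

The product documents themselves stay untouched, which is the point of this approach; the cost is the extra lookup and the cleanup step.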

Mephitis answered 22/5, 2014 at 11:15 Comment(0)
