The links in both the question and the original answer are dead, but there is a way to define the schema for this that is supported in modern versions of MongoDB.
The recommended way is to include a "language" property in the document, or in its embedded documents, next to the property being used for the text index. "Next to" here means at the same level, not necessarily adjacent to the indexed property.
Something common would look like:
    {
        "description": "Texto largo en español",
        "language": "spanish",
        "translation": [
            {
                "description": "Large text in Spanish",
                "language": "english"
            },
            {
                "description": "Grand texte en espagnol",
                "language": "french"
            }
        ]
    },
    {
        "description": "The quick brown fox",
        "translation": [
            {
                "description": "Le renard brun rapide",
                "language": "french"
            }
        ]
    }
Then, presuming we use the default text index language of "english", we can simply create the index with:
    db.collection.createIndex({ "description": "text", "translation.description": "text" })
MongoDB will then use the "language" property, whether it appears in the document "root" or in the embedded documents in the array, and where it is omitted it will simply use the default defined for the index. For instance, the second document here has no "language" property on the "root", so "english" is presumed since it is the default on the index.
The indexed items need not be in any particular order, as also demonstrated by the "english" entry inside the "translation" array of embedded documents in the first sample document. The rules for embedded items differ slightly, in that we must include the "language" property on each embedded document or the language actually used will be that of the document "root". In this example, any embedded document in the array without the "language" property would be treated as "spanish", since that is what is defined in the "root".
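To make those fallback rules concrete, here is a minimal sketch in plain JavaScript that mimics how the language is chosen for each indexed value. It is only an illustration of the rules described above, not MongoDB's actual implementation, and the function name `effectiveLanguages` is hypothetical:

```javascript
// Illustrative only: mimics the language-selection rules described above.
// Not MongoDB code; the function name and result shape are hypothetical.
function effectiveLanguages(doc, indexDefault = "english") {
  // Root-level value: its own "language", else the index default.
  const rootLanguage = doc.language || indexDefault;
  const result = [
    { path: "description", text: doc.description, language: rootLanguage }
  ];

  // Embedded documents: their own "language", else whatever applies to the root.
  for (const item of doc.translation || []) {
    result.push({
      path: "translation.description",
      text: item.description,
      language: item.language || rootLanguage
    });
  }
  return result;
}

// A made-up document where one embedded entry has no "language" property.
const sample = {
  description: "Texto largo en español",
  language: "spanish",
  translation: [
    { description: "Large text in Spanish", language: "english" },
    { description: "Otro texto" }
  ]
};

console.log(effectiveLanguages(sample).map((entry) => entry.language));
// [ 'spanish', 'english', 'spanish' ]
```

The last entry falls back to "spanish" from the root rather than to the index default, which is the case that tends to surprise people.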
Searches consider all the paths present in the index, so both the "description" and the embedded "translation.description" properties as defined here. The appropriate search language is still specified with the $language option to the $text operator, since "stop words" and "stemming" for the search terms are determined by that option, or by the default language set at index creation when it is omitted.
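As a rough sketch of what such a query looks like, the helper below builds the filter document for a `$text` search with an optional `$language`. The helper name `textQuery` is hypothetical, and the `find()` call in the comment assumes the collection and index shown earlier:

```javascript
// Hypothetical helper: builds a $text filter for a given search language.
function textQuery(search, language) {
  const text = { $search: search };
  if (language) {
    // When $language is left out, the index default language is used.
    text.$language = language;
  }
  return { $text: text };
}

console.log(JSON.stringify(textQuery("renard", "french")));
// {"$text":{"$search":"renard","$language":"french"}}

// In the shell this would be used along the lines of:
//   db.collection.find(textQuery("renard", "french"))
```

Passing the language explicitly matters when the search terms are not in the index default language, because the terms are stemmed according to the search language before matching.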
The embedded format also gives you an easy point from which to retrieve the language information for "translating" between two languages, where you have the content defined for both languages in question, so its practicality is two-fold in this case.
The specific documentation is now located at "Create a text Index for a Collection in Multiple Languages", a section within the wider topic "Specify a Language for Text Index", which includes links to all the other details, including how to specify a different default language on the index.