Situation: I have an index with a strict mapping, and I want to delete an old field from it that is no longer used. So I create a new index with a mapping that doesn't include that field and try to reindex the data into the new index.
Problem: When I reindex, I get an error because I'm trying to index data into a field that is not in the mapping. To solve this, I want to remove that field from all documents in the original index first, before I reindex.
PUT old_index/_doc/1
{
"field_to_delete" : 5
}
PUT old_index/_doc/2
{
"field_to_delete" : null
}
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}
"reason": "mapping set to strict, dynamic introduction of [field_to_delete] within [new_index] is not allowed"
1. Some places I found suggest doing:
POST old_index/_doc/_update_by_query
{
"script": "ctx._source.remove('field_to_delete')",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "field_to_delete"
}
}
]
}
}
}
However, that doesn't match documents that have an explicit value of null, so reindexing still fails after this update.
2. Others (like members of the Elastic team in their official forum) suggest doing something like:
POST old_index/_doc/_update_by_query
{
"script": {
"source": """
if (ctx._source.field_to_delete != null) {
ctx._source.remove("field_to_delete");
} else {
ctx.op="noop";
}
"""
},
"query": {
"match_all": {}
}
}
However, this has the same problem: it doesn't remove the field from the second document, which has an explicit value of null.
3. In the end I could just do:
POST old_index/_doc/_update_by_query
{
"script": {
"source": "ctx._source.remove('field_to_delete');"
},
"query": {
"match_all": {}
}
}
But this will update all documents, and for a large index that could mean additional downtime during deployment.
containsKey, good catch ;-) ES will still have to iterate over all the documents, but only those having field_to_delete will effectively be updated. – Assuan
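For reference, the containsKey check mentioned in the comment above could look roughly like the request below. This is a minimal sketch, not a verified answer: it assumes ctx._source behaves as a map in Painless, so containsKey returns true when the key is present even with an explicit null value, which is the case the != null check in attempt 2 misses. The typeless endpoint is also an assumption; on older versions with mapping types the path would be old_index/_doc/_update_by_query as in the attempts above.

```
POST old_index/_update_by_query
{
"script": {
"source": """
// The key is present in _source, even if its value is an explicit null
if (ctx._source.containsKey("field_to_delete")) {
ctx._source.remove("field_to_delete");
} else {
// Field absent: skip the write so the document is not reindexed
ctx.op = "noop";
}
"""
},
"query": {
"match_all": {}
}
}
```

As the comment notes, every document is still visited because the query is match_all, but documents without the field are marked as noop and should not be rewritten.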