ElasticSearch: Delete field from all documents where it exists (with Painless?)
Asked Answered
L

1

6

Situation: I have an index with strict mapping and I want to delete an old field from it which is no longer used. So I create a new index with mapping that doesn't include that field and I try to reindex the data into the new index.

Problem: When I reindex, I get an error, because I'm trying to index data into a field that is not available in the mapping. So to solve this, I want to remove that field from all documents in the original index first, before I can reindex.

PUT old_index/_doc/1
{
    "field_to_delete" : 5
}
PUT old_index/_doc/2
{
    "field_to_delete" : null
}
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
"reason": "mapping set to strict, dynamic introduction of [field_to_delete] within [new_index] is not allowed"

1. Some places I found suggest doing:

POST old_index/_doc/_update_by_query
{
  "script": "ctx._source.remove('field_to_delete')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "field_to_delete"
          }
        }
      ]
    }
  }
}

However that doesn't match documents that have an explicit value of null, so reindexing still fails after this update.

2. Others (like members of the Elastic team in their official forum) suggest doing something like:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
          if (ctx._source.field_to_delete != null) {
            ctx._source.remove("field_to_delete");
          } else {
            ctx.op="noop";
          }
      """
    }
  },
  "query": {
    "match_all": {}
  }
}

However this has the same problem - it doesn't remove the second document that has an explicit value of null.

3. In the end I could just do:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": "ctx._source.remove("field_to_delete");"}
  },
  "query": {
    "match_all": {}
  }
}

But this will update all documents and for a large index could mean additional downtime during deployment.

Lavonnelaw answered 18/6, 2019 at 12:33 Comment(0)
L
10

Eventually I found the correct way to do it, so I'm sharing it for the general knowledge:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
        if (ctx._source.containsKey("field_to_delete")) {
            ctx._source.remove("field_to_delete");
        } else {
          ctx.op="noop";
        }
      """
  },
  "query": {
    "match_all": {}
  }
}
Lavonnelaw answered 18/6, 2019 at 12:33 Comment(4)
I was going to suggest using containsKey, good catch ;-) ES will still have to iterate over all the documents, but only those having field_to_delete will effectively be updated.Assuan
you should use an exists query instead of the match_all. Then your script will only be run on document with the field to delete. See here elastic.co/guide/en/elasticsearch/reference/current/…Caiaphas
@PierreMallet: Unfortunately that wouldn't work, as the query would not return the document with "field_to_delete" : null and it won't be updatedLavonnelaw
Oh yes you are right :/ you could maybe work around this with an explicit null_value. See here elastic.co/guide/en/elasticsearch/reference/current/…Caiaphas

© 2022 - 2025 — McMap. All rights reserved.