How to handle empty field names in ElasticSearch?

I'd like to log users' input to my RESTful API for debugging purposes, but whenever there's an empty field name in the JSON payload, an error is generated and the log is discarded.

For instance,

{
  "extra": {
    "request": {
      "body": {
        "": ""
      }
    }
  }
}

...will result in

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "field name cannot be an empty string"
    }
  },
  "status": 400
}
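
For reference, the error above can be reproduced with a plain index request such as the following (the index name and document ID are just placeholders):

curl -X PUT "http://localhost:9200/logs/_doc/1" -H 'Content-Type: application/json' -d '
{
  "extra": {
    "request": {
      "body": {
        "": ""
      }
    }
  }
}
'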

It seems to be caused by https://github.com/elastic/elasticsearch/blob/45e7e24736eeb4a157ac89bd16a374dbf917ae26/server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java#L191.

It's a bit tricky since this happens in the parsing phase... Is there any workaround to remove or rename such fields so that ES can ingest these logs?

Randee answered 17/5, 2018 at 6:53

You can create an ingest pipeline to be applied at index time.

For example, you can create a pipeline that uses a script processor. With a bit of Painless scripting you can remove the problematic field:

curl -X PUT http://localhost:9200/_ingest/pipeline/cleandoc -H 'Content-Type: application/json' -d '
{
  "description": "Clean doc",
  "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "if (ctx.extra.request.body.containsKey(params.fieldName)) { ctx.extra.request.body.remove(params.fieldName) } ",
          "params": {
            "fieldName": ""
          }
        }
      }
  ]
}
'
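
If you want to check the pipeline against a sample document before indexing anything, you can dry-run it with the simulate API (a quick sketch; the document below is just the example from the question):

curl -X POST "http://localhost:9200/_ingest/pipeline/cleandoc/_simulate" -H 'Content-Type: application/json' -d '
{
  "docs": [
    {
      "_source": {
        "extra": {
          "request": {
            "body": {
              "": ""
            }
          }
        }
      }
    }
  ]
}
'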

Then, when indexing a document, reference the pipeline in the request so that the JSON gets cleaned before it is indexed:

curl -X PUT "http://localhost:9200/your_index/_doc/1?pipeline=cleandoc" -H 'Content-Type: application/json' -d '
{
  "extra": {
    "request": {
      "body": {
        "": ""
      }
    }
  }
}
'

Getting the document would result in a clean doc:

$ curl -XGET http://localhost:9200/your_index/_doc/1?pretty
{
  "_index" : "your_index",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "extra" : {
      "request" : {
        "body" : { }
      }
    }
  }
}

Obviously, you can alter the script to copy the value of the problematic field somewhere else.
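
For instance, a variant of the pipeline that renames the empty key instead of dropping it could look roughly like this (the pipeline name cleandoc_keep and the target key empty_key are just placeholders):

curl -X PUT http://localhost:9200/_ingest/pipeline/cleandoc_keep -H 'Content-Type: application/json' -d '
{
  "description": "Rename the empty field instead of dropping it",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.extra.request.body.containsKey(params.fieldName)) { def v = ctx.extra.request.body.remove(params.fieldName); ctx.extra.request.body.put(params.newName, v); }",
        "params": {
          "fieldName": "",
          "newName": "empty_key"
        }
      }
    }
  ]
}
'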

The above examples were tested with Elasticsearch 6.8 but shouldn't be too different in other versions.
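
As a side note, if you don't want to pass ?pipeline=cleandoc on every request, newer versions (6.5+) also let you attach the pipeline to the index itself via the index.default_pipeline setting (a sketch, assuming your_index already exists):

curl -X PUT "http://localhost:9200/your_index/_settings" -H 'Content-Type: application/json' -d '
{
  "index.default_pipeline": "cleandoc"
}
'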

Pamper answered 14/9, 2024 at 9:49
