Spring Data Elasticsearch (4.x) - Using @Id forces id field in _source

Summary

Recently we upgraded to Spring Data Elasticsearch 4.x. As part of this major release, Jackson is no longer used to convert our domain objects to JSON; the MappingElasticsearchConverter is used instead [1]. As a result, we are now forced to have an id field in the _source of all our documents.

Previously we had domain objects like this:

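import com.fasterxml.jackson.annotation.JsonIgnore;
import org.springframework.data.annotation.Id;

public class ESDocument {
    @Id
    private String id;

    private String field1;

    // Hidden from Jackson, so the id was only stored as _id and not in _source
    @JsonIgnore
    public String getId() {
        return id;
    }

    public String getField1() {
        return field1;
    }
}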

Which resulted in documents like this in ES:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
  "_score" : 1.0,
  "_source" : {
    "field1" : "blabla"
  }
}

Note that:

  1. The @JsonIgnore annotation ensured that we were not required to have an id field in the _source.
  2. We are setting the document id ourselves and it ends up in _id.

Problem

With Spring Data Elasticsearch 4.x, the @JsonIgnore annotation is no longer respected, which means we are now forced to have an id field in the _source, as shown below:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
  "_score" : 1.0,
  "_source" : {
    "id": "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
    "field1" : "blabla"
  }
}

Questions

  1. Is it still possible to avoid duplicating the document identifier (i.e. having it in both the _id and id fields)? If so, how? (Note: we already tried @org.springframework.data.annotation.Transient, which does not work because Spring Data Elasticsearch then thinks our document does not have an id.)
  2. Was our previous approach of suppressing the id field in _source incorrect or problematic?

Versions

java: 1.8.0_252
elasticsearch: 7.6.2
spring-boot: 2.3.1.RELEASE
spring-data-elastic: 4.0.1.RELEASE

References

[1] - https://spring.io/blog/2020/05/27/what-s-new-in-spring-data-elasticsearch-4-0

Chisel asked on 6/7, 2020 at 23:0. Comments (2):
The id field will always be present in the document. However, if you don't want it, could you remove it from your mappings instead? Alternatively, you can use queries that exclude the id field when fetching documents. Ref: elastic.co/guide/en/elasticsearch/reference/current/… – Krieger
Thanks @Harshit. As I said in my question, up until now we have not had a _source.id field in our documents. Thanks for the link. – Chisel

Question 1:

To omit a property from the _source, you would normally use the @Transient annotation, but as you wrote, this does not work for the id property: transient properties are ignored completely by the Spring Data modules (not only Spring Data Elasticsearch), so the entity would no longer have an id.

But you can use the org.springframework.data.annotation.ReadOnlyProperty annotation for this:

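import org.springframework.data.annotation.Id;
import org.springframework.data.annotation.ReadOnlyProperty;

@Id
@ReadOnlyProperty
private String id;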

To be honest, I didn't know until now that this existed. It also comes from Spring Data Commons, and it is checked in the property's isWriteable() method when the MappingElasticsearchConverter writes properties.
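The effect of that check can be illustrated with a simplified, stdlib-only model. This is not the MappingElasticsearchConverter source: IdField and ReadOnly are hypothetical stand-ins for @Id and @ReadOnlyProperty, and toSource/idOf are made-up helpers. The sketch only shows the idea that a writer consulting an isWriteable()-style check skips the read-only id when building the _source, while the id value is still available to be sent as the _id.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class SourceWriterSketch {

    // Hypothetical stand-in for org.springframework.data.annotation.Id
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IdField {}

    // Hypothetical stand-in for org.springframework.data.annotation.ReadOnlyProperty
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ReadOnly {}

    static class ESDocument {
        @IdField
        @ReadOnly
        String id;
        String field1;

        ESDocument(String id, String field1) {
            this.id = id;
            this.field1 = field1;
        }
    }

    // Mimics an isWriteable() check: read-only (and synthetic) fields are not written to _source.
    static Map<String, Object> toSource(Object entity) throws IllegalAccessException {
        Map<String, Object> source = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || f.isAnnotationPresent(ReadOnly.class)) {
                continue;
            }
            f.setAccessible(true);
            source.put(f.getName(), f.get(entity));
        }
        return source;
    }

    // Returns the value of the @IdField field, which would be sent as the document _id.
    static String idOf(Object entity) throws IllegalAccessException {
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(IdField.class)) {
                f.setAccessible(true);
                return (String) f.get(entity);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        ESDocument doc = new ESDocument("d5bf7b5c-7a44-42f9-94d6-d59fe3988482", "blabla");
        System.out.println("_id     = " + idOf(doc));
        System.out.println("_source = " + toSource(doc));
    }
}
```

Running main prints the id on the first line and `_source = {field1=blabla}` on the second, which matches the document layout the question had before the upgrade.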

Question 2:

Surely not incorrect, but problematic, as you found out. We always consider the whole entity when storing it, so we never thought about not writing the id. Strictly speaking, you're right that it is not necessary: we always get the id back in the _id field together with the _source, so we can easily reassemble the entity. We just never considered this a necessary feature to have.

Note:

Just in case you wonder where this comes from: when you look at the data in your ES index, you will find that the MappingElasticsearchConverter writes an additional _source field named _class, which contains the name of the entity class (or a defined alias). This allows generic types to be mapped; for further info, check the documentation.

Edit 18.11.2022:

Recently (with version 4.4.3), a change fixed incorrect behaviour in Spring Data Elasticsearch: it must not write data into a property that is marked with @ReadOnlyProperty. As a consequence, the solution proposed above no longer works, because the id property is no longer populated when data is read from Elasticsearch.

To have the id property set in this case, you need to add an AfterConvertCallback to your application:

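import org.springframework.data.elasticsearch.core.document.Document;
import org.springframework.data.elasticsearch.core.event.AfterConvertCallback;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.stereotype.Component;

// Registered as a Spring bean, so Spring Data Elasticsearch calls it after each entity is read
@Component
public class EntityAfterConvertCallback implements AfterConvertCallback<ESDocument> {

    @Override
    public ESDocument onAfterConvert(ESDocument entity, Document document, IndexCoordinates indexCoordinates) {
        // Copy the Elasticsearch _id into the @ReadOnlyProperty id (requires a setter on ESDocument)
        entity.setId(document.getId());
        return entity;
    }
}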
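Why the callback restores the id can be shown with another simplified, stdlib-only model. All names here are hypothetical: fromSource plays the role of the converter, which (after the 4.4.3 fix) leaves the read-only id unset, and afterConvert plays the role of onAfterConvert, copying the hit's _id into the entity. It is a sketch of the reassembly idea, not Spring Data Elasticsearch code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadPathSketch {

    static class ESDocument {
        private String id;     // read-only: never populated from _source
        private String field1;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getField1() { return field1; }
    }

    // Hypothetical converter step: only writable properties are filled from _source,
    // so the read-only id stays null.
    static ESDocument fromSource(Map<String, Object> source) {
        ESDocument entity = new ESDocument();
        entity.field1 = (String) source.get("field1");
        return entity;
    }

    // Hypothetical hook in the role of AfterConvertCallback.onAfterConvert:
    // copies the hit's _id into the entity.
    static ESDocument afterConvert(ESDocument entity, String hitId) {
        entity.setId(hitId);
        return entity;
    }

    // Reassembles the entity from the hit's _id and _source.
    static ESDocument read(String hitId, Map<String, Object> source) {
        return afterConvert(fromSource(source), hitId);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("field1", "blabla");
        String hitId = "d5bf7b5c-7a44-42f9-94d6-d59fe3988482";

        System.out.println(fromSource(source).getId());  // converter alone: prints null
        System.out.println(read(hitId, source).getId()); // with the hook: prints the _id
    }
}
```

As the comments below point out, this only covers reading; when saving a new document, nothing in this flow feeds an Elasticsearch-generated id back into the entity.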
Midyear answered on 7/7, 2020 at 19:6. Comments (5):
Thank you very much for your detailed and helpful response, @P.J.Meisch! We'll try your suggestion and let you know how it goes. – Chisel
Hi @P.J.Meisch, I can confirm your suggestion works. Thanks again for your help! – Chisel
Glad I could help. – Midyear
Hi @P.J.Meisch, thanks for your update from 18.11.2022. We are just in the process of upgrading our dependencies. Your suggestion of using AfterConvertCallback only works when we are reading a document (i.e. it lets us read the document id and populate our object). However, when we save a new document, we have no way of populating our object with the id generated by Elasticsearch. Ideally we could use AfterSaveCallback to extract this id, but that doesn't seem to be possible. – Chisel
Spring Data Elasticsearch 5.1 will add the property storeIdInSource to the @Document annotation; with it, these workarounds aren't necessary anymore. – Midyear
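For reference, a minimal sketch of how the entity might look with that attribute, assuming Spring Data Elasticsearch 5.1 or later on the classpath. The index name test_index is taken from the query output in the question, and the setters are an assumption (the original class had none). The intent is that the id is still used as the document _id but is not written into _source; check the 5.1 reference documentation for the exact read and save behaviour.

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Assumes Spring Data Elasticsearch 5.1+, where @Document has the storeIdInSource attribute
@Document(indexName = "test_index", storeIdInSource = false)
public class ESDocument {

    @Id
    private String id; // used as the document _id, not written into _source

    private String field1;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getField1() { return field1; }
    public void setField1(String field1) { this.field1 = field1; }
}
```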
