Spring Data Elasticsearch 4.4.x: How to get Aggregations from SearchHits?
I'm new to Spring Data Elasticsearch. I'm working on a project in which I'm indexing bugs found in different projects (just as an example).

I want to fetch all projects, with the number of bugs in each project.

Here is my document:

@Data
@Document(indexName = "all_bugs")
public class Bug{
    @Id
    private String recordId;
    private Project project;
    private String bugSummary;
    private String status;
    // other fields omitted for brevity
}

This is the Project class

@Data
public class Project {
    private String projectId;
    private String name;
}

All the bugs are now in Elasticsearch, and I can execute this query in the Kibana console to get all projects, with the count of bugs in each project:

GET /all_bugs/_search
{
  "size": 0,
  "aggs": {
    "distinct_projects": {
      "terms": {
        "field": "project.projectId",
        "size": 10
      },
      "aggs": {
        "project_details": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": ["project.projectId", "project.name"]
            }
          }
        }
      }
    }
  }
}

I know I need to make this query better, but the problem I'm facing is in the Spring Data Elasticsearch part. This is my method to construct the aggregation:

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    public List<DistinctProject> getDistinctProjects() {
        TermsAggregationBuilder aggregation = AggregationBuilders
                .terms("distinct_projects")
                .field("projects.projectId")
                .size(10)
                .subAggregation(AggregationBuilders
                        .topHits("project_details")
                        .size(1)
                        .fetchSource(new String[]{"project.name", "project.projectId"}, null));

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withAggregations(aggregation)
                .build();

        SearchHits<DistinctProject> searchHits = elasticsearchOperations.search(searchQuery, DistinctProject.class);

        // I don't know what to do from here...
    }

Now, I have the SearchHits<DistinctProject> with me. The question is, how do I get the aggregations from here to construct my response? In this case DistinctProject is simply a DTO in which I want to store projectId, name and docCount so that I can create a List and return it to the caller.
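For reference, a minimal sketch of what such a DTO could look like. The field types are assumptions (the question only names the three properties), and explicit getters stand in for Lombok's @Data so that the snippet has no dependencies.

```java
// Hypothetical shape of the DTO described above: one instance per terms bucket.
// Field types are assumed; the question only names projectId, name and docCount.
final class DistinctProject {
    private final String projectId;
    private final String name;
    private final long docCount;

    DistinctProject(String projectId, String name, long docCount) {
        this.projectId = projectId;
        this.name = name;
        this.docCount = docCount;
    }

    String getProjectId() {
        return projectId;
    }

    String getName() {
        return name;
    }

    long getDocCount() {
        return docCount;
    }
}
```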

The problem is that all the documentation I've gone through so far suggests calling searchHits.getAggregations().get("distinct_projects"), but that's not available in Spring Data Elasticsearch 4.4.11, which we're using. According to the documentation here:

The SearchHits class does not contain the org.elasticsearch.search.aggregations.Aggregations anymore. Instead it now contains an instance of the org.springframework.data.elasticsearch.core.AggregationsContainer class

So searchHits.getAggregations().get("distinct_projects") fails to compile, and I'm unable to proceed beyond this point.

I also referred to this answer by P.J.Meisch, but it too covers an older version of Spring Data Elasticsearch.

I would really appreciate it if someone could help me get past this block.

For information, my Spring Boot version is 2.7.11 and the Spring Data Elasticsearch version is 4.4.11.

Thanks, Sriram

Laxity asked 18/5, 2023 at 12:47
T
0

I've tested your code. Sadly, there is no data model for aggregations in Spring Data Elasticsearch, but you can treat the aggregation data as JSON and parse it yourself.

        @Test
        public void testCreate(){
            TermsAggregationBuilder aggregation = AggregationBuilders
                    .terms("distinct_projects")
                    .field("project.projectId") // the question used "projects.projectId", which does not match the field in the document
                    .size(10)
                    .subAggregation(AggregationBuilders
                            .topHits("project_details")
                            .size(1)
                            .fetchSource(new String[]{"project.name", "project.projectId"}, null));

            NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                    .withAggregations(aggregation)
                    .build();

            SearchHits<DistinctProject> searchHits = elasticsearchOperations.search(searchQuery, DistinctProject.class, IndexCoordinates.of("all_bugs2"));

            System.out.println(JSONObject.toJSONString(searchHits.getAggregations()));
        }

    {
            "asMap": {
                    "distinct_projects": {
                            "buckets": [{
                                    "aggregations": {
                                            "asMap": {
                                                    "project_details": {
                                                            "fragment": true,
                                                            "hits": {
                                                                    "fragment": true,
                                                                    "hits": [{
                                                                            "documentFields": {},
                                                                            "fields": {},
                                                                            "fragment": false,
                                                                            "highlightFields": {},
                                                                            "id": "tqpfM4gBOyQu5gYl2sOB",
                                                                            "matchedQueries": [],
                                                                            "metadataFields": {},
                                                                            "primaryTerm": 0,
                                                                            "rawSortValues": [],
                                                                            "score": 1.0,
                                                                            "seqNo": -2,
                                                                            "sortValues": [],
                                                                            "sourceAsMap": {
                                                                                    "project": [{
                                                                                            "name": "my project",
                                                                                            "projectId": 10
                                                                                    }]
                                                                            },
                                                                            "sourceAsString": "{\"project\":[{\"name\":\"my project\",\"projectId\":10}]}",
                                                                            "sourceRef": {
                                                                                    "fragment": true
                                                                            },
                                                                            "type": "_doc",
                                                                            "version": -1
                                                                    }],
                                                                    "maxScore": 1.0,
                                                                    "totalHits": {
                                                                            "relation": 0,
                                                                            "value": 1
                                                                    }
                                                            },
                                                            "name": "project_details",
                                                            "type": "top_hits"
                                                    }
                                            },
                                            "fragment": true
                                    },
                                    "docCount": 1,
                                    "docCountError": 0,
                                    "fragment": true,
                                    "key": 10,
                                    "keyAsNumber": 10,
                                    "keyAsString": "10"
                            }],
                            "docCountError": 0,
                            "fragment": true,
                            "name": "distinct_projects",
                            "sumOfOtherDocCounts": 0,
                            "type": "lterms"
                    }
            },
            "fragment": true
    }
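The "parse it yourself" step could look roughly like the sketch below. It assumes the JSON printed above has already been parsed into nested Maps and Lists by a JSON library of your choice. The class AggregationJsonWalker and its ProjectCount holder are hypothetical names, not part of Spring Data Elasticsearch, and the key names are taken from that output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: walks the aggregation JSON shown above once it is available as
// nested Maps/Lists. Key names ("asMap", "buckets", ...) come from that output.
final class AggregationJsonWalker {

    // Hypothetical result holder: one entry per bucket of "distinct_projects".
    static final class ProjectCount {
        final String projectId;
        final String name;
        final long docCount;

        ProjectCount(String projectId, String name, long docCount) {
            this.projectId = projectId;
            this.name = name;
            this.docCount = docCount;
        }
    }

    @SuppressWarnings("unchecked")
    static List<ProjectCount> toProjectCounts(Map<String, Object> root) {
        Map<String, Object> terms = child(child(root, "asMap"), "distinct_projects");
        List<ProjectCount> result = new ArrayList<>();
        for (Object rawBucket : (List<Object>) terms.get("buckets")) {
            Map<String, Object> bucket = (Map<String, Object>) rawBucket;
            long docCount = ((Number) bucket.get("docCount")).longValue();
            String projectId = String.valueOf(bucket.get("keyAsString"));

            // The top_hits sub-aggregation carries the source of one matching document.
            Map<String, Object> topHits =
                    child(child(child(bucket, "aggregations"), "asMap"), "project_details");
            List<Object> hits = (List<Object>) child(topHits, "hits").get("hits");

            String name = null;
            if (!hits.isEmpty()) {
                Map<String, Object> source =
                        child((Map<String, Object>) hits.get(0), "sourceAsMap");
                Object project = source.get("project");
                // In the output above "project" is a one-element array;
                // a plain object is accepted as well.
                if (project instanceof List) {
                    List<?> projects = (List<?>) project;
                    project = projects.isEmpty() ? null : projects.get(0);
                }
                if (project instanceof Map) {
                    Object rawName = ((Map<?, ?>) project).get("name");
                    name = rawName == null ? null : rawName.toString();
                }
            }
            result.add(new ProjectCount(projectId, name, docCount));
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String key) {
        return (Map<String, Object>) parent.get(key);
    }
}
```

Reading keyAsString rather than key keeps the project id a String regardless of whether the bucket key is numeric (as in the "lterms" output above) or a keyword.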
Tartu answered 19/5, 2023 at 12:44 Comment(2)
Yes, that's what I figured too. For now, I have moved to writing this piece in Node.js, which is more intuitive, until I figure out something equally efficient with Spring Boot. - Laxity
Did you find a solution? - Kus
I
3

I'm on somewhat more recent versions of Spring and Spring Data Elasticsearch, but hopefully my solution will help you. I was able to extract aggregations from SearchHits like so:

public List<NipSummary> getMergeSuggestionsByField(String fieldName) {
    Query query = NativeQuery.builder()
        .withAggregation("suggestions", Aggregation.of(a -> a
            .terms(ta -> ta.field("nip.keyword").size(10).minDocCount(2))))
        .build();

    SearchHits<CustomerDataDTO> searchHits = elasticsearchOperations.search(query, CustomerDataDTO.class);
    ElasticsearchAggregations aggregations = (ElasticsearchAggregations) searchHits.getAggregations();
    assert aggregations != null;
    List<StringTermsBucket> buckets = aggregations.aggregationsAsMap()
            .get("suggestions")
            .aggregation()
            .getAggregate()
            .sterms()
            .buckets()
            .array();

    List<NipSummary> result = new ArrayList<>();

    buckets.forEach(stringTermsBucket -> result.add(
            NipSummary.builder()
                    .nip(stringTermsBucket.key().stringValue())
                    .count(stringTermsBucket.docCount())
                    .build()
    ));

    return result;
}

NipSummary is a simple DTO for storing each aggregation result:

@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class NipSummary {

    private String nip;
    private Long count;
}

My aggregation is simpler than yours, but I think it might be a good start. This is the most crucial part:

ElasticsearchAggregations aggregations = (ElasticsearchAggregations) searchHits.getAggregations();
assert aggregations != null;
List<StringTermsBucket> buckets = aggregations.aggregationsAsMap()
        .get("suggestions")
        .aggregation()
        .getAggregate()
        .sterms()
        .buckets()
        .array();

Here "suggestions" is the name of my aggregation. I am not sure whether StringTermsBucket is still the right type for more complex aggregations; I found it by digging through the debugger output.

Individualism answered 21/8, 2023 at 12:48 Comment(3)
The code looks much simpler, but I'm stuck here: List<StringTermsBucket> buckets = aggregations.aggregationsAsMap().get("suggestions").aggregation().getAggregate().sterms().buckets().array(); ElasticsearchAggregations.aggregationsAsMap() is not available. I'm using Spring Boot version 2.7.14. Is anything wrong? - Laxity
Two lines earlier there is a cast to org.springframework.data.elasticsearch.client.elc.ElasticsearchAggregations. Check whether your code imports the class from the same package. It is part of Spring Data Elasticsearch. If you still have any issues, please let me know or try inspecting aggregations in the debugger. My exact version is Spring Data Elasticsearch 5.1.1. I am also using some classes from the Elasticsearch Java Client (elasticsearch-java-8.5.3), probably pulled in by Spring Data Elasticsearch. - Individualism
However, this particular class should be available since 4.4: docs.spring.io/spring-data/elasticsearch/docs/5.1.1/api/org/… - Individualism
