In spring data elasticsearch, can aggregation queries be put in a respository implementation?
Asked Answered
W

1

7

I'm using spring-boot-elasticsearch for the first time. I've now figured out how to describe my serial difference pipeline query using elastics java api. This query, as you'll see below is rather large, and returns several buckets for each object as well as the serial difference between each bucket. The examples I see of search in a Spring Data Repository all seem to spell out the query's json body in a Query Annotation like this:

@Repository
public interface SonarMetricRepository extends ElasticsearchRepository<Article, String> {

    @Query("{\"bool\": {\"must\": {\"match\": {\"authors.name\": \"?0\"}}, \"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
    Page<Article> findByAuthorsNameAndFilteredTagQuery(String name, String tag, Pageable pageable);
}

That seems elegant for basic CRUD operations but how can I put my query below into the repository object without needing to use @Query's raw query syntax? If you have a similar example of what a Model object built for a serial difference query result or any pipeline aggregation that would be even more helpful as well. Basically I want a search method in my repository like this

Page<Serial Difference Result Object> getCodeCoverageMetrics(String projectKey, Date start, Date end, String interval, int lag);

I should mention part of the reason I want to use this object is that I will have other CRUD query in here as well, AND I think it'll handle pagination for me, so that's appealing.

Here is my query which shows the serial difference between code coverage on sonar projects from a 1 week time period:

        SerialDiffPipelineAggregationBuilder serialDiffPipelineAggregationBuilder =
            PipelineAggregatorBuilders
                    .diff("Percent_Change", "avg_coverage")
                    .lag(1);

    AvgAggregationBuilder averageCoverageAggregationBuilder = AggregationBuilders
            .avg("avg_coverage")
            .field("coverage");

    AggregationBuilder coverageHistoryAggregationBuilder = AggregationBuilders
            .dateHistogram("coverage_history")
            .field("@timestamp")
            .calendarInterval(DateHistogramInterval.WEEK)
            .subAggregation(averageCoverageAggregationBuilder)
            .subAggregation(serialDiffPipelineAggregationBuilder);

    TermsAggregationBuilder sonarProjectKeyAggregationBuilder = AggregationBuilders
            .terms("project_key")
            .field("key.keyword")
            .subAggregation(coverageHistoryAggregationBuilder);

    BoolQueryBuilder searchQuery = new BoolQueryBuilder()
            .filter(matchAllQuery())
            .filter(matchPhraseQuery("name.keyword", "my-sample-sonar-project"))
            .filter(rangeQuery("@timestamp")
                    .format("strict_date_optional_time")
                    .gte("2020-07-08T19:29:12.054Z")
                    .lte("2020-07-15T19:29:12.055Z"));

    // Join query and aggregation together
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
            .query(searchQuery)
            .aggregation(sonarProjectKeyAggregationBuilder);

    SearchRequest searchRequest = new SearchRequest("sonarmetrics").source(searchSourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
We answered 24/7, 2020 at 14:1 Comment(3)
which version of Spring Data Elasticsearch do you use?Ollie
4.0.0.RELEASE but now that you mention it I see 4.0.2 out there. Willing to upgrade if that helps.We
upgrade to 4.0.2 makes no difference for your problem. But you should upgrade,as 4.0.x contain bugfixesOllie
O
19

Ok, so if I get it right, you want to add an aggregation to a repository query. This is not possible with the methods that are automatically created by Spring Data Elasticsearch, but it is not too hard to implement it.

To show you how to do this, I use a simpler example, where we have defined a Person entity:

@Document(indexName = "person")
public class Person {

    @Id
    @Nullable
    private Long id;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String lastName;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String firstName;

    // getter/setter
}

There also is a corresponding repository:

public interface PersonRepository extends ElasticsearchRepository<Person, Long>{
}

We now want to extend this repository to be able to search for persons with a first name, and also return for these persons the top 10 ten last names with the count (a terms aggs on the lastNames).

The first thing to do is to define a customization repository that describes the method you need:

interface PersonCustomRepository {
    SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable);
}

We want to pass in a Pageable so that the methods returns pages of data. We return a SearchPage object check the documentation on return types that will contain paging information along with a SearchHits<Person>. This object then has the aggregations information and the result data.

We then change the PersonRepository to extend this new interface:

public interface PersonRepository extends ElasticsearchRepository<Person, Long>, PersonCustomRepository {
}

Of course we now need to provide an implementation in a class named PersonCustomRepositoryImpl (this must be named like the interface with Impl added):

public class PersonCustomRepositoryImpl implements PersonCustomRepository {

    private final ElasticsearchOperations operations;

    public PersonCustomRepositoryImpl(ElasticsearchOperations operations) { // let Spring inject an operations which we use to do the work
        this.operations = operations;
    }

    @Override
    public SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable) {

        Query query = new NativeSearchQueryBuilder()                       // we build a Elasticsearch native query
            .addAggregation(terms("lastNames").field("lastName").size(10)) // add the aggregation
            .withQuery(QueryBuilders.matchQuery("firstName", firstName))   // add the query part
            .withPageable(pageable)                                        // add the requested page
            .build();

        SearchHits<Person> searchHits = operations.search(query, Person.class);  // send it of and get the result

        return SearchHitSupport.searchPageFor(searchHits, pageable);  // convert the result to a SearchPage
    }
}

That's all for the implementation of the search. Now the repository has this additional method. How to use it?

For this demo I assume we have a REST controller that takes a name and returns a pair of:

  1. the found persons as a list of SearchHit<Person> objects
  2. a Map<String, Long> containing the last names and their count

This could be implemented as follows, comments describe what is done:

@GetMapping("persons/firstNameWithLastNameCounts/{firstName}")
public Pair<List<SearchHit<Person>>, Map<String, Long>> firstNameWithLastNameCounts(@PathVariable("firstName") String firstName) {

    // helper function to get the lastName counts from an Elasticsearch Aggregations
    // Spring Data Elasticsearch does not have functions for that, so we need to know what is coming back
    Function<Aggregations, Map<String, Long>> getLastNameCounts = aggregations -> {
        if (aggregations != null) {
            Aggregation lastNames = aggregations.get("lastNames");
            if (lastNames != null) {
                List<? extends Terms.Bucket> buckets = ((Terms) lastNames).getBuckets();
                if (buckets != null) {
                    return buckets.stream().collect(Collectors.toMap(Terms.Bucket::getKeyAsString, Terms.Bucket::getDocCount));
                }
            }
        }
        return Collections.emptyMap();
    };

    // the parts of the returned object
    Map<String, Long> lastNameCounts = null;
    List<SearchHit<Person>> searchHits = new ArrayList<>();

    // request pages of size 1000
    Pageable pageable = PageRequest.of(0, 1000);
    boolean fetchMore = true;
    while (fetchMore) {
        // call the custom method implementation
        SearchPage<Person> searchPage = personRepository.findByFirstNameWithLastNameCounts(firstName, pageable);

        // get the aggregations on the first call, will be the same on the other pages
        if (lastNameCounts == null) {
            Aggregations aggregations = searchPage.getSearchHits().getAggregations();
            lastNameCounts = getLastNameCounts.apply(aggregations);
        }

        // collect the returned data
        if (searchPage.hasContent()) {
            searchHits.addAll(searchPage.getContent());
        }

        pageable = searchPage.nextPageable();
        fetchMore = searchPage.hasNext();
    }

    // return the collected stuff
    return Pair.of(searchHits, lastNameCounts);
}

I hope this gives an idea on how to implement custom repository functions and add functionality not provided out of the box.

Ollie answered 26/7, 2020 at 20:27 Comment(7)
Wow. Thanks so much P.J. This is probably the most extensive / detailed answer I've received on a question here! So not only can I get the functionality I need by extending as you described but it looks like I don't have to invent an object to handle modeling the the returned Aggregations. I just need to a make a SonarMetric object and a SearchPage object and then I can just iterate to retrieve my aggregrations on each. Thanks again!We
Hi P.J, I'm implementing this now and I'm finding that the Query object you are using is the spring-data-elastic Query object and the one I'm using above is from Elasticsearch's SearchRequest library. In your solution was I supposed to convert my query to the spring elastic Query (in which case I'm not sure it'll support serial differencing) or can I just use the high level rest client and then use some other mechanism to convert the result to a SearchPage<> object to be returned? I guess when I first read the solution I thought I could do operations.search(searchRequest, Metric.class)We
Nevermind! I got it. Query query = new NativeSearchQueryBuilder() .withQuery(searchQuery) .addAggregation(sonarProjectKeyAggregationBuilder) .withPageable(pageable) .build();We
ES recommends against using fielddata; should work instead with 'lastname.keyword'Mooneye
Thank you @Mooneye Good call. I'll swap that out then.We
Hi @P.J.Meisch please check out my question if you have time #72478648Systematize
Hi @P.J.Meisch, I tried this, but facing issues with the latest version of Spring Data elasticsearch. Could you please look at my question if you have time? #76281492Myrick

© 2022 - 2024 — McMap. All rights reserved.