How to select latest record in group using JPQL in Spring JpaRepository?
Asked Answered
F

1

9

In a SpringBoot Microservice, I am trying to select the latest record for an actor for each mean_of_payment_id. To achieve this, selecting actor content for actor_id where created_date is equaled to the subset of the nested query of max(created_date) using group by clause on mean_of_payment_id. I am using JPQL. Below are the table structure and query.

enter image description here

    @Query("select ac from ActorContent ac "
        + "where (ac.actor.uuid=:actorUuid ) and "
        + "ac.createdDate IN ( SELECT MAX(aci.createdDate) "
            + "FROM ActorContent aci WHERE ac.actor.uuid=aci.actor.uuid "
            + "and aci.uuid = ac.uuid group by ac.meanOfPayment.id)"
        )

enter image description here

Unfortunately, After executing the query I am getting all the records but what I am expecting is the top three rows. MeanOfPayment and Actor are the referential tables for ActorContent.

Fabria answered 10/10, 2018 at 16:4 Comment(5)
why are you referencing ac.actor.uuid and ac.uuid in the inner query? Shouldn't be enough to just SELECT MAX(aci.createdDate) FROM ActorContent aci WHERE aci.actor.uuid=:actorUuid group by ac.meanOfPayment.id)"Timberlake
@Zeromus, you are the Man. I changed the subquery and got the perfect result but I have one doubt in mind, is using subquery is a good way or can I scale this query another way, any thoughts?Fabria
This is a common enough problem to have a tag just for it [greatest-n-per-group]. I personally just go for the subquery when in the same situation, but there might be better alternativesTimberlake
Here's a similar question #7746109Timberlake
Thank you for your valuable response and feedback. I am having the correct results but I feel there could be more scalable solution. Please feel free to have a look.Fabria
C
4

I think in terms of relational algebra you are asking for the set of ActorContent minus the set of ActorContent constrained by actor = actor and meanOfPayment = meanOfPayment and createDate < createDate. So, the way to think of it is to get the second set from a cross product of ActorContent with ac1.meanOfPayment = ac2.meanOfPayment and ac1.actor = ac2.actor and ac1.createDate < ac2.createDate. Then subtract this set from the set of ActorContent. I haven't looked to see if it is more efficient than using MAX and Group By By example:

@Query("select ac from ActorContent ac where ac.id not in (select ac1.id from ActorContent ac1, ActorContent ac2 where ac1.meanOfPayment = ac2.meanOfPayment and ac1.actor = ac2.actor and ac1.createDate < ac2.createDate)")

This gives me the first four rows in the UPPER table representing the first actor and his only meanOfPayment and the second actor and his most recent payments for all three meanOfPayments.

ActorContent [id=1, actor=Actor [id=1], meanOfPayment=MeanOfPayment [id=1], amount=10500.00, createDate=2018-10-09 00:00:00.887]
ActorContent [id=2, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=1], amount=-10400.00, createDate=2018-10-02 00:00:00.887]
ActorContent [id=3, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=3], amount=6000.00, createDate=2018-10-02 00:00:00.887]
ActorContent [id=4, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=2], amount=200.00, createDate=2018-09-30 00:00:00.887]

After that you may want to optimize the query by join fetching the Actor and MeanOfPayment instances. By Example:

@Query("select ac from ActorContent ac left outer join fetch ac.actor left outer join fetch ac.meanOfPayment where ac.id not in (select ac1.id from ActorContent ac1, ActorContent ac2 where ac1.meanOfPayment = ac2.meanOfPayment and ac1.actor = ac2.actor and ac1.createDate < ac2.createDate)")

This results in the following hibernate generated SQL query:

select actorconte0_.id as id1_1_0_, actor1_.id as id1_0_1_, meanofpaym2_.id as id1_2_2_, actorconte0_.actor_id as actor_id4_1_0_, actorconte0_.amount as amount2_1_0_, actorconte0_.create_date as create_d3_1_0_, actorconte0_.mean_of_payment_id as mean_of_5_1_0_ from actor_content actorconte0_ left outer join actor actor1_ on actorconte0_.actor_id=actor1_.id left outer join mean_of_payment meanofpaym2_ on actorconte0_.mean_of_payment_id=meanofpaym2_.id where actorconte0_.id not in  (select actorconte3_.id from actor_content actorconte3_ cross join actor_content actorconte4_ where actorconte3_.mean_of_payment_id=actorconte4_.mean_of_payment_id and actorconte3_.actor_id=actorconte4_.actor_id and actorconte3_.create_date<actorconte4_.create_date)

Of course if you want a specific Actor then just add that to the where clause.

@Query("select ac from ActorContent ac left outer join fetch ac.actor left outer join fetch ac.meanOfPayment where ac.actor.id = :actorId and ac.id not in (select ac1.id from ActorContent ac1, ActorContent ac2 where ac1.meanOfPayment = ac2.meanOfPayment and ac1.actor = ac2.actor and ac1.createDate < ac2.createDate)")
public List<ActorContent> findLatestForActor(@Param("actorId") Integer actorId);

and that gives me the "top three rows"

ActorContent [id=2, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=1], amount=-10400.00, createDate=2018-10-02 00:00:00.066]
ActorContent [id=3, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=3], amount=6000.00, createDate=2018-10-02 00:00:00.066]
ActorContent [id=4, actor=Actor [id=2], meanOfPayment=MeanOfPayment [id=2], amount=200.00, createDate=2018-09-30 00:00:00.066]

If you have an issue with having the same createDate for an Actor and MeanOfPayment combination then you can deal with in a few different ways. First if you have a logical constraint such that you don't want to process those duplicates then you should probably have a database constraint as well so you don't get them and ensure you don't create them in the first place. Another things is you can manually check the result list and remove them. Finally you can use a distinct in your query but you have to leave out the ActorContent id field since it will not be unique. You can do this with a DTO but JPA can't handle projection and join fetch at the same time so you will only be getting actor.id and meanOfPayment.id or you will be doing multiple selects. Multiple selects probably not a deal killer in this use case but you have to decide all that for yourself. Of course you could also make the primary key of ActorContent a combination of actor.id, meanOfPayment.id, and createDate and that would have the added benefit of being the constraint mentioned above.

These are the Entities I worked with.

@Entity
public class Actor {
    @Id @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Integer id;

@Entity
public class MeanOfPayment {
    @Id @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Integer id;

@Entity
public class ActorContent {
    @Id @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Integer id;

    @ManyToOne
    private Actor actor;
    @ManyToOne
    private MeanOfPayment meanOfPayment;

    private BigDecimal amount;
    @Temporal(TemporalType.TIMESTAMP)
    private Date createDate;
Coo answered 24/10, 2018 at 3:43 Comment(3)
Thank you @K.Nicholas, the solution you proposed giving me the exact result what I was looking for. Regarding performance, what do you think? Suppose we have 5000 records to select only three records, we are doing something like 5000 - 4777. (not in (select ac1.id from ActorContent ac1, ActorContent ac2 where ac1.meanOfPayment = ac2.meanOfPayment and ac1.actor = ac2.actor and ac1.createDate < ac2.createDate)Fabria
For performance it's best to understand and inspect the query analysis tools for whatever platform you are using. Also it can be important to be sure you have enough buffer_pool_size and memory settings so the sql server can do its work efficiently.Coo
You might also add and ac1.actor.id = :actorId and ac2.actor.id = :actorId to the subquery to limit the rows there but it might not matter anyway. Again, if you can analyze performance with the server.Coo

© 2022 - 2024 — McMap. All rights reserved.