Query results as a Stream with Hibernate 5.2
Asked Answered
S

2

22

Since Hibernate 5.2, we are able to use the stream() method instead of scroll() if we want to fetch large amount of data.

However, when using scroll() with ScrollableResults we are able to a hook into the retrieval process and to free memory up by either evicting the object from the persistent context after processing it and/or clearing the entire session every now and then.

My questions:

  1. Now, if we use the stream() method, what happens behind the scenes?
  2. Is it possible to evict object from the persistent context?
  3. Is the session cleared periodically?
  4. How is optimal memory consumption achieved?
  5. Is is possible to use e.g. StatelessSession?
  6. Also, if we have set hibernate.jdbc.fetch_size to some number (e.g. 1000) at JPA properties, then how is this combined well with scrollable results?
Segarra answered 12/5, 2017 at 10:58 Comment(2)
Interesting article related to this subject: vladmihalcea.com/2016/11/23/…Segarra
Related Hibernate JIRA issue: hibernate.atlassian.net/browse/HHH-11260Segarra
T
9

The following works for me:

DataSourceConfig.java

@Bean
public LocalSessionFactoryBean sessionFactory() {
    // Link your data source to your session factory
    ...
}

@Bean("hibernateTxManager")
public HibernateTransactionManager hibernateTxManager(@Qualifier("sessionFactory") SessionFactory sessionFactory) {
    // Link your session factory to your transaction manager
    ...
}

MyServiceImpl.java

@Service
@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "hibernateTxManager", readOnly = true)
public class MyServiceImpl implements MyService {

    @Autowired
    private MyRepo myRepo;
    ...
    Stream<MyEntity> stream = myRepo.getStream();
    // Do your streaming and CLOSE the steam afterwards
    ...

MyRepoImpl.java

@Repository
@Transactional(propagation = Propagation.MANDATORY, transactionManager = "hibernateTxManager", readOnly = true)
public class MyRepoImpl implements MyRepo {

    @Autowired
    private SessionFactory sessionFactory;

    @Autowired
    private MyDataSource myDataSource;

    public Stream<MyEntity> getStream() {

        return sessionFactory.openStatelessSession(DataSourceUtils.getConnection(myDataSource))
            .createNativeQuery("my_query", MyEntity.class)
            .setReadOnly(true)
            .setFetchSize(1000)
            .stream();
    }
    ...

Just remember, when you stream you really only need to be cautious of memory at the point of object materialisation. That is truly the only part of the operation susceptible to problems in memory. In my case I chunk the stream at 1000 objects at a time, serialise them with gson and send them to a JMS broker immediately. The garbage collector does the rest.

It's worth noting that Spring's transactional boundary awareness closes the connection to the dB at the end without needing to be explicitly told.

Tribal answered 19/5, 2017 at 17:50 Comment(7)
Do you care to share with us how you measured the memory footprint? We noticed also that you used a native query which in our case is not an option. Saying that we also cannot use stateless session.Bring
I measured the memory footprint with numerous amounts of performance testing and a profiling tool. I ended up streaming 65 million records (fetch size 1000, then materialised into 1000 records per JMS message) straight into Solace. If you need the code for this I can add it. If you are not using a native query or stateless session I would suggest adding a hint to hibernate to tell it cacheable = false. Is there any reason you can't have a stateless session? Our code for this exists within a Spring transactional boundary and operates the same way as any other read that we would perform.Tribal
In our application the entity manager has disabled second level caching. Also the reason why we cannot have stateless session is that we have dynamic criteria queries.Bring
As long as the connection is closing after the transaction and you're using minimal caching you should be fine.Tribal
@wild_nothing: can you please inform which database are you using?Segarra
@Segarra Oracle. We are using Hikari as our connection pool though, so our data source bean returns a HikariDataSource.Tribal
@wild_nothing: in our case we use PostgreSQL. Apparently, this seems to be an important difference. We have tried various approaches including yours as-is (we adapted our scenario a bit for this purpose). We did not see any benefit from using ScrollableResults or Stream. Have a look at the 2 links I added as comments to my original question. It seems that this feature does not have a guaranteed behavior...Segarra
L
9

Hibernate ORM User Guide states that

Internally, the stream() behaves like a Query#scroll and the underlying result is backed by a ScrollableResults.

You can check the source code of org.hibernate.query.internal.AbstractProducedQuery to ensure that it's your duty to clear session periodically or evict object from the persistent context.

As I understand from comments, StatelessSession is not option for you. I think the clean way to solve your case is to implement your own stream() method. It could be very similar to original method, just replace ScrollableResultsIterator with your own that would do what you need (evict object or clear session) during iteration.

Logography answered 29/5, 2017 at 7:43 Comment(1)
Since using StatelessSession makes a difference and since it is up the the user to clear session periodically etc. I am not quite sure which is the added value of ScrollableResults or stream() in hibernate. I am reading also the following article: dzone.com/articles/bulk-fetching-hibernate, which was the one triggering me to investigate this issue.Segarra

© 2022 - 2024 — McMap. All rights reserved.