JPA and first-level cache: what's the point?

The EntityManager maintains a first-level cache for retrieved objects, but if you want a thread-safe application, you create and close an EntityManager for each transaction.

So what's the point of the first-level cache if those entity managers are created and closed for every transaction? Or is the EntityManager cache only useful if you're working in a single thread?

Flanna answered 29/3, 2014 at 13:51 Comment(1)
I think you would benefit from reading a book like Pro JPA 2: Mastering the Java™ Persistence API. Read that book and you will be ready to take the Java EE 6 Java Persistence API Developer Certified Expert exam. It will be you answering questions real soon. - Aurie

The first-level cache serves other purposes as well. It is essentially the persistence context: the place where JPA keeps the entities it has retrieved from the database.

Performance

To start with the obvious: it avoids retrieving a record that has already been retrieved, so it acts as a cache for the duration of a transaction and improves performance. Also, think about lazy loading. How could you implement it without a cache that records which entities have already been lazily loaded?

Cyclic Relationships

This caching is vital to implementing an ORM framework properly. In object-oriented languages it is common for the object graph to contain cyclic relationships. For instance, a Department has Employee objects, and those Employee objects belong to a Department.

Without a context (also known as a Unit of Work) it would be difficult to keep track of which records have already been mapped to objects. You would end up creating new objects for the same record, and in a case like this you could even end up in an infinite loop.
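To illustrate the cycle problem, here is a minimal plain-Java sketch. It is not JPA and not how any particular provider is implemented; the Context class, the two lookup maps, and all method names are made up for the example. The point it tries to show is that registering an object in the context before following its associations is what stops the Department/Employee recursion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: loading a cyclic Department <-> Employee graph.
class CyclicGraphSketch {

    static class Department {
        final long id;
        final List<Employee> employees = new ArrayList<>();
        Department(long id) { this.id = id; }
    }

    static class Employee {
        final long id;
        Department department;
        Employee(long id) { this.id = id; }
    }

    // Simulated foreign keys standing in for database rows (assumed data).
    static final Map<Long, List<Long>> EMPLOYEES_OF = Map.of(1L, List.of(10L, 11L));
    static final Map<Long, Long> DEPARTMENT_OF = Map.of(10L, 1L, 11L, 1L);

    // Stands in for the persistence context of one unit of work.
    static class Context {
        private final Map<Long, Department> departments = new HashMap<>();
        private final Map<Long, Employee> employees = new HashMap<>();

        Department loadDepartment(long id) {
            Department known = departments.get(id);
            if (known != null) return known; // already mapped: the cycle stops here
            Department d = new Department(id);
            departments.put(id, d); // register BEFORE following associations
            for (long employeeId : EMPLOYEES_OF.getOrDefault(id, List.of())) {
                d.employees.add(loadEmployee(employeeId));
            }
            return d;
        }

        Employee loadEmployee(long id) {
            Employee known = employees.get(id);
            if (known != null) return known;
            Employee e = new Employee(id);
            employees.put(id, e);
            // Finds the department registered above instead of loading it again.
            e.department = loadDepartment(DEPARTMENT_OF.get(id));
            return e;
        }
    }

    public static void main(String[] args) {
        Department d = new Context().loadDepartment(1L);
        System.out.println(d.employees.size());                 // 2
        System.out.println(d.employees.get(0).department == d); // true
    }
}
```

Without the two maps in Context, loadDepartment and loadEmployee would call each other forever, creating a fresh object on every call.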

Keep Track of Changes: Commit and Rollback

The context also keeps track of the changes you make to the objects, so that they can be persisted or rolled back later, when the transaction ends. Without a cache like this you would be forced to flush each change to the database as soon as it happens. You could then not roll back, nor could you choose the best moment to flush the changes to the store.
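A rough plain-Java sketch of that idea, again not the JPA API: the hypothetical UnitOfWork below keeps a snapshot of each managed object and only produces an UPDATE statement at commit time. The class names, the sqlLog list, and the snapshot-comparison approach are assumptions chosen for illustration; real providers may detect changes differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: deferring writes until the end of the transaction.
class ChangeTrackingSketch {

    static class Order {
        final long id;
        String comment;
        Order(long id, String comment) { this.id = id; this.comment = comment; }
    }

    static class UnitOfWork {
        private final Map<Long, Order> managed = new HashMap<>();
        private final Map<Long, String> snapshots = new HashMap<>();
        final List<String> sqlLog = new ArrayList<>(); // stands in for the database

        // Start tracking an object and remember its state as loaded.
        void manage(Order order) {
            managed.put(order.id, order);
            snapshots.put(order.id, order.comment);
        }

        // Compare each object with its snapshot and write only what changed.
        void commit() {
            for (Order o : managed.values()) {
                if (!Objects.equals(o.comment, snapshots.get(o.id))) {
                    sqlLog.add("UPDATE orders SET comment = '" + o.comment
                            + "' WHERE id = " + o.id);
                    snapshots.put(o.id, o.comment);
                }
            }
        }

        // Nothing has been written yet, so rolling back just drops the pending state.
        void rollback() {
            managed.clear();
            snapshots.clear();
        }
    }

    public static void main(String[] args) {
        UnitOfWork uow = new UnitOfWork();
        Order order = new Order(3L, null);
        uow.manage(order);
        order.comment = "hello";
        order.comment = "hello again";           // second change, still no SQL
        System.out.println(uow.sqlLog.size());   // 0
        uow.commit();
        System.out.println(uow.sqlLog.size());   // 1
    }
}
```

Two modifications result in a single write at commit, and a rollback before commit writes nothing at all.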

Object Identity

Object identity is also vital in ORM frameworks. That is, if you retrieve the Employee with ID 123 and later need that Employee again within the same context, you should always get the same object, not a new object containing the same data.
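As a minimal sketch of that guarantee (plain Java, not the JPA API; the Context class, its find method, and the databaseHits counter are hypothetical names for this example), an identity map keyed by ID is enough to hand back the same instance every time:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: a per-context identity map.
class IdentityMapSketch {

    static class Employee {
        final long id;
        String name;
        Employee(long id, String name) { this.id = id; this.name = name; }
    }

    // Stands in for a persistence context; one instance per unit of work.
    static class Context {
        private final Map<Long, Employee> loaded = new HashMap<>();
        private final LongFunction<Employee> database; // simulated row lookup
        int databaseHits = 0;

        Context(LongFunction<Employee> database) { this.database = database; }

        Employee find(long id) {
            // Return the instance this context already manages, if any.
            Employee cached = loaded.get(id);
            if (cached != null) return cached;
            databaseHits++;
            Employee fresh = database.apply(id);
            loaded.put(id, fresh);
            return fresh;
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context(id -> new Employee(id, "employee-" + id));
        Employee first = ctx.find(123L);
        Employee second = ctx.find(123L);
        System.out.println(first == second);  // true
        System.out.println(ctx.databaseHits); // 1
    }
}
```

Two separate Context instances would each build their own copy of employee 123, which mirrors the one-context-per-unit-of-work usage discussed in this thread.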

This type of cache is not meant to be shared by multiple threads. If it were, it would compromise performance and force everyone to pay that penalty, even those who would be fine with a single-threaded solution. You would also end up with a much more complex solution, which would be like killing a fly with a bazooka.

That is why, if what you need is a shared cache, you actually need a second-level cache, and there are implementations for that as well.

Aurie answered 29/3, 2014 at 14:29 Comment(2)
Thanks for the answer. Now suppose we have two entities, and em1 has loaded an instance of Entity1 with some dependent Entity2 instances from thread1. Now em2 is trying to load the same Entity2 instances from thread2. What should happen here? Should em2 wait until em1 is closed, or will I get an exception because the objects are already in em1? You can answer here #22732916 - Flanna
Every thread will have its own independent unit of work, and therefore its own independent copies of the entity. Obviously, the right thing to do is to design your program so that those threads do not mix entities from different entity managers. - Aurie

The point is to have an application that works like you expect it to work, and that wouldn't be slow as hell. Let's take an example:

Order order = em.find(Order.class, 3L);
Customer customer = em.find(Customer.class, 5L);
for (Order o : customer.getOrders()) { // line A
    if (o.getId().longValue() == 3L) {
        o.setComment("hello"); // line B
        o.setModifier("John");
    }
}

System.out.println(order.getComment()); // line C

for (Order o : customer.getOrders()) { // line D
    System.out.println(o.getComment()); // line E
}

At line A, JPA executes a SQL query to load all the orders of the customer.

At line C, what do you expect to be printed? null or "hello"? You expect "hello" to be printed, because the order you modified at line B has the same ID as the one loaded in the first line. That wouldn't be possible without the first-level cache.

At line D, you don't expect the orders to be loaded again from the database, because they have already been loaded at line A. That wouldn't be possible without the first-level cache.

At line E, you once again expect "hello" to be printed for order 3. That wouldn't be possible without the first-level cache.

At line B, you don't expect an update query to be executed, because there might be many subsequent modifications (like in the next line), to the same entity. So you expect these modifications to be written to the database as late as possible, all in one go, at the end of the transaction. That wouldn't be possible without the first-level cache.

Colincolinson answered 29/3, 2014 at 14:29 Comment(0)
