Using FlatFileItemReader with a TaskExecutor (Thread Safety)

There are a lot of examples which use FlatFileItemReader along with TaskExecutor. I provide samples below (both with XML and Java Config):

I have used it myself with XML configuration for large CSVs (GB size), writing to a database with the out-of-the-box JpaItemWriter. There seem to be no issues, even without setting save-state = false or any other special handling.

Now, FlatFileItemReader is documented as not thread-safe.

My guess was that JpaItemWriter was "covering" the issue by persisting Sets, i.e. collections with no duplicates, provided that hashCode() and equals() cover the business key of the Entity. However, even that is not enough to prevent duplicates caused by non-thread-safe reading and processing.

Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a Tasklet which has assigned a TaskExecutor? Regardless of the Writer. If not, how could we explain in theory the lack of errors when a JPAItemWriter is used?

P.S: The example links that I give above, use FlatFileItemReader with TaskExecutor without mentioning at all possible thread-safety issues...

Idolater answered 16/2, 2017 at 10:10 Comment(4)
I may be wrong but are you asking how JpaItemWriter is thread-safe when FlatFileItemReader is not?Robyn
My question is more general. I provide 2 links where a FlatFileItemReader is used with TaskExecutor and different Writers without any special handling for thread safety. Moreover I provided my own experience and I tried to give a theoretical explanation on how a JpaItemWriter could "hide" the problem under some circumstances. So in other words my question is: how should we use it properly and if finally it can work with TaskExecutor as-is.Idolater
Do you mean the JpaItemReader or JpaItemWriter?Amir
Sorry if I was not explicit enough. Let me rephrase: Is it proper to use the out-of-the-box FlatFileItemReader within a Tasklet which has assigned a TaskExecutor? Regardless of the Writer. If not, how could we explain in theory the lack of errors when a JPAItemWriter is used?Idolater

TL;DR: It is safe to use a FlatFileItemReader with a TaskExecutor provided the Writer is thread-safe (assuming that you are not concerned with restarting jobs, retrying steps, skipping, etc. at the moment).

Update: There is now a JIRA that officially confirms that saveState needs to be set to false (i.e. restartability must be disabled) if one wants to use FlatFileItemReader with a TaskExecutor in a thread-safe manner.


Let's first hear it from the horse's mouth by seeing what the Spring documentation says about using multi-threaded steps with a TaskExecutor.

Spring Batch provides some implementations of ItemWriter and ItemReader. Usually they say in the Javadocs if they are thread safe or not, or what you have to do to avoid problems in a concurrent environment. If there is no information in Javadocs, you can check the implementation to see if there is any state
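
To make the quoted advice concrete, here is a minimal sketch in plain Java (no Spring Batch on the classpath) of what "state" in a reader looks like and one common way to guard it. CountingLineReader, SynchronizedLineReader and ReaderDemo are hypothetical names invented for this illustration; they are not Spring Batch classes, and the real FlatFileItemReader is more involved. The idea is only that a reader holding a mutable position must have its read() serialized if several threads share it. Spring Batch also offers a wrapper in the same spirit, SynchronizedItemStreamReader, in versions that include it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stateful reader; returns null when the input is exhausted. */
class CountingLineReader {
    private final List<String> lines;
    private int position = 0; // mutable state shared by every caller: not thread-safe

    CountingLineReader(List<String> lines) {
        this.lines = lines;
    }

    String read() {
        if (position >= lines.size()) {
            return null;
        }
        return lines.get(position++); // check-then-increment is not atomic
    }
}

/** Serializes access to the delegate so each line is handed to exactly one thread. */
class SynchronizedLineReader {
    private final CountingLineReader delegate;

    SynchronizedLineReader(CountingLineReader delegate) {
        this.delegate = delegate;
    }

    synchronized String read() {
        return delegate.read();
    }
}

class ReaderDemo {
    /** Drains the reader from several threads; returns {total reads, distinct lines seen}. */
    static int[] readConcurrently(int lineCount, int threads) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < lineCount; i++) {
            lines.add("line-" + i);
        }
        SynchronizedLineReader reader = new SynchronizedLineReader(new CountingLineReader(lines));
        Set<String> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger reads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                String line;
                while ((line = reader.read()) != null) {
                    seen.add(line);
                    reads.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { reads.get(), seen.size() };
    }

    public static void main(String[] args) {
        int[] result = readConcurrently(10_000, 4);
        System.out.println("reads=" + result[0] + ", distinct=" + result[1]);
    }
}
```

With the synchronized wrapper, the number of reads equals the number of distinct lines. Calling CountingLineReader directly from several threads gives no such guarantee, although the race may not show up on every run, which is one reason such problems can go unnoticed.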

Let's address your questions now:

Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a Tasklet which has assigned a TaskExecutor? Regardless of the Writer. If not, how could we explain in theory the lack of errors when a JPAItemWriter is used?

The statement "Regardless of the Writer" is incorrect. The Writer you use must be thread-safe. The JpaItemWriter is thread-safe according to the Javadocs and can safely be used with a FlatFileItemReader that is not thread-safe. Explaining how JpaItemWriter is thread-safe would make this answer long. I recommend that you post another question if you are interested in how specific writers handle thread safety (as the Spring Batch docs mention as well).

P.S: The example links that I give above, use FlatFileItemReader with TaskExecutor without mentioning at all possible thread-safety issues...

If you take a look at the Coherence example, you will see that they clearly modify CoherenceBatchWriter.java in Figure 6. They first make mapBatch a local variable so that each thread has its own copy of this Map. Moreover, if you dig further into the Coherence API, you should find that the NamedCache being returned is thread-safe.
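
The local-variable pattern described above can be sketched in plain Java as follows. This is not the code from the linked example: LocalBatchWriter and WriterDemo are hypothetical names, and a ConcurrentHashMap stands in for the thread-safe cache, so treat it as an illustration of the idea only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical writer; the ConcurrentHashMap is an assumed stand-in for a thread-safe cache. */
class LocalBatchWriter {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Each call builds its own batch map, so concurrently written chunks never share it. */
    void write(List<String[]> items) {
        Map<String, String> batch = new HashMap<>(); // local variable: one map per call
        for (String[] item : items) {
            batch.put(item[0], item[1]); // item[0] is the key, item[1] the value
        }
        cache.putAll(batch); // the shared target must itself be safe for concurrent writes
    }

    int size() {
        return cache.size();
    }
}

class WriterDemo {
    /** Writes two overlapping chunks from two threads and returns the resulting cache size. */
    static int writeOverlappingChunks() {
        LocalBatchWriter writer = new LocalBatchWriter();
        List<String[]> chunkA = Arrays.asList(new String[] {"1", "a"}, new String[] {"2", "b"});
        List<String[]> chunkB = Arrays.asList(new String[] {"2", "b"}, new String[] {"3", "c"});
        Thread t1 = new Thread(() -> writer.write(chunkA));
        Thread t2 = new Thread(() -> writer.write(chunkB));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer.size();
    }

    public static void main(String[] args) {
        System.out.println("entries=" + writeOverlappingChunks());
    }
}
```

Key "2" appears in both chunks but is stored only once, because a map keeps a single entry per key. Note that this collapses duplicates only when the map key identifies the item; a writer that appends rows without such a key would not get the same effect.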

The second link that you provide looks really dicey since the Writer does not do anything to avoid race conditions. That example is indeed an incorrect use of a multi-threaded step.


Robyn answered 16/2, 2017 at 17:17 Comment(2)
Thank you for all the provided information. However, given the chunk-oriented processing style, if the FlatFileItemReader is not thread-safe, isn't it possible for different threads to have read the same lines of a file? This way, the Writer (even if it is thread-safe) would attempt to write duplicate items in the end. This is why I didn't put emphasis on the Writer part.Idolater
@Idolater Yes, that's possible, but if the writer is thread-safe, this issue is nullified, so all is well in the end. If you take a look at the Writer implementation for Coherence, you will see that they ensure that there won't be any duplicate writes, as the writer writes to a thread-safe Map and a Map doesn't allow duplicate keys. Also, JpaItemWriter is thread-safe and would give you the same guarantee of avoiding duplicate inserts, IMO. So the writer does matter even if you did not emphasize it.Robyn
