Asynchronous Logging
Asked Answered
C

6

16

Right now in my application, at certain points, we are logging some heavy stuff into the log files.

Basically only for logging, we are creating JSON of the data available and then logging into Log files. This is a business requirement to log data in JSON format.

Now creating JSON from the data available and then logging to FILE takes a lot of time and impacts the original request return time. Now idea is to improve the situation.

One of the things that we have discussed is creating a thread pool using

Executors.newSingleThreadExecutor() 

in our code and then submitting the task to it which does the conversion of data into JSON and subsequent logging.

Is it a good approach to do this? As we are managing the thread pool itself, is it going to create some issues?

I would appreciate it if someone can share better solutions. Someway to use Log4j for this. I tried to use AsyncAppender but didn't achieve any desired result.

We are using EJB 3, JBoss 5.0, Log4j, and Java6.

Chicalote answered 10/6, 2013 at 7:1 Comment(0)
H
11

I believe you are on right track in terms of using a separate thread pool for logging. In lot of products you will see the asynchronous logging feature. Logs are accumulated and pushed to log files using a separate thread than the request thread. Especially in prodcution environments, where are millions of incoming request and your response time need to be less than few seconds. You cannot afford anything such as logging to slow down the system. So the approach used is to add logs in a memory buffer and push them asynchronously in reasonably sized chunks.

A word of caution while using thread pool for logging As multiple threads will be working on the log file(s) and on a memory log buffer, you need to be careful about the logging. You need to add logs in a FIFO kind of a buffer to be sure that logs are printed in the log files sorted by time stamp. Also make sure the file access is synchronized and you don't run into situation where log file is all upside down or messed up.

Host answered 10/6, 2013 at 7:7 Comment(1)
Thanks for giving me confidence to go ahead in the right direction.But my only worry is I am having using application created threadpool...will that create any issue or is there any way I can leverage jboss thread pool for this.Chicalote
P
7

Have a look at Logback,AsyncAppender it already provide separate threadpool, queue etc and is easily configurable, it almost do the same as you are doing, but saves you from re-inventing the wheel.

Proceeds answered 10/6, 2013 at 7:11 Comment(2)
Is there any difference between this and log4j.AsyncAppender ?Also as part of this I dont only want to log asychronosuly but also convert data into JSON.Chicalote
@Chicalote If you have other task also(like data conversion), then your solution is also ok. Regarding log4j vs logback, log4j development is stopped and logback is active. For more details about log4j vs logback have a look at this linkProceeds
R
3

Is using MongoDB for logging considered?

  1. MongoDB inserts can be done asynchronously. One wouldn’t want a user’s experience to grind to a halt if logging were slow, stalled or down. MongoDB provides the ability to fire off an insert into a log collection and not wait for a response code. (If one wants a response, one calls getLastError() — we would skip that here.)
  2. Old log data automatically LRU’s out. By using capped collections, we preallocate space for logs, and once it is full, the log wraps and reuses the space specified. No risk of filling up a disk with excessive log information, and no need to write log archival / deletion scripts.
  3. It’s fast enough for the problem. First, MongoDB is very fast in general, fast enough for problems like this. Second, when using a capped collection, insertion order is automatically preserved: we don’t need to create an index on timestamp. This makes things even faster, and is important given that the logging use case has a very high number of writes compared to reads (opposite of most database problems).
  4. Document-oriented / JSON is a great format for log information. Very flexible and “schemaless” in the sense we can throw in an extra field any time we want.
Rola answered 10/6, 2013 at 7:7 Comment(1)
Well thanks for sharing this but I dont think we are going that route now.Chicalote
E
1

There is also log4j 2: http://logging.apache.org/log4j/2.x/manual/async.html Additionally read this article about why it is so fast: http://www.grobmeier.de/log4j-2-performance-close-to-insane-20072013.html#.UzwywI9Bow4

Erbes answered 2/4, 2014 at 15:55 Comment(0)
S
0

You can also try CoralLog to asynchronously log data using the disruptor pattern. That way you spend minimum time in the logger thread and all the hard work is passed to the thread doing the actual file I/O. It also provides Memory Mapped Files to speed up the consumer thread and reduce queue contention.

Disclaimer: I am one of the developers of CoralLog

Snuggery answered 8/7, 2014 at 11:52 Comment(0)
C
0

Disclaimer: I am the developer of elf4j-engine/elf4j-provider.

Take a look at elf4j-engine. Asynchronous is the only logging mode. Also supports JSON pattern format out of the box. Hope it addresses most of the "cautions" and usual use cases.

Good word of caution from Juned Ahsan already. Just to add a few...

  1. If you are using Executors.newSingleThreadExecutor(), the FIFO part is already taken care of by that single-thread pool (with a built-in FIFO task queue). Just make sure no more multiple-thread "fan out" after the thread pool takes over the work (or if you do further multi-threading, then all the caution applies).

  2. By using the async thread pool, the real expensive part of the logging is the operations you have to perform before handing over the log entry data to the thread-pool (for output like file writing). e.g. if you need the caller thread info, or the caller frame details such as calling method, line number, and calling file name - those are difficult to be done by the async thread from the pool, and the caller/application thread would need to do majority of the gathering work before the log data is handed over to the thread-pool. The data gathering may have to include walking the calling stack trace frames to extract the details, so it's a bit expensive but hard to get around. Asynchrony does help the output performance once all the data is gathered.

Crozier answered 3/4, 2023 at 16:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.