Spring Batch: How to process multi-line log files

I am trying to import the contents of a log file into a database using Spring Batch.

I am currently using a FlatFileItemReader, but unfortunately there are many log entries it doesn't handle correctly. The two main problems are:

  1. Entries that contain multi-line JSON strings:

    2012-03-22 11:47:35,307  DEBUG main someMethod(SomeClass.java:56): Do Something(18,true,null,null,null): my.json = '{
        "Foo":"FooValue",
        "Bar":"BarValue",
        ... etc
    }'
    
  2. Entries that contain stack traces:

    2012-03-22 11:47:50,596  ERROR main com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:859): No route to host
    java.net.NoRouteToHostException: No route to host
            at sun.nio.ch.Net.connect0(Native Method)
            at sun.nio.ch.Net.connect(Net.java:364)
            at sun.nio.ch.Net.connect(Net.java:356)
            at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
            at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
            at com.meetup.memcached.SockIOPool$SockIO.getSocket(SockIOPool.java:1703)
            at com.meetup.memcached.SockIOPool$SockIO.<init>(SockIOPool.java:1674)
            at com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:850)
            at com.meetup.memcached.SockIOPool.populateBuckets(SockIOPool.java:737)
            at com.meetup.memcached.SockIOPool.initialize(SockIOPool.java:695)
    

Basically, I need the FlatFileItemReader to keep reading until it reaches the next timestamp, aggregating all the lines before it into a single item. Has anything like this been done before in Spring Batch?
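
The "next timestamp" condition can be expressed as a regular expression tested against the start of each line. A minimal sketch in plain Java, assuming every entry begins with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp followed by an upper-case log level, as in the samples above; the class name `LogEntryStart` and the exact pattern are illustrative assumptions, not part of Spring Batch:

```java
import java.util.regex.Pattern;

class LogEntryStart {
    // Assumed layout, taken from the sample lines: timestamp with millisecond
    // precision, whitespace, then a level such as DEBUG or ERROR.
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s+[A-Z]+\\s");

    // True if the line begins a new log entry; continuation lines
    // (JSON bodies, exception messages, stack-trace frames) return false.
    static boolean isEntryStart(String line) {
        return ENTRY_START.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isEntryStart("2012-03-22 11:47:35,307  DEBUG main someMethod(SomeClass.java:56): ...")); // true
        System.out.println(isEntryStart("        at sun.nio.ch.Net.connect0(Native Method)")); // false
        System.out.println(isEntryStart("java.net.NoRouteToHostException: No route to host")); // false
    }
}
```

If the real log layout differs (other date formats, lower-case levels), only the pattern needs to change.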

Rockabilly answered 30/3, 2012 at 8:56

There's now an FAQ in the Spring Batch documentation addressing this use case.

Trachyte answered 21/1, 2015 at 15:3

The solution was to write a custom reader that buffers the most recent lines and looks for a specific pattern that marks the start of a valid entry. I did not find anything ready-made in Spring Batch, but I was able to reuse a lot of existing code. The solution is proprietary, so unfortunately I can't post it here, but this is how it works:

  1. Keep a LinkedList of lines. LinkedList is important, because we'll access it both as a List and as a Queue.
  2. In your read method, start a loop: read the next line and append it to your queue. Check the queue to see whether it contains two valid lines, i.e. lines that start a new entry (you'll need list access here). If it does, return all lines before the second valid line as one item and remove them from the queue. If the input runs out, return the remaining lines as the final item; once the queue is empty as well, return null to signal the end of the input.
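
The two steps above can be sketched in plain Java. This is not the answer's proprietary code: `MultiLineLogReader` is a hypothetical class, the entry-start pattern is assumed from the question's samples, and input comes from a `java.io.BufferedReader` so the sketch runs without Spring Batch on the classpath. Its `read()` follows the `ItemReader` contract (return the next item, or `null` at end of input), so in a real job it could implement `org.springframework.batch.item.ItemReader<String>`; restart support (`ItemStream`) is not handled here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedList;
import java.util.regex.Pattern;

// Hypothetical multi-line log reader; returns one log entry per read() call.
class MultiLineLogReader {
    // Assumed entry-start pattern: timestamp followed by a log level.
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s+[A-Z]+\\s");

    private final BufferedReader in;
    // Lines are appended at the tail and drained from the head (queue access).
    private final LinkedList<String> buffer = new LinkedList<>();

    MultiLineLogReader(BufferedReader in) {
        this.in = in;
    }

    private static boolean isEntryStart(String line) {
        return ENTRY_START.matcher(line).find();
    }

    // Returns the next complete entry, or null once the input is exhausted.
    public String read() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            buffer.addLast(line);
            // A new entry start behind at least one buffered line means
            // every line before it belongs to one complete entry.
            if (buffer.size() > 1 && isEntryStart(line)) {
                return drain(buffer.size() - 1);
            }
        }
        // End of input: whatever is left forms the final entry.
        return buffer.isEmpty() ? null : drain(buffer.size());
    }

    // Removes the first n buffered lines and joins them with newlines.
    private String drain(int n) {
        StringBuilder entry = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                entry.append('\n');
            }
            entry.append(buffer.removeFirst());
        }
        return entry.toString();
    }

    public static void main(String[] args) throws IOException {
        String log = String.join("\n",
                "2012-03-22 11:47:35,307  DEBUG main someMethod(SomeClass.java:56): my.json = '{",
                "    \"Foo\":\"FooValue\",",
                "}'",
                "2012-03-22 11:47:50,596  ERROR main com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:859): No route to host",
                "java.net.NoRouteToHostException: No route to host",
                "        at sun.nio.ch.Net.connect0(Native Method)");
        MultiLineLogReader reader = new MultiLineLogReader(new BufferedReader(new StringReader(log)));
        String entry;
        while ((entry = reader.read()) != null) {
            System.out.println("--- entry ---");
            System.out.println(entry);
        }
    }
}
```

Rather than rescanning the whole list after every line, this sketch only tests the line it just appended. Once the buffer starts with an entry-start line, the newly added line is the only candidate for the second valid line, so the grouping matches the scan described in step 2; the one difference is that any lines before the first timestamp in the file come back as an entry of their own.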

Needless to say, this solution is noticeably slower than the built-in FlatFileItemReader, but it gets the correct data.

Rockabilly answered 30/3, 2012 at 11:40

© 2022 - 2024 — McMap. All rights reserved.