Searching huge log files
Troubleshooting, analyzing, and filtering log files is by far one of my most laborious daily jobs. My issue is searching through a log file that can be well over 4 GB in size. Simply loading the file takes up to 15 minutes. I'm running a fairly fast processor with 8 GB of memory. After the file loads, I literally only have the luxury of grep and/or Ctrl+F to scan through it. This gets worse when I'm trying to look at files from multiple systems, each weighing in at over a gig. I have tried segregating the files based on timestamps to make them smaller, but no joy really.

Is there a tool or even a process that I could use to make troubleshooting less time consuming (apart from the usual "just fix the bug first")?

Your comments are appreciated.

Initiative answered 28/10, 2010 at 2:42 Comment(5)
Take a look here baremetalsoft.com/index.phpBrout
What platform are you running on?Peroxidase
Why are the logs so big: is it because there are really a lot of transactions/events happening, or is an unnecessary level of detail being logged? Does the application have any support for adjusting the verbosity, and/or directing the log data from different components to different log files?Starlike
@David: I'd bet money that he's hunting for stack traces in a java app server log. Nothing else makes log files that big.Jamikajamil
@Jamikajamil That's my guess too. Hence I wonder if he could use log4j.properties (or similar) to separate the important stuff from the noise.Starlike

What are you loading it with? 4 gigs is a fairly large file, but that shouldn't take THAT long to load into memory.

For files that large, I would recommend using grep directly, and if grep isn't doing it for you, sed and awk are your friends. If you want to do it in real time, learn to use those tools in conjunction with pipes and tail -f.
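A minimal sketch of the grep-and-pipes approach; the file names here are made up for illustration, but the same commands scale to multi-gig files because grep streams the input rather than loading it into memory:

```shell
# Build a tiny stand-in log; substitute your real multi-gig file.
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout\n' > sample.log

# Pull only the matching lines into a much smaller file to inspect.
grep 'ERROR' sample.log > errors.log

# For realtime filtering of a growing log, pipe tail -f through grep:
#   tail -f sample.log | grep 'ERROR'
cat errors.log
```

The point is that grep never holds more than a line or so in memory, so there is no 15-minute load step at all.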

Yes, I know, sed is very intimidating at first. It's also ridiculously powerful. Learn it.
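As a small example of why sed is worth learning here (the timestamps below are invented), sed -n with an address range prints only the slice of a log between two markers, without loading the rest:

```shell
# Miniature log with made-up timestamps.
printf '10:00 boot\n10:05 warning\n10:10 crash\n10:15 recovered\n' > times.log

# Print only the lines from the first match of 10:05 through the first
# match of 10:10, inclusive. sed streams the file, so size barely matters.
sed -n '/^10:05/,/^10:10/p' times.log > window.log
cat window.log
```

This is one way to do the "segregate by timestamp" idea from the question without ever opening the full file in an editor.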

If you're on Windows, you have my sympathy. May I recommend a Unix shell?

If you are afraid of the command line tools, consider learning Perl or Python. They're both quite good at sorting signal from noise in large files like this.

Chickie answered 28/10, 2010 at 2:56 Comment(1)
i would second that. please learn AWK & SED. then you can write a couple of scripts and life will be so very simple! :-)Adabelle

Baretail is a good tool to have. Give it a try. I haven't used it on 4-gig files, but my log files are also quite big and it works just fine. http://www.baremetalsoft.com/baretail/index.php

edit: I did not see that someone had already suggested Baretail.

Fontenot answered 28/10, 2010 at 3:15 Comment(0)

If you want to exclude lines you don't want to see, you can use grep -v 'I dont wanna see this' file.log > logWithExcludedLines.log. You can use a regex as well: grep -vE 'asdf|fdsa' file.log > logWithNoASDForFDSA.log

This method works very well with Apache access logs: grep -v 'HTTP/1.1 200' access.log > no200s.log (or something like that; I don't remember the exact string).
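A runnable miniature of that idea, with made-up access-log lines (real Apache combined-log lines are longer, but the grep -v filtering works the same way):

```shell
# Fake access-log lines standing in for a real Apache log.
printf 'GET /a HTTP/1.1 200\nGET /b HTTP/1.1 500\nGET /c HTTP/1.1 200\n' > access.log

# Drop the successful requests so only the interesting lines remain.
grep -v ' 200$' access.log > no200s.log

# Or exclude several statuses at once with an extended regex.
grep -vE ' (200|304)$' access.log > failures.log
cat no200s.log
```

Filtering out the known-boring lines first often shrinks a multi-gig log to something an ordinary editor can open.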

Penmanship answered 28/10, 2010 at 3:20 Comment(0)

I am currently doing such things using the Unix command-line tools (f)grep, awk, cut, join, etc., which are also available for Windows via Cygwin or UnxUtils and the like, and I also use some Scala scripts for things that are more complicated. You can write scripts to do searches that span log file entries in several files. But I am also wondering if there is something better than that, maybe importing them into a database (both being SO questions)?
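A sketch of one cross-file search in that style, with hypothetical per-system log names; grep -c emits a file:count pair per file and awk tallies the total across systems:

```shell
# Two made-up per-system logs standing in for the real multi-gig ones.
printf 'ok\nERROR a\n' > sys1.log
printf 'ERROR b\nERROR c\nok\n' > sys2.log

# grep -c prints "file:count" for each file; awk sums the counts.
grep -c 'ERROR' sys1.log sys2.log | awk -F: '{sum += $2} END {print sum}'
```

The same pattern extends naturally: cut and join can then line up fields (timestamps, request IDs) across the per-system result files.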

By the way: have your hard disk replaced by an SSD. These are way faster! It also pays for me to leave the logs gzip-compressed on disk, since when searching them the disk is the bottleneck. If you are searching for, say, a regular expression in the log files and want 100 lines of context for each occurrence, you'd do:

zcat *.log.gz | grep -100 '{regexp}' > {outputfile}

and load the output file into your favourite text-file viewer. If you are searching for fixed strings, use fgrep (the same as grep with the additional option -F); that's much faster.
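A runnable miniature of that pipeline, using a tiny made-up log (gzip -dc is a portable spelling of zcat, and one line of context stands in for the 100 in the answer):

```shell
# Compress a small fake log, as the answer suggests keeping logs gzipped.
printf 'alpha\nERROR beta\ngamma\n' | gzip > app.log.gz

# Decompress to stdout and search for a fixed string with one line of
# context on each side; -F skips regex parsing, which is faster.
gzip -dc app.log.gz | grep -F -1 'ERROR' > hits.log
cat hits.log
```

Because the search reads the compressed bytes off disk, the I/O bottleneck shrinks by roughly the compression ratio.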

Pectinate answered 8/12, 2010 at 17:31 Comment(0)
