I'm currently running an awk script over a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes it has written 14 MB of the roughly 1000 ± 500 MB I expect it to produce, and I wonder whether I can process it much faster somehow.
Here is the awk script:
#!/bin/bash
# Print "IP," then fork date(1) once per line to convert the timestamp to epoch seconds.
awk '{t=$4" "$5; gsub(/[\[\]\/]/," ",t); sub(":"," ",t); printf("%s,",$1); system("date -d \""t"\" +%s")}' "$1"
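For reference, after the gsub/sub rewrites each system() call spawns a shell plus a date process per input line, running something like this (the epoch value assumes GNU date):

date -d " 22 Jan 2010 05:54:55 +0100 " +%s
1264136095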
EDIT:
For non-awkers: the script reads each line, extracts the date fields, rewrites them into a format the date utility recognizes, and calls date to convert the timestamp into seconds since 1970, finally emitting a line of a .csv file containing the IP and that number.
Example input: 189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
Returned output: 189.5.56.113,1264136095
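One idea I have not benchmarked yet: gawk's built-in mktime() could do the conversion entirely inside awk, avoiding a fork per line. A rough sketch (assumes GNU awk; note that mktime() interprets the timestamp in the local timezone, so the +0100 field is ignored here):

#!/bin/bash
# Sketch only: convert timestamps with gawk's mktime() instead of forking date.
gawk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (i = 1; i <= 12; i++) mon[names[i]] = i   # month name -> number
}
{
    split(substr($4, 2), d, "[/:]")   # $4 looks like [22/Jan/2010:05:54:55
    printf("%s,%d\n", $1, mktime(d[3] " " mon[d[2]] " " d[1] " " d[4] " " d[5] " " d[6]))
}' "$1"

Since consecutive log lines often share the same second, caching the result per unique timestamp string might also help, but I don't know whether that would be enough on its own.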