Sort unique URLs from log

I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and writing the output to another file.

I executed this command:

cat access.log | awk '{print $7}' > url.txt

then get only the unique ones and sort them:

cat url.txt | uniq | sort > urls.txt

The problem is that I can still see duplicates, even though the file is sorted, which suggests my command worked. Why?

Purport answered 17/11, 2011 at 16:06

uniq | sort does not work: uniq only removes contiguous duplicates, so duplicates that are not adjacent in the unsorted input survive it.

The correct way is sort | uniq, or better, sort -u, since the latter spawns only one process.
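
A minimal demonstration with made-up input shows why the order matters:

$ printf 'b\na\nb\n' | uniq | sort
a
b
b
$ printf 'b\na\nb\n' | sort -u
a
b

In the first pipeline, uniq sees b, a, b, finds no adjacent duplicates, and passes all three lines through; sorting afterwards only brings the duplicates together.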

Tolman answered 17/11, 2011 at 16:08

uniq needs its input sorted, but you sorted after uniq. Try:

$ sort -u < url.txt > urls.txt
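
Both steps can also be combined into a single pipeline with no intermediate file (assuming, as in the question, that the URL is field 7 of the log):

$ awk '{print $7}' access.log | sort -u > urls.txt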
Tombac answered 17/11, 2011 at 16:08

Try something like this:

sort url.txt | uniq
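
If you also want to see how often each URL occurs, uniq -c prepends a count to each line, which you can then sort numerically:

$ sort url.txt | uniq -c | sort -rn | head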
Reciprocity answered 17/11, 2011 at 16:10

For nginx access logs, this gives the unique URLs being requested (plain uniq, not uniq -u, which would discard every URL that appears more than once; sed's extended regular expressions are greedy, with no lazy .*? quantifier):

sed -r "s/.*(GET|POST|PUT|DELETE|HEAD) (.*) HTTP.*/\2/" /var/log/nginx/access.log | sort | uniq

Reference: https://www.guyrutenberg.com/2008/08/10/generating-url-list-from-access-log-access_log/

Cletis answered 5/6, 2018 at 11:17
