Sort unique URLs from log

I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and writing the output to another file.

I executed this command:

cat access.log | awk '{print $7}' > url.txt

then get only the unique ones and sort them:

cat url.txt | uniq | sort > urls.txt

The problem is that I can still see duplicates, even though the file is sorted, which suggests my command worked. Why?

Purport answered 17/11, 2011 at 16:06

uniq | sort does not work: uniq only removes contiguous duplicates, so duplicates that are not adjacent in the unsorted input survive it.

The correct way is sort | uniq, or better, sort -u, since the latter spawns only one process.
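
A minimal demonstration with made-up input shows why the order matters:

$ printf 'b\na\nb\n' | uniq | sort
a
b
b
$ printf 'b\na\nb\n' | sort -u
a
b

In the first pipeline, uniq sees b, a, b, finds no adjacent duplicates, and passes all three lines through; sorting afterwards only brings the duplicates together.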

Tolman answered 17/11, 2011 at 16:08

uniq needs its input sorted, but you sorted after uniq. Try:

$ sort -u < url.txt > urls.txt
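
Both steps can also be combined into a single pipeline with no intermediate file (assuming, as in the question, that the URL is field 7 of the log):

$ awk '{print $7}' access.log | sort -u > urls.txt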
Tombac answered 17/11, 2011 at 16:08

Try something like this:

sort url.txt | uniq
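
If you also want to see how often each URL occurs, uniq -c prepends a count to each line, which you can then sort numerically:

$ sort url.txt | uniq -c | sort -rn | head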
Reciprocity answered 17/11, 2011 at 16:10

For nginx access logs, this gives the unique URLs being requested (plain uniq, not uniq -u, which would discard every URL that appears more than once; sed's extended regular expressions are greedy, with no lazy .*? quantifier):

sed -r "s/.*(GET|POST|PUT|DELETE|HEAD) (.*) HTTP.*/\2/" /var/log/nginx/access.log | sort | uniq

Reference: https://www.guyrutenberg.com/2008/08/10/generating-url-list-from-access-log-access_log/

Cletis answered 5/6, 2018 at 11:17
