How to print only the unique lines in BASH?

How can I print only those lines that appear exactly once in a file? E.g., given this file:

mountain
forest
mountain
eagle

The output would be this, because the line mountain appears twice:

forest
eagle
  • The lines can be sorted, if necessary.
Convergence answered 19/5, 2014 at 14:37 Comment(2)
I think you can use a dictionary. You can have a look at this link: #1494678 – Coliseum
Does this answer your question? Find unique lines – Bradski
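For what it's worth, here is a sketch of the dictionary idea from the comment above in pure bash (it requires bash 4+ for associative arrays; empty input lines are an edge case, and awk will be much faster on large files):

#!/usr/bin/env bash
declare -A count
while IFS= read -r line; do
    count[$line]=$(( ${count[$line]:-0} + 1 ))   # tally each distinct line
done < file
for line in "${!count[@]}"; do                   # iteration order is unspecified
    [[ ${count[$line]} -eq 1 ]] && printf '%s\n' "$line"
done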

Using awk:

awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' file
eagle
forest
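For readability, the one-liner can also be written as a standalone awk script. The version below is a functionally equivalent sketch with comments (the file name is just illustrative; note that the leading ! in the one-liner's action block has no effect on the result, since the statement simply increments the counter):

# uniq_once.awk: print lines that occur exactly once in the input
{
    seen[$0]++               # count occurrences of each whole line
}
END {
    for (line in seen)       # awk's for-in iteration order is unspecified,
        if (seen[line] == 1) # so output order may differ from input order
            print line
}

Run it with awk -f uniq_once.awk file.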
Emmieemmit answered 19/5, 2014 at 14:41 Comment(14)
No need to go so complex; a simple uniq command will do the job as well. – Treed
1. It's not complex, and 2. it avoids an expensive sort for larger files. – Emmieemmit
@Emmieemmit Nice awk, +1. But it really is simpler to use uniq. And with larger files kept in memory, who knows which is more expensive: swapping or sorting. :) – Metrify
@Emmieemmit Just tested on 300k lines. This awk solution is 8 times faster than sort|uniq. – Metrify
@jm666: Thanks so much for running the test and verifying that the awk command is faster than sort|uniq. – Emmieemmit
Since we are iterating anyway, we can check and print only those lines which are seen just once: awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' file. But +1 nonetheless. – Mervinmerwin
Yes, sure, that can also be done; I just chose delete to free up some memory, though I'm not sure how much that helps. :) – Emmieemmit
@Emmieemmit That's a valid point, but as the solution is right now, it will get confused when the number of duplicates is odd. For example, if you add another mountain row, it will print that as well. – Mervinmerwin
@jaypal: Ah, that's a very important point. I updated as you suggested, many thanks! – Emmieemmit
@Emmieemmit Thanks for the edit, and you're always welcome. :) – Mervinmerwin
@jm666 I tried with my .xsession-errors.old file (129,315 lines), and the sort | uniq solution is 5 times faster than this awk solution... – Beret
@Beret sort also has the added benefit of spilling to temporary files on disk if memory is not available; awk does not have that benefit. – Mervinmerwin
I created an 803,200-line text file. My awk command took 1.946s, whereas sort|uniq took 3.188s on my OS X machine. – Emmieemmit
My OS X is probably slow on IO. I ran gsort -uR /usr/share/dict/* > words.txt (gsort is the GNU version of sort; this produces a de-duplicated, randomly ordered file) and got 312,123 lines. Then I timed both commands: time sort words.txt | uniq -u >/dev/null took 8.4 secs, and time awk ... words.txt >/dev/null took 1.3 secs. So for me (repeated a few times) the awk is nearly 8 times faster than sort. – Metrify
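For anyone wanting to rerun that comparison, here is a minimal sketch of the benchmark described in the comment above (it assumes GNU sort is installed as gsort, as with coreutils on macOS; timings will vary by machine and input):

gsort -uR /usr/share/dict/* > words.txt    # unique dictionary words, randomly ordered
time sort words.txt | uniq -u > /dev/null
time awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' words.txt > /dev/null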

Use sort and uniq:

sort inputfile | uniq -u

The -u option causes uniq to print only unique lines. Quoting from man uniq:

   -u, --unique
          only print unique lines

For your input, it'd produce:

eagle
forest

Note: Remember to sort before uniq -u, because uniq only compares adjacent lines. What uniq -u actually does is print lines that have no identical neighboring lines, which by itself does not mean they are unique in the file. Sorting groups all identical lines together, so after sorting, only the lines that are truly unique in the file survive uniq -u.
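To see why the sort matters, run both pipelines on the question's sample file (saved here as file):

$ uniq -u file            # unsorted: no two identical lines are adjacent, so all four lines print
mountain
forest
mountain
eagle
$ sort file | uniq -u     # sorted: the two mountain lines become adjacent and are dropped
eagle
forest

Also note that sort -u is not a substitute here: it keeps one copy of every line, whereas sort | uniq -u removes all copies of any repeated line.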

Richelieu answered 19/5, 2014 at 14:42 Comment(5)
@jordan Don't know. Somebody didn't like it, perhaps. – Richelieu
@anubhava Did you try it? – Richelieu
Apologies, I missed -u in copy/paste. – Emmieemmit
I like a simple answer. +1 for that simplicity. – Treed
Just a note: if someone is here trying to get unique lines across many columns, please refer to this question: #30895896 – Buckshot

You almost had the answer in your question:

sort filename | uniq -u

Jittery answered 19/5, 2014 at 14:42 Comment(0)
