I have a file with an identifier and a value:
ABC123 111111
ABC123 111111
ABCDEF 333333
ABCDEF 111111
CCCCCC 333333
ABC123 222222
DEF123 444444
DEF123 444444
Both columns contain duplicate values, but I need to count, for each ID (first column), how many distinct values (second column) it appears with. For the input above, that would make the output:
ABCDEF 2
ABC123 2
DEF123 1
CCCCCC 1
...where the first column is the ID and the second column is the number of distinct values that appear with that ID. In other words, I need to find out how many unique values exist for a given ID.
The closest I've come is this, but all it does is count how often each duplicated ID appears in the first column:
cut -d " " -f1 "file.txt" | uniq -cd | sort -nr | head
How would I do something like this in Bash?
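One possible approach (a sketch, not the only way): de-duplicate whole lines first so each (ID, value) pair appears once, then count how many lines remain per ID. The sample data from the question is written to `file.txt` here to keep the demo self-contained:

```shell
# Sample input from the question, written to file.txt for a self-contained demo
cat > file.txt <<'EOF'
ABC123 111111
ABC123 111111
ABCDEF 333333
ABCDEF 111111
CCCCCC 333333
ABC123 222222
DEF123 444444
DEF123 444444
EOF

# sort -u  -> one line per distinct (ID, value) pair
# cut      -> keep only the ID column
# uniq -c  -> count how many distinct values each ID had
# sort/awk -> order by count descending and print "ID count"
sort -u file.txt | cut -d ' ' -f1 | uniq -c | sort -k1,1nr | awk '{print $2, $1}'
# ABC123 2
# ABCDEF 2
# CCCCCC 1
# DEF123 1   (order of equal counts may vary by sort implementation)
```

The key step is `sort -u` on the whole line: it collapses repeated (ID, value) pairs before counting, which is exactly the "unique value per ID" requirement.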
sort PHA-DC.txt | cut -d " " -f1 | uniq … – Aaberg

DEF123 does not have 2 distinct values; it occurs with 444444 twice. – Fragmental

DEF123 2 not in output now? – Confluence
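Following up on the comment thread: a single awk pass confirms DEF123 counts as 1, since its two lines carry the same value. A sketch with the question's sample data fed inline (piped through `sort` only to make the output order deterministic, since awk's `for (id in count)` order is unspecified):

```shell
# seen[]  de-duplicates (ID, value) pairs
# count[] tallies how many distinct values each ID had
awk '!seen[$1 FS $2]++ { count[$1]++ }
     END { for (id in count) print id, count[id] }' <<'EOF' | sort
ABC123 111111
ABC123 111111
ABCDEF 333333
ABCDEF 111111
CCCCCC 333333
ABC123 222222
DEF123 444444
DEF123 444444
EOF
# ABC123 2
# ABCDEF 2
# CCCCCC 1
# DEF123 1
```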