Unix uniq command to CSV file

I have a text file (list.txt) containing single and multi-word English phrases. My goal is to do a word count for each word and write the results to a CSV file.

I have figured out a command that counts the occurrences of each unique word and sorts the counts from largest to smallest. That command is:

$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | less > output.txt

The problem is the formatting of the new file (output.txt). Each line has 3 leading spaces, followed by the number of occurrences, followed by a space, followed by the word. Example:

   9784 the
   6368 and
   4211 for
   2929 to

What would I need to do to get the results in a more useful format, such as CSV? For example, I'd like it to be:

9784,the
6368,and
4211,for
2929,to

Even better would be:

the,9784
and,6368
for,4211
to,2929

Is there a way to do this with a Unix command, or do I need to do some post-processing within a text editor or Excel?

Monogamy answered 11/3, 2013 at 18:42 Comment(0)
H
8

Use awk as follows:

 > cat input 
   9784 the
   6368 and
   4211 for
   2929 to
 > cat input | awk '{ print $2 "," $1}'
the,9784
and,6368
for,4211
to,2929

Your full pipeline will be:

$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2 "," $1}' > output.txt
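
For a quick check without the real list.txt, the same pipeline can be wrapped in a small function and fed a sample line. This is only a sketch: the function name word_counts_csv is hypothetical, and the `NF == 2` guard and the `LC_ALL=C` settings are assumptions added here, not part of the original command. The guard skips the empty "word" that `tr -sc` produces when the input starts with a non-letter, and the second sort key only makes the order of ties predictable.

```shell
# Hypothetical helper: read text on stdin, write "word,count" lines on stdout.
word_counts_csv() {
    tr 'A-Z' 'a-z' |
        tr -sc 'a-z' '\n' |                   # one word per line
        LC_ALL=C sort |
        uniq -c |                             # "   <count> <word>"
        LC_ALL=C sort -k1,1nr -k2,2 |         # highest count first, ties by word
        awk 'NF == 2 { print $2 "," $1 }'     # swap columns, skip empty words
}

# Sample input standing in for list.txt
printf 'The cat and the dog\nthe end\n' | word_counts_csv
# the,3
# and,1
# cat,1
# dog,1
# end,1
```

awk splits on runs of whitespace by default, so the leading spaces from `uniq -c` never reach the output. With the real file this would be run as `word_counts_csv < list.txt > output.txt`.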
Hardback answered 11/3, 2013 at 18:52 Comment(1)
That's very similar to what I came up with: $ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | awk '{ printf "%s,%s\n", $2, $1}' | less > output.txt - Monogamy
F
0

Use sed to strip the leading spaces and replace the space after the count with a comma:

sort -i extra_set.txt | uniq -c | sort -nr | sed 's/^ *//; s/ /,/'
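
If the word should come first, as in the question's preferred format, sed can also swap the two columns with capture groups. The sketch below assumes the usual `uniq -c` layout (optional leading spaces, the count, one space, then the rest of the line); the function name uniq_counts_to_csv is hypothetical.

```shell
# Hypothetical helper: turn `uniq -c` output on stdin into "text,count" lines.
uniq_counts_to_csv() {
    # \1 = the count, \2 = everything after the single space that follows it
    sed 's/^ *\([0-9][0-9]*\) \(.*\)$/\2,\1/'
}

printf '   9784 the\n   6368 and\n' | uniq_counts_to_csv
# the,9784
# and,6368
```

Unlike awk's `$2`, the second capture group keeps the whole remainder of the line, so multi-word entries are not cut off at the first space. An entry that itself contains a comma would still need quoting to be valid CSV.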
Forever answered 9/3, 2022 at 15:36 Comment(0)
