Awk: Characters-frequency from one text file?

Given a multilingual .txt file such as:

But where is Esope the holly Bastard
But where is 생 지 옥 이 군
지 옥 이
지 옥
지
我 是 你 的 爸 爸 !
爸 爸 ! ! !
你 不 會 的 !

I counted the frequency of space-separated words with this Awk one-liner:

$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" myfile.txt | sort

This gives the tidy result:

1 생
1 군
1 Bastard
1 Esope
1 holly
1 the
1 不
1 我
1 是
1 會
2 이
2 But
2 is
2 where
2 你
2 的
3 옥
4 지
4 爸
5 !

How can I change it to count character frequency instead?


EDIT: For character frequency, I used @Sudo_O's answer:

$ grep -o '\S' myfile.txt | awk '{a[$1]++}END{for(k in a)print a[k],k}' | sort > myoutput.txt

For word frequency, use:

$ grep -o '\w*' myfile.txt | awk '{a[$1]++}END{for(k in a)print a[k],k}' | sort > myoutput.txt
Betide asked 24/3, 2013 at 17:57

One method:

$ grep -o '\S' file | awk '{a[$1]++}END{for(k in a)print a[k],k}' 
3 옥
4 h
2 u
2 i
3 B
5 !
2 w
4 爸
1 군
4 지
1 y
2 l
1 E
1 會
2 你
1 是
2 a
1 不
2 이
2 o
1 p
2 的
1 d
1 생
3 r
6 e
4 s
1 我
4 t

Use redirection to save the output to a file:

$ grep -o '\S' file | awk '{a[$1]++}END{for(k in a)print a[k],k}' > output

And for sorted output:

$ grep -o '\S' file | awk '{a[$1]++}END{for(k in a)print a[k],k}' | sort > output
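For readers who would rather stay inside awk, the same tally can be sketched without grep by walking each line one character at a time. This is only an illustration: `char_freq` and `sample.txt` are hypothetical names, and it assumes an awk whose `substr` works on characters rather than bytes (for example GNU awk in a UTF-8 locale). A byte-oriented awk would split the Korean and Chinese characters into separate bytes.

```shell
# Hypothetical helper: print "count character" for every non-blank character.
# Assumption: awk's substr() is character-aware (e.g. gawk in a UTF-8 locale).
char_freq() {
  awk '
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        # Skip spaces and tabs so only visible characters are counted.
        if (c != " " && c != "\t") count[c]++
      }
    }
    END { for (c in count) print count[c], c }
  ' "$@"
}

# Small ASCII sample so the result is easy to check by eye.
printf 'ab a\nb a\n' > sample.txt

# sort -rn orders by count numerically, highest first.
char_freq sample.txt | sort -rn
```

Piping through `sort -rn` rather than plain `sort` keeps a count such as 10 from sorting ahead of 2, which matters once the input grows.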
Console answered 24/3, 2013 at 18:03
Comments (3):
Thanks! Glad YOU answered! (Betide)
Funny, both $ grep -o . file and $ grep -o '\S' file work. Are they both correct? (Betide)
@Betide Good spot. No, they are not both correct. I originally posted grep -o . but that would also match whitespace, so I changed it to grep -o '\S', where \S is the regexp shorthand that matches any non-whitespace character. (Console)
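The difference between the two patterns can be seen on a tiny input. The sketch below uses a hypothetical `sample.txt`. Note that `\S` is a GNU grep extension, so a POSIX bracket expression is shown as a more portable alternative.

```shell
# One line with a single space between two letters.
printf 'a b\n' > sample.txt

# '.' matches every character, including the space: three matches.
grep -o . sample.txt | wc -l

# '\S' (GNU extension) matches only non-whitespace: two matches.
grep -o '\S' sample.txt | wc -l

# Portable spelling of the same idea using a POSIX character class.
grep -o '[^[:space:]]' sample.txt | wc -l
```

With `grep -o .`, each space arrives in awk as a line whose `$1` is empty, so the spaces end up tallied under an empty key. It is easy to miss in the output, which is probably why both commands appeared to work.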
