wc -c appears to only do a dumb byte count, not interpret actual characters with regard for encoding.
How can I get the actual character count?
Use the -m or --chars option.
For example (the file text contains two Korean characters and a newline):
falsetru@jmlee12:~$ cat text
안녕
falsetru@jmlee12:~$ wc -c text
7 text
falsetru@jmlee12:~$ wc -m text
3 text
According to wc(1):
-c, --bytes print the byte counts
-m, --chars print the character counts
Don't confuse characters, the char type, and bytes. A byte is 8 bits long, and -c counts the bytes in your file, whatever they contain. A char in many programming languages is also 8 bits long, which is why the byte-counting option is named -c. If you want to count how many characters of a given alphabet a file contains, you have to specify in some way which encoding was used, and some encodings use more than one byte per character. The manual for wc says that -m uses your current locale (roughly, your language/charset preferences) to decode the file and count its characters.
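The difference between the two counts can be sketched in a few lines of Python. This is only an illustration of the idea, not how wc is implemented: the helper names count_bytes and count_chars are made up for this example, and the encoding is passed explicitly here, whereas wc -m takes it from the locale.

```python
# Illustrative sketch only: count_bytes and count_chars are hypothetical helpers,
# not part of wc or of any library.

def count_bytes(data: bytes) -> int:
    """Number of bytes, comparable to what wc -c reports."""
    return len(data)

def count_chars(data: bytes, encoding: str) -> int:
    """Number of characters after decoding with an explicitly chosen encoding.

    wc -m picks the encoding from the current locale; here it is a parameter.
    """
    return len(data.decode(encoding))

# The two Korean characters from the example above plus a newline, as UTF-8.
data = "\uc548\ub155\n".encode("utf-8")

print(count_bytes(data))             # 7: three bytes per Hangul syllable, one for the newline
print(count_chars(data, "utf-8"))    # 3: two syllables and the newline
print(count_chars(data, "latin-1"))  # 7: a single-byte encoding sees one character per byte
```

The last call shows why the encoding matters: the same seven bytes are three characters or seven, depending on how they are decoded.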