Length of string in bash
B

11

607

How do you get the length of a string stored in a variable and assign that to another variable?

myvar="some string"
echo ${#myvar}  
# 11

How do you set another variable to the output 11?

Benitobenjamen answered 28/6, 2013 at 15:14 Comment(0)
E
376

UTF-8 string length

By using wc

With wc, you can count either bytes or characters (from man wc):

   -c, --bytes
          print the byte counts

   -m, --chars
          print the character counts

So you could run:

echo -n Généralité | wc -c
 13
echo -n Généralité | wc -m
 10
echo -n Généralité | wc -cm
 10      13
for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    strlens=$(echo -n "$string"|wc -mc)
    chrs=$((${strlens% *}))
    byts=$((${strlens#*$chrs }))
    printf " - %-*s is %2d chars length, but uses %2d bytes\n" \
        $(( 14 + $byts - $chrs )) "$string" $chrs $byts
done
 - Généralités    is 11 chars length, but uses 14 bytes
 - Language       is  8 chars length, but uses  8 bytes
 - Théorème       is  8 chars length, but uses 10 bytes
 - Février        is  7 chars length, but uses  8 bytes
 - Left: ←        is  7 chars length, but uses  9 bytes
 - Yin Yang ☯     is 10 chars length, but uses 12 bytes

See the section Useful printf correction tool, further down, for an explanation of this syntax.

Under bash, you could split wc's output directly:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    read -r chrs byts < <(wc -mc <<<"$string")
    printf " - %-$((14+$byts-chrs))s is %2d chars length, but uses %2d bytes\n" \
        "$string" $((chrs-1)) $((byts-1))
done
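The here-string in the loop above appends a newline, which is why both counts are reduced by one. A minimal sketch of a variant that avoids the correction by feeding wc through printf '%s' follows. The helper name wc_lengths is hypothetical, and it assumes a wc that prints the character count before the byte count when given -mc, as GNU wc does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wc_lengths STRING
# Sets the global variables chrs and byts (same names as in the loop above).
wc_lengths() {
    # printf '%s' sends the string as is, without the newline a here-string adds.
    # Assumption: wc -mc prints "<chars> <bytes>", as GNU wc does.
    read -r chrs byts < <(printf '%s' "$1" | wc -mc)
}

wc_lengths "some string"
echo "$chrs chars, $byts bytes"
```

This still forks once per string, so it only removes the off-by-one adjustment, not the cost.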

But forking to wc for each string can consume a lot of system resources, so I prefer the pure bash way. See the comparison at the bottom of this answer to know why.

By using pure bash

The first idea I had was to change the locale environment to force bash to treat each byte as a character:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen

will render:

Généralités is 11 char len, but 14 bytes len.

You could even have a look at the stored bytes:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"

will answer:

Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').

Note: Following Isabell Cowan's comment, I now set $LC_ALL along with $LANG.

So the function could be:

strU8DiffLen() {
    local chLen=${#1} LANG=C LC_ALL=C
    return $((${#1}-chLen))
}
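The same locale trick can also give the byte length itself rather than the difference. This is only a sketch: the name byte_len is hypothetical, and it assumes a bash version that applies a local LC_ALL assignment immediately and restores the previous locale when the function returns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the byte length of its first argument.
byte_len() {
    # "local" confines the locale change to this function call.
    # In the C locale bash counts one character per byte.
    local LC_ALL=C LANG=C
    printf '%d\n' "${#1}"
}

myvar="some string"
size=$(byte_len "$myvar")   # note: command substitution forks a subshell
echo "$size"
```

Printing the result avoids the 0 to 255 limit of an exit status, at the price of one subshell per call.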

But surprisingly, this is not the quickest way:

Same, but without having to play with locales

I recently learned about the %n format of the printf command (builtin):

myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.
  • printf -v _ tells printf to store the result in the variable _ instead of writing it to STDOUT.
  • _ is a throwaway variable in this use.
  • %n tells printf to store the number of bytes output so far into the variable named by the corresponding argument.

The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function shown further down is about 2 times quicker using printf than the previous version using local LANG=C.)
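The %n technique described above could be wrapped in a helper that fills two caller-chosen variables without forking. This is a sketch only: the name str_lengths and its argument order are assumptions, and the internal variable names are prefixed to make a clash with the caller's names unlikely.

```bash
#!/usr/bin/env bash
# Hypothetical helper: str_lengths STRING CHARVAR BYTEVAR
str_lengths() {
    local _sl_sink _sl_bytes
    # %s formats the string into a throwaway variable; %n then stores the
    # number of bytes produced so far into _sl_bytes.
    printf -v _sl_sink '%s%n' "$1" _sl_bytes
    printf -v "$2" '%d' "${#1}"        # characters, according to the current locale
    printf -v "$3" '%d' "$_sl_bytes"   # bytes
}

str_lengths "some string" chrlen bytlen
echo "$chrlen chars, $bytlen bytes"
```

Because everything is a builtin, no subshell or external command is started.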

Length of an argument, working sample

Arguments work the same as regular variables:

showStrLen() {
    local -i chrlen=${#1} bytlen
    printf -v _ %s%n "$1" bytlen
    printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}

will work as

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'

Useful printf correction tool:

If you run:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    printf " - %-14s is %2d char length\n" "'$string'"  ${#string}
done
 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ←'    is  7 char length
 - 'Yin Yang ☯' is 10 char length

Not really pretty output!

For this, here is a little function:

strU8DiffLen() {
    local -i bytlen
    printf -v _ %s%n "$1" bytlen
    return $(( bytlen - ${#1} ))
}

or written in one line:

strU8DiffLen() { local -i _bl;printf -v _ %s%n "$1" _bl;return $((_bl-${#1}));}

Then now:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    strU8DiffLen "$string"
    printf " - %-*s is %2d chars length, but uses %2d bytes\n" \
        $((14+$?)) "'$string'" ${#string} $((${#string}+$?))
done
 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ←'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ☯'   is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some strange UTF-8 behaviour remains, such as double-width characters, zero-width characters, right-to-left text and other cases that are not as simple to handle...

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.

Comparison: fork to wc vs pure bash:

Making a little loop of 1,000 string length inquiries:

string="Généralité"
time for i in {1..1000};do strlens=$(echo -n "$string"|wc -mc);done;echo $strlens
real    0m2.637s
user    0m2.256s
sys 0m0.906s
10 13
string="Généralité"
time for i in {1..1000};do printf -v _ %s%n "$string" bytlen;chrlen=${#string};done;echo $chrlen $bytlen
real    0m0.005s
user    0m0.005s
sys 0m0.000s
10 13

As expected, the result (10 13) is the same, but the execution times differ a lot: pure bash is something like 500x quicker!

Eda answered 23/6, 2015 at 17:50 Comment(20)
I appreciate this answer, as file systems impose name limitations in bytes and not characters.Motteo
You may also need to set LC_ALL=C and perhaps others.Surmullet
@IsabellCowan In which case? I think not! You may prefer to use LC_ALL, but if it is not set, this is not needed. And no other variable has to be used.Eda
@F.Hauri try this code: /usr/bin/env -i LC_ALL=en_US.utf8 LANG=C bash -c 'v=€; echo ${#v}' LC_ALL might be unset by default on your system, but it is not on mine.Surmullet
@IsabellCowan Yes, see man 7 locale: LC_ALL has precedence over all others. That's the reason I follow Debian rules, having LC_ALL= somewhere and changing only LANG by default (it can be very useful to be able to change just LC_CTIME or LC_NUMERIC).Eda
@F.Hauri But it nonetheless follows that on some systems your solution will not work, because it leaves LC_ALL alone. It might work fine on default installs of Debian and its derivatives, but on others (like Arch Linux) it will fail to give the correct byte length of the string.Surmullet
It didn't work for me and I couldn't find out why. I succeeded using iconv like this: STR=$(printf "$1" | iconv -f UTF-8 -t ISO-8859-15), and then ${#STR} worked well.Gross
@F.Hauri GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) I don't have the admin rights on the server, i tried the examples you gave and i always got the byte length. I'm trying this from a .sh file encoding in UTF-8..Gross
thanks for taking something simple and convoluting it :)Genipap
Just to note that UTF8 is a variable width encoding from 1 to 6 bytes cf. other encodings i.e. UTF16 which is a fixed width 2 byte per character.Periphrastic
A UTF8 encoded Oracle DB instance allows nvarchar2(4000) data types (4000 bytes, each character stored on 1 to 6 bytes) whereas a UTF16 encoded instance only allows for nvarchar2(2000) data types (4000 bytes, 2 bytes per character). Ex. UTF8 string truncation depends on the number of bytes required to store the data which is not necessarily (and most often not the case when dealing with internationalised software) equal to the number of characters.Periphrastic
@Periphrastic Yes, ☯ and ← will require 3 bytes, whereas é and ô require 2 bytes and a or z only 1 byte...Eda
@Genipap I'm sorry, 對不起 Sometime simple is just an idea.Eda
@F.Hauri correct for UTF8. Encoded in UTF16 each character ("☯", "←", "é", "ô", "a" and "z") is encoded with a fixed 2 bytes. If assuming that all text is ASCII then any mention of UTF8 is "good to know" but not necessary for say as it's 8-bit ASCII and the code points are identical in UTF8. Having taken the time to delve into encodings then it's worth while imho to note that the byte count is encoding dependent and there exists a plethora of different encodings.Periphrastic
@Genipap previous chinese post warn me about another problem. see posted test script about limitation (bug) of this: diffU8test.sh.txt or diffU8test.shEda
You can't necessarily guarantee that the default locale is UTF-8. To make sure you get character length rather than byte length, you may want to set LC_ALL=C.UTF-8 and LANG=C.UTF-8.Bower
@nyuszika7h You're right. Anyway, in most cases my strU8DiffLen will return the correct difference. If the current session uses a Latin encoding, strU8DiffLen will always return 0, which will be correct too.Eda
It's worth to mention that the function strU8DiffLen will fail if $(( bytlen - ${#1} )) is greater than 255. Why not just printf the result and call the function inside a sub-shell? Related: gnu.org/software/bash/manual/html_node/Exit-Status.htmlDeclare
@F8ER In order to prevent forks. For example: replacing return by echo, adding OFF=$(strU8DiffLen....) and replacing ? by OFF in the last sample takes 10ms on my host, whereas the published proposition does the job in 1ms. (10x faster!)Eda
@F8ER If you mind using return, you could replace it with printf -v ${2:-OFF} %d $(( bytlen - ${#1} )), then use $OFF or any other variable by passing its name as the second argument.Eda
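The printf -v suggestion from the comments above can be sketched as a complete function. An exit status only holds values from 0 to 255, so storing the difference in a variable removes that limit while still avoiding a fork. The name u8_diff and the prefixed local names are hypothetical; the default variable name OFF comes from the comment.

```bash
#!/usr/bin/env bash
# Hypothetical variant of strU8DiffLen: u8_diff STRING [VARNAME]
u8_diff() {
    local -i _u8_bytes
    local _u8_sink
    # %n stores the number of bytes that %s produced.
    printf -v _u8_sink '%s%n' "$1" _u8_bytes
    # Store bytes minus characters in the named variable (default: OFF)
    # instead of the exit status, which cannot exceed 255.
    printf -v "${2:-OFF}" '%d' "$(( _u8_bytes - ${#1} ))"
}

u8_diff "some string" diff
echo "$diff"
```

The stored value depends on the current locale in the same way as ${#1} does.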
M
659

To get the length of a string stored in a variable, say:

myvar="some string"
size=${#myvar} 

To confirm it was properly saved, echo it:

$ echo "$size"
11
Mindamindanao answered 28/6, 2013 at 15:15 Comment(4)
With UTF-8 strings, you have both a string length and a byte length. See my answer.Eda
You can also use it directly in other parameter expansions - for example in this test I check that $rulename starts with the $RULE_PREFIX prefix: [ "${rulename:0:${#RULE_PREFIX}}" == "$RULE_PREFIX" ]Evert
Could you please explain a bit the expressions of #myvar and {#myvar}?Valenta
@lerneradams see Bash reference manual →3.5.3 Shell Parameter Expansion on ${#parameter}: The length in characters of the expanded value of parameter is substituted.Mindamindanao
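The prefix test mentioned in the comments above can be wrapped in a small function. This is a sketch; the name starts_with is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if STRING starts with PREFIX.
starts_with() {
    local string=$1 prefix=$2
    # ${#prefix} is the prefix length; ${string:0:N} takes the first N characters.
    [ "${string:0:${#prefix}}" = "$prefix" ]
}

if starts_with "rule-42" "rule-"; then
    echo "prefix matches"
fi
```

An empty prefix always matches, since the first zero characters of any string are the empty string.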
M
46

I wanted the simplest case; in the end this is the result:

echo -n 'Tell me the length of this sentence.' | wc -m;
36
Mellen answered 11/10, 2017 at 8:52 Comment(2)
sorry mate :( This is bash... the cursed hammer that sees everything as a nail, particularly your thumb. 'Tell me the length of this sentence.' contains 36 characters. echo '' | wc -m => 1. You'd need to use -n: echo -n '' | wc -m => 0... in which case it's a good solution :)Benitobenjamen
Thanks for the correction! Manual page says: -n do not output the trailing newlineMellen
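The trailing-newline pitfall discussed in these comments can be avoided with printf '%s', which never appends a newline. A minimal sketch, with a hypothetical helper name wc_chars:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count characters with wc, without a trailing newline.
wc_chars() {
    local n
    # printf '%s' writes the string exactly as given; plain echo would add "\n".
    n=$(printf '%s' "$1" | wc -m)
    # Some wc implementations pad the number with spaces; arithmetic strips them.
    printf '%d\n' "$(( n ))"
}

wc_chars 'Tell me the length of this sentence.'
wc_chars ''
```

Unlike echo -n, printf '%s' also behaves the same when the string itself begins with a dash or contains backslashes.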
S
27

You can use:

MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)
  • wc -c or wc --bytes gives the byte count: a non-ASCII Unicode character counts as 2, 3 or more bytes.
  • wc -m or wc --chars gives the character count: each Unicode character counts as one, however many bytes it uses.
Selfliquidating answered 9/5, 2015 at 3:27 Comment(4)
-c is for bytes. -m is for chars. gnu.org/software/coreutils/manual/html_node/wc-invocation.html pubs.opengroup.org/onlinepubs/009604499/utilities/wc.htmlConstant
Seriously? a pipe, a subshell and an external command for something that trivial?Bobsled
this handles something like mylen=$(printf "%s" "$HOME/.ssh" | wc -c) whereas the accepted solution fails and you need to myvar=$HOME/.ssh first.Movie
This isn’t any better than ${#var}. You still need LC_ALL / LANG set to an UTF-8 locale, otherwise -m will return byte count.Bower
S
23

In response to the post starting:

If you want to use this with command line or function arguments...

with the code:

size=${#1}

There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:

if [ -z "$1" ]; then
    : # zero-length argument (":" is a no-op, so the branch is not empty)
else
    : # non-zero length
fi

See GNU and wooledge for a more complete list of Bash conditional expressions.
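Note that -z cannot tell an empty argument from a missing one. A sketch that separates the two cases by checking the argument count first is shown here; the function name arg_state is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the first argument is missing, empty or set.
arg_state() {
    if [ "$#" -eq 0 ]; then
        echo "missing"            # no argument was passed at all
    elif [ -z "$1" ]; then
        echo "empty"              # an argument was passed, but it has zero length
    else
        echo "length ${#1}"
    fi
}

arg_state
arg_state ""
arg_state "abc"
```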

Smashandgrab answered 3/6, 2017 at 9:47 Comment(0)
C
19

If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.

Cypher answered 5/6, 2014 at 20:11 Comment(4)
Part of the problem with "you can't do <invalid syntax>" is that, that syntax being invalid, it's unclear what a reader should interpret it to mean. size=${#1} is certainly valid.Skater
Well, that's unexpected. I didn't know that #1 was a substitute for $1 in this case.Cypher
It isn't. # isn't replacing the $ -- the $ outside the braces is still the expansion operator. The # is the length operator, as always.Skater
I've fixed this answer since it is a useful tip but not an exception to the rule - it follows the rule exactly, as pointed out by @CharlesDuffyBloodline
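As the comments explain, # inside the braces is the length operator. A short sketch contrasting it with the related count forms may help; the function name show_lengths is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical demo of the length operator versus element counts.
show_lengths() {
    local -a words=("$@")
    echo "first argument length: ${#1}"          # characters in $1
    echo "argument count: $#"                    # number of positional parameters
    echo "array element count: ${#words[@]}"     # number of array elements
    echo "first element length: ${#words[0]}"    # characters in words[0]
}

show_lengths alpha be
```

With the arguments alpha and be, the first and last lines report 5 and the two middle lines report 2.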
G
18

Using your example provided

#KISS (Keep it simple stupid)
size=${#myvar}
echo $size
Genipap answered 6/11, 2018 at 16:46 Comment(1)
@Angel The question was about setting a variable to the output of the length command, and this question answers that.Celaeno
F
15

Here are a couple of ways to calculate the length of a variable (quote the variable so that spaces and special characters are preserved):

echo ${#VAR}
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'

To store the result in another variable, wrap one of the commands above in backquotes and assign it, as follows:

otherVar=`echo -n "$VAR" | wc -m`
echo $otherVar

http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html

Fredricfredrick answered 5/10, 2017 at 18:20 Comment(0)
S
2

I know this Q&A is old, but today I faced this task for the first time. I usually use ${#var}, but it fails with Unicode: most of the text I process with bash is in Cyrillic... Based on @atesin's answer, I wrote a short function (which could be shortened further) that may be useful for scripting. The task that led me to this question was to show a message of variable length in a pseudo-graphics box. So, here it is:

$ cat draw_border.sh
#!/bin/sh
#based on https://mcmap.net/q/40690/-length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER
echo $OUTLINE
echo $OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

And what this sample produces:

$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

The first example (in French?) was taken from someone's example above. The second combines Cyrillic with the value of a variable. The third is self-explanatory: only characters from the first half of the 8-bit range, i.e. plain ASCII.

I used echo $BPAR|wc -m instead of printf ... so as not to depend on whether printf is built in.

Above there is discussion of the trailing newline and echo's -n option. I did not use -n, so I add only 1 to $BPLEN; had I used -n, I would have to add 2.

To show the difference between wc -m and wc -c, here is the output of the same script with one minor change: -m replaced by -c.

$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

Accented Latin characters and most Cyrillic characters are two bytes long, so the horizontal lines are drawn longer than the real length of the message. I hope this saves someone some time :-)

p.s. Russian text says "here is one more"

p.p.s. Working "two-liner"

#!/bin/sh
#based on https://mcmap.net/q/40690/-length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

To avoid cluttering the code with repeated drawing of OUTBORDER, I build OUTBORDER in a separate command.

Skied answered 27/8, 2021 at 4:1 Comment(3)
Shorter and a lot quicker: border() { local sline;printf -v sline '%*s' ${#1} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$1" ${sline// /-};} - Have a look at bottom of my answer : upto 500x faster!'... Or even try this: border() { local line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} then border 'généralité' 'А вот еще одна /usr/bin/lesspipe' 'pure ENGLISH'Eda
About ${#var}, take care of $LC_ALL and your locales configuration: compare: border() { local LC_ALL=C.UTF8 line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} with border() { local LC_ALL=C line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} ... (depending on your configuration, you may have to replace C.UTF8 by ru_RU.UTF-8, zh_CN.UTF-8 or else... Try: grep '^[^#]*UTF' /etc/locale.gen)Eda
... And, still abount ${#var}, the only shell that won't consider U8 characters correctly is dash, U could test: for shell in dash bash busybox\ sh zsh ;do echo -n $shell : \ ;$shell -c 'echo ${#1}' -- 'Généralité';done.Eda
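The one-liner from the first comment above can be laid out more readably. This sketch adds one column of padding on each side of the text, which is a choice made here and not part of the comment. It relies on ${#line} counting characters, so it assumes a locale that matches the text encoding.

```bash
#!/usr/bin/env bash
# Sketch of a fork-free border function, adapted from the comment above.
border() {
    local line pad
    for line; do
        # Build a run of spaces as wide as the text plus two padding columns...
        printf -v pad '%*s' "$(( ${#line} + 2 ))" ''
        # ...then turn every space into a dash for the horizontal rules.
        printf '+%s+\n| %s |\n+%s+\n' "${pad// /-}" "$line" "${pad// /-}"
    done
}

border "pure ENGLISH"
```

Every step is a bash builtin, so no external command is started for any line.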
P
2

Maybe just use wc -c to count the number of bytes (which equals the number of characters for plain ASCII text):

myvar="Hello, I am a string."
echo -n "$myvar" | wc -c

Result:

21
Predestination answered 29/3, 2022 at 11:1 Comment(0)
C
0

Length of string in bash

str="Welcome to Stackoveflow"  
length=`expr length "$str"`  
  
echo "Length of '$str' is $length"

OUTPUT

Length of 'Welcome to Stackoveflow' is 23

Calculable answered 20/8, 2022 at 7:46 Comment(0)
