How do you get the length of a string stored in a variable and assign that to another variable?
myvar="some string"
echo ${#myvar}
# 11
How do you set another variable to the output 11?
By using wc, you could (from man wc):
-c, --bytes print the byte counts
-m, --chars print the character counts
So, under a POSIX shell, you could:
echo -n Généralité | wc -c
13
echo -n Généralité | wc -m
10
echo -n Généralité | wc -cm
10 13
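A minimal sketch of the same two counts wrapped in functions, using printf '%s' rather than echo -n, since echo's handling of -n varies between shells. The names count_bytes and count_chars are hypothetical (not part of the answer), and the character count only differs from the byte count when the current locale is a multibyte one such as UTF-8.

```bash
# Hypothetical helpers, not from the original answer.

# Bytes used by the string, whatever the locale.
count_bytes() {
    # $(( ... )) strips the padding some wc implementations add
    echo "$(( $(printf '%s' "$1" | wc -c) ))"
}

# Characters in the string, according to the current locale.
count_chars() {
    echo "$(( $(printf '%s' "$1" | wc -m) ))"
}

count_bytes 'Language'   # prints 8
count_chars 'Language'   # prints 8
```

For a pure ASCII string such as 'Language' both functions print the same number; the difference only shows up with multibyte characters.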
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
strlens=$(echo -n "$string"|wc -mc)
chrs=$((${strlens% *}))
byts=$((${strlens#*$chrs }))
printf " - %-*s is %2d chars length, but uses %2d bytes\n" \
$(( 14 + $byts - $chrs )) "$string" $chrs $byts
done
- Généralités is 11 chars length, but uses 14 bytes
- Language is 8 chars length, but uses 8 bytes
- Théorème is 8 chars length, but uses 10 bytes
- Février is 7 chars length, but uses 8 bytes
- Left: ← is 7 chars length, but uses 9 bytes
- Yin Yang ☯ is 10 chars length, but uses 12 bytes
See further, at Useful printf correction tool, for explanation about this syntax.
Using wc's output directly (bash):
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
read -r chrs byts < <(wc -mc <<<"$string")
printf " - %-$((14+$byts-chrs))s is %2d chars length, but uses %2d bytes\n" \
"$string" $((chrs-1)) $((byts-1))
done
But having to fork to wc for each string could consume a lot of system resources, so I prefer the pure bash way! Have a look at the bottom of this answer to see why!
The first idea I had was to change the locale environment to force bash to treat each character as a byte:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
will render:
Généralités is 11 char len, but 14 bytes len.
you could even have a look at stored chars:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"
will answer:
Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').
Nota: According to Isabell Cowan's comment, I've added setting $LC_ALL along with $LANG.
So function could be:
strU8DiffLen() {
local chLen=${#1} LANG=C LC_ALL=C
return $((${#1}-chLen))
}
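The same idea can also print the byte length itself rather than returning a difference. This is only a sketch: byte_len is a hypothetical name, and it assumes that declaring LC_ALL with local confines the locale change to the function, as the function above does.

```bash
# Hypothetical helper: prints the byte length of its first argument.
byte_len() {
    # 'local' keeps the locale change inside this function
    local LC_ALL=C
    # in the C locale every byte counts as one character
    echo "${#1}"
}

byte_len 'Language'   # prints 8
```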
But surprisingly, this is not the quickest way:
I recently learned the %n format of the printf command (builtin):
myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.
printf -v _ tells printf to store the result in the variable _ instead of outputting it on STDOUT.
_ is a garbage variable in this use.
%n tells printf to store the byte count of the already processed string in the variable named at the corresponding place in the arguments.
The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function further down is about 2 times quicker using printf than the previous version using local LANG=C.)
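As a sketch of how the pieces fit together, the %n trick can be wrapped so that the caller chooses the variable that receives the count. byte_len_to is a hypothetical name, and this needs bash, since printf -v is a bash feature.

```bash
# Hypothetical helper: store the byte length of $1 in the variable named by $2.
byte_len_to() {
    # -v _ throws the formatted text away; %n writes the count into "$2"
    printf -v _ '%s%n' "$1" "$2"
}

byte_len_to 'Language' n
echo "$n"   # prints 8
```

No subshell or pipe is involved, so nothing is forked.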
Arguments work the same as regular variables:
showStrLen() {
local -i chrlen=${#1} bytlen
printf -v _ %s%n "$1" bytlen
printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}
will work as
showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'
Useful printf correction tool:
If you:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
printf " - %-14s is %2d char length\n" "'$string'" ${#string}
done
- 'Généralités' is 11 char length
- 'Language' is 8 char length
- 'Théorème' is 8 char length
- 'Février' is 7 char length
- 'Left: ←' is 7 char length
- 'Yin Yang ☯' is 10 char length
Not really pretty output!
For this, here is a little function:
strU8DiffLen() {
local -i bytlen
printf -v _ %s%n "$1" bytlen
return $(( bytlen - ${#1} ))
}
or written in one line:
strU8DiffLen() { local -i _bl;printf -v _ %s%n "$1" _bl;return $((_bl-${#1}));}
Then now:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
strU8DiffLen "$string"
printf " - %-*s is %2d chars length, but uses %2d bytes\n" \
$((14+$?)) "'$string'" ${#string} $((${#string}+$?))
done
- 'Généralités' is 11 chars length, but uses 14 bytes
- 'Language' is 8 chars length, but uses 8 bytes
- 'Théorème' is 8 chars length, but uses 10 bytes
- 'Février' is 7 chars length, but uses 8 bytes
- 'Left: ←' is 7 chars length, but uses 9 bytes
- 'Yin Yang ☯' is 10 chars length, but uses 12 bytes
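Because an exit status is limited to the range 0..255, passing the difference through return breaks down for strings with more than 255 extra bytes. A sketch of the alternative suggested in the comments, storing the difference in a variable instead: the function name strU8DiffLenVar is hypothetical, and OFF is the default output variable used in that comment.

```bash
# Hypothetical variant of strU8DiffLen: stores the result in a variable.
strU8DiffLenVar() {
    local -i _bl
    # %n puts the byte count of "$1" into _bl
    printf -v _ '%s%n' "$1" _bl
    # store (bytes - chars) in the variable named by $2, OFF by default
    printf -v "${2:-OFF}" '%d' "$(( _bl - ${#1} ))"
}

strU8DiffLenVar 'Language'
echo "$OFF"   # prints 0
```

The result is then read from $OFF (or from the variable named as second argument) instead of $?.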
But some strange UTF-8 behaviour remains, like double-width chars, zero-width chars, reverse displacement and other cases that are not so simple to handle...
Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
wc vs pure bash:
Making a little loop of 1'000 string length inquiries:
string="Généralité"
time for i in {1..1000};do strlens=$(echo -n "$string"|wc -mc);done;echo $strlens
real 0m2.637s
user 0m2.256s
sys 0m0.906s
10 13
string="Généralité"
time for i in {1..1000};do printf -v _ %s%n "$string" bytlen;chrlen=${#string};done;echo $chrlen $bytlen
real 0m0.005s
user 0m0.005s
sys 0m0.000s
10 13
Fortunately, the result (10 13) is the same, but the execution times differ a lot: something like 500x quicker using pure bash!!
LC_ALL but if not used, this is not needed. But no other variable has to be used. – Eda
man 7 locale, LC_ALL has precedence over all others. It's the reason I follow Debian rules, having LC_ALL= somewhere and changing LANG only, by default (it could be very useful to be able to just change LC_CTIME or LC_NUMERIC). – Eda
iconv like this: STR=$(printf "$1" | iconv -f UTF-8 -t ISO-8859-15), and then ${#STR} worked well – Gross
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) I don't have admin rights on the server, I tried the examples you gave and I always got the byte length. I'm trying this from a .sh file encoded in UTF-8. – Gross
☯ and ← will require 3 bytes, where é and ô require 2 bytes and a or z only 1 byte... – Eda
LC_ALL=C.UTF-8 and LANG=C.UTF-8. – Bower
strU8DiffLen will return the correct difference. In case the current session uses a Latin encoding, strU8DiffLen will return 0 (always), which will be correct too. – Eda
strU8DiffLen will fail if $(( bytlen - ${#1} )) is greater than 255. Why not just printf the result and call the function inside a sub-shell? Related: gnu.org/software/bash/manual/html_node/Exit-Status.html – Declare
return by echo, adding OFF=$(strU8DiffLen....) and replacing ? by OFF in the last sample takes 10ms on my host, where the published proposition does the job in 1ms. (10x faster!) – Eda
return, you could replace them by printf -v ${2:-OFF} %d $(( bytlen - ${#1} )), then use $OFF or any other variable by specifying its name as second argument. – Eda
To get the length of a string stored in a variable, say:
myvar="some string"
size=${#myvar}
To confirm it was properly saved, echo it:
$ echo "$size"
11
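Once stored, the length is an ordinary integer, so it can be compared or used in arithmetic. A small sketch (the threshold 10 and the variable padded are only illustrative assumptions):

```bash
myvar="some string"
size=${#myvar}

# compare the stored length like any other number
if [ "$size" -gt 10 ]; then
    echo "long: $size"   # prints "long: 11"
fi

# or use it in arithmetic
padded=$(( size + 2 ))
echo "$padded"           # prints 13
```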
$rulename starts with the $RULE_PREFIX prefix: [ "${rulename:0:${#RULE_PREFIX}}" == "$RULE_PREFIX" ] – Evert
#myvar and {#myvar}? – Valenta
${#parameter}: The length in characters of the expanded value of parameter is substituted. – Mindamindanao
I wanted the simplest case, finally this is a result:
echo -n 'Tell me the length of this sentence.' | wc -m;
36
echo '' | wc -m => 1. You'd need to use -n: echo -n '' | wc -m => 0 ... in which case it's a good solution :) – Benitobenjamen
-n do not output the trailing newline – Mellen
You can use:
MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)
wc -c or wc --bytes for byte counts = Unicode characters are counted with 2, 3 or more bytes.
wc -m or wc --chars for character counts = each Unicode character is counted once, even when it uses more bytes.
mylen=$(printf "%s" "$HOME/.ssh" | wc -c) whereas the accepted solution fails and you need to myvar=$HOME/.ssh first. – Movie
${#var}. You still need LC_ALL / LANG set to a UTF-8 locale, otherwise -m will return the byte count. – Bower
In response to the post starting:
If you want to use this with command line or function arguments...
with the code:
size=${#1}
There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:
if [ -z "$1" ]; then
#zero length argument
else
#non-zero length
fi
See GNU and wooledge for a more complete list of Bash conditional expressions.
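A sketch of that check inside a function; describe_arg is a hypothetical name used only for illustration:

```bash
# Hypothetical function: report whether the first argument is empty.
describe_arg() {
    if [ -z "$1" ]; then
        # also reached when no argument is given at all
        echo "empty"
    else
        echo "non-empty (${#1} chars)"
    fi
}

describe_arg ""      # prints: empty
describe_arg "abc"   # prints: non-empty (3 chars)
```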
If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.
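For instance, inside a function (first_arg_len is a hypothetical name):

```bash
# Hypothetical function: print the length of its first argument.
first_arg_len() {
    local size=${#1}   # length of $1; note there is no second $
    echo "$size"
}

first_arg_len "some string"   # prints 11
```

Note that $# on its own is something else: the number of arguments, not a length.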
size=${#1} is certainly valid. – Skater
# isn't replacing the $ -- the $ outside the braces is still the expansion operator. The # is the length operator, as always. – Skater
Using your example provided
#KISS (Keep it simple stupid)
size=${#myvar}
echo $size
Here are a couple of ways to calculate the length of a variable:
echo ${#VAR}
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'
and to store the result in another variable, just wrap one of the above commands in back quotes and assign it, as follows:
otherVar=`echo -n "$VAR" | wc -m`
echo $otherVar
http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html
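A sketch of the same assignments written with $( ) instead of back quotes, which nests and quotes more easily. The variable names are illustrative, and the $(( )) around wc is only there to strip the padding that some wc implementations print:

```bash
VAR="some string"

len_param=${#VAR}                               # parameter expansion, no fork
len_wc=$(( $(printf '%s' "$VAR" | wc -m) ))     # character count via wc
len_expr=$(expr "$VAR" : '.*')                  # regex match length via expr

echo "$len_param $len_wc $len_expr"   # prints: 11 11 11
```

The expr form can misbehave when the value looks like an expr operator or starts with a dash, so the first form is usually the safer choice.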
I know that the Q and A's are old enough, but today I faced this task for the first time. Usually I used the ${#var} combination, but it fails with Unicode: most text I process with bash is in Cyrillic...
Based on @atesin's answer, I made a short (and ready to be shortened further) function which may be usable for scripting. The task that led me to this question was to show a message of variable length in a pseudo-graphics box. So, here it is:
$ cat draw_border.sh
#!/bin/sh
#based on https://mcmap.net/q/40690/-length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER
echo $OUTLINE
echo $OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
And what this sample produces:
$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
The first example (in French?) was taken from someone's example above. The second one combines Cyrillic and the value of some variable. The third one is self-explanatory: only the first half of the ASCII chars.
I used echo $BPAR|wc -m instead of printf ... in order not to rely on whether printf is built-in or not.
Above I saw talk about the trailing newline and the -n parameter for echo. I did not use it, thus I add only one to $BPLEN. Should I use -n, I must add 2.
To explain the difference between wc -m and wc -c, see the same script with only one minor change: -m was replaced with -c
$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
Accented characters in Latin and most characters in Cyrillic are two-byte, thus the drawn horizontals are longer than the real length of the message. I hope it will save someone some time :-)
p.s. Russian text says "here is one more"
p.p.s. Working "two-liner"
#!/bin/sh
#based on https://mcmap.net/q/40690/-length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
In order to not clutter the code with repetitive OUTBORDER's drawing, I put the forming of OUTBORDER into separate command
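A sketch of the same box built with printf and wc -m on a string without a trailing newline, so the only correction needed is the two spaces around the message. border_sketch is a hypothetical name; it assumes a UTF-8 locale and characters that each take one column on screen.

```bash
# Hypothetical rewrite of border(), for illustration only.
border_sketch() {
    local len line
    # character count of the message, no trailing newline involved
    len=$(( $(printf '%s' "$1" | wc -m) ))
    # len+2 dashes: pad an empty string with spaces, then translate them
    line=$(printf '%*s' "$(( len + 2 ))" '' | tr ' ' '-')
    printf '+%s+\n| %s |\n+%s+\n' "$line" "$1" "$line"
}

border_sketch "ab"
# +----+
# | ab |
# +----+
```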
border() { local sline;printf -v sline '%*s' ${#1} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$1" ${sline// /-};} - Have a look at the bottom of my answer: up to 500x faster! ... Or even try this: border() { local line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} then border 'généralité' 'А вот еще одна /usr/bin/lesspipe' 'pure ENGLISH' – Eda
${#var}, take care of $LC_ALL and your locale configuration. Compare: border() { local LC_ALL=C.UTF8 line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} with border() { local LC_ALL=C line sline;for line;do printf -v sline '%*s' ${#line} '';printf '+%s+\n|%s|\n+%s+\n' ${sline// /-} "$line" ${sline// /-};done;} ... (depending on your configuration, you may have to replace C.UTF8 by ru_RU.UTF-8, zh_CN.UTF-8 or something else... Try: grep '^[^#]*UTF' /etc/locale.gen) – Eda
${#var}, the only shell that won't consider U8 characters correctly is dash. You could test: for shell in dash bash busybox\ sh zsh ;do echo -n $shell : \ ;$shell -c 'echo ${#1}' -- 'Généralité';done. – Eda
Maybe just use wc -c to count the number of characters (strictly, wc -c counts bytes, which equals the character count only for single-byte text such as ASCII):
myvar="Hello, I am a string."
echo -n "$myvar" | wc -c
Result:
21
Length of string in bash
str="Welcome to Stackoveflow"
length=`expr length "$str"`
echo "Length of '$str' is $length"
OUTPUT
Length of 'Welcome to Stackoveflow' is 23