Bash localization doesn't work with multiline strings (without awkward syntax or going through `eval`)

Bash has a nice localization (language translation) feature:

TEXTDOMAIN=coreutils
LANG=fr_CH.utf8
echo $"system boot"
démarrage système

(Note: for this to work, the fr_CH.utf8 locale must already be generated on your system. Otherwise, try your own locale, or install the locales package and generate one.)

The problem:

But while this works fine with simple strings, things get more complicated when a string contains a \n (or worse, a backtick `):

echo $"Written by %s, %s, %s,\nand %s.\n"
Written by %s, %s, %s,\nand %s.\n

This is not the expected result.
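As far as I can tell, the reason is that `$"..."` follows double-quote rules, so `\n` is not an escape there: the msgid that gets looked up contains a backslash followed by `n`, while the catalog entry contains a real newline. A quick check, run in the C locale so that nothing is translated and the raw text is visible:

```shell
#!/bin/bash
# Compare what $"..." and $'...' actually contain.
# LC_ALL=C makes $"..." return its text untranslated, so we see the raw msgid.
export LC_ALL=C

dq=$"Written by %s,\nand %s.\n"    # double-quote rules: \n stays backslash + n
ansi=$'Written by %s,\nand %s.\n'  # ANSI-C quoting: \n becomes a real newline

if [[ $dq == *'\n'* ]]; then echo "dq: literal backslash-n"; fi
if [[ $ansi == *$'\n'* ]]; then echo "ansi: real newline"; fi
if [[ $dq != "$ansi" ]]; then echo "different msgids"; fi
```

Since the catalog's msgid is the newline version, only a string built like `ansi` can ever match it.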

(Note 2: for this to work, the exact message must exist in a .mo message catalog. In this test I use the existing coreutils.mo files, which can be decompiled back to text with the msgunfmt command.)
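To compare what msgunfmt shows with the msgid bash will actually look up, bash can dump its own translatable strings. A small sketch (the temporary script is only for illustration); if I read the PO output correctly, the second entry appears with an escaped backslash (`\\n`), i.e. a literal backslash-n rather than a newline:

```shell
#!/bin/bash
# Print the msgids bash will look up for each $"..." string.
# --dump-po-strings parses the script without executing it (it implies -n)
# and writes the strings in GNU gettext PO format.
script=$(mktemp)
cat >"$script" <<'EOF'
echo $"system boot"
echo $"Written by %s, %s, %s,\nand %s.\n"
EOF

po=$(bash --dump-po-strings "$script")
rm -f "$script"
printf '%s\n' "$po"
```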

So far, the only way I've found to get the translation is:

eval echo \$\"$'Written by %s, %s, %s,\nand %s.\n'\"
Écrit par %s, %s, %s,
et %s.

or

msg=$'Written by %s, %s, %s,\nand %s.\n'
eval echo \$\""$msg"\"
Écrit par %s, %s, %s,
et %s.

(Note the two levels of double quotes... not very sexy...)

And finally I can do:

WRITTERS=(Hans Pierre Jackob Heliott)
eval printf \$\""$msg"\" ${WRITTERS[@]}
Écrit par Hans, Pierre, Jackob,
et Heliott.

But I've heard recently that eval is evil... ;-)

In fact, I don't have a problem with an eval that only runs hard-coded parts, but I would appreciate a way to avoid eval here and to write this kind of code in a more natural, readable manner.

Anyway, @techno's answer made me see that my first idea is dangerous: if WRITTERS contains something like ;ls, for example, it gets executed...
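A minimal sketch of one way to at least keep the arguments out of eval: let eval build only the translated format, then pass the array to printf normally. It assumes the message text itself is trusted and contains no `"`, `$` or backtick (otherwise eval still breaks, or runs it):

```shell
#!/bin/bash
# eval builds only the translated format; the arguments stay outside eval,
# so a value such as 'Jackob;ls' is printed, never executed.
# Assumption: $msg is trusted text with no ", $ or ` in it.
TEXTDOMAIN=coreutils
msg=$'Written by %s, %s, %s,\nand %s.\n'
WRITTERS=(Hans Pierre 'Jackob;ls' Heliott)

# eval sees: fmt=$"Written by %s, %s, %s,<newline>and %s.<newline>"
eval "fmt=\$\"$msg\""
printf "$fmt" "${WRITTERS[@]}"
```

The eval is still there, so this does not answer the question below; it only limits what goes through it.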

Edit: so the question is:

How can I avoid this eval and/or write this in a more sexy fashion?

Note:

$ printf "I use bash %s on Debian %s\n" $BASH_VERSION $(</etc/debian_version)
I use bash 4.1.5(1)-release on Debian 6.0.6
Grantgranta answered 25/12, 2012 at 1:36 Comment(13)
"... I would appreciate a way to keep this eval out" doesn't really sound like a question, but it is. – Grantgranta
Ah OK. For \n, try echo -e. It is still not very clear what exactly is being asked in other parts of the question. Why are you using echo at all? What's wrong with printf $"message key" $var1 $var2? – Designing
@n.m. My complaint is that $"$msg" only works if $msg doesn't contain a \n. If it does, I need to write ugly things like eval... \$\""$msg"\"... – Grantgranta
I'd suggest trying the help-bash mailing list. This is a feature that's rarely used, both because it's obscure and because of security bugs. Even people who do a lot of scripting don't tend to use it. – Blow
Aha, it's clearer now. #9139901 – Designing
@n.m. I don't have a problem with embedding a newline in a variable in bash! My problem is with bash's localized translation via $"..."! Take a look at man gettext and man bash about $" (not $''). – Grantgranta
@ormaaj: I've searched the web and gnu.org for information but didn't find anything! If you have links supporting your point, your remark could become constructive. Otherwise, yes, I know it's obscure: there are no more than 5 lines in the man page and about 10 in the Bash Reference Manual! – Grantgranta
What am I doing wrong? pastie.org/pastes/5614753/text – Shackleford
You have to generate the Russian UTF-8 locales for this to work; on Debian, run: dpkg-reconfigure locales. – Grantgranta
@Shackleford You have to separate them with a semicolon: LANG=ru_RU ; echo $"..." ! – Grantgranta
Uhm… I get it now. But this requires the script to have execute permission, so it wouldn't change the environment's LANG. And I wonder why the inline LANG setting doesn't work. BTW, the ABS guide describes only one way to get rid of eval, using gettext in a subshell; this is probably used somewhere below, so I'll just post the link to the source: tldp.org/LDP/abs/html/localization.html – Shackleford
Ah, stupid me: inline setting is a terrible idea when localizing a script, right. – Shackleford
@Shackleford Yes. The problem is that it works, but not well when there are line breaks in strings, and it doesn't work at all when backticks are present. Take a look at my own answer: there is a full not-well-working-but-working-with-readable-errors sample. – Grantgranta

I've played a little with this feature, and this is what I came up with: you can include the newline verbatim, like this:

$ echo $"Written by %s.
> "
Écrit par %s.
$ 

In a script:

#!/bin/bash

message=$"Written by %s.
"

printf "$message" Gniourf

This script will output:

Écrit par Gniourf.

Ok, this is not really an answer, but it might help a little bit (at least, we're not using the evil eval).

Personal remark: I find this feature really clunky!

Sensory answered 25/12, 2012 at 13:56 Comment(5)
Yes, thanks, I've seen that too, but as you said, this is not really -the- answer ;-) But +1, as your way of storing and reusing $message is cleaner than what I had tested. (My initial idea was to use a bash associative array to store all messages, so I can imagine a nice way of doing that with your syntax.) – Grantgranta
...cleaner and more efficient... if every message has to be printed at least once; your way runs the translation when the variable is set, while mine runs it when the message is used... More or less, depending on what, when and how... – Grantgranta
You can probably mix $"..." and $'...' for the desired effect. msg=$'Written by %s.\n'; echo $"$msg" – Panther
@Panther And what if the message contains backticks? Try: msg="$(msgunfmt /usr/share/locale/fr/LC_MESSAGES/coreutils.mo | sed -ne '/missing character class/s/^.* "\(.*\)"/\1/p')" – Grantgranta
@Panther (copy) Thanks to everyone. (My bounty will go to gniourf_gniourf unless a better answer comes within 8 hours. But thanks to techno too, I like your lPrintf!) – Grantgranta

If using eval is bad with arbitrary variables, there is a way to run it only when called/needed, and only on the message part:

function lPrintf() {
    local sFormat="$(
        eval 'echo $"'"${1}"'"'.
    )"
    shift
    printf "${sFormat%.}" "$@"
}

lPrintf "system boot"
démarrage système

lPrintf  $'Written by %s, %s, %s,\nand %s.\n' techno moi lui-même bibi
Écrit par techno, moi, lui-même,
et bibi.

(The dot at the end of the translated string ensures that the whole string, including any trailing line breaks, is passed to the variable sFormat; the dot is then dropped with ${sFormat%.}.)
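The dot is needed because command substitution strips every trailing newline from its output. A small demonstration, independent of any translation:

```shell
#!/bin/bash
# Why lPrintf appends a dot inside $( ... ) and strips it afterwards.
# Command substitution deletes every trailing newline from the captured output.
plain=$(printf 'line\n\n')     # captured as "line": both newlines are gone
dotted=$(printf 'line\n\n.')   # the trailing dot shields the newlines
dotted=${dotted%.}             # drop the sentinel dot: "line" + two newlines

echo "${#plain} ${#dotted}"    # prints: 4 6
```

A plain assignment does not strip newlines, so assigning inside eval instead (for example `eval 'sFormat=$"'"$1"'"'`) would avoid both the subshell and the sentinel.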

Ceja answered 27/12, 2012 at 9:5 Comment(2)
Nice way to limit the use of eval, and to make it a little more sexy! +1, but there is an eval anyway... – Grantgranta

OK, I think I finally got it right.

iprintf() {
    msg="$2"
    domain="$1"
    shift
    shift
    imsg=$(gettext -ed "$domain" "$msg" ; echo EOF)
    imsg="${imsg%EOF}"
    printf "$imsg" "$@"
}

Usage example:

LANG=fr_CH.utf8 iprintf coreutils "If FILE is not specified, use %s.  %s as FILE is common.\n\n" foo bar
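A sketch of the same iprintf with the changes suggested in the comments on this answer (local variables, a single `shift 2`, and `--` so that a format starting with `-`, such as `-v`, is not parsed as a printf option). It still relies on the external GNU gettext program:

```shell
#!/bin/bash
# iprintf DOMAIN FORMAT [ARGS...]
# Requires the external gettext(1) program from GNU gettext.
iprintf() {
    local domain=$1 msg=$2 imsg
    shift 2
    # gettext -e expands \n etc. in the msgid before the lookup; the EOF
    # sentinel keeps trailing newlines that $( ) would otherwise strip.
    imsg=$(gettext -e -d "$domain" -- "$msg"; echo EOF)
    imsg=${imsg%EOF}
    # -- ends option parsing, so a translated "-v..." cannot act as an option.
    printf -- "$imsg" "$@"
}
```

Usage is unchanged, e.g. `iprintf coreutils $'Written by %s, %s, %s,\nand %s.\n' Athos Portos Aramis Dartagnan`.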
Designing answered 25/12, 2012 at 21:22 Comment(6)
Yes, sure, but while this keeps eval and the ugly form out, it adds a fork and is not as quick as invoking $"...". But thanks for the contribution! – Grantgranta
If you can produce profiling data that pinpoints this fork as the performance bottleneck in your system, you probably should not have written it in bash in the first place. – Designing
The difference exists (tiny as it is). The feature exists. So if it exists, why would I need an expensive fork? As long as costs have to be reduced, I think your comment is not constructive. – Grantgranta
More electrons are probably wasted on this thread than ever will be on all these forks... – Designing
A shell script that doesn't invoke any external programs probably shouldn't be a shell script. Note the subshell doesn't really add any cost. It's a single fork+exec either way. @n.m. I'd use local variables, or better, use the parameters directly. It's also not a very good idea to expand a variable into the first argument of printf, especially in Bash, especially when it results from calling an external program. The -v option can result in executing arbitrary code. Also, shift can take an argument to indicate the number of shifts. – Blow

Simple solution for building a translation function:

f() {
    eval 'local msg=$"'"${1//[\"\$\`]}"\"
    shift
    printf "${msg}" "$@"
}

Test:

TEXTDOMAIN=coreutils
LANG="fr_CH.utf8"
f system boot
démarrage système

f $'Written by %s, %s, %s,\nand %s.\n' Athos Portos Aramis Shreck
Écrit par Athos, Portos, Aramis
et Shreck.

But since I prefer setting variables to forking a subshell to capture a function's output:

f() {
    eval 'local msg=$"'"${1//[\"\$\`]}"\"
    local -n variable=$2
    shift 2
    printf -v variable "$msg" "$@"
}

Then

f $'Written by %s, %s, %s,\nand %s.\n' string Huey Dewey Louie Batman
echo ${string@Q}
$'Écrit par Huey, Dewey, Louie\net Batman.\n'

echo "$string"
Écrit par Huey, Dewey, Louie
et Batman.

Or, even better, as a full translation function:

f() {
    local store=false OPTIND OPTARG OPTERR varname
    while getopts 'd:v:' opt ;do
        case $opt in
            d ) local TEXTDOMAIN=$OPTARG ;;
            v ) varname=$OPTARG ;;
        esac
    done
    shift $((OPTIND-1))
    eval 'local msg=$"'"${1//[\"\$\`]}"\"
    shift
    printf ${varname+-v} $varname "$msg" "$@"
}

Then

f -d libc -v string "Permission denied"
echo $string
Permission non accordée

f -d coreutils $'Written by %s, %s, %s,\nand %s.\n' Riri Fifi Loulou Georges
Écrit par Riri, Fifi, Loulou
et Georges.

Old answer (Jan 2013)

Well, here is my own answer:

This does not seem to be well implemented yet. It works in many situations, but while

echo "$(gettext 'missing character class name `[::]'\')"
caractère de nom de classe « [::] » manquant

works simply, the same string seems impossible to translate using this bashism:

echo $"missing character class name `[::]'"
> 

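The console stays blocked, waiting for the end of the string. Typing `" (a backtick, then a double quote) to close it plunges bash into a complex interpretation process :-> (the French errors below are bash's "unexpected EOF while looking for matching ..." and "syntax error: unexpected end of file" messages):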

> `"
bash: command substitution: line 1: Caractère de fin de fichier (EOF) prématuré lors de la recherche du « ' » correspondant
bash: command substitution: line 2: Erreur de syntaxe : fin de fichier prématurée
missing character class name 

And, of course:

echo $"missing character class name \`[::]'"
missing character class name `[::]'

does no translation. :-p

Whereas translating this string, which contains two backticks, works fine:

echo $"%s}: integer required between `{' and `}'"
%s} : entier requis entre « { » et « } »

Here is a script where you can see some of my unsuccessful tries.

#!/bin/bash

echo "Localized tests"
export TEXTDOMAIN=coreutils
export LANG=fr_CH.UTF-8
export WRITTERS=(Athos Portos Aramis Dartagnan\ Le\ Beau)

echo '#First method# whitout eval'

declare -A MyMessages;
MyMessages[sysReboot]=$"system boot"
MyMessages[writtenBy]=$"Written by %s, %s, %s,
and %s.
"
MyMessages[intReq]=$"%s}: integer required between `{' and `}'"
MyMessages[trClass]=$"when translating, the only character classes that may appear in
string2 are `upper' and `lower'"
# MyMessages[missClass]=$"missing character class name `[::]'" 

for msgIdx in ${!MyMessages[@]} ;do
    printf "\n--- Test chain '%s' ---\n" $msgIdx
    case $msgIdx in
    writ* )
        printf "${MyMessages[$msgIdx]}\n" "${WRITTERS[@]}"
        ;;
    intReq )
        printf "ARRAY{${MyMessages[$msgIdx]}\n" NaN
        ;;
    * )
        printf "${MyMessages[$msgIdx]}\n"
        ;;
    esac
  done

echo $'###\n#Second method# whith limited eval'
unset MyMessages;

declare -A MyMessages;

lPrintf() {
    local sFormat="$(
        eval 'echo $"'"${1}"'"'.
    )"
    shift
    printf "${sFormat%.}" "$@"
}

MyMessages[sysReboot]="system boot"
MyMessages[writtenBy]=$'Written by %s, %s, %s,\nand %s.\n'
MyMessages[intReq]="%s}: integer required between \`{' and \`}'"
MyMessages[trClass]="when translating, the only character classes that "
MyMessages[trClass]+=$'may appear in\nstring2 '
MyMessages[trClass]+="are \`upper' and \`lower'"
MyMessages[missClass]="missing character class name \`[::]'"

for msgIdx in ${!MyMessages[@]} ;do
    printf "\n--- Test chain '%s' ---\n" $msgIdx
    case $msgIdx in
    writ* )
        lPrintf "${MyMessages[$msgIdx]}" "${WRITTERS[@]}"
        ;;
    intReq )
        lPrintf "${MyMessages[$msgIdx]}" NaN
        ;;
    * )
        lPrintf "${MyMessages[$msgIdx]}"
        ;;
    esac
  done

and its output:

Localized tests
#First method# whitout eval

--- Test chain 'trClass' ---
à la traduction, les seules classes de caractères qui peuvent apparaître
dans string2 sont « upper » ou « lower »

--- Test chain 'intReq' ---
ARRAY{NaN} : entier requis entre « { » et « } »

--- Test chain 'sysReboot' ---
démarrage système

--- Test chain 'writtenBy' ---
Écrit par Athos, Portos, Aramis,
et Dartagnan Le Beau.

###
#Second method# whith limited eval

--- Test chain 'trClass' ---
à la traduction, les seules classes de caractères qui peuvent apparaître
dans string2 sont « upper » ou « lower »
--- Test chain 'missClass' ---
./localized.sh: eval: line 44: Caractère de fin de fichier (EOF) prématuré lors de la recherche du « ` » correspondant
./localized.sh: eval: line 45: Erreur de syntaxe : fin de fichier prématurée

--- Test chain 'intReq' ---
NaN} : entier requis entre « { » et « } »
--- Test chain 'sysReboot' ---
démarrage système
--- Test chain 'writtenBy' ---
Écrit par Athos, Portos, Aramis,
et Dartagnan Le Beau.

If anyone could help me get rid of the commented-out line and/or the error messages in this script!? ... (in less than 8 hours?!)

Anyway, thanks to everyone. (My bounty will go to @gniourf_gniourf unless a better answer comes within 8 hours. But thanks to @techno too, I like your lPrintf!)

Grantgranta answered 3/1, 2013 at 21:13 Comment(0)

Talking you out of it

Fundamentally, you probably shouldn't be concerned about this issue, because C and Bash differ in how printf works: C's printf does not translate backslash escapes in its format (in C, the compiler has already turned \n in a string literal into a newline), while Bash's printf does. So ideally you would just write printf $"%s, %s,\n%s" some thing more and let the template string keep the raw backslash escape (so it would look like msgid "%s, %s,\\n%s" in the po file).
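A minimal illustration of that approach: the msgid keeps `\n` as a literal backslash-n, and Bash's printf turns it into a newline afterwards. `LC_ALL=C` is only there to make the untranslated output predictable:

```shell
#!/bin/bash
# $"..." looks up the literal text %s, %s,\n%s\n (backslash-n, not a newline);
# a matching PO entry would spell it msgid "%s, %s,\\n%s\\n".
# Bash's printf then expands the \n escapes in whatever format it receives.
export LC_ALL=C
printf $"%s, %s,\n%s\n" some thing more
```

This prints `some, thing,` and `more` on two lines, and a translated format would be expanded the same way.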

As you have already realized, the $"" construct also rules out msgids that are invalid under bash's double-quote syntax. There is simply no way to use this entry:

msgid "`"
msgstr "« "

and stripping these problematic characters away only masks the problem. (Again, this is fine for bash scripts, because you would have written echo $"\`" with msgid "\\`".)
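To see the masking in practice, here is the first f() from the "Simple solution" answer applied to a message containing double quotes. In the C locale nothing is translated, so the output shows the msgid that was really looked up, minus the stripped characters:

```shell
#!/bin/bash
# f() as in the "Simple solution" answer: ", $ and ` are deleted from the
# message before eval, so eval cannot break on them or run code...
f() {
    eval 'local msg=$"'"${1//[\"\$\`]}"\"
    shift
    printf "${msg}" "$@"
}

# ...but the msgid looked up is now 'say %s\n', not 'say "%s"\n', so a catalog
# entry for the original text can never match. C locale: no translation.
LC_ALL=C f $'say "%s"\n' hi    # prints: say hi
```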

On the other hand, there is a real reason not to use the $"" construct: it lets translators run arbitrary commands (the translated string is expanded like any other double-quoted string), which adds one more real layer of insecurity on top of eval. The gettext.sh functions are free of this problem, because variable substitution is handled by the separate envsubst program. They also let you use $'' as much as you like.

Valona answered 29/3, 2021 at 11:37 Comment(0)
