If your charsets are single-byte encodings (like the ISO-8859-n family) or UTF-8, where the newline character is the same as in ASCII and the NUL character (\0) doesn't occur, your operation is likely to work. If the files use UTF-16, it will not, because UTF-16 text contains NUL bytes: every ASCII character is stored as two bytes, one of which is zero, so the byte sequence of an ASCII pattern such as foo never appears in the file. The reason it should work for a simple search and replace of ASCII strings is this: we assumed your encoding is a superset of ASCII, and for a simple match like this, sed will mostly work at the byte level and just replace one byte sequence with another.
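A quick way to see the UTF-16 problem is to dump the bytes. The sketch below assumes the iconv, sed and od commands are available (as on a typical GNU/Linux system); the temporary file names are made up for illustration. It also shows one possible workaround, which is not the only one: convert the file to UTF-8, run sed on that, and convert the result back.

```shell
#!/bin/sh
set -eu

# In UTF-16LE every ASCII character becomes two bytes, the second one NUL (00).
printf 'abc' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
# prints: 61 00 62 00 63 00

# Hypothetical sample file stored as UTF-16LE.
tmp=$(mktemp -d)
printf 'foo baz\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp/input-utf16.txt"

# Round trip: UTF-16LE -> UTF-8, edit with sed, then UTF-8 -> UTF-16LE.
iconv -f UTF-16LE -t UTF-8 "$tmp/input-utf16.txt" \
  | sed 's/foo/bar/g' \
  | iconv -f UTF-8 -t UTF-16LE > "$tmp/output-utf16.txt"

# Decode the result again to check it.
iconv -f UTF-16LE -t UTF-8 "$tmp/output-utf16.txt"
# prints: bar baz

rm -r "$tmp"
```

Using the explicit UTF-16LE name (rather than plain UTF-16) keeps iconv from adding or expecting a byte order mark; if your files start with a BOM, adjust the encoding names accordingly.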
But with more complex operations, for example when your search or replacement strings contain special (non-ASCII) characters, your results may vary. The accented characters you type on the command line might not match the bytes in your file if your console encoding/locale differs from the file encoding. This can be worked around, but it requires care.
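One way around it, sketched under the assumption that the file is in ISO-8859-2 while your terminal uses UTF-8: convert the search string to the file's encoding with iconv before passing it to sed, and run sed in the C locale so it compares raw bytes. The file name and the word are hypothetical, and the UTF-8 bytes of "gęś" are written as octal escapes so the script does not depend on its own encoding.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
# Hypothetical ISO-8859-2 file containing "gęś".
# \304\231 and \305\233 are the UTF-8 bytes of "ę" and "ś".
printf 'g\304\231\305\233\n' | iconv -f UTF-8 -t ISO-8859-2 > "$tmp/latin2.txt"

# The UTF-8 spelling typed on a UTF-8 console does not match the file's bytes:
LC_ALL=C sed "s/$(printf 'g\304\231\305\233')/goose/g" "$tmp/latin2.txt" | od -An -tx1
# prints: 67 ea b6 0a   (unchanged)

# Convert the search string to the file's encoding first; then it matches.
pattern=$(printf 'g\304\231\305\233' | iconv -f UTF-8 -t ISO-8859-2)
LC_ALL=C sed "s/$pattern/goose/g" "$tmp/latin2.txt"
# prints: goose

rm -r "$tmp"
```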
Some operations in sed depend on your locale, for example which characters are considered alphanumeric. Compare the following replacement performed in a Polish UTF-8 locale and in the C locale, which uses ASCII:
```
$ echo "gęś gęgała" | LC_ALL=pl_PL.UTF-8 sed -e 's/[[:alnum:]]/X/g'
XXX XXXXXX
$ echo "gęś gęgała" | LC_ALL=C sed -e 's/[[:alnum:]]/X/g'
Xęś XęXXłX
```
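The same locale dependence affects the . (any character) pattern: in the C locale sed treats every byte as a character, so a two-byte UTF-8 character such as "ę" counts as two. This sketch assumes GNU sed; the second command additionally assumes a C.UTF-8 locale is installed (common on glibc systems, but not guaranteed). If it is missing, sed silently falls back to the C locale and prints XXX there too.

```shell
#!/bin/sh
set -eu

# "gę" in UTF-8: g = 0x67, ę = 0xC4 0x99 (written as octal escapes).
word=$(printf 'g\304\231')

# C locale: every byte is a character, so "." matches three times.
printf '%s\n' "$word" | LC_ALL=C sed 's/./X/g'
# prints: XXX

# UTF-8 locale (assumes C.UTF-8 exists): "ę" is a single character.
printf '%s\n' "$word" | LC_ALL=C.UTF-8 sed 's/./X/g'
```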
But if you only want to replace literal ASCII strings, it works as expected and both locales give the same result:
```
$ echo "gęś gęgała" | LC_ALL=pl_PL.UTF-8 sed -e 's/g/G/g'
Gęś GęGała
$ echo "gęś gęgała" | LC_ALL=C sed -e 's/g/G/g'
Gęś GęGała
```
As you see, the character-class replacement gives different results because accented characters are treated differently depending on the locale, while the literal replacement does not. In short: replacements of literal ASCII strings will most likely work fine; more complex operations need checking and may or may not work.
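If you want to check that a literal ASCII replacement really only swaps one byte sequence for another, compare hex dumps before and after. A minimal sketch, assuming sed and od are available; the UTF-8 bytes of "gęś" are again written as octal escapes.

```shell
#!/bin/sh
set -eu

# "gęś" in UTF-8: 67 c4 99 c5 9b
word=$(printf 'g\304\231\305\233')

printf '%s\n' "$word" | od -An -tx1
# prints: 67 c4 99 c5 9b 0a

# Replace the literal ASCII "g": only the byte 0x67 becomes 0x47 ("G").
printf '%s\n' "$word" | LC_ALL=C sed 's/g/G/g' | od -An -tx1
# prints: 47 c4 99 c5 9b 0a
```

The multi-byte sequences for "ę" and "ś" pass through untouched, which is why this kind of replacement is safe in any ASCII-compatible encoding.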