Note:
SINGLE-line Solutions
Escaping a string literal for use as a regex in sed:
To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:
search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3' # sample input containing metachars.
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.
sed -n "s/$searchEscaped/foo/p" <<<"$search" # Echoes 'foo'
- Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
- Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
- Then, ^ chars. are escaped as \^.
- Note that you cannot just escape every char by putting a \ in front of it because that can turn a literal char into a metachar, e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.
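To see why blanket backslash-escaping is unsafe, here is a minimal sketch for the one-character search string n. The variable names are illustrative only, and printf pipes are used instead of <<< merely so that the snippet also runs in plain sh.

```shell
search='n'   # a perfectly ordinary literal character

# Naive approach: put a backslash in front of every character.
naive=$(printf '%s\n' "$search" | sed 's/./\\&/g')                 # yields \n
naiveResult=$(printf '%s\n' "$search" | sed -n "s/$naive/X/p")

# Character-set approach described above.
safe=$(printf '%s\n' "$search" | sed 's/[^^]/[&]/g; s/\^/\\^/g')   # yields [n]
safeResult=$(printf '%s\n' "$search" | sed -n "s/$safe/X/p")

printf 'naive: "%s"  safe: "%s"\n' "$naiveResult" "$safeResult"
```

With the naive escaping, the regex \n stands for a newline and therefore no longer matches the letter n, so the first result is empty; the character-set form [n] matches and yields X.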
The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters (which vary across regex dialects) and instead relying on only 2 features shared by all regex dialects:
- the ability to specify literal characters inside a character set.
- the ability to escape a literal ^ as \^
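As a rough check of the dialect-independence claim, the following sketch feeds the same escaped string to grep in both BRE and ERE mode. The sample strings and variable names are made up for illustration; only two dialects are tried here, so this is a spot check rather than a proof.

```shell
literal='a.b^c+'
escaped=$(printf '%s\n' "$literal" | sed 's/[^^]/[&]/g; s/\^/\\^/g')
printf '%s\n' "$escaped"     # [a][.][b]\^[c][+]

# Two candidate lines; only the first contains the literal string.
breCount=$(printf '%s\n' 'a.b^c+' 'aXb^c+' | grep -c -- "$escaped")      # basic regex
ereCount=$(printf '%s\n' 'a.b^c+' 'aXb^c+' | grep -E -c -- "$escaped")   # extended regex

printf 'BRE matches: %s, ERE matches: %s\n' "$breCount" "$ereCount"
```

Both counts are 1: the . and + are treated literally in either dialect, so the second line (aXb^c+) is not matched.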
Escaping a string literal for use as the replacement string in sed's s/// command:
The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, ...), so these must be escaped, along with the (customary) regex delimiter, /.
Assuming that the replacement string is a single-line string:
replace='Laurel & Hardy; PS\2' # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it
sed -n "s/.*/$replaceEscaped/p" <<<"foo" # Echoes $replace as-is
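A small side-by-side sketch of what goes wrong without escaping (variable names are illustrative; the sample string is a shortened version of the one above):

```shell
replace='Laurel & Hardy'

# Unescaped: & expands to the text matched by the regex ("foo").
unescaped=$(printf '%s\n' foo | sed "s/.*/$replace/")

# Escaped: &, / and \ each get a \ prefix first.
replaceEscaped=$(printf '%s\n' "$replace" | sed 's/[&/\]/\\&/g')
escaped=$(printf '%s\n' foo | sed "s/.*/$replaceEscaped/")

printf '%s\n' "$unescaped" "$escaped"
```

The first line printed is Laurel foo Hardy, the second is the intended Laurel & Hardy.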
MULTI-line Solutions
Escaping a MULTI-LINE string literal for use as a regex in sed:
Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.
# Define sample multi-line literal.
search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
/def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4'
# Escape it.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n') #'
# Use in a Sed command that reads ALL input lines up front.
# If ok, echoes 'foo'
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"
- The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
- $!a\'$'\n''\\n' appends string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<).
- tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
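To make the intermediate value visible, here is a minimal sketch with a short two-line literal. The sample text and variable names are illustrative; the newline after a\ is written literally inside the single-quoted sed script (rather than via $'\n') so that the snippet also runs in plain sh.

```shell
text='a.b
c'

# Escape each line, append the two-character string \n after all but the last line,
# then delete the real newlines.
escaped=$(printf '%s\n' "$text" | sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\
\\n' | tr -d '\n')
printf '%s\n' "$escaped"     # [a][.][b]\n[c]

# Line by line, the pattern space never contains a newline, so nothing matches.
lineByLine=$(printf '%s\n' "$text" | sed -n "s/$escaped/foo/p")

# After reading all lines up front, the same regex matches.
slurped=$(printf '%s\n' "$text" | sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$escaped/foo/p")

printf 'line-by-line: "%s"  slurped: "%s"\n' "$lineByLine" "$slurped"
```

Only the second sed call prints foo, which illustrates why the escaped multi-line regex has to be paired with a command that reads multiple lines into the pattern space.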
Escaping a MULTI-LINE string literal for use as the replacement string in sed's s/// command:
# Define sample multi-line literal.
replace='Laurel & Hardy; PS\2
Masters\1 & Johnson\2'
# Escape it for use as a Sed replacement string.
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replaceEscaped=${REPLY%$'\n'}
# If ok, outputs $replace as is.
sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar"
- Newlines in the input string must be retained as actual newlines, but \-escaped.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
- s/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
- s/\n/\\&/g then \-prefixes all actual newlines.
- IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
- ${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.
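For illustration, this sketch prints the escaped form of a two-line replacement string. It uses a plain command substitution, which is only safe here because the sample value is assumed to have no trailing newlines; the variable names are illustrative.

```shell
replace='You & I
eating A\1 sauce.'

# Read all lines, escape & / \, then put a \ in front of each real newline.
replaceEscaped=$(printf '%s\n' "$replace" |
  sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g')
printf '%s\n' "$replaceEscaped"

# Using the escaped value reproduces the original text verbatim.
result=$(printf '%s\n' 'foo bar' | sed "s/\(.*\) \(.*\)/$replaceEscaped/")
printf '%s\n' "$result"
```

The escaped value is You \& I\ followed by a real newline and eating A\\1 sauce., i.e. the newline survives as an actual newline preceded by a backslash.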
bash functions based on the above (for sed):
- quoteRe() quotes (escapes) for use in a regex
- quoteSubst() quotes for use in the substitution string of a s/// call.
- both handle multi-line input correctly
  - Note that because sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
  - Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedValue < <(quoteSubst "$value")
# SYNOPSIS
# quoteRe <text>
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# SYNOPSIS
# quoteSubst <text>
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}
Example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.
# Should print the unmodified value of $to
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from"
Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
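The trailing-newline caveat mentioned for the functions can be seen in a short bash sketch. quoteSubst() is repeated so the snippet is self-contained, with || true added to the read calls purely so that it also survives set -e; the sample value and the variable names viaSubst and viaRead are illustrative.

```shell
# quoteSubst() as defined above (|| true added; read returns non-zero at end of input).
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

value=$'line1\n'   # note the trailing newline

# Command substitution strips the escaped trailing newline, leaving a dangling \.
viaSubst=$(quoteSubst "$value")

# read -d '' keeps the output as is.
IFS= read -d '' -r viaRead < <(quoteSubst "$value") || true

printf 'via $(...): %q\n' "$viaSubst"
printf 'via read:   %q\n' "$viaRead"
```

The value obtained via $(...) ends in a bare backslash, which would escape the closing / of an s/// command, whereas the value obtained via read ends in a backslash followed by the newline, as intended.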
perl solution:
Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.
# Should print the unmodified value of $to.
# Note that the replacement value needs NO escaping.
perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"
Note the use of -0777 to read all input at once, so that the multi-line substitution works.
The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.