Escaping characters in bash (for JSON)

I'm using git, then posting the commit message and other bits as a JSON payload to a server.

Currently I have:

MSG=`git log -n 1 --format=oneline | grep -o ' .\+'`

which sets MSG to something like:

Calendar can't go back past today

then

curl -i -X POST \
  -H 'Accept: application/text' \
  -H 'Content-type: application/json' \
  -d "{'payload': {'message': '$MSG'}}" \
  'https://example.com'

My real JSON has another couple of fields.

This works fine, but of course when I have a commit message such as the one above with an apostrophe in it, the JSON is invalid.

How can I escape the characters required in bash? I'm not familiar with the language, so I'm not sure where to start. I suspect that replacing ' with \' would do the job, at a minimum.

Toccaratoccata answered 7/4, 2012 at 10:28 Comment(2)
As an extra note, JSON is supposed to use double (not single) quotes around values, so many (but not all) parsers would reject the above, even if it was structurally sound and escaped properly. - Extrauterine
Not a solution to the question, but others might consider this: dwaves.de/tools/escape, which appears to work in my minimal testing. - Roderick

OK, found out what to do. Bash supports this natively as expected, though as always, the syntax isn't really very guessable!

Essentially, ${string//substring/replacement} returns what you'd imagine, so you can use

MSG=${MSG//\'/\\\'}

to do this. The next problem is that the first regex doesn't work anymore, but that can be replaced with

git log -n 1 --pretty=format:'%s'

In the end, I didn't even need to escape them. Instead, I just swapped all the ' in the JSON to \". Well, you learn something every day.

Toccaratoccata answered 7/4, 2012 at 11:10 Comment(1)
This is by no means fully compliant JSON escaping. The real thing requires tabs to be replaced with \t, newlines to be replaced with \n, literal backslashes to be doubled, etc. - Glorification

jq can escape strings for use in shell scripts or JSON documents.

Lightweight, free, and written in C, jq enjoys widespread community support with over 25k stars on GitHub. I personally find it very speedy and useful in my daily workflow.

Convert string to JSON

echo -n '猫に小判' | jq -Rsa .

"\u732b\u306b\u5c0f\u5224"

To explain,

  • -R means "raw input"
  • -s means "slurp": read the entire input as one string, line breaks included
  • -a means "ascii output" (optional)
  • . means "output the root of the JSON document"

Git + Grep Use Case

To fix the code example given by the OP, simply pipe through jq.

MSG=`git log -n 1 --format=oneline | grep -o ' .\+' | jq -Rsa .`
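One detail worth knowing when piping command output through -s: the trailing newline that grep (like most commands) prints is kept inside the JSON string. A small sketch of that behaviour, assuming jq 1.5 or later for the rtrimstr builtin; the sample text is made up:

```shell
#!/usr/bin/env bash
# With -s, the final newline of the input becomes part of the JSON string.
printf 'Calendar fix\n' | jq -Rs .

# rtrimstr("\n") strips that trailing newline before the string is emitted.
printf 'Calendar fix\n' | jq -Rs 'rtrimstr("\n")'
```

The first command prints "Calendar fix\n" and the second "Calendar fix".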
Volans answered 16/5, 2018 at 22:18 Comment(4)
For a text containing line breaks the -s option should be added, to get the single string result. - Lucho
This should be the correct and accepted answer! - Mesopotamia
Wow, jq saved the day. My final command for posting a Markdown-formatted GitHub comment looks like: COMMENT=$(cat summary.md | jq -Rsa .) curl -X POST -H "Authorization: token $GITHUB_TOKEN" -d "{\"body\":$COMMENT}" "$COMMENTS_URL" - Charyl
I added an answer that is pure Bash, no jq required: https://mcmap.net/q/203954/-escaping-characters-in-bash-for-json - Yl

Using Python:

This solution is not pure bash, but it's non-invasive and handles Unicode.

json_escape () {
    printf '%s' "$1" | python -c 'import json,sys; print(json.dumps(sys.stdin.read()))'
}

Note that the json module is part of Python's standard library and has been for a long time, so this is a fairly minimal Python dependency.

Or using PHP:

json_escape () {
    printf '%s' "$1" | php -r 'echo json_encode(file_get_contents("php://stdin"));'
}

Use like so:

$ json_escape "ヤホー"
"\u30e4\u30db\u30fc"
Extrauterine answered 20/11, 2012 at 3:25 Comment(11)
Is it really a JSON format that is being passed as the first parameter, or is it a Python object format? Is there an appreciable difference between the two? - Hamburg
The first parameter should be just a string that will be a simple value in the output JSON, not a complex object itself, just like in the original question. If you want to insert a complex value, bash is almost certainly more trouble than it's worth. - Extrauterine
I like this. It's not hard to change it to a simple one-liner: alias json_escape="python -c 'import json,sys; print json.dumps(sys.stdin.read())'" - Rodina
It would be a good idea to use printf '%s' "$1" instead of echo, because the way in which echo parses its arguments is highly inconsistent across different shells. It would also be useful to parenthesize the print call for Python 3 compatibility, and remove the function keyword. - Arthurarthurian
I think you need to quote the $1, so you don't lose spaces. - Tellurate
How is this pure bash if it calls python? XD - Telethermometer
You need to quote the $1 in the printf. - Or
Not just to avoid losing spaces, but also to prevent glob expansion. With the original code, json_escape '*' would emit the quoted form not of the * character, but of a list of filenames in the current directory concatenated together. - Glorification
jq will do the same thing, e.g. jq -aR <<< 'ヤホー' - Volans
Yeah, it's not using a standard set of bash tools, but I can't imagine a bash without Python (and in my case also PHP). jq seems like a powerful tool, yet I can hardly justify installing it for a single use. - Totter
How to store the formatted JSON string in a variable? - Horsemanship
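On storing the result in a variable: json_escape prints the surrounding double quotes itself, so its output can be captured with $(...) and spliced into the payload without adding quotes of your own. A sketch for the OP's payload, assuming python3 is on the PATH (the answer calls python, which some newer systems no longer provide):

```shell
#!/usr/bin/env bash
# Same function as in the answer, but calling python3 explicitly
# (assumption: python3 is installed).
json_escape () {
    printf '%s' "$1" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'
}

MSG="Calendar can't go back past today"

# $(...) stores the escaped string (and drops print's trailing newline).
# The value already carries its double quotes, so none are added here.
escaped=$(json_escape "$MSG")
body="{\"payload\": {\"message\": $escaped}}"
printf '%s\n' "$body"
```

This prints {"payload": {"message": "Calendar can't go back past today"}}, and "$body" can then be passed to curl -d.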

Instead of worrying about how to properly quote the data, just save it to a file and use the @ construct that curl allows with the --data option. To ensure that the output of git is correctly escaped for use as a JSON value, use a tool like jq to generate the JSON, instead of creating it manually.

jq -n --arg msg "$(git log -n 1 --format=oneline | grep -o ' .\+')" \
   '{payload: { message: $msg }}' > git-tmp.txt

curl -i -X POST \
  -H 'Accept: application/text' \
  -H 'Content-type: application/json' \
  -d @git-tmp.txt \
  'https://example.com'

You can also read directly from standard input using -d @-; I leave that as an exercise for the reader to construct the pipeline that reads from git and produces the correct payload message to upload with curl.

(Hint: it's jq ... | curl ... -d@- 'https://example.com' )
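One possible shape of that pipeline, as a sketch: make_payload is a hypothetical helper name, the git log format string is taken from another answer in this thread, and the curl part is left commented out because it needs network access:

```shell
#!/usr/bin/env bash
# Hypothetical helper: jq's --arg passes $1 in as a JSON string,
# so jq performs all of the escaping.
make_payload() {
  jq -n --arg msg "$1" '{payload: {message: $msg}}'
}

# Not run here (needs a git repository and network access):
# make_payload "$(git log -n 1 --pretty=format:'%s')" |
#   curl -i -X POST \
#     -H 'Accept: application/text' \
#     -H 'Content-type: application/json' \
#     -d @- \
#     'https://example.com'
```

With -d @- curl reads the request body from standard input, so no temporary file is needed.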

Platitude answered 15/7, 2012 at 21:27 Comment(7)
Correct; I didn't give thought to that back when I wrote this answer. I'll update now. - Platitude
Why not skip the file and store to a variable in bash with data="$(jq --arg...)"? - Volans
@Volans The end of the answer alludes to how to do this with no intermediate storage at all; curl can read directly from jq via a pipe, rather than storing the output of jq in memory. - Platitude
The question was "how to escape characters in bash", so I have to stand up for those of us who want to do just that and have come here looking for it :) The problem is encoding dynamic content (coming from a pretty-formatted JSON file, a command output, or simply any text at all) into a string field of a JSON document to be sent to an API using curl. - Totter
If you prefer to achieve the same effect with pipes in a monstrous one-liner, you can do git log -n 1 --format=oneline | grep -o ' .\+' | jq -R -s '{payload: {message: . }}' | curl -i -X POST -H 'Accept: application/text' -H 'Content-type: application/json' -d @- 'https://example.com' - Sellers
I am trying to pass a path as a string to the JSON file, but it automatically changes it to an absolute path :( don't know why - Unrepair
@HaydenSchiff technically you can put a newline after the | pipe character. - Volans

I was also trying to escape characters in Bash, for transfer using JSON, when I came across this. I found that there is actually a larger list of characters that must be escaped – particularly if you are trying to handle free form text.

There are two tips I found useful:

  • Use the Bash ${string//substring/replacement} syntax described in this thread.
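  • Use the actual control characters for tab, newline, carriage return, etc. In vim you can enter these by typing Ctrl+V followed by the actual control code (Ctrl+I for tab, for example). Alternatively, Bash's ANSI-C quoting ($'\t', $'\n', $'\r', ...) lets you write them without embedding raw control characters in the script.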

The resultant Bash replacements I came up with are as follows:

JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\\/\\\\} # \ (must come first)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\//\\\/} # / (optional in JSON)
# ' needs no escaping: \' is not a valid JSON escape sequence
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\"/\\\"} # "
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\t'/\\t} # \t (tab)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\n'/\\n} # \n (newline)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\r'/\\r} # \r (carriage return)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\f'/\\f} # \f (form feed)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\b'/\\b} # \b (backspace)

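I have not at this stage worked out how to escape Unicode characters. (Note, however, that RFC 8259 only requires escaping the quotation mark, the backslash and the control characters U+0000 to U+001F; other Unicode text can be sent as-is in UTF-8.) I will update my answer if I work this out.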
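A sketch that wraps these replacements in a reusable function. The name json_escape_simple is hypothetical; it uses Bash's ANSI-C quoting ($'\t' and so on) instead of literal control characters, omits the single-quote replacement (\' is not a valid JSON escape), and refuses input that still contains other control characters rather than emitting invalid JSON:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints its argument as a quoted JSON string.
json_escape_simple() {
  local s=$1
  s=${s//\\/\\\\}     # \  (first, so later escapes are not doubled)
  s=${s//\"/\\\"}     # "
  s=${s//$'\t'/\\t}   # tab
  s=${s//$'\n'/\\n}   # newline
  s=${s//$'\r'/\\r}   # carriage return
  s=${s//$'\f'/\\f}   # form feed
  s=${s//$'\b'/\\b}   # backspace
  # Any other control character (e.g. ESC) would still make the JSON
  # invalid, so refuse it. [[:cntrl:]] also matches DEL (0x7F), which JSON
  # accepts unescaped; this check is deliberately conservative.
  if [[ $s == *[[:cntrl:]]* ]]; then
    printf 'json_escape_simple: unhandled control character\n' >&2
    return 1
  fi
  printf '"%s"' "$s"
}

json_escape_simple $'Tab\there, "quotes" and C:\\path'
printf '\n'
```

The example call prints "Tab\there, \"quotes\" and C:\\path". Rejecting leftover control characters is a design choice for the sketch; the character-by-character loop in another answer instead escapes them as \uXXXX.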

Blastocoel answered 15/7, 2012 at 20:59 Comment(2)
With regard to the suggestion elsewhere that you use the -d parameter and the @ modifier with curl, that doesn't solve the problem. I was in fact already using this and found that the contents of the file still need to be encoded properly in the way that JSON expects. - Blastocoel
I needed to replace all escape characters (octal: 033). I figured out that you can use the $'...' (ANSI-C quoting) syntax like this: JSON_TOPIC_RAW=${JSON_TOPIC_RAW//$'\033'/_}. A list of control characters: unicodelookup.com/#ctrl. - Prompt
git log -n 1 --format=oneline | grep -o ' .\+' | jq --slurp --raw-input .

The above line works for me. Refer to https://github.com/stedolan/jq for more on jq.

Arrange answered 8/11, 2016 at 6:34 Comment(0)

I found something like this:

MSG=`echo "$MSG" | sed "s/'/\\\\\'/g"`
Wageworker answered 7/4, 2012 at 10:34 Comment(2)
MSG ends up being OK - I guess I just want to do something like MSG = MSG.replace("'","\'"), but I'm not sure how to do that in bash. - Toccaratoccata
Here are some pointers: cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files. Maybe you can grasp it faster. - Wageworker

The simplest way is using jshon, a command-line tool to parse, read and create JSON.

jshon -s 'Your data goes here.' 2>/dev/null

Membranous answered 20/2, 2016 at 0:28 Comment(0)

Adding a JSON-aware tool to your environment is sometimes a no-go, so here's a POSIX solution, in the form of a shell function, that should work on every UNIX/Linux:

json_stringify() {
    LANG=C command -p awk '
        BEGIN {
            ORS = ""

            for ( i = 1; i <= 127; i++ )
                tr[ sprintf( "%c", i) ] = sprintf( "\\u%04x", i )

            for ( i = 1; i < ARGC; i++ ) {
                s = ARGV[i]
                print "\""
                while ( match( s, /[\001-\037\177"\\]/ ) ) {
                    print substr(s,1,RSTART-1) tr[ substr(s,RSTART,RLENGTH) ]
                    s = substr(s,RSTART+RLENGTH)
                }
                print s "\"\n"
            }
        }
    ' "$@"
}

Aside: You might prefer to use the widely available (but non-POSIX) perl instead:

json_stringify() {
    LANG=C perl -le '
        for (@ARGV) {
            s/[\x00-\x1f\x7f"\\]/sprintf("\\u%04x",ord($&))/ge;
            print "\"$_\""
        }
    ' -- "$@"
}
Example:
json_stringify '"foo\bar"' 'hello
world'

Each argument is converted to a JSON string and outputted one per line:

"\u0022foo\u005cbar\u0022"
"hello\u000aworld"
Limitations:
  • Cannot handle NUL bytes.

  • Doesn't validate the input for Unicode; it only escapes the mandatory ASCII characters specified by RFC 8259.

  • The input is limited in size (you'll get an Argument list too long error when the input is too big).


Replying to OP's question:

Here's how you can build a valid JSON object using the json_stringify function:

MSG=$(git log -n 1 --format=oneline | grep -o ' .\+')

curl -i -X POST \
  -H 'Accept: application/text' \
  -H 'Content-type: application/json' \
  -d '{"payload": {"message": '"$(json_stringify "$MSG")"'}}' \
  'https://example.com'
Bushelman answered 14/11, 2022 at 2:27 Comment(1)
I think the JSON RFC has mistakenly forgotten to include x7F (\177) in the escape list, but it's a good thing your solution already includes it. - Sparrow

[...] with an apostrophe in it, the JSON is invalid.

Not according to https://www.json.org. A single quote is allowed in a JSON string.

How can I escape the characters required in bash?

You can use xidel to properly prepare the JSON you want to POST.
As https://example.com can't be tested, I'll be using https://api.github.com/markdown (see this answer) as an example.

Let's assume 'çömmít' "mêssågè" as the exotic output of git log -n 1 --pretty=format:'%s'.

Create the (serialized) JSON object with the value of the "text"-attribute properly escaped:

$ git log -n 1 --pretty=format:'%s' | \
  xidel -se 'serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})'
{"text":"'\u00E7\u00F6mm\u00EDt' \"m\u00EAss\u00E5g\u00E8\""}

Curl (variable)

$ eval "$(
  git log -n 1 --pretty=format:'%s' | \
  xidel -se 'msg:=serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})' --output-format=bash
)"

$ echo $msg
{"text":"'\u00E7\u00F6mm\u00EDt' \"m\u00EAss\u00E5g\u00E8\""}

$ curl -d "$msg" https://api.github.com/markdown
<p>'çömmít' "mêssågè"</p>

Curl (pipe)

$ git log -n 1 --pretty=format:'%s' | \
  xidel -se 'serialize({"text":$raw},{"method":"json","encoding":"us-ascii"})' | \
  curl -d@- https://api.github.com/markdown
<p>'çömmít' "mêssågè"</p>

Actually, there's no need for curl if you're already using xidel.

Xidel (pipe)

$ git log -n 1 --pretty=format:'%s' | \
  xidel -s \
  -d '{serialize({"text":read()},{"method":"json","encoding":"us-ascii"})}' \
  "https://api.github.com/markdown" \
  -e '$raw'
<p>'çömmít' "mêssågè"</p>

Xidel (pipe, in-query)

$ git log -n 1 --pretty=format:'%s' | \
  xidel -se '
    x:request({
      "post":serialize(
        {"text":$raw},
        {"method":"json","encoding":"us-ascii"}
      ),
      "url":"https://api.github.com/markdown"
    })/raw
  '
<p>'çömmít' "mêssågè"</p>

Xidel (all in-query)

$ xidel -se '
  x:request({
    "post":serialize(
      {"text":system("git log -n 1 --pretty=format:'\''%s'\''")},
      {"method":"json","encoding":"us-ascii"}
    ),
    "url":"https://api.github.com/markdown"
  })/raw
'
<p>'çömmít' "mêssågè"</p>
Janitress answered 27/3, 2020 at 17:10 Comment(2)
In the .sh file, I am passing a path like /notebooks/folder/test.py and it automatically converts it to C:/Program Files/Git/notebooks/folder/test.py, don't know why :( - Unrepair
@Unrepair This is too little information to say anything useful about it. Please start a new question, or see videlibri.sourceforge.net/xidel.html#contact. - Janitress

This is an escaping solution using Perl that escapes backslash (\), double-quote (") and control characters U+0000 to U+001F:

$ echo -ne "Hello, 🌵\n\tBye" | \
  perl -pe 's/(\\(\\\\)*)/$1$1/g; s/(?!\\)(["\x00-\x1f])/sprintf("\\u%04x",ord($1))/eg;'
Hello, 🌵\u000a\u0009Bye
Metabolic answered 15/8, 2017 at 0:29 Comment(0)

I struggled with the same problem. I was trying to add a variable on the payload of cURL in bash and it kept returning as invalid_JSON. After trying a LOT of escaping tricks, I reached a simple method that fixed my issue. The answer was all in the single and double quotes:

 curl --location --request POST 'https://hooks.slack.com/services/test-slack-hook' \
--header 'Content-Type: application/json' \
--data-raw '{"text":'"$data"'}'

Maybe it comes in handy for someone!
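Why this only works for some data becomes clearer when looking at what the shell actually builds: the '...' and "..." pieces are concatenated into one word, so $data is pasted into the JSON verbatim and nothing is escaped. A minimal sketch, where data is assumed to already hold a valid JSON value:

```shell
#!/usr/bin/env bash
# '{"text":' + "$data" + '}' are glued together into a single shell word.
# Valid JSON only if $data is itself a JSON value, here a string literal
# that carries its own double quotes.
data='"hello world"'
body='{"text":'"$data"'}'
printf '%s\n' "$body"   # {"text":"hello world"}

# With raw text instead, the result is not JSON:
raw='say "hi"'
printf '%s\n' '{"text":'"$raw"'}'   # {"text":say "hi"}
```

Combining this quoting pattern with a real escaping step (jq, or one of the escape functions in other answers) is what makes it safe for arbitrary input.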

Overshine answered 28/2, 2020 at 12:13 Comment(1)
This only works with some data, not all possible values. If your data contains newlines, they need to be changed to \n sequences before being substituted into JSON; if the data contains double quotes, they need backslashes before them; etc. Tools such as jq or xidel, suggested in other answers, automate doing that for you. - Glorification

Escape with Bash only

Below is my Bash function that will escape JSON.

function escape_json_string() {
  local input=$1 i char escaped code
  for ((i = 0; i < ${#input}; i++)); do
    char="${input:i:1}"
    escaped="${char}"
    case "${char}" in
      $'"' ) escaped="\\\"";;
      $'\\') escaped="\\\\";;
      *)
        # Numeric code of the character; comparing numbers avoids the
        # locale-dependent collation that [[ a < b ]] uses for strings.
        printf -v code '%d' "'${char}"
        if (( code < 0x20 )); then
          case "${char}" in
            $'\b') escaped="\\b";;
            $'\f') escaped="\\f";;
            $'\n') escaped="\\n";;
            $'\r') escaped="\\r";;
            $'\t') escaped="\\t";;
            *) printf -v escaped '\\u%04X' "${code}";;
          esac
        fi;;
    esac
    printf '%s' "${escaped}"
  done
}

Example:

$ escape_json_string 'File: a\b\c
"my file"'
File: a\\b\\c\n\"my file\"

Explanation

It iterates over the characters and outputs them, escaping only ", \ and the control characters (< 0x20), as per the JSON specification. It uses ANSI-C Quoting to specify the characters in C notation. Some control characters have special escaping. The fallback to hex values uses printf and a parameter with a leading ' to allow interpreting the character as a number, as per the manual:

Arguments to non-string format specifiers are treated as C language constants, except that a leading plus or minus sign is allowed, and if the leading character is a single or double quote, the value is the ASCII value of the following character.

Yl answered 2/2 at 22:45 Comment(0)

I had the same idea: sending a message containing the commit message after each commit. At first I tried something similar to the question's author, but later I found a better and simpler solution.

I just created a PHP file that sends the message, and I call it with wget in hooks/post-receive:

wget -qO - "http://localhost/git.php" 

In git.php:

<?php
chdir("/opt/git/project.git");
$git_log = exec("git log -n 1 --format=oneline | grep -o ' .\+'");

Then build the JSON (for example with json_encode) and send it with PHP's cURL functions.

Kendall answered 18/3, 2017 at 14:5 Comment(0)

Using Node.js

If you have node.js installed, you can use the -p (print) option of node to help with this:

MSG=$(MSG=$MSG node -p "JSON.stringify(process.env.MSG)")
curl -i -X POST \
  -H 'Accept: application/text' \
  -H 'Content-type: application/json' \
  -d "{'payload': {'message': $MSG}}" \
  'https://example.com'

This basically uses the JSON.stringify() function of Node.js to escape $MSG.

(As others have noted the single quotes in the value passed to -d make it invalid JSON, but I left that as-is. But those could be replaced with double-quotes escaped with backslash.)
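Following that note, a sketch with the single quotes replaced by backslash-escaped double quotes. To keep it runnable without Node.js, MSG is hard-coded to the kind of value JSON.stringify would return (an assumption standing in for the node -p call), and the curl call is left as a comment:

```shell
#!/usr/bin/env bash
# Stand-in for: MSG=$(MSG=$MSG node -p "JSON.stringify(process.env.MSG)")
# JSON.stringify output already includes the surrounding double quotes.
MSG='"Calendar can'\''t go back past today"'

# The payload's own double quotes are escaped for the shell with \".
body="{\"payload\": {\"message\": $MSG}}"
printf '%s\n' "$body"

# curl -i -X POST \
#   -H 'Accept: application/text' \
#   -H 'Content-type: application/json' \
#   -d "$body" \
#   'https://example.com'
```

This prints {"payload": {"message": "Calendar can't go back past today"}}, which is valid JSON.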

Luisluisa answered 21/9, 2023 at 13:37 Comment(0)
