How to serialize JSON with json4s with UTF-8 characters?

I have a really simple example:

import org.json4s._
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL._

val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))

println(pretty(render(json)))

What I get out of that is:

{
  "english":"serialization",
  "japanese":"\u30b7\u30ea\u30a2\u30e9\u30a4\u30bc\u30fc\u30b7\u30e7\u30f3"
}

What I want is this (perfectly valid AFAIK) JSON:

{
  "english":"serialization",
  "japanese":"シリアライゼーション"
}

I can't find it now, but I think I've read somewhere that JSON only requires a few characters to be escaped inside strings: the quotation mark, the backslash, and the control characters U+0000 through U+001F.
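For reference, RFC 7159 (section 7) says a JSON string may contain any Unicode character literally except the quotation mark, the reverse solidus (backslash) and the control characters U+0000 through U+001F, which must be escaped. A minimal sketch of a quoting function that follows only that rule, using the standard library alone; `quoteMinimal` is a hypothetical helper for illustration, not a json4s API:

```scala
object MinimalJsonQuote {
  // Quote a string as a JSON string literal, escaping only what RFC 7159
  // requires: the quotation mark, the backslash and U+0000..U+001F.
  // Everything else, including Japanese text, is copied through unchanged.
  def quoteMinimal(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      // other control characters: backslash, 'u', four lowercase hex digits
      case c if c < ' ' => sb.append("\\").append("u").append(f"${c.toInt}%04x")
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  def main(args: Array[String]): Unit = {
    println(quoteMinimal("シリアライゼーション"))
    println(quoteMinimal("say \"hi\"\tnow"))
  }
}
```

The first call prints the Japanese word between quotes with no escapes at all; the second escapes only the two quotation marks and the tab.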

Looking at the code for render, it appears that Strings always get this extra escaping: every non-ASCII character is written as a \uXXXX escape.
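The escaped output above can be reproduced in a few lines of plain Scala: every UTF-16 code unit outside printable ASCII is written as a backslash, the letter u, and four lowercase hex digits. This is a sketch of the observable behavior only, not json4s's actual implementation; `escapeNonAscii` is a hypothetical name, and quote/backslash escaping is left out to keep it short.

```scala
object EscapeNonAscii {
  // Write every UTF-16 code unit outside printable ASCII (0x20..0x7E) as a
  // six-character escape: backslash, 'u', four lowercase hex digits.
  // The escape is assembled piecewise so no backslash-u sequence appears in the source.
  def escapeNonAscii(s: String): String =
    s.map { c =>
      if (c >= 0x20 && c <= 0x7e) c.toString
      else "\\" + "u" + f"${c.toInt}%04x"
    }.mkString

  def main(args: Array[String]): Unit =
    println(escapeNonAscii("シリアライゼーション"))
}
```

Running `main` prints the same text as the `"japanese"` value in the output shown in the question.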

Does anyone know how I can get valid JSON without escaping all the non-ASCII characters? This seems very similar to: Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?


Update: It turns out this is an open issue in json4s. Its pending PR #327 was closed in favor of PR #339, which in turn was merged into the 3.4 release branch in a commit on Feb 13, 2016.

Sewing answered 3/2, 2016 at 4:20
I do not know about json4s, but RFC 7159 says that UTF-8 is the default encoding for JSON. So in theory there is no need to escape Japanese characters; escaping is only an option. You just need a library that leaves them unescaped, or one that can be configured to. - Corncrib
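To put numbers on this comment: leaving the characters unescaped is not only valid but also more compact. A standard-library-only check on the question's Japanese string:

```scala
import java.nio.charset.StandardCharsets

object Utf8Size {
  val japanese = "シリアライゼーション"

  // All ten characters are in the katakana block (U+30A0..U+30FF),
  // so each one takes 3 bytes when the JSON text is encoded as UTF-8.
  val utf8Bytes: Int = japanese.getBytes(StandardCharsets.UTF_8).length

  // Each escape in the question's output is 6 ASCII characters
  // (backslash, 'u', four hex digits), i.e. 6 bytes per character here.
  val escapedBytes: Int = japanese.length * 6

  def main(args: Array[String]): Unit =
    println(s"raw UTF-8: $utf8Bytes bytes, escaped: $escapedBytes bytes")
}
```

For this string the raw UTF-8 form is 30 bytes and the escaped form is 60 bytes.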

@0__, it is not clear what answer you want to get with your bounty. The bug mentioned in the original question has already been fixed, so you can now choose whether non-ASCII characters are escaped or not. You just need to build against a current version, e.g. with a build.sbt like this:

name := "SO_ScalaJson4sUnicodeChars"
version := "1.0"
scalaVersion := "2.12.1"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.5.1"

As @kriegaex mentioned in his comment, UTF-8 is the default encoding for JSON according to RFC 7159, so escaping is not strictly necessary. This is why json4s does not escape non-ASCII characters by default, just as the OP requested:

package so

import org.json4s.JsonDSL._
import org.json4s._
import org.json4s.native.JsonMethods._

object SOTest extends App {
  val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))
  println(pretty(render(json)))
}

Console log:

{
  "english":"serialization",
  "japanese":"シリアライゼーション"
}
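Because the default output now contains raw non-ASCII characters, the bytes that leave the program must actually be UTF-8, and whoever reads them must decode them as UTF-8. A JDK-only sketch that writes and re-reads the console output above with an explicit charset; `jsonText` is a hand-copied stand-in for `pretty(render(json))`, and the temp-file location is an arbitrary choice for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object WriteUtf8Json {
  // Stand-in for the pretty(render(json)) output shown above.
  val jsonText: String =
    "{\n  \"english\":\"serialization\",\n  \"japanese\":\"シリアライゼーション\"\n}"

  // Encode and decode with an explicit charset instead of the platform
  // default, which before JDK 18 depends on the OS locale.
  def roundTrip(path: Path): String = {
    Files.write(path, jsonText.getBytes(UTF_8))
    new String(Files.readAllBytes(path), UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("out", ".json")
    println(roundTrip(path) == jsonText)
  }
}
```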

However, if for some compatibility reason you need the output to be escaped, json4s supports that as well. If you define your own implicit customJsonFormats like this, you get escaped output:

package so

import org.json4s.JsonDSL._
import org.json4s._
import org.json4s.native.JsonMethods._

object SOTest extends App {
  val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))
  // Formats that make render escape every non-ASCII character
  implicit val customJsonFormats: Formats = new DefaultFormats {
    override def alwaysEscapeUnicode: Boolean = true
  }
  println(pretty(render(json)))
}

Console log:

{
  "english":"serialization",
  "japanese":"\u30b7\u30ea\u30a2\u30e9\u30a4\u30bc\u30fc\u30b7\u30e7\u30f3"
}
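One example of such a compatibility reason is a consumer that only handles ASCII. The escaped form is pure ASCII, so it survives that channel, while the raw form does not. A JDK-only sketch; encoding as US-ASCII is a hypothetical stand-in for such a consumer, and `survives` is an illustrative helper name.

```scala
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets.{US_ASCII, UTF_8}

object AsciiSafety {
  val raw = "シリアライゼーション"

  // The escaped "japanese" value from the console log above, assembled
  // from its hex digits so the source contains no backslash-u sequences.
  val escaped: String =
    Seq("30b7", "30ea", "30a2", "30e9", "30a4", "30bc", "30fc", "30b7", "30e7", "30f3")
      .map("\\" + "u" + _).mkString

  // Encode with a charset and decode again; String.getBytes replaces
  // unmappable characters with the charset's replacement ('?' for US-ASCII).
  def survives(s: String, cs: Charset): Boolean =
    new String(s.getBytes(cs), cs) == s

  def main(args: Array[String]): Unit = {
    println(survives(escaped, US_ASCII))
    println(survives(raw, US_ASCII))
    println(survives(raw, UTF_8))
  }
}
```

This prints `true`, `false`, `true`: the raw katakana only survive a UTF-8 channel.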

Cultivator answered 25/3, 2017 at 23:28
Thanks. I thought I had the same problem as the OP, but it turned out the issue was actually Dispatch falling back to ISO Latin instead of UTF-8 when the Content-Type didn't specify an encoding. - Niggard
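The failure described in this comment can be reproduced without Dispatch or json4s: take the UTF-8 bytes of the Japanese string and decode them as ISO-8859-1 (Latin-1). A JDK-only sketch; in a real HTTP client the fix is to decode the body with the charset it was actually sent in, and how to configure that is client-specific and not shown here.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object CharsetMismatch {
  val original = "シリアライゼーション"
  // The bytes a server would send for this text when it uses UTF-8.
  val body: Array[Byte] = original.getBytes(UTF_8)

  // Wrong: Latin-1 maps each of the 30 bytes to its own character (mojibake).
  val wrong: String = new String(body, ISO_8859_1)
  // Right: decode with the charset the bytes were actually written in.
  val right: String = new String(body, UTF_8)

  def main(args: Array[String]): Unit = {
    println(wrong == original)
    println(wrong.length)
    println(right == original)
  }
}
```

This prints `false`, `30`, `true`.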
