Explanation of JSONB introduced by PostgreSQL
Asked Answered
D

9

503

PostgreSQL just introduced JSONB in version 9.4, and it's already trending on hacker news. How is it different from Hstore and JSON previously present in PostgreSQL?

What are its advantages and limitations and when should someone consider using it?

Deservedly answered 26/3, 2014 at 7:26 Comment(4)
From PGCon2014: youtube.com/…Burrus
@CraigRinger url is not precise enough, now, 1 year later it doesn't even point close enough to JSONB-related content.Hamnet
@Hamnet I thought I linked to the specific post. How frustrating.Piggin
It does point to the specific video.Nonentity
G
632

First, hstore is a contrib module, which only allows you to store key => value pairs, where keys and values can only be texts (however values can be sql NULLs too).

Both json & jsonb allows you to store a valid JSON value (defined in its spec).

F.ex. these are valid JSON representations: null, true, [1,false,"string",{"foo":"bar"}], {"foo":"bar","baz":[null]} - hstore is just a little subset compared to what JSON is capable (but if you only need this subset, it's fine).

The only difference between json & jsonb is their storage:

  • json is stored in its plain text format, while
  • jsonb is stored in some binary representation

There are 3 major consequences of this:

  • jsonb usually takes more disk space to store than json (sometimes not)
  • jsonb takes more time to build from its input representation than json
  • json operations take significantly more time than jsonb (& parsing also needs to be done each time you do some operation at a json typed value)

When jsonb will be available with a stable release, there will be two major use cases, when you can easily select between them:

  1. If you only work with the JSON representation in your application, PostgreSQL is only used to store & retrieve this representation, you should use json.
  2. If you do a lot of operations on the JSON value in PostgreSQL, or use indexing on some JSON field, you should use jsonb.
Galiot answered 7/4, 2014 at 11:6 Comment(8)
hi, since it has binary representation, why jsonb doesn't support this? UPDATE test SET data->'a' = 123 WHERE id = 1; from CREATE TABLE test(id SERIAL PRIMARY KEY, data JSONB);Celesta
Kokizzu, it's possible in 9.5. wiki.postgresql.org/wiki/…Threewheeler
Just to add, one of the reasons you might also use json over jsonb is if for legacy reasons your code consuming your json is dependent on the ordering of the json fields and they can't be reordered.Holozoic
what about text vs json in this case? any significant advantage of using json, compared to the performance drawbacks?Perianth
As, for the legacy reasons: in JSON, there is no semantical difference, if an object's (table, map, hash, whatever it is call in the host language) key-value pairs are ordered differently. If you rely on that, you are actually using something different than JSON. -- For text vs. json: the latter comes with JSON validation, so upon invalid JSON, it will only fail upon inserting, instead of every time your application reads it (because it gets an invalid representation). Also, you can safely cast the latter to jsonb within the database.Galiot
This is a great post on explaining the implementation details for JSONB (pgeoghegan.blogspot.com/2014/03/what-i-think-of-jsonb.html)Corset
@Perianth It is an advantage in sense of having a dedicated query syntax so you can filter the records and avoid parsing on yourself. When you use a simple text column, you have to load all objects into the memory so you can parse it (or you could write a procedure).Neoclassicism
Side note: SQLite says their jsonb is usually 5-10% smaller than json. Different encoding from PostgreSQL.Dirty
G
189

Peeyush:

The short answer is:

  • If you are doing a lot of JSON manipulation inside PostgreSQL, such as sorting, slicing, splicing, etc., you should use JSONB for speed reasons.
  • If you need indexed lookups for arbitrary key searches on JSON, then you should use JSONB.
  • If you are doing neither of the above, you should probably use JSON.
  • If you need to preserve key ordering, whitespace, and duplicate keys, you should use JSON.

For a longer answer, you'll need to wait for me to do a full "HowTo" writeup closer to the 9.4 release.

Gossamer answered 26/3, 2014 at 17:47 Comment(0)
T
131

A simple explanation of the difference between json and jsonb (original image by PostgresProfessional):

SELECT '{"c":0,   "a":2,"a":1}'::json, '{"c":0,   "a":2,"a":1}'::jsonb;

          json          |        jsonb 
------------------------+--------------------- 
 {"c":0,   "a":2,"a":1} | {"a": 1, "c": 0} 
(1 row)
  • json: textual storage «as is»
  • jsonb: no whitespaces
  • jsonb: no duplicate keys, last key wins
  • jsonb: keys are sorted

More in speech video and slide show presentation by jsonb developers. Also they introduced JsQuery, a pg.extension that provides the powerful jsonb query language.

Threewheeler answered 16/11, 2015 at 9:8 Comment(0)
T
67
  • hstore is more of a "wide column" storage type, it is a flat (non-nested) dictionary of key-value pairs, always stored in a reasonably efficient binary format (a hash table, hence the name).
  • json stores JSON documents as text, performing validation when the documents are stored, and parsing them on output if needed (i.e. accessing individual fields); it should support the entire JSON spec. Since the entire JSON text is stored, its formatting is preserved.
  • jsonb takes shortcuts for performance reasons: JSON data is parsed on input and stored in binary format, key orderings in dictionaries are not maintained, and neither are duplicate keys. Accessing individual elements in the JSONB field is fast as it doesn't require parsing the JSON text all the time. On output, JSON data is reconstructed and initial formatting is lost.

IMO, there is no significant reason for not using jsonb once it is available, if you are working with machine-readable data.

Themistocles answered 4/4, 2014 at 11:19 Comment(0)
L
62

JSONB is a "better" version of JSON.

Let's look at an example:

SELECT '{"c":0,   "a":2,"a":1}'::json, '{"c":0,   "a":2,"a":1}'::jsonb;
          json          |        jsonb 
------------------------+--------------------- 
 {"c":0,   "a":2,"a":1} | {"a": 1, "c": 0} 
(1 row)
  1. JSON stores white space, and that is why we can see spaces when key "a" is stored, while JSONB does not.
  2. JSON stores all the values of a key. This is the reason you can see multiple values (2 and 1) against the key "a" , while JSONB only "stores" the last value.
  3. JSON maintains the order in which elements are inserted, while JSONB maintains the "sorted" order.
  4. JSONB objects are stored as a decompressed binary as opposed to "raw data" in JSON, where no reparsing of data is required during retrieval.
  5. JSONB also supports indexing, which can be a significant advantage.

In general, one should prefer JSONB, unless there are specialized needs, such as legacy assumptions about ordering of object keys.

Links answered 14/4, 2018 at 16:18 Comment(0)
E
35

Regarding the differences between json and jsonb datatypes, it worth mentioning the official explanation:

PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

Source: https://www.postgresql.org/docs/current/datatype-json.html

Exigent answered 5/2, 2020 at 17:45 Comment(0)
P
21

I was at the PostgresOpen today, and benchmarks are way faster than MongoDB. I believe it was around 500% faster for selects. Pretty much everything was faster, at least by at 200% when contrasted with MongoDB. Then one exception right now is an update which requires completely rewriting the entire JSON column - something MongoDB handles better.

The gin indexing on JSONB sounds amazing.

Also PostgreSQL will persist types of JSONB internally and basically match this with types such as numeric, text, boolean, etc.

Joins will also be possible using JSONB.

Add PLv8 for stored procedures and this will basically be a dream come true for Node.js developers.

Being it's stored as binary, JSONB will also strip all white space, change the ordering of properties and remove duplicate properties using the last occurrence of the property.

Besides the index when querying against a JSONB column contrasted to a JSON column PostgreSQL doesn't have to actually run the functionality to convert the text to JSON on every row which will likely save a vast amount of time alone.

Phyllode answered 18/9, 2014 at 2:31 Comment(0)
T
10

Another important difference, that wasn't mentioned in any answer above, is that there is no equality operator for json type, but there is one for jsonb.

This means that you can't use DISTINCT keyword when selecting this json-type and/or other fields from a table (you can use DISTINCT ON instead, but it's not always possible because of cases like this).

Toomey answered 19/6, 2018 at 2:13 Comment(0)
U
8

As far as I can tell,

  • hstore as it currently exists (in PostgreSQL 9.3) does not allow for nesting other objects and arrays as the values of its key/value pairs. However, a future hstore patch will allow for nesting. This patch will not be in the 9.4 release and may not be included any time soon.

  • json as it currently exists does allow for nesting, but it is text-based and does not allow for indexing, thus it is "slow"

  • jsonb that will be released with 9.4 will have the current nesting capabilities of json, as well as the GIN/GIST indexing of hstore, so it will be fast

People working on PostgreSQL 9.4 seem to be saying that the new, fast jsonb type will appeal to people who would have chosen to use a NoSQL data store like MongoDB, but it can now combine a relational database with query-able unstructured data under one roof

Why HStore2/jsonb is the most important patch of 9.4

Benchmarks of PostgreSQL 9.4 jsonb seem to be on par with or in some cases faster than MongoDB.

http://texture.io/alphabetum/postgresql-incl-hstore-vs-mongodb

Unpaid answered 22/4, 2014 at 0:3 Comment(1)
The last link is broken: "DisallowedHost at /alphabetum/postgresql-incl-hstore-vs-mongodb"Methionine

© 2022 - 2024 — McMap. All rights reserved.