Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?
Asked Answered
G

5

140

In MySQL 5.7 a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits

Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.

Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.

Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with “functional indexes” on virtual columns.

Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column): SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;

WOW ! they include some great features. Now it is easier to manipulate data. Now it is possible to store more complex data in column. So MySQL is now flavored with NoSQL.

Now I can imagine a query for JSON data something like

SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN 
( 
SELECT JSON_EXTRACT(data,"$.inverted") 
FROM t1 | {"series": 3, "inverted": 8} 
WHERE JSON_EXTRACT(data,"$.inverted")<4 );

So can I store huge small relations in few json colum? Is it good? Does it break normalization. If this is possible then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature. Pros and cons of MySQL JSON data type.

Gormless answered 11/11, 2015 at 22:14 Comment(9)
oh please don't say what I think you are saying. Here, read this. Yours is yet another variant on a bad idea.Moazami
@Moazami You gave a big answer. But its not my question. I just want to know that if we write a query for json data then we may skip sql rules . beacuse we dont need many tablesGormless
you said Now it is possible to store more complex data in column. Be carefulMoazami
yea I think it only for json table. They create many function and feature for retrieve data from jsonGormless
I think it is great. Be careful turning mysql into a document database. We know what is better for that :)Moazami
Json data type support index and It has smart size : 64K & 4G . So what the problem if I want store 2000 data and add 5 nested label instead of 5 table with relation ?Gormless
I guess time will tell. If and when we find that it truly can be nosql gibberish and the schema-less (formerly schema) peeps find performance is a dog. We will see. Turning mysql into hbase :)Moazami
"I really want to know more about this feature." and "Pros and cons of MySQL JSON data type." are not questions, and if rephrased as questions are too broad. "So I never think of a complex schema structure and foreign keys in MySQL. I store complex relations using only a few tables." is self-contradictory since JSON is not relations & FKs. An explanation of "is this good" is just an introduction to the relational model, so again this is too broad. Work through some examples, make your own list of pros & cons with references, and ask where you went wrong.Schaller
@NeilLunn Tag nosql seems reasonable here. Even though the question most specifically involves primitive type vs JSON type within MySQL, even the question itself acknowledges that the core issue is relational vs non-relational representation, which it characterizes in terms of SQL vs NoSQL. A correct answer must address the special properties of the relational model vs others.Schaller
B
84
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...

Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.

The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.

By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.

MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.

ALTER TABLE t1
  ADD COLUMN series AS (JSON_EXTRACT(data, '$.series')),
  ADD INDEX (series);

Then if you query the virtual column, it can use the index and avoid the table-scan.

SELECT * FROM t1
WHERE series IN ...

This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.

But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:

SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>

I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.

When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.

I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.

There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.

MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people against migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.

You should choose to use JSON when JSON makes your queries more efficient.

Don't choose a technology just because it's new, or for the sake of fashion.


Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))

SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...

Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.

Bumbailiff answered 21/8, 2018 at 18:45 Comment(8)
Well worth following the link to the slidesKubiak
Good point the 2 technologies are both good in their own means we decide which will fit our needs and what gives us more advantage in terms of security and performance.Wolfson
The crux of the problem is that ALTER TABLE is still needed to make use of an index on a generated column for every new key in the JSON. Glad to see it being pointed out.Nodus
Only if you need to add a virtual column and/or an index. If you treat the JSON data as a "black box" and don't try to do any queries that search or sort on sub-fields within JSON, then you don't need to do that. That's why I recommend to avoid referencing JSON in JOIN, WHERE or other clauses. Just fetch the JSON column in the select-list.Bumbailiff
The link to the slides is broken, @BillKarwin.Iridissa
The link to the slides is not broken.Bumbailiff
@BillKarwin re the last point on your post, on MySQL 5.7.33, I can confirm this now works as per the official instructions! (see the bottom of the page) :)Cretonne
@PaulBenn, That's great to hear! I can understand why it might be quite tricky to implement, but it's a shame it took them six years since the MySQL 5.7 GA to do it.Bumbailiff
M
46

The following from MySQL 5.7 brings sexy back with JSON sounds good to me:

Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:

Data validation. JSON documents will be automatically validated and invalid documents will produce an error. Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.

...

Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.

Note the language about document validation as it is an important factor. I guess a battery of tests need to be performed for comparisons of the two approaches. Those two being:

  1. Mysql with JSON datatypes
  2. Mysql without

The net has but shallow slideshares as of now on the topic of mysql / json / performance from what I am seeing.

Perhaps your post can be a hub for it. Or perhaps performance is an after thought, not sure, and you are just excited to not create a bunch of tables.

Moazami answered 11/11, 2015 at 23:11 Comment(7)
One con; JSON data type is not supported by Mysql Memory tables, like the data types, TEXT & BLOB. This means if a temporary table is required, it will create a disk based table not memory. Some cases when a temporary table is used outlined here: dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.htmlZygophyllaceous
@raizmedia Could you please elaborate on why a disk based table is a problem vs memory (based table I guess) ?Insomnolence
@Insomnolence Probably due to speed limitations.Bellda
@LittleHelper you can avoid it if you use PCI 4x 40 Gb/s M.2 slot and insert 40 Gb/s supported drive. This works as fast as memmory. You can alos apply a special format to that drive that is used to format memmory.Rosettarosette
@SergeyRomanov, [citation required] Have you benchmarked that drive versus RAM?Bumbailiff
@BillKarwin it is obviouse that it wil work clower because of BUS to CPU is different. But it will work trity much fast.Rosettarosette
@SergeyRomanov, It's also important to think of latency as a constraint on performance, instead of only transfer speed. Check out colin-scott.github.io/personal_website/research/… One RAM access is 100 nanoseconds. One random SSD access is 16,000 nanoseconds.Bumbailiff
A
12

From my experience, JSON implementation at least in MySQL 5.7 is not very useful due to its poor performance. Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower with MySQL that with Python or PHP. Let's imagine very simple JSON:

{ "name": "value" }

Let's suppose we have to convert it to something like that:

{ "name": "value", "newName": "value" }

You can create simple script with Python or PHP that will select all rows and update them one by one. You are not forced to make one huge transaction for it, so other applications will be able to use the table in parallel. Of course, you can also make one huge transaction if you want, so you'll get guarantee that MySQL will perform "all or nothing", but other applications will most probably not be able to use database during transaction execution.

I have 40 millions rows table, and Python script updates it in 3-4 hours.

Now we have MySQL JSON, so we don't need Python or PHP anymore, we can do something like that:

UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, "newName", JSON_EXTRACT(`JsonColumn`, "name"))

It looks simple and excellent. However, its speed is 10-20 times slower than Python version, and it is single transaction, so other applications cannot modify the table data in parallel.

So, if we want to just duplicate JSON key in 40 millions rows table, we need to not use table at all during 30-40 hours. It makes no sense.

About reading data, from my experience, direct access to JSON field via JSON_EXTRACT in WHERE is also extremely slow (much slower that TEXT with LIKE on not indexed column). Virtual generated columns perform much faster, however, if we know our data structure beforehand, we don't need JSON, we can use traditional columns instead. When we use JSON where it is really useful, i.e. when data structure is unknown or changes often (for example, custom plugin settings), virtual column creation on regular basis for any possible new columns doesn't look like good idea.

Python and PHP make JSON validation like a charm, so it is questionable do we need JSON validation on MySQL side at all. Why not also validate XML, Microsoft Office documents or check spelling? ;)

Ashcan answered 30/3, 2018 at 15:12 Comment(0)
U
11

I got into this problem recently, and I sum up the following experiences:

1, There isn't a way to solve all questions. 2, You should use the JSON properly.

One case:

I have a table named: CustomField, and it must two columns: name, fields. name is a localized string, it content should like:

{
  "en":"this is English name",
  "zh":"this is Chinese name"
   ...(other languages)
}

And fields should be like this:

[
  {
    "filed1":"value",
    "filed2":"value"
    ...
  },
  {
    "filed1":"value",
    "filed2":"value"
    ...
  }
  ...
]

As you can see, both the name and the fields can be saved as JSON, and it works!

However, if I use the name to search this table very frequently, what should I do? Use the JSON_CONTAINS,JSON_EXTRACT...? Obviously, it's not a good idea to save it as JSON anymore, we should save it to an independent table:CustomFieldName.

From the above case, I think you should keep these ideas in mind:

  1. Why MYSQL support JSON?
  2. Why you want to use JSON? Did your business logic just need this? Or there is something else?
  3. Never be lazy

Thanks

Unknot answered 1/12, 2016 at 2:12 Comment(1)
You might be interested in using a VIRTUAL column. percona.com/blog/2016/03/07/…Bocock
Y
7

Strongly disagree with some of the things said in other answers (which, to be fair, was a few years ago).

We have very carefully started to adopt JSON fields with a healthy skepticism. Over time, we've been adding this more.

This generally describes the situation we are in:

  • Like 99% of applications out there, we are not doing things at a massive scale. We work with many different applications and databases, the majority of these are capable of running on modest hardware.
  • We have processes and know-how in place to make changes if performance does become a problem.
  • We have a general idea of which tables are going to be large and think carefully about how we optimize queries for them.
  • We also know in which cases this is not really needed.
  • We're pretty good at data validation and static typing at the application layer.

Lastly,

When we use JSON for storing complex data, that data is never referenced directly by other tables. We also tend to never need to use them in where clauses in hot paths.

So with all this in mind, using a little JSON field instead of 1 or more tables vastly reduces the complexity of queries and data model. Removing this complexity makes it easier to write certain queries, makes our code simpler and just generally saves time.

Complexity and performance is something that needs to be carefully balanced. JSON fields should not be blindly applied, but for the cases where this works, it's fantastic.

'JSON fields don't perform well' is a valid reason to not use JSON fields, if you are at a place where that performance difference matters.

One specific example is that we have a table where we store settings for video transcoding. The settings table has 1 'profile' per row, and the settings themselves have a maximum nesting level of 4 (arrays and objects).

Despite this being a large database overall, there are only a few hundreds of these records in the database. Suggesting to split this into 5 tables would yield no benefit and lots of pain.

This is an extreme example, but we have plenty of others (with more rows) where the decision to use JSON fields is a few years in the past, and hasn't yet caused an issue.

Last point: it is now possible to directly index on JSON fields.

Yogini answered 14/2, 2021 at 7:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.