MySQL UDF json_extract in WHERE clause: how to improve performance

How can I efficiently search JSON data in a MySQL database?

I installed the json_extract UDF from labs.mysql.com and played around with a test table with 2,750,000 rows.

CREATE TABLE `testdb`.`JSON_TEST_TABLE` (
   `AUTO_ID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `OP_ID` INT NULL,
   `JSON` LONGTEXT NULL,
PRIMARY KEY (`AUTO_ID`)) $$

An example JSON field would look like so:

{"ts": "2014-10-30 15:08:56 (9400.223725848107) ", "operation": "1846922"}

I found that putting json_extract into the select list has virtually no performance impact, i.e. the following two selects perform almost identically:

SELECT * FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10; 

However, as soon as I put a json_extract expression into the WHERE clause, the execution time increases by a factor of 10 or more (I went from 2.5 to 30 seconds):

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;

At this point I think I need to extract everything I want to search on into separate columns at insert time, and if I really have to search inside the JSON data, I first need to narrow down the rows to be searched by other criteria. But I would like to make sure that I am not missing anything obvious. For example, can I somehow index the JSON fields? Or is my select statement written inefficiently?

Flexor answered 31/10, 2014 at 8:10 Comment(1)
I think if you do an EXPLAIN on your query, you will see that MySQL does a full table scan, simply because your query is on a term that is not indexed. - Cosby

In fact, during the execution of

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;

json_extract() will be executed for at most 10 rows.

During this one

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;

json_extract() has to be executed for every row that is scanned, and only then is the result limited to 10 records, hence the speed loss. Indexing won't help either, since the processing time is spent in the external UDF code rather than in MySQL itself. IMHO, the best bet in this case would be an optimized UDF.
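
The difference in call counts can be reproduced in miniature. The following is only a sketch under assumptions, not the original setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a 1,000-row table instead of 2,750,000 rows, and a hypothetical Python function registered under the name json_extract in place of the labs.mysql.com UDF.

```python
import json
import sqlite3

calls = 0  # number of times the stand-in UDF has been invoked


def json_extract(doc, key):
    """Hypothetical stand-in for the UDF: parse the document, return one field."""
    global calls
    calls += 1
    return json.loads(doc)[key]


conn = sqlite3.connect(":memory:")
conn.create_function("json_extract", 2, json_extract)
conn.execute(
    "CREATE TABLE JSON_TEST_TABLE (AUTO_ID INTEGER PRIMARY KEY, OP_ID INTEGER, JSON TEXT)"
)
conn.executemany(
    "INSERT INTO JSON_TEST_TABLE (OP_ID, JSON) VALUES (?, ?)",
    [
        (i % 100, json.dumps({"ts": "2014-10-30 15:08:56", "operation": str(i)}))
        for i in range(1000)
    ],
)


def count_calls(sql, params):
    """Run a query to completion and return (rows, UDF calls it triggered)."""
    global calls
    calls = 0
    rows = conn.execute(sql, params).fetchall()
    return rows, calls


# UDF only in the select list: evaluated once per returned row.
rows_select, calls_select = count_calls(
    "SELECT OP_ID, json_extract(JSON, 'operation') "
    "FROM JSON_TEST_TABLE WHERE OP_ID = ? LIMIT 10",
    (7,),
)

# UDF in the WHERE clause: evaluated for every row that has to be inspected.
rows_where, calls_where = count_calls(
    "SELECT OP_ID FROM JSON_TEST_TABLE "
    "WHERE json_extract(JSON, 'operation') = ? LIMIT 10",
    ("999",),
)

print("select list:", calls_select, "calls; WHERE clause:", calls_where, "calls")
```

With this test data the first query only calls the function for the 10 rows it returns, while the second has to call it for all 1,000 rows to find its single match. On the real table the same effect is multiplied by the row count and by the cost of parsing each document.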

Mariande answered 28/6, 2017 at 19:31 Comment(1)
Thanks for the clarification but... this question is almost 3 years old! ;-) I ended up dumping the data into an Elasticsearch instance... Problem solved. - Flexor

You can try this: http://www.percona.com/blog/2015/02/17/indexing-json-documents-for-efficient-mysql-queries-over-json-data/

It uses Flexviews materialized views for MySQL to extract data from the JSON with JSON_EXTRACT into another table, which can be indexed.
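
The underlying idea, copying the searched field into a separate table that can be indexed, can be sketched without Flexviews. This is a hand-rolled illustration under assumptions, not the approach from the linked article: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table JSON_OPERATION_INDEX, the index IDX_OPERATION, and the helper insert_row are hypothetical names.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE JSON_TEST_TABLE (
        AUTO_ID INTEGER PRIMARY KEY,
        OP_ID   INTEGER,
        JSON    TEXT
    );
    -- Hypothetical side table holding the one field we want to search on.
    CREATE TABLE JSON_OPERATION_INDEX (
        AUTO_ID   INTEGER PRIMARY KEY REFERENCES JSON_TEST_TABLE (AUTO_ID),
        OPERATION INTEGER
    );
    CREATE INDEX IDX_OPERATION ON JSON_OPERATION_INDEX (OPERATION);
    """
)


def insert_row(op_id, doc):
    """Store the raw document and, at insert time, the extracted field."""
    cur = conn.execute(
        "INSERT INTO JSON_TEST_TABLE (OP_ID, JSON) VALUES (?, ?)", (op_id, doc)
    )
    operation = int(json.loads(doc)["operation"])
    conn.execute(
        "INSERT INTO JSON_OPERATION_INDEX (AUTO_ID, OPERATION) VALUES (?, ?)",
        (cur.lastrowid, operation),
    )


for i in range(1000):
    insert_row(
        i, json.dumps({"ts": "2014-10-30 15:08:56", "operation": str(1846000 + i)})
    )

# Search via the indexed side table; no JSON parsing happens at query time.
query = """
    SELECT t.OP_ID, t.JSON
    FROM JSON_OPERATION_INDEX AS x
    JOIN JSON_TEST_TABLE AS t ON t.AUTO_ID = x.AUTO_ID
    WHERE x.OPERATION = ?
    LIMIT 10
"""
rows = conn.execute(query, (1846922,)).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1846922,)).fetchall()
print(rows)
print(plan)  # inspect this to see whether the index is actually used
```

The trade-off is extra work and storage on every insert in exchange for a lookup that never touches the JSON text. In MySQL the side table would have to be kept in sync by the application, by triggers, or by a tool such as Flexviews.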

Phallus answered 2/3, 2015 at 9:5 Comment(0)
