Deterministic function in MySQL
Asked Answered
M

8

41

I got confused with a seemingly simple concept. MySQL defines deterministic function as a function that:

always produces the same result for the same input parameters

So in my understanding, functions like:

CREATE FUNCTION foo (val INT) READS SQL DATA
BEGIN
   DECLARE retval INT;
   SET retval = (SELECT COUNT(*) FROM table_1 WHERE field_1 = val);
   RETURN retval;
END;

are not deterministic (there is no guarantee that delete/update/insert does not happen between 2 calls to the function). At the same time, I saw many functions which do pretty much the same, i.e. return value based on result of queries, and declared as DETERMINISTIC. It looks like I'm missing something very basic.

Could anyone clarify this issue?

Thanks.

Update Thanks for those who answered(+1); so far it looks like there is a widespread misuse of DETERMINISTIC keyword. It is still hard to believe for me that so many people do it, so I'll wait a bit for other answers.

Macfarlane answered 30/10, 2011 at 17:27 Comment(7)
Could you state an example of the "many functions" you're talking about?Pleurodynia
@Mat: For instance, dev.mysql.com/doc/refman/5.0/en/create-procedure.html (in user comments section, search for "CREATE FUNCTION db.fnfullname" - it reads row from db for a passed id and returns value based on the result of select).Macfarlane
"... user comments ..." ... Some real mysql function you have in mind?Pleurodynia
I like this question. I don't see where in the MySQL docs is stated that a function cannot be DETERMINISTIC and READS SQL DATA at the same time (and what that would mean for the function).Hagiographer
@Mat: I agree with you that "user comments" section is not very reliable source. My first thought was that I should not take it too seriously. So I googled before asking the question and found many similar implementations (another example, databasejournal.com/features/mysql/article.php/3569846/… , "Accessing tables in stored functions"). Then I realized that it's either a common mistake or I'm missing something obvious...Macfarlane
You’re not missing anything, this is indeed a very common misunderstandingNexus
The confusion comes from "if it always produces the same result" in the official docs, which some people interpreted as "returns the same result set". If you study the actual runtime behavior of MySQL it quickly becomes clear that what was meant here is "produces the same result in the stored data".Nexus
J
20

From the MySQL 5.0 Reference:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used. Prior to MySQL 5.0.44, the DETERMINISTIC characteristic is accepted, but not used by the optimizer.

So there you have it, you can tag a stored routine as DETERMINISTIC even if it is not, but it might lead to unexpected results or performance problems.

Jehias answered 30/10, 2011 at 17:50 Comment(3)
So we might always have a 50% chance of picking the wrong one? Great. lol Your comment helped me understand it a bit more.Outfoot
In a "Nested set model" eg. mikehillyer.com/articles/managing-hierarchical-data-in-mysql I'm using routines to insert and delete nodes. 1) Can we consider that updating all nodes boundaries (adding 2 to the actual boundaries) at the right (after) the insertion point (selected node as reference) is DETERMINISTIC ? 2) Therefore destroying a node too ? 3) Moving a node to the left of it's actual position ? 4) Moving a node the right of it's actual position ?Gudrin
Why is that a feature? Maybe faster compilation if a function is truly deterministic like 3 + x = output?Freeze
G
17

DETERMINISTIC results does not refer to different results sets being returned at different times (depending on what data has been added in the mean time). Moreover it is a reference to the result sets on different machines using the same data. If for example, you have 2 machines which run a function including uuid() or referencing server variables then these should be considered NOT DETERMINISTIC. This is useful for example in replication because the function calls are stored in the binary log (master) and then also executed by the slave. For details and examples see http://dev.mysql.com/doc/refman/5.0/en/stored-programs-logging.html

The use of DETERMINISTIC is thus (99% of the time) correct, not to be considered misuse.

Grimbal answered 18/6, 2012 at 18:45 Comment(3)
From mysql : 'A routine is considered “deterministic” if it always produces the same result for the same input parameters, and “not deterministic” otherwise.' Definitions of 'deterministic function' from other sources also have "always returns the same result for same parameters". Using certain DML (at least SELECT) is a perfectly valid operation inside the body of function. If you say it doesn't refer to different result sets, then all definitions are wrong since 'always' implies all possible valid statements within function body.Macfarlane
"produces the same result" means produces the same result in the database. It does not mean returns the same result set. Why would MySQL care if the result set was different each time or not? It does care however if the resulting stored data is different - see my answer for more details.Nexus
I appreciate the answers by Jon Gilbert and bikeman868... I wish there was some official source would back them up. This dba.stackexchange.com answer by someone with 23k rep gives the opposite answer.Tramway
N
15

I think that your routine is deterministic. The documentation is not very clear and this has led to many people being very confused about this issue, which is actually more about replication than anything else.

Consider a situation where you have replication set up between two databases. The master database keeps a log of all the stored routines that were executed including their input parameters, and ships this log to the the slave. The slave executes the same stored routines in the same order with the same input parameters. Will the slave database now contain identical data to the master database? If the stored routines create GUIDs and store these in the database then no, the master and slave databases will be different and replication will be broken.

The main purpose of the DETERMINISTIC flag is to tell MySQL whether including calls to this stored routine in the replication log will result in differences between the master database and the replicated slaves, and is therefore unsafe.

When deciding if the DETERMINISTIC flag is appropriate for a stored routine think of it like this: If I start with two identical databases and I execute my routine on both databases with the same input parameters will my databases still be identical? If they are then my routine is deterministic.

If you declare your routine is deterministic when it is not, then replicas of your main database might not be identical to the original because MySQL will only add the procedure call to the replication log, and executing the procedure on the slave does not produce identical results.

If your routine is non-deterministic then MySQL must include the affected rows in the replication log instead. If you declare your routine as non-deterministic when it is not this will not break anything, but the replication log will contain all of the affected rows when just the procedure call would have been enough and this could impact performance.

Nexus answered 15/4, 2016 at 22:19 Comment(1)
Thank you. The example really helps me understand the behavior.Husband
W
3

You're not missing anything. This function is non-deterministic. Declaring it deterministic won't cause your database to melt but it might affect performance. From the MySQL site: "Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices." But MySQL does not enforce or check if your declared deterministic routine is actually deterministic---MySQL trusts that you know what you are doing.

Washout answered 30/10, 2011 at 17:50 Comment(1)
This routine is deterministic because if you execute this routine on two identical databases the results will always be identical.Nexus
R
2

Deterministic is important if you have replication turned on or may use it one day. A non-deterministic function call that causes a row change (update or insert) for instance will need to be replicated using binary (row-based) where as a deterministic function can be replicated statement based. This becomes interesting when looking at your SQL examples above, which ones will happen the same (give the same result) when replicated using statement based, and which should be replicated using the result obtained in the master (row-based). If the statements are executed with the appropriate locking and can be guaranteed to execute in the same order on the Slave then they are indeed deterministic. If the locking / statement order that the Slave uses (no concurrency, serial processing of statements in the order they are started) means the answer can be different, then the function should be non-deterministic.

Remembrance answered 21/8, 2015 at 5:34 Comment(0)
B
2

I would like to add an example of why it can have a huge impact on performance, to declare a function for DETERMINISTIC or not:

SELECT id1, foo(id1) f1
FROM table1
ORDER BY id1
LIMIT 1;

Using MariaDB 10.6:

If foo() is declared DETERMINISTIC, then foo() is only called once for the single returned row.

If foo() is NOT declared DETERMINISTIC, then foo() is called once for every row in table1, before returning the single row. This can take ages!!

So I recommend to using "bikeman868" and "C.Poh" definitions, to decide whether a function is DETERMINISTIC or not:

A deterministic function always return the same result given the same input parameters in the same state of the database. Eg POW,SUBSTR(),UCASE().

A non deterministic function does not necessarily always return the same result given the same input parameters in the same state of the database. Eg CURDATE(), RAND(), UUID().

In addition to this, you can also consider in what way the result of the function is used; is it saved in the database or only extracted for external use.

Barnum answered 17/8, 2022 at 16:30 Comment(0)
A
0

I was looking through the answers and decide to contribute a more compact and updated answer.

A deterministic function always return the same result given the same input parameters in the same state of the database. Eg POW,SUBSTR(),UCASE().

A non deterministic function does not necessarily always return the same result given the same input parameters in the same state of the database. Eg CURDATE(), RAND(), UUID().

MySQL 8.0 Reference Manual have some update on this

8.2.1.20 Function Call Optimization

MySQL functions are tagged internally as deterministic or nondeterministic. A function is nondeterministic if, given fixed values for its arguments, it can return different results for different invocations. Examples of nondeterministic functions: RAND(), UUID().If a function is tagged nondeterministic, a reference to it in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join).MySQL also determines when to evaluate functions based on types of arguments, whether the arguments are table columns or constant values. A deterministic function that takes a table column as argument must be evaluated whenever that column changes value. Nondeterministic functions may affect query performance. For example, some optimizations may not be available, or more locking might be required. The following discussion uses RAND() but applies to other nondeterministic functions as well.

This code example from MySQL 8.0 Reference Manual. You can create the table then populate the data with 49 rows like id column 1 to 49 and col_a some strings which are unique like "AA","AB","AC" until 49 rows. You can actually do 15 rows but you need to change the 49 to 15 that's more of topic of the random function.

CREATE TABLE t (id INT NOT NULL PRIMARY KEY, col_a VARCHAR(100));

SELECT * FROM t WHERE id = POW(1,2);
SELECT * FROM t WHERE id = FLOOR(1 + RAND() * 49);

The code will help illustrate the point, the MySQL 8.0 Reference manual is trying to make. Hopefully this helps thanks !

Addam answered 29/6, 2021 at 4:10 Comment(0)
P
0

The doc says below:

A routine is considered "deterministic" if it always produces the same result for the same input parameters, and "not deterministic" otherwise.

For example, test() function below is deterministic because it always returns the same @num value for the same v1 and v2 values. For example, when v1 and v2 are 2 and 3 respectively, @num is always 5 so you should set DETERMINISTIC to test() function as shown below. The doc gives you more examples but it doesn't explain them much:

DELIMITER $$

CREATE FUNCTION test(v1 INT, v2 INT) RETURNS INT
DETERMINISTIC -- Here
BEGIN
SET @num = v1 + v2;
RETURN @num;
END$$

DELIMITER ;

And, test() function below is not deterministic because RAND() built-in function returns a randum number so you should set NOT DETERMINISTIC to test() function as shown below. *My answer explains how to enable to use NOT DETERMINISTIC which you cannot use by default getting error and NOW() and UUID() built-in functions are also not deterministic and without DETERMINISTIC and NOT DETERMINISTIC is also recognized as NOT DETERMINISTIC by default:

DELIMITER $$

CREATE FUNCTION test() RETURNS FLOAT
NOT DETERMINISTIC -- Here
BEGIN
RETURN RAND();
END$$

DELIMITER ;

Lastly, you should properly set DETERMINISTIC or NOT DETERMINISTIC to a function otherwise there are some problems according to the doc below:

However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

Precis answered 23/11, 2023 at 16:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.