Does putting integrity constraints decrease performance?
In a discussion with a friend, I heard two claims:

  1. Using constraints causes a slight decrease in performance. E.g. consider a uniqueness constraint: before insertion, the DBMS would have to check all existing data for uniqueness, causing extra computation.

  2. He suggested making sure these constraints are handled in the application logic itself. E.g. delete rows from both tables yourself properly, instead of declaring a foreign key constraint, etc.

The first one sounds a little logical to me, but the second one seems pretty wrong intuitively. I don't have enough experience with DBMSs to really judge these claims, though.

Q. Is claim 1 correct? If so, is claim 2 even the right way to handle such scenarios?

Pameliapamelina answered 1/10, 2016 at 7:21 Comment(4)
Constraints can not only decrease performance, they can also increase it, depending on the constraint & what you want performed. They affect performance.Cruce
"Before insertion, DBMS would have to check for the uniqueness in all of existing data, thus causing extra computation." That's not even remotely true. See, for example, this article about b-tree indexes. In any case, if uniqueness is a requirement, relevant computation is necessary; none of it is extra.Speak
"Using constraints causes slight decrease in performance." Not really. Using constraints slows down some database operations, and speeds up other database operations. For example, a unique constraint on a nonprime column slows down inserts and deletes, but speeds up selects that use that column in a WHERE clause. And slows down doesn't necessarily mean slow, because slow is application-dependent. In my case, I don't really care if my insert takes 0.36 ms or 0.58 ms.Speak
It's not the constraint that speeds anything up. But some constraints come along with an index, and that index may speed things up. If a constraint is mandatory, implement it at the database level. This does not mean omitting all checks in the user interface, since that's the place to handle incorrect data in the most transparent way for the user.Parity

TL;DR

If your data needs to be correct, you need to enforce the constraints, and if you need to enforce the constraints, letting the database do it for you will be faster than anything else (and likely more correct too).

Example

Attempting to enforce something like key uniqueness at the application-level can be done correctly or quickly, but not both. For example, let's say you want to insert a new row. A naive application-level algorithm could look something like this:

  1. Search the table for the (key fields of) new row.
  2. If not found, insert the new row.

And that would actually work in a single-client / single-threaded environment. However, in a concurrent environment, some other client could write that same key value in between your steps 1 and 2, and presto: you have yourself a duplicate in your data without even knowing it!

To prevent such a race condition, you'd have to use some form of locking, and since you are inserting a new row, there is no row to lock yet - you'll likely end up locking the entire table, destroying scalability in the process.

OTOH, if you let the DBMS do it for you, it can do it in a special way without too much locking, which has been tested and double-tested for correctness in all the tricky concurrent edge cases, and whose performance has been optimized over the time the DBMS has been on the market.
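
To make the contrast concrete, here is a minimal sketch of letting the DBMS enforce uniqueness. It uses Python's built-in sqlite3 module purely for illustration (the thread discusses MySQL, but the behavior is analogous), and the table and column names are invented:

```python
import sqlite3

# The UNIQUE constraint makes the database reject duplicates atomically:
# there is no check-then-insert gap for another client to slip through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE, name TEXT)")

def insert_user(email, name):
    """Single atomic statement; the DBMS enforces uniqueness itself."""
    try:
        conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        return True
    except sqlite3.IntegrityError:  # duplicate key rejected by the engine
        return False

print(insert_user("a@example.com", "Alice"))  # True
print(insert_user("a@example.com", "Alex"))   # False: constraint caught it
```

The application never has to search the table first; the constraint check and the insert are one indivisible operation from the client's point of view.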

Similar concerns exist for foreign keys as well.
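
A similar sketch for foreign keys, again with sqlite3 standing in for illustration and a hypothetical parent/child schema (note that SQLite requires enabling FK enforcement per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(id))"
)
conn.execute("INSERT INTO parent (id) VALUES (1)")
conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")  # ok: parent exists

try:
    conn.execute("INSERT INTO child (id, parent_id) VALUES (11, 99)")  # no such parent
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the DBMS refuses the orphan row
```

Here too, the referential check happens inside the engine, under its own concurrency control, rather than in a separate application-side lookup.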


So yeah, if your application is the only one accessing the database (e.g. when using an embedded database), you may get away with application-level enforcement, although why would you if the DBMS can do it for you?

But in a concurrent environment, leave keys and foreign keys to the database - you'll have plenty of work anyway, enforcing your custom "business logic" (that is not directly "declarable" in the DBMS) in a way that is both correct and performant...

That being said, feel free to perform any application-level "pre-checks" that benefit your user experience. But do them in addition to database-level constraints, not instead of them.
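
A hedged sketch of that "pre-check in addition to the constraint" pattern (sqlite3 again, with an invented users table): the SELECT gives friendly feedback, while the UNIQUE constraint remains the actual guarantee against the race described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

def register(email):
    # Application-level pre-check: fast, friendly feedback for the user...
    exists = conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
    if exists:
        return "that email is already taken"
    # ...but the UNIQUE constraint is the real guarantee: even if another
    # client slipped in between the check and the insert, the DB rejects it.
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return "registered"
    except sqlite3.IntegrityError:
        return "that email is already taken"
```

The pre-check alone can never be trusted in a concurrent environment; the try/except around the INSERT is what actually keeps the data correct.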

Apetalous answered 5/10, 2016 at 8:41 Comment(1)
Yes, it's important to note that an application cannot guarantee integrity in the database without extensive locking.Wilkes

Claim 1 is correct, claim 2 is incorrect, just like you concluded.

The database's job is to handle the data and its integrity. The app's job is to ask the database for the data and then perform work with it.

If you handle #2 through the application:

  • you have to handle concurrency - what happens when there's more than one connection active to the DB? You need to lock tables to perform operations that ensure uniqueness or integrity. Since a connection can break at any time, you've got a huge problem on your hands: how do you unlock tables when the process that locked them has died?

  • you can't do a better job from the app than the database can on its own. You still need to check the rows for uniqueness, meaning you need to retrieve all the data, perform the check on the whole dataset and then write it. You can't do anything better or faster than the database can - by definition it will be slower, since you need to transfer the data from the DB to your app

  • databases are made with concurrency in mind. Creating "optimizations" using your friend's logic is what leads to unstable apps, duplicate data, unresponsive databases etc. Never do that. Let the DB do its job; it's made for such purposes.

When checking for uniqueness, MySQL utilizes indexes, data structures made for fast access. The speed at which MySQL performs a uniqueness check is incomparable to what any app can do - it will simply do the work faster. If you need unique data, you need to ensure that you have unique data - this is a workload that can't be avoided, and the people who develop databases use proven algorithms designed for speed. It already works at optimum speed.
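
You can see the index-backed check at work even in a tiny example. This sketch uses SQLite's EXPLAIN QUERY PLAN purely for illustration (MySQL's EXPLAIN plays a similar role): the UNIQUE constraint auto-creates an index, so a lookup on that column is an index search, not a scan of all existing data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT UNIQUE)")  # UNIQUE creates an index behind the scenes

# Ask the engine how it would execute a lookup on the unique column.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE k = ?", ("x",)).fetchall()
print(plan[0][-1])  # the plan's detail string mentions the auto-created index
```

The exact wording of the plan varies by SQLite version, but it reports an index search rather than a full-table scan, which is exactly why the "check all existing data" worry from the question doesn't hold.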

As for integrity - the same: MySQL (or any other RDBMS) is made to handle such scenarios. If foreign key constraints were better implemented in app logic, we'd never have had FKs available to us in the first place. Like I mentioned before - the database's job is to take care of that.

ACID for relational databases isn't there for no reason. MySQL's InnoDB implements Atomicity, Consistency, Isolation and Durability; if you need them, you use them. There's no app in any language that anyone can create that performs better in any way than MySQL's internal handling of those guarantees.

TL;DR: you are correct in your thinking.

Semaphore answered 1/10, 2016 at 7:55 Comment(8)
Claim 1 is not correct. Constraints do not only decrease performance, they also increase it, depending on the constraint & what you want performed. They affect performance.Cruce
@Cruce it is correct. When we look at the bigger picture, then we can claim what you wrote because performance is never measured just by one factor. However, to check for uniqueness, extra work has to be done. Ergo, it decreases performance from that point of view.Semaphore
The claim is "Using constraints causes slight decrease in performance." This is not so. You are confusing it with the claim "Using constraints sometimes causes slight decrease in performance." Which was not what the OP wrote. (And their example isn't even an example of either, see the comment by Mike Sherrill 'Cat Recall' on the question.)Cruce
@Cruce When would a constraint improve performance? The checking of a constraint requires data to be read and processed, that's always going to cost SOMETHING. I suppose you could say that a uniqueness constraint might improve performance in the sense that if there is only one matching record in a query instead of a hundred, the query will be faster. The query optimizer may know that there can only be one matching record and take this into account when devising the query plan. Is that the sort of thing you're thinking of ... or what?Lillie
@Lillie Foreign keys can improve the performance of inner joins under specific conditions. For example: SELECT A.* FROM A JOIN B ON <condition> doesn't have to query B at all if there is a FK in A that matches the <condition> (note that we are selecting only A's fields). That's because FK tells the DBMS that the corresponding row from B must exist, so there is no need to actually check its existence. Not all DBMSes do this optimization, but some very important do, and this can be a boon for performance when views join other views (etc..), making the "manual" optimization non-obvious.Apetalous
@BrankoDimitrijevic Well, okay. You're saying that the query optimizer might be smart enough to eliminate a join that is superfluous. I suppose by the same reasoning if you had a "not null" constraint and then had a query "select * from foo where bar is null", a smart enough query optimizer could instantly return an empty set without looking at the DB at all. But those sort of things seem like a trivial case. If the programmer knew what he was doing he never would have written such a query in the first place.Lillie
@Lillie Real systems sometimes rely on many layers of views, where this particular optimization that I mentioned would not have been obvious.Apetalous
@BrankoDimitrijevic Ok, fair enough. I suppose if we have elements of a query scattered across views, user-defined functions, whatever, etc. Or if the query is very complex, things that would be obvious in isolation may be hidden in the 37 joins and 82 inner queries.Lillie

Yes, it's true that checking a constraint is going to take time and slow down database updates.

But it's not at all clear how moving this logic to the application will result in a net performance improvement. Now you have at least two separate trips to the database: one to check the constraint and another to perform the update. Every trip to the database costs: It takes time to make a connection, it takes time for the database engine to parse the query and construct a query plan, it takes time to send results back. As the database engine doesn't know what you're doing or why, it can't optimize. In practice, one "big visit" is almost always cheaper than two "small visits" that accomplish the same thing.
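
As an illustration of collapsing two trips into one, here is a sketch using SQLite's INSERT OR IGNORE (MySQL spells it INSERT IGNORE; the tags table is invented): the single statement lets the constraint do the duplicate check server-side.

```python
import sqlite3

# One trip instead of two: no separate SELECT to check for duplicates,
# the PRIMARY KEY constraint performs the check inside the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT PRIMARY KEY)")

conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", ("sql",))
conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", ("sql",))  # duplicate: silently skipped

count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
print(count)  # 1
```

One statement, one parse, one plan - the "big visit" from the paragraph above.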

I'm speaking here mostly of uniqueness constraints and relational integrity constraints. If you have a constraint that can be tested without visiting the database, like a range limit on an individual field, it would be faster to do that in the application. Maybe still not a good idea for a variety of reasons, but it would be faster.
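
For example, a hypothetical range limit like "quantity must be between 1 and 100" can be validated with no database round-trip at all:

```python
# Hypothetical application-level range check: no DB visit required.
# (A CHECK constraint in the database could still back this up, per the caveat above.)
def valid_quantity(qty):
    return isinstance(qty, int) and 1 <= qty <= 100

print(valid_quantity(42))  # True
print(valid_quantity(0))   # False
```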

Lillie answered 5/10, 2016 at 18:41 Comment(2)
Regarding "Maybe still not a good idea for a variety of reasons" in the post, scenarios to consider include: people directly accessing/manipulating the DB; a new application (developed by other people) starting to connect to the DB.Russian
@Russian Yes, good point. If you put an integrity test in the database, then you are guaranteed that it is done every time any app updates the database. But if you put it in application code, you have to make sure that you call that code before every update. Forget one place out of a hundred and now you have a hole that bad data can sneak through.Lillie

Constraints do generally cause a slight decrease in performance. Nothing is free. There are, however, two important considerations:

  1. The performance hit is usually so slight that it is lost in the "noise" of the natural variability of a running system so it would take tests involving thousands or millions of test queries to determine the difference.
  2. One has to ask "Affects the performance where?" Constraints affect the performance of DML operations. But if the constraints were not there, then every query would have to perform additional testing to verify the accuracy of the data being read. I can assure you, this would be a far greater performance hit than the constraints.

There are exceptions, of course, but most databases are queried a lot more often than modified. So if you can shift performance hits from queries to DML, you generally speed up the overall performance of the system.

Perform separate constraint checking at the app level by all means. It is a tremendous benefit to provide the user with feedback during the process of collecting data ("Delivery date cannot be in the past!") rather than waiting until the attempt to insert the data into the database fails.

But that doesn't mean remove them from the database. This redundancy is important. Can you absolutely guarantee that the only operations ever performed on the database will originate from the app? Absolutely not. There is too much normal maintenance activity going on outside the app to make that promise. Not to mention that there is generally more than one app, so the guarantee must apply to each one. Too many loose ends.

When designing a database, data integrity is your number one priority. Never sacrifice it for the sake of performance, especially since the performance of a well-designed database is not often an issue, and even when it is, there are far too many ways to improve performance that do not involve removing constraints (or denormalizing, another mistake many still make in order to improve the performance of an OLTP system).

Rafflesia answered 3/10, 2016 at 5:59 Comment(4)
You too are missing the point that constraints allow optimization, so the statement "Constraints do generally cause a slight decrease in performance." is just wrong.Cruce
Context. Read the entire post.Rafflesia
You are missing my point, which is that the first sentence is wrong, regardless of what else the post consists of, which by the way does not contradict the misconception of the first sentence.Cruce
What on Earth are you on about? In reading your last comment to the answer from N.B., you say "Foreign keys can improve the performance of inner joins under specific conditions." That's true; no one argues otherwise. But your own hedges "can improve" and "under specific conditions" show that you understand. A "not null" constraint may cost very little to an INSERT operation, a FK can cost more as it must perform I/O. But these constraints allow for simpler (and faster) queries. This is all I said.Rafflesia

Q. Is the claim 1 correct ?

Yes. In my experience, using constraints can cause a massive decrease in performance. The performance impact is relative to the number of constraints and records in the tables. As table records grow, performance is impacted, and DB performance can go from great to bad fast.

For example: in one auditing company I worked for, part of the process was to serialize an Excel matrix containing a large number of responsibilities/roles/functions into a set of tables with many FK constraints. Initially the performance was fine, but within six months to a year this serialization process took a few minutes to complete. We optimised as much as we could, with little effect. If we switched off the constraints, the process completed in a few seconds.

If so (if claim 1 is correct), is claim 2 even the right way to handle such scenarios ?

Yes, but under certain circumstances.

  1. You have a large number of constraints
  2. You have a large and ever-growing number of records in your DB tables.
  3. The DB hardware cannot be improved upon for whatever reason, and you are experiencing performance problems.

So with the performance problem we had at the auditing company, we looked at moving the constraint checks into an application dataset. So in essence the dataset was used to check and validate the constraints and the matrix DB tables used simply for storage (and processing).

NOTE: This worked for us because the matrix data never changed once inserted and each matrix was independent of all other past inserted matrices.

Telltale answered 18/9, 2020 at 13:38 Comment(0)
