In a Data Warehouse scenario, is there any disadvantage to using WITH(NOLOCK)?

I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update rows (on new dates) and monthly and daily reporting processes. The fact tables are partitioned by the dates for easy rolloff of old data.

I understand that WITH(NOLOCK) can cause uncommitted data to be read; however, I also do not wish to create any locks which would cause the ETL processes to fail or block.

In all cases, when we are reading from the DW, we are reading from fact tables for a date which will not change (the fact tables are partitioned by date) and dimension tables which will not have attributes changing for the facts they are linked to.

So: are there any disadvantages? Perhaps in the execution plans, or in the operation of such SELECT-only queries running in parallel off the same tables?
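For concreteness, a typical reporting query in this setup would look something like the sketch below (all table and column names are invented stand-ins for the real star schema):

```sql
-- Read-only reporting query with a NOLOCK hint on every table reference.
SELECT   d.CalendarDate,
         c.CustomerSegment,
         SUM(f.SalesAmount) AS TotalSales
FROM     dbo.FactSales   AS f WITH (NOLOCK)
JOIN     dbo.DimDate     AS d WITH (NOLOCK) ON d.DateKey     = f.DateKey
JOIN     dbo.DimCustomer AS c WITH (NOLOCK) ON c.CustomerKey = f.CustomerKey
WHERE    d.CalendarDate < '2009-06-18'   -- only closed-off dates are read
GROUP BY d.CalendarDate, c.CustomerSegment;
```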

Fortis asked 18/6, 2009 at 20:22 Comment(1)
Related: #20547 – Fantoccini

As long as it's all no-update data there's no harm, but I'd be surprised if there's much benefit either. I'd say it's worth a try. The worst that will happen is that you'll get incomplete and/or inconsistent data if you are in the middle of a batch insert, but you can decide if that invalidates anything useful.

Palaeozoic answered 18/6, 2009 at 20:24 Comment(4)
The fact rows we are reading will not change, and the dimension rows will always be valid, but they may be expired and a new dimension created for new facts. – Fortis
Seems entirely straightforward to me. I have only two questions. 1. Is there a problem with the way it's running without making this change (i.e. is this possibly a premature optimization)? 2. These are all read-only queries and you're relaxing their isolation levels. What bad thing (other than the obvious crappiness of the results, which you are evidently mitigating by your emphasis on appends and fact versioning) are you imagining? – Palaeozoic
I'm not in control of the ETL, but I am responsible for all the reporting. I am not given access to sp_who, so I need to proactively make sure all my (significant) processing does not interfere with daily and monthly loads before the DBAs complain I'm blocking them. – Fortis
Then I think the simple answer to your specific question is that using WITH(NOLOCK) will reduce, and likely eliminate, the chance that you will create any locks which would cause the ETL processes to fail or block, and there are no disadvantages created by this strategy. You have a bunch of suggestions for more exotic ways to accomplish approximately the same thing, but they generally involve global server setting alterations that you probably don't have access to and your DBAs will be suspicious about. – Palaeozoic

This is what you probably need:

```sql
ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;
```

Then go ahead and use

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

in your queries. According to BOL:

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.

If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.
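For example, a reporting session could then look like the hypothetical sketch below (FactSales and DimCustomer are invented names); the first form covers single-statement reads, the second gives several statements one consistent snapshot:

```sql
-- With READ_COMMITTED_SNAPSHOT ON, ordinary READ COMMITTED reads are served
-- from row versions and neither block nor get blocked by the ETL writers.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT COUNT(*) FROM dbo.FactSales;

-- With ALLOW_SNAPSHOT_ISOLATION ON, a whole transaction can instead see a
-- single point-in-time view across multiple statements.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.FactSales;
    SELECT COUNT(*) FROM dbo.DimCustomer;  -- same snapshot as the line above
COMMIT TRANSACTION;
```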

Hope this helps. Raj

Renfred answered 19/6, 2009 at 2:58 Comment(1)
I will consider it. There are other operations on the connection (i.e. my process configuration and status data, not the fact/dim star-modelled data) which I do not necessarily want to have behave that way. – Fortis

Have you considered creating a DATABASE SNAPSHOT of your DW and running your reports off it?
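Roughly like the sketch below; the names and file path are placeholders, and the logical NAME must match the source database's data file:

```sql
-- Create a point-in-time, read-only snapshot of the DW and run reports
-- against it. The .ss file is sparse: it stores only pages that change
-- in the source database after the snapshot is created.
CREATE DATABASE DW_Reporting_Snapshot
ON ( NAME = DW_Data,                         -- logical name of the DW data file
     FILENAME = 'D:\Snapshots\DW_Data.ss' )  -- placeholder sparse-file path
AS SNAPSHOT OF DW;                           -- placeholder source DB name
```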

Corporeal answered 19/6, 2009 at 10:06 Comment(2)
No, that's not really possible, since we're talking about several TBs of data. The DW is designed for this purpose, which is why the fact tables are partitioned by date. – Fortis
But a database snapshot is a sparse file with copy-on-write semantics. All you need is the space on disk to be reserved; actual I/O operations will occur only when a write occurs on the original database. – Corporeal

Yes. Your SQL will be far less readable, and you will inevitably miss some NOLOCK hints, because the hint has to be repeated on every table reference in every SELECT.

You can get the same thing by setting the isolation level:

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```

In the end you get a 10% performance boost (sorry, I'm too lazy to look up the article for it, but it's out there).

I'd say a 10% gain isn't worth reducing readability.
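For instance (hypothetical tables), one statement at the top of the batch replaces a hint on every table reference:

```sql
-- One statement per session instead of WITH (NOLOCK) scattered everywhere.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT d.CalendarDate, SUM(f.SalesAmount) AS TotalSales
FROM   dbo.FactSales AS f                           -- no hint needed here...
JOIN   dbo.DimDate   AS d ON d.DateKey = f.DateKey  -- ...or here
GROUP BY d.CalendarDate;
```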

Screed answered 18/6, 2009 at 23:11 Comment(2)
+1 for setting the isolation level, but WITH (NOLOCK) is better than nothing. It's not a 10% thing so much as a will-it-run-or-not thing. – Fantoccini
I will consider it. There are other operations on the connection (i.e. my process configuration and status data, not the fact/dim star-modelled data) which I do not necessarily want to have behave that way. At this point I have the WITH(NOLOCK) centralized into the views I use which wrap and flatten the star models in the DW database. – Fortis

If making the whole database read-only is possible, then this is a better option. You'll get read-uncommitted performance without having to modify all your code.

```sql
ALTER DATABASE AdventureWorks SET READ_ONLY;
```
Medicaid answered 19/6, 2009 at 0:03 Comment(2)
The DW database is already read-only to my user, but has to be writable to the ETL processes loading new data into the facts (and dimensions as necessary). My database holds my process procedures and configuration. – Fortis
I think it only works if the DB is read-only, not the user. Maybe consider altering the database out of and into READ_ONLY as part of the ETL. – Medicaid
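A minimal sketch of that last suggestion, assuming the ETL can take the database exclusively for its load window (DW is a placeholder name):

```sql
-- Hypothetical ETL wrapper: open the DW for writing, run the load, then
-- lock it back down so all reporting runs against a read-only database.
ALTER DATABASE DW SET READ_WRITE WITH ROLLBACK IMMEDIATE;

-- ... run the daily/monthly load here ...

ALTER DATABASE DW SET READ_ONLY WITH ROLLBACK IMMEDIATE;
```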

NOLOCK performs a ‘dirty read’ (incidentally, READ UNCOMMITTED does the same thing as NOLOCK). If the database is being updated as you read, there is a danger that you will get inconsistent data back. The only option is either to accept locking, and hence blocking, or to pick one of the two snapshot-based isolation levels offered from SQL 2005 onwards (READ_COMMITTED_SNAPSHOT and SNAPSHOT).
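A minimal two-session sketch of the dirty read (table and column names invented):

```sql
-- Session 1: insert a row but leave the transaction open.
BEGIN TRANSACTION;
INSERT INTO dbo.FactSales (DateKey, SalesAmount) VALUES (20090716, 100.00);

-- Session 2: a NOLOCK read sees the uncommitted row...
SELECT * FROM dbo.FactSales WITH (NOLOCK) WHERE DateKey = 20090716;

-- Session 1: ...and the row vanishes again if the transaction rolls back.
ROLLBACK TRANSACTION;
```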

Acrolein answered 16/7, 2009 at 17:10 Comment(1)
There are no inserts or updates to the data possible. The only changes will be to future dates, which we don't read until the processing is complete. – Fortis

In a Kimball DWH, there should be only one service that manipulates data: the ETL process itself.

If you have a single full end-to-end ETL job, you will never encounter locks (provided you set the dependencies of the sub-tasks correctly).

But if you have independent jobs updating data pipelines end-to-end, from sourcing up through the stars, models, and reports, you need a concept that ensures consistency and accessibility for concurrent jobs sharing resources/artifacts. Good advice is to use partitioned tables: apply the updates to cloned tables, then switch the updated partitions of all the involved tables into place together in one short transaction (after the ETL process), as sketched below. That way the main table stays consistent with the others and remains accessible the whole time.
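A minimal sketch of the switch step, assuming a partitioned fact table, an aligned staging clone on the same filegroup, and a placeholder partition function pfDate (all names invented):

```sql
-- Load the day's rows into a staging clone, then swap the staging data into
-- the fact table's matching partition; SWITCH is a metadata-only operation,
-- so the fact table is unavailable only for this near-instant transaction.
BEGIN TRANSACTION;

ALTER TABLE dbo.FactSales_Stage
    SWITCH TO dbo.FactSales
    PARTITION $PARTITION.pfDate('2022-08-24');  -- placeholder partition function

COMMIT TRANSACTION;
```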

This pattern is a best practice, but not without stones in your road; if you google a bit, you will agree.

Pueblo answered 24/8, 2022 at 18:17 Comment(0)
