We're troubleshooting a sort of Sync Framework between two SQL Server databases, in separate servers (both SQL Server 2008 Enterprise 64 bits SP2 - 10.0.4000.0), through linked server connections, and we reached to a point in which we're sort of stuck.
The logic to identify which are the records "pending to be synced" is of course based on ROWVERSION
values, including the use of MIN_ACTIVE_ROWVERSION()
to avoid dirty reads.
All SELECT
operations are encapsulated in SPs on each "source" side. This is a schematic sample of one SP:
PROCEDURE LoaderRetrieve(@LastStamp bigint, @Rows int)
BEGIN
...
(vars handling)
...
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
Select TOP (@Rows) Field1, Field2, Field3
FROM Table
WHERE [RowVersion] > @LastStampAsRowVersionDataType
AND [RowVersion] < @MinActiveVersion
Order by [RowVersion]
END
The approach works just fine, we usually sync records with the expected rate of 600k/hour (job every 30 seconds, batch size = 5k), but at some point, the sync process does not find any single record to be transferred, even though there are several thousand of records with a ROWVERSION
value greater than the @LastStamp
parameter.
When checking the reason, we've found that the MIN_ACTIVE_ROWVERSION()
has a value less than (or slightly greater, just 5 or 10 increments) the @LastStamp
being searched. This of course shouldn't be a problem since the MIN_ACTIVE_ROWVERSION()
approach was introduced to avoid dirty reads and posterior issues, BUT:
The problem we see in some occasions, during the above scenario occurs, is that the value for MIN_ACTIVE_ROWVERSION()
does not change during a long (really long) period of time, like 30/40 minutes, sometimes more than one hour. And this value is by far less than the @@DBTS
value.
We first thought this was related to a pending DB transaction not yet committed. As per MSDN definition about the MIN_ACTIVE_ROWVERSION()
(link):
Returns the lowest active rowversion value in the current database. A rowversion value is active if it is used in a transaction that has not yet been committed.
But when checking sessions (sys.sysprocesses
) with open_tran > 0
during the duration of this issue, we couldn't find any session with a waittime greater than a few seconds, only one or two occurrences of +/- 5 minutes waittime sessions.
So at this point we're struggling to understand the situation: The MIN_ACTIVE_ROWVERSION()
does not change during a huge period of time, and no uncommitted transactions with long waits are found within this time frame.
I'm not a DBA and could be the case that we're missing something in the picture to analyze this problem, doing some research on forums and blogs couldn't found any other clue. So far the open_tran > 0 was the valid reason, but under the circumstances I've exposed, it's clear that there's something else and don't know why.
Any feedback is appreciated.