SQL Server and WCF Transactions across Linked Oracle Servers
Asked Answered
E

2

7

Note Please see Update 6. It has a simple app that demonstrates how to recreate the problem.

I'm living a DTC nightmare... Our setup is that we have two databases; a SQL Server 2008 database and an Oracle database (11g, I believe). I've got the oracle MTS stuff installed. I have DTC configured to allow distributed transactions. All access to the Oracle tables takes place via views in the SQL Server database that go against Oracle tables in the linked server.

(With regard to DTC config: Checked-> Network DTC Access, Allow Remote Clients, Allow Inbound, Allow Outbound, Mutual Authentication (tried all 3 options), Enable XA Transactions and Enable SNA LU 6.2 Transactions. DTC logs in as NT AUTHORITY\NetworkService)

Our app is an ASP.NET MVC 4.0 app that calls into a number of WCF services to perform database work. Currently the web app and the WCF service share the same app pool (not sure if it's relevant, but just in case...)

Some of our services are transactional, others are not.

Each WCF service that is transactional has the following attribute on its interface:

[ServiceContract(SessionMode=SessionMode.Required)]

and the following attribute on the method signatures in the interface:

[TransactionFlow(TransactionFlowOption.Allowed)]

and the following attribute on every method implementations:

[OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]

In my data access layer, all the transactional methods are set up as follows:

using (IDbConnection conn = DbTools.GetConnection(_configStr, _connStr, true))
{
    using (IDbCommand cmd = DbTools.GetCommand(conn, "SET XACT_ABORT ON"))
    {
        cmd.ExecuteNonQuery();
    }
    using (IDbCommand cmd = DbTools.GetCommand(conn, sql))
    {
       ... Perform actual database work ...
    }
}

Services that are transactional call transactional DAL code. The idea was to keep the stuff that needs to be transactional (a few cases) separate from the stuff that doesn't need to be transactional (~95% of the cases).

There ought not be cases where transactional and non-transactional WCF methods are called from within a transaction (though I haven't verified this and this may be the cause of my problems. I'm not sure, which is part of why I'm asking here.)

As I mentioned before, in most cases, this all works fine.

Periodically, and I cannot identify what initiates it, I start getting errors. And once they start, pretty much everything starts failing for a while. Eventually things start working again. Not sure why... This is all in a test environment with a single user.

Sometimes the error is:

Unable to start a nested transaction for OLE DB provider "OraOLEDB.Oracle" for linked server "ORACLSERVERNAME". A nested transaction was required because the XACT_ABORT option was set to OFF.

This message, I'm guessing is happening when I have non-transactional stuff within transactions, as I'm not setting XACT_ABORT in the non-transactional code (that's totally doable, if that will fix my issue).

Most often, however, the error is this:

System.Data.SqlClient.SqlException (0x80131904): The operation could not be performed because OLE DB provider "OraOLEDB.Oracle" for linked server "ORACLSERVERNAME" was unable to begin a distributed transaction.

Now, originally we only had transactions on SQL Server tables and that all worked fine. It wasn't until we added transaction support for some of the Oracle tables that things started failing. I know the Oracle transactions work. And as I said, most of the time, everything is just hunky dorey and then sometimes it starts failing and keeps failing for a while until it decides to stop failing and then it all works again.

Wish I could figure out how to sell that as a "feature" to my users, but I'm not optimistic, so I'd appreciate help in trying to track it down. Please let me know if I've omitted any important information.

Update 1: I noticed that our transactions didn't seem to have a DistributedIdentifier set, so I added the EnsureDistributed() method from this blog post: http://www.make-awesome.com/2010/04/forcibly-creating-a-distributed-net-transaction/

Instead of a hardcoded Guid (which seemed to cause a lot of problems), I have it generating a new Guid for each transaction and that seems to work, but it has not fixed my problem. I'm wondering if the lack of a DistribuedIdentifier is indicative of some other underlying problem. I've never dealt with an environment quite like this before, so I'm not sure what is "normal".

Update 2: I've noticed that the DistributedIdentifier doesn't get passed to WCF. From the client, I have a DistributedIdentifier and a LocalIdentifier in Transaction.Current.TransactionInformation. In the WCF server, however there is only a LocalIdentifier set and it is a different Guid from the client side (which makes sense, but I would have expected the DistributedIdentifier to go across).

Update 3: It appears that when I'm in the midst of transactions failing, even after I shut down IIS, I'm unable to get the DTC service to shutdown and restart. If I go into Component Services and change the security settings, for example, and hit Apply or OK, after a bit of a wait I get a dialgo that says, "Failed ot restart the MS DTC serivce. Please examine the eventlog for further details."

In the eventlog I get a series of events:

1 (from MSDTC): "The MS DTC service is stopping"
2 (From MSSQL$SQLEXPRESS): "The connection has been lost with Microsoft Distributed Transaction 
                            Coordinator (MS DTC). Recovery of any in-doubt distributed transactions 
                            involving Microsoft Distributed Transaction Coordinator (MS DTC) 
                            will begin once the connection is re-established. This is an 
                            informational message only. No user action is required."
-- Folowed by these 3 identical messages
3 (from MSDTC Client 2): 'MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system GCOVA38.'
4 (from MSDTC Client 2): 'MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system GCOVA38.'
5 (from MSDTC Client 2): 'MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system GCOVA38.'
6 (From MSDTC 2): MSDTC started with the following settings:
                     Security Configuration (OFF = 0 and ON = 1):
                     Allow Remote Administrator = 0,
                     Network Clients = 1,
                     Trasaction Manager Communication: 
                     Allow Inbound Transactions = 1,
                     Allow Outbound Transactions = 1,
                     Transaction Internet Protocol (TIP) = 0,
                     Enable XA Transactions = 1,
                     Enable SNA LU 6.2 Transactions = 1,
                     MSDTC Communications Security = Mutual Authentication Required,
                     Account = NT AUTHORITY\NetworkService,
                     Firewall Exclusion Detected = 0
                     Transaction Bridge Installed = 0
                     Filtering Duplicate Events = 1

This makes me wonder if there's something maybe holding a transaction open somewhere? I get the sense that there's some sort of, for lack of a better term, 'dangling transaction' that's not getting committed or rolled back. In every case where I use a TransactionScope, it's happening in a "using" statement, so everything ought to be either rolling back or committing. But I'm really starting to think that somehow, something's leaking...

Update 4: Related to Update 3. I am performing manual enlistment. Our connection string has "Enlist=false". DBTools.GetConnection() takes a boolean parameter that specifies whether or not to enlist the current transaction into the connection. I'm posting this update because, based on the stuff from Update 3, I'm wondering if, perhaps, connection that aren't supposed to be enlisting transactions are, somehow enlisting them.

public static IDbConnection GetConnection(string configString, string connectionString, bool enlistTransaction)
{
    SqlConnection conn = new SqlConnection(connectionString);
    conn.Open();
    if (enlistTransaction && Transaction.Current != null)
    {
        conn.EnlistTransaction(Transaction.Current);
    }
    return conn;
}

public static IDbCommand GetCommand(IDbConnection conn, string command)
{
    IDbCommand cmd = conn.CreateCommand();
    cmd.CommandText = command;
    return cmd;
}

Update 5: I've managed to find a set of unit tests that, if I run the group, it always fails in the same place in the same test (but if I just run that test by itself, over and over, it doesn't fail. It has something to do with the tests running before it.) I managed to get some DTC Trace logs. Here's a log that shows the initial failing transaction. I'm showing some preceding transactions as well, in case seeing some sucecssful ones helps. The failing transaction begins at seq=1846.

pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.064   ;seq=1822       ;eventid=TRANSACTION_BEGUN                        ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"transaction has begun, description :'<NULL>'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.064   ;seq=1823       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #1. RM guid = '344d3060-811c-4fc6-bab6-0eea76e3af3a'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.064   ;seq=1824       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #2. RM guid = '7b16851c-00a5-41dd-b59c-b003dcae08ec'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.064   ;seq=1825       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1005 enlisted as transaction enlistment #3. RM guid = '72efe9cc-80f2-4a5b-9659-28b07987b600'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.079   ;seq=1826       ;eventid=RECEIVED_COMMIT_REQUEST_FROM_BEGINNER    ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"received request to commit the transaction from beginner"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.079   ;seq=1827       ;eventid=RM_ISSUED_PREPARE                        ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"prepare request issued to resource manager #1004 for transaction enlistment #1"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.079   ;seq=1828       ;eventid=RM_ISSUED_PREPARE                        ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"prepare request issued to resource manager #1004 for transaction enlistment #2"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.079   ;seq=1829       ;eventid=RM_ISSUED_PREPARE                        ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"prepare request issued to resource manager #1005 for transaction enlistment #3"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1830       ;eventid=RM_VOTED_COMMIT                          ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 voted commit for transaction enlistment #2"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1831       ;eventid=RM_VOTED_COMMIT                          ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 voted commit for transaction enlistment #1"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1832       ;eventid=RM_VOTED_READ_ONLY                       ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"resource manager #1005 voted read-only for transaction enlistment #3"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1833       ;eventid=TRANSACTION_COMMITTED                    ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"transaction has got committed"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1834       ;eventid=RM_ISSUED_COMMIT                         ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"commit request issued to resource manager #1004 for transaction enlistment #1"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1835       ;eventid=RM_ISSUED_COMMIT                         ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"commit request issued to resource manager #1004 for transaction enlistment #2"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1836       ;eventid=RM_ACKNOWLEDGED_COMMIT                   ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"received acknowledgement of commit request from the resource manager #1004 for transaction enlistment #1"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.079   ;seq=1837       ;eventid=RM_ACKNOWLEDGED_COMMIT                   ;tx_guid=49a79b73-66c0-48cb-abb1-8b657a2e3e4d     ;"TM Identifier='(null)                                            '" ;"received acknowledgement of commit request from the resource manager #1004 for transaction enlistment #2"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.002   ;seq=1838       ;eventid=TRANSACTION_BEGUN                        ;tx_guid=55dd8607-c01e-4135-a247-7ef435c70bc6     ;"TM Identifier='(null)                                            '" ;"transaction has begun, description :'<NULL>'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.018   ;seq=1839       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=55dd8607-c01e-4135-a247-7ef435c70bc6     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #1. RM guid = '7b16851c-00a5-41dd-b59c-b003dcae08ec'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.018   ;seq=1840       ;eventid=RECEIVED_COMMIT_REQUEST_FROM_BEGINNER    ;tx_guid=55dd8607-c01e-4135-a247-7ef435c70bc6     ;"TM Identifier='(null)                                            '" ;"received request to commit the transaction from beginner"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:45.018   ;seq=1841       ;eventid=TRANSACTION_COMMITTED                    ;tx_guid=55dd8607-c01e-4135-a247-7ef435c70bc6     ;"TM Identifier='(null)                                            '" ;"transaction has got committed"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.106   ;seq=1842       ;eventid=TRANSACTION_BEGUN                        ;tx_guid=cc4a8215-3475-4c14-b40d-d150fc79f8f7     ;"TM Identifier='(null)                                            '" ;"transaction has begun, description :'<NULL>'"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.121   ;seq=1843       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=cc4a8215-3475-4c14-b40d-d150fc79f8f7     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #1. RM guid = '7b16851c-00a5-41dd-b59c-b003dcae08ec'"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.121   ;seq=1844       ;eventid=RECEIVED_COMMIT_REQUEST_FROM_BEGINNER    ;tx_guid=cc4a8215-3475-4c14-b40d-d150fc79f8f7     ;"TM Identifier='(null)                                            '" ;"received request to commit the transaction from beginner"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:45.121   ;seq=1845       ;eventid=TRANSACTION_COMMITTED                    ;tx_guid=cc4a8215-3475-4c14-b40d-d150fc79f8f7     ;"TM Identifier='(null)                                            '" ;"transaction has got committed"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.657   ;seq=1846       ;eventid=TRANSACTION_BEGUN                        ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"transaction has begun, description :'<NULL>'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.657   ;seq=1847       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #1. RM guid = '344d3060-811c-4fc6-bab6-0eea76e3af3a'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1848       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"resource manager #1004 enlisted as transaction enlistment #2. RM guid = '7b16851c-00a5-41dd-b59c-b003dcae08ec'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1849       ;eventid=RM_ENLISTED_IN_TRANSACTION               ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"resource manager #1005 enlisted as transaction enlistment #3. RM guid = '72efe9cc-80f2-4a5b-9659-28b07987b600'"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1850       ;eventid=RECEIVED_ABORT_REQUEST_FROM_NON_BEGINNER ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"received request to abort the transaction from non beginner"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1851       ;eventid=TRANSACTION_ABORTING                     ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"transaction is aborting"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1852       ;eventid=RM_ISSUED_ABORT                          ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"abort request issued to resource manager #1004 for transaction enlistment #1"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1853       ;eventid=RM_ISSUED_ABORT                          ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"abort request issued to resource manager #1004 for transaction enlistment #2"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1854       ;eventid=RM_ISSUED_ABORT                          ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"abort request issued to resource manager #1005 for transaction enlistment #3"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1855       ;eventid=RM_ACKNOWLEDGED_ABORT                    ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"received acknowledgement of abort request from the resource manager #1005 for transaction enlistment #3"
pid=1244       ;tid=6284       ;time=10/15/2013-10:00:59.672   ;seq=1856       ;eventid=RM_ACKNOWLEDGED_ABORT                    ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"received acknowledgement of abort request from the resource manager #1004 for transaction enlistment #2"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:59.672   ;seq=1857       ;eventid=RM_ACKNOWLEDGED_ABORT                    ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"received acknowledgement of abort request from the resource manager #1004 for transaction enlistment #1"
pid=1244       ;tid=8488       ;time=10/15/2013-10:00:59.672   ;seq=1858       ;eventid=TRANSACTION_ABORTED                      ;tx_guid=d07cc436-033c-450b-a36c-7e2fe79cdb81     ;"TM Identifier='(null)                                            '" ;"transaction has been aborted"

I'm not sure, but it might be worth noting that there's a 14.5 second delay between the completion of the previous transaction and the start of this one. The unit tests seem to hang there, and I've yet to figure out why.

This probably doesn't matter, but the failure is in the ExecuteNonQuery below:

public IClientInternal GetClient(string clientCode)
{
    string sql = "SELECT [CLIENT_CODE], [COMPANY], [EMPLOYEE] << more fields here >> FROM  OPENQUERY("+
                    _settings.LinkedOracleServer + ", 'SELECT * FROM CLIENT WHERE "+ 
                    "CLIENT_CODE = ''" + clientCode + "'' "+
                    "')";

    using (IDbConnection conn = DbTools.GetConnection(_configStr, _connStr, true))
    {
        using (IDbCommand cmd = DbTools.GetCommand(conn, "SET XACT_ABORT ON"))
        {
            cmd.ExecuteNonQuery();
        }
        using (IDbCommand cmd = DbTools.GetCommand(conn, sql))
        {
            DbTools.AddParameter(cmd, "@CLIENT_CODE", DbType.String, clientCode);
            IDataReader reader = cmd.ExecuteReader();
            if (reader.Read())
            {
                return ClientInternal.FromReaderRecord(reader);
            }
            return null;
        }
    }
}

This code gets called a number of times successfully before it eventually fails.

On thing I'm wondering is, if it's possible that trasnactions aren't getting cleared out properly on the Oracle side. If I understand correctly, Oracle has a 10 distributed transaction limit. Is it possible that it thinks previous distributed transactions are still open (I see no signs fo this. All evidence seems to indicate that all the previous transactions are operating perfectly find and the DTC log shows them being committed).

Update 6: I've managed to produce the problem in a fairly small piece of code. Below, with the exception of changing the names of the DB and clientCode, is the exact code to reproduce the problem. I get a SqlException on the cmd.ExecuteReader() call in GetClients() The operation could not be performed because OLE DB provider "OraOLEDB.Oracle" for linked server "ORACLE" was unable to begin a distributed transaction. I can call GetEmployeeBadges() over and over and over again and it'll work. I can call GetClients() over and over and it'll work. But if I call GetEmployeeBadges() and then call GetClients() within a transcaction, it fails. It appears to be the result of the second call being enlisted in the transaction.

As an additional note, it appears there's something weird with the GetEmployeeBadges(). This is another oddity of our environment, but V_EMPLOYEE and V_PAEMPLOYEE are actually views on yet another Oracle database. So the Oracle server I'm hitting against has views on another Oracle server. So it's a SQL Server view of an oracle view of an oracle table. I know, it's a bit nuts. It's amazing anything here functions. If I try to run the GetEmployeeBadges() in a transaction and enlist it in a transaction, it will actually fail with: Cannot execute the query "<<query text here>>" against OLE DB provider "OraOLEDB.Oracle" for linked server "ORACLE".

I can't imagine getting this a whole lot more minimal.

class Program
{
    private static string connStr = @"Server=localhost\SQLEXPRESS;Database=MYDB;Trusted_Connection=True;Enlist=false";
    static void Main(string[] args)
    {
        GetEmployeeBadges();

        using (TransactionScope ts = new TransactionScope(TransactionScopeOption.RequiresNew, new TimeSpan(0, 15, 0)))
        {
            GetClients();
            ts.Complete();
        }
    }

    private static void GetEmployeeBadges()
    {
        string sql = @"select * from OPENQUERY(ORACLE, 'select a.security_nbr, a.employee, b.last_name, b.first_name, b.department
                        from V_PAEMPLOYEE a, V_EMPLOYEE b
                        where 
                        LENGTH(TRIM(a.SECURITY_NBR)) > 0 and
                        a.EMPLOYEE = b.EMPLOYEE and
                        a.COMPANY = 3')";        
        using (IDbConnection conn = DbTools.GetConnection(connStr))
        {
            using (IDbCommand cmd = DbTools.GetCommand(conn, "SET XACT_ABORT ON"))
            {
                cmd.ExecuteNonQuery();
            }
            using (IDbCommand cmd = DbTools.GetCommand(conn, sql))
            {
                cmd.CommandTimeout = 240;
                IDataReader reader = cmd.ExecuteReader();
            }
        }
    }

    public static void GetClients()
    {
        string clientCode = "clientCode";
        string sql = @"SELECT [CLIENT_CODE], [COMPANY], [EMPLOYEE], [MENU_TOKEN_CODE], [ADMINISTRATOR_FLAG], [SUPERVISOR_FLAG], 
                              [CLIENT_DESCR], [DEFAULT_QUEUE_CODE], [FG_WHSE_CODE], [FRT_WHSE_CODE], [BILL_COMP_CODE], [FI_COMP_CODE], 
                              [DEFAULT_ROLE], [DEFAULT_PRINTER_CODE], [SHIP_PRINTER_CODE], [DEFAULT_DISPLAY] 
                       FROM  OPENQUERY( ORACLE, 'SELECT * FROM CLIENT WHERE CLIENT_CODE = ''clientCode''')";

        using (IDbConnection conn = DbTools.GetConnection(connStr, true))
        {
            using (IDbCommand cmd = DbTools.GetCommand(conn, "SET XACT_ABORT ON"))
            {
                cmd.ExecuteNonQuery();
            }
            using (IDbCommand cmd = DbTools.GetCommand(conn, sql))
            {
                DbTools.AddParameter(cmd, "@CLIENT_CODE", DbType.String, clientCode);
                IDataReader reader = cmd.ExecuteReader();
                reader.Read();
            }
        }
    }
}

The DbTools stuff, in case it's helpful is:

public static class DbTools
{
    public static IDbConnection GetConnection(string connectionString)
    {
        return GetConnection(connectionString, false);
    }

    public static IDbConnection GetConnection(string connectionString, bool enlistTransaction)
    {
        SqlConnection conn = new SqlConnection(connectionString);
        conn.Open();
        if (enlistTransaction && Transaction.Current != null)
        {
            conn.EnlistTransaction(Transaction.Current);
        }
        return conn;
    }

    public static IDbCommand GetCommand(IDbConnection conn, string command)
    {
        IDbCommand cmd = conn.CreateCommand();
        cmd.CommandText = command;
        return cmd;
    }

    public static IDbDataParameter AddParameter(IDbCommand cmd, string name, DbType type, object value)
    {
        IDbDataParameter param = cmd.CreateParameter();
        param.ParameterName = name;
        param.DbType = type;
        param.Value = value != null ? value : DBNull.Value;
        cmd.Parameters.Add(param);
        return param;
    }
}

Just 23 hours left on the bounty. I'd LOVE to give that 150 points to someone!!!

Eskimo answered 7/10, 2013 at 20:51 Comment(11)
check if "Ad Hoc Distributed Queries" is activated at the instance level : run sp_configure "Ad Hoc Distributed Queries"Ting
@Ting As best I can tell this has had no effect. Does Ad Hoc Distributed Queries affect OPENQUERY? The documentation only mentions OPENROWSET and OPENDATASOURCE.Eskimo
The linked server works well ? Can you begin a distributed transaction within sql server management studio ?Ting
As I said in my post, most of the time, everything works fine. Then, out of nowhere, it begins failing and continues to fail for a period (usually a few minutes) and then all of a sudden starts working again.Eskimo
You can trace the MSDTC service to see what's the actual problem. I recommend you to check sql server logs as well and oracle ones. maybe some transaction isn't correctly configured and is the cause of your problems.Ting
@Ting 'You can trace the MSDTC service to see what's the actual problem.' - How? I've tried msdtcvtr –tracelog <NameofTheDtcTrace.log> –o OutPutFile, but this merely produces an empty .csv file.Eskimo
check : blogs.msdn.com/b/distributedservices/archive/2009/02/07/…Ting
Maybe you didn't notice that the command I posted is the exact one from that blog. As I said, it produces an empty .csv file.Eskimo
yes, but this link also says : if you collect a trace from a Windows 2003 machine and try to parse it on (let’s say), a Windows XP machine, the parsed output may be blank.Ting
Not the case here, but thanks for the help.Eskimo
nothing on sql server logs or oracle logs ? you can run a sql profiler trace and next time you have issues try to analyse what may have triggered these problems.Ting
E
0

The problem was caused by our awefully strange setup.

We have a SQL Server that has a linked Oracle server. We create views in SQL Server to the tables on the Oracle server. There are a handful of views in the Oracle server that are views on tables in yet another Oracle server. We were simply creating views in SQL server of the Oracle views in server 1 on the Oracle tables in server 2. Even though we weren't using transactions when getting data from the views of views, it was causing the transactions in subsequent executions to fail (if someone cares to comment about how that's possible, I'd love an explanation.)

Our solution was to simply create a second linked server and bypass the views of views. You may wonder why we didn't do this to begin with and it was simply because, until we ran into this issue, there was no compelling reason to have a second linked server (and the second server has very sensitive info on it, so we kind of liked the idea of minimizing paths to access it). Obviously we have a compelling reason now.

Though I didn't give the answer to Exth3, I did give him/her the bounty points for providing the information that ultimately led to me finding the problem. Thanks Exth3!

Eskimo answered 7/10, 2013 at 20:51 Comment(0)
S
0

According to KB187289, Oracle does not support nested transactions, hence the first error message whenever a transaction is aborted without a clean rollback. Neither database can finish the transaction in this case and will stall until the blocking transaction is discarded.

XACT_ABORT forces the transaction to terminate at the first runtime error and to perform an immediate rollback. Activate XACT_ABORT for all connections in order to get the runtime errors thrown immediately, one of them is probably causing the transaction to fail on either of the databases.

Without XACT_ABORT, MS SQL server attempts a silent, partial rollback of the broken query in a nested transaction. The runtime error will be suppressed and will go unnoticed unless caught explicitly.

This does not solve your original problem (there is still a failing transaction), but it enables you to see the runtime error in the first place.

TRY...CATCH and nested transactions would be the cleaner way to perform proper error handling since SQL 2005, but this is not an option for you since Oracle does not support these language extensions.

Sailing answered 14/10, 2013 at 23:0 Comment(8)
I've found a reliable way to reproduce the problem. If I run a particular set of unit tests, it will always start failing in the same place in the same unit test. I can run this unit test repeatedly and it works fine, however. It has something to do with running the group (that is, some previous test(s) that's running before it, is setting it up for failure). But there does not appear to be a nested transaction. I added a test to see if Transaction.Current is not null, but it is null before I create the transaction that fails.Eskimo
Additionally, I turned XACT_ABORT on by default (for all connections) AND explicitly turned it on for every database operation (as above in the ExecuteNonQuery). Still seems to fail exactly the same way in exactly the same place.Eskimo
There is one weird thing in the DCT log: RECEIVED_ABORT_REQUEST_FROM_NON_BEGINNERSailing
Also, why would a simple SELECT on a VIEW need 3 resource managers? Please check Oracles log.xml for messages at the corresponding timestamp, as well as the timestamp of the previous test. If nothing, lower threshold for logging to WARNINGSailing
I have been unable to find the log.xml that you mentioned. Where would I find that? Also, please see the latest update.Eskimo
Use show parameter background on the Oracle DB to get the path of the trace directory. You probably want the file which name starts with "alert".But it might also work like this: oracle-magician.blogspot.de/2008/07/… if it's a 11g or newer. So far it sounds like you can't do transactions which span all 3 servers?Sailing
I think I know now what exactly is happening. Oracle is so "friendly" to split the remote query in two separate queries which are executed in different transactions (hence the 3 managers, 1 local, 2 remote) but these transaction are blocking each other because the underlying views share the same tables. This might give you a few hints: docs.oracle.com/cd/B28359_01/server.111/b28310/ds_appdev004.htm The DRIVING_SITE hint could possibly be used to avoid the split by enforcing the whole remote query to be executed on the third server.Sailing
Thanks for all your help. A combination of things you provided ended up ultimately leading me to the answer, so you get the 150 points. Thanks a lot!Eskimo

© 2022 - 2025 — McMap. All rights reserved.