WeakReference returns wrong object

I've noticed a strange behavior in one of our applications recently.

Exception=System.InvalidCastException: Unable to cast object of type 'System.Data.SqlClient.SqlTransaction' to type 'System.Byte[]'.
   at ServiceStack.Text.Pools.BufferPool.GetCachedBuffer(Int32 minSize) in C:\BuildAgent\work\912418dcce86a188\src\ServiceStack.Text\Pools\BufferPool.cs:line 55
   at ServiceStack.Redis.RedisNativeClient..ctor(RedisEndpoint config) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 447
   at ServiceStack.Redis.RedisClient..ctor(RedisEndpoint config) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisClient.cs:line 66
   at ServiceStack.Redis.RedisConfig.<>c.<.cctor>b__35_0(RedisEndpoint c) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisConfig.cs:line 22
   at ServiceStack.Redis.RedisResolver.CreateRedisClient(RedisEndpoint config, Boolean master) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisResolver.cs:line 76
   at ServiceStack.Redis.RedisManagerPool.GetClient() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisManagerPool.cs:line 214

...

Or

Exception=System.InvalidCastException: Unable to cast object of type 'System.Byte[]' to type 'System.Transactions.SafeIUnknown'.
   at System.Transactions.Transaction.JitSafeGetContextTransaction(ContextData contextData)
   at System.Transactions.Transaction.FastGetTransaction(TransactionScope currentScope, ContextData contextData, Transaction& contextTransaction)
   at System.Transactions.Transaction.get_Current()
   at System.Data.ProviderBase.DbConnectionPool.GetFromTransactedPool(Transaction& transaction)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at ServiceStack.OrmLite.OrmLiteConnection.Open() in C:\BuildAgent\work\27e4cc16641be8c0\src\ServiceStack.OrmLite\OrmLiteConnection.cs:line 86
   ...

Exceptions similar to these two are being thrown from various parts of our application. The only thing all of them have in common is a WeakReference object, but I can see no obvious cause for the exception. For example, the ServiceStack.Text code that throws this exception is:

    private class CachedBuffer
    {
        private readonly WeakReference _reference;

        public int Size { get; }

        public bool IsAlive => _reference.IsAlive;
        public byte[] Buffer => (byte[])_reference.Target;

        public CachedBuffer(byte[] buffer)
        {
            Size = buffer.Length;
            _reference = new WeakReference(buffer);
        }
    }

The exception is thrown from the Buffer property getter. Apparently _reference.Target points to a SqlTransaction object rather than a byte[], but _reference is initialized only once, in the constructor, and can't be changed afterwards. So how could this exception be thrown?

Additionally, none of this code has changed recently, so it makes no sense that it would suddenly start throwing errors. I also can't see any way we could have caused this bug by some change in our code. Or am I wrong about that?

Could this be a bug in the .NET CLR, and if so, how could I diagnose it? Our application uses .NET Framework 4.8, and we've been seeing these errors on Windows Server 2012 and Windows Server 2019 machines in our testing environments, but I have not been able to reproduce them locally on my development machine.

Alrzc answered 22/4, 2021 at 12:28 Comment(11)
I ... can't fault your logic; it really does look like something seriously screwy is happening in (presumably) the GC internals here; has this changed recently? (For reference, here's the relevant System.Data bits - similar WeakReference code, but nothing odd: referencesource.microsoft.com/#system.transactions/System/…). Is this x86? x64? Sad note, though: .NET Framework is largely obsolete now - if there is a bug, I doubt it is going to get much love (and it may already be fixed in .NET 5+). - Lennielenno
The application targets the x64 architecture. We started noticing this error last Wednesday, soon after we deployed a new version of the application to our testing environment. No new updates were installed on our servers; pretty much the only thing that changed was our application, and there were no major changes, just some minor bug fixes. - Heraclitean
Any Windows Update history, perhaps? I'm ... honestly quite impressed here. - Lennielenno
Do you have any unsafe code or reflection? - Karwan
@Karwan it is notable that CachedBuffer in this case - and the same for the ADO.NET FastGetTransaction example - is an internal implementation detail of a library; you'd have to work hard even to get hold of the instances to manipulate them with reflection/unsafe. Don't get me wrong, it is absolutely possible, but that wouldn't happen accidentally. - Lennielenno
@MarcGravell Yeah, it's unlikely that reflection is the issue (although it might be a screwy debugger/profiler), but badly written unsafe or P/Invoke native code could do anything. - Karwan
Your implementation does technically exhibit a race condition. You are supposed to check .IsAlive both before .Target (to avoid an exception) and after it (to create a GC root), but before casting. That being said, you shouldn't be seeing this behavior, ever - at worst you should be seeing a NullReferenceException. Target shouldn't be pointing to random objects on the heap. P/Invoke gone awry, or an out-of-date .NET installation, is a top suspect here. - Clarkclarke
@JonathanDickinson I don't believe you need to check IsAlive before; Target will just return null. Once it's in a local variable you can null-check and cast it; nothing will happen to it, as it is already a strong reference. So IsAlive is not necessary at all, just use Target. See https://mcmap.net/q/21227/-c-properly-using-weakreference-isalive-property - Karwan
@Karwan we do use reflection in some parts of the code, but nothing that could access any of the objects that are throwing these exceptions. That was something I suspected as well, but I couldn't find any cases of reflection being used on classes that contain any of these objects. We only use reflection on simple DTO classes. I don't think we have any P/Invoke calls in our own code, but it's used in several libraries, not to mention the .NET Framework classes themselves. None of that changed in this latest patch, though. - Heraclitean
@Marc Gravell the last Windows updates were applied two weeks before we started noticing these errors. - Heraclitean
Are you running an x86, x64 or AnyCPU build? Debug or Release? Perhaps try other combinations. What about setting gcServer=false or true in the app.config? - Karwan
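
The pattern Karwan describes in the comments above (read Target once into a local, then null-check it) can be sketched roughly as follows. This is only an illustration: WeakBufferReader and TryGetBuffer are hypothetical names, not part of ServiceStack.Text, and the use of `as` instead of a hard cast is a choice made for this sketch.

```csharp
using System;

// Hypothetical helper for illustration; not part of ServiceStack.Text.
public static class WeakBufferReader
{
    // Returns the buffer if it is still alive, otherwise null.
    public static byte[] TryGetBuffer(WeakReference reference)
    {
        // Read Target exactly once. The local variable is a strong reference,
        // so the object cannot be collected between this read and its use.
        object target = reference.Target;

        // 'as' yields null when the target was collected (or is not a byte[]),
        // so no separate IsAlive check is needed.
        return target as byte[];
    }
}
```

Note that `as` would silently hide the type mismatch reported in the question; keeping the hard cast, as the original code does, at least surfaces the corruption as an InvalidCastException.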
A
7

I've managed to locate the problem: it was indeed our own code that caused this error, and it was related to reflection. Long story short, one of our developers introduced code that invoked a deep clone on the ExpandoObject.Keys property. This property is not a simple collection of strings; it also holds a reference to the entire ExpandoObject, and deep inside ExpandoObject there are some WeakReference fields. I still don't understand exactly what happens, but I guess that cloning those WeakReferences inside ExpandoObject somehow caused the bug we experienced.
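
One way to avoid running a reflection-based deep clone over ExpandoObject internals is to copy only the public data through the IDictionary<string, object> interface that ExpandoObject implements. The sketch below is an assumption about what such a replacement could look like, not the fix actually applied here; ExpandoSnapshot, CopyKeys and ShallowCopy are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical helper for illustration; the names are not from the original code.
public static class ExpandoSnapshot
{
    // Copies the member names into a plain list that holds no reference
    // back to the ExpandoObject, so it is safe to clone further.
    public static List<string> CopyKeys(ExpandoObject source)
    {
        var dictionary = (IDictionary<string, object>)source;
        return new List<string>(dictionary.Keys);
    }

    // Builds a new ExpandoObject with the same members. Values are copied
    // by reference (shallow copy); the source's internal fields are never touched.
    public static ExpandoObject ShallowCopy(ExpandoObject source)
    {
        var copy = new ExpandoObject();
        var target = (IDictionary<string, object>)copy;
        foreach (KeyValuePair<string, object> pair in (IDictionary<string, object>)source)
        {
            target[pair.Key] = pair.Value;
        }
        return copy;
    }
}
```

Because only the public dictionary interface is used, no private field of ExpandoObject (including any WeakReference it holds) is ever duplicated.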

Thanks for your help. I inspected that code several times, but completely missed the deep clone invocation, because that extension method had a very generic name.

Alrzc answered 22/4, 2021 at 21:20 Comment(3)
This is very intriguing. I would be very interested in seeing the code that introduced this, because quite honestly: the result is terrifying. Although to be fair, the runtime folks would say "you used reflection to hack inside an object: any consequences are on you". - Lennielenno
What deep cloning method was being used? Force.DeepCloner by any chance? - Hyden
I'm having a very similar problem https://mcmap.net/q/21228/-system-invalidcastexception-unable-to-cast-object-of-type-39-system-data-sqlclient-sqltransaction-39-to-type-39-system-transactions-bucketset-39/8479 when calling the TransactionScope constructor: System.InvalidCastException: Unable to cast object of type 'System.Data.SqlClient.SqlTransaction' to type 'System.Transactions.BucketSet'. We do deep cloning on ExpandoObjects, so I think it could be the same. I'll report back if I solve it. - Lyris
