Context
I've written a parallel job framework in C# to import/export a large amount of data from/to an ElasticSearch cluster. To do this I've modelled each import or export of a single item as an object that gets executed at some point by the framework. To interface with ElasticSearch I'm using NEST (official .NET ElasticSearch client library) v1.7.1 and JSON.Net 7.0.1.
Each of the import/export task objects interacts with ElasticSearch using NEST. For performance reasons, I have written a proxy class which groups search requests generated by the task objects into fixed-size batches to use with NEST's _msearch API. The caller to this class is delayed until its batch returns. That class is available here.
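To make the batching idea concrete, here is a minimal sketch, not the linked class itself. The names `BatchingProxy`, `SubmitAsync` and `executeBatch` are hypothetical, and the delegate stands in for the `_msearch` call. It assumes that waiters are released through a `TaskCompletionSource`, whereas the actual class signals them by cancelling a per-operation `CancellationTokenSource`. A partial batch is never flushed here; a real implementation would also need a timeout or an explicit flush.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical batching proxy: collects requests until a batch is full,
// executes the whole batch in one call, then completes each caller's task.
public sealed class BatchingProxy<TRequest, TResponse>
{
    private readonly int _batchSize;
    private readonly Func<IReadOnlyList<TRequest>, IReadOnlyList<TResponse>> _executeBatch;
    private readonly object _gate = new object();
    private List<Pending> _buffer = new List<Pending>();

    private sealed class Pending
    {
        public TRequest Request;
        public TaskCompletionSource<TResponse> Completion;
    }

    public BatchingProxy(
        int batchSize,
        Func<IReadOnlyList<TRequest>, IReadOnlyList<TResponse>> executeBatch)
    {
        _batchSize = batchSize;
        _executeBatch = executeBatch;
    }

    // The returned task completes only once the caller's batch has been executed.
    public Task<TResponse> SubmitAsync(TRequest request)
    {
        var pending = new Pending
        {
            Request = request,
            Completion = new TaskCompletionSource<TResponse>()
        };

        List<Pending> full = null;
        lock (_gate)
        {
            _buffer.Add(pending);
            if (_buffer.Count >= _batchSize)
            {
                // Swap the buffer under the lock; run the batch outside it.
                full = _buffer;
                _buffer = new List<Pending>();
            }
        }

        if (full != null) Flush(full);
        return pending.Completion.Task;
    }

    private void Flush(List<Pending> batch)
    {
        try
        {
            var responses = _executeBatch(batch.Select(p => p.Request).ToList());
            for (var i = 0; i < batch.Count; i++)
                batch[i].Completion.SetResult(responses[i]);
        }
        catch (Exception e)
        {
            // One failure fails every operation in the batch that has not completed yet.
            foreach (var p in batch) p.Completion.TrySetException(e);
        }
    }
}
```

With a batch size of 2, the task returned by the first `SubmitAsync` call stays incomplete until a second request arrives and the batch is executed.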
My framework models the result of each import/export task as either "bool" or "Exception". The overall process can continue even if individual items fail.
Problem
I am seeing the following exception raised thousands of times after several hours of tasks completing without errors:
System.InvalidOperationException: Current error context error is different to requested error.
at _____.Matcher.<GetBestMatchAsync>d__15.MoveNext() in C:\\_work\\edc7a363\\_____\\Matcher.cs:line 266
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
_____.MatchBlock`1.<ExecuteAsyncInternal>d__19.MoveNext() in C:\\_work\\edc7a363\\_____\\MatchBlock.cs:line 111
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at _____.Block.BlockBase.<ExecuteAsync>d__11.MoveNext() in C:\\_work\\edc7a363\\_____\\Block\\BlockBase.cs:line 33
This is the code throwing the exception (from the bulk searcher class linked above):
try
{
    var bulkResponse = Client.MultiSearch(searchDescriptor);
    var items = bulkResponse.GetResponses<T>().ToList();
    // Set response values and release all waiting tasks
    var zip = currentBuffer.Zip(items, (op, result) => new { op, result });
    foreach (var a in zip)
    {
        a.op.Response = a.result;
        a.op.Cts.Cancel();
    }
}
catch (Exception e)
{
    foreach (var op in currentBuffer)
    {
        op.Error = e;
        op.Cts.Cancel();
    }
}
where Client is an IElasticClient.
Googling the exception message leads me to this method in the JsonSerializerInternalBase class in JSON.Net, which seems to be executed after each deserialisation:
private ErrorContext GetErrorContext(object currentObject, object member, string path, Exception error)
{
    if (_currentErrorContext == null)
    {
        _currentErrorContext = new ErrorContext(currentObject, member, path, error);
    }
    if (_currentErrorContext.Error != error)
    {
        throw new InvalidOperationException("Current error context error is different to requested error.");
    }
    return _currentErrorContext;
}
Given that a single NEST client object is reused for every operation across multiple threads, and I think NEST only uses one JsonSerializer instance, this makes me think this part of JSON.Net is not thread-safe. It is strange, though, that the error doesn't start happening until a few hours into a run.
How can I debug this further?
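One way to probe the thread-safety hypothesis in isolation is a small stress harness that shares a single JsonSerializer across many threads, without NEST or ElasticSearch involved. The sketch below is only that: the `SharedSerializerStressTest` class, the `Doc` type and the JSON payload are hypothetical stand-ins, and it assumes the Newtonsoft.Json package is referenced. It does not reproduce NEST's converters or serializer settings, so a clean run would not rule out a problem in how NEST configures the serializer.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class SharedSerializerStressTest
{
    // Hypothetical payload type, standing in for a search response.
    public sealed class Doc
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    // Deserializes the same document 'iterations' times in parallel using
    // ONE shared JsonSerializer, and returns every exception that was thrown.
    public static ConcurrentQueue<Exception> Run(int iterations)
    {
        var serializer = JsonSerializer.Create(new JsonSerializerSettings());
        var failures = new ConcurrentQueue<Exception>();
        const string json = "{\"Id\":1,\"Name\":\"example\"}";

        Parallel.For(0, iterations, i =>
        {
            try
            {
                using (var reader = new JsonTextReader(new StringReader(json)))
                {
                    var doc = serializer.Deserialize<Doc>(reader);
                    if (doc == null || doc.Id != 1)
                        throw new InvalidOperationException("Unexpected deserialization result.");
                }
            }
            catch (Exception e)
            {
                // Collect rather than rethrow so the whole run completes.
                failures.Enqueue(e);
            }
        });

        return failures;
    }
}
```

Calling `SharedSerializerStressTest.Run(1000000)` and inspecting the returned queue shows whether sharing the serializer alone is enough to trigger the exception. If it is not, the next candidates are the converters and error handlers attached to the serializer.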
JsonSerializerInternalBase is the base class for JsonSerializerInternalWriter.cs and JsonSerializerInternalReader.cs, not JsonSerializer. For each call to Serialize, Deserialize or Populate, JsonSerializer allocates one of these to do the actual work, likely for thread safety. – Tizes
serializer.Serialize() including how JsonSerializer.Error is initialized? – Tizes