I want to load large XML documents into XDocument objects.
The simple synchronous approach using XDocument.Load(path, loadOptions)
works great, but blocks for an uncomfortably long time in a GUI context when loading large files (particularly from network storage).
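For reference, here is roughly what that blocking call looks like in context (the handler and field names below are just illustrative):

private void OpenDocument_Click(object sender, EventArgs e)
{
    // Blocks the UI thread for the entire load, which can take a long
    // time for large files on network storage.
    XDocument doc = XDocument.Load(_documentPath, LoadOptions.PreserveWhitespace);
    // ... hand doc off to the rest of the application ...
}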
I wrote this async version with the intention of improving responsiveness, particularly when loading files over the network:
public static async Task<XDocument> LoadAsync(String path, LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
{
    String xml;
    using (var stream = File.OpenText(path))
    {
        // Read the entire file into a single string asynchronously.
        xml = await stream.ReadToEndAsync();
    }
    // Parse the in-memory string into an XDocument.
    return XDocument.Parse(xml, loadOptions);
}
However, on a 200 MB raw XML file loaded from local disk, the synchronous version completes in a few seconds. The asynchronous version (running in a 32-bit process) instead throws an OutOfMemoryException:
at System.Text.StringBuilder.ToString()
at System.IO.StreamReader.<ReadToEndAsyncInternal>d__62.MoveNext()
I imagine this is because of the temporary string variable used to hold the raw XML in memory for parsing by XDocument.Parse(). Presumably, in the synchronous scenario, XDocument.Load() is able to stream through the source file and never needs to create a single huge string holding the entire document.

Is there any way to get the best of both worlds: loading the XDocument with fully asynchronous I/O, and without needing to create a large temporary string?
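For what it's worth, my mental model of the synchronous path is something like the following (this is only an assumption about what XDocument.Load(path) does internally, and the method name is illustrative):

public static XDocument LoadStreaming(String path, LoadOptions loadOptions)
{
    // Stream the file through an XmlReader; the tree is built node by node,
    // so no single string holding the whole document is ever created.
    using (var reader = XmlReader.Create(path))
    {
        return XDocument.Load(reader, loadOptions);
    }
}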
XDocument.Load(stream)? – Eloyelreath

You could read the file asynchronously into a MemoryStream, then set MemoryStream.Position to 0 and load it (synchronously) with XDocument. That way you avoid needing to make a 200 MB string (which is probably actually becoming 400 MB with .NET's UTF-16 encoding of a file that is likely mostly ASCII and encoded to 200 MB as UTF-8). However, the accepted answer allows you to fully avoid building the separate buffer, which, in this environment, makes it the best choice even though it has blocking. – Mervin
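For concreteness, a minimal sketch of the MemoryStream approach described in that comment (the method name and buffer size are illustrative, not from the original):

public static async Task<XDocument> LoadViaMemoryStreamAsync(String path, LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
{
    var buffer = new MemoryStream();
    using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                     bufferSize: 4096, useAsync: true))
    {
        // Asynchronously copy the raw UTF-8 bytes (~200 MB) into memory;
        // no UTF-16 string (~400 MB) is ever built.
        await file.CopyToAsync(buffer);
    }
    buffer.Position = 0;                         // rewind before parsing
    return XDocument.Load(buffer, loadOptions);  // synchronous parse from the in-memory buffer
}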