Azure CloudAppendBlob errors with concurrent access
L

4

11

My understanding was that CloudAppendBlob is safe from concurrency issues because you can only append to it, so it does not need to compare ETags. As stated in this post:

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx

specifically:

In addition, Append Blob supports having multiple clients writing to the same blob without any need for synchronization (unlike block and page blob)

However the following unit test raises:

412 the append position condition specified was not met.

stack trace

Microsoft.WindowsAzure.Storage.Blob.BlobWriteStream.Flush()
Microsoft.WindowsAzure.Storage.Blob.BlobWriteStream.Commit()
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.UploadFromStreamHelper
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendFromStream
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendFromByteArray
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendText

Here is the unit test. Maybe the service handles requests from different contexts, but not parallel calls like these?

    [TestMethod]
    public void test_append_text_concurrency()
    {
        AppendBlobStorage abs = new AppendBlobStorage(new TestConnectConfig(), "testappendblob");

        string filename = "test-concurrent-blob";

        abs.Delete(filename);                       

        Parallel.Invoke(
            () => { abs.AppendText(filename, "message1\r\n"); },
            () => { abs.AppendText(filename, "message2\r\n"); }
        );

        string text = abs.ReadText(filename);

        Assert.IsTrue(text.Contains("message1"));
        Assert.IsTrue(text.Contains("message2"));
    }

Method in AppendBlobStorage

    public void AppendText(string filename, string text)
    {
        CloudAppendBlob cab = m_BlobStorage.BlobContainer.GetAppendBlobReference(filename);

        // Create if it doesn't exist
        if (!cab.Exists())
        {
            try
            {
                cab.CreateOrReplace(AccessCondition.GenerateIfNotExistsCondition(), null, null);
            }
            catch { }
        }

        // Append the text
        cab.AppendText(text);      
    }

Maybe I'm missing something. The reason I'm trying to do this is that I have multiple web jobs which can all write to this append blob, and I figured this was what it was designed for.

Leighton answered 11/9, 2015 at 18:40 Comment(2)
I have also just tested this by spinning up multiple web jobs and writing some text to the same append blob. I get the same error. - Leighton
Note that I'm using version 5.0.2.0 of Microsoft.WindowsAzure.Storage (I have also tried against the 5.0.3.0 preview). - Leighton
L
12

After a bit more searching it looks like this is an actual problem.

I guess CloudAppendBlob is fairly new. (There are also other issues with append blobs in the client library at the moment; see

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/09/02/issue-in-azure-storage-client-library-5-0-0-and-5-0-1-preview-in-appendblob-functionality.aspx)

Anyway, I fixed the issue by using the AppendBlock variant rather than AppendText, as suggested here:

https://azurekan.wordpress.com/2015/09/08/issues-with-adding-text-to-azure-storage-append-blob/

Here is the changed AppendText method, which passes the unit test defined above:

    public void AppendText(string filename, string text)
    {
        if (string.IsNullOrWhiteSpace(filename))
            throw new ArgumentException("filename cannot be null or empty");

        if (!string.IsNullOrEmpty(text))
        {
            CloudAppendBlob cab = m_BlobStorage.BlobContainer.GetAppendBlobReference(filename);

            // Create if it doesn't exist
            if (!cab.Exists())
            {
                try
                {
                    cab.CreateOrReplace(AccessCondition.GenerateIfNotExistsCondition(), null, null);
                }
                catch (StorageException)
                {
                    // Most likely another writer created the blob first; carry on and append.
                }
            }

            // Use AppendBlock, as AppendText seems to fail under concurrent writes at the moment.
            using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(text)))
            {
                cab.AppendBlock(ms);
            }
        }
    }
Leighton answered 12/9, 2015 at 15:27 Comment(0)
M
4

The append methods of the CloudAppendBlob class include:

AppendBlock/AppendFromByteArray/AppendFromFile/AppendFromStream/AppendText

Essentially they all use the same REST API operation, Append Block. See the documentation: https://learn.microsoft.com/en-us/rest/api/storageservices/append-block

But only AppendBlock should be used in a multi-writer scenario; all the others are meant for a single-writer scenario. The reason is that AppendBlock does NOT send the x-ms-blob-condition-appendpos header with the PUT HTTP request.

That header basically says: this block MUST be appended at exactly this offset of the blob.

So for AppendBlock the HTTP request looks like this:

    PUT https://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: bb7f5a93-191d-40f9-8b92-4ec0476be920
    x-ms-date: Fri, 23 Mar 2018 20:21:29 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

All the other append methods do send the x-ms-blob-condition-appendpos header. Its value should be the length of the blob before the append. So how does the library know that value? It first sends a HEAD HTTP request to get it:

    HEAD http://test.blob.core.windows.net/test/20180323.log HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:19 GMT
    Authorization: SharedKey XXXX
    Host: test.blob.core.windows.net

The Content-Length value in the response is then sent as the x-ms-blob-condition-appendpos header in the following PUT HTTP request:

    PUT http://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-blob-condition-appendpos: 1287
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:20 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

So, back to the original question: when two parallel tasks call AppendText at the same time, most likely both will send the HEAD request first and get the same blob length. The task whose PUT request arrives first will succeed, but the task whose PUT request arrives later will fail with 412, because the blob's length has already changed and that offset has been taken by the first PUT request.
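The sequence described above can be replayed without touching Azure at all. The following is only a sketch: FakeAppendBlob, GetLength and TryAppend are hypothetical stand-ins for the blob, the HEAD request and the PUT request, not part of the storage client library, and the interleaving is hard-coded rather than produced by real threads.

```csharp
using System;
using System.Text;

// Hypothetical in-memory stand-in for an append blob (not an Azure SDK type).
class FakeAppendBlob
{
    private readonly StringBuilder _content = new StringBuilder();

    // Plays the role of the HEAD request: returns the current length.
    // (Counts characters, which equals bytes for the ASCII text used here.)
    public long GetLength() => _content.Length;

    // Plays the role of the PUT request. A non-null expectedOffset behaves like
    // the append-position condition: reject unless the blob is still that long.
    public bool TryAppend(string text, long? expectedOffset = null)
    {
        if (expectedOffset.HasValue && expectedOffset.Value != _content.Length)
            return false; // the real service would answer 412 here

        _content.Append(text);
        return true;
    }

    public string ReadAll() => _content.ToString();
}

static class RaceDemo
{
    public static void Main()
    {
        var blob = new FakeAppendBlob();

        // Both writers look up the length before either of them appends.
        long seenByWriter1 = blob.GetLength();
        long seenByWriter2 = blob.GetLength();

        // Conditional appends: the first wins, the second is rejected.
        bool first = blob.TryAppend("message1\r\n", seenByWriter1);
        bool second = blob.TryAppend("message2\r\n", seenByWriter2);

        // An append without the condition goes through regardless of length.
        bool unconditional = blob.TryAppend("message2\r\n");

        Console.WriteLine($"{first} {second} {unconditional}"); // True False True
    }
}
```

The rejected call corresponds to the 412 from the question; the unconditional one corresponds to what AppendBlock does according to this answer.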

So if you have a multi-writer scenario, AppendBlock is the method that works right now. But you do have to be aware that

  • You have no control over the position of the block in the blob.
  • A block has a size limit (I think it is 4 MB).
  • If you use AppendBlock to upload more than 4 MB of data, the request will fail with the response: HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit
  • If you use any method other than AppendBlock to upload a large amount of data, it will send one HEAD request to get the blob length and then automatically split the data into multiple PUT requests. The block size can be controlled by CloudAppendBlob.StreamWriteSizeInBytes; if you don't set it, it defaults to 4 MB.
  • So, as the name AppendBlock hints, it can only append one block, not more than one. If you want to upload a large blob, you have to split the data yourself. But in a multi-writer scenario you cannot guarantee that the split blocks will end up next to each other in the blob.
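Splitting the data yourself could look roughly like the sketch below. BlockSplitter is a hypothetical helper, not part of the storage library, and the 4 MB default is just the figure assumed above; check the current limit before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: cuts a payload into pieces small enough for one AppendBlock call each.
static class BlockSplitter
{
    // Assumed limit, taken from the 4 MB figure mentioned above.
    public const int DefaultMaxBlockBytes = 4 * 1024 * 1024;

    public static List<byte[]> Split(byte[] data, int maxBlockBytes = DefaultMaxBlockBytes)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (maxBlockBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxBlockBytes));

        var blocks = new List<byte[]>();
        for (int offset = 0; offset < data.Length; offset += maxBlockBytes)
        {
            // The last piece may be shorter than the limit.
            int size = Math.Min(maxBlockBytes, data.Length - offset);
            var block = new byte[size];
            Buffer.BlockCopy(data, offset, block, 0, size);
            blocks.Add(block);
        }
        return blocks;
    }
}

static class SplitDemo
{
    public static void Main()
    {
        // A tiny limit makes the behaviour easy to see: 10 bytes become 4 + 4 + 2.
        var blocks = BlockSplitter.Split(Encoding.UTF8.GetBytes("0123456789"), 4);
        foreach (var block in blocks)
            Console.WriteLine(block.Length);
    }
}
```

Each piece would then be wrapped in a MemoryStream and passed to AppendBlock. Note that this cuts at byte boundaries, so a multi-byte UTF-8 character can end up spread over two blocks, and with several writers another writer's block may land in between.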
Maros answered 23/3, 2018 at 21:29 Comment(1)
Hi, sorry for asking after so long. But to clarify: AppendText is thread-safe, and the issue is just that it throws an error if the file has changed while two or more threads are writing to it, so if retry logic is implemented this should not be an issue. Is this assumption correct? - Schechter
R
0

For people who need a more generic solution to this problem, I created an extension method:

public static async Task AppendTextConcurrentAsync(this CloudAppendBlob appendBlob, string content)
{
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
    {
        await appendBlob.AppendBlockAsync(stream);
    }
}

This solution is more consistent with how you use other Append* methods on CloudAppendBlob.

Roble answered 19/1, 2017 at 19:21 Comment(0)
H
-1

You might try AppendTextAsync. That seemed to work for me in a similar situation. Using the lock keyword might also work.

public void Log(string message)
{
    lock (this.appendBlob)
    {
        appendBlob.AppendText(string.Format("[{0:s}] {1}{2}", DateTime.Now, message, Environment.NewLine));
    }
}
Harts answered 22/2, 2017 at 22:27 Comment(1)
Using lock will serialize the appends rather than letting them run in parallel as intended by the OP. Also, I've encountered the same issue as the OP with AppendTextAsync. - Luting
