"The specified block list is invalid" while uploading blobs in parallel

I've a (fairly large) Azure application that uploads (fairly large) files in parallel to Azure blob storage.

In a few percent of uploads I get an exception:

The specified block list is invalid.

System.Net.WebException: The remote server returned an error: (400) Bad Request.

This is when we run a fairly innocuous-looking bit of code to upload a blob in parallel to Azure storage:

    public static void UploadBlobBlocksInParallel(this CloudBlockBlob blob, FileInfo file) 
    {
        blob.DeleteIfExists();
        blob.Properties.ContentType = file.GetContentType();
        blob.Metadata["Extension"] = file.Extension;

        byte[] data = File.ReadAllBytes(file.FullName);

        int numberOfBlocks = (data.Length / BlockLength) + 1;
        string[] blockIds = new string[numberOfBlocks];

        Parallel.For(
            0, 
            numberOfBlocks, 
            x =>
        {
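            // Azure requires every block ID in a blob to be valid Base64 and
            // the same length; a 16-byte GUID always encodes to 24 characters.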
            string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
            int currentLength = Math.Min(BlockLength, data.Length - (x * BlockLength));

            using (var memStream = new MemoryStream(data, x * BlockLength, currentLength))
            {
                var blockData = memStream.ToArray();
                var md5Check = System.Security.Cryptography.MD5.Create();
                var md5Hash = md5Check.ComputeHash(blockData, 0, blockData.Length);

                blob.PutBlock(blockId, memStream, Convert.ToBase64String(md5Hash));
            }

            blockIds[x] = blockId;
        });

        byte[] fileHash = System.Security.Cryptography.MD5.Create().ComputeHash(data, 0, data.Length);
        blob.Metadata["Checksum"] = BitConverter.ToString(fileHash).Replace("-", string.Empty);
        blob.Properties.ContentMD5 = Convert.ToBase64String(fileHash);

        data = null;
        blob.PutBlockList(blockIds);
        blob.SetMetadata();
        blob.SetProperties();
    }

All very mysterious; I'd have thought the algorithm we're using to generate the block IDs should produce strings that are all the same length...
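
Update: as a colleague pointed out (see the comments below), the block count above contains a fencepost error: when data.Length is an exact multiple of BlockLength it produces an extra, zero-length block. Ceiling division avoids that, although it doesn't appear to explain the intermittent failures:

    // Ceiling division: no spurious empty block when the file size
    // divides evenly by BlockLength.
    int numberOfBlocks = (data.Length + BlockLength - 1) / BlockLength;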

Cyprian answered 16/10, 2012 at 15:11 Comment(4)
You mentioned that it happens sometimes. Does this happen for a particular file, or does it happen randomly, i.e. the code fails for a file and then works again for the same file? Is there any chance the number of blocks is more than 50,000? Also, for the files that fail, can you check whether the size of the file is exactly divisible by the block length? And if possible, can you run Fiddler and trace the failed request, especially the data being sent.Cameroncameroon
Good thoughts, @Gaurav Mantri, thank you -- and no, it happens "randomly", a retry for the same file appears to work. There's no particular dependency on the block size (4M) either, and yes, my colleague pointed out the fencepost error in there. Can't Fiddle, unfortunately, as we're in the cloud on this one.Cyprian
Wondering if you ever found a solution to this? We have a retry policy in place, that fixes this for us, but we're still getting maybe 1 fail out of 100K writes (as an estimate). Did you ever get to the bottom of it?Gasper
@Gasper - regrettably, no. We too put in a retry policy (along the lines of the sketch below) but still saw (admittedly rare) failures. I've moved on to another project now so I don't know if it still happens.Cyprian
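
For anyone landing here later: the application-level retry @Gasper and I describe looked roughly like the sketch below. It is only a sketch (the helper name, attempt count and backoff are illustrative), and it assumes the classic Microsoft.WindowsAzure.Storage client, where this error surfaces as a StorageException with HTTP status 400:

    using System;
    using System.IO;
    using System.Threading;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class BlobUploadRetry
    {
        // Re-runs the whole parallel upload when the service answers 400
        // ("The specified block list is invalid"), with exponential backoff.
        // Assumes the UploadBlobBlocksInParallel extension from the question
        // is in scope.
        public static void UploadWithRetry(CloudBlockBlob blob, FileInfo file, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    blob.UploadBlobBlocksInParallel(file);
                    return;
                }
                catch (StorageException ex)
                {
                    bool isBadRequest = ex.RequestInformation != null
                        && ex.RequestInformation.HttpStatusCode == 400;
                    if (!isBadRequest || attempt >= maxAttempts)
                        throw;

                    // Back off before retrying the whole upload.
                    Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
                }
            }
        }
    }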

We ran into a similar issue; however, we were not specifying any block IDs or even using block IDs anywhere. In our case, we were using:

    using (CloudBlobStream stream = blob.OpenWrite(condition))
    {
        //// [write data to stream]

        stream.Flush();
        stream.Commit();
    }

This would cause "The specified block list is invalid." errors under parallelized load. Switching this code to use the UploadFromStream(…) method, buffering the data into memory first, fixed the issue:

    using (MemoryStream stream = new MemoryStream())
    {
        //// [write data to stream]

        stream.Seek(0, SeekOrigin.Begin);
        blob.UploadFromStream(stream, condition);
    }

Obviously this could have negative memory ramifications if too much data is buffered into memory, but this is a simplification. One thing to note is that UploadFromStream(...) uses Commit() in some cases, but checks additional conditions to determine the best method to use.
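
If buffering everything in memory is a concern, the same pattern can be pointed at a temporary file instead of a MemoryStream. A minimal sketch (the spool file is our illustration, not part of the original fix; condition is the same access condition as above):

    using (var spool = new FileStream(
        Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
        FileShare.None, 81920, FileOptions.DeleteOnClose))
    {
        //// [write data to spool]

        // Rewind before handing the stream to UploadFromStream;
        // DeleteOnClose cleans the spool file up afterwards.
        spool.Seek(0, SeekOrigin.Begin);
        blob.UploadFromStream(spool, condition);
    }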

Septet answered 29/10, 2018 at 21:59 Comment(1)
Nice, thanks. My original question was from six years ago: the code no longer exists, and alas the organisation I wrote it for no longer exists either.Cyprian

This exception can also happen when multiple threads open a stream to a blob with the same name and try to write to it simultaneously.
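
If the competing writers live in the same process, one way to rule this out is to serialize writes per blob name. A minimal sketch (the BlobWriteGate helper below is illustrative, not an SDK type):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    public static class BlobWriteGate
    {
        // One semaphore per blob name: only one thread in this process
        // may stream into a given blob at a time.
        private static readonly ConcurrentDictionary<string, SemaphoreSlim> Gates =
            new ConcurrentDictionary<string, SemaphoreSlim>();

        public static IDisposable Enter(string blobName)
        {
            SemaphoreSlim gate = Gates.GetOrAdd(blobName, _ => new SemaphoreSlim(1, 1));
            gate.Wait();
            return new Releaser(gate);
        }

        private sealed class Releaser : IDisposable
        {
            private readonly SemaphoreSlim _gate;
            public Releaser(SemaphoreSlim gate) { _gate = gate; }
            public void Dispose() { _gate.Release(); }
        }
    }

Wrapping each write in using (BlobWriteGate.Enter(blob.Name)) { ... } guarantees a single writer per blob within the process; across processes you would need a blob lease or unique blob names instead.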

Nostoc answered 29/4, 2019 at 11:48 Comment(0)

NOTE: this solution is based on Azure JDK code, but I think we can safely assume that the pure REST version will behave exactly the same way (as will any other language, actually).

Since I spent an entire work day fighting this issue, even though it is actually a corner case, I'll leave a note here; maybe it will be of help to someone.

I did everything right. I had block IDs in the right order, I had block IDs of the same length, and I had a clean container with no leftovers of previous blocks (these three causes are the only ones I was able to find via Google).

There was one catch: I've been building my block list for commit via

CloudBlockBlob.commitBlockList(Iterable<BlockEntry> blockList)

with use of this constructor:

BlockEntry(String id, BlockSearchMode searchMode)

passing

BlockSearchMode.COMMITTED

in the second argument. And THAT proved to be the root cause. Once I changed it to

BlockSearchMode.UNCOMMITTED

and eventually landed on the one-parameter constructor

BlockEntry(String id)

which uses UNCOMMITTED by default. After that change, committing the block list worked and the blob was successfully persisted.

Sanyu answered 29/1, 2020 at 20:15 Comment(0)
