Reverse tree building (with an odd number of children)
Asked Answered
N

1

8

I just found out about the AWS Glacier service and wanted to write a small Python application to upload files via the REST API. I took a look at the required headers and stumbled upon x-amz-sha256-tree-hash. I need to calculate the SHA-256 hash of the entire file as well as the hash of the parent of all hashes of each 1 MB chunk. This leads to the following tree:

AWS's SHA-256 Tree Hash procedure

(Image taken from here)

I already made a function that reads 1 MB chunks and a class which calculates their hashes on-the-fly but then I completely struggle:

In my application I made a class called chunk which takes data and calculates a hash in the __init__ method as well as holds parent and children (like a regular tree). When the user opens a file those chunks instances will be generated properly with their respective hashes (in this example that would be 7 chunk instances).

Now I have two big problems that are connected to each other:

  1. How do I build this tree in reverse? I basically need to make a new chunk for each two chunk instances on the lowest layer and calculate a hash based on those two hashes. But where do I store that parent? In the children of the parent and do reverse tree walking?
  2. How does that work with an odd number of children? If I have an algorithm that goes through each parent layer then I would miss the last (0.5 MB) chunk.

I checked out this topic on SO but that method only works with an even children count which is not always given.

Can you help me finding a way/an algorithm/an approach on how to solve this issue?

Thanks in advance!

Paul

Narcotic answered 21/8, 2012 at 15:13 Comment(13)
How do you decide which is the parent and which is the child ?Deland
Did you already try to hash the archive normally with hashlib?Mauretania
@Deland What do you exactly mean? The bottom layer are the children, the hashes built from 2 chunks are the parent of those 2 chunks and so on.Narcotic
@pythonm Yes, but that is only 1/2 of the required headers. I also need this SHA-256 tree in order to successfully upload files. See here for the x-amz-content-sha256 and the x-amz-sha256-tree-hash.Narcotic
So, my question is whether you already have the tree built ? or you have just the hashes and you want help in building the tree using those hashes?Deland
@Deland I just have the hashes. From my knowledge point I am unable to think of a way how to build that according to the given procedure with my problems in mind. (Also I only have the hashes for the chunks, I could not go farther because I am unable to build the second layer properly).Narcotic
ok. I am unaware of AWS, but, dealing this problem as a algorithm question. I have 1 more doubt. If you have the hashes, how do you decide which hash becomes the parent and which becomes the child?Deland
Good question. I just set that in my mind. From what I learned is that you have a parent with children following, so the lowest layer has to contain the the children, right? However, switching the logic you could say that the hash I want to achieve could be the lowest children and the chunks are all parents (kind of a masterpiece family, many parents: one final product).Narcotic
I am trying to ask : given a child, how do you decide the child-parent relationship ?Deland
Given the bottom left child (in the example): It would be the child of the hash in the level above. That hash is the parent of the bottom left child and a child of the hash above of it and so on.Narcotic
let us continue this discussion in chatDeland
You can also find some code for doing this in the BOTO git repository, in the glacier branch. Here's a link to the latest version of this code as of the time of writing: github.com/boto/boto/blob/glacier/boto/glacier/writer.pyTelegraphic
Now in botocore.utils.calculate_tree_hashGregorio
M
4

First calculate the number of levels, then

def proclevel(levels):
    if levels > 0:
        generator = proclevel(levels - 1)
        temp = None
        for firsthash, secondhash in generator:
            if not temp: temp = hashofthem(firsthash, secondhash)
            else: yield temp, hashofthem(firsthash, secondhash); temp = None
        #If odd number of packets
        if temp: yield temp, None
    else:
        temp = None
        for chunk in chunks:
            if not temp: temp = hash(chunk)
            else: yield temp, hash(chunk); temp = None
        if temp: yield temp, None

Make sure to handle None as second argument in hashofthem :)

Mauretania answered 21/8, 2012 at 15:37 Comment(3)
Ah, recursion, of course facepalm. I'm going to try that out and will - hopefully - report success. ;)Narcotic
This binary tree ASKS for recursion ;)Mauretania
It seems to have worked! Thank you very much! you should have made the hint that the second argument could be None really, really bold - took me a while to figure out what went wrong ;)Narcotic

© 2022 - 2024 — McMap. All rights reserved.