How is the MinIO ETag generated?
Does anyone know how the minio etag is generated when you PUT an object? Is it a hash of the file and can we use it to prevent uploading the same file twice?

Many thanks!

Primero answered 24/6, 2020 at 12:28 Comment(1)
I also want to know, but if it is compatible with Amazon S3, then I imagine it is an MD5 hash. (comment by Hortatory)

According to a MinIO maintainer's answer in a related GitHub issue, the ETag is not always the MD5 hash of the file:

ETag has multiple meanings and its easy to confuse them to be md5sum - https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html

Here is the explanation that you might want @audouts

ETag The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:

  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data.

  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.

  • If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption. If an object is larger than 16 MB, the AWS Management Console will upload or copy that object as a Multipart Upload, and therefore the ETag will not be an MD5 digest.

This kind of confusing implementation is an AWS S3 baggage sadly we have to carry :-)

So, based on this and another answer from the same issue, if your file is not encrypted, then the ETag is:

  • if file size < 16 MB (uploaded in a single PUT): md5(file)
  • if file size > 16 MB (uploaded as a multipart upload):
    1. Split it into parts of 16 MB.
    2. Calculate the MD5 digest of each part.
    3. etag = hex(md5(md5-part1 .. md5-partN)) + "-N", where the part digests are concatenated as raw bytes and N is the number of parts.

However, the part size is not fixed. It is chosen by the client. For example, the MinIO Client (mc) uses 16 MB, but apache-libcloud uses 5 MB. As a result, the same file gets a different ETag depending on which client uploaded it.
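
The steps above can be sketched in plain Java. This is a minimal illustration, not MinIO's own code: the class name `MultipartEtag` and the method `expectedEtag` are made up for this example, the data is held in memory for brevity (a real file would be streamed), and anything that fits in one part is assumed to have been sent as a single PUT.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of any MinIO SDK.
final class MultipartEtag {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    // Lowercase hex encoding of a digest.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // ETag expected for unencrypted data uploaded with the given part size.
    // Assumption: data no larger than one part is uploaded with a single PUT.
    static String expectedEtag(byte[] data, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        if (data.length <= partSize) {
            return toHex(newMd5().digest(data)); // plain md5(file)
        }
        MessageDigest combined = newMd5();
        int parts = 0;
        for (int offset = 0; offset < data.length; offset += partSize) {
            int length = Math.min(partSize, data.length - offset);
            MessageDigest part = newMd5();
            part.update(data, offset, length);
            combined.update(part.digest()); // raw 16-byte digest, not its hex string
            parts++;
        }
        return toHex(combined.digest()) + "-" + parts;
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(expectedEtag(data, 1024)); // fits in one part: plain MD5
        System.out.println(expectedEtag(data, 4));    // three parts: ends in "-3"
    }
}
```

To compare against a real object, pass the part size your upload client actually used (for example 16 MB for mc or 5 MB for apache-libcloud, per the figures above). A different part size yields a different ETag for identical bytes.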

Frager answered 15/5 at 12:34 Comment(0)

For an object uploaded in a single PUT, the ETag is just an MD5 hex string. You can simply test it like this:

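// Assumes the older constructor-based minio-java client and Spring's DigestUtils.
import io.minio.MinioClient;
import io.minio.ObjectStat;
import java.io.InputStream;
import org.springframework.util.DigestUtils;

MinioClient client = new MinioClient("your endpoint", "your accesskey", "your secretkey");
ObjectStat objectStat = client.statObject("test", "XW02.jpg");
System.out.println(objectStat); // object metadata, including the ETag

// Hash the downloaded bytes and compare the result with the ETag printed above.
try (InputStream inputStream = client.getObject("test", "XW02.jpg")) {
    String md5 = DigestUtils.md5DigestAsHex(inputStream);
    System.out.println(md5);
}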
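
The comparison above only holds for ETags that are plain MD5 digests. As a rough sketch, a small check can tell a plain digest from a multipart-style ETag (the `-N` suffix) before comparing. The class `EtagCheck` and its methods are hypothetical names for this example. The check cannot detect an SSE-C or SSE-KMS encrypted object, whose ETag is not an MD5 of the data, so a mismatch does not prove the contents differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any MinIO SDK.
final class EtagCheck {

    // 32 hex digits, optionally followed by "-<part count>", optionally wrapped in quotes
    // (the HTTP ETag header value is quoted; some clients strip the quotes).
    private static final Pattern ETAG =
            Pattern.compile("^\"?([0-9a-fA-F]{32})(?:-(\\d+))?\"?$");

    // True if the ETag carries a "-N" suffix, i.e. it came from a multipart upload.
    static boolean isMultipart(String etag) {
        Matcher m = ETAG.matcher(etag.trim());
        return m.matches() && m.group(2) != null;
    }

    // True only if the ETag is a plain 32-digit value equal to the given MD5 hex string.
    static boolean matchesWholeFileMd5(String etag, String md5Hex) {
        Matcher m = ETAG.matcher(etag.trim());
        return m.matches() && m.group(2) == null && m.group(1).equalsIgnoreCase(md5Hex);
    }
}
```

With this, `EtagCheck.matchesWholeFileMd5(etag, md5)` can gate a "skip upload" decision for single-PUT objects, while `isMultipart` signals that a per-part calculation with the uploader's part size is needed instead.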
Pamilapammi answered 31/12, 2020 at 1:59 Comment(0)