The hash constructors in hashlib (such as hashlib.md5()) take the data to be hashed, not the name of a file to open and read. So, as James says, you're hashing the same value, 'text.txt', in both cases.
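To make the difference concrete, here is a small sketch that hashes a path string and then the file it points to. The temporary file and its contents are illustrative stand-ins for text.txt, not anything from the question.

```python
import hashlib
import os
import tempfile

# Create a throwaway file standing in for 'text.txt' (illustrative only).
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"some file contents\n")
    path = tmp.name

# Wrong: this hashes the characters of the path, not the file's contents.
name_hash = hashlib.md5(path.encode()).hexdigest()

# Right: read the file's bytes and hash those.
with open(path, "rb") as f:
    content_hash = hashlib.md5(f.read()).hexdigest()

os.remove(path)
print(name_hash == content_hash)  # False
```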
Python 3.11+
If you're only targeting Python 3.11+, a new option is available: hashlib.file_digest(). It takes a file object opened in binary mode and either a hash constructor or the name of a hashlib hash function. The equivalent of what you tried would be:
```python
import hashlib

with open('text.txt', 'rb') as file:
    print(hashlib.file_digest(file, 'md5').hexdigest())
```
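The second argument does not have to be a name; it can also be a callable that returns a new hash object, such as hashlib.sha256. A minimal sketch, assuming an in-memory io.BytesIO stream as a stand-in for a real file, and guarded so it simply does nothing on Python versions older than 3.11:

```python
import hashlib
import io

# An in-memory binary stream stands in for a file opened with 'rb'.
stream = io.BytesIO(b"hello world\n")

if hasattr(hashlib, "file_digest"):  # only present on Python 3.11+
    # Pass a constructor instead of a name such as 'sha256'.
    digest = hashlib.file_digest(stream, hashlib.sha256)
    print(digest.hexdigest())
```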
Python 2.7-3.10+
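While hashlib.file_digest() will not be available on all supported Python versions until October 2026 (when Python 3.10 reaches end of life), we can still look at its implementation to get an idea of how to make an even better version of md5sum(), using bytearray and memoryview instead of iter(lambda: f.read()). Reading with readinto() fills one preallocated buffer on every pass, rather than allocating a new bytes object for each chunk.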
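```python
import hashlib

def md5sum(filename, _bufsize=2**18):
    digest = hashlib.md5()
    # Allocate one reusable buffer and a zero-copy view onto it.
    buf = bytearray(_bufsize)
    view = memoryview(buf)
    with open(filename, 'rb') as file:
        while True:
            # Fill the buffer in place; returns the number of bytes read.
            size = file.readinto(buf)
            if size == 0:
                break  # EOF
            # Hash only the bytes actually read, without copying them.
            digest.update(view[:size])
    return digest.hexdigest()
```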
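If you need other algorithms too, the same loop generalises via hashlib.new(). The sketch below uses a hypothetical helper, file_hexdigest(), that prefers hashlib.file_digest() when the running Python has it and otherwise falls back to the readinto()/memoryview loop from md5sum(). The helper's name and its bufsize parameter are my own, not part of hashlib.

```python
import hashlib

def file_hexdigest(filename, algorithm="md5", bufsize=2**18):
    """Hypothetical helper: hex digest of a file using any hashlib algorithm.

    Uses hashlib.file_digest() on Python 3.11+ (bufsize is then ignored),
    otherwise falls back to a readinto()/memoryview loop.
    """
    with open(filename, "rb") as file:
        if hasattr(hashlib, "file_digest"):
            return hashlib.file_digest(file, algorithm).hexdigest()
        digest = hashlib.new(algorithm)
        buf = bytearray(bufsize)
        view = memoryview(buf)
        while True:
            size = file.readinto(buf)
            if not size:
                break  # EOF
            digest.update(view[:size])
        return digest.hexdigest()
```

For example, file_hexdigest('text.txt', 'sha256') would return the SHA-256 hex digest of that file.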