Does NTFS store a hash or CRC32 of every inode/file, and how can I access it?
I know how to read a file, pass its bytes to a hashing algorithm such as MD5, SHA-256 or CRC32, and get the hash.

Here I'm asking something slightly different:

Each time we write/modify a file on an NTFS partition, does NTFS recompute a hash or CRC32 and store it in its metadata, e.g. in the MFT (Master File Table)? (I don't remember the exact name.)

Note: the important thing is that I just want to read the hash/CRC stored in the filesystem (i.e. read a few bytes, which should take a few milliseconds at most), not recompute the hash (which would take many seconds for a 10 GB file).


If so, how can I access this CRC or hash for a specific file using Python? Is there something like:

import ntfsutil  # hypothetical module, only to illustrate what I'm looking for
ntfsutil.getCRC('d:/big50GBfile.dat')  # done in < 10 ms
Undulate answered 24/11, 2018 at 10:6 Comment(12)
No. - Technicolor
@Technicolor So the only way is to recompute the hash/checksum each time I need it? NTFS cannot be configured to store checksums? PS: the (nearly) prime number sequence in your username, is it on purpose? :) - Undulate
NTFS doesn't do file checksums. (Also, the 2 3 5 7 11 thing was pure luck, but it's part of the reason I never changed my default username.) - Technicolor
You could cache the checksum and corresponding last-write timestamp in an alternate data stream, e.g. "big50GBfile.dat:md5". This "md5" stream will be lost if the file is copied to a file system that doesn't support named streams (e.g. FAT32). - Greed
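For future reference, here is a minimal Python sketch of the caching idea in the comment above. It is only one way to do it: the function names (file_md5, cached_md5), the JSON layout and the stream name "md5" are my own choices, not anything NTFS defines. It also assumes that writing to a named stream bumps the file's last-write time, so the original timestamps are restored afterwards. On a file system without named streams, the "path:md5" name may fail or simply create a separate file.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash the file contents in chunks so large files are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_md5(path, stream="md5"):
    """Return the MD5 of `path`, reusing a value cached in the named stream
    `path:stream` as long as the size and last-write time still match."""
    ads = f"{path}:{stream}"  # e.g. "big50GBfile.dat:md5"
    st = os.stat(path)
    try:
        with open(ads, "r", encoding="utf-8") as f:
            cached = json.load(f)
        if cached["mtime_ns"] == st.st_mtime_ns and cached["size"] == st.st_size:
            return cached["md5"]  # cache hit: only a few bytes were read
    except (OSError, ValueError, KeyError, TypeError):
        pass  # no stream yet, or unreadable contents: fall through and recompute

    digest = file_md5(path)
    with open(ads, "w", encoding="utf-8") as f:
        json.dump({"md5": digest, "mtime_ns": st.st_mtime_ns, "size": st.st_size}, f)
    # Assumption: writing the stream may have changed the last-write time,
    # so put the original timestamps back to keep the cache entry valid.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    return digest
```

The first call on a big file still has to read the whole file. Later calls only read the small stream, until the file's size or last-write time changes. As noted further down in the comments, this only detects non-malicious changes, since anyone who can modify the file can also modify the stream.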
@eryksun Wow, this is the first time I've heard about streams, very cool! howtogeek.com/howto/windows-vista/…. For future ref: notepad c:\test.txt:secret will create a stream named secret! Funny that this is so rarely used... - Undulate
See File Streams. An NTFS file/directory is a collection of resident and non-resident attributes (i.e. streams). An NTFS directory has an index stream of the contained files (e.g. "C:\", "C:\::$INDEX_ALLOCATION", and "C:\:$I30:$INDEX_ALLOCATION" are equivalent). An NTFS file has at least one anonymous (nameless) data stream (e.g. "C:\test.txt" and "C:\test.txt::$DATA" are equivalent). Both files and directories can have named data streams (e.g. "C:\:secret" or "C:\:secret:$DATA", a data stream on the root directory). - Greed
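To make the "path:stream name:stream type" syntax from the comment above concrete, here is a tiny sketch. stream_path is a hypothetical helper of my own, not a Windows or Python API. It only builds the fully qualified string and does not touch the file system.

```python
def stream_path(path, name="", attr_type="$DATA"):
    """Build the fully qualified 'path:name:type' form of a stream path.

    An empty `name` refers to the anonymous (nameless) stream.
    """
    return f"{path}:{name}:{attr_type}"


# The anonymous data stream of a file:
print(stream_path("C:\\test.txt"))
# A named data stream on the root directory:
print(stream_path("C:\\", "secret"))
# The directory index stream of the root directory:
print(stream_path("C:\\", "$I30", "$INDEX_ALLOCATION"))
```

These three calls reproduce the fully qualified forms quoted in the comment ("C:\test.txt::$DATA", "C:\:secret:$DATA" and "C:\:$I30:$INDEX_ALLOCATION").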
FYI, streams/attributes are fundamental to the design of NTFS, so this capability has been around since NT 3.1 in 1993. The article you linked to mistakenly claims this was added in Windows 2000, which is probably due to a literal interpretation of the Windows version supported (listed at the bottom of the MSDN docs) at the time the article was published. - Greed
@eryksun Good to know! Do you know of any popular software that makes good use of this feature? By the way, maybe you could post your solution using these streams as an answer, for future reference (comments might get deleted one day). - Undulate
Adding a checksum "near" a file and relying on that would defeat the purpose of the checksum itself. - Effeminacy
@ErykSun I think your comments together could be the answer. If you post them as one, I'll accept it. - Undulate
@AndreaLazzarotto: Depends on the use case. If you just want to detect "benign" (i.e. non-malicious) changes to a file, a CRC32 stored in a non-protected location is fine. - Tallyho
@Undulate "Do you know popular software that makes use of this feature?" IE/Edge/Chrome use it to give downloaded files the "Mark of the Web". SQL Server (pre-2016) used an alternate stream as its working file while it checked a database file for corruption. NTFS encryption uses a stream to store the key used to encrypt a file. - Pronounced
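As an illustration of the "Mark of the Web" case mentioned above: browsers write it to a named stream called Zone.Identifier, whose content is a small INI-style text with a [ZoneTransfer] section. Below is a rough sketch for reading it from Python. read_zone_id is a hypothetical helper name, and the UTF-8 decoding is an assumption on my part.

```python
import configparser


def read_zone_id(path):
    """Return the ZoneId stored in `path:Zone.Identifier`, or None if the
    stream is missing or cannot be parsed."""
    try:
        with open(f"{path}:Zone.Identifier", "r",
                  encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError:
        return None  # no such stream (or no named-stream support)

    # interpolation=None: other keys in the stream may hold URLs containing '%'
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
        return parser.getint("ZoneTransfer", "ZoneId")
    except (configparser.Error, ValueError):
        return None
```

For a file downloaded from the Internet zone this should typically give 3. A file without the stream gives None.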
