File Last Modified

Is it safe to use File Last Modified (e.g. on NTFS) to detect whether a file has changed? If not, do file backup applications always hash the whole file to check for changes? If so, what hash algorithm is suited for this check?

Faintheart answered 10/12, 2011 at 22:18 Comment(3)
Hashes are what I'd use (SHA-512 if you're really worried about hash collisions, MD5 if you don't care much).Insinuation
Agreed with @Blender. The modified date can change without the contents of the file actually changing--say if you touch the file, or hit save without any changes. VCS tools like git use hashes (SHA1 in git's case), since it's very unlikely for two mostly similar files to have the same digest.Strunk
+1 for asking an interesting question. Using a hash to check whether a file has been modified gets time-consuming as the amount of data grows.Post

It depends on the requirements of the application. Can it tolerate false positives? False negatives?

A File Last Modified date is not reliable. For example, FTP may change the modified date without changing the file, or a file could be downloaded twice, once over itself, changing the modified date without changing the file. On the other hand, there are a few utilities that will change a file but keep the same File Last Modified date.

If action absolutely must be taken on a file when it has been changed, the reliable way is to use a good hash or fingerprint. This does take time. One way to improve the odds without taking so much time would be to compare the modified date along with the file size, but again this is not foolproof.
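A minimal sketch of that two-stage idea in Python (the SHA-256 choice and the function names are my own illustration, not something the answer prescribes): compare size and modified time first, and fall back to hashing the contents only when the cheap signature differs.

```python
import hashlib
import os

def quick_signature(path):
    """Cheap check: file size plus last-modified time (not foolproof)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def content_hash(path, chunk_size=1 << 20):
    """Reliable but slow check: hash the whole file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def has_changed(path, old_sig, old_hash):
    """Treat a matching size+mtime as 'unchanged' (accepting a small risk
    of false negatives); only hash when the cheap signature differs."""
    if quick_signature(path) == old_sig:
        return False
    return content_hash(path) != old_hash
```

Whether the size+mtime shortcut is acceptable at all is exactly the false-positive/false-negative trade-off mentioned at the top of this answer.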

Coeternal answered 11/12, 2011 at 0:01 Comment(0)

I wouldn't trust the last modified time too much, since even opening a file and adding a single character changes its modification time. Hashing has the problem of collisions, so I would suggest reading about Rabin's fingerprinting algorithm.
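For illustration, here is a simplified rolling fingerprint in Python. This is a Rabin-Karp-style polynomial hash, not the actual GF(2) polynomial construction Rabin's algorithm uses, and the base and modulus are arbitrary choices; the point is the O(1) window update that makes fingerprint-based change detection cheap.

```python
# Toy rolling fingerprint (Rabin-Karp style). Real Rabin fingerprinting
# works over GF(2) with an irreducible polynomial; these constants are
# only illustrative.
BASE = 257
MOD = (1 << 61) - 1  # a large Mersenne prime

def fingerprint(data: bytes) -> int:
    """Fingerprint a whole byte string."""
    fp = 0
    for b in data:
        fp = (fp * BASE + b) % MOD
    return fp

class RollingFingerprint:
    """Sliding-window fingerprint with O(1) updates per byte."""

    def __init__(self, window: bytes):
        self.window = bytearray(window)
        self.fp = fingerprint(window)
        # Weight of the byte that will leave the window next.
        self.msb_weight = pow(BASE, len(window) - 1, MOD)

    def roll(self, incoming: int) -> int:
        """Slide the window one byte forward and return the new fingerprint."""
        outgoing = self.window.pop(0)
        self.window.append(incoming)
        self.fp = ((self.fp - outgoing * self.msb_weight) * BASE + incoming) % MOD
        return self.fp
```

Comparing a stored fingerprint of a block against a freshly computed one tells you whether that block has changed, without shipping the block itself.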

Fia answered 10/12, 2011 at 22:39 Comment(4)
Are collisions a realistic problem when using hashing to check for file modification (assuming a decent hash)? If they were, wouldn't that mean it would be "easy" to generate content matching a given hash at will?Post
Fingerprinting may be seen as a good hash function which guarantees no collisions. It can uniquely identify large blocks of data where cryptographic hash functions may be unnecessary. Many file backup applications transfer modified contents over the network; fingerprinting can be used to detect whether a file has changed by downloading its fingerprint and comparing it against the previous one.Fia
What I meant was, once the files start growing, generating the hash / fingerprint takes time. Say you have 1 TB of data; just reading it over SATA 3 would take at least 22 minutes. So some other support is needed if you often want to check whether data has changed.Post
@ydev, you cannot use a hash to guarantee no collisions. If there is no chance of collisions, the hash must be at least as large as the file (or a lossless compression of such).Furmark

I think you should get used to setting up an effective, routinely monitored hash check. Last modified is not as safe as many like to think. Stick with checking the hash and use good software that does it regularly.

Trust me, once you get used to not picking the easiest route and always doing the safest, you'll develop great habits that will carry you forward to other security measures.

Boiler answered 3/9, 2018 at 19:53 Comment(0)
