File containing its own checksum

11

58

Is it possible to create a file that contains its own checksum (MD5, SHA1, whatever)? And to head off the jokers: I mean the checksum in plain text, not a function that calculates it.

Siloxane answered 13/7, 2009 at 7:4 Comment(0)
40

I wrote a small program in C, ran a brute-force search for less than 2 minutes, and got this wonder:

The CRC32 of this string is 4A1C449B

Note that there must be no characters (end of line, etc.) after the sentence.

You can check it here: http://www.crc-online.com.ar/index.php?d=The+CRC32+of+this+string+is+4A1C449B&en=Calcular+CRC32

This one is also fun:

I killed 56e9dee4 cows and all I got was...

Source code (sorry it's a little messy) here: http://www.latinsud.com/pub/crc32/

Diamond answered 26/7, 2011 at 17:9 Comment(3)
Hey, how did you make this precomputed table? I want to do exactly the same... :) - Esposito
I think I found the code. It is dirty and there is no precomputed table. latinsud.com/pub/crc32 - Diamond
@Diamond I'm a Java person and not great with C. Can you explain how the code works? I don't understand how you can use a precomputed table when the CRC is part of the string you're calculating. - Stalwart
18

Yes. It's possible, and it's common with simple checksums. Getting a file to include its own md5sum would be quite challenging.

In the most basic case, choose a checksum value that makes the sum of all the values, taken modulo some base, equal to zero. The check then becomes something like

(n1 + n2 ... + CRC) % 256 == 0

The checksum then becomes part of the file and is itself included in the check. A very common example of this is the Luhn algorithm used in credit card numbers. The last digit is a check digit, and is itself part of the 16-digit number.
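A minimal sketch of the zero-sum idea in Java, assuming a one-byte check value appended at the end of the data. The class and method names (`ZeroSumChecksum`, `checkByte`, `seal`, `isValid`) are made up for illustration and do not come from any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ZeroSumChecksum {

    /** Sum of all bytes (treated as unsigned), reduced modulo 256. */
    static int sumMod256(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum = (sum + (b & 0xFF)) % 256;
        }
        return sum;
    }

    /** The value that, once appended, makes the total sum 0 modulo 256. */
    static int checkByte(byte[] payload) {
        return (256 - sumMod256(payload)) % 256;
    }

    /** Returns the payload with its check byte appended. */
    static byte[] seal(byte[] payload) {
        byte[] out = Arrays.copyOf(payload, payload.length + 1);
        out[payload.length] = (byte) checkByte(payload);
        return out;
    }

    /** The check covers every byte, including the check byte itself. */
    static boolean isValid(byte[] message) {
        return sumMod256(message) == 0;
    }

    public static void main(String[] args) {
        byte[] sealed = seal("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(isValid(sealed)); // the sealed message passes
        sealed[0] ^= 1;                      // flip one bit
        System.out.println(isValid(sealed)); // the damaged message fails
    }
}
```

The verifier never needs to know where the check byte sits: it simply sums everything, which is what makes the checksum "part of the file".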

Strawser answered 13/7, 2009 at 7:17 Comment(2)
Right, that's what I said. :-) Since it's only 32 bits, it's entirely feasible to just brute-force the solution. - Recognizee
This does not show how to include the md5sum of a file within the file, which is what the question asked. - Topflight
13

Check this:

echo -e '#!/bin/bash\necho My cksum is 918329835' > magic
Lurid answered 25/7, 2012 at 8:4 Comment(1)
I just incremented the number and checked it with a bash script at around 350 checks per second for 3 months or so. I think this is not the only valid cksum for this file. - Lurid
8

"I wish my crc32 was 802892ef..."

Well, I thought this was interesting, so today I coded a little Java program to find collisions. Thought I'd leave it here in case someone finds it useful:

import java.util.zip.CRC32;

public class Crc32_recurse2 {

    public static void main(String[] args) {

        long endval = Long.parseLong("ffffffff", 16);

        long startval = 0L;
//      startval = Long.parseLong("802892ef",16); //uncomment to save yourself some time

        float percent = 0;
        long time = System.currentTimeMillis();
        long updates = 10000000L; // how often to print some status info

        for (long i = startval; i <= endval; i++) { // <= so that ffffffff itself is tested

            // toHexString does not zero-pad, so small values are tested with fewer than 8 digits
            String testval = Long.toHexString(i);

            String cmpval = getCRC("I wish my crc32 was " + testval + "...");
            if (testval.equals(cmpval)) {
                System.out.println("Match found!!! Message is:");
                System.out.println("I wish my crc32 was " + testval + "...");
                System.out.println("crc32 of message is " + testval);
                System.exit(0);
            }

            if (i%updates==0) {
                if (i==0) {
                    continue; // kludge to avoid divide by zero at the start
                }
                long timetaken = System.currentTimeMillis() - time;
                // integer division happens before scaling, so speed is rounded down to a multiple of 1000
                long speed = updates/timetaken*1000;
                percent =  (i*100.0f)/endval;
                long timeleft = (endval-i)/speed; // in seconds
                System.out.println(percent+"% through - "+ "done "+i/1000000+"M so far"
                        + " - " + speed+" tested per second - "+timeleft+
                        "s till the last value.");
                time = System.currentTimeMillis();
            }       
        }       
    }

    public static String getCRC(String input) {
        CRC32 crc = new CRC32();
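        // explicit charset so the result does not depend on the platform default
        crc.update(input.getBytes(java.nio.charset.StandardCharsets.US_ASCII));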
        return Long.toHexString(crc.getValue());
    }

}

The output:

49.825756% through - done 2140M so far - 1731000 tested per second - 1244s till the last value.
50.05859% through - done 2150M so far - 1770000 tested per second - 1211s till the last value.
Match found!!! Message is:
I wish my crc32 was 802892ef...
crc32 of message is 802892ef

Note the dots at the end of the message are actually part of the message.

On my i5-2500 it was going to take ~40 minutes to search the whole crc32 space from 00000000 to ffffffff, doing about 1.8 million tests/second. It was maxing out one core.

I'm fairly new to Java, so any constructive comments on my code would be appreciated.

"My crc32 was c8cb204, and all I got was this lousy T-Shirt!"

Stalwart answered 11/3, 2013 at 13:31 Comment(0)
6

Certainly, it is possible. But one of the uses of checksums is to detect tampering with a file. How would you know whether a file has been modified if the modifier can also replace the checksum?

Quickwitted answered 13/7, 2009 at 7:6 Comment(3)
@AmigableClarkKant, my point being that going down this path is harmful - it defeats the purpose of having a checksum in the first place. The question specifically mentioned cryptographic algorithms so I presume the intent was to detect deliberate tampering rather than accidental corruption. - Quickwitted
@MarkRansom I wouldn't trust any cryptographic algorithm that derives its "security" from a lack of public discussion of how to break it. In cases like that, there should be public discussion. It wouldn't ruin the security because any security would have been fake anyway, and that way people will know the algorithm isn't actually secure and that they should use something else instead. - Shandra
@flarn2006 my point is that putting the checksum on the file would not provide any security at all. If you want to detect accidental corruption of a file then it might be useful, but it is worthless against an intentional attack. - Quickwitted
5

Sure, you could concatenate the digest of the file itself to the end of the file. To check it, you would calculate the digest of all but the last part, then compare it to the value in the last part. Of course, without some form of encryption, anyone can recalculate the digest and replace it.

edit

I should add that this is not so unusual. One technique is to append a CRC-32 so that the CRC-32 of the whole file (including that digest) is zero, or, for CRC variants that apply a final XOR, a fixed known constant. This won't work with digests based on cryptographic hashes, though.
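Here is a small sketch of that technique using `java.util.zip.CRC32`. Because that implementation applies a final XOR, the CRC of "data plus its own CRC" comes out as a fixed non-zero constant rather than literally zero; the property that matters is that it is the same for every input. The class and method names are invented for this example, and appending the value in little-endian byte order is an assumption that matches the bit-reflected CRC-32 used by zlib and Java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32Residue {

    /** Standard CRC-32 of the given bytes, as an unsigned value in a long. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Returns the data followed by its CRC-32 as four little-endian bytes. */
    static byte[] appendCrc(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length + 4)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(data);
        buf.putInt((int) crc32(data)); // low 32 bits of the CRC value
        return buf.array();
    }

    public static void main(String[] args) {
        long a = crc32(appendCrc("first message".getBytes(StandardCharsets.US_ASCII)));
        long b = crc32(appendCrc("a different one".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(Long.toHexString(a));
        System.out.println(a == b); // same residue regardless of content
    }
}
```

For the standard CRC-32 the residue should be 2144df1c, so a receiver can run the CRC over the entire file and compare against that one constant without locating the stored value first.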

Recognizee answered 13/7, 2009 at 7:8 Comment(0)
2

There is a neat implementation of the Luhn Mod N algorithm in the python-stdnum library (see luhn.py). The calc_check_digit function calculates a digit or character which, when appended to the file (expressed as a string), creates a valid Luhn Mod N string. As noted in many answers above, this gives a sanity check on the validity of the file, but no significant security against tampering. The receiver will need to know what alphabet is being used to define Luhn Mod N validity.
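To make the check-digit idea concrete, here is a rough Java sketch of the plain decimal Luhn algorithm (N = 10). It is not a port of python-stdnum's code, only an illustration of the same principle; `LuhnSketch` and its method names are hypothetical, and the input is assumed to contain ASCII digits only.

```java
final class LuhnSketch {

    /**
     * Luhn sum over a digit string that already ends with its check digit.
     * Counting from the right, every second digit is doubled, and doubled
     * values above 9 have 9 subtracted (same as adding their two digits).
     */
    static int luhnSum(String digits) {
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum;
    }

    /** A string is valid when its Luhn sum is a multiple of 10. */
    static boolean isValid(String digits) {
        return luhnSum(digits) % 10 == 0;
    }

    /** Computes the digit to append so that the result is valid. */
    static int checkDigit(String payload) {
        // A placeholder 0 puts the payload digits in their final positions.
        int sum = luhnSum(payload + "0");
        return (10 - sum % 10) % 10;
    }

    public static void main(String[] args) {
        String payload = "7992739871";
        String full = payload + checkDigit(payload);
        System.out.println(full + " valid: " + isValid(full));
    }
}
```

As with the zero-sum byte, the check digit is included in the validation itself, so the validator treats the whole string uniformly.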

Wallas answered 6/9, 2011 at 20:59 Comment(0)
1

I don't know if I understand your question correctly, but you could make the first 16 bytes of the file the checksum of the rest of the file.

So before writing a file, you calculate the hash, write the hash value first and then write the file contents.
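A sketch of that layout in Java, assuming MD5 as the hash (its digest happens to be exactly 16 bytes). The names `DigestPrefixedFile`, `wrap`, and `verify` are illustrative only. Note that the hash here covers the content alone, not the stored hash itself.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class DigestPrefixedFile {

    private static final int DIGEST_LENGTH = 16; // MD5 produces 16 bytes

    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds the file image: 16-byte digest first, then the content. */
    static byte[] wrap(byte[] content) {
        byte[] digest = md5(content);
        byte[] out = new byte[DIGEST_LENGTH + content.length];
        System.arraycopy(digest, 0, out, 0, DIGEST_LENGTH);
        System.arraycopy(content, 0, out, DIGEST_LENGTH, content.length);
        return out;
    }

    /** Recomputes the digest of everything after the prefix and compares. */
    static boolean verify(byte[] file) {
        if (file.length < DIGEST_LENGTH) {
            return false;
        }
        byte[] stored = Arrays.copyOfRange(file, 0, DIGEST_LENGTH);
        byte[] content = Arrays.copyOfRange(file, DIGEST_LENGTH, file.length);
        return Arrays.equals(stored, md5(content));
    }
}
```

This detects accidental corruption of either the prefix or the content, but anyone who can edit the file can also recompute and replace the prefix.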

Louls answered 13/7, 2009 at 7:6 Comment(4)
Although that's a perfectly valid practical approach, I meant a checksum that also includes itself. - Siloxane
I'm not a mathematician, but I think this is simply impossible. - Louls
It isn't impossible, but it is very, very difficult. - Arleanarlee
For CRC-32, it's actually quite simple. For a crypto hash, you'd be quite correct. - Recognizee
1

If the question is asking whether a file can contain its own checksum (in addition to other content), the answer is trivially yes for fixed-size checksums, because a file could contain all possible checksum values.

If the question is whether a file could consist of its own checksum (and nothing else), it's trivial to construct a checksum algorithm that would make such a file impossible: for an n-byte checksum, take the binary representation of the first n bytes of the file and add 1. Since it's also trivial to construct a checksum that always encodes itself (i.e. do the above without adding 1), clearly there are some checksums that can encode themselves, and some that cannot. It would probably be quite difficult to tell which of these a standard checksum is.
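The two toy checksums described above can be sketched in a few lines of Java, taking n = 4 so the checksum fits in an `int`. Everything here is hypothetical illustration: `ToyChecksums` and its methods are invented names, and both functions assume the file has at least four bytes.

```java
import java.nio.ByteBuffer;
import java.util.function.ToIntFunction;

final class ToyChecksums {

    /** "Checksum" that is just the first four bytes, read big-endian. */
    static int identityChecksum(byte[] file) {
        return ByteBuffer.wrap(file).getInt();
    }

    /** The same value plus one (wrapping on overflow). */
    static int plusOneChecksum(byte[] file) {
        return identityChecksum(file) + 1;
    }

    /** True when the file is exactly four bytes that equal its checksum. */
    static boolean consistsOfOwnChecksum(byte[] file,
                                         ToIntFunction<byte[]> checksum) {
        return file.length == 4
                && ByteBuffer.wrap(file).getInt() == checksum.applyAsInt(file);
    }

    public static void main(String[] args) {
        byte[] file = {0x12, 0x34, 0x56, 0x78};
        // Every 4-byte file is its own identity checksum ...
        System.out.println(consistsOfOwnChecksum(file, ToyChecksums::identityChecksum));
        // ... and no 4-byte file can equal its own value plus one.
        System.out.println(consistsOfOwnChecksum(file, ToyChecksums::plusOneChecksum));
    }
}
```

Since x + 1 never equals x, even with 32-bit wraparound, the second function has no self-describing file at all, while the first accepts every 4-byte file.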

Violaceous answered 4/5, 2010 at 22:17 Comment(0)
-1

Sure.

The simplest way would be to run the file through an MD5 algorithm and embed that data within the file. You can split up the checksum and place it at known points in the file (based on a proportion of the file size, e.g. 30%, 50%, 75%) if you wish to try to hide it.

Similarly you could encrypt the file, or encrypt a portion of the file (along with the MD5 checksum) and embed that in the file. Edit: I forgot to say that you would need to remove the checksum data before using the file.

Of course if your file needs to be readily readable by another program e.g. Word then things become a little more complicated as you don't want to "corrupt" the file so that it is no longer readable.

Broz answered 13/7, 2009 at 7:20 Comment(3)
If you embed that data within the file, wouldn't that change the md5 checksum? - Whinchat
It would if you ran the checksum routine on it again, but that is the point of removing it before use. Simplest way would be to just add the checksum onto the end of the file. When the file is received you remove the checksum data and rerun the checksum routine on the remaining data. Any data corruption to either the checksum or the original data will show up here. - Broz
I am fairly certain zakovyrya was asking for the checksum to be included in its own calculation. - Violaceous
-1

You can of course, but in that case the SHA digest of the whole file will not be the SHA you included, because it is a cryptographic hash function, so changing a single bit in the file changes the whole hash. What you are looking for is a checksum calculated from the content of the file in such a way that it matches a set of criteria.

Supervisory answered 13/7, 2009 at 7:22 Comment(0)
