Can anyone define the Windows PE Checksum Algorithm?
Asked Answered
V

9

13

I would like to implement this in C#

I have looked here: http://www.codeproject.com/KB/cpp/PEChecksum.aspx

And am aware of the ImageHlp.dll MapFileAndCheckSum function.

However, for various reasons, I would like to implement this myself.

The best I have found is here: http://forum.sysinternals.com/optional-header-checksum-calculation_topic24214.html

But, I don't understand the explanation. Can anyone clarify how the checksum is calculated?

Thanks!

Update

I from the code example, I do not understand what this means, and how to translate it into C#

sum -= sum < low 16 bits of CheckSum in file // 16-bit borrow 
sum -= low 16 bits of CheckSum in file 
sum -= sum < high 16 bits of CheckSum in file 
sum -= high 16 bits of CheckSum in file 

Update #2

Thanks, came across some Python code that does similar too here

    def generate_checksum(self):

    # This will make sure that the data representing the PE image
    # is updated with any changes that might have been made by
    # assigning values to header fields as those are not automatically
    # updated upon assignment.
    #
    self.__data__ = self.write()

    # Get the offset to the CheckSum field in the OptionalHeader
    #
    checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64

    checksum = 0

    # Verify the data is dword-aligned. Add padding if needed
    #
    remainder = len(self.__data__) % 4
    data = self.__data__ + ( '\0' * ((4-remainder) * ( remainder != 0 )) )

    for i in range( len( data ) / 4 ):

        # Skip the checksum field
        #
        if i == checksum_offset / 4:
            continue

        dword = struct.unpack('I', data[ i*4 : i*4+4 ])[0]
        checksum = (checksum & 0xffffffff) + dword + (checksum>>32)
        if checksum > 2**32:
            checksum = (checksum & 0xffffffff) + (checksum >> 32)

    checksum = (checksum & 0xffff) + (checksum >> 16)
    checksum = (checksum) + (checksum >> 16)
    checksum = checksum & 0xffff

    # The length is the one of the original data, not the padded one
    #
    return checksum + len(self.__data__)

However, it's still not working for me - here is my conversion of this code:

using System;
using System.IO;

namespace CheckSumTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var data = File.ReadAllBytes(@"c:\Windows\notepad.exe");

            var PEStart = BitConverter.ToInt32(data, 0x3c);
            var PECoffStart = PEStart + 4;
            var PEOptionalStart = PECoffStart + 20;
            var PECheckSum = PEOptionalStart + 64;
            var checkSumInFile = BitConverter.ToInt32(data, PECheckSum);
            Console.WriteLine(string.Format("{0:x}", checkSumInFile));

            long checksum = 0;

            var remainder = data.Length % 4;
            if (remainder > 0)
            {
                Array.Resize(ref data, data.Length + (4 - remainder));
            }

            var top = Math.Pow(2, 32);

            for (int i = 0; i < data.Length / 4; i++)
            {
                if (i == PECheckSum / 4)
                {
                    continue;
                }
                var dword = BitConverter.ToInt32(data, i * 4);
                checksum = (checksum & 0xffffffff) + dword + (checksum >> 32);
                if (checksum > top)
                {
                    checksum = (checksum & 0xffffffff) + (checksum >> 32);
                }
            }

            checksum = (checksum & 0xffff) + (checksum >> 16);
            checksum = (checksum) + (checksum >> 16);
            checksum = checksum & 0xffff;

            checksum += (uint)data.Length; 
            Console.WriteLine(string.Format("{0:x}", checksum));

            Console.ReadKey();
        }
    }
}

Can anyone tell me where I'm being stupid?

Varityper answered 21/6, 2011 at 17:55 Comment(3)
What about the code do you not understand? Example code would be helpful.Ejaculatory
Sorry - editing to make clearerVarityper
Internet Checksum, lld/COFF/Writer.cpp (C++), Python-based implementation under GPL, ReactOS implementation (C)Balance
V
10

Ok, finally got it working ok... my problem was that I was using ints not uints!!! So, this code works (assuming data is 4-byte aligned, otherwise you'll have to pad it out a little) - and PECheckSum is the position of the CheckSum value within the PE (which is clearly not used when calculating the checksum!!!!)

Note that the following is C# code.

static uint CalcCheckSum(byte[] data, int PECheckSum)
{
    long checksum = 0;
    var top = Math.Pow(2, 32);

    for (var i = 0; i < data.Length / 4; i++)
    {
        if (i == PECheckSum / 4)
        {
            continue;
        }
        var dword = BitConverter.ToUInt32(data, i * 4);
        checksum = (checksum & 0xffffffff) + dword + (checksum >> 32);
        if (checksum > top)
        {
            checksum = (checksum & 0xffffffff) + (checksum >> 32);
        }
    }

    checksum = (checksum & 0xffff) + (checksum >> 16);
    checksum = (checksum) + (checksum >> 16);
    checksum = checksum & 0xffff;

    checksum += (uint)data.Length;
    return (uint)checksum;

}
Varityper answered 22/6, 2011 at 10:29 Comment(1)
This isn't correct. It doesn't match the ImageHlp.dll MapFileAndCheckSum output. Check out the output using the PE Checksum Executable provided in codeproject.com/Articles/19326/…Est
E
6

The code in the forum post is not strictly the same as what was noted during the actual disassembly of the Windows PE code. The CodeProject article you reference gives the "fold 32-bit value into 16 bits" as:

mov edx,eax    ; EDX = EAX
shr edx,10h    ; EDX = EDX >> 16    EDX is high order
and eax,0FFFFh ; EAX = EAX & 0xFFFF EAX is low order
add eax,edx    ; EAX = EAX + EDX    High Order Folded into Low Order
mov edx,eax    ; EDX = EAX
shr edx,10h    ; EDX = EDX >> 16    EDX is high order
add eax,edx    ; EAX = EAX + EDX    High Order Folded into Low Order
and eax,0FFFFh ; EAX = EAX & 0xFFFF EAX is low order 16 bits  

Which you could translate into C# as:

// given: uint sum = ...;
uint high = sum >> 16; // take high order from sum
sum &= 0xFFFF;         // clear out high order from sum
sum += high;           // fold high order into low order

high = sum >> 16;      // take the new high order of sum
sum += high;           // fold the new high order into sum
sum &= 0xFFFF;         // mask to 16 bits
Ejaculatory answered 21/6, 2011 at 18:20 Comment(2)
Thanks again - have bumped your reply, but marked mine as the answer, as I think it represents the complete algorithm.Varityper
@Mark: certainly, I only intended to illustrate the relevant parts of the article.Ejaculatory
S
2

Java code below from emmanuel may not work. In my case it hangs and does not complete. I believe this is due to the heavy use of IO in the code: in particular the data.read()'s. This can be swapped with an array as solution. Where the RandomAccessFile fully or incrementally reads the file into a byte array(s).

I attempted this but the calculation was too slow due to the conditional for the checksum offset to skip the checksum header bytes. I would imagine that the OP's C# solution would have a similar problem.

The below code removes this also.

public static long computeChecksum(RandomAccessFile data, int checksumOffset)
                throws IOException {
        
        ...
        byte[] barray = new byte[(int) length];     
        data.readFully(barray);
        
        long i = 0;
        long ch1, ch2, ch3, ch4, dword;
        
        while (i < checksumOffset) {
            
            ch1 = ((int) barray[(int) i++]) & 0xff;
            ...
            
            checksum += dword = ch1 | (ch2 << 8) | (ch3 << 16) | (ch4 << 24);

            if (checksum > top) {
                checksum = (checksum & 0xffffffffL) + (checksum >> 32);
            }
        }
        i += 4;
        
        while (i < length) {
            
            ch1 = ((int) barray[(int) i++]) & 0xff;
            ...
    
            checksum += dword = ch1 | (ch2 << 8) | (ch3 << 16) | (ch4 << 24);
            
            if (checksum > top) {
                checksum = (checksum & 0xffffffffL) + (checksum >> 32);
            }
        }
        
        checksum = (checksum & 0xffff) + (checksum >> 16);
        checksum = checksum + (checksum >> 16);
        checksum = checksum & 0xffff;
        checksum += length;
        
        return checksum;
    }

I still however think that code was too verbose and clunky so I swapped out the raf with a channel and rewrote the culprit bytes to zero's to eliminate the conditional. This code could still probably do with a cache style buffered read.

public static long computeChecksum2(FileChannel ch, int checksumOffset)
            throws IOException {
    
    ch.position(0);
    long sum = 0;
    long top = (long) Math.pow(2, 32);
    long length = ch.size();
    
    ByteBuffer buffer = ByteBuffer.wrap(new byte[(int) length]);
    buffer.order(ByteOrder.LITTLE_ENDIAN);
    
    ch.read(buffer);
    buffer.putInt(checksumOffset, 0x0000);
    
    buffer.position(0);
    while (buffer.hasRemaining()) {
        sum += buffer.getInt() & 0xffffffffL;
        if (sum > top) {
            sum = (sum & 0xffffffffL) + (sum >> 32);
        }
    }   
    sum = (sum & 0xffff) + (sum >> 16);
    sum = sum + (sum >> 16);
    sum = sum & 0xffff;
    sum += length;
    
    return sum;
}
Smalto answered 16/12, 2012 at 16:53 Comment(0)
C
2

No one really answered the original question of "Can anyone define the Windows PE Checksum Algorithm?" so I'm going to define it as simply as possible. A lot of the examples given so far are optimizing for unsigned 32-bit integers (aka DWORDs), but if you just want to understand the algorithm itself at its most fundamental, it is simply this:

  1. Using an unsigned 16-bit integer (aka a WORD) to store the checksum, add up all of the WORDs of the data except for the 4 bytes of the PE optional header checksum. If the file is not WORD-aligned, then the last byte is a 0x00.

  2. Convert the checksum from a WORD to a DWORD and add the size of the file.

The PE checksum algorithm above is effectively the same as the original MS-DOS checksum algorithm. The only differences are the location to skip and replacing the XOR 0xFFFF at the end and adding the size of the file instead.

From my WinPEFile class for PHP, the above algorithm looks like:

    $x = 0;
    $y = strlen($data);
    $val = 0;
    while ($x < $y)
    {
        // Skip the checksum field location.
        if ($x === $this->pe_opt_header["checksum_pos"])  $x += 4;
        else
        {
            $val += self::GetUInt16($data, $x, $y);

            // In PHP, integers are either signed 32-bit or 64-bit integers.
            if ($val > 0xFFFF)  $val = ($val & 0xFFFF) + 1;
        }
    }

    // Add the file size.
    $val += $y;
Commissar answered 21/11, 2019 at 1:21 Comment(0)
F
2

The CheckSum field is 32 bits long and is calculated as follows

1. Add all dwords (32 bit pieces) of the entire file to a sum

Add all dwords of the entire file not including the CheckSum field itself, including all headers and all of the contents, to a dword. If the dword overflows, add the overflowed bit back to the first bit (2^0) of the dword. If the file is not entirely divisible into dwords (4 bit pieces) see 2.

The best way I know to realize this is by using the GNU C Compilers Integer Overflow Builtin function __builtin_uadd_overflow. In the original ChkSum function documented by Jeffrey Walton the sum was calculated by performing an add (%esi),%eax where
esi contains the base address of the file and eax is 0 and adding the rest of the file like this

adc 0x4(%esi),%eax
adc 0x8(%esi),%eax
adc 0xc(%esi),%eax
adc 0x10(%esi),%eax
...
adc $0x0,%eax

The first add adds the first dword ignoring any carry flag. The next dwords
are added by the adc instruction which does the same thing as add but
adds any carry flag that was set before executing the instruction in addition
to the summand. The last adc $0x0,%eax adds only the last carry flag if it
was set and cannot be discarded.

Please keep in mind that the dword of CheckSum field itself should not be added.

2. Add the remainder to the sum if there is one

If the file is not entirely divisible into dwords, add the remainder as a
zero-padded dword. For example: say your file is 15 bytes long and looks like this
0E 1F BA 0E | 00 B4 09 CD | 21 B8 01 4C | CD 21 54
You need to add the remainder as 0x005421CD to the sum. My system is a
little-endian system. I do not know if the checksum would change because of the
this order of the bytes on big-endian systems, or you would just simulate this
behaviour.
I do this by rounding up the buffer_size to the next bytecount divisible by 4
without remainder or put differently: the next whole dword count represented
in bytes. Then I allocate with calloc because it initializes the memory block
with all zeros.

if(buffer_size%4) {
  buffer_size+=4-(buffer_size%4);
...
calloc(buffer_size,1)

3. Add the lower word (16 bit piece) and the higher word of the sum together.

sum=(sum&0xffff)+(sum>>16);

4. Add the new higher word once again

sum+=(sum>>16);

5. Only keep the lower word

sum&=0xffff;

6. Add the number of bytes in the file to the sum

return(sum+size);

This is how I wrote it. It is not C#, but C. off_t size is the number of bytes in the file. uint32_t *base is a pointer to the file loaded into memory. The block of memory should be padded with zeros at the end to the next bytecount divisible by 4.

uint32_t pe_header_checksum(uint32_t *base,off_t size) {
  uint32_t sum=0;
  off_t i;
  for(i=0;i<(size/4);i++) {
    if(i==0x36) continue;
    sum+=__builtin_uadd_overflow(base[i],sum,&sum);
  }
  if(size%4) sum+=base[i];
  sum=(sum&0xffff)+(sum>>16);
  sum+=(sum>>16);
  sum&=0xffff;
  return(sum+size);
}

If you want you can see the code in action and read a little bit more here.

Fare answered 11/2, 2022 at 20:55 Comment(0)
T
1

The Java example is not entirely correct. Following Java implementation corresponds with the result of Microsoft's original implementation from Imagehlp.MapFileAndCheckSumA.

It's important that the input bytes are getting masked with inputByte & 0xff and the resulting long masked again when it's used in the addition term with currentWord & 0xffffffffL (consider the L):

    long checksum = 0;
    final long max = 4294967296L; // 2^32

    // verify the data is DWORD-aligned and add padding if needed
    final int remainder = data.length % 4;
    final byte[] paddedData = Arrays.copyOf(data, data.length
            + (remainder > 0 ? 4 - remainder : 0));

    for (int i = 0; i <= paddedData.length - 4; i += 4)
    {
        // skip the checksum field
        if (i == this.offsetToOriginalCheckSum)
            continue;

        // take DWORD into account for computation
        final long currentWord = (paddedData[i] & 0xff)
                               + ((paddedData[i + 1] & 0xff) << 8)
                               + ((paddedData[i + 2] & 0xff) << 16)
                               + ((paddedData[i + 3] & 0xff) << 24);

        checksum = (checksum & 0xffffffffL) + (currentWord & 0xffffffffL);

        if (checksum > max)
            checksum = (checksum & 0xffffffffL) + (checksum >> 32);
    }

    checksum = (checksum & 0xffff) + (checksum >> 16);
    checksum = checksum + (checksum >> 16);
    checksum = checksum & 0xffff;
    checksum += data.length; // must be original data length

In this case, Java is a bit inconvenient.

Tanah answered 30/3, 2018 at 13:42 Comment(0)
D
0

I was trying to solve the same issue in Java. Here is Mark's solution translated into Java, using a RandomAccessFile instead of a byte array as input:

static long computeChecksum(RandomAccessFile data, long checksumOffset) throws IOException {
    long checksum = 0;
    long top = (long) Math.pow(2, 32);
    long length = data.length();

    for (long i = 0; i < length / 4; i++) {
        if (i == checksumOffset / 4) {
            data.skipBytes(4);
            continue;
        }

        long ch1 = data.read();
        long ch2 = data.read();
        long ch3 = data.read();
        long ch4 = data.read();

        long dword = ch1 + (ch2 << 8) + (ch3 << 16) + (ch4 << 24);

        checksum = (checksum & 0xffffffffL) + dword + (checksum >> 32);

        if (checksum > top) {
            checksum = (checksum & 0xffffffffL) + (checksum >> 32);
        }
    }

    checksum = (checksum & 0xffff) + (checksum >> 16);
    checksum = checksum + (checksum >> 16);
    checksum = checksum & 0xffff;
    checksum += length;

    return checksum;
}
Discountenance answered 14/5, 2012 at 13:27 Comment(0)
M
0
private unsafe static int GetSetPEChecksum(byte[] Array) {
    var Value = 0;
    var Count = Array.Length;
    if(Count >= 64)
        fixed (byte* array = Array) {
            var Index = 0;
            var Coff = *(int*)(array + 60);
            if(Coff >= 64 && Count >= Coff + 92) {
                *(int*)(array + Coff + 88) = 0;
                var Bound = Count >> 1;
                if((Count & 1) != 0) Value = array[Count & ~1];
                var Short = (ushort*)array;
                while(Index < Bound) {
                    Value += Short[Index++];
                    Value = (Value & 0xffff) + (Value >> 16);
                    Value = (Value + (Value >> 16)) & 0xffff;
                }
                *(int*)(array + Coff + 88) = Value += Count;
            }
        }
    return Value;
}

If you need short unsafe... (Not need use Double and Long integers and not need Array aligning inside algorithm)

Minivet answered 10/12, 2016 at 3:55 Comment(0)
M
0

I would like to clarify the situation with the PE checksum 'de-mystifying' the whole story. The checksum algorithm is just like this:

  • Load the PE image file
  • Determine the absolute offset of the CheckSum structure member (an uint32_t, differs for PE/32bit and PE/64bit)
  • Set sum := 0
  • Add all words of the image EXCEPT FOR the uint32_t containing the CheckSum itself to 'sum' WITH CARRY
  • Finally, add the 32bit-wide PE image size to sum, output CheckSum

There is no 'folding' whatsoever. This is simple finite addition taking any overflows into account. In terms of x86 CPU instructions, this is the 'adc' (add with carry) mnemonic instead of the 'add' mnemonic.

In plain C, without any compiler intrinsics, this is:

#include <stdint.h>
#include <byteswap.h>

#ifdef _BIG_ENDIAN_MACHINE
#define csum_bswap_16(_x) bswap_16(_x)
#else // Little Endian
#define csum_bswap_16(_x) (_x)
#endif

// IMPORTANT NOTE: DO provide one extra zero byte after the image if its
// --------------- size is ODD!!!
uint32_t compute_pe_checksum_16bit ( const uint8_t *image_data, uint32_t image_size, uint32_t csum_ofs )
{
  uint32_t          i;
  register uint16_t csum = 0, carry_check;

  for (i = 0; i < csum_ofs; i += sizeof(uint16_t) )
  {
    carry_check = csum;
    csum += csum_bswap_16( *((uint16_t*)(image_data + i)) );
    csum += (csum < carry_check) ? 1 : 0;
  }

  for (i = csum_ofs + sizeof(uint32_t); i < image_size; i += sizeof(uint16_t))
  {
    carry_check = csum;
    csum += csum_bswap_16( *((uint16_t*)(image_data + i)) );
    csum += (csum < carry_check) ? 1 : 0;
  }

  return ((uint32_t)csum) + image_size;
}

I have split the entire loop into two parts (before and after the CheckSum structure element, which has to be excluded) to make the CPU's branch prediction unit happy (otherwise, it would have to check for the CheckSum offset in every loop iteration).

Facing an overflow (i.e. CPU carry bit is 1) just means: 'If you compute x+y and the result is lesser than the previous sum (here: carry_check), then carry bit is 1'.

The proof that this implementation is correct, is simple: Use 'user32.dll' from your Windows installation and compute the checksum (that is what I did for testing).

Some additional remarks:

  • on a Big Endian machine, you have to byte-swap the to-be-added uint16_t values;
  • if you modify this function to use uint32_t's instead of uint16_t's (as mentioned by some answers/comments above), then this yields completely different checksums;
  • this very old (MS-DOS-style) algorithm is still used today: UEFI BIOS' load EFI binaries, which are PE/COFF executables. There might be UEFI implementations that verify the checksum;
  • if you are looking for an optimized implementation and your system is equipped with a SIMD (Single Instruction Multiple Data) unit (almost all CPUs have this today), then you could add eight 16bit values totally in parallel (assuming 128bit SIMD registers) with carry, sometimes called 'horizontal add'.
Millham answered 19/8, 2024 at 11:13 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.