What is the fastest way I can compare two equal-size bitmaps to determine whether they are identical?
Asked Answered
S

9

47

I am trying to write a function to determine whether two equal-size bitmaps are identical or not. The function I have right now simply compares a pixel at a time in each bitmap, returning false at the first non-equal pixel.

While this works, and works well for small bitmaps, in production I'm going to be using this in a tight loop and on larger images, so I need a better way. Does anyone have any recommendations?

The language I'm using is C# by the way - and yes, I am already using the .LockBits method. =)

Edit: I've coded up implementations of some of the suggestions given, and here are the benchmarks. The setup: two identical (worst-case) bitmaps, 100x100 in size, with 10,000 iterations each. Here are the results:

CompareByInts (Marc Gravell) :   1107ms
CompareByMD5  (Skilldrick)   :   4222ms
CompareByMask (GrayWizardX)  :    949ms

In CompareByInts and CompareByMask I'm using pointers to access the memory directly; in the MD5 method I'm using Marshal.Copy to retrieve a byte array and pass that as an argument to MD5.ComputeHash. CompareByMask is only slightly faster, but given the context I think any improvement is useful.

Thanks everyone. =)

Edit 2: Forgot to turn optimizations on - doing that gives GrayWizardX's answer even more of a boost:

CompareByInts   (Marc Gravell) :    944ms
CompareByMD5    (Skilldrick)   :   4275ms
CompareByMask   (GrayWizardX)  :    630ms
CompareByMemCmp (Erik)         :    105ms

Interesting that the MD5 method didn't improve at all.

Edit 3: Posted my answer (MemCmp) which blew the other methods out of the water. o.O

Said answered 8/1, 2010 at 22:28 Comment(7)
Have you tested this with the "full size" bitmaps you are going to run through in production? Has it proven itself slow? Have you profiled your code on your production bitmaps to determine where it is slowing down?Water
Yes - the problem is that the worst case - scanning both bitmaps and determining that they are identical - is also the most common case.Said
Define "identical" in the context of bitmaps. Do you mean, binary identical, file-based (both are .png files, and the contents are identical)? Do you mean pixel-identical (could be different file formats, but the pixels are the same)? Do you mean perceptually identical (slightly different red channels are OK since the eyes of humans aren't that good to discern differences in the red channel anyway?Bonnette
just a small tip. We (in college) used to do a 1D allocation for the image using pointers (in C) with one malloc instead of multiple allocations when having the image represented as 2D.Kacikacie
I mean pixel-perfect identical.Said
Thanks for doing some science with your test cases.Preconception
See my answer in #43789 slightly beats memcmp.Adze
S
45

Edit 8-31-12: per Joey's comment below, be mindful of the format of the bitmaps you compare. They may contain padding on the strides that render the bitmaps unequal, despite being equivalent pixel-wise. See this question for more details.


Reading this answer to a question regarding comparing byte arrays has yielded a MUCH FASTER method: using P/Invoke and the memcmp API call in msvcrt. Here's the code:

[DllImport("msvcrt.dll")]
private static extern int memcmp(IntPtr b1, IntPtr b2, long count);

public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
{
    if ((b1 == null) != (b2 == null)) return false;
    if (b1.Size != b2.Size) return false;

    var bd1 = b1.LockBits(new Rectangle(new Point(0, 0), b1.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    var bd2 = b2.LockBits(new Rectangle(new Point(0, 0), b2.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

    try
    {
        IntPtr bd1scan0 = bd1.Scan0;
        IntPtr bd2scan0 = bd2.Scan0;

        int stride = bd1.Stride;
        int len = stride * b1.Height;

        return memcmp(bd1scan0, bd2scan0, len) == 0;
    }
    finally
    {
        b1.UnlockBits(bd1);
        b2.UnlockBits(bd2);
    }
}
Said answered 10/1, 2010 at 20:35 Comment(6)
Side note: This does not necessarily work, because each stride may contain padding.Trifacial
Ah, very interesting. I hadn't considered this possibility - although in practice for my application this isn't an issue as the bitmaps in question are formatted Format32bppRgb. Thanks for the hint! =)Said
It occurs to me that another way to deal with the stride padding issue would be to invoke memcmp on a line by line basis, and setting 'count' to the number of bytes per line minus the padding. Might take a little longer, but would reduce the number of false misses due to padding inconsistencies.Said
The question is what overhead a P/Invoke call per line has in that case.Trifacial
I did some timing tests and found that, for how I'm comparing bitmaps, the above example is slower by a factor of 100 as compared to a direct comparison of each pixel using a for loop and b1.GetPixel(x, y) == b2.GetPixel(x, y). I'm doing small bitmap comparisons (around 100 x 10 px), so the result might change for comparisons against larger bitmaps.Ichthyosis
Note that the last parameter of memcmp is an IntPtr, not a long, because if the program is running at 32 bits it is 32 bits, at 64 bits it is 64 bits. Clearly then you have to return memcmp(bd1scan0, bd2scan0, (IntPtr)len) == 0;Atlante
S
9

If you are trying to determine if they are 100% equal, you can invert one and add it to the other if its zero they are identical. Extending this using unsafe code, take 64 bits at a time as a long and do the math that way, any differences can cause an immediate fail.

If the images are not 100% identical (comparing png to jpeg), or if you are not looking for a 100% match then you have some more work ahead of you.

Good luck.

Suite answered 8/1, 2010 at 22:33 Comment(8)
I am looking for 100% identical, so this method could work. Thanks =)Said
Isn't adding one pixel to the inverse of another and seeing if the result is zero equivalent or slower to comparing the two pixels and seeing if they're the same? Or am I missing something?Wauters
He is using pointer arithmetic (i.e. unsafe code) and so he can treat the pointers as an array of fixed size types (i.e. longs) and do pure hardware math.Suite
Also testing for zero is performs better than testing for any other constant, from what I recall.Said
You can't invert one and subtract it from the other. That would be like a - (-b). If a==b, then this will equal a*2. I have no problems understanding what you mean though, but you should get the method correct.Bonnette
I think inversion is still OK, provided its all done with 8 bits.Kacikacie
32-bits works, and is fastest in this context - so I've marked it as the answer. I'll be posting benchmark results shortly.Said
By the way - the relevant comparison is: if ((~cursor1[0] & cursor2[0]) != 0x00) return false; - invert one, AND it to the other, and check the result. Alternatively I could XOR them together without inverting, and compare that result to zero. Either way it seems like the advantage this has over comparing for equality is that comparisons with zero are faster than comparing with any other constant.Said
F
8

Well, you're using .LockBits, so presumably you're using unsafe code. Rather than treating each row origin (Scan0 + y * Stride) as a byte*, consider treating it as an int*; int arithmetic is pretty quick, and you only have to do 1/4 as much work. And for images in ARGB you might still be talking in pixels, making the math simple.

Froemming answered 8/1, 2010 at 22:29 Comment(1)
I will be using ARGB images so this sounds like it might be the winner. I'll give it a shot and do some profiling. Well, some more profiling.Said
W
6

Could you take a hash of each and compare? It would be slightly probabilistic, but practically not.

Thanks to Ram, here's a sample implementation of this technique.

Wauters answered 8/1, 2010 at 22:33 Comment(3)
It won't fail fast, but it does work better if you have to compare 1 image to multiple candidates...Scissure
You could do a hybrid, and test a random sample of say 2% of pixels to see if they're the same, and if so move onto hashing...Wauters
+1 for mentioning hash. Here is a sample implementation codeproject.com/KB/GDI-plus/comparingimages.aspxKacikacie
A
3

If the original problem is just to find the exact duplicates among two bitmaps, then just a bit level comparison will have to do. I don't know C# but in C I would use the following function:

int areEqual (long size, long *a, long *b)
{
    long start = size / 2;
    long i;
    for (i = start; i != size; i++) { if (a[i] != b[i]) return 0 }
    for (i = 0; i != start; i++) { if (a[i] != b[i]) return 0 }
    return 1;
}

I would start looking in the middle because I suspect there is a much better chance of finding unequal bits near the middle of the image than the beginning; of course, this would really depend on the images you are deduping, selecting a random place to start may be best.

If you are trying to find the exact duplicates among hundreds of images then comparing all pairs of them is unnecessary. First compute the MD5 hash of each image and place it in a list of pairs (md5Hash, imageId); then sort the list by the m5Hash. Next, only do pairwise comparisons on the images that have the same md5Hash.

Alvardo answered 9/1, 2010 at 20:16 Comment(0)
D
3

If these bitmaps are already on your graphics card then you can parallelize such a check by doing it on the graphics card using a language like CUDA or OpenCL.

I'll explain in terms of CUDA, since that's the one I know. Basically CUDA lets you write general purpose code to run in parallel across each node of your graphics card. You can access bitmaps that are in shared memory. Each invocation of the function is also given an index within the set of parallel runs. So, for a problem like this, you'd just run one of the above comparison functions for some subset of the bitmap - using parallelization to cover the entire bitmap. Then, just write a 1 to a certain memory location if the comparison fails (and write nothing if it succeeds).

If you don't already have the bitmaps on your graphics card, this probably isn't the way to go, since the costs for loading the two bitmaps on your card will easily eclipse the savings such parallelization will gain you.

Here's some (pretty bad) example code (it's been a little while since I programmed CUDA). There's better ways to access bitmaps that are already loaded as textures, but I didn't bother here.

// kernel to run on GPU, once per thread
__global__ void compare_bitmaps(long const * const A, long const * const B, char * const retValue, size_t const len)
{
 // divide the work equally among the threads (each thread is in a block, each block is in a grid)
 size_t const threads_per_block = blockDim.x * blockDim.y * blockDim.z;
 size_t const len_to_compare = len / (gridDim.x * gridDim.y * gridDim.z * threads_per_block);
# define offset3(idx3,dim3)  (idx3.x + dim3.x * (idx3.y + dim3.y * idx3.z))
 size_t const start_offset = len_to_compare * (offset3(threadIdx,blockDim) + threads_per_block * offset3(blockIdx,gridDim));
 size_t const stop_offset = start_offset + len_to_compare;
# undef offset3

 size_t i;
 for (i = start_offset; i < stop_offset; i++)
 {
  if (A[i] != B[i]) 
  {
   *retValue = 1;
   break;
  }
 }
 return;
}
Dugger answered 10/1, 2010 at 18:13 Comment(0)
S
0

If you can implement something like Duff's Device in your language, that might give you a significant speed boost over a simple loop. Usually it's used for copying data, but there's no reason it can't be used for comparing data instead.

Or, for that matter, you may just want to use some equivalent to memcmp().

Siderostat answered 8/1, 2010 at 22:48 Comment(1)
You can do loop unrolling in virtually any language (where it applies). You may not get such pretty and compact syntax as Duff's device, but the performance will be similar.See
G
0

You could try to add them to a database "blob" then use the database engine to compare their binaries. This would only give you a yes or no answer to whether the binary data is the same. It would be very easy to make 2 images that produce the same graphic but have different binary though.

You could also select a few random pixels and compare them, then if they are the same continue with more until you've checked all the pixels. This would only return a faster negative match though, it still would take as long to find 100% positive matches

Grammatical answered 8/1, 2010 at 23:52 Comment(0)
L
0

Based on the approach of comparing hashes instead of comparing every single pixel, this is what I use:

public static class Utils
{
    public static byte[] ShaHash(this Image image)
    {
        var bytes = new byte[1];
        bytes = (byte[])(new ImageConverter()).ConvertTo(image, bytes.GetType());

        return (new SHA256Managed()).ComputeHash(bytes);
    }

    public static bool AreEqual(Image imageA, Image imageB)
    {
        if (imageA.Width != imageB.Width) return false;
        if (imageA.Height != imageB.Height) return false;

        var hashA = imageA.ShaHash();
        var hashB = imageB.ShaHash();

        return !hashA
            .Where((nextByte, index) => nextByte != hashB[index])
            .Any();
    }
]

Usage is straight forward:

bool isMatch = Utils.AreEqual(bitmapOne, bitmapTwo);
Longawa answered 1/3, 2014 at 4:41 Comment(1)
Your compare will always return true for the images of the same dimension. "hashB = imageA.ShaHash()" should be "imageB". Typo?Buttock

© 2022 - 2024 — McMap. All rights reserved.