How to efficiently compare a block of memory to a single byte?

Asked 6/8, 2014 at 16:59 Answered 4/1 at 9:5

I'm trying to see if a struct came back as all 0xFF for the size of the struct.

memcmp seems like the obvious starting point, BUT I'd have to allocate a second memory block, populate it with 0xFF's. It just seems like a waste.

Does a standard function exist for this? Or should I just punt and iterate via a for loop?

Azalea answered 6/8, 2014 at 16:59 Comment(5)

you can try to compare by words. whether this is more efficient is something you have to measure. – Harborage 6/8, 2014 at 17:8

Depending on the size of the struct, statically allocating an instance filled with 0xFF (ie: let the compiler and loader do this once off), and using memcmp should allow for the best method as the compiler would be able to specialize for a known/constant size. – Africanist 6/8, 2014 at 17:30

With @Karoly on this. Make sure to test the first byte(s) "manually" if they are not aligned on a DWORD, divide the remaining by 4 (or 8, if your system can handle 64 bit words), and let it rip. Search for "optimized memory copy" for hints on this. – Cumings 6/8, 2014 at 17:30

Methinks some pointer arithmetic and masks should do the trick. Look out for struct packing and use sizeof. – Vancouver 6/8, 2014 at 18:25

See also HEKTO's C++ function over on Code Review. – Gymnasiarch 5/1 at 7:57

The most obvious solution here seems to be to simply loop over the size of the struct and compare it byte-by-byte.

The approach of allocating a block of 0xFF followed by memcmp should achieve the same with but a higher space complexity.

Casals answered 6/8, 2014 at 17:17 Comment(0)

I know no standard function for this.

I don't think that memcmp is always the right choice (it needs twice the memory bandwith).

I would write an iteration (even a very naive one). Most compilers optimize that very well (when asked). So they probably unroll your loops and may do word compares (even if you coded a naive byte iteration).

You might code specialized openmp variants (at least on GCC). See http://openmp.org/

If the structure is big (e.g. dozens of kilobytes, because of the cost of GPGPU <-> RAM data copy) and if you have lot of development time to waste, consider perhaps OpenCL (in particular if you have specialized hardware supporting it, e.g. GPGPUs). It might never worth the cost (unless you do something -which does not require a lot of memory bandwidth- on the CPU while the GPGPU is working)

I would code the naive loop, and won't bother optimizing by hand (unless benchmarking of compiler-optimized code suggests otherwise), because the bottleneck is probably the memory bandwidth.

Fuzee answered 6/8, 2014 at 17:9 Comment(4)

On modern CPUs, well, on anything found on the desktop in the last 15 years, really, the block comparison against a constant can be coded to run at the memory bandwidth. Having a GPGPU do it is counterproductive, since by the time the system memory has been read, a single core would have finished the comparison. At that point a GPGPU is merely ready to start the work. Never mind the latencies of actually getting the GPGPU to start the task, and then getting the result back. – Prearrange 6/8, 2014 at 17:21

Agreed, this is why I mentioned GPGPUs only for large block of data. – Fuzee 6/8, 2014 at 17:22

I claim that the use of GPGPU for any block of data is counterproductive by definition, unless all of your CPU cores are busy. If you're looking at latency of things, with an available single core, running the block comparison on that single core of the CPU will be done at memory bandwidth. IOW, it can't be done any faster in terms of wall time by splitting the work onto multiple cores, nor can it be done any faster by using the GPGPU. You simply can't get the data out of the memory any faster than that single core can swallow it. – Prearrange 6/8, 2014 at 17:23

Don't GPU:s often have higher memory bandwidth than CPU cores? – Burlburlap 12/8, 2014 at 16:37

I don't know if this would help with performance a lot, but you could follow this algorithm:

Compare the 1st byte of the struct with 1 byte of allocated memory 0xFF
Compare the 2nd byte of the struct with the 1st byte of the struct
Compare bytes 3-4 of the struct with bytes 1-2 of the struct
Compare bytes 5-8 of the struct with bytes 1-4 of the struct

And continue in the same fashion until the end of the struct. If at any point the statement is false, you know the struct isn't all 0xFF. You would also need to handle it differently when the remaining portion of the struct is smaller than the first part checked, but that should be relatively simple.

In the end you've allocated 1 extra byte of memory and the algorithm is O(log n) (a slight improvement on what I've seen in the answers so far).

edit: As escrafford mentioned below, if you substitute "byte" for "word" in the above portion, it may run a little faster. I can't comment on how much speed you might gain, but it would increase the extra memory stored (albeit by a tiny amount on today's computers).

Scop answered 6/8, 2014 at 17:34 Comment(1)

I think this is the only solution posted here that has a chance of being faster than a naive for loop. My only addition would be to use register sized comparisons (32/64bit). – Westlund 6/8, 2014 at 17:37

The logical name for such a function would be memcchr - it is to memchr as strcspn is to strspn.

And look here: google results for memcchr show that it has been implemented under that name as part of the FreeBSD kernel, and they've made some attempt at optimizing it beyond the obvious 1-byte-at-a-time loop.

It will probably take some additional work to make this function suitable for use in any program other than the FreeBSD kernel.

Typesetting answered 6/8, 2014 at 17:35 Comment(0)

There is memchr(), which does the opposite of what you're asking for - search for the first occurrence of a byte within a mem block. afaik, there isn't a standard function to search for a byte that doesn't match a specific one. for loop sounds like the way to go. Maybe go 32/64bits at a time to speed it up.

-- An extra bit of not-answer: memcmp is going to be slower than a for loop. First, you'll need to fill a block of memory the same size as your original block (this portion will probably take as long as a naive for loop). Then you need to read each memory block into registers to compare them. A for loop will have a value in a register and just read in one memory block to compare with the unchanging register.

Westlund answered 6/8, 2014 at 17:8 Comment(2)

To quickly fill a block of a known size with a single byte, use memset. (I do agree it must be slower than a naive loop, which, being intrinsically simple, surely will be optimized as much as possible.) – Cumings 6/8, 2014 at 17:27

As Kuba has hinted at in his comments, modern cpus are faster than ram (ram will be the bottleneck). Accessing ram 3x the size of your buffer will be ~3x slower than accessing it 1x, regardless of how well you optimize your for loop. – Westlund 6/8, 2014 at 17:33

You don't necessarily need a second copy of the structure for memcmp() to work. Having tested the value of the first element, then you can compare the structure to the rest of itself:

/* Untested! */
#include <stdbool.h>
#include <stdlib.h>

bool mem_eq(const void *const p, const char c, const size_t len)
{
    if (!len) { return true; }
    const char *const mem = p;
    if (*mem != c) { return false; }
    return !memcmp(mem, mem+1, len-1);
}

Improving efficiency (perhaps by using an offset that's a multiple of the processor's word size, instead of 1) is left as an exercise for the reader.

Gymnasiarch answered 4/1 at 9:5 Comment(1)

This works and beats my code in my performance test! Please see github.com/alekseYY/… – Floreated 4/1 at 22:25

A dirty re-write of the code in Why does this implementation of strlen() work?. Did some quick tests; no guarantees.

This should return the number of 0xFF bytes; if it equals the number you started it with, you are in the safe. (Of course you could just let it return 0 or 1 as well.) Remove the printfs when satisfied.

#define LONGPTR_MASK (sizeof(long) - 1)

int find_no_ff (const char *memory, size_t length)
{
    const char *p;
    const unsigned long *lp;
    size_t remain = length, to_do;

    printf ("non-aligned, start:\n");
    /* Test the first few bytes until we have an aligned p */
    for (p = memory; (uintptr_t)p & LONGPTR_MASK; p++)
    {
        printf ("testing %02X\n", *p & 0xff);
        if (*p != '\xFF')
            return (p - memory);
        remain--;
    }

    printf ("passed.\n");

    printf ("aligned:\n");
    to_do = remain/sizeof(long);
    remain -= (to_do*sizeof(long));

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; to_do--; lp++)
    {
        printf ("testing %08lX\n", *lp);
        if (*lp +1)
            return p - memory;
    }
    printf ("passed.\n");

    p = (const char *)lp;

    printf ("non-aligned, end:\n");
    /* Test the last bytes until we have an aligned p */
    while (remain--)
    {
        printf ("testing %02X\n", *p & 0xff);
        if (*p != '\xFF')
            return (p - memory);
        p++;
    }
    printf ("passed.\n");
    return p - memory;
}

int main (void)
{
    char data[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    printf ("input size: %ld\n", sizeof(data));
    printf ("test result: %d\n", find_no_ff (data, sizeof(data)));

    return 0;
}

Cumings answered 6/8, 2014 at 18:0 Comment(0)

I like Erik's suggestion, but it can be simplified in an interesting way as follows (not tested):

if((*pBytes == 0xFF) && (memcmp(pBytes, pBytes + 1, byteCount - 1) == 0)) // The byteCount bytes at pBytes are 0xFFs.

The condition will be true only if A) the first byte is 0xFF, and B) every other byte is equal to the byte before it. The combination means every byte is 0xFF.

Mythicize answered 5/5, 2018 at 22:55 Comment(0)

Take a look into strspn function. But you need to place '\0' in the first byte after the struct under test in order to be able to use this function.

Do answered 8/11, 2022 at 14:32 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags