Understanding the implementation of memcpy()
Asked Answered
B

2

14

I was looking the implementation of memcpy.c, I found a different memcpy code. I couldnt understand why do they do (((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1)

#if !defined(__MACHDEP_MEMFUNC)

#ifdef _MSC_VER
#pragma function(memcpy)
#undef __MEMFUNC_ARE_INLINED
#endif

#if !defined(__MEMFUNC_ARE_INLINED)
/* Copy C bytes from S to D.
 * Only works if non-overlapping, or if D < S.
 */
EXTERN_C void * __cdecl memcpy(void *d, const void *s, size_t c)
{
    if ((((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1)) {

        BYTE *pS = (BYTE *) s;
        BYTE *pD = (BYTE *) d;
        BYTE *pE = (BYTE *) (((ADDRESS) s) + c);

        while (pS != pE)
            *(pD++) = *(pS++);
    }
    else {
        UINT *pS = (UINT *) s;
        UINT *pD = (UINT *) d;
        UINT *pE = (UINT *) (BYTE *) (((ADDRESS) s) + c);

        while (pS != pE)
            *(pD++) = *(pS++);
    }
    return d;
}

#endif /* ! __MEMFUNC_ARE_INLINED */
#endif /* ! __MACHDEP_MEMFUNC */
Bowing answered 4/10, 2013 at 17:45 Comment(1)
And that's why you don't want to write memcpy yourself :-) The real beauty is only visible when you look at the generated machine code, though.Selfhelp
P
16

The code is testing whether the addresses are aligned suitably for a UINT. If so, the code copies using UINT objects. If not, the code copies using BYTE objects.

The test works by first performing a bitwise OR of the two addresses. Any bit that is on in either address will be on in the result. Then the test performs a bitwise AND with sizeof(UINT) - 1. It is expected the the size of a UINT is some power of two. Then the size minus one has all lower bits on. E.g., if the size is 4 or 8, then one less than that is, in binary 112 or 1112. If either address is not a multiple of the size of a UINT, then it will have one of these bits on, and the test will indicate it. (Usually, the best alignment for an integer object is the same as its size. This is not necessarily true. A modern implementation of this code should use _Alignof(UINT) - 1 instead of the size.)

Copying with UINT objects is faster, because, at the hardware level, one load or store instruction loads or stores all the bytes of a UINT (likely four bytes). Processors will typically copy faster when using these instructions than when using four times as many single-byte load or store instructions.

This code is of course implementation dependent; it requires support from the C implementation that is not part of the base C standard, and it depends on specific features of the processor it executes on.

A more advanced memcpy implementation could contain additional features, such as:

  • If one of the addresses is aligned but the other is not, use special load-unaligned instructions to load multiple bytes from one address, with regular store instructions to the other address.
  • If the processor has Single Instruction Multiple Data instructions, use those instructions to load or store many bytes (often 16, possibly more) in a single instruction.
Phthalocyanine answered 4/10, 2013 at 17:47 Comment(0)
G
14

The code

((((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1))

Checks to see if either s, d, or c are not aligned to the size of a UINT.

For example, if s = 0x7ff30b14, d = 0x7ffa81d8, c = 256, and sizeof(UINT) == 4, then:

s         = 0b1111111111100110000101100010100
d         = 0b1111111111110101000000111011000
c         = 0b0000000000000000000000100000000
s | d | c = 0b1111111111110111000101111011100
(s | d | c) & 3 =                        0b00

So both pointers are aligned. It is easier to copy memory between pointers that are both aligned, and this does it with only one branch.

On many architectures, *(UINT *) ptr is much faster if ptr is correctly aligned to the width of a UINT. On some architectures, *(UINT *) ptr will actually crash if ptr is not correctly aligned.

Gaius answered 4/10, 2013 at 17:47 Comment(5)
It checks the length, too.Bowman
Wow, talk about conciseness over clarity. Any code that requires four sets of parens ought to be rethought.Bellarmine
@EricAndres Welcome to the world of performance. That kind of stuff is all over the place. :)Butta
@Butta I'm sure it is. I'm not a C programmer, so I tend to dismiss things like this pretty easily. This definitely is a piece of code where the micro-optimization can be justified.Bellarmine
@EricAndres But the parentheses aren't what matters, of course. Ideally, you'd see ((ADDRESS) s) and ((ADDRESS) d) as opaque chunks S and D, so you've got (S | D | c) & (sizeof(UNIT) - 1), which isn't too bad, and then you've got one more set from the if ( … ) …. The expression itself isn't nested in a particularly complex way.Working

© 2022 - 2024 — McMap. All rights reserved.