How to understand this code snippet in the bcopy.c of bionic?
Asked Answered
P

2

2

I read the memcpy implementation in the http://androidxref.com/4.0.4/xref/bionic/libc/string/bcopy.c find following code is hard to understand, can anybody explain it?

 36 /*
 37  * sizeof(word) MUST BE A POWER OF TWO
 38  * SO THAT wmask BELOW IS ALL ONES
 39  */
 40 typedef long word;      /* "word" used for optimal copy speed */
 41 
 42 #define wsize   sizeof(word)
 43 #define wmask   (wsize - 1)
 44 

...

        /*
 78          * Copy forward.
 79          */
 80         t = (long)src;  /* only need low bits */
 81         if ((t | (long)dst) & wmask) {
 82             /*
 83              * Try to align operands.  This cannot be done
 84              * unless the low bits match.
 85              */
 86             if ((t ^ (long)dst) & wmask || length < wsize)
 87                 t = length;
 88             else
 89                 t = wsize - (t & wmask);

What is mean of these bitwise operations? What is their intent?

Poussin answered 28/6, 2012 at 3:22 Comment(3)
I know these bitwise operators, and I want to know these logic meaning.Poussin
Heh, I recognize my old code there... :-) This is basically just the way one writes this in assembly languages (most of them anyway), converted to C code. Of course when you're doing raw assembly you can make use of any special machine instructions and characteristics.Marleenmarlen
Chris Torek? Whoa, Victor hit the jackpot.Hannus
M
7

The basic idea is to meet alignment constraints: each "word" to be copied one word at a time must be aligned on a "word" boundary.

Some CPUs have this as a fundamental constraint, that loads and stores must occur on a "natural" boundary. On older ARM processors the low bits of the address are actually ignored entirely, so that a load or store of two bytes from an odd-valued address have the same effect as:

short w = *(short *)(addr & ~1);

for instance. On some other CPUs an unaligned load or store results in a trap (MIPS and SPARC for instance), and still others will just do it, but with a performance penalty (x86).

So, suppose you're copying a large number of bytes (say, 4096 of them) from address 0x12345 to address 0x22345, and that the "word size" is 4 bytes. If we first copy three bytes, the addresses will now be 0x12348 and 0x22348. At this point we can copy just 1023 4-byte words, one word at a time, without tripping over any alignment issues. After that we'll have one remaining byte to copy, because 4096 = 3 + (4 * 1023) + 1.

This all makes the assumption that bytes are all addressed individually, even when loading and storing "words". This assumption is false on some machines: for instance, the old Data General MV10000 CPU would address "words" using "word addresses", which are essentially byte addresses divided by two. (It's thus impossible to address a "word" that spans two bytes: the word at location 0 has byte addresses 0 and 1 but word address 0; the word at location 1 has byte addresses 2 and 3; the word at location 2 has byte addresses 4 and 5; and so on.) On machines like this, you may need to use a different version of bcopy.c.

As @Alex noted, the XOR is just making sure that it is actually possible to align the two addresses. If you're copying from 0x12345 to 0x22345, it is; but if you're copying from 0x12345 to 0x22344, the two addresses will never line up.

Marleenmarlen answered 28/6, 2012 at 4:32 Comment(4)
If short w = *(short *)(addr & ~1); is not aligned, then that's undefined behavior in C/C++. If the code is not removed by the compiler, then you get either (1) a sig_bus on older ARM processors in some cases, or (2) a performance penalty when the CPU fixes up the misaligned access in some cases.Plowshare
@jww: yes, I tried to cover all these actual outcomes: "On older ARM ... the low bits ... are ignored ... on other CPUs ... a trap ... still others ... a performance penalty". The example CPUs in that sentence are meant as examples only, not complete lists.Marleenmarlen
Original ARM behaviour is to rotate single unaligned loads, but ignore the low bits on multiple loads ans stores.Corcyra
@TimothyBaldwin: Interesting. How does the rotation work - is it left rotate or right rotate? Does it use the register width (32 bits), or the load width (are there 16 bit loads?)? Is it byte-wise rotation? (This seems much more useful than bit-wise since the offset can only go to 3 at most.) DEC Alpha originally had only full-word load, as I understand it. but I can't remember if it trapped or ignored low bits.Marleenmarlen
S
6

Just do it step by step:

t = (long)src;
if ((t | (long)dst) & wmask)

These check if at least one of src and dst isn't a multiple of sizeof(long).

if ((t ^ (long)dst) & wmask || length < wsize)

This checks if src and dst are aligned differently w.r.t. sizeof(long) (IOW, aren't "equal" multiples of sizeof(long)) or if length < sizeof(long)-1.

At the end you receive in t how many bytes have to be copied as bytes between unaligned locations, either all (length), or just enough (less than sizeof(long)) to reach addresses that are multiple of sizeof(long) from where the rest can be copied in units of long. And the latter is the speed optimization.

To see all that you have to know that an integer, when expressed in binary, is a multiple of a certain power of 2 when its least significant bits below that power of 2 are all zeroes.

Examples:

1002 (410) is a multiple of 1002 (410)
11002 (1210) is a multiple of 1002 (410)
100002 (1610) is a multiple of 1002 (410)
02 (010) is a multiple of 1002 (410)
112 (310) is not a multiple of 1002 (410)
11012 (1310) is not a multiple of 1002 (410)

This is what & (sizeof(long)-1) is used for.

You also need to know that a value XORed with itself gives 0 and when you XOR different values the result is non-zero. So you can use XOR for comparison. And that's what (t ^ (long)dst) is for.

Shutdown answered 28/6, 2012 at 4:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.