Reading shorts on 32-bit architectures (for example)

First of all, sorry for my English.

I know architectures are very complex and there is a broad spectrum of situations, but a common generalization is that if a computer architecture has 32-bit words, its registers, memory accesses and buses work with 32-bit words (though I think there are many variants in current architectures).

OK, let's suppose this is the rule and that our architecture is little-endian, like x86. In that case, if we want to read a short int (2 bytes long), the memory system reads the whole 4-byte word that contains our short. Let's suppose the containing word W is 0xf1342ea0; in memory:

{a0, 2e, 34, f1} // a0 is the byte in the lowest address.

and our half-word H is in the upper half of W, so H is 0xf134. I understand that the processor receives from memory a word with the short shifted down:

{34, f1, 00, 00}

since 0x0000f134 equals 0xf134.

With this picture in mind, since the processor works with 4-byte words and some shifting is therefore necessary in any case, why must 2-byte data be aligned on 2-byte boundaries?

In other words:

Why is it strongly recommended not to read the short 0xf134 from the word:

{ff, 34, f1, 0a}

?

EDIT: Another way of expressing the same doubt: why is the definition of alignment

An object of size N at address d is aligned if d is divisible by N.

and not:

An object of size N at address d is aligned with respect to an architecture
of B bytes if d is divisible by B, or ⌊d/B⌋ == ⌊(d+N-1)/B⌋ if N < B.

?

NOTE: The property ⌊d/B⌋ == ⌊(d+N-1)/B⌋ (first and last byte in the same word) implies the object lies within a single aligned word.

Largehearted asked 2/4, 2014 at 18:57

If the memory is { ff, 34, f1, 0a }, then it's not a problem for an x86 processor. However, if the memory is { ff, ff, ff, 34 } {f1, aa, aa, aa }, the processor must perform two bus cycles to retrieve the value of the short. (Also note there are some RISC-based processors that do not support misaligned accesses at all.)

Mckeon answered 2/4, 2014 at 19:07
Largehearted: Then why do thousands of documents and blogs show examples of alignment and padding between shorts, if shorts cause no problems? It is always the same story: an otherwise well-explained Internet resource talks about the problem of misaligned «words», and the concluding example contains unaligned shorts without further clarification.
Mckeon: A char in C has no alignment restrictions, since it is only one byte and can be placed anywhere in memory. A short is two bytes, so it is the smallest «word» that has alignment restrictions, which is why it is commonly used as the example of how alignment works.
Largehearted: But, as you said, alignment of shorts is not necessary, so I don't understand why these alignment restrictions exist.
Mckeon: I make two statements. 1) Alignment of shorts is required on some processors. 2) Alignment of shorts improves performance on all processors. If a variable is aligned properly, it can always be read in one bus cycle; a misaligned access may take two. (Note: this assumes that the processor bus width is greater than or equal to the variable size.)
Largehearted: But why would a short (in your second case) perform worse when it lies inside an aligned 4-byte word? In that case the memory or CPU also needs just one bus cycle, since the processor never reads half-words: it reads complete words in one bus cycle, plus an extra shift that puts zeros in the two most significant bytes, if I'm not wrong.
Mckeon: Ah, I see your point now. A compiler for x86 could manage alignment as you suggest, but I don't know of any x86 compiler that works that way. The reason for not supporting that kind of alignment is that compilers are designed to support multiple processors, and supporting it for one particular processor family would add complexity to the compiler with almost no benefit.
Largehearted: But why not?! I mean, if I have a 4-byte bus, it is impossible not to read 4 bytes! Even a 2-byte-aligned short must be transformed into a "4-byte short" with zeros as padding. So is there a real difference between shifting by 2 bytes (e.g., ssdd -> 00ss, ss being the short, dd memory junk, and 00 two bytes of padding) and shifting by only 1 byte (e.g., dssd -> 00ss)?
Largehearted: OK, I've found a reason. In an array of shorts, consecutive elements are 2 bytes apart, so if the array could start at an odd address, some of its elements would unavoidably straddle 4-byte word boundaries.
Mckeon: @Peregring-lk There you go, I like that answer :)
