Your question involves several layers, so I'll try to address them one by one...
Machine:
The machine has byte-addressable memory. The first byte has address 0, the second has address 1, etc. Whenever I write about memory content in this answer, I will use this formatting: 01 02 0E 0F 10 ..., with hexadecimal values, spaces between bytes, and addresses running continuously from the starting address toward the ending address. I.e. if this content started at address 0x800000, the memory would be (all hex):
address | byte value
------- | ----------
800000 | 01
800001 | 02
800002 | 0E
800003 | 0F
800004 | 10
800005 | ...
So far it does not matter whether the target MIPS platform is little- or big-endian: as long as only byte-sized memory accesses are involved, the order of bytes is "normal".
If you load a byte from address 0x800000 into t0 (with the lb instruction), t0 will be equal to 1.
If you load a word from address 0x800000 into t0 (with the lw instruction), endianness finally comes into play.
On a little-endian machine t0 will be equal to 0x0F0E0201: the first byte of the word (in memory) is the multiple of 256^0 (the lowest power), the second of 256^1, ... and the last one of 256^3.
On a big-endian machine t0 will be equal to 0x01020E0F: the first byte of the word (in memory) is the multiple of 256^3, the second of 256^2, ... and the last one of 256^0.
(256 is 2^8, and that magic number comes from "one byte is 8 bits": one bit can hold two values (0 or 1), so one byte of 8 bits can hold 2^8 different values.)
In both cases the CPU will read the same four bytes from memory (at addresses 0x800000 to 0x800003), but the endianness defines in which order they will appear as the final 32 bits of word value.
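The two readings of the same four bytes can be reproduced outside MIPS. Here is a minimal Python sketch in which int.from_bytes stands in for lw and plain indexing stands in for lb (the variable names are mine, not part of any MIPS tool):

```python
# The example memory content starting at address 0x800000.
memory = bytes([0x01, 0x02, 0x0E, 0x0F, 0x10])
word_bytes = memory[0:4]

# Stand-in for "lw": interpret the same four bytes in either byte order.
little = int.from_bytes(word_bytes, byteorder="little")
big = int.from_bytes(word_bytes, byteorder="big")

# Stand-in for "lb" from the first address: a single byte has no byte order.
first_byte = memory[0]

print(f"little-endian lw: 0x{little:08X}")  # 0x0F0E0201
print(f"big-endian lw:    0x{big:08X}")     # 0x01020E0F
print(f"lb:               {first_byte}")    # 1
```

The little-endian result is exactly 0x01*256^0 + 0x02*256^1 + 0x0E*256^2 + 0x0F*256^3.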
t0 is physically formed by 32 bits on the CPU chip; it has no address. When you want to refer to it in a CPU instruction (i.e. use the value stored in t0), you encode it into the instruction as register $8 ($8 has the alias $t0 for convenience in your assembler, so I'm using the t0 alias instead).
Endianness does not apply to those 32 bits of the register. They are already 32 bits, b0-b31, and once the value 0x0F0E0201 is loaded, those 32 bits are set to 0000 1111 0000 1110 ... (I'm writing it from the top bit b31 down to the bottom bit b0, so that shift left/right instructions make sense and it reads as a human-formatted binary number). There's no point in thinking about the endianness of a register or the physical order in which the bits are stored on the chip; it's enough to think of it as a full 32-bit value, and arithmetic instructions will treat it as such.
When loading a byte value with lb into a register, it lands in bits b0-b7, with b8-b31 containing copies of b7 (sign-extending the signed 8-bit value into a signed 32-bit value).
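To make the sign extension concrete, here is a small Python sketch that computes the 32-bit register pattern for a given byte. The helper name sign_extend_byte is my own; it only models the bit copying described above, not the instruction itself:

```python
def sign_extend_byte(value: int) -> int:
    """Return the 32-bit register pattern that lb would leave for one byte."""
    value &= 0xFF              # keep only bits b0-b7
    if value & 0x80:           # b7 is set: copy it into b8-b31
        value |= 0xFFFFFF00
    return value

print(f"0x{sign_extend_byte(0x01):08X}")  # 0x00000001
print(f"0x{sign_extend_byte(0x80):08X}")  # 0xFFFFFF80
print(f"0x{sign_extend_byte(0xFF):08X}")  # 0xFFFFFFFF
```

Read as a signed 32-bit number, 0xFFFFFFFF is -1, which is also what the byte FF means as a signed 8-bit value.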
When storing the value of a register into memory, endianness applies again, i.e. storing the word value 0x11223344 into memory will set the individual bytes to 44 33 22 11 on a little-endian machine (and to 11 22 33 44 on a big-endian one).
Assembler (source code and compilation)
An assembler that is well configured for its target platform will hide the endianness from the programmer, to make working with word values convenient.
So when you define memory value like:
myPreciousValue: .word 0x11223344
Your source code is text (!), i.e. one character is one byte value in ASCII encoding. (If you write the source in a UTF-8 text editor and use non-ASCII characters, they may be encoded across multiple bytes; the printable ASCII characters have the same encoding in both ASCII and UTF-8 and occupy a single byte only.) So the assembler will parse the text "0x11223344" (10 bytes: 30 78 31 31 32 32 33 33 34 34), calculate the 32-bit word value 0x11223344 from it, and then apply the target platform's endianness to produce four bytes of machine code, either:
44 33 22 11 # little-endian target
or:
11 22 33 44 # big-endian target
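The parse-then-encode step can be imitated in a few lines of Python. This is only a sketch of the idea; assemble_word is a hypothetical helper of mine, not how any real assembler is implemented:

```python
def assemble_word(text: str, byteorder: str) -> bytes:
    """Hypothetical helper: mimic what a .word directive does with its operand."""
    value = int(text, 0)                 # parse the source text, e.g. "0x11223344"
    return value.to_bytes(4, byteorder)  # apply the target byte order

source_text = "0x11223344"
print(source_text.encode("ascii").hex(" "))           # 30 78 31 31 32 32 33 33 34 34
print(assemble_word(source_text, "little").hex(" "))  # 44 33 22 11
print(assemble_word(source_text, "big").hex(" "))     # 11 22 33 44
```

The first line printed is the source text as the editor stored it; the other two are what would end up in the assembled data for each target.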
When you then use the lw instruction in your code to load myPreciousValue from memory into a register, the register will contain the expected word value 0x11223344 (as long as you didn't misconfigure your assembler and use the wrong endianness; that can't happen in MARS/SPIM, as it supports only a little-endian configuration in everything (VM, assembler, debugger)).
So the programmer does not have to think about endianness every time they write a 32-bit value somewhere in the source; the assembler will parse it and convert it to the target's byte layout.
If the programmer wants to define the four bytes 01 02 03 04 in memory, they can be "smart" and use .word 0x04030201 on a little-endian target platform, but that obfuscates the original intent, so I suggest using .byte 1, 2, 3, 4 in such a case, as the intent was to define bytes, not a word.
When you declare byte values with the .byte directive, they are emitted in the order you write them; no endianness is applied.
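A quick Python sketch of why the "smart" .word trick is fragile: the byte list stays the same on every target, while the word encoding of 0x04030201 depends on the byte order (the variable names are illustrative only):

```python
as_bytes = bytes([1, 2, 3, 4])                   # like: .byte 1, 2, 3, 4
as_word_le = (0x04030201).to_bytes(4, "little")  # like: .word 0x04030201 on a little-endian target
as_word_be = (0x04030201).to_bytes(4, "big")     # same source line on a big-endian target

print(as_bytes.hex(" "))    # 01 02 03 04
print(as_word_le.hex(" "))  # 01 02 03 04
print(as_word_be.hex(" "))  # 04 03 02 01
```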
Debugger
And finally the memory/register view of the debugger... This tool will again try hard to work in an intuitive/convenient way, so when you check the memory view and have it configured to bytes, the memory will be shown as:
0x800000: 31 32 33 34 41 42 43 44 | 1234ABCD
When you switch it to the "word" view, it will use the configured endianness to concatenate bytes in the target platform's order, i.e. MARS/SPIM, as a little-endian platform, will show the same memory as:
0x800000: 34333231 44434241
(If the ASCII view is also included, is it "worded" too? If yes, it will look like 4321 DCBA. I don't have MARS/SPIM installed at the moment to check what the memory view in the debugger actually looks like, sorry.)
So you as programmer can read the "word" value directly from display, without shuffling the bytes into "correct" order, you already see what the "word" value will be (from those four bytes of memory content).
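The two views can be mimicked with a short Python sketch. The functions byte_view and word_view are my own illustration of the idea, not the actual MARS/SPIM display code:

```python
memory = b"1234ABCD"   # the bytes 31 32 33 34 41 42 43 44
base = 0x800000

def byte_view(data: bytes) -> str:
    """Show every byte separately, in address order."""
    return data.hex(" ")

def word_view(data: bytes, byteorder: str) -> str:
    """Group the bytes by four and show each group as one word value."""
    words = [int.from_bytes(data[i:i + 4], byteorder) for i in range(0, len(data), 4)]
    return " ".join(f"{w:08x}" for w in words)

print(f"0x{base:06x}: {byte_view(memory)} | {memory.decode('ascii')}")
print(f"0x{base:06x}: {word_view(memory, 'little')}")
```

With "big" passed as the byte order, the same memory would be displayed as 31323334 41424344.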
The register view usually shows hexadecimal word values by default, i.e. after loading a word from address 0x800000 into t0, the register $8 will contain the value 0x34333231 (875770417 in decimal).
If you are interested in the value of the first byte in memory used for that load, at this point you have to apply your knowledge of the target platform's endianness and look either at the first two digits "34" (big-endian) or the last two "31" (little-endian) in the register view (or, better, use the memory view in byte mode to avoid any mistake).
Runtime detection in code.
So with all that information above, the runtime detection code should be easy to understand (unfortunately I don't have MARS/SPIM at the moment, so I didn't verify it works, let me know):
.data
checkEndianness: .word 0 # temporary memory for function
# can be avoided by using stack memory instead (in function)
.text
main:
jal IsLittleEndian
# ... do something with v0 value ...
# exit (syscall 10 in MARS/SPIM)
li $v0,10
syscall
# --- functions ---
# returns (in v0) 0 for big-endian machine, and 1 for little-endian
IsLittleEndian:
# set value of register to 1
li $v0,1
# store the word value 1 into memory (4 bytes written)
sw $v0,checkEndianness
# memory contains "01 00 00 00" on little-endian machine
# or "00 00 00 01" on big-endian machine
# load only the first byte back
lb $v0,checkEndianness
jr $ra
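The same "store the word 1, read back the first byte" trick can be tried on whatever machine you are sitting at. Here is a Python sketch in which struct.pack with native byte order plays the role of sw; the function name is_little_endian is my own:

```python
import struct
import sys

def is_little_endian() -> bool:
    # "sw": store the word value 1 in the host's native byte order (4 bytes).
    stored = struct.pack("=I", 1)
    # "lb": look only at the first byte; it is 01 on little-endian, 00 on big-endian.
    return stored[0] == 1

print(is_little_endian())
print(sys.byteorder)  # Python's own answer, for comparison
```

The result depends on the host, so the two printed lines should always agree with each other rather than with any fixed value.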
What is it good for? As long as you write your software for the single target platform, and you are storing/loading words by the target CPU, you don't need to care about endianness.
But if you have software which is multi-platform, and it saves binary files... To make the files work the same way on both big- and little-endian platforms, the specification of the file structure must also specify the endianness of the file data. Then, according to that spec, one type of target platform may read the data as "native" word values, while the other will have to shuffle the bytes in word values to read the correct word value (plus the spec should also say how many bytes a "word" is :) ). Such a runtime test may then be handy: you include the shuffler in the save/load routines and use the endianness detection routine to decide whether the word bytes have to be shuffled or not. That makes the target platform's endianness "transparent" to the remaining code, which simply passes its native "word" values to the save/load routine, and your save/load code may use the same source on every platform (at least as long as you use some portable multi-platform programming language like C; of course the assembly for MIPS will not work on different CPUs at all, and would need to be rewritten from scratch).
Also the network communication is often done with custom binary protocols (wrapped usually in the most common TCP/IP packets for the network layer, or even encrypted, but your application will extract the raw bytes content out of it at one point), and then endianness of sent/received data matters, and the "other" platforms have to shuffle the bytes to read correct values then.
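A save/load pair with a built-in shuffler could look roughly like this in Python. Everything here is an assumption for illustration: the file format is hypothetical and is declared little-endian, and the function names are mine:

```python
import sys

FILE_BYTEORDER = "little"   # assumption: the hypothetical file spec says words are little-endian

def swap_word_bytes(raw: bytes) -> bytes:
    """The 'shuffler': reverse the four bytes of one word."""
    return raw[::-1]

def save_word(value: int) -> bytes:
    raw = value.to_bytes(4, sys.byteorder)   # native layout, as the CPU would store it
    if sys.byteorder != FILE_BYTEORDER:      # runtime endianness check
        raw = swap_word_bytes(raw)
    return raw

def load_word(raw: bytes) -> int:
    if sys.byteorder != FILE_BYTEORDER:
        raw = swap_word_bytes(raw)
    return int.from_bytes(raw, sys.byteorder)

data = save_word(0x11223344)
print(data.hex(" "))         # 44 33 22 11 on every host
print(hex(load_word(data)))  # 0x11223344
```

The rest of the program only ever sees native word values; the byte order decision is confined to these two routines. A network protocol would work the same way, just with the byte order taken from the protocol spec.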
Other platforms (non-MIPS)
Pretty much everything above applies; just check what byte and word mean on the other platform (I think byte has been pretty much set in stone as 8 bits for the last 35+ years, but word may differ; for example, on x86 platforms a word is only 16 bits). A little-endian machine will still read the "word" bytes in "reversed" order, with the first byte used as the multiple of the smallest power, 256^0, and the last byte as the multiple of the highest power of 256 (256^1 on x86, as only two bytes form a word there; the MIPS "word" is called a "double word" or "dword" in the x86 world).
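For the 16-bit x86 "word" the arithmetic is the same, just with two bytes. A tiny Python sketch (the byte values are made up for illustration):

```python
raw = bytes([0x34, 0x12, 0x78, 0x56])

# x86 "word": two bytes, little-endian, powers 256^0 and 256^1.
word16 = int.from_bytes(raw[0:2], "little")
# x86 "dword": four bytes, the same size as a MIPS "word".
dword32 = int.from_bytes(raw, "little")

print(hex(word16))   # 0x1234
print(hex(dword32))  # 0x56781234
```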
00 00 00 01. The .byte directive will not rearrange bytes in any way; they are stored exactly as you wrote them. – Playtime
.word you tell the assembler to treat the following text as a 32-bit integer value, so it will parse text like "1", convert it to the integer value 1, and store that in binary in the correct endianness to preserve that value when you operate on that memory with word instructions. So .word 0x11223344 defines the four bytes 44 33 22 11. Then when you do lw $t0,(...), the t0 value will be 0x11223344, as you wrote in the source. It would be crazy to write all values in the source in a "byte swapped" way; that's why .word exists (when .byte could be used for everything). – Playtime
.word 0x11223344 as four bytes 11 22 33 44. So the assembler (when targeting the correct platform) hides the endian fuss from you when you work with word values in source code (text). If you insist on defining particular bytes and want to handle endianness yourself, the assembler lets you do that with byte-size directives like .byte or .space. Then you need to be aware of the target platform and how to define word values byte by byte correctly. (In your case .word 1 will compile as 01 00 00 00, NOT 0,0,0,1) – Playtime