What does : (colon) mean in x86 addressing modes, between ES and the rest?

Asked 2/11, 2013 at 5:40 Answered 22/5, 2014 at 5:15

In this assembly instruction

mov ax, es:[bx]

what does the : do?

Triangular answered 2/11, 2013 at 5:40 Comment(1)

possible duplicate of x86 assembly - what the colon means? (GAS syntax) – Stalky 2/11, 2013 at 8:47

what specifically does the : do?

The ":" doesn't "do" anything, in the same way that "." doesn't "do" anything in most high level programming languages. A ':' is used in an operand of the form <segment register> : <address expression>. Every x86 instruction with a memory operand has a "default segment selector" which is used to determine the address indicated by that "memory operand". This is usually either "ds" or "ss", depending on the instruction. An instruction may instead specify any of the CS, DS, ES, SS, FS, and GS segment registers by including an appropriate "instruction prefix byte" in the instruction's binary encoding.
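As a rough illustration of the "instruction prefix byte" mentioned above, the sketch below prepends a segment-override prefix to the 16-bit encoding of mov ax, [bx]. The prefix values and the 8B 07 encoding are the standard x86 ones; the helper name with_segment_override is made up for this example.

```python
# Segment-override prefix bytes defined by the x86 instruction set.
SEGMENT_PREFIX = {
    "es": 0x26, "cs": 0x2E, "ss": 0x36,
    "ds": 0x3E, "fs": 0x64, "gs": 0x65,
}

# 16-bit encoding of `mov ax, [bx]`:
# opcode 8B, ModRM 07 (mod=00, reg=AX, r/m=[BX]).
MOV_AX_BX = bytes([0x8B, 0x07])

def with_segment_override(encoding: bytes, segment: str) -> bytes:
    """Hypothetical helper: prepend the override prefix for `segment`."""
    return bytes([SEGMENT_PREFIX[segment]]) + encoding

print(MOV_AX_BX.hex())                               # mov ax, [bx]
print(with_segment_override(MOV_AX_BX, "es").hex())  # mov ax, es:[bx]
```

The only difference between the two encodings is the single leading prefix byte.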

In 16 bit "real mode" programs the value in a segment register is used to determine the "higher order bits" of a memory address. It gets combined with the memory address specified in the instruction to generate the actual address referenced by the instruction. This allowed programs running on 16 bit hardware to have access to larger than 16 bit memory spaces, provided they could group memory into 4k chunks that could be accessed relative to a "segment selector" register.

In 32 bit programs the segment selector is actually an index into a structure that describes a dynamic mapping, including an offset and a size. The address is computed by combining the information present in the indexed structure with the memory operand present in the instruction.
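A minimal sketch of the lookup described above, assuming a toy descriptor table held in a Python list. The selector bit layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0) is the architectural one; the Descriptor class, the table contents, and the translate function are hypothetical, and access-rights checks and the GDT/LDT distinction are left out.

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    base: int    # linear address where the segment starts
    limit: int   # highest valid offset within the segment

def decode_selector(selector: int):
    """Split a 16-bit selector into (index, table_indicator, rpl)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3

def translate(table, selector: int, offset: int) -> int:
    """Toy translation: descriptor base + offset, with a limit check."""
    index, _ti, _rpl = decode_selector(selector)
    desc = table[index]
    if offset > desc.limit:
        raise ValueError("offset beyond segment limit (the CPU would fault)")
    return (desc.base + offset) & 0xFFFFFFFF

# Hypothetical table: entry 0 is the null descriptor, entry 1 is a flat
# 4 GiB segment, entry 2 is a small segment based at 0x7FFD0000.
table = [None, Descriptor(0, 0xFFFFFFFF), Descriptor(0x7FFD0000, 0xFFF)]

print(hex(translate(table, 0x08, 0x1234)))  # flat segment: offset unchanged
print(hex(translate(table, 0x10, 0x18)))    # based segment: base is added
```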

Most of the time, in 32 bit programs, most segment registers point to structures that specify the entire 32 bit address space. The primary exception is the "fs" register, which specifies an offset and size that maps to a special data structure defined by the operating system. It is used as one of the mechanisms for communication between kernel space and user space. It usually contains access to all the "user space visible" attributes of the Kernel's representation of the current "process or thread".

64 bit programs mostly eschew segment registers. All segment registers except FS and GS are defined to have no effect, and behave as if they mapped the entire address space. The FS register is usually used to provide access to the current "32 bit context" of the executing program. The "GS" register is usually used to provide access to the current "64 bit context". This allows 32 bit programs to run on 64 bit systems, but also gives the 64 bit kernel (and the mapping layer between 32 bit processes and 64 bit processes) access to the 64 bit context it needs to work.

So, to answer your original question:

Probabilistically (given no knowledge about the mode of the processor or the operating system), the instruction:

mov ax, es:[bx]

is actually equivalent to:

mov ax, [bx]

However, the fact that it uses 16 bit registers indicates that it might be a real mode program, in which case it may mean:

mov ax, [<addr>]

where addr == (es << 4) + bx
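The real-mode arithmetic can be sketched as follows. The offset is the value held in bx, not the word stored at [bx]. The register values are invented for illustration, and the wrap to 20 bits assumes 8086-style behaviour (A20 line disabled).

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address for segment:offset in real mode (20-bit wrap assumed)."""
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical register contents.
es, ds, bx = 0xB800, 0x1000, 0x0010

print(hex(real_mode_linear(es, bx)))  # address read by mov ax, es:[bx]
print(hex(real_mode_linear(ds, bx)))  # address read by mov ax, [bx]
```

With these values the two instructions read different locations; they coincide only when es and ds hold the same value.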

Carlsbad answered 2/11, 2013 at 7:21 Comment(7)

The two instructions mov ax, es:[bx] and mov ax, [bx] are not equivalent, unless an assume directive was used before. The default register in this case would be ds, not es, so the segment override is needed, which is what the es: does. – Tradesman 2/11, 2013 at 8:18

Most of the time, in a 32 bit program, es has offset 0 and references the entire address space. That's why I said "probabilistically speaking" they are equivalent. – Carlsbad 2/11, 2013 at 8:25

That doesn't make it better, because the answer depends on ds and es having the same value. So in Windows programs the effect of the instructions is the same, because es and ds point to the same segment, but that doesn't make the instructions themselves equivalent, only their effect on a Windows system (maybe also on Linux?). Since the OP apparently doesn't know this, he would be misled. You should replace the probability part with an assumption that es and ds are equal. – Tradesman 2/11, 2013 at 8:29

The text actually talks about what the segments mean in a 32 bit program (please read it). Then I mention that most of the time, the segments aren't used in a 32 bit program, except for FS and GS, which means for almost any program you look at, they will be equivalent. Everything I said is accurate. – Carlsbad 2/11, 2013 at 8:37

The beauty of segmented addressing in real mode was that it allowed a meg of memory to be accessed as segments of any combination of lengths from 16 bytes to 64K, in multiples of 16 bytes. If the 80386 had a similar mode with 32-bit segment registers, .NET would be able to access up to 64 gigs of objects using 32-bit object references (as opposed to having to double the size of object references when going beyond 4 gig). – Nationalize 3/11, 2013 at 17:51

If ds was expected to be equal to es, the code would have used just [bx], because the default segment for [bx] is DS, not ES. Using es requires a segment override prefix (one extra byte of code size). The fact that they went out of their way to use ES means it's highly likely not equivalent. – Coalfield 17/4, 2019 at 20:25

"provided they could group memory into 4k chunks" This is wrong, there is no need for 4 KiB chunks in Real/Virtual 86 Mode. Every 16-Byte paragraph boundary from 00000h to 0FFFF0h can be used as a segment base, so the chunk size is 1 paragraph if anything. – Kist 2/10, 2023 at 9:10

: is the convention for indicating the segment portion of an address. ES is therefore a segment (hence SI, for instance, would be invalid in this position) and [BX] is the offset within that segment; a segment register used as an offset would equally be invalid and generate an error.

Citrine answered 2/11, 2013 at 5:45 Comment(1)

Thank you. Makes much more sense – Triangular 2/11, 2013 at 5:49

When you access some data in your process memory, there is always a segment register involved, which defines a memory window with the main register as the offset. These registers are cs, ds, es, ss, fs and gs. Some of these segment registers have a special purpose, like cs (code segment) or ss (stack segment). When you access data through a register, as in your example, a default segment is selected. This default is implied by the instruction encoding. There are cases when you want to override the default selection and use a different segment register, and you can achieve this with a segment override, which is what your example is doing.

When executing

mov   eax, [ebx]

by default the ds segment would be used

but the instruction with the segment override

mov   eax, es:[ebx]

specifies that the es segment should be used instead. In Windows, by default, ds and es point to the same segment, so this override wouldn't be needed, as it will access the same linear and physical address.
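The selection rule can be sketched like this, assuming the usual x86 defaults: memory operands whose base register is bp, ebp or esp default to ss, other data accesses default to ds, and an explicit override replaces the default. The function name effective_segment is made up for this example, and string instructions (which have their own rules for es) are ignored.

```python
# Base registers whose memory operands default to the stack segment.
STACK_BASES = {"bp", "ebp", "esp"}

def effective_segment(base_register, override=None):
    """Return the segment register used for a memory operand."""
    if override is not None:
        return override          # explicit override such as es:
    return "ss" if base_register in STACK_BASES else "ds"

print(effective_segment("ebx"))        # mov eax, [ebx]
print(effective_segment("ebx", "es"))  # mov eax, es:[ebx]
print(effective_segment("ebp"))        # mov eax, [ebp]
```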

Tradesman answered 2/11, 2013 at 8:42 Comment(0)

DS:OFFSET, where DS is the segment address and OFFSET is the offset relative to the segment.

It means the address is computed like this: DS * size_of_segment + OFFSET

Normally, for x86, the size of a segment is 16 bytes.

For example:

      DS:  07C0H    0000 0111 1100 0000
+ OFFSET:   0000H        0000 0000 0000 0000
=          07C00H   0000 0111 1100 0000 0000

Brisbane answered 22/5, 2014 at 5:15 Comment(1)

No, DS*16 + OFFSET. In real mode, the size of a segment is fixed at 64kiB. The distance between adjacent segment bases is 16 bytes, but that's not the size. – Coalfield 17/4, 2019 at 20:27
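The distinction drawn in this comment can be checked with a few lines of arithmetic. This is only a sketch; the helper names are hypothetical and address wrap-around at 1 MiB is not considered.

```python
def segment_base(segment: int) -> int:
    """Linear address where a real-mode segment starts."""
    return segment << 4

def segment_range(segment: int):
    """First and last linear address reachable with a 16-bit offset."""
    base = segment_base(segment)
    return base, base + 0xFFFF

# Adjacent segment bases are 16 bytes apart...
print(segment_base(0x07C1) - segment_base(0x07C0))

# ...but each segment spans 64 KiB.
first, last = segment_range(0x07C0)
print(hex(first), hex(last), last - first + 1)

# Because segments overlap, different pairs can name the same byte.
print(segment_base(0x07C0) + 0x0000 == segment_base(0x0000) + 0x7C00)
```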
