Segment size in x86 real mode

I have a question about the size of segments in real mode: they can't be more than 64K, but they can be smaller.

My question is: how are the segment size and base address initialized? In protected mode there are GDTs and LDTs. Real-mode segments can also be overlapping, disjoint or adjacent.

The BIOS has reserved areas for specific things like boot code, the video buffer, etc. Do assembly programs need to do something like that?

Adagio asked 22/7, 2013 at 11:24

The segment limit in real mode is 64K, even on a 386 or later CPU where you can use 32-bit address-size via prefixes. For example, mov ax, [edx + ecx*4] is still limited to offsets up to FFFFh in real mode.

If you exceed this limit, it raises a #GP exception on 286+. (Or #SS if the segment was SS).
The 8086 didn't have #SS or #GP exceptions. It had no protection, general or otherwise, and simply added Sreg << 4 to the offset to form a linear address.
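To make the limit rule concrete, here is a minimal Python sketch of how a 386+ in real mode forms a linear address from a 32-bit effective address such as [edx + ecx*4], and where the 64 KiB check applies. It is not an emulator. The names `linear_address`, `check_limit`, `load_word_32bit_ea` and `GeneralProtectionFault` are hypothetical, and the exception only stands in for #GP (or #SS).

```python
class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for #GP (or #SS if the segment is SS)."""


REAL_MODE_LIMIT = 0xFFFF  # default real-mode segment limit: offsets 0..FFFFh


def linear_address(seg: int, off: int) -> int:
    """Real-mode address formation: (Sreg << 4) + offset."""
    return (seg << 4) + off


def check_limit(off: int, size: int = 1, limit: int = REAL_MODE_LIMIT) -> None:
    """286+ behaviour: fault if any byte of the access lies past the limit."""
    if off + size - 1 > limit:
        raise GeneralProtectionFault(f"offset {off:#x} (+{size}) exceeds limit {limit:#x}")


def load_word_32bit_ea(seg: int, edx: int, ecx: int) -> int:
    """Models the address side of `mov ax, [edx + ecx*4]` in real mode on a 386+."""
    off = (edx + ecx * 4) & 0xFFFFFFFF  # 32-bit effective address (0x67 prefix)
    check_limit(off, size=2)            # still checked against the 64 KiB limit
    return linear_address(seg, off)


if __name__ == "__main__":
    print(hex(load_word_32bit_ea(0x1000, 0x100, 0x10)))  # 0x10140, inside the limit
    try:
        load_word_32bit_ea(0x1000, 0x10000, 0)            # offset 10000h: faults
    except GeneralProtectionFault as exc:
        print("fault:", exc)
```

The 32-bit effective address is computed in full, and only then compared against the cached limit. That comparison is the step which would stop faulting in unreal mode.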

Even with 16-bit address-size, a word or wider access at seg:FFFF can exceed the 64K segment limit. On the 8086, the high byte comes from seg:0000: the offset wraps within the logical address before the linear address for the second memory-bus transaction is computed, so the access never leaves the segment's 64K linear range.
On the 286 and later, this case also raises #GP (or #SS), for data accesses and instruction fetches alike. See https://www.os2museum.com/wp/does-eip-wrap-around-in-16-bit-segments/
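The two behaviours for a word access at seg:FFFF can be sketched in Python like this. The sketch assumes a flat `bytearray` stands in for physical memory. The names `read_word_8086`, `read_word_286` and `GeneralProtectionFault` are hypothetical, and the 286 version ignores A20 (it assumes the address stays inside `mem`).

```python
class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for #GP/#SS on the 286 and later."""


def read_word_8086(mem: bytearray, seg: int, off: int) -> int:
    """8086: the offset of the second byte wraps within 16 bits first."""
    lo = ((seg << 4) + off) & 0xFFFFF
    hi = ((seg << 4) + ((off + 1) & 0xFFFF)) & 0xFFFFF  # seg:FFFF + 1 -> seg:0000
    return mem[lo] | (mem[hi] << 8)


def read_word_286(mem: bytearray, seg: int, off: int) -> int:
    """286+: a word at seg:FFFF crosses the 64 KiB limit and faults instead."""
    if off + 1 > 0xFFFF:
        raise GeneralProtectionFault(f"word access at {seg:04X}:{off:04X}")
    addr = (seg << 4) + off  # A20 ignored; assumes addr + 1 < len(mem)
    return mem[addr] | (mem[addr + 1] << 8)


if __name__ == "__main__":
    mem = bytearray(1 << 20)
    mem[0x1FFFF] = 0x34  # byte at 1000:FFFF
    mem[0x10000] = 0x12  # byte at 1000:0000
    print(hex(read_word_8086(mem, 0x1000, 0xFFFF)))  # 0x1234: high byte from 1000:0000
```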

In general, 16-bit addressing modes like [bx + si + 1] wrap at 16 bits. (And push word with SP=0 wraps to SP=FFFEh, which is no problem as long as the stack is aligned.) So only code using the 0x67 address-size prefix (added in the 386) for addressing modes like [eax] can exceed segment limits in real mode, apart from word or wider accesses at the very end of a segment.
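The wrap-around rules can be written as plain 16-bit and 32-bit masking. This is a Python sketch; the function names are hypothetical. Note that only the 32-bit form can produce an offset above FFFFh, which the limit check then faults on.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def ea16_bx_si_disp(bx: int, si: int, disp: int) -> int:
    """[bx + si + disp] with 16-bit address-size: the sum wraps at 64 KiB."""
    return (bx + si + disp) & MASK16


def ea32(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """[base + index*scale + disp] with the 0x67 prefix: 32-bit sum, no 16-bit wrap."""
    return (base + index * scale + disp) & MASK32


def push_word_sp(sp: int) -> int:
    """push word on a 16-bit stack: SP decrements by 2 and wraps."""
    return (sp - 2) & MASK16


if __name__ == "__main__":
    print(hex(ea16_bx_si_disp(0xFFFF, 0x0001, 1)))  # 0x1: wrapped, no fault possible
    print(hex(ea32(0xFFFF, 1, 1, 1)))               # 0x10001: past the 64K limit
    print(hex(push_word_sp(0)))                     # 0xfffe
```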

Segments that start within 64K of the highest possible address wrap around at 1 MiB on the 8086, and on later CPUs if A20 is disabled. Otherwise they extend past 1 MiB: for example, FFFF:FFFF (seg:off) is linear address 0x10FFEF. See What are Segments and how can they be addressed in 8086 mode?
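The A20 behaviour can be modelled as a mask on address bit 20. This is a Python sketch; the function name is hypothetical. It covers only real-mode addresses, which never exceed 10FFEFh. The 8086 has no 21st address line at all, which gives the same result as A20 disabled.

```python
A20_BIT = 1 << 20


def linear_address(seg: int, off: int, a20_enabled: bool) -> int:
    """Real-mode linear address with the A20 gate modelled as a mask on bit 20."""
    addr = (seg << 4) + off  # at most FFFF0h + FFFFh = 10FFEFh
    if not a20_enabled:
        addr &= ~A20_BIT     # bit 20 forced to 0: wraps to the bottom of memory
    return addr


if __name__ == "__main__":
    print(hex(linear_address(0xFFFF, 0xFFFF, a20_enabled=True)))   # 0x10ffef
    print(hex(linear_address(0xFFFF, 0xFFFF, a20_enabled=False)))  # 0xffef
```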


Unreal mode: flat memory model for 386 real mode

If you switch to protected mode and load a segment register, the CPU keeps the segment descriptor (base + limit) cached internally, even after switching back to 16-bit real mode. This situation is called unreal mode.

Writing to a segment register in real mode only sets the segment base to value << 4, without changing the limit, so unreal mode is fairly durable for segments other than CS. CS:EIP is special, especially if you need to avoid truncating EIP to 16 bits when returning from interrupts or in similar situations. See the OSDev wiki for details.

push/pop/call/ret use SS:ESP or SS:SP according to the B flag in the current stack-segment descriptor. The address-size prefix only affects explicit memory operands, such as push word [eax] vs. push word [si].
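Here is a rough Python sketch of how the stack pointer update depends on the B flag rather than on any prefix. It assumes the B flag cached for SS is passed in as a bool, and the function name is hypothetical. With B=0 (the normal real-mode case), only SP changes and the upper half of ESP is left alone.

```python
def push16(esp: int, b_flag: bool) -> int:
    """Return ESP after a 2-byte push.

    B=1 (32-bit stack segment): the full ESP is decremented.
    B=0 (16-bit stack segment): only SP, the low 16 bits, is decremented and
    wraps; the upper 16 bits of ESP are left unchanged.
    """
    if b_flag:
        return (esp - 2) & 0xFFFFFFFF
    sp = ((esp & 0xFFFF) - 2) & 0xFFFF
    return (esp & 0xFFFF0000) | sp


if __name__ == "__main__":
    print(hex(push16(0x00010000, b_flag=False)))  # 0x1fffe: SP wrapped, high half kept
    print(hex(push16(0x00010000, b_flag=True)))   # 0xfffe: full 32-bit decrement
```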

The GDT / LDT are ignored when you write a value to a segment register in real mode. The value is used directly to set the cached segment base, not as a selector at all.

(Each segment is separate. Unreal mode isn't an actual mode like protected vs. real; the CPU is in real mode. Writing the FS register, for example, puts that segment back into normal real-mode behaviour except for its limit, but doesn't change the others. "Unreal mode" is just a name for being in real mode with cached segment descriptors that have larger limits, so you can use 32-bit address-size for a larger flat address space, often with base = 0 and limit = 4 GiB.)
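One way to picture this is a hidden descriptor cache per segment register. The Python sketch below is a hypothetical model, not how any CPU is implemented. `HiddenDescriptor`, `SegmentRegisters` and its methods are invented names, and CS is deliberately left out because of the pitfalls mentioned above. The protected-mode load takes base and limit directly instead of reading them from a GDT/LDT.

```python
from dataclasses import dataclass


@dataclass
class HiddenDescriptor:
    """Hypothetical model of one segment register plus its descriptor cache."""
    visible: int = 0     # the 16-bit value software sees in the register
    base: int = 0
    limit: int = 0xFFFF  # real-mode default


class SegmentRegisters:
    def __init__(self) -> None:
        self.regs = {name: HiddenDescriptor() for name in ("ES", "SS", "DS", "FS", "GS")}

    def load_in_protected_mode(self, name: str, selector: int, base: int, limit: int) -> None:
        """Base/limit would come from a GDT/LDT descriptor; passed in directly here."""
        self.regs[name] = HiddenDescriptor(selector, base, limit)

    def write_in_real_mode(self, name: str, value: int) -> None:
        """No GDT/LDT lookup: base = value << 4, and the cached limit is NOT touched."""
        reg = self.regs[name]
        reg.visible = value & 0xFFFF
        reg.base = (value & 0xFFFF) << 4


if __name__ == "__main__":
    cpu = SegmentRegisters()
    cpu.load_in_protected_mode("DS", 0x10, base=0, limit=0xFFFFFFFF)
    # ... switch back to real mode here ...
    cpu.write_in_real_mode("DS", 0x2000)
    print(cpu.regs["DS"])  # base is 0x20000, but the 4 GiB limit survives
    print(cpu.regs["FS"])  # other segments are unaffected: limit still 0xFFFF
```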

AFAIK, there's no way to query the internal limit value of a segment in real mode. lsl loads the segment-limit value directly from a descriptor in the GDT / LDT in memory, not from the internal value (so it's not what you want), and it's not available in real mode anyway.

See comments on this answer for more details about taking segments out of unreal mode intentionally or unintentionally.

Intel 286 and 386 CPUs supported an undocumented LOADALL instruction which could set segment limits from real mode, but later CPUs don't have it. Commenters say that SMM (system management mode) may be able to do something similar on modern x86.

Lesbianism answered 23/3, 2018 at 8:53
Not quite correct. If you are in "unreal mode" and you modify a segment register, the descriptor cache base will change accordingly, but the descriptor cache limit will be left alone. Unreal mode should remain in place until the next time you switch into protected mode and change the segment limit and base of the segment registers in question. - Bramlett
There is another mechanism to change them while in real mode (including unreal), and that is via the LOADALL instruction, but that instruction isn't available on most processors. The LOADALL instruction was useful on Intel 386s and 286s since you could effectively get unreal mode without switching into protected mode at all. On a 286 that was a bonus, since there was a high performance cost of switching back to real mode from protected mode. - Bramlett
CS is a different beast and comes with a bunch of pitfalls that can be influenced by the processor type and how things are handled. Using unreal mode with cached CS is problematic and not for the faint of heart. - Bramlett
@MichaelPetch Still not quite correct. Unreal mode can also be exited in two other ways: switching to VM86 mode and resetting the processor (hard or soft reset). Both revert all limits back to 64k. Also unreal mode does not work for the CS and SS segments. Note that unreal mode is only supported on post-286 processors since 286 doesn't support switching from protected to real mode without resetting the processor. This means that on 286, the segment limits cannot be even smaller than 64k; they can only be exactly 64k. Anyway, this answer is still much more accurate than nio's and deserves a UP. - Atalanti
LOADALL is not a standard x86 instruction and seems to be only supported on Intel 286 and 386. - Atalanti
@HadiBrais: Did you read what I said about LOADALL? Not all processors supported it. Specifically, on a 286 they didn't enter 16-bit protected mode to get to real mode. They used LOADALL (Himem.sys on 286 being the first) to get into unreal mode without the switch to protected mode (so there was not the same penalty coming back out of protected mode). As for resetting the processor, that was a given. In my comments I mentioned the nature of CS (some call it huge unreal mode), that it has serious pitfalls, and that it acts differently on different processors. - Bramlett
LOADALL was an undocumented instruction that was known to larger companies with deep pockets. LOADALL was taken advantage of in earlier MS-DOS in HIMEM.SYS to switch to unreal mode without passing through protected mode. The process changed for the 386, given that the performance penalty of resetting the processor to get back to real mode wasn't an issue anymore. - Bramlett
@MichaelPetch Yes, I just wanted to add that LOADALL is not a standard x86 instruction. Actually I had not heard of it before, so thanks for mentioning it. The state of the segment cache registers after reset is not very straightforward, so I just wanted to explicitly mention it. - Atalanti
VM8086 requires passing through protected mode first, and I mentioned the issue of the hidden descriptors being potentially reset in that scenario. - Bramlett
Intel had considered nixing the cached descriptor behavior that gave rise to unreal mode, but they couldn't because of backwards compatibility with all the software (including MS-DOS) that took advantage of it. Often things that were not documented or considered an abuse of processor internals became standard for backwards compatibility reasons. - Bramlett
And for the record, there were some unusual BIOSes (in the late 80s and early 90s) that quietly switched to protected mode (potentially resetting unreal mode) when certain BIOS interrupts were used (drive access, etc.). - Bramlett
@MichaelPetch I wanted to specifically point out that just switching to VM86 (through protected mode), without writing to the segment register, will by itself revert the limit. So overall there are exactly four ways (three excluding LOADALL) by which unreal mode can be exited intentionally, as far as I know. - Atalanti
@HadiBrais: It can be more than that. Invoking Int 6h (invalid opcode) on early 386s (with certain BIOSes) was meant to simulate LOADALL in the absence of a full-featured 286 LOADALL instruction. This was later simulated via SMM on some systems. - Bramlett
@MichaelPetch You mean the BIOS will simulate it by switching to protected mode and back to real mode? The new limit can be passed as an argument to int 6, I suppose. But is that faster than doing it manually? Or maybe that simulation was not about supporting unreal mode. Why did SMM need to simulate that? - Atalanti
@MichaelPetch Even with such simulation, one of the four techniques must be used, right? Simulation is just an abstraction over them. - Atalanti
@HadiBrais Regarding SMM and Int 6h, one needs to read about RSM. It had an execution state similar to LOADALL and it was possible for SMM to modify the state before returning to previous CPU mode with the effect of simulating most of LOADALL without int 6h changing to protected mode (to set up unreal mode): asm.inightmare.org/opcodelst/index.php?op=RSM - Bramlett
@MichaelPetch: Thanks for the correction; that makes unreal mode a lot more durable / usable / useful than I thought it was (other than CS). Updated my answer. - Lesbianism
This article says that on an SMI, the segment descriptor caches are saved to memory in undocumented reserved fields, which can then be accessed and even manipulated in SMM mode. - Atalanti
I think my statement "unreal mode does not work for the CS and SS segments" may not be accurate. Since using the BP or SP registers as a base register of a memory operand defaults to the stack segment, then by using the address size override prefix, EBP or ESP are used and so a 32-bit address can be formed into the stack segment. But still push and pop cannot form a 32-bit memory address. - Atalanti
@HadiBrais: I checked the manual for PUSH to see what it said about stack address size. Updated the answer. - Lesbianism
Something not related to the rest of the discourse, I noticed this in the text "The value is used directly to set the segment base". Not sure if it's worth noting (maybe it isn't and I'll just leave this comment here) that this segment base that gets computed is set in the descriptor cache for the particular segment reg being updated. The value in the segment reg isn't used for anything after that except for what one might call display purposes. It was possible to use LOADALL to set the descriptor cache base to one value and the segment reg itself to something completely different. - Bramlett
Something I didn't discover until now is that the OP originally referenced a knowledge base article that can still be found here: jeffpar.github.io/kbarchive/kb/120/Q120069. What has become clear is that the OP may have encountered the confusing notion of a segment as defined by MASM, as opposed to the concept of a segment from the perspective of the CPU. This has been a confusing point for decades. - Bramlett

In real mode, the mapping from segmented addresses to physical addresses is hardwired. To get a physical address you use this equation:

physical address = segment * 16 + offset
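As a worked example of the equation, 1234:5678 gives 0x12340 + 0x5678 = 0x179B8. The Python sketch below evaluates the formula and also lists every 16-bit seg:off pair that reaches the same physical address. This is why real-mode segments can overlap: nothing stops two segment values from sharing bytes. The function names are hypothetical, and the 1 MiB wrap is ignored.

```python
def physical(seg: int, off: int) -> int:
    """physical address = segment * 16 + offset (ignoring the 1 MiB wrap)."""
    return seg * 16 + off


def aliases(addr: int) -> list[tuple[int, int]]:
    """Every 16-bit seg:off pair that names the same physical address."""
    lowest_seg = max(0, -(-(addr - 0xFFFF) // 16))  # ceil((addr - FFFFh) / 16)
    highest_seg = min(0xFFFF, addr // 16)
    return [(seg, addr - seg * 16) for seg in range(lowest_seg, highest_seg + 1)]


if __name__ == "__main__":
    print(hex(physical(0x1234, 0x5678)))  # 0x179b8
    print(hex(physical(0x179B, 0x0008)))  # 0x179b8, same byte
    print(len(aliases(0x179B8)))          # 4096 different seg:off names for it
```

An address in the middle of memory has 65536 / 16 = 4096 such names, one for each segment value whose 64K window covers it.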

Both the segment and the offset are 16-bit values. This equation combines them into one 20-bit address, so you can access the low 640 KiB of RAM (and the rest of the 1 MiB address space) with no problem.

No table records where a segment is located. The catch is that you have to set both a segment register and an offset to reach an arbitrary address, so a simple loop that only increments an offset register can cover at most 64 KiB. That makes working with larger buffers less convenient than in a flat memory model.
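One common workaround is to renormalize the far pointer so that the offset stays small, and to walk a large buffer in pieces that each fit inside one segment. The Python sketch below shows that idea. It assumes the buffer lies below 1 MiB. `normalize` and `chunks` are hypothetical names, and the 0x8000 chunk size is an arbitrary choice (anything up to 0xFFF0 works).

```python
from typing import Iterator, Tuple


def normalize(seg: int, off: int) -> Tuple[int, int]:
    """Move all but the low 4 bits of the offset into the segment (same address)."""
    linear = (seg << 4) + off
    return linear >> 4, linear & 0xF


def chunks(seg: int, off: int, length: int,
           max_chunk: int = 0x8000) -> Iterator[Tuple[int, int, int]]:
    """Split a buffer longer than 64 KiB into (seg, off, count) pieces.

    After normalizing, off <= 0xF, so off + count - 1 never exceeds 0xFFFF
    as long as max_chunk <= 0xFFF0. Each piece can then be walked with a
    plain 16-bit offset loop without wrapping.
    """
    while length > 0:
        seg, off = normalize(seg, off)
        count = min(length, max_chunk)
        yield seg, off, count
        off += count
        length -= count


if __name__ == "__main__":
    for piece in chunks(0x2000, 0x0000, 0x18000):  # a 96 KiB buffer at 2000:0000
        print([hex(x) for x in piece])
```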

Miles answered 22/7, 2013 at 11:29
Thanks nio for your answer. So is it the assembly programmer's job to decide the base address and size of a segment? If that is the case, segments can overlap, and the overlapped part could be used by some other program, thus corrupting the data? Let's say two MS-DOS programs are running: how will memory be allocated, given there is no protection? - Adagio
I'm not sure how memory management in DOS works, but here is a memory address table (webpages.charter.net/danrollins/techhelp/0094.HTM). If you're writing a DOS program, you have to be careful not to overwrite some other .com driver or TSR routine. Usually only one DOS program runs at a time. - Miles
@nio: If one arranges objects to be paragraph-aligned, one need only load the segment register to access something that's stored at a known offset within the object. I don't know of any compiled languages that take advantage of this, but it's a common trick in assembly code. - Fume
