Assembly Segmented Model 32bit Memory Limit

Asked 28/10, 2010 at 2:38 Answered 15/8, 2014 at 6:37

Solved assembly x86 operating-system paging memory-segmentation

If a 32-bit operating system operated with a segmented memory model, would there still be a 4GB limit?

I was reading the Intel Pentium Processor Family Developer's Manual, and it states that with a segmented memory model it is possible to map up to 64TB of memory.

"In a segmented model of memory organization, the logical address space consists of as many as 16,383 segments of up to 4 gigabytes each, or a total as large as 2^46 bytes (64 terabytes). The processor maps this 64 terabyte logical address space onto the physical address space by the address translation mechanism described in Chapter 11. Application programmers can ignore the details of this mapping. The advantage of the segmented model is that offsets within each address space are separately checked and access to each segment can be individually controlled."

This is not a complex question. I just want to be sure I understood the text correctly. If Windows or any other OS worked in a segmented model rather than a flat model, would the memory limit be 64TB?

Update:

Intel's 3-2 3a System Documentation.

http://pdos.csail.mit.edu/6.828/2005/readings/i386/c05.htm

The Segment Register should NOT be thought of in the traditional Real-Mode sense. The Segment Register acts as a SELECTOR for the Global Descriptor Table.

In Protected mode you use a logical address in the form A:B to address memory. As in Real Mode, A is the segment part and B is the offset within that segment. The registers in protected mode are limited to 32 bits. 32 bits can represent any integer between 0 and 4Gb. Because B can be any value between 0 and 4Gb our segments now have a maximum size of 4Gb (Same reasoning as in real-mode). Now for the difference. In protected mode A is not an absolute value for the segment. In protected mode A is a selector. A selector represents an offset into a system table called the Global Descriptor Table (GDT). The GDT contains a list of descriptors. Each of these descriptors contains information that describes the characteristics of a segment.
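As a rough sketch of what "A is a selector" means in practice: a 16-bit selector packs a 13-bit table index, a table-indicator bit (GDT or LDT) and a 2-bit requested privilege level. The type and function names below are made up for illustration and do not come from the quoted article.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields packed into a 16-bit selector. */
typedef struct {
    uint16_t index; /* bits 15..3: descriptor index within the table */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
} selector_fields;

/* Split a selector value into its fields (illustrative name). */
selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}
```

With this layout, selector 0x08 decodes to index 1 in the GDT at privilege level 0, and 0x10 to index 2, which is why descriptor slots are usually referred to by multiples of 8.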

The Segment Selector provides additional security that cannot be achieved with paging.

Both of these methods [Segmentation and Paging] have their advantages, but paging is much better. Segmentation is, although still usable, fast becoming obsolete as a method of memory protection and virtual memory. In fact, the x86-64 architecture requires a flat memory model (one segment with a base of 0 and a limit of 0xFFFFFFFF) for some of its instructions to operate properly.

Segmentation is, however, totally in-built into the x86 architecture. It's impossible to get around it. So here we're going to show you how to set up your own Global Descriptor Table - a list of segment descriptors.

As mentioned before, we're going to try and set up a flat memory model. The segment's window should start at 0x00000000 and extend to 0xFFFFFFFF (the end of memory). However, there is one thing that segmentation can do that paging can't, and that's set the ring level.

-http://www.jamesmolloy.co.uk/tutorial_html/4.-The%20GDT%20and%20IDT.html

A GDT, for example, lists the segments together with their access levels and the areas of memory they cover:

Sample GDT Table

GDT[0] = {.base=0, .limit=0, .type=0};                      // Selector 0x00 cannot be used
GDT[1] = {.base=0, .limit=0xffffffff, .type=0x9A};          // Selector 0x08 will be our code
GDT[2] = {.base=0, .limit=0xffffffff, .type=0x92};          // Selector 0x10 will be our data
GDT[3] = {.base=&myTss, .limit=sizeof(myTss), .type=0x89};  // You can use LTR(0x18)

http://wiki.osdev.org/GDT_Tutorial#What_should_i_put_in_my_GDT.3F
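The sample table is pseudocode: the hardware descriptor is 8 bytes with the base and limit split across several fields, and the limit field is only 20 bits wide. A minimal sketch of the packing for code and data segments might look like this. The function name is illustrative, and always setting the 32-bit (D/B) flag is an assumption of this sketch that would not suit a TSS descriptor.

```c
#include <stdint.h>

/* Pack base, limit (last valid offset, in bytes) and the access byte into
   the 8-byte x86 segment descriptor layout. Limits above 0xFFFFF are
   stored in 4 KiB units with the granularity flag set. */
void encode_gdt_entry(uint8_t out[8], uint32_t base, uint32_t limit, uint8_t access)
{
    uint8_t flags = 0x4;           /* D/B = 1: 32-bit segment (assumption) */
    if (limit > 0xFFFFFu) {
        limit >>= 12;              /* convert to 4 KiB pages */
        flags |= 0x8;              /* G = 1: page granularity */
    }
    out[0] = (uint8_t)(limit & 0xFF);          /* limit 7:0   */
    out[1] = (uint8_t)((limit >> 8) & 0xFF);   /* limit 15:8  */
    out[2] = (uint8_t)(base & 0xFF);           /* base 7:0    */
    out[3] = (uint8_t)((base >> 8) & 0xFF);    /* base 15:8   */
    out[4] = (uint8_t)((base >> 16) & 0xFF);   /* base 23:16  */
    out[5] = access;                           /* type / DPL / present */
    out[6] = (uint8_t)(((limit >> 16) & 0x0F) | (flags << 4)); /* limit 19:16 + flags */
    out[7] = (uint8_t)((base >> 24) & 0xFF);   /* base 31:24  */
}
```

For the flat code segment in the sample (base 0, limit 0xffffffff, type 0x9A) this produces the bytes FF FF 00 00 00 9A CF 00. The point relevant to the question is that a segment can never describe more than 4GB, because the limit is 20 bits of 4 KiB pages at most.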

The paging stage is what maps linear addresses to physical memory. PAE is what provides additional physical memory, up to 64GB.

So, in short, the answer is no: you cannot have more than 4GB of logical memory. I consider the 64TB claim a misprint in the Intel Pentium Processor Family Developer's Manual.

Bose answered 28/10, 2010 at 2:38 Comment(2)

internals.com/articles/protmode/protmode.htm This link helped me a little. – Bose 28/10, 2010 at 13:6

Yes, this "misprint" cost me 4 hrs of searching through all the nonsense questions that arise from it. – Tati 27/4, 2021 at 20:5

Edit: My answer assumes that by "4GB limit" you are referring to the maximum size of linear (virtual) address space, rather than of physical address space. As explained in the comments below, the latter is not actually limited to 4GB at all - even when using a flat memory model.

Repeating your quote, with emphasis:

the logical address space consists of as many as 16,383 segments of up to 4 gigabytes each

Now, quoting from "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture" (PDF available here):

Internally, all the segments that are defined for a system are mapped into the processor’s linear address space.

It is this linear address space which (on a 32-bit processor) is limited to 4GB. So, a segmented memory model would still be subject to the limit.
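To make that concrete, here is a small sketch of the translation step, assuming (as I read the manual) that the sum of segment base and offset is kept to 32 bits. Limit and privilege checks are left out, and the function name is purely illustrative.

```c
#include <stdint.h>

/* Linear address = segment base + offset, kept to 32 bits.
   Limit and access checks are omitted from this sketch. */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;  /* unsigned arithmetic wraps modulo 2^32 */
}
```

However many segments are defined, every base:offset pair lands somewhere in the same 4GB range. For example, base 0xC0000000 with offset 0x50000000 comes out as linear address 0x10000000, the same place as base 0 with offset 0x10000000.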

Transmission answered 28/10, 2010 at 3:3 Comment(5)

Not... quite. Close enough for jazz and not far enough to warrant a -1, but later processor extensions on the Pentium line (like PAE and PSE-36 and the like) actually give you a linear address space significantly larger than 4GB. 16 times larger, in fact. Were an operating system to use segments appropriately it would be possible for a single process to have access to all that space (minus whatever the kernel reserves for itself, naturally). – Schooling 28/10, 2010 at 3:30

PAE and PSE-36 increase the physical address space beyond 4GB - but linear (virtual) addresses are still limited to 32 bits. – Transmission 28/10, 2010 at 3:38

Not to disagree, just to clarify from a Windows perspective: "Microsoft Windows implements PAE if booted with the appropriate option, but current 32-bit desktop editions enforce the physical address space within 4GB even in PAE mode. According to Geoff Chappell, Microsoft limits 32-bit versions of Windows to 4GB due to a licensing restriction, and Microsoft Technical Fellow Mark Russinovich says that some drivers were found to be unstable when encountering physical addresses above 4GB." - Wiki on PAE – Bose 28/10, 2010 at 12:57

"Because multitasking computing systems commonly define a linear address space much larger than it is economically feasible to contain all at once in physical memory, some method of “virtualizing” the linear address space is needed. This virtualization of the linear address space is handled through the processor’s paging mechanism." - Intel 3-2 Vol3A Page 94 This quote seems to be the opposite of your statement that the linear address space on a 32-bit processor is limited to 4GB. It sounds like the linear address space is larger than the memory that is feasible. – Bose 28/10, 2010 at 17:31

Myth: PAE increases the virtual address space beyond 4GB: blogs.msdn.com/b/oldnewthing/archive/2004/08/18/216492.aspx I swear the Intel documentation can be misleading. – Bose 28/10, 2010 at 22:46

Do you remember the old days? DOS on x86 in real mode with 64KB segments? FAR pointers? HMA? XMS? As the amount of memory grew, they found ways to use more memory than the processor could normally address. But it was ugly.

Sure they could use segmentation for 32 bits, but why? There was no need. When 32-bit processors appeared, the 4GB limit was more than enough, so the decision to use a flat model was made.

Also, a 32-bit OS can use more than 4GB; it's the process that is limited to a 4GB address space (even 2 or 3 on Windows).

Calhoun answered 28/10, 2010 at 3:6 Comment(3)

Is the process only limited because it is a flat-model? What if it was a segmented-model? – Bose 28/10, 2010 at 12:26

@Shiftbit Don't confuse the directly addressable address range with segmented access. You can use WME to access more memory, but you still won't be able to do char* c = malloc(5*GB) and read any value from it directly without any wrapper. – Calhoun 28/10, 2010 at 12:50

@Shiftbit here is a better example. How would you like to program for a processor that has 1M of 64KB segments? It would have 64GB of address space. – Calhoun 28/10, 2010 at 12:55

The claim is 64TB of logical address space. Bringing up physical memory limitations is irrelevant because by enabling memory paging, one can bypass the physical limitations.

However, this is still a slightly misleading claim, because the Segment Selector is 16 bits wide: 1 bit is the Table Indicator and 2 bits are the Requested Privilege Level, leaving a 13-bit Index field, i.e. 8,192 segment selectors per table. With 8,192 4GB segments one could only have access to 32TB of logical memory in either the GDT (Global Descriptor Table) or the LDT (Local Descriptor Table). To be able to access 64TB of logical memory, one would have to fully utilize both the GDT and LDT with 16,384 unique segments.
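The arithmetic behind those figures can be written out as a short sketch. The constant and function names are illustrative, and the "minus one" for the unusable null entry GDT[0] is my reading of where the manual's 16,383 comes from.

```c
#include <stdint.h>

enum {
    INDEX_BITS  = 13, /* selector bits left after TI (1 bit) and RPL (2 bits) */
    TABLES      = 2,  /* GDT and LDT */
    OFFSET_BITS = 32  /* each segment spans at most 2^32 bytes */
};

/* Descriptors addressable in one table: 2^13 = 8,192. */
uint64_t selectors_per_table(void)
{
    return (uint64_t)1 << INDEX_BITS;
}

/* Both tables together, minus the null descriptor GDT[0]: 16,383. */
uint64_t usable_segments(void)
{
    return TABLES * selectors_per_table() - 1;
}

/* Upper bound on logical bytes: 16,384 * 2^32 = 2^46 (64TB). */
uint64_t logical_bytes_upper_bound(void)
{
    return (TABLES * selectors_per_table()) << OFFSET_BITS;
}
```

Note that this only counts distinct logical (selector:offset) addresses. It says nothing about how many of them can be mapped to different memory at the same time.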

Regardless, the first question was, "is there a 4GB limit", and the answer is, "no". On a 32-bit system with both Segmentation and Paging enabled, one could, for example, allocate 512MB to the Code Segment(CS), 1GB to the Stack Segment(SS), and 4GB to the Data Segment(DS).

The answer to the second question, whether the OS would be limited to 64TB if it used a segmented memory model, is less straightforward. It is the job of the OS to provide the memory manager. Obviously there would be a physical limitation on RAM (64GB with PAE). 32-bit Linux, because it uses paging, can provide each application with a 4GB flat address space (ignoring the kernel/user split details), and every process believes it has 4GB of address space to itself.

In short, I think you are confusing the limitations of segmentation with the limitations of paging. Paging enables a system or application to use more memory than is physically available as RAM. Segmentation enables a process to map in multiple 32-bit logically addressable segments. It's key to note that even flat mode uses segmentation, but all segment registers are mapped to the same base address.

Sim answered 15/8, 2014 at 6:37 Comment(2)

segmentation is "resolved" before paging. With paging enabled, segment base + offset produces a 32-bit "linear" virtual address. i.e. segment bases are virtual, and don't expand the amount of memory you can have mapped at any given time. To access more than 4GiB of physical RAM, you need multiple "threads" / processes with separate virtual address spaces, or you need to remap different parts of physical RAM into your 32-bit virtual address space. Or use x86-64 long mode, of course. – Northman 5/4, 2018 at 6:33

I'm not sure about with paging disabled. Without paging you can't have PAE, but IDK what happens if you set a segment limit=4G / base=3G (or just below 4G). Possibly you could then access some parts of physical memory outside the low 4G, with 32-bit offsets from that high base address. Or maybe not, if linear addresses are still 32-bit. – Northman 5/4, 2018 at 6:36

AFAIK, the answer is 'not necessarily', due to other limitations of the OS. They may want to keep the maximum size of memory down well below the theoretical limit, because this could make some of the internal memory structures smaller and more performant. But I really don't know... I'm no Mark Russinovich...

Take a look at PAE. I think this is what you're talking about, but since I've graduated to 64-bit pointers, I've decided to kill the brain cells which dealt with windowing memory models with Kentucky Straight Bourbon Whiskey.

Myrnamyrobalan answered 28/10, 2010 at 3:14 Comment(1)

I like this quote in particular from Wiki "x86 processor hardware-architecture is augmented with additional address lines used to select the additional memory, so physical address size increases from 32 bits to 36 bits. This, theoretically, increases maximum physical memory size from 4 GB to 64 GB. The 32-bit size of the virtual address is not changed, so regular application software continues to use instructions with 32-bit addresses and (in a flat memory model) is limited to 4 gigabytes of virtual address space." – Bose 28/10, 2010 at 12:28

-1

Intel's segmented model is limited to 16,384 segments. That is too small a number to really make things convenient. What would have been much nicer would have been if the system could quickly switch among two or four billion segments. That's what I would have liked to have seen, rather than a 64-bit linear space. A design that could efficiently put each allocated object into a different segment would allow for no-extra-overhead range checking on every individual allocated object, object relocation with minimal impact on running code (assuming the CPU could notice when a currently-selected segment was invalidated), etc. while only requiring object references to take half as much space in the cache as a 64-bit pointer.

Goforth answered 28/10, 2010 at 3:55 Comment(6)

It sounds like you want a 32-bit pointer to be a segment selector. (And addressing modes would have to define which component is the segment and which are offsets?) If the CPU hardware has to look up the segment base/limit from a segment descriptor table, then you've just re-invented virtual memory page tables, but maybe without a fixed page size. So the tables will be gigantic. You'll definitely need some kind of TLB-like structure to cache recently-used segment base/limits, because in actual x86 you can't use a segment selector without first mov fs, eax or whatever, which is slow. – Northman 5/4, 2018 at 4:39

Anyway, I don't see how this could be no-extra-overhead, even if you do redesign x86 segmentation even more than 386 did, so you aren't limited to a few segments active at once and instead can use a segment selector value in a general-purpose register as a pointer. That would cost HW complexity to implement with any kind of good performance, and presumably 386 had a hard enough time just spending transistors on a TLB for paging. (Or this could replace paging altogether, but that would have made x86 a weird quirky ISA way different from normal ones with paging.) – Northman 5/4, 2018 at 4:41

@PeterCordes: As a real simple approach, scale a 32-bit value 4-bits left like the 8086 did and your address space is 64GiB rather than 4. Note that some Java implementations use 32-bit references which are scaled up by a factor of 8 to yield a 35-bit address, so this would be pretty much the same thing. As a somewhat better approach, use the top 4 bits to select one of 16 configurable segment groups, and scale the remaining 28 by a value which is configurable on a per-group basis. – Goforth 5/4, 2018 at 14:45
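A sketch of the first, simple scheme from the comment above, where a 32-bit reference is widened by a fixed left shift. The function name is hypothetical, and this models only the address arithmetic, not any real CPU or JVM mechanism.

```c
#include <stdint.h>

/* Widen a 32-bit reference to a byte address by a fixed left shift.
   shift = 4 gives the 8086-style scaling (2^36 bytes, 64GiB reach);
   shift = 3 gives the 35-bit scheme mentioned for some Java implementations. */
uint64_t scaled_address(uint32_t ref, unsigned shift)
{
    return (uint64_t)ref << shift;
}
```

The cost is granularity: with a shift of 4, consecutive reference values are 16 bytes apart, so objects have to be aligned to 16 bytes and individual bytes inside an object need a separate offset.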

@PeterCordes: That approach wouldn't provide the bounds checking, but would allow the use of 32-bit references rather than 64-bit. Going with a segment-descriptor approach would require having hardware cache segment descriptors, but if one were to have the first few bytes of each segment share a cache line with the descriptor, then code which needs to access an object header before accessing an object that it hadn't accessed recently (common in VMs like .NET) would grab the information it needed at the same time as the cache descriptor, minimizing overhead. – Goforth 5/4, 2018 at 14:48

Ok, that could work, but definitely not zero-overhead. To get byte-addressability within a segment, you'd need to use 2-register addressing modes like [ebx + eax], where the base is treated as a segment and the index is treated as a byte offset. For static objects, [disp32 + idx] could be a byte offset within a static segment, but [disp32 + base] could treat the disp32 as a byte offset (outside disp8 range) with base as the segment. IDK how 8086's lea instruction would be extended. Seems like it would have made 386 a difficult compiler target, esp. for compilers at the time. – Northman 5/4, 2018 at 20:9

Or are you proposing that this new segmentation thing could be optional, with the current model as another option (using the traditional segment registers, and setting them all to base=0/limit=4G gives you a flat paged virtual address space)? If you want to put some of your comments into your answer, I'll remove my downvote. Especially if you remove the suggestion that it could be zero overhead. Your first reg<<4 idea appears to make pointer-increments still always consume two registers, unless the increment is by the segment granularity. (I guess you'd unroll to save registers...) – Northman 5/4, 2018 at 20:11
