What are Segments and how can they be addressed in 8086 mode?

Asked 17/3, 2017 at 15:31 Answered 17/3, 2017 at 20:20

Solved assembly x86 operating-system x86-16 memory-segmentation

Ever since I started with 8086 Assembly Language programming, I have been hammering my mind about these Segments and Segment registers. The problem I am facing is that I can't have a visual image of what segments are in my mind and therefore the concepts are not clear to me.

Can anyone help me understand the concept relating it to a real world scenario? Also I have the following questions:

Question 1:

As far as I have understood, in 16-bit real mode with 20 address lines enabled, we could divide the physical memory into 16 segments with 64KiB each. The first segment starts at 0x00000. What will be the starting address of the next segment? Will it be found by adding 0x10000 (65536 = 64KiB)?

Question 2:

This question is a bit odd to ask here but still SO is my only option. Suppose if I am given with an Offset address of 0x6000, How can I find the segment to which it belongs in order to address it.

Thanks

Fume answered 17/3, 2017 at 15:31 Comment(6)

Here's a Youtube tutorial on the 8086 Memory Segmentation – Franklin 17/3, 2017 at 15:41

The 8086 can address up to 1MB directly (20-bit address). The segments, as far as the instruction set is concerned, are logical. The address is determined by (segment_reg << 4) + offset_reg. As long as that gets you the address you want, it works. If you need to address location 0x6000, then the segment register you use could be 0 and the offset register could be 0x6000, since that's a 16-bit value. Or the segment register could be 0x600 and your offset could be 0. That works, too. – Franklin 17/3, 2017 at 15:45

A real 8086 doesn't have an A20 address line; it only has 20 address lines: A0 through A19. This means that while the 8086 has 65536 different overlapping segments, the last 4095 segments (0xF001 - 0xFFFF) wrap around the 20-bit address space. You can't determine the segment from just an offset, because offsets are added to segment values to determine the physical address used. If you have a given physical address, then there are 4096 possible different segment values that can be combined with an appropriate offset to access that physical address. – Flex 17/3, 2017 at 17:41

you looked at the intel documentation yes? pretty sure there are diagrams...The next "segment" would be when you add one to the segment register, wouldnt it? Why would it be anything else? the point is to create the final desired address by combining the two in a way that makes sense to your program. no reason to try to make it more complicated than that. – Collodion 17/3, 2017 at 19:0

amazon.com/Manual-Programmers-Hardware-Reference-240487-001/dp/… best place to start. can get one for a few bucks – Collodion 17/3, 2017 at 19:3

related: What are the segment and offset in real mode memory addressing? – Khelat 23/3, 2018 at 8:34

...we could divide the physical memory into 16 segments with 64KiB each.

True, but more exact would be to phrase this as "16 non-overlapping segments" since there's also the possibility to divide the memory into 65536 overlapping segments.

When the A20 line is enabled, we have more than 1MB to play with (1048576+65536-16 bytes). By setting the relevant segment register to 0xFFFF, we gain access to the memory between 0x0FFFF0 and 0x10FFEF.

The main features of both kinds of segments are:

Non-overlapping segments
- Contain 65536 bytes.
- Are 65536 bytes apart in memory.
- This is the way we humans often conveniently view memory. It enables us to say that we've put
  - the graphics window in the A-segment (0xA0000-0xAFFFF)
  - the text video window in the B-segment (0xB0000-0xBFFFF)
  - the BIOS in the F-segment (0xF0000-0xFFFFF)
Overlapping segments
- Contain 65536 bytes.
- Are 16 bytes apart in memory.
  
  Sometimes you'll see people refer to a 16-byte chunk of memory as a segment, but this is wrong. There is, however, a widely used name for such an amount of memory: "paragraph".
- This is the way the CPU (in the real address mode) sees memory.
  The processor calculates the linear address using the following steps:
  - First, the offset address is calculated from the operands of the instruction. The result is truncated to fit in 16 bits (64KB wraparound).
  - Next, the product SegmentRegister * 16 is added.
    If the A20 line is inactive, the result is truncated to fit in 20 bits (1MB wraparound).
    If the A20 line is active, the result is used as is and thus no 1MB wraparound occurs.
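
The two steps above can be sketched in a few lines of Python. This is only an illustrative model, not how any real hardware or emulator is implemented; the function name `linear_address` and the `a20_enabled` flag are made up for this example. A real 8086, which has no A20 line at all, always behaves like the `a20_enabled=False` case.

```python
def linear_address(segment, offset, a20_enabled):
    """Model of the real-mode calculation: segment * 16 + offset."""
    offset &= 0xFFFF                  # step 1: offset truncated to 16 bits (64KB wraparound)
    segment &= 0xFFFF                 # segment registers are 16 bits wide
    address = segment * 16 + offset   # step 2: add the product SegmentRegister * 16
    if not a20_enabled:
        address &= 0xFFFFF            # A20 inactive: truncate to 20 bits (1MB wraparound)
    return address

# FFFF:0010 is the first byte above 1MB when A20 is active,
# and wraps around to address 0 when it is not.
print(hex(linear_address(0xFFFF, 0x0010, a20_enabled=True)))   # 0x100000
print(hex(linear_address(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
```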

Suppose if I am given with an Offset address of 0x6000, How can I find the segment to which it belongs in order to address it.

Here again the problem lies in the phrasing!

If by "an Offset address of 0x6000" you mean an offset like the one we normally use in the real address mode programming then the question cannot be answered since there is such an offset 0x6000 in every segment that exists!

If on the other hand the wording "an Offset address of 0x6000" actually refers to the linear address 0x6000 then there are a lot of solutions for the segment register:

segment:offset
--------------
   0000:6000
   0001:5FF0
   0002:5FE0
   0003:5FD0
   ...
   05FD:0030
   05FE:0020
   05FF:0010
   0600:0000

As you can see, there are 0x0601 (1537) possible segment register settings to get to linear address 0x6000.
The above applies when the A20 line is enabled. If A20 is inactive, then the linear address 0x6000 (just like any other linear address from 0 to 1MB-1) can be reached in precisely 0x1000 (4096) ways:

segment:offset
--------------
   F601:FFF0
   F602:FFE0
   F603:FFD0
   ...
   FFFD:6030
   FFFE:6020
   FFFF:6010
   0000:6000
   0001:5FF0
   0002:5FE0
   0003:5FD0
   ...
   05FD:0030
   05FE:0020
   05FF:0010
   0600:0000
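
The counts in both tables can be checked by brute force. The following Python sketch is only for illustration (the helper name `segment_offset_pairs` is made up here); it tries every possible segment register value and keeps those for which a valid 16-bit offset exists.

```python
def segment_offset_pairs(linear, a20_enabled):
    """Return every (segment, offset) pair that reaches the given linear address."""
    pairs = []
    for segment in range(0x10000):        # all 65536 segment register values
        offset = linear - segment * 16    # offset needed to hit the target
        if not a20_enabled:
            offset &= 0xFFFFF             # account for the 1MB wraparound
        if 0 <= offset <= 0xFFFF:         # offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs

with_a20 = segment_offset_pairs(0x6000, a20_enabled=True)
without_a20 = segment_offset_pairs(0x6000, a20_enabled=False)
print(hex(len(with_a20)))      # 0x601
print(hex(len(without_a20)))   # 0x1000
```

The last pair found without A20 is FFFF:6010, matching the table above.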

Rubdown answered 17/3, 2017 at 17:47 Comment(0)

In this answer, I am only giving an explanation for real mode. In protected mode, segmentation is a bit more complicated, and as you're probably never going to write a segmented protected mode program, I'm not going to explain it.

Segments are very simple actually. The 8086 CPU has four segment registers named cs, ds, es, and ss. When you access memory, the CPU computes the physical address like this:

physical_address = segment * 16 + effective_address

where effective_address is the address indicated by the memory operand and segment is the content of the segment register for this memory access. By default, cs is used when the CPU fetches code, ss is used for stack pushes and pops as well as memory operands with bp as the base register, es is used for certain special instructions and ds is used everywhere else. The segment register can be overridden using a segment prefix.

What does that mean in practice? The 8086 has 16 bit registers, so using a register to store an address allows us to address up to 65536 bytes of RAM. The idea behind using segment registers is that we can store additional bits of the address in a segment, allowing the programmer to address a bit more than 2²⁰ = 1048576 bytes = 1 MiB of RAM. This RAM is sliced into 65536 overlapping segments of 65536 bytes each, where each segment is one value you can load into a segment register.

Each of these segments starts at an address that is a multiple of 16, as you can see in the address computation logic above. You can tile the entire 1 MiB physical address space with 16 non-overlapping segments (as you explained in your question), using the segment values 0x0000, 0x1000, ..., 0xf000, but you can use any other segment value you like as well.
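
As a small illustration of that tiling (a Python sketch with made-up names, not anything the CPU itself does), here are the 16 non-overlapping segments together with the physical range each one covers. It also answers Question 1: the next segment after 0x0000 is 0x1000, which starts at physical address 0x10000.

```python
SEGMENT_SIZE = 0x10000  # every real-mode segment spans 65536 bytes

# (segment value, first physical address, last physical address)
tiling = [
    (segment, segment * 16, segment * 16 + SEGMENT_SIZE - 1)
    for segment in range(0x0000, 0x10000, 0x1000)
]

for segment, start, end in tiling:
    print(f"{segment:04X}: {start:05X}-{end:05X}")
```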

Aloise answered 17/3, 2017 at 15:48 Comment(0)

In general, segments are intervals of memory that use their own internal indexing system.

If you think of the memory as a long array of bytes mem[0x100000], you could specify a contiguous slice seg=mem[a:a+b], with len(seg)=b where

seg[0] is stored in mem[a]
seg[1] is stored in mem[a+1]
...
seg[b-1] is stored in mem[a+b-1].

The advantage of using segments is that the addresses (indices into seg) within a segment can be shorter than full physical addresses. In the case of the 8086, the addressable memory goes up to physical address (index into mem) 2²⁰-1 (on its successors you can even go a little further), while addresses within a segment need only 16 bits. It's also rather simple to put a program anywhere in memory, because you only need to allocate one or a few free segments, and most addresses operate within the dedicated segment without needing adjustment.

On the 8086 all segments are 2¹⁶ bytes long, so the intra-segment addresses fit within 16 bits, which makes them easy to handle. For the start address you can select any address in physical memory that is a multiple of 16 and is at or below 0xFFFF0. This means that any physical address lies in multiple segments. A segment is identified by a 16-bit number, which is its start address divided by 16.

So Segment 0xBADA corresponds to the segment starting at 0xBADA0.
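
The slice picture above can be tried out directly in Python. This is only a toy model under simplifying assumptions: `segment_view` is a made-up helper, and it does not model the wraparound that happens for segment values above 0xF000.

```python
mem = bytearray(0x100000)   # 1 MiB of "physical memory", i.e. mem[0x100000]

def segment_view(mem, segment):
    """Return a writable 64 KiB window seg = mem[a:a+b] with a = segment * 16."""
    start = segment * 16
    return memoryview(mem)[start:start + 0x10000]

seg = segment_view(mem, 0xBADA)
seg[0] = 0x42               # offset 0 within segment 0xBADA ...
print(hex(mem[0xBADA0]))    # ... is physical address 0xBADA0: prints 0x42

# Neighbouring segments overlap: they start only 16 bytes apart.
next_seg = segment_view(mem, 0xBADB)
seg[0x10] = 0x07
print(next_seg[0])          # same byte seen through segment 0xBADB: prints 7
```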

Fatso answered 17/3, 2017 at 20:20 Comment(5)

"On the 8086 all segments are 2^16-1 byte long" This is clearly wrong! You're off by 1. – Rubdown 18/3, 2017 at 12:56

"So Segment 0xbada corresponds to the segment starting at 0xbaba." Care to explain a bit further? – Rubdown 18/3, 2017 at 12:57

Sorry, you found two errors in my post; I've fixed them. I should be more careful in the future. What I tried to do is explain what segments are about in general. How they work in the 8086 is probably explained better in the other answers already. – Fatso 18/3, 2017 at 17:25

"So Segment 0xbada corresponds to the segment starting at 0xbaba0." Your 'fix' still has this unexplained difference (d vs b). Let's call it a typo, but make sure to correct it. – Ric 19/3, 2017 at 16:40

Your answer is not without merits. It mentions some good points about segments. You could improve it a lot by using bullet points, well chosen whitespace, bold/italic words,... (Please change "witch" into "which") – Ric 19/3, 2017 at 16:58
