How does an operating system draw windows on the screen?

I realized after many years of using and programming computers that the stack of software that actually draws on the screen is mostly a mystery to me.

I have worked on some embedded LCD GUI applications and I think that provides some clues as to a simplified stack but the whole picture for something like the Windows operating system is still murky.

From what I know:

Lowest level 0 is electronic hardware (integrated circuits) that provides a digital interface to set a pixel on the screen to a certain color or greyscale shade. The interface is documented in data sheets, so you know how to toggle the digital lines to set any pixel the way you want it.
Next level 1 is a hardware driver. This usually abstracts the hardware into a common interface. Something like SetPixel() etc.
Next level 2 is 2D/3D graphics library (of which I have limited widget/single screen experience). The lower levels seem to provide a buffer or range of memory that represents the pixels on the screen. The graphics library abstracts this so you can call functions like DrawText("text", 10, 10, "font") and it will set the pixels for you in the right way.
Next level would be the magic of the OS. The windows/buttons/forms/WPF/etc. are created in memory and then routed to the appropriate driver while also being directed to a certain part of the screen?

But how does something like Windows really work?

I would assume that the GPU fits between level 0 and level 1. The GPU drives the pixels on the display directly, and the level 1 driver is now a GPU driver. There are more functions available to expose the added functionality a GPU provides. (What would this be though? Does the OS pass on an array of triangles in 3D space, and the GPU processes this into a 3D perspective view and then chucks it on the screen?)

The biggest mystery to me though is when you get into the windows part of things. You can have SketchUp, Visual Studio and an FPS game all running at the same time and be able to switch between them, or in some cases tile them on the screen or have them spread across multiple screens. How is this tracked and rendered? Each of these would have to be running in the background and the OS would have to say which graphics pipe should be connected to which part of the screen. How would Windows say this part of the screen is a 3D game and this part is a 2D WPF app etc?

On top of that all you have DirectX used in one application and Qt in another. I remember having multiple games or apps running that use the same technology so how would that work? From what I can see you would have Application->Graphics library (DirectX, WPF etc)->Frame Buffer->Windows director (where and what part of the screen should this frame buffer be scaled to)->Driver?

In the end it is just bits toggling to indicate which pixel should be what color but it is one hell of a lot of toggling bits along the way to get there.

If I fire up Visual Studio and create a basic WPF app what is all going on in the background when I drop a button on the screen and hit start? I have seen the VS designer to drop it on, created it in XAML and I have even manually drawn things pixel by pixel in an embedded system but what happens in between, the so-called meat of this sandwich?

I have used Android, iOS, Windows and Linux and this seems to be common functionality, but I have never seen or heard an explanation of the how behind what I outline above. I only have a slightly educated guess.

Is anyone able to shed some light on how this works?

VGA

Assuming x86, VGA memory is mapped at a standard video buffer address in the lowest 1 MiB (0x000B8000 for text mode and 0x000A0000 for graphics mode). There are also many VGA registers that control the behaviour of the card. There were two widely used video modes, mode 0x12 (16-color 640x480) and mode 0x13 (256-color 320x200). Mode 0x12 involved switching between four bit planes (blue, green, red, intensity) with VGA registers, while mode 0x13 stored one byte per pixel as an index into a 256-color palette, which can be modified using VGA registers.

Normally, an OS relying on VGA would set the mode using BIOS while booting, or write to the appropriate VGA registers at runtime (if it knows what it is doing). To draw to the screen, the video driver would either simply write to the video memory (mode 0x13) or combine that with writing to VGA registers too (mode 0x12).
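To make the mode 0x13 case concrete, here is a minimal sketch in C. It assumes the layout described above (320x200, one palette-index byte per pixel, rows stored one after another). The function names are made up for illustration, and the buffer is passed in as a parameter so the code can be run against ordinary memory; a real driver would point it at the window mapped at 0x000A0000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODE13_WIDTH = 320, MODE13_HEIGHT = 200 };

/* Write one palette index into a mode 0x13 style buffer.
 * `vram` is a parameter so the sketch can run against ordinary memory;
 * a real driver would pass the mapped window at 0x000A0000. */
void mode13_put_pixel(volatile uint8_t *vram, int x, int y, uint8_t color)
{
    assert(vram != NULL);
    if (x < 0 || x >= MODE13_WIDTH || y < 0 || y >= MODE13_HEIGHT)
        return; /* silently clip anything off-screen */
    vram[(size_t)y * MODE13_WIDTH + (size_t)x] = color;
}

/* Fill an axis-aligned rectangle by repeated pixel writes. */
void mode13_fill_rect(volatile uint8_t *vram, int x0, int y0,
                      int w, int h, uint8_t color)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            mode13_put_pixel(vram, x, y, color);
}
```

Everything a graphics library draws in this mode (lines, text, widgets) eventually reduces to writes like these. Mode 0x12 is not covered by this sketch, since it also needs register writes to select planes.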

Most cards in use today are still (partly) VGA compatible.

VBE

Some years later, VESA introduced "VESA BIOS Extensions" (VBE), a standard interface for video cards that allowed higher resolutions and greater color depths. The video memory was exposed in two different ways: banked mode and linear framebuffer. Banked mode exposes a small portion of the video memory at a low address (0x000A0000), and the video driver needs to switch banks almost every time the screen is updated. The linear framebuffer is a much more convenient solution, which maps the entire video memory at a non-standard high address.
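The bank arithmetic can be sketched as follows. This is a simulation, not driver code: `struct banked_card` and both functions are hypothetical names, the window is assumed to be 64 KiB with a granularity equal to its size (real cards report both values in the VBE mode information), and the bank switch just updates a field where a real driver would go through the VBE window control call (function 0x4F05) or the card's registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE 0x10000u /* 64 KiB window, assumed equal to the bank granularity */

/* Hypothetical stand-in for a banked card: `memory` plays the role of the
 * full video memory, of which only one WINDOW_SIZE slice is visible at a time. */
struct banked_card {
    uint8_t *memory;
    size_t   memory_size;
    uint32_t current_bank;
    unsigned bank_switches; /* counts how often the window had to move */
};

/* Select which slice of video memory the window shows (simulated). */
void banked_set_bank(struct banked_card *card, uint32_t bank)
{
    card->current_bank = bank;
    card->bank_switches++;
}

/* Write one byte at a linear offset into video memory, switching banks
 * only when the offset falls outside the currently selected one.
 * Returns 0 on success, -1 if the offset is out of range. */
int banked_write_byte(struct banked_card *card, size_t offset, uint8_t value)
{
    assert(card != NULL && card->memory != NULL);
    if (offset >= card->memory_size)
        return -1;

    uint32_t bank = (uint32_t)(offset / WINDOW_SIZE);
    if (bank != card->current_bank)
        banked_set_bank(card, bank);

    uint8_t *window = card->memory + (size_t)card->current_bank * WINDOW_SIZE;
    window[offset % WINDOW_SIZE] = value;
    return 0;
}
```

A 640x480 image at one byte per pixel is 300 KiB, so under these assumptions a full redraw crosses several bank boundaries, which is why the linear framebuffer is so much more convenient.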

During boot, an OS would call the VBE interface to query for supported modes and to set the most convenient one, or it would bypass the VBE interface and write directly to the needed video hardware registers (if it knows what it is doing). In either case, banked mode or linear framebuffer, the video driver would write to the memory address at which the video memory is mapped.
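For the linear framebuffer case, plotting a pixel is plain address arithmetic. The sketch below is illustrative only: `struct lfb_info` and its field names are made up, standing in for the values a driver would read from the VBE mode information (resolution, bytes per scan line, bits per pixel, framebuffer address). It assumes a 32 bits-per-pixel mode and does not deal with the channel layout, which the mode information also describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a linear framebuffer. */
struct lfb_info {
    uint8_t *base;           /* where the framebuffer is mapped */
    uint32_t width;          /* visible pixels per row */
    uint32_t height;         /* number of rows */
    uint32_t pitch;          /* bytes per row, may exceed width * 4 */
    uint8_t  bits_per_pixel; /* this sketch only handles 32 */
};

/* Store one 32-bit pixel value at (x, y).
 * Returns 0 on success, -1 if the mode is unsupported or the
 * coordinates are off-screen. */
int lfb_put_pixel(const struct lfb_info *fb, uint32_t x, uint32_t y,
                  uint32_t color)
{
    assert(fb != NULL && fb->base != NULL);
    if (fb->bits_per_pixel != 32 || x >= fb->width || y >= fb->height)
        return -1;

    /* Rows are `pitch` bytes apart, not necessarily width * 4. */
    size_t offset = (size_t)y * fb->pitch + (size_t)x * 4;
    memcpy(fb->base + offset, &color, sizeof color);
    return 0;
}
```

Using the pitch rather than `width * 4` matters because rows may be padded, so the two values cannot be assumed equal.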

Most cards in use today are still (partly) VBE compatible.

Modern video interfaces

The most modern video interfaces usually aren't documented as widely as VGA and/or VBE. However, the video memory is still mapped at an address, while hardware registers and/or a buffer contain modifiable information about the behaviour of the graphics card. The difference is that the interfaces aren't standardised anymore and nowadays an advanced OS requires different drivers for each graphics card.
