How can CAS simulators like PTLsim achieve cycle accurate simulation of x86 hardware?

Asked 10/1, 2013 at 13:54 Answered 13/1, 2013 at 11:2

Can somebody explain how CAS software like http://www.ptlsim.org/ works? How do they achieve cycle accuracy when neither the number of cycles per instruction nor the CPU's branch prediction logic is publicly known? Or is everything available under NDA?

I suppose they can be fairly accurate about cache hits and misses (maybe I'm wrong, but I think the internal algorithm is easier to predict), but without knowing how pipelining and superscalar execution are implemented, I don't see how the simulation can be cycle accurate.

Consequently, is it theoretically possible to implement a hard RTOS on x86-based hardware? Or does one have to sign some sort of NDA to be able to estimate the time required per instruction?

Commit answered 10/1, 2013 at 13:54 Comment(5)

Check out the FAQ, focus on their "co-simulation" technique. This works by actually executing the simulated code on the host processor. – Gneiss 10/1, 2013 at 16:58

@HansPassant I think you misunderstand the concept of "co-simulation". Cycle accurate simulation is exclusively based on the user provided model of the processor. The host processor is only used to speed up and/or verify the simulation. – Isthmian 10/1, 2013 at 23:20

That would be just plain simulation, without the "co". – Gneiss 10/1, 2013 at 23:30

@HansPassant I agree: the term is misleading. Maybe they thought of co-routines? Who knows... – Isthmian 10/1, 2013 at 23:40

Intel employee WONTfixing this in 2010: software.intel.com/en-us/forums/… Related for ARM: #17455455 – Leckie 7/11, 2015 at 8:47

First Question: How do they achieve cycle accuracy when neither the number of cycles per instruction nor the CPU's branch prediction logic is known?

The simulator provides cycle-accurate simulation for a sufficiently accurate CPU model, but it does not come with out-of-the-box models for Intel's or AMD's current offerings. Someone at Intel or AMD with access to the required information could create an RTL-level model and get cycle-accurate simulations of current processors. People outside Intel and AMD cannot. You can still feed publicly known information to the simulator and get reasonable results, but these results will not be identical to the real hardware.
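To make the dependence on the model concrete, here is a minimal sketch of a toy in-order, single-issue timing model. The latency table and the mispredict penalties are invented for illustration; they are not data for any real Intel or AMD part, and none of this is PTLsim's actual interface.

```python
# Toy timing model: in-order, single-issue, no overlap between instructions.
# All numbers below are assumptions made up for this example, not real CPU data.
ASSUMED_LATENCY = {"add": 1, "mul": 3, "load": 4, "branch": 1}
ASSUMED_MISPREDICT_PENALTY = 14


def simulate(trace, latency=ASSUMED_LATENCY, penalty=ASSUMED_MISPREDICT_PENALTY):
    """Return the cycle count for a trace of (opcode, mispredicted) tuples."""
    cycles = 0
    for opcode, mispredicted in trace:
        cycles += latency[opcode]
        # A mispredicted branch costs an extra pipeline-refill penalty.
        if opcode == "branch" and mispredicted:
            cycles += penalty
    return cycles


trace = [("load", False), ("mul", False), ("add", False), ("branch", True)]
model_a = simulate(trace)              # uses the assumed 14-cycle penalty
model_b = simulate(trace, penalty=20)  # same program, different guess
print(model_a, model_b)
```

Both runs are exact with respect to their own model, yet they disagree with each other. Without the vendor's numbers you cannot tell which of them, if either, matches a given chip.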

If you are a software developer and want to benchmark real hardware, use real hardware! Simulators like PTLsim are designed for (academic) hardware developers who want to test new hardware features without spending hundreds of thousands of dollars on a new chip.

Second Question: Is it theoretically possible to implement a hard RTOS on x86-based hardware?

Of course it is theoretically possible. You would need to consider the absolute worst case for each code segment, for all inputs and under all circumstances. The practical problem is that processors like Core 2 are very complex and the state of the processor is enormous. Additionally, these processors are not designed to behave deterministically with respect to timing. A really hard RTOS would have to be extremely conservative. Finally, as you correctly observe, people outside Intel and AMD don't have access to all the information required to make those conservative assumptions. In practice it is reasonable to pass on the latest and greatest CPUs and instead use older, simpler CPUs that have deterministic timing.
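A rough sketch of what "extremely conservative" means in numbers: bound each code segment by assuming that every memory access misses the cache. The segment sizes and the hit and miss costs are hypothetical values chosen for the example, not measurements.

```python
# Hypothetical costs in cycles; real values depend on the CPU and memory system.
ASSUMED_HIT_CYCLES = 4
ASSUMED_MISS_CYCLES = 200


def cycles(segments, access_cost):
    """Sum the cost of (base_cycles, memory_accesses) segments at a given access cost."""
    return sum(base + accesses * access_cost for base, accesses in segments)


# Two made-up code segments: (cycles of pure computation, number of memory accesses).
segments = [(100, 10), (50, 5)]

best_case = cycles(segments, ASSUMED_HIT_CYCLES)    # every access hits
worst_case = cycles(segments, ASSUMED_MISS_CYCLES)  # every access misses
print(best_case, worst_case)
```

Here the worst-case bound is fifteen times the all-hits figure, and this toy model only considers the cache. A hard real-time schedule has to be planned around the larger number.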

On the other hand, if the RTOS does not have to be really hard real time, you can always just include some safety margin and hope for the best. ;-)

Isthmian answered 10/1, 2013 at 23:12 Comment(0)

This is not an answer to both questions; I'm only going to answer the second one. Feel free to upvote, although Mackie's answer seems better in general.

A hard RTOS is hard to implement on x86. One special thing that can break every promise an RTOS makes is SMM, or System Management Mode. The CPU enters it after a System Management Interrupt (SMI), which can fire for different reasons: a hardware failure, a write to some special MMIO location, or an out instruction to some special port. You cannot disable it, you cannot really predict when an SMI will happen, and SMI handlers can take a very long time to finish.

Essentially, you know nothing about when the CPU is in SMM until something fails in your OS because of the long time the CPU has spent handling an SMI. In some special cases this can become a problem even for non-real-time OSes, not to mention hard RTOSes.
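One way to at least notice such stalls from software is to spin reading a clock and look for gaps that the loop itself cannot explain. The sketch below shows only the idea. The 50 us threshold is an arbitrary assumption, and in Python on a general-purpose OS a gap can just as well come from preemption or an ordinary interrupt, so a hit is a hint rather than proof of an SMI.

```python
import time


def find_gaps(timestamps_ns, threshold_ns):
    """Return (index, gap) for each consecutive pair further apart than threshold_ns."""
    gaps = []
    for i in range(1, len(timestamps_ns)):
        delta = timestamps_ns[i] - timestamps_ns[i - 1]
        if delta > threshold_ns:
            gaps.append((i, delta))
    return gaps


def sample_clock(count):
    """Busy-loop reading a monotonic clock `count` times."""
    return [time.perf_counter_ns() for _ in range(count)]


# Deterministic example with a synthetic 150 us hole between samples 2 and 3.
recorded = [0, 1_000, 2_000, 152_000, 153_000]
print(find_gaps(recorded, threshold_ns=50_000))  # [(3, 150000)]

# Live run: any reported gap may be an SMI, but also preemption or an interrupt.
live = find_gaps(sample_clock(100_000), threshold_ns=50_000)
print(len(live), "gaps over 50 us")
```

A serious measurement would run in kernel context with interrupts disabled and read a hardware counter directly, so that SMM is the main remaining explanation for a gap.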

There is also this thread that can provide you with some more points about RTOSing on x86.

Priester answered 10/1, 2013 at 23:25 Comment(0)

On the site you linked there are statements such as:

PTLsim is a state of the art cycle accurate microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets.

and

It runs directly on the same platform it is simulating (an x86-64 or x86 machine running Linux)

It is not clear to me, then, how this differs from any other x86 virtual machine technology such as QEMU, VirtualBox, VMware or Virtual PC, which would be cycle accurate by virtue of directly running instructions on the hardware (as well as running at core speeds). Is it a simulator or a VM? In my mind they are not the same thing; Bochs, for example, is a simulator rather than a VM. Perhaps PTLsim is somewhere in between?

Backspin answered 13/1, 2013 at 11:2 Comment(0)
