First Question: How do they achieve cycle accuracy if there is neither information how many cycles are used per instruction nor CPU branch prediction logic is known?
The simulator does provide a cycle accurate simulation for a sufficiently accurate CPU model but does not come with out-of-the box models for Intel's or AMD's current offerings. Someone at Intel or AMD with access to the required information could create a RTL level model and get cycle accurate simulations for current processors. People outside Intel and AMD cannot. You can still feed publically known information to the simulator and get reasonable results. These results will not be identical to the real hardware.
If you are a software developer and want to benchmark real hardware, use real hardware! Simulators like PLTsim are designed for (academic) hardware developers who want to test new hardware features without spending hundreds of thousands of dollars on a new chip.
Second Question: Is it theoretically possible to implement hard rtos on x86 based hardware?
Of course it is theoretically possible. You would need to consider the absolute worst case for each code segment for all inputs under all circumstances. The practical problem is that processors like Core 2 are very complex and the state of the processor is enormous. Additionally these processors are not designed to behave deterministically with respect to timing. A really hard RTOS would have to be extremely conservative. Finally, as you correctly observe, people outside Intel and AMD don't have access to all the information required to make those conservative assumptions. In practice it is resonable to pass on the latest and greatest cpus and instead use older, simpler cpus that have a deterministic timing.
On the other hand, if the RTOS does not have to be really hard real time, you can always just include some safety margin and hope for the best. ;-)