Why did Meltdown and Spectre bugs go undiscovered for so long?

Asked 31/1, 2018 at 11:19 Answered 31/1, 2018 at 14:15

These bugs have been present in CPUs for nearly 20 years. Why weren't they discovered sooner, given the serious implications for all computers using these processors?

Moria answered 31/1, 2018 at 11:19 Comment(11)

Because the people who put them there didn't want them gone and very few people know enough to find the issue to begin with. This is more a philosophical question though, not a programming one. For that you want Stefan Molyneux. Voting to close. – Cumshaw 31/1, 2018 at 11:27

OK, thank you, but I still think it is relevant to programming, as these bugs can be exploited with programmers' code! – Moria 31/1, 2018 at 11:35

Yeah, but you understand that this question is waaay too broad and not programming specific. I mean, almost everything these days ties back to code at some point. Governments run on software, so do cars. But this stackexchange is specifically for solving extremely specific programming questions. Why would major companies receive CIA money to create backdoors into pretty much every system on the planet? Can you think of a code-related motivation for that? I don't think you can solve the question with code. – Cumshaw 31/1, 2018 at 11:42

@G_V: you're suggesting that CPU architects knew there was a security issue, but intentionally left it unfixed? Or that a specific CPU architect introduced them without anyone else realizing? That's quite a conspiracy theory, and not very plausible. Spectre is a fundamental consequence of out-of-order speculative execution + branch prediction. Meltdown has similar obvious (to a CPU architect) motivations for performance. See this 2012 answer from Intel P6 architect Andy Glew about why delayed permission checks on memory access make sense. – Ensile 31/1, 2018 at 19:30

IMO, the discovery of Meltdown is kind of like discovering that CFCs deplete the ozone layer so we have to redesign / replace lots of existing hardware, or work around it. With new CPU hardware, Meltdown can be fixed for near-zero perf cost. Spectre is even worse: there's no clear path to a low-overhead way to mitigate it in general, in hardware or software. Even protecting the kernel itself from user-space is hard, let alone all of user-space. – Ensile 31/1, 2018 at 19:36

I'm voting to close this question as off-topic because this would be better asked on Information Security, for instance. – Ful 31/1, 2018 at 23:36

Let's just say wikileaks has some interesting documents on this and close the question. – Cumshaw 1/2, 2018 at 8:43

@PeterCordes: umm, Peter, the answer you refer to isn't talking about delaying the permissions check on an access, but doing it anyway. The CPUs I worked on report the access later, but did the check immediately, and did not perform the access - i.e. they did not have Meltdown. In a later comment I mention bluesky speculating past page faults - but, again, that was doing speculative work from later instructions independent of the page fault, not completing the not allowed page fault. – Hanky 25/4, 2018 at 22:4

@KrazyGlew: oh! It's been widely reported that all Intel out-of-order CPUs are vulnerable to Meltdown. At least I thought it had been, or maybe that was Spectre. P6-family apparently became vulnerable with Conroe/Merom if that list is complete. I re-read your answer, it says less than I remembered about synchronous exceptions. Apparently some of what I thought you said is what I made up for answers like Out-of-order execution vs. speculative execution :P – Ensile 25/4, 2018 at 22:38

@KrazyGlew: Any idea what kind of microarchitectural benefits you'd get from doing a load if there's any kind of TLB hit, even one with insufficient permissions? (please reply over here on my answer about it.) – Ensile 25/4, 2018 at 22:47

I'm voting to close this question as off-topic because it is about hardware design and QA. – Tincal 6/10, 2018 at 8:40

-3

The answer is quite simple: modern CPUs have a few billion transistors. For example, the latest Intel Skylake architecture has ~2 billion. Each transistor has a state which might influence the state of other transistors (i.e. those transistors are connected somehow).

Basically, this means there are so many possible permutations or states of a modern CPU that we simply are not able to test them all in a lifetime. So we (or rather the manufacturer) test just some of the states in some scenarios, leaving room for potentially dangerous corner cases.
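The scale of that claim can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that every transistor contributes one independent bit of state, which is a deliberate oversimplification of real hardware. The function names `decimal_digits_in_state_count` and `years_to_enumerate` are hypothetical helpers, not part of any library.

```python
import math


def decimal_digits_in_state_count(num_bits: int) -> int:
    """Decimal digits in 2**num_bits, without building the huge integer."""
    return math.floor(num_bits * math.log10(2)) + 1


def years_to_enumerate(num_bits: int, states_per_second: float) -> float:
    """Years needed to visit all 2**num_bits states at a fixed rate.

    Only usable for small num_bits: 2.0 ** num_bits overflows a float
    once num_bits reaches 1024.
    """
    seconds = 2.0 ** num_bits / states_per_second
    return seconds / (365.25 * 24 * 3600)


# Assumption: ~2 billion independent one-bit state elements, as in the answer.
print(decimal_digits_in_state_count(2_000_000_000))

# Even a 128-bit state space is out of reach at 10**12 states per second.
print(years_to_enumerate(128, 1e12))
```

Under that assumption the number of states has roughly 600 million decimal digits, so exhaustive testing is not an option. Even a state space of only 128 bits could not be enumerated in any practical amount of time.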

Spectre and Meltdown are such untested corner cases, and there might be many more, given the complexity of modern CPUs.

Latinalatinate answered 31/1, 2018 at 14:15 Comment(8)

Please comment: why the downvote? There might be some people on StackOverflow who believe that Intel, ARM and IBM did that on purpose. But this is StackOverflow, not RumorsOverflow. So until proven otherwise in court, we should stick with technical reasoning, like the one I gave in the answer. Maybe it is oversimplified, but most people do not quite understand how complex CPUs are nowadays... – Latinalatinate 1/2, 2018 at 7:54

Spectre and Meltdown are not corner cases, they are side-channel attacks: the system behaves as it is supposed to, and those exploits take advantage of the “speculation”. – Swarey 8/2, 2018 at 5:58

@Matt.St so you suggest Intel, ARM, IBM, AMD all knew there is a side channel, yet they decided to hide it from us? I used the term "corner case" to show that CPU producers were not aware of the situation, because modern CPUs are extremely complex pieces of hardware with up to 8 billion transistors in some Xeons, for instance. It is simply impossible to test all the states of the CPU and discover all possible side effects it produces... I am perfectly aware of what Spectre and Meltdown are; here is my 99-line PoC: github.com/berestovskyy/spectre-meltdown But this is a simple answer – Latinalatinate 8/2, 2018 at 9:44

Pretty sure Intel / AMD knew how their CPUs behaved, especially Meltdown because it's simple. Just none of those people who knew the details had thought of the security implications. – Ensile 25/4, 2018 at 22:47

@PeterCordes it is simple, but it works only in conjunction with Spectre (i.e. a side channel). So IMO it is a corner case, since for a successful attack a few things need to be there and carefully prepared... – Latinalatinate 27/4, 2018 at 10:4

Terminology: Spectre is a separate exploit, not part of Meltdown. Spectre is the exploit where you prime the branch predictors to mispredict an indirect branch, or a conditional branch in front of an array access. Both Spectre and Meltdown put secret architectural state into the microarchitectural state, which Intel engineers thought was ok because they didn't know about or realize the power of cache-read timing side-channels to read that microarchitectural state back into architectural state. – Ensile 27/4, 2018 at 11:20

@PeterCordes sorry, I meant "... in conjunction with cache timing (or another side channel)". Peter, what exactly do you disagree with, the terminology? OK, let's not call it a "corner case", but whatever... – Latinalatinate 27/4, 2018 at 11:56

I was disagreeing with "but it works only in conjunction with Spectre (i.e. a side channel)." You can do a Meltdown attack without Spectre, or vice versa. Spectre is not the name of the side-channel. I wrote some about Spectre in SO answers and on Security.SE, including Why are AMD processors not/less vulnerable to Meltdown and Spectre? with some microarchitectural details / explanations. – Ensile 27/4, 2018 at 17:12
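The distinction drawn in the comments above, between speculation placing secret-dependent data into microarchitectural state and a cache-timing side channel reading it back, can be sketched with a toy simulation. This is not an exploit and measures no real timing. `ToyCache`, `ToyCPU` and `leak_byte` are hypothetical names invented for illustration, the predictor is reduced to a single bit, and a "fast" access simply means the line is present in a Python set.

```python
class ToyCache:
    """Set of cached line numbers; stands in for hit/miss timing."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def touch(self, line: int):
        self.lines.add(line)

    def is_fast(self, line: int) -> bool:
        return line in self.lines


class ToyCPU:
    """Toy model: a 1-bit branch predictor plus a cache that is never rolled back."""

    def __init__(self, memory: bytes, array_size: int):
        self.memory = memory            # public array first, secret bytes after it
        self.array_size = array_size    # limit used by the victim's bounds check
        self.cache = ToyCache()
        self.predict_in_bounds = False  # last observed outcome of the bounds check

    def victim(self, x: int):
        """Models: if x < array_size: y = probe[memory[x]]"""
        in_bounds = x < self.array_size
        result = None
        if in_bounds or self.predict_in_bounds:
            # Body runs for real, or speculatively on a misprediction.
            value = self.memory[x]
            self.cache.touch(value)     # microarchitectural side effect survives
            if in_bounds:
                result = value          # architectural result only if the check passes
        self.predict_in_bounds = in_bounds  # predictor learns the real outcome
        return result


def leak_byte(cpu: ToyCPU, target_index: int):
    """Train the predictor, trigger one misprediction, then probe the cache."""
    for _ in range(3):
        cpu.victim(0)                   # in-bounds calls prime the predictor
    cpu.cache.flush()
    assert cpu.victim(target_index) is None  # nothing leaks architecturally
    hot = [line for line in range(256) if cpu.cache.is_fast(line)]
    return hot[0] if len(hot) == 1 else None


def recover(cpu: ToyCPU, start: int, length: int) -> bytes:
    return bytes(leak_byte(cpu, i) for i in range(start, start + length))
```

For example, with `cpu = ToyCPU(bytes([1, 2, 3, 4]) + b"SECRET", array_size=4)`, calling `recover(cpu, 4, 6)` reconstructs the bytes stored past the array, even though `victim` never returns them. In this model the predictor priming is the Spectre part and the cache probe is the side channel, which is why the two are separate concepts.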
