The disclosure of the Meltdown and Spectre computer vulnerabilities on 3 January 2018 was in many ways unprecedented. It shocked – and scared – even the experts.
The vulnerabilities bypass traditional security measures in the computer and affect billions of devices, from mobile phones to massive cloud servers.
We have, unfortunately, grown used to attacks on computer systems that exploit the inevitable flaws resulting from vast conceptual complexity. Our computer systems are the most complex artefacts humans have ever built, and the growth of complexity has far outstripped our ability to manage it.
A new kind of vulnerability
Meltdown and Spectre are qualitatively different from previous computer vulnerabilities. They are effective across a wide class of computer hardware and operating systems from competing vendors, and they were hiding in plain sight for more than a decade. But the really shocking realisation is that Meltdown and Spectre do not exploit flaws in the computer hardware or software.
As Intel stated in its press release, these attacks:
…gather sensitive data from computing devices that are operating as designed.
The ingenuity of the attacks lies in combining seemingly unrelated design features that were thought to be well understood – stuff we teach undergraduate computer science students. The vulnerability is not in any of the individual features, but in the complex interaction between them.
It turns out that computer systems are insecure not because of mistakes made in the implementation, but because of ill-conceived design.
As a community of computer systems experts, we have to ask ourselves how such a debacle is possible, and how a recurrence can be prevented.
We have known for a while that the established “wait for something to happen and then try to fix it” approach – better known as “patch and pray” – does not work even for more common implementation flaws, as witnessed by the proliferation of exploits. It works even less well for such insecure-by-design situations.
Automated evaluation of designs
The fundamental problem is that humans are unable to fully understand the conceptual complexity of modern computer systems and how their seemingly unrelated features might interact. There is no hope that this will change.
But solving complex problems is what machines are increasingly good at. So, the only real solution can be the automated evaluation of designs, with the aim of mathematically proving that under all circumstances a design will behave in a way that is considered secure – in particular by not leaking secret data.
In other words, a design must be considered insecure unless there is a rigorous mathematical proof to the contrary.
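To give a feel for what such a proof obligation looks like, here is a minimal sketch in Python. It exhaustively checks a toy model for one property: what an attacker can observe (here, a latency) must not depend on the secret. Everything in it is an assumption made for illustration: the two "designs", the latency values, and the names leaky_latency, constant_latency and leaks_secret are hypothetical. Real hardware designs are far too large for plain enumeration, which is why machine-assisted proof is needed.

```python
HIT, MISS = 1, 100  # assumed latencies in cycles, illustrative only


def leaky_latency(secret: int, public: int) -> int:
    """Toy design: the operation is fast only when the secret matches the public input."""
    return HIT if secret == public else MISS


def constant_latency(secret: int, public: int) -> int:
    """Toy design: the latency is the same whatever the secret is."""
    return MISS


def leaks_secret(design, secrets, publics) -> bool:
    """Return True if, for some public input, two secrets give different observations."""
    for public in publics:
        observations = {design(secret, public) for secret in secrets}
        if len(observations) > 1:
            return True  # the observation reveals something about the secret
    return False


SECRETS = range(4)  # every secret value in the toy model
PUBLICS = range(4)  # every input the attacker can choose

if __name__ == "__main__":
    print("leaky design leaks:", leaks_secret(leaky_latency, SECRETS, PUBLICS))
    print("constant design leaks:", leaks_secret(constant_latency, SECRETS, PUBLICS))
```

Under this check a design counts as insecure until every combination of inputs has been shown to produce the same observation, which mirrors the default proposed here.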
This is not an easy ask by any definition, and much more work across many areas of computer science and engineering is needed to make it a reality. But we need to start somewhere, and we need to start now.
We will reap the benefits of embarking on such a program long before we achieve the goal of rigorous end-to-end proof. Significant improvements will be achieved through partial results, both in the form of proving weaker properties, and by establishing desired properties in a less rigorous fashion.
For example, an incomplete evaluation may be more feasible than a complete one, and produce a probabilistic result, such as a greatly reduced likelihood of exploits.
Rewriting the hardware-software contract
A necessary, and overdue, first step is a new and improved hardware-software contract.
Computer systems are a combination of hardware and software. The people and companies that develop hardware are largely separate from those developing the software. Given the vastly different skills and experience required, this is inevitable.
To make development practical, both sides work to an interface, called the instruction-set architecture (ISA), which represents the contract between hardware and software functionality.
The problem, clearly exhibited by the Meltdown and Spectre attacks, is that the ISA is under-specified for security, or safety for that matter. It simply does not provide ways to isolate the speed of progress of a computation from other system activities.
The ISA is a functional specification, meaning it defines how the visible state of the machine will (eventually) change if an operation is triggered. It intentionally abstracts away anything to do with time. In particular, it hides how long operations take and how this time depends on the internal state of the machine. The problem is that this internal state depends on potentially confidential data processed by previous operations.
This means that by observing the exact timing of particular sequences of operations, it is possible to infer data that is supposed to be kept secret. This is exactly what happened with Meltdown and Spectre.
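The principle can be sketched with a simulated cache in Python. This is not the Meltdown or Spectre exploit itself, which also relies on speculative execution. It only illustrates how hidden internal state plus timing can reveal a secret. The ToyCache class, the latency constants and the function names are all hypothetical, and the "cycles" are simulated values rather than measurements.

```python
HIT_CYCLES, MISS_CYCLES = 1, 100  # assumed latencies, not measured values
NUM_LINES = 16                    # size of the toy cache


class ToyCache:
    """Internal state that a functional ISA does not expose: which lines are cached."""

    def __init__(self) -> None:
        self.cached: set[int] = set()

    def flush(self) -> None:
        self.cached.clear()

    def access(self, line: int) -> int:
        """Return the simulated latency of touching `line`, then mark it as cached."""
        latency = HIT_CYCLES if line in self.cached else MISS_CYCLES
        self.cached.add(line)
        return latency


def victim(cache: ToyCache, secret: int) -> None:
    """Touch a line selected by the secret. Nothing is returned to the caller."""
    cache.access(secret % NUM_LINES)


def attacker_recover(cache: ToyCache) -> int:
    """Time an access to every line; the single fast one was touched by the victim."""
    timings = [cache.access(line) for line in range(NUM_LINES)]
    return timings.index(min(timings))


def run_attack(secret: int) -> int:
    cache = ToyCache()
    cache.flush()                   # attacker starts from a known, empty cache
    victim(cache, secret)           # victim's secret changes the hidden state
    return attacker_recover(cache)  # timing alone reveals the secret


if __name__ == "__main__":
    print("recovered secret:", run_attack(11))
```

The victim never hands its secret to the attacker. Functionally the two are isolated, and the leak exists only in the time dimension that the ISA abstracts away.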
The abstraction is there for a good reason: It allows hardware designers to change things “under the hood”, usually in order to improve performance. Consequently, there will be resistance from hardware manufacturers to a tighter contract. But we believe that the refined specifications can be kept abstract enough to retain manufacturers’ ability to innovate, and to avoid exposing confidential IP.
The recent debacle has shown that the ISA is too abstract, making it impossible to tell whether a system is secure or if it will leak secrets. This must change, urgently.