It was brought to my attention recently that it’s been a long time since I’ve posted anything in my Safe Computing series. Today I want to talk about an idea for post-modern CPUs that I’m going to call Heterogeneous Multi-Core (HMC). As always, this is strictly a theoretical exercise; I’d be delighted if CPU designers picked up my ideas, but I don’t actually expect it to happen, so don’t bother pointing out that it won’t. On the other hand, please do point out any logical errors or oversights on my part.
Heterogeneous Multi-Core means exactly what it says: the CPU has more than one type of core. Such CPUs may well exist already, but the best known lines of CPU are homogeneous, that is, each core is (more or less) the same as the others. (I don’t count integrating a GPU into a CPU because this is such a trivial case, and because if I understand correctly the CPU and GPU are still logically distinct even if they are on a single chip.) Note that in this context I’m talking about differences as seen by a software developer, rather than hardware differences, although of course the one largely determines the other.
There are all sorts of possibilities within this general concept. For example, you could have cores of varying speeds (in fact I gather this is already done). I’m mainly interested in possibilities that would help with Safe Computing, though, so I’m going to concentrate on that.
The Master Core

The master core (or perhaps cores) runs the top-level kernel. The key idea here is that each master core determines all execution context for the cores under its command. The various servant cores don’t have a kernel mode; everything that would normally be done in kernel mode is instead handled by instructions from the master core over a dedicated channel.
In principle, the master core could run the OS kernel, but I think it would be preferable for it to run a hypervisor instead. Also, to minimize compatibility issues, the hypervisor code should be part of the CPU firmware rather than being a third-party product. You’d need some kind of message queue from the servant cores so that the OS kernel(s) can give the hypervisor instructions.
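To make the division of labour concrete, here’s a toy sketch in Python of the command channel and the upward message queue. The command names, payloads, and the `asid`/`map_page` example are my own invention, purely illustrative of the shape of the protocol:

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto

class Cmd(Enum):
    SET_CONTEXT = auto()  # set the thread's security context
    START = auto()
    STOP = auto()

@dataclass
class ServantCore:
    core_id: int
    running: bool = False
    context: dict = field(default_factory=dict)
    # upward queue: requests from a guest kernel to the hypervisor
    to_hypervisor: deque = field(default_factory=deque)

    def receive(self, cmd, payload=None):
        # No kernel mode here: privileged state changes arrive only
        # as commands over the dedicated channel from the master core.
        if cmd is Cmd.SET_CONTEXT:
            self.context = dict(payload)
        elif cmd is Cmd.START:
            self.running = True
        elif cmd is Cmd.STOP:
            self.running = False

class MasterCore:
    def __init__(self, servants):
        self.servants = {c.core_id: c for c in servants}

    def dispatch(self, core_id, cmd, payload=None):
        self.servants[core_id].receive(cmd, payload)

    def poll_messages(self):
        # The firmware hypervisor drains each servant's queue and
        # acts on the guest kernels' requests.
        for core in self.servants.values():
            while core.to_hypervisor:
                yield core.core_id, core.to_hypervisor.popleft()

servants = [ServantCore(i) for i in range(4)]
mc = MasterCore(servants)
mc.dispatch(0, Cmd.SET_CONTEXT, {"asid": 7})
mc.dispatch(0, Cmd.START)
servants[0].to_hypervisor.append(("map_page", 0x1000))
messages = list(mc.poll_messages())
```

The important property is the asymmetry: commands flow downward and take effect unconditionally, while the servants can only ever *request* things via the queue.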
The master core should use an instruction set suitable for the purpose and the hypervisor should be single-threaded. The source code should also be available to the public so it can be analyzed for possible flaws.
Hypervisor device drivers for on-board devices could be included in the motherboard firmware. (These wouldn’t be run on the master core itself, of course.) Similarly, the motherboard can provide a base OS in firmware to serve the user interface functions currently provided by the BIOS as well as those normally provided by a host OS or hypervisor console.
I’m guessing that each master core could handle at least 64 servant cores, and quite possibly many more than that. So it seems likely that for the time being only one master core would be needed. This would be preferable, because it simplifies the hypervisor design significantly.
Central Processing Cores
I’m going to call servant cores intended for general processing, i.e., running the OS and applications, Central Processing Cores, or CPCs for short. As previously discussed, these cores wouldn’t have any form of kernel mode. Instead, they would receive instructions from the Master Core (MC) to set security context, load or save register values, start, stop, and so on.
One useful optimization would be for the CPCs to have two sets of registers – somewhat similar to hyperthreading – so that all the information necessary for the next thread to execute can be loaded into the core ahead of time. When the MC issues the instruction to switch context, the core could begin executing the new thread almost immediately. The register contents for the old thread could then be lazily saved back to main memory before setting up for the next context switch.
As a special exception to the “no kernel mode instructions” rule, a halt command could be available to code running on the CPC. If the MC has already configured and approved the next context switch, that switch could happen immediately; if not, the CPC would halt execution until instructed by the MC to continue.
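A toy model of the double register bank and the halt exception might look like this; all the names and behaviours here are hypothetical, loosely analogous to hyperthreading:

```python
class CPC:
    """Toy model of a servant core with two register banks."""

    def __init__(self):
        self.banks = [{}, {}]     # two full register sets
        self.active = 0           # bank currently executing
        self.next_ready = False   # has the MC preloaded and approved a switch?
        self.halted = False

    # --- driven by the master core over the dedicated channel ---
    def preload(self, regs):
        """MC loads the next thread's registers into the idle bank."""
        self.banks[1 - self.active] = dict(regs)
        self.next_ready = True

    def switch_context(self):
        """Flip banks; return the old bank so it can be saved lazily."""
        old = self.banks[self.active]
        self.active = 1 - self.active
        self.next_ready = False
        self.halted = False
        return old

    # --- the one quasi-privileged instruction guest code may issue ---
    def halt(self):
        if self.next_ready:
            return self.switch_context()  # approved switch: go at once
        self.halted = True                # otherwise wait for the MC
        return None
```

The point of the two banks is that `preload` can happen entirely in the background, so the switch itself is just flipping `active`.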
Although I don’t normally talk about backwards compatibility in my “Safe Computing” posts, this particular possibility is too significant to overlook. An x86 core would implement the user mode parts of the x86 instruction set. It should be simpler than a “real” x86 core because it doesn’t need kernel mode. An AMD64 core would implement the AMD 64-bit instruction set. It might be preferable to provide cores that can do either, but I’d leave this to the experts to decide.
The theory here is that operating systems could be ported to the HMC CPU while retaining the ability to run existing PC applications. That is, the operating system would need to be ported, but the applications wouldn’t. Windows already supports multiple CPU architectures, so it should be (relatively!) simple for Microsoft to port it to HMC. I’m not sure about Linux, but I know there is already support for paravirtualization under Xen, so one option would be to port Xen, or, rather, re-implement the Xen guest machine interface.
It would also be desirable to be able to run unmodified PC operating systems in virtual machines. This would require a bit more work. Conceptually it would be cleaner to handle the kernel mode parts entirely (or almost entirely) in software. There are some instructions which exist in both kernel and user mode but behave differently, so these would need to be implemented carefully to avoid causing problems. We would also need x86-like address space mapping, which is a shame, but would probably be worth it.
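The classic x86 example of such an instruction is POPF: in user mode it silently ignores attempts to change the interrupt-enable flag rather than faulting, which is exactly why naive virtualization of x86 is unsafe. A software layer playing the part of kernel mode has to apply the two behaviours itself; a minimal sketch:

```python
IF_BIT = 1 << 9  # interrupt-enable flag in x86 EFLAGS

def emulate_popf(current_eflags, popped_value, guest_in_kernel_mode):
    """Sketch of software-emulated POPF for an unmodified guest.
    Real x86 user-mode POPF silently discards changes to IF instead
    of faulting, so an emulator providing the guest's 'kernel mode'
    must distinguish the two cases itself."""
    if guest_in_kernel_mode:
        return popped_value  # kernel mode: all flags take effect
    # user mode: keep the real IF, take the rest from the popped value
    return (popped_value & ~IF_BIT) | (current_eflags & IF_BIT)
```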
In summary, some additional hardware assistance might be appropriate, but only if it doesn’t complicate the cores too much.
Non-x86 Processing Cores
Just because we want some x86/AMD64 cores for running legacy applications doesn’t mean we shouldn’t have better instruction sets available as well. I keep meaning to write about some of the ways a CPU instruction set could help make software more reliable, but so far I’ve only posted about protecting the flow of execution. I’ll try to do better in the coming months.
Since I mentioned it already, I will point out that while we of course need to be able to restrict which parts of memory the CPC can read and/or write to, we don’t necessarily need a fully mappable per-process address space of the sort x86 CPUs provide. Once again, I’ll try to discuss this in more detail later this year.
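As an illustration of how much simpler restriction-without-mapping could be, here’s a sketch of base-and-limit region checking, the sort of thing the master core might set up for each thread instead of a full page mapping. The `Region` structure and the example grants are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int      # inclusive start of the physical range
    limit: int     # exclusive end
    writable: bool

def check_access(regions, addr, is_write):
    """Allow an access if any granted region covers it. This is plain
    base-and-limit protection: no per-process virtual addresses, no
    page tables, just a short list set up by the master core."""
    return any(
        r.base <= addr < r.limit and (r.writable or not is_write)
        for r in regions
    )

grants = [Region(0x1000, 0x2000, writable=False),  # code: read-only
          Region(0x8000, 0x9000, writable=True)]   # data: read-write
```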
There could be some advantage to having a separate core type designed specifically for OS kernels, but my ideas on this front are still pretty unformed. I’d probably insist on a single-threaded design, so that wouldn’t work for ported operating systems.
IO Cores

When it comes to the hypervisor’s device drivers, we’re free to insist on a new instruction set, since architectural differences would almost certainly make porting existing drivers infeasible anyway. (Also, we’re talking about a brand new hardware platform, so it seems likely that most of the core devices will be new designs. Conventional motherboard designs are fairly unsatisfactory from a security standpoint – with luck, the subject of a future post – so that further reduces any risk from changing the instruction set.)
I don’t really have many solid ideas about IO core design either, but I’m of the opinion that it would be beneficial to use a customized instruction set. Again, the drivers should be single-threaded. We’d need an efficient way of passing packets of data between related drivers, including both hypervisor and OS-level drivers. Some IO cores might be dedicated to running a single driver, i.e., without multitasking, which I think would make programming some sorts of drivers a whole bunch easier.
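Here’s a rough sketch of that packet-passing model: each driver is single-threaded, owns an inbox, and processes one packet per scheduling step. The driver names, the `connect`/`step` operations, and the packet fields are all invented for illustration:

```python
from collections import deque

class DriverCore:
    """A single-threaded driver on a dedicated IO core. Drivers talk
    only through explicit packet queues, never shared state."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # transforms a packet; None means drop it
        self.inbox = deque()
        self.downstream = None

    def connect(self, other):
        self.downstream = other

    def step(self):
        # One quantum: handle a single packet with no preemption.
        if not self.inbox:
            return
        out = self.handler(self.inbox.popleft())
        if out is not None and self.downstream is not None:
            self.downstream.inbox.append(out)

# chain: OS-level block driver feeding a hypervisor-level disk driver
blk = DriverCore("os-block", lambda p: {**p, "sector": p["lba"] * 8})
disk = DriverCore("hv-disk", lambda p: p)
blk.connect(disk)
blk.inbox.append({"lba": 4, "op": "read"})
blk.step()
```

Because each driver runs alone on its core, none of this needs locks – the queues are the only point of contact.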
I’d also want to investigate whether we could eliminate the need for DMA by using dedicated IO cores with a suitable instruction set and direct access to the IO bus(es). As well as potential security benefits, if we assume that there will be only a single multi-core CPU per motherboard, this would mean that only the CPU would need to talk to the RAM, which has to simplify all sorts of issues around caching and locking memory access.
Multimedia Cores

It might be useful to have a cheap and simple core designed specifically for applications that generate sound; it shouldn’t need to be all that fast, since audio frequencies are 20 kHz or lower, although you’d need enough power to process, say, MP3 data in real time. (I presume dedicated hardware support would speed this up.)
The main idea here is to ensure that audio applications aren’t being interrupted arbitrarily by other tasks, so as to reduce pops and crackles and to make it possible to minimize latency. A direct channel from the audio multimedia core to the audio IO core might be sensible – personally I’d be inclined to bypass the need for audio device drivers altogether and just have a couple of analog output pins on the CPU, but I imagine that would horrify all the audiophiles.
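To put rough numbers on the latency claim – the sample rate and buffer size here are my own assumptions, purely for illustration:

```python
SAMPLE_RATE = 48_000   # Hz; assumed output rate
BUFFER_SAMPLES = 96    # buffer size a dedicated core could plausibly sustain

def buffer_latency_ms(samples: int, rate: int) -> float:
    """Output latency contributed by one hardware buffer."""
    return samples / rate * 1000.0

# A never-preempted audio core can run tiny buffers: 96 samples at
# 48 kHz means meeting a 2 ms deadline on every buffer, which a
# time-shared general-purpose core would routinely miss under load.
latency = buffer_latency_ms(BUFFER_SAMPLES, SAMPLE_RATE)
```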
Video multimedia is a different story. I’m not sure that a completely distinct instruction set would be sensible, but perhaps an extended instruction set to provide hardware assistance to codecs could be provided on only some of the CPCs? On the other hand, the OS would then have to keep track of which threads needed which cores, which might be more trouble than it was worth.
Graphics Cores

These are mentioned only because they are so obvious. I don’t have the background knowledge to weigh in on the argument about whether separate or integrated GPUs are better, although I would like to see standardization of the instruction set.
Well, I guess that’s all. Thanks for reading.