OP4: Virtual Disk Layering

Vera Salvisberg
There are many advantages to VMs offering the same block abstraction as conventional systems. First, an identical API means that the same OS code can run on both platforms (portability), which lowers costs through reuse. Furthermore, programmers don’t need to learn a new abstraction. It also means that the VM can be hidden completely, so that the guest OS has no way of telling whether it is running on a virtual system or not (hiding).

But it also has downsides, which mainly show up as performance trade-offs. Each layer adds computation delay because requests have to be translated to the API of the next lower layer. Even worse than this overhead is the mismatch between the layers: the lower layer sees a file, whereas the upper layer sees disk blocks. The guest OS probably does some smart things about block allocation, but in a naive implementation the hypervisor just mirrors data blocks and free blocks alike into one huge file, wasting a lot of storage space. Caching is another crucial part of today’s storage systems, and here again the caching mechanisms designed for files might not be ideal when blocks are read individually. Performance can also suffer because the VM has no way of knowing the actual disk layout if all it sees is a file, so tricks like putting the metadata in the middle of the spindle might result in the opposite placement on the physical disk.
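The space problem can be made concrete with a small sketch. It assumes a naive virtual disk that stores block n at byte offset n * BLOCK_SIZE in a single backing file; the class name, the 4 KiB block size, and the in-memory stand-in for the file are all hypothetical and chosen only for illustration.

```python
import io

BLOCK_SIZE = 4096  # assumed block size, not taken from any real system


class NaiveFileBackedDisk:
    """Hypothetical virtual disk: block n lives at offset n * BLOCK_SIZE."""

    def __init__(self, backing):
        self.backing = backing   # any seekable binary file object
        self.freed = set()       # blocks the upper layer has released

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.freed.discard(n)
        self.backing.seek(n * BLOCK_SIZE)
        self.backing.write(data)

    def read_block(self, n):
        self.backing.seek(n * BLOCK_SIZE)
        return self.backing.read(BLOCK_SIZE)

    def free_block(self, n):
        # The naive layer only remembers that the block is free;
        # the bytes stay in the backing file and keep occupying space.
        self.freed.add(n)

    def backing_size(self):
        return self.backing.seek(0, io.SEEK_END)


disk = NaiveFileBackedDisk(io.BytesIO())
for n in range(4):
    disk.write_block(n, bytes([n]) * BLOCK_SIZE)
disk.free_block(1)
disk.free_block(2)
print(disk.backing_size())  # still four blocks' worth of bytes
```

Freeing two of the four blocks does not shrink the backing file, because the file layer below never learns which blocks are unused.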

Our idea for a layer-free solution is layer bypassing. When the guest OS issues a block request, it is passed through the hypervisor and executed directly on the hardware, bypassing the file layer. In the other direction, the hypervisor can provide the guest with topologically correct block information.

However, the system still has to guarantee isolation of the virtualized system and protection of the lower layer’s own data structures. These two properties are the minimal set needed for safe operation of any kind of storage system, and they make it impossible to design the perfect system. We need some division of the physical name space. It is done either by the virtualization layer, which trades power against ease of implementation in the guest, or by the hardware layer, in which case the virtualization is no longer a perfect illusion because the guest won’t be able to access some of the blocks.
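As a minimal sketch of the first option, assume the virtualization layer hands each guest one contiguous extent of physical blocks and keeps a reserved extent for its own data structures. Every block request is then only bounds-checked and offset before going to the disk, with no file layer in between. The names Extent and BypassHypervisor are hypothetical, and the sketch models only the address check, not the actual I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous range of physical blocks: [base, base + length)."""
    base: int
    length: int

    def overlaps(self, other):
        return (self.base < other.base + other.length
                and other.base < self.base + self.length)


class BypassHypervisor:
    """Hypothetical layer that divides the physical block name space."""

    def __init__(self, reserved):
        self.reserved = reserved  # blocks holding the lower layer's own data
        self.extents = {}         # guest name -> Extent

    def assign(self, guest, extent):
        # Isolation: a new extent may not overlap the reserved region
        # or any extent already given to another guest.
        taken = [self.reserved, *self.extents.values()]
        if any(extent.overlaps(e) for e in taken):
            raise ValueError("extent overlaps an existing allocation")
        self.extents[guest] = extent

    def translate(self, guest, guest_block):
        # Protection: the guest can only name blocks inside its extent.
        extent = self.extents[guest]
        if not 0 <= guest_block < extent.length:
            raise PermissionError("block outside the guest's extent")
        return extent.base + guest_block


hv = BypassHypervisor(reserved=Extent(0, 16))
hv.assign("guest_a", Extent(16, 100))
hv.assign("guest_b", Extent(116, 50))
print(hv.translate("guest_a", 0))
print(hv.translate("guest_b", 49))
```

The check in translate is the price of isolation: the guest sees blocks 0 to length - 1, and everything outside its extent simply does not exist from its point of view.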

Today’s VM architectures usually choose the virtualization approach, because development time and simplicity are valued more highly than squeezing the last bit of performance out of a system at the cost of a less beautiful abstraction. One could use the idea of the Differentiated Storage Services paper [1] to pass additional information about the properties of the blocks to be used, without exposing the lowest layer directly.
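The hint-passing idea could look roughly like the following sketch, in which each block request carries a class label and the lower layer maps classes to a placement choice. This is only loosely inspired by [1] and is not the paper's interface; the class names, the policy table, and the region names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class BlockClass(Enum):
    """Hypothetical classes the upper layer can attach to a request."""
    METADATA = "metadata"
    JOURNAL = "journal"
    BULK_DATA = "bulk_data"
    UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class BlockRequest:
    block: int
    block_class: BlockClass = BlockClass.UNCLASSIFIED


# Hypothetical policy owned by the lower layer. The upper layer only
# names the class and never sees the physical layout behind it.
PLACEMENT_POLICY = {
    BlockClass.METADATA: "fast_region",
    BlockClass.JOURNAL: "sequential_region",
    BlockClass.BULK_DATA: "capacity_region",
}


def place(request):
    """Pick a placement region from the hint, with a default fallback."""
    return PLACEMENT_POLICY.get(request.block_class, "default_region")


print(place(BlockRequest(7, BlockClass.METADATA)))
print(place(BlockRequest(8)))
```

The hint is advisory: a request without a class still works and falls back to the default region, so the block abstraction itself stays unchanged.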

[1] M. Mesnier et al., Differentiated Storage Services (SOSP 2011)