OP4: Using files to store VM disk images

Pierre-Julien Bringer
Virtual machines require some form of backing storage. For many virtualization platforms this backend is a plain file. We explore the upsides and downsides of this design.

Files come with high-level constructs that are useful for guest virtual machines (VMs). First, files are easy to manage from the host system: if the VM is turned off, or if it can otherwise be guaranteed that no writes will happen, the file can be manipulated with regular tools. Files are referenced by name, and the content behind a name can be changed, which gives flexibility such as swapping the image of a system. Many modern file systems support sparse files, which makes it possible to oversubscribe disk space. Using files also allows for copy-on-write, either at the file system level if it supports deduplication, as ZFS does, or at the hypervisor level, as with the qcow format used by QEMU/KVM. Both optimisations make sense in a virtualized context where VMs are given more disk space than most will use and share many common blocks, for example because they were copied from the same base image.
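The sparse-file behaviour that makes oversubscription possible can be observed from the host with a few lines of Python. This is a minimal sketch, assuming a POSIX host and a file system that supports sparse files; the helper names and the image name guest.img are hypothetical and chosen for illustration only.

```python
import os
import tempfile


def make_sparse_image(path, size_bytes):
    """Create a file whose apparent size is size_bytes without writing data."""
    with open(path, "wb") as f:
        # truncate() extends the file; on a sparse-capable file system
        # no data blocks are allocated for the new range.
        f.truncate(size_bytes)


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems; it is absent on
    # Windows, in which case this sketch reports 0.
    allocated = getattr(st, "st_blocks", 0) * 512
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "guest.img")  # hypothetical image name
        make_sparse_image(image, 64 * 1024 * 1024)
        apparent, allocated = disk_usage(image)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a file system with sparse-file support, the allocated figure should stay far below the apparent size until the guest actually writes data, which is what lets a host promise more disk space than it physically has.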

The downside of this approach is that it is more complicated and performs worse. Accessing a random part of a file requires going through two file system and VFS stacks, one in the guest and one in the host. I/O scheduling becomes less efficient: the guest OS can't perform useful reordering of I/O requests [1], since it does not know where the blocks of the image file actually lie on the physical disk. Modern file systems come with mechanisms, such as logs, that ensure a higher level of durability; this functionality is implemented twice when one file system is layered on top of another, which goes against the end-to-end principle.
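The scheduling problem can be illustrated with a toy model. The sketch below assumes a hypothetical extent map for a fragmented image file and uses total seek distance as a crude cost measure; neither is taken from a real system. It shows that requests sorted by guest block number are not necessarily sorted by host block number.

```python
def guest_to_host(guest_block, extents):
    """Translate a guest block number to a host block number.

    extents is a list of (guest_start, length, host_start) tuples that
    describe where each run of the image file lives on the host disk.
    """
    for guest_start, length, host_start in extents:
        if guest_start <= guest_block < guest_start + length:
            return host_start + (guest_block - guest_start)
    raise ValueError(f"guest block {guest_block} is not mapped")


def seek_distance(blocks):
    """Total head movement when visiting blocks in the given order."""
    return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))


# Hypothetical layout: the image file is split into three extents that
# sit on the host disk in a different order than in the guest.
EXTENTS = [(0, 100, 9000), (100, 100, 1000), (200, 100, 5000)]

if __name__ == "__main__":
    requests = [250, 10, 120, 20, 260, 130]

    # The guest scheduler sorts by the only addresses it can see.
    guest_sorted = sorted(requests)
    host_view = [guest_to_host(b, EXTENTS) for b in guest_sorted]

    # A scheduler with knowledge of the host layout would sort these.
    host_sorted = sorted(host_view)

    print("guest order on host:", host_view, seek_distance(host_view))
    print("host-sorted order:  ", host_sorted, seek_distance(host_sorted))
```

In this made-up layout the guest's ordering travels 12050 blocks on the host disk, against 8000 for an ordering computed with knowledge of the host layout.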

The virtualization method dictates which approach can be taken. In full device virtualization, all behavior of the virtual disk is emulated in software; in particular, the emulation software implements the SCSI read and write commands. In paravirtualization, the guest is modified to interact with the host system at a higher level, essentially bypassing the lower layers of the guest's I/O stack. This requires changes to the guest, but gives better performance. Virtual machines can also be allocated a physical I/O device directly. This requires address translation for DMA and interrupts to function correctly [2]. In the x86 world, hardware virtualization support for this started to appear in 2005.

In addition, volume management tools have improved. On Linux, for instance, LVM can add more storage without stopping the guest, and it supports snapshots and copy-on-write. Replication at the block level is possible with DRBD. Performance and reliability levels can be chosen per volume through the use of RAID.
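The idea behind copy-on-write snapshots at the volume level can be sketched with an in-memory model. This is a simplified illustration of the general technique, not LVM's actual implementation or on-disk format; the Volume and Snapshot classes are hypothetical.

```python
class Volume:
    """Toy block volume: a list of fixed-size blocks held in memory."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.snapshots = []

    def snapshot(self):
        """Take a snapshot; no block is copied at this point."""
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        # Copy-on-write: before overwriting, save the old contents for
        # every snapshot that has not already preserved this block.
        for snap in self.snapshots:
            snap.saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


class Snapshot:
    """View of a volume as it was when the snapshot was taken."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Unmodified blocks are still shared with the origin volume.
        return self.saved.get(index, self.origin.blocks[index])


if __name__ == "__main__":
    vol = Volume(num_blocks=4)
    vol.write(0, b"AAAA")
    snap = vol.snapshot()
    vol.write(0, b"BBBB")
    print(vol.read(0), snap.read(0), len(snap.saved))
```

Only blocks that change after the snapshot consume extra space, which is why a snapshot of a mostly idle guest volume is cheap to keep.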

Because of the appearance of the required processor extensions, and the improvements to volume management tools, a modern approach to virtualization on the x86 architecture should be based on direct access by the guest to a volume.

[1] Does virtualization make disk scheduling passé? Boutcher, Chandr
[2] Intel Virtualization Technology for Directed I/O