Title
Virtualized Build Automation in Vesta
Abstract
Vesta is a system for version control, configuration management, and build automation. It uses lightweight filesystem virtualization to encapsulate each step performed during a build. This insulates builds from external interference and facilitates the use of multiple execution environments with different versions of compilers and other tools. Total filesystem virtualization also makes it possible to provide guarantees about the correctness and repeatability of build results.
Vesta has been in use by microprocessor design teams at Compaq and Intel for 10 years. It has been available as free software for 6 years. In that time it has been successfully applied to a wide variety of build automation tasks. Vesta may represent the most extensive use of virtualization for build automation. Both practical experience with Vesta and other technologies that have emerged and matured since Vesta was developed suggest opportunities for further research and improvement.
Outline
- Overview of Vesta
- Motivation
- Repeatability
- Auditability
- User/machine/site independence
- Insulating users from each other
- Fine-grained dependency detection and precise re-use
- Detailed Technology Overview
- Virtual filesystem + chroot + execve
- Initial filesystem and environment tightly controlled
- Callbacks on each access
- Constant-time start-up
- Dependency detection
- Changes captured at the end of each tool
- SDL
- Build Environments
- Libraries of SDL code to simplify build tasks
- Convention for OS components simplifies constructing chroots
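To make the "callbacks on each access" and "dependency detection" items concrete, here is a minimal sketch of the idea in Python. It is not Vesta's implementation: `RecordingFS`, `BuildCache`, and `compile_main` are hypothetical names, and an in-memory dictionary stands in for the virtual filesystem. The point is only that recording each lookup a tool makes, including lookups of files that do not exist, yields a precise cache key, so changes to files the tool never touched still allow re-use.

```python
import hashlib


def fingerprint(data):
    """Hash of a file's contents, or None when the file does not exist."""
    return None if data is None else hashlib.sha256(data).hexdigest()


class RecordingFS:
    """Hypothetical stand-in for a virtual filesystem that reports every lookup."""

    def __init__(self, files):
        self.files = dict(files)   # path -> bytes
        self.accessed = {}         # path -> fingerprint the tool observed

    def read(self, path):
        data = self.files.get(path)
        # The callback: record the lookup before the tool sees the result.
        self.accessed[path] = fingerprint(data)
        if data is None:
            raise FileNotFoundError(path)
        return data


class BuildCache:
    """Re-uses a result when every recorded dependency is unchanged."""

    def __init__(self):
        self.entries = {}   # tool name -> list of (dependencies, result)

    def run(self, name, tool, files):
        for deps, result in self.entries.get(name, []):
            if all(fingerprint(files.get(p)) == fp for p, fp in deps.items()):
                return result, True            # cache hit
        fs = RecordingFS(files)
        result = tool(fs)                      # run the tool, recording accesses
        self.entries.setdefault(name, []).append((dict(fs.accessed), result))
        return result, False                   # cache miss


def compile_main(fs):
    """Toy 'compiler' that reads exactly two files."""
    return b"OBJ:" + fs.read("main.c") + fs.read("util.h")
```

With this sketch, editing a file the toy compiler never read (say, a `README`) still produces a cache hit, while editing `util.h` forces a re-run.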
- Related Work
DSEE / ClearCase
- Apparently no published papers, but there are a few patents
Grexmk: Speeding Up Scripted Builds
- Uses strace for monitoring, no real virtualization
- Limitations
- Virtualization only covers the filesystem
- Network access is still possible
- Tools can depend on the system clock
- Though environment variables are encapsulated, no fine-grained dependency detection is performed on them. The system assumes that every tool depends on the full contents of every environment variable.
- The virtual filesystem is limited in some ways
- No symlinks
- No file locking
- No support for block special files
- Special OS-provided virtual filesystems (/proc) aren't present
- Practical experience
- Constructing a chroot is still challenging. It's not a common activity for most users. Tools may not document what they need. In many cases tool providers may not even know the full extent of the requirements. Users often resort to trial and error due to insufficient information about the requirements for running a tool. Thankfully the dependency detection system built into Vesta provides some help.
- For simple and widely tested tools (C/C++ compilers, lex, yacc, etc.) the lack of certain filesystem features (symlinks, file locking) isn't a problem. For complex EDA tools, internally developed tools, and legacy tools/scripts it's occasionally a problem. We've gotten by without these capabilities so far, so the lack of them isn't insurmountable but it occasionally requires effort to work around.
- Some compilers (Java) try to access information in OS-provided virtual filesystems (/proc)
- Tools that require a license server are a minor problem.
- We assume their results don't depend on the license, just whether they trivially fail or run normally.
- Even if the network could be virtualized, there may be contractual obligations to consider for license server issues.
- Some batch tools require an X-windows display to connect to even when they never display anything. This is probably due to common initialization code shared between GUI and batch tools that are part of the same suite.
- There are external effects not related to the filesystem
- Competition with other processes for limited CPU and virtual memory resources can affect tool behavior and outcome. (Usually this isn't a problem unless the tool has bugs that cause non-deterministic results.)
- There are known workarounds for some holes in the encapsulation
- In some cases a common command can be replaced with a simple program that has a fixed behavior. For example, the "hostname" command might be replaced with a program that simply prints "localhost".
- On some modern OSes, a shared library can replace certain library and system calls if it is pre-loaded during process start-up. (This is usually done with the LD_PRELOAD environment variable.)
- In some cases special device nodes that might be used can be replaced with something that returns a fixed value. For example, in a chroot /dev/random could be created with the major/minor device numbers that provide /dev/zero.
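The "hostname" workaround above can be sketched in a few lines. This is an illustration only, under the assumption that the chroot is an ordinary directory tree being assembled on disk and that it contains `/bin/sh`; the function name `install_fixed_hostname` and the `bin/hostname` location are hypothetical choices, not part of Vesta.

```python
import os


def install_fixed_hostname(root, name="localhost"):
    """Write a stub 'hostname' command into <root>/bin that always prints `name`.

    `root` is the directory that will later serve as the chroot (assumed layout).
    """
    bin_dir = os.path.join(root, "bin")
    os.makedirs(bin_dir, exist_ok=True)
    path = os.path.join(bin_dir, "hostname")
    with open(path, "w") as f:
        # A two-line shell script with fixed behavior, independent of the host.
        f.write("#!/bin/sh\necho %s\n" % name)
    os.chmod(path, 0o755)   # make the stub executable
    return path
```

Because the stub's output never varies, any tool that records the host name in its output produces the same result on every build machine.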
- Future Work
- More complete virtualization
- Current system was developed before hardware and software support for virtualization was widely available
- Holes in current virtualization could be closed
- Note that we may need to leave some holes open for practical reasons such as license servers
- Limitations of current virtualization could be removed
- Portability concerns
- Current filesystem-only virtualization system could be retained as an alternative
- Performance concerns
- Constant-time start-up
- Should refer to papers about "fast" virtual machine/environment cloning/start-up. Those may indicate that what's considered "fast" is still too slow for the design constraints of individual tools during a build.
- "The solution also allows a VM ... to be cloned within 160 seconds for the first clone and within 25 seconds for subsequent clones."
VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing
- "Results show that efficient cloning allows a VMware-based VMPlant prototype to achieve VM creation in 17 to 85 seconds."
- "...allow VMs to be instantiated, on average, in 25 to 48 seconds..."
- On the other hand, while these times are too long for compiling one C/C++ file, they might be acceptable for very long-running tools.
- Some EDA tools run for tens of minutes.
- Some tools that we've never tried to run under Vesta control can take hours or even days. This then makes the result data very precious, which could make Vesta's caching an advantage.
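The trade-off in the bullets above can be put in rough numbers. The sketch below computes the fraction of wall-clock time spent on start-up for a given tool run time. The 25-second start-up is taken from the cloning figures quoted above; the 1-second compile time is an assumption made purely for illustration, and 600 seconds stands in for the low end of "tens of minutes".

```python
def startup_overhead(startup_s, run_s):
    """Fraction of total wall-clock time spent starting the environment."""
    return startup_s / (startup_s + run_s)


if __name__ == "__main__":
    startup_s = 25   # per-clone figure quoted above
    cases = [
        ("single C file compile (assumed 1 s)", 1),
        ("EDA tool (10 min)", 600),
        ("long-running tool (1 h)", 3600),
    ]
    for label, run_s in cases:
        pct = 100 * startup_overhead(startup_s, run_s)
        print("%-38s %5.1f%% of time is start-up" % (label, pct))
```

Under these assumptions start-up dominates a short compile almost entirely, while for a ten-minute tool it is a few percent of the total, which is why VM-style start-up might only suit the long-running steps.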
- Still need fine-grained dependency detection
- Perhaps fine-grained dependency detection could be extended to non-filesystem information the tool uses (e.g. detecting which environment variables it accesses)
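As a rough illustration of what per-variable dependency detection could look like, the sketch below wraps an environment in a mapping that records which names a tool actually reads. This is an in-process model only: `TrackedEnv` and `fake_tool` are hypothetical, and a real tool reads its environment through the C library or by walking the environment block directly, so actual interception would need a different mechanism.

```python
from collections.abc import Mapping


class TrackedEnv(Mapping):
    """Read-only environment that records every variable a tool looks up."""

    def __init__(self, values):
        self._values = dict(values)
        self.reads = {}   # name -> value observed (None if the variable was unset)

    def __getitem__(self, name):
        # Record the lookup, including lookups of unset variables.
        self.reads[name] = self._values.get(name)
        return self._values[name]

    def __iter__(self):
        # Enumerating the environment is a dependency on all of it.
        self.reads.update(self._values)
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def fake_tool(env):
    """Toy tool whose behavior depends on a single variable."""
    return "cc -O2" if env.get("OPT") == "2" else "cc"
```

After running `fake_tool` on a `TrackedEnv`, `reads` contains only `OPT`, so a change to any other variable would not need to invalidate the cached result.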
- Simplifying the building of chroots
- Use the existing virtual filesystem method to pass requests through to the normal filesystem while simultaneously monitoring them and building the chroot. (See CrudeToolImport.)
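The second half of that idea, turning a log of monitored accesses into a chroot, might look like the sketch below. It assumes the monitoring step has already produced a list of accessed paths; `populate_chroot` is a hypothetical helper and is not a description of what CrudeToolImport does.

```python
import os
import shutil


def populate_chroot(accessed_paths, source_root, chroot_dir):
    """Copy every regular file the tool accessed from source_root into chroot_dir.

    Paths that were looked up but do not exist as regular files are skipped.
    Returns the list of relative paths that were copied.
    """
    copied = []
    # Strip leading slashes so logged absolute paths resolve under source_root.
    for rel in sorted({p.lstrip("/") for p in accessed_paths}):
        src = os.path.join(source_root, rel)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(chroot_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)   # preserve permissions and timestamps
        copied.append(rel)
    return copied
```

Only the files the tool actually touched end up in the chroot, which replaces the trial-and-error step described under practical experience with a single monitored run.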
- Intercepting sub-processes and caching them separately
- This would make it easier to break up existing complicated build scripts into many discrete steps to provide more incrementalism
- We've experimented with a make compatibility system. Implicit assumptions tend to become built into Makefiles. These can be difficult to deal with without executing each step in line as make normally would.
- Extending support to OSes that lack chroot
- An equivalent of the current UNIX chroot method would require some other way of virtualizing all filesystem accesses.
- Some commercial software exists which can do this on Microsoft Windows. There may be similar internally-developed tools at some large companies.
References
- Vesta
- Book published by Springer-Verlag
- PLDI paper
- Related work
DSEE / ClearCase papers
- Papers on cloning VMs for grid computing