Assessment of Various SCM Systems

About This Document

The following comparison was written by the original Vesta developers some time ago. It compares and contrasts Vesta to RCS, CVS, and Make. Today there are many other alternatives available. Like Vesta, they each try to address some of the failings of older systems (e.g. RCS/CVS+Make). Unfortunately, the current Vesta maintainers haven't found the time to get familiar enough with these new options to provide an in-depth analysis of the pros and cons of the modern alternatives.

The comparison below is still worth reading. You might also find Shlomi Fish's "Better SCM" comparison useful. It compares several modern versioning systems, including Vesta. (Unlike the comparison on this page, it doesn't cover building.)

Assessment of Various SCM Systems

This document attempts to assess the pros and cons of various software configuration management (SCM) systems. It is intended to be used by organizations evaluating whether or not Vesta is suitable for their needs. As with any such assessment, whether to classify an item as a pro for one system or a con of another is a subjective decision.

The core SCM domain can be roughly divided into four areas:

Versioning: Versioning is the facility for assigning version numbers to (collections of) source files.
Source Control: Source control is the mechanism by which new versions are checked out and checked in.
Configuration Management: A software artifact consists of multiple components, each of which is built from some (versions of) the sources. Configuration management is the mechanism for specifying which components go together to make the final artifact.
Building: Building is the process for turning sources into derived files.

For each of the SCM systems evaluated below, we indicate which of these four areas it addresses.

This document covers the following SCM systems: RCS, CVS, Make, and Vesta.

RCS (Revision Control System)

RCS handles versioning and source control. It is often used in conjunction with Make.

Pros

simple

well-documented

Cons

no direct access to older versions

In RCS, a separate checkout step is required to access an older version of a file. Hence, older versions are not directly accessible to tools like text editors. By default, the tools check out all versions under the same name, so it is a bit awkward to examine two versions of a file at once, although a special RCS-aware version of diff (rcsdiff) is provided that can compare two versions without checking them out.

awkward tagging facilities

In RCS, the unit of check-in/-out is the file. To keep track of which versions of all files go together, the user can attach the same tag to the files. But the whole tagging facility is somewhat awkward.

CVS (Concurrent Versions System)

CVS handles versioning and source control. It is often used in conjunction with Make.

Pros

well-documented

unit of check-in/-out is the module, an arbitrary directory tree

Portions of a module (such as subdirectories or groups of files) may also be checked out, but most often, modules are checked out in their entirety.

division of sources into modules is less important due to support for concurrent updates

In CVS, any developer can edit any file at any time. Hence, the division of files into modules is less important than if only one developer could be editing the files in a module at once. Some CVS operations do cause individual files to be locked for short periods of time, but for the most part developers may edit files and perform CVS operations on them freely.

support for vendor releases

Vendor releases can be checked into CVS and tagged as such. The normal differencing tools can then be used to import the release (paying careful attention to the resulting "merge" reports!).

support for remote access

CVS supports the ability for updates to be performed on a source repository from geographically remote locations.

Cons

no direct access to older versions

As in RCS, a separate checkout step is required to access older source versions.

conflict detection is simple-minded

Because CVS allows multiple developers to modify the same file concurrently, it must detect conflicting edits to the same file. However, the algorithm it uses is based on diff3, a line-based tool, so it reports a conflict only when the edits touch the same or neighboring lines of text.

semantic conflicts may go undetected

If CVS does not detect a conflict, it silently merges the changes made by multiple developers. As a result, some true conflicts may go undetected.
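The following is a minimal sketch of how a line-based merge can miss a semantic conflict. It is not CVS's or diff3's actual algorithm: the hypothetical `merge3` function assumes all three versions have the same number of lines and that developers only edit lines in place, whereas real diff3 also aligns inserted and deleted lines. The file contents and developer names are made up for illustration.

```python
def merge3(base, ours, theirs):
    """Line-wise three-way merge for equal-length files (in-place edits only)."""
    merged, conflicts = [], []
    for lineno, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)          # only "ours" changed, or both sides agree
        elif o == b:
            merged.append(t)          # only "theirs" changed
        else:
            conflicts.append(lineno)  # both changed the same line differently
            merged.append(o)
    return merged, conflicts


base = [
    "def area(w, h):",
    "    return w * h",
    "a = area(2, 3)",
    "b = 0",
]
# Alice adds a parameter and updates the only call site she knows about.
alice = [
    "def area(w, h, scale):",
    "    return w * h * scale",
    "a = area(2, 3, 1)",
    "b = 0",
]
# Bob, working from the same base, adds a new call on a different line.
bob = base[:3] + ["b = area(4, 5)"]

merged, conflicts = merge3(base, alice, bob)

# The merge reports no conflicts, yet the merged program is broken.
try:
    exec("\n".join(merged), {})
    outcome = "ran cleanly"
except TypeError as err:
    outcome = type(err).__name__

print(conflicts, outcome)
```

Because the two developers touched different lines, the textual merge succeeds silently, even though Bob's new call no longer matches Alice's changed signature.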

conflicts can be difficult to resolve when they occur

This problem is just a fact of life when multiple developers can edit files at will within the same module. CVS proponents claim that true conflicts are rare, but they definitely do occur, and when they occur, the cost of both detecting and correcting them is quite high.

check-in/-out not atomic with respect to each other

If user A is checking in a module at the same time that user B is checking out that module, user B may get some older file versions and some of the newer ones that A is checking in. Moreover, CVS provides no indication to B that he has checked out an inconsistent version of the module, so B has no way of knowing to repeat the checkout operation. These points are described in the on-line CVS documentation.
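The interleaving described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the repository is a plain dictionary, the file names and the `checkin` and `checkout` generators are hypothetical, and each `next()` call stands for one per-file step of an operation that takes no module-wide lock.

```python
def checkin(repo, changes):
    """Write files one at a time; pause after each file."""
    for name, content in changes.items():
        repo[name] = content
        yield name


def checkout(repo, names, workdir):
    """Read files one at a time; pause after each file."""
    for name in names:
        workdir[name] = repo[name]
        yield name


repo = {"stack.h": "v1", "stack.c": "v1"}
workdir_b = {}

user_a = checkin(repo, {"stack.h": "v2", "stack.c": "v2"})
user_b = checkout(repo, ["stack.h", "stack.c"], workdir_b)

next(user_b)  # B copies stack.h (still v1)
next(user_a)  # A commits stack.h v2
next(user_a)  # A commits stack.c v2
next(user_b)  # B copies stack.c (now v2)

print(workdir_b)
```

User B ends up with `stack.h` from before A's check-in and `stack.c` from after it, a combination that never existed as a consistent state of the module, and nothing in B's working directory signals the mismatch.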

check-in/-out can be slow

Because check-in and check-out copy files between a user's local work area and the central repository, the time required for check-in/-out is proportional to the number of files that must be copied. In typical usage, only a portion of a module must be copied. However, there are occasions when whole modules must be copied; in those cases, check-in and check-out can be time-consuming operations.

tagging a module can be slow

When a module is tagged, say for a release, each of the files in the module must be updated. This can be a slow operation.

Make

Make handles configuration management (without explicit version numbers) and building. It is often used in conjunction with RCS or CVS.

Pros

widely used

simple syntax (although somewhat cryptic)

easy to use

can be adapted to tasks other than building software

For example, Make can be used to "build" documents by invoking LaTeX and dvips.

Cons

scales poorly

Ignoring the time spent outside of Make proper running the actual build tools, the time required by Make to do a build is proportional to the size of the software being built, not to the amount of incremental work that must be done.

dependencies must be specified explicitly

Dependencies of objects on the sources from which they were built must be listed in the Makefile. This is a completely manual process. Tools like makedepend(1) help, but makedepend suffers from several problems: it is slow, it works only for C/C++ sources, it does not detect all dependencies (e.g., dependencies on tools or other non-C/C++ files read during the build), and it must still be run by hand. Because it is slow, developers tend not to run makedepend as often as they should. As a result, the potential for producing an inconsistent build is increased.

many dependencies are inexpressible or simply too costly to express

An example of an inexpressible dependency is a dependency on the value of an environment variable read during the build. An example of a dependency that is too costly to express is a dependency on the Makefile itself; this dependency is usually omitted because it would cause all derived files to appear stale any time even an inconsequential edit was made to the Makefile. Because the dependency is omitted, developers must often delete a subset of their derived files to force a recompilation whenever they change the Makefile instructions to build those files. This process is quite error-prone.

missing/incorrect dependencies can lead to inconsistent builds

An inconsistent build results when one component of a software system was compiled against one version of a source, and another component was compiled against a different version of that source. The resulting software system may fail to operate or, even worse, it may run but have mysterious bugs. Because Make's dependency mechanism has the problems listed above, the only guaranteed way to produce a consistent build using Make is to do it from scratch.

Make's language is too limited

The downside of using a simple language is that all of Make's operations are quite low-level. The Make language does not include any facility for defining more abstract, high-level building operations. To make up for this deficit, people have written tools like smake and imake, but they are awkward to use and require an extra processing step.

little or no integration with versioning tools

Some variants of Make are integrated with RCS to the extent that Make can be made to check out the latest version of a source file if it does not exist in the source directory. But version numbers are still completely absent from most Makefiles. Hence, configuration management must be done by hand: to build some arbitrary configuration, the user must manually check out all the right versions of the sources before invoking make on them.

dependency analysis based on timestamps is problematic

Make's test that a derived file has become stale is based on the last-modified times of the derived file and all of the sources contributing to it. This is problematic, especially when building from older source versions. After checking out older source versions, the timestamps of the sources will most likely precede that of the derived file, so Make will think the derived file is up-to-date! The developer's only recourse in that situation is to manually delete the derived files and perform a scratch build.
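The timestamp rule and its failure mode can be sketched in a few lines. The function name `make_thinks_stale` and the timeline values are hypothetical; the comparison itself is the last-modified-time test described above.

```python
def make_thinks_stale(derived_mtime, source_mtimes):
    """Make's rule: rebuild only if some source is newer than the derived file."""
    return any(m > derived_mtime for m in source_mtimes)


# Hypothetical timeline, in seconds:
# version 1 of the source was written at t=100, version 2 at t=200,
# and the derived file was built from version 2 at t=300.
V1_MTIME, V2_MTIME, BUILD_TIME = 100, 200, 300

# An ordinary edit after the build is detected correctly.
edited = make_thinks_stale(BUILD_TIME, [350])

# Checking out version 1 with its original timestamp is not detected:
# the source content differs from what was compiled, but it looks "older".
rolled_back = make_thinks_stale(BUILD_TIME, [V1_MTIME])

print(edited, rolled_back)
```

In the second case the derived file was built from version 2 but the directory now holds version 1, and the timestamp test still reports the derived file as up to date.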

Vesta

Vesta handles versioning, source control, configuration management (with explicit version numbers), and building.

Pros

repeatability

Any piece of software that has been built before can be built identically again.

automatic dependency detection - no more makedepend!

Vesta detects all dependencies automatically, including dependencies on build tools and the build instructions themselves. It also detects dependencies on the non-existence of files, an important property.
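To see why a dependency on the non-existence of a file matters, consider a header looked up along a search path. The sketch below is not Vesta's implementation; `find_header`, `still_valid`, and the directory names are hypothetical. It only shows that a faithful dependency record must include the failed lookups as well as the successful one.

```python
def find_header(name, search_path, tree):
    """Look up `name` along `search_path`; return (path found, list of probes)."""
    probes = []
    for directory in search_path:
        path = f"{directory}/{name}"
        present = path in tree
        probes.append((path, present))   # record misses as well as hits
        if present:
            return path, probes
    return None, probes


def still_valid(probes, tree):
    """A recorded result is reusable only if every probe would answer the same."""
    return all((path in tree) == present for path, present in probes)


tree = {"/usr/include/config.h": "#define DEBUG 0"}
found, probes = find_header("config.h", ["/proj/include", "/usr/include"], tree)

valid_before = still_valid(probes, tree)

# Someone adds a header of the same name earlier on the search path.
tree["/proj/include/config.h"] = "#define DEBUG 1"
valid_after = still_valid(probes, tree)

print(found, valid_before, valid_after)
```

A system that recorded only the file actually read (`/usr/include/config.h`) would see it unchanged and wrongly reuse the old result; recording that `/proj/include/config.h` did not exist is what exposes the change.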

guaranteed consistency

In principle, every Vesta build is done from scratch. In practice, Vesta's caching technology is used to make all builds incremental. But because Vesta detects all dependencies automatically, the software artifacts produced by Vesta are guaranteed to be consistent.
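One way to picture "from scratch in principle, incremental in practice" is a cache keyed on everything a build step depends on. The sketch below is an illustration under stated assumptions, not Vesta's caching algorithm: `BuildCache`, `fingerprint`, and `compile_stub` are hypothetical names, the inputs are passed in explicitly rather than detected automatically, and a string-building function stands in for a real compiler.

```python
import hashlib


def fingerprint(*parts):
    """Hash a sequence of strings into a single cache key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


class BuildCache:
    def __init__(self):
        self.entries = {}
        self.runs = 0   # how many times a tool was actually executed

    def build(self, command, inputs, tool):
        """Return tool(inputs), re-running only if the command or any input changed."""
        key = fingerprint(command, *(f"{n}={c}" for n, c in sorted(inputs.items())))
        if key not in self.entries:
            self.runs += 1
            self.entries[key] = tool(inputs)
        return self.entries[key]


def compile_stub(inputs):
    # Stand-in for a real compiler: the output depends only on input contents.
    return "obj(" + "+".join(inputs[n] for n in sorted(inputs)) + ")"


cache = BuildCache()
sources = {"main.c": "int main(){}", "util.h": "#define N 1"}

first = cache.build("cc -O2", sources, compile_stub)
again = cache.build("cc -O2", sources, compile_stub)          # cache hit
older = cache.build("cc -O2", {**sources, "util.h": "#define N 0"}, compile_stub)
back = cache.build("cc -O2", sources, compile_stub)           # hit again

print(cache.runs)
```

Because the key is derived from content rather than timestamps, switching to an older version of `util.h` and back again re-runs the tool only for the combination that has not been built before.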

builder is integrated with the version control system

In Vesta, the build instructions name particular versions of all the sources that contribute to the build. Hence, in contrast to Make, Vesta does true configuration management. Because sources are versioned at the granularity of packages, and because most file references are within a package (in which case no version number need be specified), specifying version numbers in Vesta is not unduly burdensome.

better performance than Make

Vesta outperforms Make, especially on incremental builds. The larger the software you are building, the better Vesta looks in comparison to Make. The main reason for Vesta's advantage is that the time spent in the Vesta builder is proportional to the incremental amount of building work required, not to the size of the software being built, as in Make.

designed to scale to large software

The entire Vesta system was designed with an eye toward scalability, both in the size of the software to be built and in the number of developers it can support. The system was designed to build systems containing 20 million lines of code, but it still works quite well on even modest-sized programs.

flexible system modeling language

In Vesta, the build instructions take the form of a program written in the Vesta system modeling language, a functional programming language. Hence, it is easy to define new functions representing high-level build operations. The build description for a large, complex system can thus be made much simpler and more maintainable by splitting it into many small, parameterized, reusable modules.

good support for parallel development by multiple developers

In Vesta, each developer has control over when he sees new versions produced by other developers. This property allows each developer to work productively in isolation, without being hampered by untimely changes made by others. Vesta also makes it easy for a developer to see how his changes to a local component affect the build of the entire system, again without affecting the progress of other developers.

site-wide cache means developers benefit from each other's builds

Vesta's single site-wide cache "remembers" the build work done by every developer. Since the cache is shared by all developers at the same site, they can benefit from each other's builds. Moreover, the work done by a developer during a checkout session can be re-used during a complete release build after the checkout session is over -- so release builds are often very quick.

customized builds

The flexibility of the Vesta system modeling language allows the "bridges" to the construction tools to be highly parameterized, thereby allowing a wide variety of customized builds to be supported. Example customizations include overriding which package versions are used in a build and overriding the command-line switches used to build the entire program, a library, or even a single file.

derived files are managed automatically

Derived files produced during the course of a build are managed automatically by Vesta. The final results of the build can easily be copied out of Vesta into a standard file system.

multi-target builds

Because Vesta manages derived files automatically, building for multiple target platforms is easier. The target platform is one of the parameters to the build process.

direct access to older versions

Older source versions can be accessed directly by all standard tools through a filesystem interface; no separate checkout step is required to access older source versions as in RCS and CVS.

repository attributes

The repository allows arbitrary name-value pairs to be associated with package versions and other directories. By default, the repository tools tag directories with attributes such as the time and date of check-out/-in, the person performing the operation, the previous version on which the changes were based, and change log messages. Attributes could also be used to tag package versions with, for example, quality assurance labels. Attributes can be read and set from the command-line using the vattrib(1) program.

fast check-in/-out

Check-in and check-out in Vesta are nearly instantaneous, regardless of the number of files in the package. The Vesta repository is able to achieve this speed because it also manages the user's mutable copy of the package, using copy-on-write techniques; the repository physically copies a file only when it is modified.
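The copy-on-write idea can be sketched as an overlay on top of an immutable snapshot. This is a simplified model, not the Vesta repository's data structures: the `WorkingCopy` class and the file names are hypothetical, and files are represented as strings in a dictionary.

```python
class WorkingCopy:
    """Mutable view of an immutable package version, copying only on write."""

    def __init__(self, snapshot):
        self._base = snapshot   # shared with the repository, never mutated here
        self._changed = {}      # holds only the files the user has written

    def read(self, name):
        if name in self._changed:
            return self._changed[name]
        return self._base[name]

    def write(self, name, data):
        self._changed[name] = data   # the first write is the only "copy"

    def copied_files(self):
        return sorted(self._changed)


# A package version with many files.
version_1 = {f"src/file{i}.c": f"// file {i}" for i in range(1000)}

# "Check-out" does no per-file work, however large the package is.
wc = WorkingCopy(version_1)
wc.write("src/file7.c", "// edited")

print(wc.copied_files())
```

Creating the working copy costs the same whether the package holds ten files or a thousand; storage is consumed only for `src/file7.c`, and the checked-in version remains untouched.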

sophisticated support for code sharing among multiple sites

Vesta includes several features that enable groups at geographically distributed sites to do shared development. There is a flexible tool for replicating sources between repositories. When a user at site A wishes to check out a package whose master copy is at site B, the checkout tool automatically contacts the remote repository to request permission for the checkout. When the user checks in a new version, it is automatically replicated back to the master site by the checkin tool. There is also another tool for transferring mastership of individual packages or entire hierarchies to another repository.

Cons

not widely used

Vesta has spent the majority of its life as an internal research project in Digital/Compaq. Until 2001, it was only used by its developers and one significant Compaq-internal customer (the Alpha microprocessor group).

Though Vesta is now available as free software, it gets the most use at Intel (which acquired the Alpha microprocessor group in 2001). It is not as widely used as some other alternatives.

no guaranteed support

Vesta is the result of a years-long research project. It is not sold by a commercial vendor, nor is it bundled with commercial system software, so its continued support cannot be guaranteed. On the other hand, with the release of Vesta as free software, a self-supporting community of users and co-developers has the chance to grow, and even in the worst case users have access to the Vesta source code and can maintain it themselves.

user guide needed

A user guide is needed to instruct users on the Vesta methodology (see next point). The set of documentation available continues to grow, but remains a little limited on the introductory end of the spectrum. The documentation available includes:

Installation and setup instructions
A tutorial that leads the user through creating a package and building a simple program
Complete man pages for all of the Vesta tools, as well as a summary of how to use common Vesta commands
Both a programmer's guide to the system modeling language and a more precise specification of its syntax and semantics
A detailed research report describing the entire system

This is perhaps less of a con than it was when this comparison was first written, but Vesta could still benefit from a more user-oriented overview of the entire system and its methodology.

training is required to learn the Vesta methodology

The Vesta methodology of building against immutable sources using complete build descriptions is rather different from what people are used to, so some training will be required for people to learn the Vesta tools and the rudiments of Vesta's system modeling language. As for the language, we expect that only a small number of people at a site will have to learn the language in any detail; most developers' system description files will take the form of highly stylized templates, and with a little investment can be manipulated solely through scripts or custom GUIs. Hence, most Vesta users should require very little understanding of the language.

limited platform support; porting is not straightforward

As of this writing, Vesta runs on the following platforms:

- Linux with at least a 2.4 kernel on the following CPU architectures:
- Compaq Tru64 UNIX on Alpha, versions 4.0D through 4.0G. (It has not been tested on 5.0 or 5.1.)

We attempted to make Vesta as portable as possible, so we believe it will not be difficult to port Vesta to other variants of UNIX. However, porting is complicated by the fact that Vesta is built using itself. (No make instructions for building it exist.) For the ports to Alpha and IA-32 Linux, we used a bootstrapping approach with a Tru64 system "hosting" the port. (We plan on documenting this porting methodology in the near future.)

requires a small amount of administration to do weeding

Although Vesta manages derived files automatically, a program called the weeder must be run occasionally to delete unwanted derived files when the disk gets full. The frequency with which the weeder needs to be run depends on how quickly the backing disk fills up. Here are two data points:

Using a relatively small (4GB) disk and with three people doing active development, we found it necessary to run the weeder once every two weeks or so. On that size disk, weeding takes 10-15 minutes.
Another site used a much larger disk (100GB) with 100-150 active developers. They found it necessary to run the weeder about once a week, and the weeder would run for more than an hour.

Weeding can be automated using cron(8), and Vesta builds can still be performed while weeding is in progress; thus the weeder really imposes little administrative burden, and it has no noticeable impact on the average user.

few high-level tools written for "querying" the source repository

We have written relatively few high-level tools for performing queries on the source repository, such as, for example, asking which files have been checked in over the last 24 hours. We expect that such tools would be relatively easy to write. Such tools might also exploit the user-defined attribute values described above.

locking is at the granularity of packages

As opposed to RCS, locking in Vesta is at the granularity of collections of files called packages. This approach has its advantages, but is unfamiliar to most people. When two developers need to change files in the same package concurrently, at least one must create a branch in the version number sequence. This can become burdensome if it needs to be done often, though it can usually be avoided by a careful choice (and readjustment if necessary) of package boundaries.

no way to force developers to build against the latest version

One consequence of Vesta's support for multiple developers is that there is no way to actually force all developers to build against the latest version of some package. In the Vesta group, we simply announce the availability of new package versions on a central bboard, and each project member then invokes the vupdate(1) tool to pick up the latest version when he is ready for it. Stronger means of encouraging developers to run vupdate could easily be added, but we believe it would negate one of Vesta's greatest strengths if developers could be forced to take new versions before they were ready to accept them.

poor support for "incremental" tools

A tool is incremental if it reads and overwrites a derived file that it produced during a previous build. For example, the Unix ar tool can be invoked with switches that cause it to incrementally rewrite a library archive. Incremental tools run counter to the entire Vesta approach: Vesta can guarantee that all builds are repeatable only because the instructions for building an artifact say how that artifact should be built from scratch. But incremental tools require that the inputs to a build step are based on the results of a previous build. Hence, Vesta does not support incremental tools very well.

This is not to say that one cannot use tools in an incremental mode under Vesta, just that it is inconvenient and sometimes inefficient to do so.

Last modified: Thu Jan 29 13:46:07 EST 2004