Controlling Cache Behavior

Introduction

The way bridges and other models are written affects the performance of evaluating them as well as the overall performance of the cache server. The bridge author should be particularly concerned with cache behavior, because the behavior of a bridge affects the performance of every evaluation invoking it.

There are two factors of cache performance to consider:

The total number of cache lookups. Each cache lookup consumes network bandwidth and incurs the latency of a round trip to the cache, as well as putting a computation load on the central cache server. Reducing the number of cache lookups will reduce load on the cache server and the network, but may increase the computational load on the machine running the evaluator.
Efficiency of individual lookups. Computing dependencies and searching for matching cache entries is a computation load. These operations are always guaranteed to be computed correctly, but the model writer can use a few techniques to shift the computation load from the central cache server to the the evaluator (thus reducing the load on shared resources, and improving performance for concurrently running builds across the entire site).

Techniques for reducing total cache lookups and making individual lookups more efficient are detailed here along with rules of thumb on when to apply them.

Background

The best source for this information is the Vesta research report. Specifically, chapter 6 describes the cache itself, and most of chapter 7 (sections 7.1 through 7.5) describe how the evaluator determines dependencies and interacts with the cache. However, a brief summary is given here. (If you find this to be too much, the evaluator man page includes an even more terse description.)

Cache entries correspond to individual function evaluations. These may be user-defined functions such as those provided by bridges (./Cxx/program), entire model evaluations such as those which represent a complete build (/vesta/vestasys.org/vesta/release/12/linux_i386.main.ves), or primitive functions provided by the evaluator (such as _run_tool). Cache entries are uniquely identified by their dependencies. Those dependencies which can be determined statically are called primary dependencies. For example, when compiling a source file, we know that the result will always depend upon the contents of that source file. Any dependencies which can only be determined dynamically are called secondary dependencies. When compiling a C/C++ source file, exactly which header files the compilation depends upon can only be determined by observing which ones are actually read by the compiler.

All primary dependencies are combined into a single number known as the primary key or just pk. The set of secondary dependencies is often referred to as the secondary key. In order to determine whether there is a cache entry corresponding to a particular function call, the cache server locates all entries with the same primary key, and then searches over them for a match with the secondary key of the function call to be evaluated. (You can think of this as a lookup in a hash table with buckets where all entries with the same primary key fall into the same bucket.) If a match is found, then the result value stored in the cache is used. Otherwise, the evaluator proceeds to evaluate the function. The cache server organizes cache entries such that locating the group of entries with a particular primary key is very fast. Once these are found, searching over them for a secondary dependency match essentially takes O(n) time.

Principle #1: Put more in the PK

As described above, the initial lookup based on a function evaluation's primary key is fast, and searching for a secondary dependency match is comparatively slow. This leads us to the primary principle of efficient cache lookups:

The fewer entries per primary key the better.

In the absence of hints from the model writer, the primary key for user-defined functions is based on the body of the function and the values of all simple parameters (booleans, integers, and text strings). Composite parameters (bindings, lists, and function closures) become part of the secondary key by default. The model writer can cause composite parameters to be included in the primary key by placing /**pk**/ immediately before the parameter in the function definition. Since the model writer can control what becomes part of the primary key and what is used to compute the secondary dependencies, this implies an obvious rule of thumb for the model writer:

The more function inputs included in the primary key the better.

However, care must be taken to avoid false cache misses. For example, if one were to include all header files in the primary key of a C/C++ compilation, you would have cache entries which are too coarse grained. A cache miss would result on a change to any header file, even if it was not included by the source file being compiled. Thus, a more complete rule of thumb would be:

Mark all parameters on which a function's result must depend with the /**pk**/ pragma. Don't use /**pk**/ to mark parameters which may be only partially used by a function.

One example of this principle is the function ./Cxx/program, which is used to compile and link a C++ program:

program(
  name: text,                   // name to give to resulting program
  /**pk**/ code: NamedFiles,    // source code files to compile
  /**pk**/ headers: NamedFiles, // local header files needed by "code"
  libs: LibPkgResults,          // libraries to build and link with "code"
  ovs: EnvOvs = [ ],            // overrides for compiling & linking "code"
  env_ovs: EnvOvs = [ ],        // overrides when building "libs" & "code"
  lib_ovs: NamedLibOvs = [ ]    // named library overrides
): NamedFile
{
  // ...
};

Every file in the binding code will be compiled, so we know that the result value will depend upon all of them. Marking it with /**pk**/ allows the evaluator to take advantage of this fact by including them in the computation of the primary key. Whenever any of these files change, the primary key changes, which makes it unlikely that the second stage search for a cache hit will need to search very many cache entries. By contrast, the libs parameter is left as part of the secondary key, as the source files may only include some of the header files they contain.

The astute observer might take issue with the /**pk**/ in front of headers. Strictly speaking, it would be possible for this to cause false cache misses if one of these headers was not included by any of the source files in code. However, since headers represents local (non-library) header files, most people would agree that this is a degenerate case. (Why not simply remove unused header files?) In normal use, you would only provide headers actually used by the code being compiled. The bridge author has made a tradeoff here by improving cache performance for the common case but allowing the possibility of false cache misses in an unlikely (and arguably unimportant) case.

Principle #2: Don't cache simple functions

By default, all user-defined functions are cached. However, the model writer can simply turn off caching for individual functions by immediately preceding their definition with /**nocache**/. Doing this cannot affect the correctness of an evaluation, but it does trade off cache lookups against local evaluation. It's also worth noting that the primitive _run_tool function is always cached, so this can only affect the steps leading up to and following individual tool invocations.

Often, small pieces of computation which do not include any tool invocations or other function calls are abstracted into functions. Usually, the total time it takes to evaluate such functions locally is less than the time to perform a cache lookup. Therefore, we have the following rule of thumb:

Consider using /**nocache**/ on functions which make no tool invocations or other function calls.

For example, many of the functions provided in ./generic are such pure computation functions. Consider ./generic/reverse_list:

/**nocache**/
reverse_list(l: list): list
/* Return the list "l" in reverse order. */
{
    return
      if _length(l) <= 1 then l else {
          res = < >;
          foreach v in l do
              res = < v > + res;
          value res;
      }
};

It doesn't perform any tool invocations or evaluate any functions (except _length, which is so simple doesn't really count). Making it /**nocache**/ avoids the latency of a cache lookup, which, unless the list being reversed is extremely long, would almost certainly take more time.

Sometimes, the cost is still less that a cache lookup even if the function does call other functions, including itself. Also, if one or a small number of functions which are cached are called, then the time cost of turning off caching is just the work that is done before and after those calls. On the other hand, if a function calls many functions which are cached, it probably should be cached also, because a single cache lookup on the enclosing function will be faster than many cache lookups on the ones called. This gives us another rule:

Consider using /**nocache**/ on functions which make a small number of function calls, but not on those which make a large number.

A good example of this is functions which compile single files such as ./Cxx/expert/compile:

/**nocache**/
compile(/**pk**/ src: NamedFile): DerivedFile
/* Compile a single C++ source file into an object file. "src" specifies
   the source file. The result is the compiled object file. In the
   event that an error occurs during compilation, the result is ERR. */
{
    // form command-line argument (a list of texts)
    cmd =
      <"/usr/bin/cxx">                                // executable cxx(1)
      + <"-c">                                        // standard option
      + <"-I", "-I/usr/include"> // only look in /usr/include
      + ./generic/binding_values(./Cxx/switches/compile)
      + <_n(src)>;                                    // source file
        // add library headers and local sources to "."
        . ++= [root/usr/include = ./temp/includes];       // library headers
        . ++= [root/.WD = src];                           // working directory

    // invoke the compiler
    r = _run_tool(build_platform(), cmd);
    outfile = object_name(_n(src), ./Cxx/switches/compile);
    return if r == ERR || r/code != 0 || r/signal != 0 then ERR else
      // in event of success, get output file from ".WD"
      ( [ $outfile = r/root/.WD/$outfile ] +
        // also include any objects placed in ".WD/cxx_repository"
        (if r/root/.WD!cxx_repository
           then [ cxx_repository = r/root/.WD/cxx_repository ]
           else [ ] ));
};

It really does just a small amount of setup for an inner tool invocation. Since the tool invocation is always cached and the work done before and after is relatively small, it's faster to not cache this kind of function.

Principle #3: Use models as cut off points for sub-components

The evaluation of models (.ves files) is slightly different than the evaluation of user-defined functions. The key difference is that model cache hits tend to be faster, which comes at the expense of model cache misses which are slower. This is achieved by having two cache entries per model: one which is coarse grained (by including the entire immutable directory in which the model is stored in the primary key, so the secondary dependencies are limited to the implicitly passed value of the special variable ".") and one which is not (in which imported files and models become secondary dependencies). (For the full details, read section 7.5 "Caching Model Evaluations and Dependencies" in the Vesta research report.)

This feature is infrequently used, but helpful in certain situations. Specifically, it can be used to cut off the propagation of dependencies from monolithic sub-components. For example, a tool which is compiled under Vesta and later invoked by a bridge, or a library built from source under Vesta. In both of these cases, the sub-component (the tool, the library) has many secondary dependencies (the source files imported to be compiled to construct it). However, a build which makes use of them might not otherwise have the same secondary dependencies. The propagation of such local secondary dependencies stops at model evaluations, so placing such builds inside separate models will tend to reduce the number of secondary dependencies of the build using them. This will make evaluating cache lookups of such builds faster. Thus, we have this rule:

Place builds of independent sub-components in models rather than user-defined functions.

The following models are examples of this technique:

/vesta/vestasys.org/bridges/lim/1/src/prog.ves
/vesta/vestasys.org/basics/os/7/src/lib.ves

Summary

The cache behavior of bridges and models in general has a significant impact on the latency of performing builds across an entire site sharing a cache server. Following a few simple principles can improve the overall performance of Vesta builds.

The two most important goals in optimizing the cache behavior of models are:

Reducing the total number of cache lookups (by using the /**nocache**/ pragma)
Making individual lookups more efficient (by using the /**pk**/ pragma and building independent sub-components in models)

These can be accomplished by applying a few simple rules:

Mark all parameters on which a function's result must depend with the /**pk**/ pragma. Don't use /**pk**/ to mark parameters which may be only partially used by a function.
Consider using /**nocache**/ on functions which make no tool invocations or other function calls. Also consider using /**nocache**/ on functions which make a small number of function calls, but not on those which make a large number.
Place builds of independent sub-components in models rather than user-defined functions.

Kenneth C. Schalk <ken@xorian.net> / Vesta SDL Programmer's Reference