Contents
Other References
vcheckout Dissection
A description of the implmentation of vcheckout circa October 2004.
Auto-generated Code Cross-reference
Thanks to KlausElmquist we now have a cross-reference of the code (generated with Doxygen).
Repository
Base Chains
Cache Server
/vesta/vestasys.org/vesta/cache/50
Background
Concepts
Cache Index (CI): It’s a 32-bit number which uniquely identifies every cache entry. First cache index that gets allocated when CS starts is 0. Indexes that are deleted during weeding reallocated later starting with the lowest available one.
PKFile: The on-disk representation of all cache entries for one Primary Key (PK). PK is a 128-bit number.
MultiPKFile: To avoid having 2^128 files in cache, CS groups PKfiles into MultiPKFiles. Each MultiPKFile contains PKFiles with the same first 16 bits in their PK.
Epoch: It's a 32-bit number. It’s like a version of PKFile. It is used for logging/recovery book-keeping.
GraphLog: Record written by CS and is used primary by the Weeder. It’s the dependency graph of all cache entries and derived files, plus their relationship to a specific builds. Each cache entry with its CI, child CIs and referenced derived files is recorded in GraphLog. Each build performed (even partial and failing builds) has records for the cache entries used by that build in the GraphLog. GraphLog root is a mapping from a specific build (specifically a model shortid and immutable directory fingerprint) to one or more CIs.
Free Variable:
Example: In expression x + 1; x is a free variable and dependency on it will be N/x = FP(int(x)).
Definition context:
Example: x = 2; x + 1; Here x is not a free variable, since it has a determined value, so no dependency will be recorded, x = 2 is a definition context.
Lease: Since Weeder is allowed to work in parallel with ongoing evaluation Cache Server should make sure that currently used cache entries don't get deleted. Cache Server contains a set of currently used CIs – Leased CIs. Whenever evaluator gets Hit on cache entry or just added an entry to the Cache, that CI is added to the set of leased CIs. There is a timeout for leased CIs.
Related Code
/vesta/vestasys.org/vesta/config – access to vesta.cfg.
/vesta/vestasys.org/vesta/srpc – Simple Remote Procedure Call.
/vesta/vestasys.org/vesta/log – atomic logging and recovery (writes .log and .ckp files).
/vesta/vestasys.org/vesta/fp – fingerprinting. Fingerprint is a deterministic hash function which was designed to have a high degree of randomness.
Useful links
General
sizeof_assert function was added when vesta was ported to different machines (see SizeAssert.H SizeAssert.C)
AtomSeq – Sequence of Atom objects:
AtomicFile – for request to open a file it creates temporary file with the same name extended by ;<some number>. All writes are performed to that temporary file. Once that’s done it does sync and replaces original file with the temporary one.
common
Creates leaf library libVestaCacheCommon.a
- Used by Cache Server for command line parsing.
Defines CacheArgs::StartsWith function.
- Holds cache configuration settings and initializes it.
Cache configuration setting should be initialized in particular order.
CacheConfig.H:31-48 variables to contain values read from vesta.cfg file
CacheConfig.H:54-67 variables to contain computed configuration values.
Code for initializing configuration variables uses ReadConfig.
- Local wrappers around Vesta config library.
All functions have empty throw() statements, so they should not fail, instead in a case of VestaConfig::failure they call DieWithError function which prints out error message and exits.
- Class Indices contains array of CIs and also knows how to send/receive it over the network.
Class IndicesApp – list that can be appended to.
Note the similarity to Derived.[HC]
- Defines enums used by cache clients, functions to convert those to human-readable strings.
LookupOutcome used internally to identify outcome of lookup (newHits – not written, warmHits – recently written)
DebugLevel – cdebug command line option. Defines what client of CS prints out when sending to/receiving from CS.
Data structures for current state of CS. Used by V!CacheMonitor.
MethodCnts, EntryState probably need to be defined as 64-bit.
- Debug output utility functions.
- Lock(), Unlock() for output mutex.
- Timestamp() returns current time.
Types for derived files and lists of derived files. Similar to CacheIndex, only Index is defined as ShortId.
- Skip() takes filestream with pos to beginning, reads length() of it without reading the contents and repositions.
- This class is used for copying a value out of one file and into another. Used by CS when re-writing the on-disk storage of cache entries (PKFiles).
- Holds position in stream, where it starts at and the length of it. The only function here is Copy(), which takes input stream, seeks to start position and copies length.
Usage example: ReadImmutable()
typedef for ShortId.
- Type for epoch of a PKFile. typedef for int.
PKPrefix.H PKPrefix.C (Note: don’t confuse with PrefixTbl)
- Takes PK and turns it into 16-bit prefix. Also defines type for a list of prefixes.
- Granularity() = 16 bits.
- Pathname() – takes PKPrefix and turns it into filename for MultiPKFile.
Hash() – defined because PKPrefix is used as Key to generics SharedTable.
- PKPrefix::List is used during weeding for deletion PKfiles.
- Reading/writing text strings. Text is a null terminated string.
- PKfile has source function - SDL source code for that PK.
Note: length limitation (example: Write() – run_tool call with command line bigger then 16 bits).
- Utility class for counting time going by.
- Was used for recording some statistic. Example: how long it takes to do lookup calls.
- Class for cache lookup timing statistic. Currently unused.
- Class for representing sets of integers clustered close to 0. Big array of bytes.
NextAvail() – index of least significant not allocated bit.
- Cardinality() – total number of bits which are set in that vector.
- Size() – everything higher then that guarantied to be unset.
- MSB() – returns -1 if there are no bits set.
- numWords() – total number of allocated Words.
PrintAll() – Weeder output: init CIs – current CIs that are allocated when Weeder starts.
operator<= (bv1 set = bv2 set, also bv2 may have bits set which are not set in bv1)
operator- mathematical set difference (bv1 & !bv2)
Cache contains BitVector of CIs. It is good to manipulate it (bit algebra). PKFiles also use BitVector to know which dependencies are common and which are uncommon.
- Prefix encoding representation of sets of paths with common prefixes. Used to store free variables.
- Used by CompactFV.
Storage optimization for storing and sending over the network sets of common and uncommon dependencies. Point of PrefixTbl is to compress the data.
- Contains 2 arrays: index array and arc array.
- Putting stuff into the table is more complicated then reading it out. It takes additional space to compute, but once you’re done you can throw it away.
Defined as TextIntTbl.
- Example:
B/x/y -> 0 (index in prefix table)
i
Arc
Index
0
y
1
1
x
2
2
B
<END>
B/x/z -> 3
i
Arc
Index
0
y
1
1
x
2
2
B
<END>
3
z
1
B/Q -> 4
B/R -> 5
i
Arc
Index
0
y
1
1
x
2
2
B
<END>
3
z
1
4
Q
2
5
R
2
Maximal number of entries in PrefixTbl is 2^16. If we are going to change that we will need to also change network protocol and make cache to understand both the previous version and the new one.
A compact representation of sets of free variables. Built on top of the PrefixTbl class. Used for storing FreeVars and sending them over the network.
- Some types of dependency:
- N – normal (entire value)
- ! – existence (!/b/n if n exists in b)
CompactFV = PrefixTbl + 2 arrays:
- Idx: indexes to prefix table
- Types: array for characters representing dependency types.
- In CompactFV Cache has a little understanding of dependency structure, so for a given i
- types[i] gives type of that dependency
idx[i] gives its index in the PrefixTbl
- CompactFV::List constructor - takes FV::List and for each name in that list
- extracts dependency type and writes it to the types[i].
pull name and put it into PrefixTbl, remembering index to the table in idx[i].
ToFVList() – Expanding all this to a full list. Getting full string for index i. Loops over CompactFV entries and for each calls PrefixTbl::GetString, writes it to 3rd character, reserving first two for type. Once full string is created it gets appended to free variables list.
- Representation of list of free variables. Primarily used internally by the Cache Server.
- FV::Epoch – incremented every time set of free variables gets changed.
- FV::T - subclass of Text, defines send/receive functions.
- FV::List - array of names. Basically it's a list of strings. Those strings do include type characters.
FV::ListApp - implements capability to append to the list.
Pack() - drop some names from the list based on BitVector it takes as an argument. BitVesctorIter is used to loop over set of bits, starting from i = 0 and ix which is next bit out of the BitVectorIter. ix >= i. Holes in BitVector are entries to be deleted. If hole in BitVector is reached then name[ix] is stored in index i.
- Representation of list of free variables. Primarily used by the evaluator.
Evaluator uses this class only when it wants to add entry. That's why there is no receive function defined. For FreeVariables call it uses CompactFV which has both functions.
FV2::T - sequence of strings. Subclass of AtomSeq.
ToText() - glues text strings together with delimiter.
FromStr() - finds delimiters in the string and splits it.
Send() - defined in private, must be called by FV2::List. Calls ToStr() to create a string. In a case of an arc that contains delimiter it constructs an error message and throws exception.
- FV2::List
- Send() - This method is only called to send the free variables when adding entry. Evaluator sends FV2::List and Cache receives it as FV::List.
- Classes for entries in the cache's graph log.
- It's in the common library because Cache writes it and Weeder reads it.
GraphLog is a sequence of two types of entries - Roots and Nodes.
Entry - super class, never gets instantiated. Contains kind to differentiate between Roots and Nodes.
Debug() - used by PrintGraphLog
DebugFull() - used by PrintGraphLog -v
Node - represents addition of new Cache Entry (CE). Contains CE index, PK, model (ShortId of SDL code where this function is defined), kids (array of CEs representing functions called by this function), refs (ShortIds of all files which are in the result of this function).
refs make GraphLog go big. For example if SDL function returns 1000 files in a binding then refs will be 1000 * sizeof(ShortId).
Root - Contains FP for pkgVersion in which that top model exists, model (ShortId), timestamp (when root was created), cis, done (purely for weeder to use in order to figure out what to keep).
When Weeder has an instruction to keep specific build it computes FP of the directory where that model exists, then matches it and model's sid with the ones of each GraphLog Root. Once match is found weeder knows what CIs to keep.
- done = true if build has completed.
- Class representing a cached value, which gets returned to the evaluator in a case of Cache Hit.
Uses Derived::Indices, PrefixTbl
- FP - is not manipulated by Cache, but by Evaluator. Next time Evaluator needs that FP it would not compute it again.
- bytes - serialized representation of all data structures representing binding, lists, etc.
- len - number of bytes.
Note: a cache entry includes more then this.
- Debug code that is used by test programs. Generates pseudo-random cache entries.
- Build instructions for the test programs.
TestBitVector.C - test for BitVector class. Self-contained program
TestCompactFV.C - test for ComapctFV class. Requires input file with specific info.
TestPKPrefix.C - test for PKPrefix class.
TestPrefixTbl.C - test for PrefixTbl class.
TestTimer.C - test for Timer.
TestSRPC.C - test of SRPC library functionality and Debug.[HC]
It would be good to convert them all to be self-contained programs.
client
Creates leaf library libVestaCacheClient.a
Class used for the remote procedure calls the evaluator makes: FreeVariables, Lookup, AddEntry, Checkpoint, RenewLeases
Evaluator instantiates instance of CacheC and uses it to contact Cache Server. When it instantiated we pass it DebugLevel which corresponds to the –cdebug evaluator command line option.
FreeVariables() - First thing evaluator does when it tries to perform cache lookup.
Evaluator computes PK and sends it to the CS.
- For user-defined function PK contains function body, variables
- For run-tool – command line, arguments, environment variables
Cache Server gets PK, finds PKFile, finds all dependencies in Cache entries in that PKFile and returns them in CompactFV::List format.
If there are no dependencies isEmpty is set to true.
FV::Epoch is returned along with the list of names and isEmpty boolean. It represents "version" of of secondary dependencies for this PK. Each time secondary dependency for this PK is added or removed Epoch gets updated.
- Lookup()
Evaluator
- gets names from CS returned as result of FV call.
Computes FPs of values in current context and creates a FP::List which is sent to CS along with Epoch returned by FV call and PK.
Cache Server returns LookupRes:
Hit: sets CI and VestaVal (serialized bits) arguments. For other results those variables are not changed.
FVMismatch is returned if Epoch passed to the Lookup call is older than the current. (That could happen if dependency was added by another eval thread or removed during weeding between those FV and Lookup calls). In that case Evaluator should again call FV, compute FP and make Lookup call. Potentially a problem of no continues progress.
BadLookupArgs is returned if number of FP is not equal to the len of names.
AddEntry() is called when Lookup results in Miss.
Evaluator passes
- PK
list of names in 2 arrays: types and FV2::List. Lengths of those arrays should be equal. Only dependencies that used to get the function result are passed to AddEntry.
FP::List - fingerprints of dependencies values
VestaVal - result of the function. AddEntry is called once the result is alredy computed
- model - sid where function is defined
- kids – other functions that the function called. (Those kids should be leased)
- sourceFunc – text denoting where this function is defined
Cache Server
- Returns CI in a case of success.
BadArgsParams if length of FP list is not equal to names length.
NoLease – if one of the kids CIs is not leased (should never happen).
RenewLeases()
Evaluator passes all CI it currently uses. There is Evaluator thread that wakes up once in a while and calls RenewLeases.
Cache Server should not return false in normal process.
Example: Added entries for G(): CI = 5, H(): CI = 7; Entry for f() is not added yet (because there might be for example a runtool that runs long time). RenewLeases is called and CI = 5 and CI = 7 are sent to the Cache.
f() { G(); H(); ... return; }
Checkpoint() creates roots in GraphLog. Arguments:
- FP of the package where the top model exists
- nodel - sid of that top model
- cis- indices it uses
- done - boolean which indicates if evaluation ended.
- MultiSRPC *conns - keeps track of open connections, makes it possible to multiple threads to contact the server.
- FP::Tag server_instance – random sequence of bits that is created when server started. Leases only stored in-memory, so if server restarts we loose leases info and all evaluations that contacted previous cache instance are stopped.
Class used for the remote procedure calls the weeder makes: WeederRecovering, StartMark, SetHitFilter, GetLeases, ResumeLeaseExp, EndMark, CommitChkpt
WeederRecovering() – returns true if there is another weeder in progress. Weeder should abort if that’s the case.
StartMark() – causes Cache to start new GraphLog and stop expire leases. Returns BitVector of all CIs that exist at that point of time. Then Weeder decides what it wants to delete.
SetHitFilter() – sets another BitVector of all CIs Weeder wants to delete, so Cache Server will return Miss on those entries.
GetLeases() – returns a BitVector with leased indexes, those should not be deleted.
ResumeLeases() – telling Cache to start expire leases.
EndMark() – passes Cache BitVector of indices to be deleted and !MultiPKFiles where those CIs exist.
CommitChkpt() – writes checkpoint file. Should be done by Cache.
- Interface intended for partitioning the cache across multiple servers. Used by CacheC and WeederC, but not really fleshed out.
Intention behind ParCache is to make possible to partition the Cache across multiple servers based on PK. But Cache Server has never been proven as a bottleneck, as opposite to repository. Cache only stores index of meta info. It has a lot of complexity, but it does not use as much CPU resources. So all efforts were put optimizing repository.
- Interface used by utility/test client programs.
Note: not currently in the client library, but it probably should be.
Uses CacheState.
FlushAll() - forces new cache entries to be written out to MultiPKFiles.
Builds the test programs TestCache and TestCacheRandom
Both need improvement.
- Program for focused tests of the cache server.
- Takes an input file that specifies a sequence of operations to perform. (See the tests sub-directory for test input files.)
Note: only works against an empty cache. It would be nice to extend it to work against a non-empty cache.
TestCacheRandom.C TestCacheRandom.1.mtex
- Program for performing pseudo-random cache lookups and entry additions.
- Can produce a much higher rate of transactions that normal, but devolves into lookup-only pretty quickly.
Code compiled into TestCacheRandom but apparently unused?
Builds the utility client programs ChkptCache, FlushCache, VCacheMonitor
ChkptCache.C ChkptCache.1.mtex
- Calls the cache Checkpoint method, forcing a flush of all logs and adding a fake graph log root
FlushCache.C FlushCache.1.mtex
Calls the cache FlushAll method, forcing all new cache entries to be written out to MultiPKFiles.
VCacheMonitor.C VCacheMonitor.1.mtex
- The cache server monitor program
server
Creates leaf library libVestaCacheServer.a
CacheConfigServer.H CacheConfigServer.C
- configuration settings that are specific for Cache Server
Config_MaxRunning – maximal number of threads that may talk to CS simultaneously
Config_MaxBlocked – maximal number of threads that are waiting to talk to CS.
- Class implementing a mapping from integers to integers.
- Used in cache entry class to map from the entry's FP list index to the PKFile's free variable list index
Combine::XorFPTag: class for combining multiple fingerprints, specifically the ucommon subset of the FPs for a cache entry to form an uncommon fingerprint
CE::T: Class representing a single cache entry
CE::List: Class representing a list of cache entries
- Implementing as a linked list (in LISP-ish cons-cell style)
The list of names is not present, those are kept in PKFile level. List of FPs in Cache Entry might be in completely different order then the list of names in PKFile. IntIntTbl performs mapping between entry's FP list index to the names list index.
- uncommonNames – set of names which this CE depends on, but not all CEs for given PK.
- uncommonTag – FPs of uncommon names, combined together.
- Class representing a stable (i.e. on-disk) PKFile
- Base class for VPKFile
- PKEpoch – incremented every time the file gets rewritten.
- namesEpoch – incremented every time set of names changes.
CPK!EntryMap – table mapping from common FP to the CEs list for that given FP.
- Helper code for reading/writing the on-disk representation of a PKFile
- Class representing a volatile (i.e. in memory) PKFile
- Derived from SPKFile
- every VPKFile has a mutex that provide an exclusive access to it.
NameMap – when evaluator wants to add an entry it sends list of names and FPs to CS. Some of those names may already exist for that PK. So there is a mapping from name to the index in allNames list.
- newCommon – new entries that were added recently, but are not on disk yet and depend on all common names. Cache updates set of Common Names when it writes them on disk, it does not do it in memory.
- newUncommon – entries that do not depend on all common names.
freePK!FileEpoch – keeps track how recently this PKFile was used by lookup or some other operation. If in some point it was not touched for some time the VPKFile gets removed from memory.
- evicted – if it has been decided that the file should be removed from memory, but there is a thread that just accepted pointer to that VPKFile, but did not do any operation on it.
- Each time PKfile gets rewritten the epoch is incremented.
- CS also remembers epoch of every PKFile that was deleted for a case of its recreation. In some cases thought it may be recreated with epoch 0.
PK is computed by Eval, CI is assigned by CS. The only way to go from CI to PK is to look in GraphLog, only GraphLog contain that information. Maybe we should create an utility to go over GraphLog and find PK for specific CI.
VestaLogSeq – GraphLog is a sequence of one .cpk file and one or more .log files.
graphLogSeq.Next() - returns RecoveryReader which is .cpk or .log file.
ReadGraphLog – gets RecoveryReader, reads it and creates right types of objects by calling GraphLog::Entry::Recover. It also increments Nodes and Roots counters accordingly.
- Data structure used while re-writing on-disk PKFiles. This allows the in-memory VPKFile to continue to be updated by clients while the file is being re-written.
- Also used after the on-disk PKFile has been updated to bring VPKFile into sync with the stable PKFile
Created with SPKFile::CheckPoint and VPKFile::CheckPoint
Used by SPKFile::Update and VPKFile::Update
- Class used for logging the allocation/freeing of CIs
- Class used to allow multiple threads to synchronize the operation of flushing an in-memory data structure to a log file on disk
PKFile re-write process
- VPKFileChkPt class
SPKFile::CheckPoint and VPKFile::CheckPoint: create a VPKFileChkPt
SPKFile::Update
SPKFile::ScanCommonTable and SPKFile::ScanList: determines:
- set of names in new PKFile
- set of common names in new PKFile
- how many entries are present in the new PKFile
- whether any are deleted
FV::ListApp::Pack: used to remove names which are no longer in the PKFile
SPKFile::UpdateCommonTable and SPKFile::UpdateList:
- update entries according to changes in set of names, common/uncommon name status
- places entires into CPF groups (as they may have changed)
- drops deleted entries
SPKFile::ExtractEntries: gathers all entries from all (old) CFP groups (used by SPKFile::UpdateCommonTable)
SPKFile::AddEntries: places new entries from VPKFileChkPt into CFP groups
VPKFile::Update: after updating the on-disk PKFile, updates the in-memory PKFile
VPKFile::DeleteCheckpointedEntries: removes entries from newCommon and newUncommon that were in the checkpoint used when writing the PKFile
VPKFile::RecordCIsFromTbl, VPKFile::RecordCIsFromList: Gather CIs of entries to be kept in memory
VPKFile::DeleteOldEntries: removes all entries from oldEntries. (This needs to be done because they need to be re-sorted into their new CFP groups.)
VPKFile::CopyByCIs: copies cache entries from the SPKFile's oldEntries into the VPKFile's oldEntries based on the set of CIs of cache entries to be kept in memory previously gathered
VPKFile::MoveCommonToUncommon, VPKFile::MoveCommonListToUncommon: moves any entries left in newCommon (just those added since the checkpoint was taken) into newUncommon. (This is neccessary, as we're about to change the common names based on exCommonNames and exUncommonNames.)
VPKFile::UpdateUncommonList: update all the entries in newUncommon based on packMask and reMap, and re-distribute them into newCommon and newUncommon based on the new set of common names.
VPKFile::CycleDeletedNamesInList: re-add any FV names used by new entries which were deleted in the stable PKFile
Cache Logs
For recording new entries:
CI log: logs allocation of cache indecies
.ckp file contains the full usedCIs BitVector
.log file contains a sequence of Intvl::T objects (see Intvl.H)
Graph log: logs parent/child relationship between newly added cache entries
.ckp and .log files have the same format: a sequence of GraphLog::Node objects
Cache log: logs newly added cache entries not yet written to on-disk PKFiles
.ckp and .log files have the same format: a sequence of CacheLog::Entry objects
The above three logs must always be written in exactly the order shown above! Because the data is stored in separate logs, we must write them out in this order to avoid inconsistencies if the server restarts. An inconsistency would be:
- A client getting a hit on a cache entry with no graph log entry (the weeder could delete it or its referenced files)
- A client getting a hit on a cache entry with an unallocated CI (the same CI could wind up used by another cache entry, and also cause problems during weeding)
- A graph log entry with an unallocated CI (also causing weeding problems)
Because of the order there are a couple possibilities of a newly added cache entry not being completely written to disk:
- If a server failure happens immediately, nothing might make it to disk at all (not even the CI allocation)
- The CI allocation might make be logged, but the graph log entry might not
- The CI allocation and graph log entry might be logged, but not the cache entry
- The CI allocation, graph log entry, and cache log entry might make it to disk, but the PKFile may not yet be re-written.
How is log order flushing enforced?
CacheS::FlushCacheLog calls CacheS::FlushGraphLog
CacheS::FlushGraphLog calls CacheS::FlushUsedCIs
RunToolServer
/vesta/vestasys.org/vesta/run_tool/37
Client Library
Creates leaf library libVestaRunTool.a
RunToolClient.H RunToolClient.C RunToolClientPrivate.H
Defines client side interface used by the evaluator and RunToolProbe
RunTool::do_it is the function for invoking a tool
RunTool::get_info is the function for finding out about a RunToolServer to match against a platform definition and determine how busy/idle it is.
The class Tool_Relay is used internally for relaying standard ouput/error back to the caller. It uses a separate TCP connection and a separate thread for each relayed stream.
Server
Startup code (main function)
LimService bits to receive network calls
RunToolDaemon.H RunToolDaemon.C
- Main server implementation
children class records active and pending tools
- handles waiting for an available slot
- handles waiting for tool completion
req_data class represents a single tool call
- records arguments
- has additional member variables for state
Constants used for interacting with the tool_launcher program
Cross-platform code for getting the load average. Lifted from xload and adapted for use in the RunToolServer
tool_launcher
- Small program to perform the chroot system call and exec the tool.
- chroot requires root priviledges, so the tool_launcher binary must be setuid root
- Written as a separate program to minimize the amount of code which executes setuid root
RunToolProbe
Small program for calling RunTool::get_info and printing the result to standard output.