Introduction
There are two kinds of support for symlinks that Vesta users would like to have:
- Support for tools that want to create symlinks or that want symlinks in their initial filesystem
- Support for storing symlinks under version control along with files and directories
Neither of these is strictly necessary, but both can be helpful.
Some complicated tools, especially ones developed in a non-Vesta context and then imported into Vesta, may expect to use symlinks. This dependence can sometimes be removed or worked around, and people have found some creative ways to do that. It is, however, a burden on the person trying to make the tool work in Vesta.
Putting symlinks under version control can make it easier to store large file sets from other places. There's no need for them if you're working purely in Vesta, but that's not always the case when interfacing with other development groups.
There are two open tracker entries, one for each of these issues:
We could close one of them, because at this point we intend to implement end-to-end support for symlinks all at once.
Functionality
This section describes what functionality we plan to add.
Repository
Creating symlinks will be supported in mutable directories (i.e. checkout working copies under /vesta-work) and volatile directories (the temporary directories where tools run). (Currently, attempting to create a symlink in either place will result in an error.)
An immutable snapshot taken of a working directory with a symlink will copy the symlink into the immutable directory. The symlink itself will obviously be immutable, though no restriction will be placed on its target. Users could shoot themselves in the foot a little with this if they don't pay attention, but only through the NFS interface to the repository.
The evaluator will be allowed to provide symlinks as part of the initial filesystem for a tool.
Evaluator
In SDL, a symlink will be represented as a singleton list containing a text value of the link target. This avoids the need to introduce any new syntax or types or other significant changes.
If a tool creates a symlink, it will appear in the result binding as a singleton list. For example, suppose the variable R contains the result of a _run_tool call which created the path /foo as a symlink to bar. That would give us:
R = [ root = [ foo = < "bar" > ] ];
The same representation will be supported in ./root when calling _run_tool. For example, to create the same symlink before running a tool:
. ++= [ root/foo = <"bar"> ];
The same representation will be used if a symlink is brought in from an immutable directory in the files clause. For example, if the directory containing a model has a symlink named "foo" that points to "bar" then this files clause:
files foo;
would be equivalent to this assignment:
foo = <"bar">;
When shipping the result of an evaluation, a singleton list will create a symlink in the target directory.
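The shipping rule can be sketched in a few lines of Python. This is only an illustrative model, not Vesta code: a nested dict stands in for an SDL binding, a str for a text value (a file's contents), and a one-element list holding a str for a symlink. The names `is_symlink_value` and `ship` are hypothetical.

```python
import os

def is_symlink_value(value):
    """True if value is a singleton list holding a text (the proposed symlink form)."""
    return isinstance(value, list) and len(value) == 1 and isinstance(value[0], str)

def ship(binding, target_dir):
    """Write a nested dict (standing in for an SDL binding) into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    for name, value in binding.items():
        path = os.path.join(target_dir, name)
        if isinstance(value, dict):
            ship(value, path)                # sub-binding becomes a directory
        elif is_symlink_value(value):
            os.symlink(value[0], path)       # singleton list becomes a symlink
        elif isinstance(value, str):
            with open(path, "w") as f:       # text becomes a regular file
                f.write(value)
        else:
            raise TypeError(f"cannot ship {name!r}: unsupported value {value!r}")
```

In this model, shipping `{"bar": "hello", "foo": ["bar"]}` produces a file named bar and a symlink named foo whose target is the literal text "bar". The target is written as-is and is not checked for existence.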
Some new primitive functions will be added to make dealing with symlinks a little easier.
Replication agreement
Introducing a new type of object that can exist in the appendable portion of the repository means that we need to specify when two replicas of an object agree or don't agree.
A symlink inside an immutable directory will agree with a replica as long as both have the same link target text. The link target is the only information stored for a symlink, so there's really nothing else to compare.
Appendable Directories
We won't allow the new symlink type in appendable directories. Stubs with symlink-to attributes can be used there instead.
This is mainly to avoid complicating the notion of replication agreement or having to modify client programs to deal with symlinks in appendable directories.
Concerns
If symlinks point outside an immutable version, it's not really immutable!
It might appear that way if you just access the files through the NFS interface. However, a symlink in an immutable directory cannot have its link target changed, so the symlink itself is still immutable.
If we support versioning symlinks at all, there's a risk of users making trouble for themselves (and other users) with symlinks. We don't think it fundamentally breaks Vesta or violates its guarantees.
The interpretation of symlinks can change based on context
The same immutable version can appear in multiple places (e.g. in a vadvanced snapshot and a checked-in version). That means a relative symlink that uses ".." could point to two different places from the same immutable version. Symlinks with ".." are something users should probably avoid.
The same goes for symlinks represented as singleton lists in a binding structure in SDL. If you move a binding around, relative symlinks could point somewhere else. Absolute symlinks are probably just as dangerous in SDL as relative ones with "..".
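The context dependence is easy to see with a purely lexical model of link resolution. The Python sketch below is illustrative only: the helper `resolve_link` and both paths are hypothetical, and it ignores any symlinks in the intermediate directories.

```python
import posixpath

def resolve_link(link_path, target):
    """Return the path that a symlink located at link_path with the given
    target text refers to, using lexical resolution only."""
    link_dir = posixpath.dirname(link_path)
    # posixpath.join discards link_dir when target is absolute.
    return posixpath.normpath(posixpath.join(link_dir, target))

# The same link (same name, same target text) seen at two hypothetical locations:
checked_in = resolve_link("/vesta/example.com/pkg/5/link", "../shared")
snapshot = resolve_link("/vesta/example.com/pkg/checkout/5/12/link", "../shared")
# checked_in is "/vesta/example.com/pkg/shared"
# snapshot   is "/vesta/example.com/pkg/checkout/5/shared"
```

A target without ".." that stays inside the version (for example "sub/file") resolves to the same object relative to the version in both locations, which is why such links are the safe case.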
Symlinks will be hard to deal with in SDL
Suppose one piece of SDL code puts a directory and a symlink to it into ./root:
. ++= [ root = [ foo = [ ... ], bar = <"foo"> ] ];
Suppose some other piece of SDL code (perhaps far away in another function defined in a different SDL model) tries to put something into the directory "/bar" that's really a symlink:
. ++= [ root/bar/x = ... ];
Given the semantics of the ++ operator, that will replace what used to be a symlink to a directory with a completely separate directory.
The semantics of the ++ operator can't be changed to handle symlinks. Doing so would change the meaning of existing SDL code. If users need to merge two bindings and want symlinks to be followed in a manner consistent with UNIX filesystem semantics, they'll need to use something other than ++.
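The effect can be modelled in Python. This is a rough sketch of a recursive overlay on nested dicts, assuming that ++ merges two values only when both are bindings and otherwise lets the right-hand value win; it is not the evaluator's implementation, and the name `overlay` is hypothetical. A one-element list holding a str stands in for a symlink.

```python
def overlay(base, update):
    """Recursive overlay of nested dicts, roughly modelling ++ on bindings."""
    result = dict(base)
    for name, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(name), dict):
            # Both sides are bindings: merge them recursively.
            result[name] = overlay(result[name], value)
        else:
            # Otherwise the right-hand value simply replaces the old one.
            # A symlink is a list, not a binding, so it is replaced here.
            result[name] = value
    return result

root = {"foo": {"a": "contents of a"}, "bar": ["foo"]}   # bar is a symlink to foo
merged = overlay(root, {"bar": {"x": "contents of x"}})
```

After the overlay, `merged["bar"]` is a separate directory containing only x, and `merged["foo"]` is unchanged. Under UNIX filesystem semantics, x would have been created inside foo.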
The binding lookup operator (/) could also be a problem when a binding contains singleton lists representing symlinks.
We'll add new primitive functions to help with these problems. We will not change existing syntax or semantics to help with symlinks.
Code Changes
Repository
VestaSource::unused -> VestaSource::symlink
The type of a directory entry in the repository is represented by a value from the enum VestaSource::typeTag. There's only one unused value, and that will become the value used to represent a symlink.
This will require some care to make sure that the packed in-memory representation doesn't get confused with the special byte 0xff used to mark the end of a block of directory entries. (See the isEndMark member function in the VDirChangeable class.) We need to be careful to avoid setting all of the master, hasEFPTag, and sameAsBase flags. (The inUse flag is used internally during the repository's directory structure garbage collection which happens during weeding.) This can be implemented with a check in VDirChangeable::appendEntry. hasEFPTag shouldn't be needed, and since symlinks can only be deleted and replaced (not modified), sameAsBase should be avoidable too.
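The kind of check meant here can be sketched as follows. The bit layout below is an assumption made purely for illustration (a 4-bit type tag in the low bits and one bit per flag in the high bits); the actual packed layout is defined in the repository sources and may differ. The names `pack_first_byte`, `SYMLINK_TYPE`, and the flag constants are hypothetical.

```python
END_MARK = 0xFF        # byte that marks the end of a block of directory entries

# Assumed layout, for illustration only.
SYMLINK_TYPE = 0x0F    # assumed: the last free type value has all type bits set
MASTER = 0x10
IN_USE = 0x20          # set internally by garbage collection during weeding
HAS_EFP_TAG = 0x40
SAME_AS_BASE = 0x80

CALLER_FLAGS = MASTER | HAS_EFP_TAG | SAME_AS_BASE

def pack_first_byte(type_tag, flags):
    """Combine type tag and flags, refusing combinations that could later
    be mistaken for the end mark once IN_USE is also set."""
    if not 0 <= type_tag <= 0x0F:
        raise ValueError("type tag out of range")
    if flags & ~0xF0:
        raise ValueError("flags outside the flag bits")
    if type_tag == SYMLINK_TYPE and (flags & CALLER_FLAGS) == CALLER_FLAGS:
        raise ValueError("symlink entry would collide with the end mark")
    return type_tag | flags
```

Under this assumed layout, a symlink entry with only the master flag packs to 0x1F, and setting the in-use flag afterwards gives 0x3F, which is still distinct from the end mark.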
Link target needs a new VMemPool type
The link target is just a string. It should be stored in the repository's memory. There's not really any place to put it in the packed representation of a directory, though there is room for one additional 32-bit number (in the "value" slot used to hold the shortid for files and a VMemPool short pointer for directories).
The value field of a symlink directory entry could be a short pointer to another VMemPool block containing the link target. However, because of the way the repository's VMemPool system works, we can't simply allocate a block and stick a string in it. (Each block must have a type which is used for the repository's mark/sweep garbage collection of its memory pool during weeding. See src/VMemPool.H in the repository package.) We would have to add a new VMemPool block type to hold the link target. Since the block type would be very simple this wouldn't be too hard, but it would require registering new callbacks for the block type with VMemPool.
We don't have to store symlinks in VMemPool, but it would be easiest to do so. Storing them in some other data structure would require that we also checkpoint that data structure.
Re-use VLeaf, or add a new VestaSource sub-class?
The VLeaf class (a sub-class of VestaSource) is currently used inside the repository server to represent files (both mutable and immutable), stubs, ghosts, and devices (which are only supported in volatile directories). We could further overload it to handle symlinks as well, or we could add a new sub-class. Symlinks would require adding two new member variables which would go unused for the other types (symlink target string and timestamp), and the current VLeaf member variables will go unused for symlinks. That suggests that a new VestaSource sub-class would be a good idea.
(This only affects the server side. All VestaSource objects in repository client programs use the VDirSurrogate class.)
New VestaSource functions
These new functions will be declared virtual in the VestaSource class.
VestaSource::readlink: get the target string of a symlink
- Implementation needed in VDirChangeable, VDirSurrogate, and VDirEvaluator
VestaSource::insertSymlink: create a symlink in a directory
- Implementation needed in VDirChangeable and VDirSurrogate
Replication
Replication code will need changes to handle symlinks in immutable directories. Specifically, ReplicateImmDirCallback in Replicate.C would need to handle the new symlink type.
NFS glue
The do_symlink code in glue.C would need to be changed to create new-style symlinks in mutable directories and volatile directories.
The any_fattr code in glue.C would need to be changed to handle new-style symlinks.
VDirEvaluator
The repository side of the network protocol for representing evaluator directories will need to be changed.
The file Evaluator_Dir_SRPC.H defines and documents the network protocol.
VestaSource member functions/variables for symlinks
What should some of the member functions of a VestaSource representing a symlink do?
- shortId should return NullShortId
- timestamp should return the timestamp of the enclosing directory. This is a bit of a fudge, but avoids the need to store a timestamp for each symlink.
- executable should return false. (This doesn't need to have any effect on the permissions of a symlink manifested through the NFS interface.)
What should the value of the fptag member variable be for a symlink?
Evaluator
_run_tool input
The ToolDirectoryServer will need to be changed to respond with a symlink type when a path is requested that has a singleton list for its value.
_run_tool result
AddToNewStuff in PrimRunTool.C needs to be modified to handle symlinks created by the tool.
Shipping
ShipValue in VASTi.C needs to be modified to allow shipping a singleton list containing a text as a symlink.
files clause
FileEC in Expr.C will need to be modified to support symlinks brought in from immutable directories.
Primitive functions
Two new SDL primitive functions:
- Like ++ but follow symlinks in a filesystem-like way
- Look up a path, following symlinks
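As a rough illustration of what the lookup primitive might do, here is a Python sketch over nested dicts, with a one-element list holding a str standing in for a symlink. The semantics shown are assumptions, not a specification: relative targets resolve from the binding containing the link, absolute targets resolve from the root binding passed in, ".." at the root stays at the root, and a link-count limit guards against cycles. The function names are hypothetical.

```python
def is_symlink(value):
    """True for the singleton-list-of-text form used to represent a symlink."""
    return isinstance(value, list) and len(value) == 1 and isinstance(value[0], str)

def split_path(path):
    """Split a slash-separated path into its non-empty components."""
    return [part for part in path.split("/") if part]

def lookup_following_symlinks(root, path, max_links=40):
    """Look up path in the nested dict root, following symlinks."""
    pending = split_path(path)
    stack = [root]            # bindings from the root down to the current one
    links_followed = 0
    while pending:
        name = pending.pop(0)
        if name == ".":
            continue
        if name == "..":
            if len(stack) > 1:
                stack.pop()
            continue
        value = stack[-1][name]          # raises KeyError if the name is missing
        if is_symlink(value):
            links_followed += 1
            if links_followed > max_links:
                raise RuntimeError("too many levels of symbolic links")
            target = value[0]
            if target.startswith("/"):
                stack = [root]           # absolute target: restart at the root
            pending = split_path(target) + pending
        elif isinstance(value, dict):
            stack.append(value)
        elif pending:
            raise KeyError(f"{name!r} is not a directory")
        else:
            return value                 # a non-binding value at the end of the path
    return stack[-1]
```

A symlink-following merge could plausibly use the same walk to find the binding to update before applying the overlay, but that is a design question this sketch does not settle.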
Other Programs
vcheckagreement
The vcheckagreement utility will need to handle symlinks in immutable directories.
Possible Problems
Both server-side and client-side code which lists directories may need changes to handle the symlink directory entry type. Some functions that may need attention:
- VestaSource::makeFilesImmutable
- VDirChangeable::freeTree
- ReposUI::changed
- ReposUI::cleanup
Possible Scope Reductions
Allowing symlinks in volatile/evaluator directories is the minimum we really need to support, as that's where the lack of symlinks is most problematic. That requires:
- VDirChangeable modifications
- NFS glue code changes
- ToolDirectoryServer and VDirEvaluator modifications
- PrimRunTool changes
There are some things we could get away with not adding, but they would be obvious holes:
- Don't allow symlinks in mutable/immutable directories
- No need to support them in the SDL files clause
- No replication support needed
- Don't support shipping symlinks
- Don't add new SDL primitive functions
These don't seem like significant scope reductions. It seems like it would be better to implement symlink support all at once.
Testing
Some cases we will need to test:
- Creating a symlink in a mutable directory
- Taking a snapshot of a mutable directory containing a symlink
- SDL files clause that brings in a symlink from an immutable directory
- Providing a symlink in the initial filesystem for a tool
  - Tool accesses the linked-to object
  - Tool accesses the symlink with lstat or readlink but never accesses the linked-to object
- Tool that creates a symlink
- Using the new primitive functions
  - With symlinks in the input binding
  - Without symlinks in the input binding
- Shipping a symlink
Discussion
If you have questions or comments about this, please use the /Discussion sub-page. (That way we can keep the discussion separate from the plans.)