Vesta supports arbitrarily large packages, but traditionally packages are sized such that typically only one person is actively changing a package at a time. So, for projects with many simultaneous developers this manifests as lots of small packages. For projects with fewer simultaneous developers (like the Vesta source itself), packages have tended to be of larger granularity.
The reason Vesta has this usage model is at least partly that Vesta has had some limitations which encourage one to use smaller packages. For example, until recently Vesta had no built-in utilities to aid users in merging--Ken's vmerge2.py script is an example of a utility that can close the merging gap. Below (in section 2) I discuss a couple more utilities that I am working on to better support big packages.
Another reason that Vesta traditionally uses smaller packages is that it better fits the build-time configuration mindset that is one of Vesta's key characteristics. One can simply select the versions of packages in an SDL model to get the configuration that he or she wants (so long as all the mixing & matching is at package granularity). The larger the package, the more likely a user will need to go "out-of-band" by manually copying files in order to mix and match versions of files.
Why big packages?
The obvious reason to use big packages is that it is more user-friendly for some people's tastes, at least in the common case of linear development. A user types a single command to get a copy of the entire development tree, edits whatever files they want to edit, and then types a single command to merge the result into the next version of the development tree.
A less obvious reason (but ultimately more important) is that any refactoring (removing/moving/changing directory structure) is *much* easier within a package than between packages in Vesta. This isn't necessarily a knock on Vesta--but for some users refactoring is such a critical function that it needs a graceful solution.
If there were anything worth addressing in a hypothetical Vesta-3, I'd say that it is the tension between wanting small packages (to allow graceful mixing & matching of source files) and wanting big packages (to better support refactoring). That's just one man's opinion, though.
Finally, another really good reason to support big packages is that it directly emphasizes one of Vesta's strengths, its copy on write virtual filesystem. Users of CVS or Perforce (among others) are used to having to wait for long periods of time to sync their client view to the top of tree because their revision control systems rely on physically copying the bits in order to give their users a mutable version of the repository. For some reason these users aren't too impressed by Vesta's navigable immutable filesystem (probably because it requires cd'ing back and forth between that and the mutable version)--but they are really impressed by its ability to fetch a mutable user 'view' of the repository instantly.
How do we support big packages?
These are the features that I think Vesta needs to better support this model. I'm hoping to get some feedback from the experts here on these plans as I go about implementing them:
- vmerge : I think Ken's vmerge2.py script is mostly what we need here. One problem is that (for very large packages) the overhead of the UNIX 'cp' command is too high (it's used in cases where we need to freshen a file or directory without merging). Based on some IRC chat with Ken, my plan here is to (a) SWIGify the insertFile & insertImmutableDirectory calls from VestaSource, and (b) change the 'cp' part of vmerge2.py to use a quicker insertFile/insertImmutableDirectory to link in an existing source file (sketched below, after this thread).
(KenSchalk) Note that there are a couple major limitations to be aware of with the current implementation of Vesta and vmerge2.py:
Merging across rename operations is not supported. (In other words, if Alice edits file foo and Bob renames file foo to bar and they merge their changes, Alice's changes will not be merged into the file bar.) Fixing this requires recording additional information when renames are performed and is a substantial change to the way versions are stored. For more on this see MergingFuture/Food4Thought/RenameTracking.
Vesta records history as a tree rather than a DAG. In other words, it records the version each new version is based on, but it doesn't record merge operations. This means that there is limited information to help a merging algorithm determine what has happened in the past, which can reduce the effectiveness of a merge algorithm in complex histories. vmerge2.py has a useful property called Convergence which helps handle previous merges, but problems can still come up, such as needing to manually re-resolve a conflict that was already resolved in an earlier merge.
(KenSchalk) For a lot more on merging, see MergingFuture
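Here's a rough sketch of that 'cp' replacement, assuming a SWIG-generated Python module (called vestasource below) wrapping VestaSource. The module name and the lookup/insertFile/insertImmutableDirectory argument lists are assumptions and would need to be checked against the actual generated wrappers:
{{{
# Hypothetical sketch: link an existing immutable file or directory into a
# destination appendable directory without copying its bytes.  The module
# name (vestasource) and the exact method signatures are assumptions.
import vestasource

def link_existing(src_dir, name, dst_dir):
    """Replace vmerge2.py's 'cp' step: make dst_dir/name refer to the same
    immutable object as src_dir/name (a metadata operation, not a data copy)."""
    obj = src_dir.lookup(name)                    # the existing immutable object
    if obj.type == vestasource.VestaSource.immutableFile:
        # Re-bind the file's existing shortid under the destination directory.
        dst_dir.insertFile(name, obj.shortId(), obj.master)
    elif obj.type == vestasource.VestaSource.immutableDirectory:
        # Bind the whole immutable subtree in a single call.
        dst_dir.insertImmutableDirectory(name, obj, obj.master)
    else:
        raise ValueError("%s is not an immutable file or directory" % name)
}}}
The point is just that the repository re-binds the existing immutable object rather than pushing the file's bytes through the NFS interface the way 'cp' does.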
- vdiff : I think we also need a diff utility that is more efficient at comparing very large directory trees. I think the diff-tree.py utility (ScottV?) under /vesta/beta.vestasys.org/vesta/extras/swig/13/examples/python will be sufficient. I'm planning on just cleaning this up a bit with (a) smarts about guessing that you want to diff against the newest [checked-in] version of the current package if not otherwise specified, and (b) the ability to run UNIX 'diff' between any files it determines are different (see the sketch after this thread).
(KenSchalk) See the RFE vdiff command
(BrannonBatson) Thanks, that has some good info. The critical feature of diff-tree.py is that (like vmerge2.py) it doesn't recurse into identical directories, which it detects by essentially comparing fingerprints.
(KenSchalk) See also DiffIssues
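To make the pruning idea concrete, here's a rough sketch of the structure such a vdiff could use. The entries()/fptag()/path()/isDirectory() methods are assumed stand-ins for whatever the SWIG bindings actually expose; only the shape of the algorithm matters: compare fingerprints first, and recurse or run UNIX 'diff' only where they differ.
{{{
# Hypothetical sketch: prune the comparison by fingerprint, recursing only
# into subtrees that actually differ.  entries(), fptag(), path() and
# isDirectory() are assumed names, not the real binding interface.
import subprocess

def diff_tree(old, new):
    """Report differences between two versions, skipping identical subtrees."""
    if old.fptag() == new.fptag():
        return                                    # identical subtree: stop here
    if old.isDirectory() and new.isDirectory():
        old_entries, new_entries = old.entries(), new.entries()
        for name in sorted(set(old_entries) | set(new_entries)):
            if name not in old_entries:
                print("added:   %s" % new_entries[name].path())
            elif name not in new_entries:
                print("removed: %s" % old_entries[name].path())
            else:
                diff_tree(old_entries[name], new_entries[name])
    else:
        # Leaf-level change (or a file/directory type change): hand the two
        # immutable objects to UNIX diff.
        print("changed: %s" % new.path())
        subprocess.call(["diff", "-u", old.path(), new.path()])
}}}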
- vsubmit : One problem is that the big package development model cannot tolerate any human-scale time lag between a vcheckout and a vcheckin. Instead, I'm thinking we'll have a vsubmit utility that:
- (a) do a vdiff against the original version of this package to determine what has changed
- (b) see if any of those files are different in the newest checked-in version (if so, exit with a message that the user needs to merge)
(KenSchalk) I would argue that the user should be forced to merge if there have been any changes since the version which their checkout was based upon. While it's tempting to think that if parallel changes are made to different files it's safe to combine them, they can have semantic conflicts in the way they're used. (Imagine one change that removes an unused declaration from foo.h and a parallel change that adds the first use of the declaration in foo.c.) If users are forced to create the combined change first, there's at least a better chance they will review and test it before checking in to the main-line version.
(BrannonBatson) I see what you're getting at, but I disagree. There's always potential for people to change different files in a way that creates a bug. Suppose the define was in global.h in a different package altogether--we'd still have a bug. Forcing this pre-checkin reconciliation at package granularity seems pretty arbitrary to me because you get different levels of protection (and reconciliation hassle) depending on your packaging granularity. In my experience, it's nice to have the low-level revision control tools enforce reconciliation (i.e., running a merge command) at the file granularity. There can always be some meta-mechanism (enforced by tools or methodology) to help catch inter-file conflicts.
(KenSchalk) Well, I do tend to be an advocate of reading the diff before every checkin, after checking that it builds and runs, and of always keeping the latest main-line version functional. Maybe that's too dogmatic for some tastes, so it could be a matter of policy. If you're willing to accept the possibility of parallel changes to different files with semantic conflicts getting checked in to the package main line, you could go further and use the merge algorithm inside vmerge2.py to check whether parallel edits to the same file merge cleanly (i.e. result in zero conflicts) and automatically check in the merged version in that case.
- (c) vcheckout <big_pkg>
- (d) repeat step (b) to cover the race case; if there was a conflict, then vcheckin -c 0 to revert the checkout and exit with the message about needing to merge
(KenSchalk) I'd suggest avoiding the race case altogether. Rather than using vcheckout's default behavior of reserving the next version in line, it might be better to use vcheckout's -n flag to explicitly request the next version after the one that the change is based upon. Suppose you have a change in a non-exclusive checkout version foo/checkout/12.jsmith_example.com.3/4. This would be based on foo/12, so when checking out you could pass "-n 13". If the checkout succeeds, no intervening changes have been made, so it's safe to promote the edits into a new main-line version. If the checkout fails because another user got foo/13 first, the user could be offered the option of merging their changes into a new non-exclusive checkout based on the latest version.
(BrannonBatson) That's an interesting idea, and worth further consideration. My initial reaction, though, is that with the entire development tree in a single big package, it would frequently be the case that a checkin fails just because some completely unrelated files were modified.
- (e) copy the changed files (from step (a)) into the new <big_pkg> working directory, then vadvance & vcheckin
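To make the intended flow concrete, here is a rough sketch of vsubmit as a Python driver around the existing command-line tools. newest_version(), changed_since(), conflicts() and copy_into() are placeholders for the vdiff-based logic (and for locating the new working directory), not real tools; whether to adopt Ken's -n variant or rely on the step (d) re-check plus vcheckin -c 0 is a policy choice:
{{{
# Hypothetical vsubmit driver; vcheckout/vadvance/vcheckin are the real
# command-line tools, everything else is a placeholder for logic that would
# be built on vdiff and the repository bindings.
import subprocess, sys

def run(*cmd):
    subprocess.check_call(list(cmd))

def newest_version(pkg):
    """Placeholder: the newest checked-in version of pkg."""
    raise NotImplementedError

def changed_since(base, work_dir):
    """Placeholder for step (a): files that differ between base and work_dir."""
    raise NotImplementedError

def conflicts(files, base, latest):
    """Placeholder for step (b): True if any of 'files' also changed
    between base and latest."""
    raise NotImplementedError

def copy_into(files, pkg):
    """Placeholder for step (e): copy (or link) the changed files into the
    working directory that vcheckout just created for pkg."""
    raise NotImplementedError

def vsubmit(pkg, base, work_dir):
    files = changed_since(base, work_dir)                  # step (a)
    if conflicts(files, base, newest_version(pkg)):        # step (b)
        sys.exit("your files have changed upstream; please merge first")
    run("vcheckout", pkg)                                  # step (c)
    # (Per the discussion above, an alternative is "vcheckout -n <base+1>",
    # which fails outright if anyone has checked in a new version since base.)
    if conflicts(files, base, newest_version(pkg)):        # step (d): re-check to
        run("vcheckin", "-c", "0")                         # cover the race, revert
        sys.exit("lost the race; please merge first")      # the checkout and exit
    copy_into(files, pkg)                                  # step (e)
    run("vadvance")
    run("vcheckin")
}}}
The step (d) re-check is what the vcheckin -c 0 fallback protects against; with the -n approach that check would collapse into the vcheckout call itself.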
TBD.