The Problem
Vesta treats files the way any normal UNIX filesystem does: every file is just a sequence of bytes to be stored. Vesta does not keep any metadata about the format of a file's contents.
Merge tools need to differentiate between different kinds of files to decide how to merge parallel changes made to them. Usually there are at least two different ways to merge files, and thus two categories of files as far as a merge tool is concerned:
- Text files
Files that are made up of multiple newline-separated lines of text and that are edited by a human with a text editor to modify, add, or remove lines. Source code in various programming languages is the most obvious example of this kind of file. (Note that some file formats, such as PostScript or XML, may appear to be text files, but if they're automatically generated by a program rather than manually edited, they probably shouldn't be merged like text files.)
- Binary files
Files that aren't made up of text, can't reasonably be divided into units based on newlines, or aren't manually edited by humans with a text editor. Compiled executables, shared objects, library archives, graphic images, and compressed archives are obvious examples.
A merge tool might have other categories corresponding to other merge algorithms. (The diffxml and patchxml tools provide one possible example.)
Approaches
There are several ways a merge tool can choose how to treat different files, with varying levels of user involvement:
Explicit specification. The user could explicitly declare "file X is text, file Y is binary".
Guess from filename. A user or administrator could configure a set of filename patterns and corresponding file types. In other words, they would be saying "files matching *.c should be merged as text, files matching *.gz should be merged as binary".
Guess from contents. An algorithm that inspects a file's contents could be used to decide what format the file is in. The standard file(1) command is one example of such an approach, but users might also want to supply their own way of determining file type from file contents (e.g. a regular expression or other pattern to be matched against some byte range of the file).
For example, CVS and Subversion assume that files are text files unless explicitly told that they are non-text files. Some other version control tools simply don't support non-text files.
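As a rough illustration of how the three approaches above could be combined, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names (classify, guess_from_name, guess_from_contents, PATTERNS), the pattern list, the NUL-byte heuristic, and the precedence order (explicit declaration first, then filename pattern, then contents) are hypothetical and are not taken from Vesta or vmerge.

```python
import fnmatch

# Hypothetical names throughout; none of this is part of Vesta or vmerge.
TEXT, BINARY = "text", "binary"

# "Guess from filename": patterns are checked in order, first match wins.
# The entries are illustrative, echoing the *.c and *.gz examples above.
PATTERNS = [
    ("*.c", TEXT),
    ("*.h", TEXT),
    ("*.gz", BINARY),
    ("*.o", BINARY),
]


def guess_from_name(filename):
    """Return the type of the first matching pattern, or None if none match."""
    for pattern, kind in PATTERNS:
        if fnmatch.fnmatchcase(filename, pattern):
            return kind
    return None


def guess_from_contents(data, sample_size=8192):
    """Crude heuristic (an assumption, far simpler than file(1)):
    a NUL byte in the first sample_size bytes means binary."""
    return BINARY if b"\0" in data[:sample_size] else TEXT


def classify(filename, data, explicit=None):
    """Explicit declaration beats filename pattern beats content inspection."""
    if explicit and filename in explicit:
        return explicit[filename]
    return guess_from_name(filename) or guess_from_contents(data)
```

With these assumptions, classify("main.c", data) is decided by the *.c pattern without looking at the contents, a file with an unrecognized name falls through to the content check, and passing explicit={"main.c": "binary"} overrides both guesses. A real tool might choose a different precedence or let the user configure it.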
Vesta Merge Tools
See MergingFuture/DevPlan/NonTextFiles for discussion of how this issue is handled in the Vesta merge prototype vmerge.