Executive Summary
- CVS is much worse than modern alternatives
- Switching SCMs is worth it
- git is a great alternative
- CVS is old, inactive and no longer popular
- No brief status overview
- Moving or renaming files disjoins history
- Most recent commit cannot be easily amended
- Semantic grouping of changes is not recorded
- Merging is limited
- Experimentation is discouraged; check-ins become large
- No way to quickly stash uncommitted changes
- Cannot commit less than an entire file
- CVS is unsafe and uncertain
Preamble
This article is a proposal to adopt a source code management system (SCM) better than CVS. The article aims not only to demonstrate the superiority of alternative SCMs, but also to show that the benefits afforded by the alternatives are non-negligible and outweigh the one-time costs of repository conversion and personnel training.
The target audience is twofold: people who decide what SCM to use for a project, team, department or company; and those who work for or with these people, and who would like to propose an alternative to CVS.
This article challenges the statements “CVS is good enough” and “we don’t have any problems with CVS”. It answers the question “Is CVS the best tool for the job?” with an emphatic “No!”
Staying with CVS
The pros of continuing with CVS are:
- workflow and status quo remain the same
- CVS is mature
- CVS is well-documented
The cons of changing SCMs are:
- team needs to be trained to use the new SCM
- existing repositories need to be converted
Flaws of CVS
The following are some of the flaws of CVS that modern SCMs correct or mitigate. The list is not comprehensive; other aspects of CVS have also come under criticism.
CVS is old, inactive and no longer popular
CVS was first publicly released in 1986. It is based on RCS, which was created in the early 1980s. At the time of this writing (July 2008), CVS has seen only one release in the past two years.
In contrast, Subversion was created in 2000, and has seen 15 releases in the last two years alone. Mercurial, first announced in 2005, has had 7 releases in the last two years. Git, created in 2005, has put out over 60 releases in the last two years!
This means that CVS is much less likely to incorporate innovations which can improve source control in areas like storage efficiency, speed and user productivity. The core design flaws and missing features which CVS users have complained about for a long time have gone unrectified for years.
Also noteworthy is that much of the writing about version control on the Internet right now (2008) takes for granted that Subversion, not CVS, is the de facto standard centralized version control system. Articles and blog posts discuss Subversion itself, or converting from Subversion to another SCM.
No brief status overview
CVS does not offer a way to get a brief listing of dirty files in the working copy. “cvs update” can be used, but this, of course, also updates the working copy, and this is not always desirable when only a status overview is wanted. CVS provides a “cvs status” command, but this is too verbose to be useful, and people either use “cvs update” in lieu of “cvs status”, or they write wrapper scripts that grep out single lines containing pertinent information. Moreover, “cvs status” by default gives the status of all files, not just dirty ones.
Moving or renaming files disjoins history
Moving or renaming a file with CVS must be done by removing the file and then adding it back. This breaks the history of that file’s content: the commit history up to the change remains with the file’s old name or position, while the history for the new name or position can only be traced as far back as the change. Furthermore, CVS makes no inherent connection between the two lines of history. To connect them, the committing developer must manually note the rename or move in the commit message. This should not be the SCM user’s responsibility.
There are ways to move or copy the history as well, but the CVS user or administrator must resort to hackery.
Most recent commit cannot be easily amended
If a developer makes a CVS commit with an error, such as an incomplete or imperfect commit message, or an incomplete or incorrect file set, there is no readily available means to correct this. The bad commit is permanent, and the developer must resort to such workarounds as a follow-up commit which clarifies or completes the previous one.
Semantic grouping of changes is not recorded
Suppose a developer implements a feature by updating three different files. CVS allows users to commit the three changes at the same time, but it records each file’s revision separately and makes no link between them. Someone who reviews the commit log of one of those files will encounter only that file’s change, and will likely remain unaware that the feature’s implementation entailed more than what he sees in that commit message and revision diff. Worse still, if this developer wants to revert the feature, he is likely to be misled by the picture that CVS seems to present, and would revert only part of the feature’s implementation.
Suppose that the reverting developer is somehow aware of the full nature of the change, particularly that it involves three files. He must manually determine what file revision numbers he needs for all three files (especially if there were any changes committed after this feature was checked in), or use the date when the change happened. He cannot simply refer to some specific id number that corresponds to the semantic change set.
Merging is limited
CVS adequately merges in the basic case: bringing the modifications made in a branch back into the trunk. But merging from a branch more than once is troublesome.
CVS does not record or track merges beyond what the CVS user types in the merge commit message. There is thus no way to determine the details of a complex history of merges, or to see if some changes have already been merged to some other branch or the trunk. The CVS manual informs us that attempting to merge changes which have already been merged can have undesirable side effects. To avoid this with CVS requires careful bookkeeping, a burden which CVS transfers onto its users.
Experimentation is discouraged; check-ins become large
Because branching and merging are limited in power and become cumbersome to use in most non-trivial situations, oftentimes development teams become discouraged from branching. This has a few significant ramifications. Less branching means more work being done directly in the trunk; but development teams are often advised or forced to keep the trunk clean or in a working state. As a result, developers commit less often and in big chunks in order to ensure that the trunk always has functional code. Not only does this increase the risk of loss of work, but it dilutes the point of having source code management, which is to track and preserve source code and its changes. Larger changesets are harder to work with, and harder for people to follow, and smaller (but still meaningful) changes become lost or obscured.
Developers also become discouraged from experimenting freely, since they often or always work only in the trunk. This stifles innovation — improvements, features or bug fixes which might otherwise be investigated and implemented instead remain unexplored.
No way to quickly stash uncommitted changes
CVS does not provide any facility to save uncommitted changes temporarily, so as to return the working copy to a clean state. Such a feature is useful when, while work is being done on feature X, a change in circumstances suddenly requires feature Y to be finished first, and the two features deal with code in the same files.
The CVS user must work around this deficiency by choosing from unsavoury alternatives: abandoning his work on feature X; checking out a whole additional working copy to work on feature Y; or manually creating patches for the files dirtied by feature X, removing those files and checking out clean copies, and then manually applying the patches once feature Y is complete.
Cannot commit less than an entire file
The smallest thing that can be committed with CVS is a whole file. In real world development, however, it is desirable to check in just some of the changes in a file (or files). Consider the tangled working copy problem. If work is being done on topic X, and then, whether deliberately or inadvertently, work commences on topic Y in one of the same files dirtied by topic X, that file becomes “tangled”. That is, in the very same file there are uncommitted changes dealing with topic X, and other changes dealing with topic Y.
CVS does not provide any mechanism to assist in solving the problem of checking in only the changes belonging to just one of the topics. The developer must manually disentangle the two topics in the one file, manually store one changeset somehow, check in the other changeset, then manually restore the saved changeset onto the file. This might entail manual patch creation and application, or even manual copying and pasting with an editor. This problem obviously becomes even more hair-raising when more than one file is involved.
CVS is unsafe and uncertain
Commits in CVS are not atomic: if there is a problem during a commit (such as a network disconnection, or a server power failure), the commit is partial or broken.
If the central repository of a project becomes corrupt (such as by hard drive breakdown, or malicious tampering), the entire project is compromised; every developer is affected.
If the repository becomes damaged due to such mishaps, it may continue to appear to function. Any given file checked out may or may not reflect the actual changes made to date; there are no verification mechanisms within CVS.
The Alternatives
Numerous alternative SCMs are available. Among the more popular ones are Subversion, Mercurial and git, all mentioned earlier in this article.
Each of these resolves most or all of the problems described in this article. For a comprehensive comparison, refer to Wikipedia and the comparison tool of the Version Control Blog.
Of the available alternatives, this author recommends git.
How git solves all of the above
As mentioned, git development is extremely active (at the time of this writing), and its popularity continues to grow. Well-known projects that use git include the Linux kernel, X.Org, Ruby on Rails, Wine, script.aculo.us and Prototype.
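Moving and renaming files is a non-issue in git: because git tracks content rather than files, it detects when content has moved to a new path, and the history of that content can be followed across the rename.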
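As a rough illustration, the following sketch renames a file in a throwaway repository and then asks for its history under the new name. The file names and the identity settings are made up for the example.

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'line one\nline two\n' > old_name.txt
git add old_name.txt
git commit -q -m "Add old_name.txt"

# Rename the file; git detects the rename from the matching content
git mv old_name.txt new_name.txt
git commit -q -m "Rename old_name.txt to new_name.txt"

# --follow continues the history of the file across the rename
git log --follow --oneline -- new_name.txt
```

Both commits should be listed, even though the first one was made under the old name.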
git provides a status command that gives a concise, yet very informative overview of the current state of the working copy. It lists such things as which files have been modified, added and removed, and which branch is currently being worked with.
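A minimal sketch of what this looks like, using the short output format in a throwaway repository (file names and identity are invented for the example):

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'a\n' > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

printf 'b\n' >> tracked.txt      # modify a tracked file
printf 'new\n' > untracked.txt   # create a file git does not know about yet

# One line per dirty file, plus a header line naming the current branch
git status --short --branch
```

Unlike “cvs update”, this only reports; it does not change the working copy.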
git permits the adjustment of the most recent commit. Either the commit message or the changeset (or both) can be amended.
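For instance, a commit with a misspelled message and a forgotten file might be repaired along these lines (a sketch with invented file names, run in a throwaway repository):

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'hello\n' > greeting.txt
git add greeting.txt
git commit -q -m "Add greting"   # typo in the message, and one file was forgotten

# Stage the forgotten file, then replace the last commit with a corrected one
printf 'bye\n' > farewell.txt
git add farewell.txt
git commit -q --amend -m "Add greeting and farewell"

git log --oneline   # still a single commit, now with the fixed message
```

Amending replaces the commit with a new one, so it is best suited to commits that have not yet been shared with other developers.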
git tracks content, not files, and each individual commit consists of a set of changes. As such, git preserves and communicates the full meaning and grouping of a changeset as a semantic unit. When a commit message says “implemented feature X”, it is absolutely certain what exact changes were enacted throughout the entire repository for this feature, no matter how many files were involved. Anyone reviewing the commit history later will be able to know this without any prior knowledge of or participation in the project.
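The three-file scenario described earlier can be sketched as follows. The file names are hypothetical; the point is that one commit ID covers the whole feature, both for review and for reverting.

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

for f in a.txt b.txt c.txt; do printf 'base\n' > "$f"; done
git add .
git commit -q -m "Initial commit"

# One feature touching three files, recorded as a single commit
for f in a.txt b.txt c.txt; do printf 'feature X\n' >> "$f"; done
git commit -q -a -m "Implement feature X"
feature=$(git rev-parse HEAD)

# Reviewing the commit shows every file the feature touched
git show --stat --format='%h %s' "$feature"

# Reverting by that one ID undoes the change in all three files
git revert --no-edit "$feature"
```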
git branching and merging are rapid and easy — so much easier than CVS that workflow improves significantly by means of frequent branching and merging. In the git community, there is a phenomenon called the “topic branch”, which refers to a branch made (quickly and easily) for the sole purpose of developing on a topic (new feature, bug fix, code refactor). git facilitates keeping topics self-contained and compartmentalized, distinct from one another. Switching between branches is as easy as switching directories in a filesystem. git merging is as simple and straightforward as it ought to be, and multiple merges between any combination of branches and trunk are handled smoothly.
Having easily-created topic branches allows developers to feel at liberty to experiment and explore development without adversely impacting other developers. Frequent check-ins become possible because there is little worry about breaking a main trunk of development; only a topic branch gets broken — and only a local copy, at that.
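A topic-branch workflow might look roughly like this. The branch name topic-x and the file names are invented, and the trunk's name is read from the repository because the default differs between git versions.

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'stable\n' > app.txt
git add app.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)   # name of the main line of development

# Create a topic branch and commit freely on it
git checkout -q -b topic-x
printf 'first draft\n' > feature_x.txt
git add feature_x.txt
git commit -q -m "Start feature X"

# First merge back into the trunk
git checkout -q "$trunk"
git merge -q --no-edit topic-x

# More work on the same topic, then a second merge from the same branch
git checkout -q topic-x
printf 'second draft\n' >> feature_x.txt
git commit -q -a -m "Refine feature X"
git checkout -q "$trunk"
git merge -q --no-edit topic-x   # git works out from the history what is new

git branch --merged   # branches whose work is fully contained in the trunk
```

No bookkeeping of what has already been merged is needed; the second merge brings over only the new commit.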
git provides a stash command which stashes away any uncommitted changes to the working copy, returning it to a clean state. Just as easily, stashed changes can be applied back to the working copy.
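The feature X and feature Y interruption described earlier could be handled along these lines (a sketch with made-up file names in a throwaway repository):

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'v1\n' > code.txt
git add code.txt
git commit -q -m "Initial commit"

# Work on feature X is under way when feature Y becomes urgent
printf 'half-finished feature X\n' >> code.txt
git stash push -q      # shelve the uncommitted changes
git status --short     # prints nothing: the working copy is clean again

# Finish and commit feature Y on the clean working copy
printf 'feature Y\n' > y.txt
git add y.txt
git commit -q -m "Implement feature Y"

# Bring the feature X changes back
git stash pop -q
git status --short     # code.txt shows as modified once more
```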
git can facilitate “hunkwise” commit; that is, committing only some of the changes made to a file. The tangled working copy problem is no problem at all when using git.
All git commits are not only atomic, but also checksummed. It can be known with great certainty that the data that went in is exactly the data coming out. Any data corruption along the line (whether by accident or due to malicious intent) is detected.
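The checksumming can be observed directly, since git names every stored object by a hash of its content. The following sketch, with an invented file name, compares the ID stored in a commit with one recomputed from the file, and then runs git's own integrity check.

```shell
set -e
# Throwaway repository; the identity below is a placeholder for the example
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'important data\n' > data.txt
git add data.txt
git commit -q -m "Add data"

# The ID recorded in the commit equals the hash recomputed from the file
id_in_repo=$(git rev-parse HEAD:data.txt)
id_recomputed=$(git hash-object data.txt)

# Changing even one character of the content yields a different ID
id_tampered=$(printf 'important dat4\n' | git hash-object --stdin)

printf '%s\n%s\n%s\n' "$id_in_repo" "$id_recomputed" "$id_tampered"

# Verify the stored objects and their connectivity
git fsck
```

Because the ID depends on the content, stored data cannot change without the mismatch becoming visible.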
Conclusion
The decision to stay with or drop CVS hinges on a comparison between the daily cost of wrestling with CVS’s flaws and the cost of switching to a new version control system.
In the absence of someone already experienced with an alternative SCM, the time it takes for a team to get acquainted with a new system could be as high as several days of usage in the case of an SCM as different from CVS as git, or as little as half an hour in the case of Subversion, which was designed to be as similar to CVS as possible while reducing or eliminating its problems.
However, a skilled development team can be trained on a new SCM in under an hour by an experienced user, and can be guided for the first few days of usage.
On the other hand, CVS can cripple, stifle and hinder a development team day in and day out, to degrees ranging from annoying to outright prohibitive. Consider the developer-hours lost to even a single instance of disentangling a tangled working copy; or decomposing multiple semantically distinct changes within a single file for a clean check-in; or resolving a large batch of merge conflicts because a particular branch veered widely from the trunk. In contrast, the single hour (or less) of training time seems insignificant.
The gains afforded by a modern SCM can empower a development team with smoother workflow and increased productivity week after week.
If CVS does not cost a team more than an hour of troubleshooting per year, then it is not worth switching to a new SCM. But if a development team must deal with and work around CVS’s deficiencies with any regularity, it only makes sense to abandon it and begin reaping the benefits of a superior alternative.
Comment by Pistos — October 17, 2008 @ 07:22
Correction: I’ve since learned that, as an alternative to a custom script, you can use “cvs -q -n update” to get a brief status list. -n tells CVS not to actually perform an update.