Code skimming

There are several tools out there that produce documentation based on comments in source code. They’re typically tailored to a specific programming language. For example, Java has Javadoc. Ruby has RDoc. Python has Epydoc. Almost every language has one, and they’re almost always popular. From a developer’s standpoint, they’re very compelling: you get to describe all of the functions, classes, methods, and things in the same source code files you work with all day. New documentation gets generated alongside the latest build of the software itself. If you’re writing comments in your code anyway, it’s practically free documentation. Except for one thing: to describe the output of these tools as documentation is perhaps overstating the case.

First, let’s take a look at what actually comes out of one of these tools. Here’s a screenshot from the Epydoc output for Paramiko, a Python library for working with Secure Shell (SSH) connections:

The output from Javadoc is quite similar, while RDoc’s main difference is that it puts the navigation frame along the top rather than on the left. Otherwise, all of these tools output more or less the same thing: a bunch of pages that show the various functions, classes, and methods available in a package.

But now look at the source where this stuff actually comes from:

The difference here is minimal. The bodies of the methods, for example, are hidden in the Epydoc output (though accessible through the source code links), but we still see the method signatures and the first sentence of each docstring from the source (in other languages, an equivalent comment block would serve the same purpose).

And this brings me to my hesitation at calling the output of these tools documentation. These “documents” are handy, and they are perhaps the most useful thing a package maintainer can produce for her users relative to the effort involved, but they’re not so different from the source code itself.

So instead of thinking of these tools as documentation generators, I’ve come to think of Epydoc and Javadoc and RDoc as code skimmers. Instead of getting a human-organized corpus of for-human-readers-only material, what I’m really getting is the source code’s Greatest Hits. These sorta-docs contain the tops of class definitions, method signatures, and comments: the things in the source code I would turn to first, before almost everything else, to try to understand how the software works. These documentation generators do an excellent job of producing materials that show the hows of software, even if they’re not accompanied by explanatory whys.
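To make the “code skimmer” idea concrete, here is a minimal sketch in Python, assuming Python 3.9 or later (for `ast.unparse`). It is not how Epydoc, Javadoc, or RDoc are actually implemented; it only uses the standard-library `ast` module to pull out the same Greatest Hits: class headers, method signatures, and the first sentence of each docstring, while leaving method bodies out. The `Connection` class and the `first_sentence`, `signature`, and `skim` helpers are hypothetical, invented for illustration.

```python
import ast
import textwrap

# Hypothetical sample source standing in for a real library module.
SOURCE = textwrap.dedent('''
    class Connection:
        """A hypothetical network connection. Not from any real library."""

        def connect(self, hostname, port=22, timeout=None):
            """Open a connection to hostname. Retries are not attempted."""
            self._sock = None  # body: the part a skimmer hides

        def close(self):
            """Close the connection."""
            self._sock = None
''')


def first_sentence(docstring):
    """Return the first sentence of a docstring (naive split at '. ')."""
    if not docstring:
        return ""
    text = " ".join(docstring.split())  # collapse whitespace and newlines
    end = text.find(". ")
    return text if end == -1 else text[: end + 1]


def signature(func):
    """Render 'def name(args): summary' without the function body."""
    args = ast.unparse(func.args)  # requires Python 3.9+
    return f"def {func.name}({args}): {first_sentence(ast.get_docstring(func))}"


def skim(source):
    """Outline a module: class headers, method signatures, doc summaries."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: {first_sentence(ast.get_docstring(node))}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + signature(item))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(signature(node))
    return lines


for line in skim(SOURCE):
    print(line)
```

Running it prints one line for the class and one for each method; none of the method bodies appear. The sentence splitting is deliberately naive (it stops at the first period followed by a space), so it will stumble on abbreviations such as “e.g.”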

I don’t mean to cheapen the good that Javadoc and similar tools do by thinking of them as code skimmers. Rather, under such terms, the possibilities are broadened. Perhaps the descendants of these tools will serve as intelligent code viewers, interactively revealing and obscuring beautifully formatted source code for the benefit of human readers. In some respects, these future code skimmers are already on their way in the form of tools like Trac, GitHub, and Bitbucket, which integrate source browsing with bug tracking.

(For a different, and much less generous, view of documentation generators, check out “Writing great documentation: What to write” by Jacob Kaplan-Moss.)

Hack Writing

Daniel Beck comments on technical writing.
