8. Good documentation practice

The most important good documentation practice is to actually write some! Too many programmers omit this. But here are two good reasons to do it:

  1. Your documentation can be your design document. The best time to write it is before you type a single line of code, while you're thinking out what you want to do. You'll find that the process of describing the way you want your program to work in natural language focuses your mind on the high-level questions about what it should do and how it should work. This may save you a lot of effort later.

  2. Your documentation is an advertisement for the quality of your code. Many people take poor, scanty, or illiterate documentation for a program as a sign that the programmer is sloppy or careless of potential users' needs. Good documentation, on the other hand, conveys a message of intelligence and professionalism. If your program has to compete with other programs, you had better make sure your documentation is at least as good as theirs, lest potential users write you off without a second look.

This HOWTO wouldn't be the place for a course on technical writing even if that were practical. So we'll focus here on the formats and tools available for composing and rendering documentation.

Though Unix and the open-source community have a long tradition of hosting powerful document-formatting tools, the plethora of different formats has meant that documentation has tended to be fragmented and difficult for users to browse or index in a coherent way. We'll summarize the uses, strengths, and weaknesses of the common documentation formats. Then we'll make some recommendations for good practice.

8.1. Documentation formats

Here are the documentation markup formats now in widespread use among open-source developers. When we speak of "presentation" markup, we mean markup that controls the document's appearance explicitly (such as a font change). When we speak of "structural" markup, we mean markup that describes the logical structure of the document (like a section break or an emphasis tag). And when we speak of "indexing", we mean the process of extracting from a collection of documents a searchable collection of topic pointers that users can employ to reliably find material of interest across the entire collection.

man pages

The most common format, inherited from Unix; a primitive form of presentation markup. The man(1) command provides a pager and a stone-age search facility. No support for images, hyperlinks, or indexing. Renders to PostScript for printing fairly well. Doesn't render to HTML at all well (essentially as flat text). Tools are preinstalled on all Linux systems.

Man page format is not bad for command summaries or short reference documents intended to jog the memory of an experienced user. It starts to creak under the strain for programs with complex interfaces and many options, and collapses entirely if you need to maintain a set of documents with rich cross-references (the markup has only weak and normally unused support for hyperlinks).

HTML

Increasingly common since the Web exploded in 1993-1994. Markup is partly structural, mostly presentation. Browsable through any web browser. Good support for images and hyperlinks. Limited built-in facilities for indexing, but good indexing and search-engine technologies exist and are widely deployed. Renders to PostScript for printing pretty well. HTML tools are now universally available.

HTML is very flexible and suitable for many kinds of documentation. Actually, it's too flexible; it shares with man page format the problem that it's hard to index automatically because a lot of the markup describes presentation rather than document structure.

Texinfo

Texinfo is the documentation format used by the Free Software Foundation. It's a set of macros on top of the powerful TeX formatting engine. Mostly structural, partly presentation. Browsable through Emacs or a standalone info program. Good support for hyperlinks, none for images. Good indexing for both print and on-line forms; when you install a Texinfo document, a pointer to it is automatically added to a browsable "dir" document listing all the Texinfo documents on your system. Renders to excellent PostScript and usable HTML. Texinfo tools are preinstalled on most Linux systems, and available at the Free Software Foundation website.

Texinfo is a good design, quite usable for typesetting books as well as small on-line documents, but like HTML it's a sort of amphibian — the markup is part structural, part presentation, and the presentation part creates problems for rendering.

DocBook

DocBook is a large, elaborate markup format based on SGML (more recent versions on XML). Unlike the other formats described here, it is entirely structural, with no presentation markup. Excellent support for images and hyperlinks. Good support for indexing. Renders well to HTML, acceptably to PostScript for printing (quality is improving as the tools evolve). Tools and documentation are available at the DocBook website.

DocBook is excellent for large, complex documents; it was designed specifically to support technical manuals and rendering them in multiple output formats. Its main drawback is its verbosity. Fortunately, good tools and introductory-level documentation are now available; see the DocBook Demystification HOWTO for an introduction.

asciidoc

The one serious drawback of DocBook is that its markup is rather heavyweight and obtrusive. A clever recent workaround is AsciiDoc. This tool is a front end to DocBook with a much simpler and more natural input syntax. Users don't need to be aware of DocBook at all, but still get nearly the full power of the DocBook toolchain.

AsciiDoc (often referred to by the all-lowercase name of the formatter it ships, asciidoc) has been seeing very rapid uptake recently by projects that had previously moved to DocBook.
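To give a feel for how light the input syntax is, the shell function below prints a skeleton AsciiDoc man page. It is a minimal sketch under stated assumptions: the name `new_manpage` is hypothetical, and the layout (a two-line `NAME(SECTION)` title underlined with `=`, followed by NAME, SYNOPSIS and DESCRIPTION sections underlined with `-`) assumes classic AsciiDoc's man page conventions, processed with its manpage doctype (`asciidoc --doctype=manpage`).

```sh
# new_manpage NAME SECTION SUMMARY (hypothetical helper): print a skeleton
# AsciiDoc man page on standard output.
new_manpage() {
    # Man page titles are conventionally upper case, e.g. FOO(1).
    title="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')($2)"
    printf '%s\n' "$title"
    # Two-line title: underline it with '=' to exactly the title's length.
    printf '%s\n' "$title" | sed 's/./=/g'
    # Section titles are underlined with '-' of matching length.
    cat <<EOF

NAME
----
$1 - $3

SYNOPSIS
--------
*$1* ['OPTIONS']

DESCRIPTION
-----------
Describe what $1 does here.
EOF
}
```

For example, `new_manpage foo 1 "frobnicate the bar" > foo.txt` gives you a master to fill in; compare its few lines of punctuation with the equivalent RefEntry markup to see why projects found this attractive.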

8.2. Good practice recommendations

Documentation practice has been changing since 2000, when some key open-source project groups (including the Linux kernel project, GNOME, KDE, the Free Software Foundation, and the Linux Documentation Project) agreed on an approach more web-friendly than traditional Unix's print-oriented tools. Today's best practice, since the XML-DocBook toolchain reached production status in mid-2001, is this:

  1. Maintain your document masters in either XML-DocBook or asciidoc. Even your man pages can be DocBook RefEntry documents. There is a very good HOWTO on writing manual pages that explains the sections and organization your users will expect to see.

  2. Ship the XML or asciidoc masters. Also, in case your users' systems don't have xmlto(1) (standard on all Red Hat distributions since 7.3), ship the troff sources that you get by running conversions on your masters. Your software distribution's installation procedure should install those in the normal way, but direct people to the XML/asciidoc files if they want to write documentation patches.

    It's easy to tell make(1) to keep the generated man files up to date. Just do something like this in your makefile:

# Regenerate foo.1 whenever its DocBook master foo.xml changes.
foo.1: foo.xml
	xmlto man foo.xml

    If you're using asciidoc, something like this should serve:

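# The manpage doctype makes asciidoc emit a DocBook RefEntry,
# which is what xmlto needs in order to produce a man page.
foo.1: foo.txt
	asciidoc --backend=docbook --doctype=manpage foo.txt
	xmlto man foo.xml
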
  3. Generate XHTML from your masters (with xmlto xhtml, or directly using asciidoc) and make it available from your project's web page, where people can browse it in order to decide whether to download your code and join your project.
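The workflow in the list above can also be scripted where make isn't already part of the build. The POSIX shell sketch below applies make's rule (rebuild a target that is missing or older than its source) to produce both a man page and a single-page XHTML file from each DocBook master. It assumes a hypothetical layout in which every `foo.xml` in the current directory is a RefEntry whose man page comes out as `foo.1`, and that `xmlto xhtml-nochunks` writes `foo.html`; the function names are illustrative only.

```sh
# needs_rebuild TARGET SOURCE (hypothetical helper): succeed if TARGET is
# missing or SOURCE is newer than it, the same timestamp rule make(1) uses.
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

# build_docs (hypothetical helper): regenerate man pages and XHTML from
# every DocBook master (*.xml) in the current directory.
# Assumes foo.xml yields foo.1 (man) and foo.html (xhtml-nochunks).
build_docs() {
    for master in *.xml; do
        [ -e "$master" ] || continue      # glob matched nothing
        base=${master%.xml}
        if needs_rebuild "$base.1" "$master"; then
            printf 'man: %s\n' "$master"
            xmlto man "$master" || return 1
        fi
        if needs_rebuild "$base.html" "$master"; then
            printf 'xhtml: %s\n' "$master"
            xmlto xhtml-nochunks "$master" || return 1
        fi
    done
}
```

Run it from the directory holding the masters. A second run with nothing changed prints nothing, because every generated file is already at least as new as its master.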

For converting legacy documentation in troff formats to DocBook, check out doclifter. If you're unwilling to move from using man sources as a master format, at least try to clean them up so doclifter can lift them to XML automatically.
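When you have many legacy pages, a batch run tells you which ones need hand cleanup before doclifter can lift them. The sketch below (the function name `lift_report` is hypothetical) runs doclifter on each troff page named on the command line and prints the pages it could not lift. It assumes only that doclifter exits with a nonzero status when translation fails; where the XML output lands is left to doclifter itself.

```sh
# lift_report PAGE... (hypothetical helper): try to lift each troff page to
# DocBook with doclifter(1) and print the names of the pages that failed.
# Returns 0 if every page lifted, 1 otherwise.
# Assumes doclifter exits nonzero when it cannot translate a page.
lift_report() {
    lift_failed=0
    for page in "$@"; do
        # Discard stdout so only page names are reported; warnings
        # from doclifter still reach stderr.
        if ! doclifter "$page" >/dev/null; then
            printf '%s\n' "$page"    # candidate for manual cleanup
            lift_failed=1
        fi
    done
    return "$lift_failed"
}
```

For example, `lift_report *.1 *.8 > needs-cleanup.txt` leaves you a worklist of pages to fix up by hand.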