4.9. Microformatting shortcuts

By microformatting, I mean things like emphasising a single word. Most typesetters can set a plain, bold, italic, underlined, and mono-spaced font (plain text being the obvious exception). Support for mathematics, however, is more limited.

The general solution to microformatting is to have specialised parsers. Just as the line-by-line parser can be extended by the user, specialised microformatting can be provided by the end user by writing a parser to further translate document source. Naturally, the weavers to be used will have to support the constructions.

Weavers already perform some parsing. For example, the Latex weaver has to translate the characters #$%^_ into the Latex macros that produce them, since they're reserved characters in standard Latex.
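To make the idea concrete, here is a minimal sketch of that kind of translation. The table and the function name escape_latex are hypothetical, chosen for illustration, and are not taken from the actual Latex weaver; the macros themselves are standard Latex.

```python
# Hypothetical escape table: each reserved character maps to a
# standard Latex macro that prints the character itself.
LATEX_ESCAPES = {
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "^": r"\textasciicircum{}",
    "_": r"\_",
}

def escape_latex(text):
    """Replace each reserved character with the macro that produces it."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

print(escape_latex("50% of my_var costs $5"))
```

Working one character at a time means the backslashes introduced by one replacement are never re-examined by another.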

The biggest problem here is to specify a standard microformatting language. It is not too onerous to reserve the @ character at the beginning of a line, but how does one designate three special fonts (bold, italic and monospaced) and the scope they apply to? What about font size? For something more difficult, mathematics?

Any such language must eat up characters which could otherwise be typeset 'as is'. The fewer characters are reserved, the more cluttered the source becomes; the more are reserved, the more likely the user is to forget to quote one properly when the character itself is required instead of its 'magic'.

HTML reserves <> and uses tag pairs to do detailed markup, and uses & to allow quoting. Latex reserves #$%^&_\ as well as {, } and ~. Interscript reserves @ at the beginning of the line.
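As an illustration of what quoting costs in the HTML case, here is a small sketch using html.escape from the Python standard library. The sample sentence is made up, and the call is passed quote=False so that only <, > and & are rewritten.

```python
import html

source = "if a < b && c > d"

# Each reserved character becomes an entity introduced by '&',
# which is why '&' itself must also be quoted.
quoted = html.escape(source, quote=False)
print(quoted)
```

Four characters out of seventeen needed quoting here, which is the kind of clutter a programmer's documentation will run into constantly.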

It is possible to do all formatting using lines. But that leads to a troff-like solution, which is extremely ugly. It should be possible to write normal text and have it print properly -- and for a programmer that will include setting special characters. Typesetting C code documentation in plain Latex is a pain because underscore means subscript and is an error outside maths mode, yet underscore is more or less the C version of a hyphen, and more or less an alphabetic character.

The characters we can afford to reserve are those not commonly used in program documentation. There aren't any. Here's the proof by analogy: if we reserve @, for example, then in the very documentation describing the construction implemented using the @ character, the most commonly used special character will, of course, be @.

The solution I have adopted to this intractable problem is as follows. First, all the constructions have to be provided as commands. That means that irrespective of other details, all the constructions are available, even if it is a pain to typeset them.

Secondly, we provide regular expression matching technology to extract microformatting details using some standard forms, but we will not enable it by default.

I'll call these things 'shortcuts'. For example, the first shortcut for code is an @ followed by a C identifier. An @ in any other context is typeset as an @.
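The rule for this first shortcut can be sketched with a regular expression. The pattern, the function name expand_code_shortcut, and the <code> markup it emits are assumptions made for illustration; they are not Interscript's own definitions.

```python
import re

# An '@' immediately followed by a C identifier: a letter or underscore,
# then any number of letters, digits, or underscores.
CODE_SHORTCUT = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")

def expand_code_shortcut(text):
    """Mark up '@identifier' as code; leave every other '@' untouched."""
    return CODE_SHORTCUT.sub(lambda m: "<code>" + m.group(1) + "</code>", text)

print(expand_code_shortcut("Call @read_line, then mail me @ home."))
```

Note that the rule as stated also fires inside something like an e-mail address, since the @ there is followed by what looks like an identifier. That is one more reason for leaving shortcuts disabled unless they are asked for.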

Shortcuts are implemented by weavers. (The control loop never sees them.) To provide typesetter-independent shortcuts, we need a special kind of weaver: a filter. A filtering weaver translates shortcuts and then calls the normal weaver.

Interscript comes with a standard filtering weaver, equipped with a user programmable table of shortcuts based on regular expression matching. The default version of this weaver does not do any shortcuts: they must be explicitly enabled by the programmer. There is, however, a table of standard shortcuts prepared, and a command to enable them.
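The shape of such a filter can be sketched as follows. Everything here is hypothetical: the class names ListWeaver and ShortcutFilter, the methods write, code and enable, and the table layout are stand-ins invented for the example, not the interface of the real filtering weaver. Each pattern is assumed to contain exactly one capture group holding the text to be formatted.

```python
import re

class ListWeaver:
    """Stand-in for a real weaver: it just records the calls it receives."""
    def __init__(self):
        self.calls = []
    def write(self, text):
        self.calls.append(("write", text))
    def code(self, text):
        self.calls.append(("code", text))

class ShortcutFilter:
    """Filtering weaver: translates shortcuts, then calls the normal weaver."""
    def __init__(self, weaver):
        self.weaver = weaver
        self.shortcuts = []  # empty table: no shortcuts by default

    def enable(self, pattern, method_name):
        """Add a shortcut: text matching pattern is sent to method_name."""
        self.shortcuts.append((re.compile(pattern), method_name))

    def write(self, text):
        pos = 0
        while pos < len(text):
            # Pick the enabled shortcut that matches earliest in the text.
            best = None
            for regex, method_name in self.shortcuts:
                m = regex.search(text, pos)
                # Ignore empty matches so the loop always makes progress.
                if m and m.end() > m.start():
                    if best is None or m.start() < best[0].start():
                        best = (m, method_name)
            if best is None:
                break
            m, method_name = best
            if m.start() > pos:
                self.weaver.write(text[pos:m.start()])
            getattr(self.weaver, method_name)(m.group(1))
            pos = m.end()
        if pos < len(text):
            self.weaver.write(text[pos:])

plain = ListWeaver()
filtered = ShortcutFilter(plain)
filtered.write("use @foo here")   # nothing enabled: passed through intact
filtered.enable(r"@([A-Za-z_][A-Za-z0-9_]*)", "code")
filtered.write("use @foo here")   # now split into write/code/write calls
print(plain.calls)
```

Because the filter only ever calls the ordinary weaver methods, the same table of shortcuts works unchanged whichever typesetter the wrapped weaver targets.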