Difficulty: ★★☆ (medium)
Keywords: index, automatically insert indices, consistency

Problem

You want to add index entries (using the indexterm element) automatically and consistently into your document.

Solution

To see how the automatic addition works, the following procedure demonstrates it for the envar element.

Procedure 3.1. Adding indexterm Elements to envar
  1. Use the element envar in your document as usual. By default, every envar element gets an indexterm. If you do not want this, add the attribute condition with the value noindex to suppress indexterm generation, as done in the second para element:

    Example 3.15. profile-envar.xml
    <article  version="5.0"
      xmlns="http://docbook.org/ns/docbook">
      <title>Profiling Test</title>
      <para>Environment variable <envar>XML_CATALOG_FILES</envar></para>
      <para>Environment variable <envar condition="noindex">FOO</envar></para>
      <para>Environment variable <envar os="windows">Path</envar><envar
        os="linux">PATH</envar></para>
    </article>
  2. Create a stylesheet profile-tags.xsl with the following content:

    Example 3.16. profile-tag.xsl
    <xsl:stylesheet version="1.0"
      xmlns:d="http://docbook.org/ns/docbook"
      xmlns="http://docbook.org/ns/docbook"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      
    <xsl:param name="preferred">pref</xsl:param>
    
    <xsl:template name="check.index">
      <xsl:param name="node" select="."/>
      <xsl:param name="default" select="1"/>
        
        <xsl:choose>
          <xsl:when test="$node/@condition = 'noindex'">0</xsl:when>
          <xsl:when test="$node/@condition = 'index'">1</xsl:when>    
          <xsl:otherwise><xsl:value-of select="$default"/></xsl:otherwise>
        </xsl:choose>  
    </xsl:template>
    
    <xsl:template match="d:footnote|d:title|d:indexterm" mode="profile">
      <!-- Index terms don't/shouldn't occur in the descendants of
           these elements, so just copy them -->
      <xsl:copy-of select="."/>
    </xsl:template>
    
    <xsl:template match="d:envar" mode="profile">
        <xsl:variable name="do.index">
          <xsl:call-template name="check.index"/>
        </xsl:variable>
    
        <!-- Copy original element -->
        <xsl:copy-of select="."/>
      
        <xsl:if test="$do.index != 0">
          <indexterm>
            <primary><xsl:value-of select="."/></primary>
          </indexterm>
          <indexterm>
            <xsl:if test="contains(@condition, $preferred)">
               <xsl:attribute name="significance">preferred</xsl:attribute>
            </xsl:if>
            <primary>environment variables</primary>
            <secondary><xsl:value-of select="."/></secondary>
          </indexterm>
        </xsl:if>
    </xsl:template>
    </xsl:stylesheet>
  3. Create the stylesheet add-indexterms.xsl. This stylesheet is based on profile.xsl of the DocBook XSL stylesheets. It provides the special mode profile, which processes elements while observing profiling conditions.

    Example 3.17. add-indexterms.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xsl:stylesheet
    [
      <!ENTITY db "https://cdn.docbook.org/release/xsl/current"> 
    ]>
    <xsl:stylesheet version="1.0"
      xmlns:d="http://docbook.org/ns/docbook"
      xmlns="http://docbook.org/ns/docbook"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    
    <xsl:import href="&db;/profiling/profile.xsl"/>
    <xsl:output indent="yes" method="xml"/>
    <xsl:include href="profile-tags.xsl"/>
    
    </xsl:stylesheet>
  4. Transform your document:

    xsltproc add-indexterms.xsl profile-envar.xml

After applying the stylesheet add-indexterms.xsl you will get the following output:

Example 3.18. Output of the Transformation
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Profiling Test</title>
  <para>Environment variable <envar>XML_CATALOG_FILES</envar><indexterm>
    <primary>XML_CATALOG_FILES</primary>
  </indexterm><indexterm>
    <primary>environment variables</primary>
    <secondary>XML_CATALOG_FILES</secondary>
    </indexterm></para>
  <para>Environment variable <envar condition="noindex">FOO</envar></para>
  <para>Environment variable <envar os="windows"
        >Path</envar><indexterm><primary>Path</primary></indexterm><indexterm><primary>environment
        variables</primary><secondary>Path</secondary></indexterm><envar
      os="linux"
      >PATH</envar><indexterm>
        <primary>PATH</primary>
      </indexterm><indexterm>
        <primary>environment variables</primary>
        <secondary>PATH</secondary>
      </indexterm></para>
</article>

Discussion

Let's go back to the beginning first. Assume you want an environment variable to appear in the index. Usually you would mark up the text with the envar element and add an indexterm right after it. As it is also useful to find the entry under the primary term “environment variables”, you add a second indexterm. This could look like this:

<para>Use the <envar>PATH</envar><indexterm>
        <primary>PATH</primary>
      </indexterm><indexterm>
        <primary>environment variables</primary>
        <secondary>PATH</secondary>
      </indexterm>
      to do ...
</para>

Although this is the usual method, it has some drawbacks:

  • It is hard to read. Even if you are used to reading bare XML code, the text is difficult to follow because it is broken into pieces and cluttered with indexterm elements throughout.

  • It may be inconsistent. If you forget the “s” in the primary index term, you end up with two entries (one in the singular and one in the plural form). It can be painful to go through the complete document just to change the singular form into the plural form (or vice versa).

  • Whitespace could matter. The indexterm element(s) start directly after your term. If you or your editor introduces whitespace between the term and its indexterm, in the worst case it could lead to a wrong page number in the index. This mainly affects PDF rather than online formats, but it could confuse your readers.

All of the above problems can be solved with the stylesheet from Example 3.16, “profile-tag.xsl”. It exploits the profiling mechanism of the DocBook XSL stylesheets. Normally, profiling is a method to remove certain structures from a document rather than to add something. In our case, we use the special profile mode to add index terms automatically.

With the above stylesheet, you can influence how your index terms appear. This is done with the condition[8] attribute, demonstrated here with our envar example:

<envar></envar>

Adds the indexterm elements directly after the envar element.

<envar condition="index"></envar>

Same as the previous entry.

<envar condition="noindex"></envar>

Suppresses any automatic generation of index entries.

<envar condition="pref"></envar>

Adds a preferred index entry. The keyword “pref” can be customized through the preferred parameter. If the keyword appears in the condition attribute, the following code is created:

<indexterm significance="preferred">...</indexterm>
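
As a rough illustration, and assuming the significance test in the envar template reads the condition attribute as described here, a preferred entry could be requested as shown below. The variable name EDITOR is only a placeholder, and the exact whitespace of the result depends on the serializer.

```xml
<!-- Input (EDITOR is a hypothetical example value) -->
<para>Set <envar condition="pref">EDITOR</envar> first.</para>

<!-- Expected result of the transformation, whitespace aside -->
<para>Set <envar condition="pref">EDITOR</envar><indexterm>
    <primary>EDITOR</primary>
  </indexterm><indexterm significance="preferred">
    <primary>environment variables</primary>
    <secondary>EDITOR</secondary>
  </indexterm> first.</para>
```

In the stylesheet shown in the Solution, only the second indexterm (the one below “environment variables”) receives the significance attribute.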

This method cannot solve all index problems. You should know some of its limits:

  • Document Type. Technical documents are better suited than novels, as the former usually contain a set of elements which are used consistently.

  • Consistent Elements. The document not only needs to use the same elements consistently for the same structure, it has to use a specific element in the first place. For example, if you want your configuration files to appear in the index, you need to mark them up with filename; otherwise this method has no chance.

  • Needed Elements. Similar to the previous point, you have to know which elements you want to show up in the index. From all possible inline elements, select only the handful which you consider important enough.

  • Only for Inline Elements. This method works well only for inline elements. DocBook's inline elements usually occur inside a paragraph but can also show up in a title.

  • Hard-coded Primary Entry. Adding hard-coded text to the stylesheet profile-tag.xsl (here: “environment variables”) can be criticized. If you maintain different languages, you should replace it with a more general solution and move the language-specific text into language files as described in Section 2.9, “Extending Language Files with Your Own Text”.
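
A lighter-weight alternative to language files is a stylesheet parameter that holds the primary text. The following fragments are only a sketch of that idea, not the language-file approach from Section 2.9. The parameter name envar.index.primary is made up for this illustration.

```xml
<!-- Top-level parameter in the stylesheet that holds the d:envar template.
     The name envar.index.primary is hypothetical. -->
<xsl:param name="envar.index.primary">environment variables</xsl:param>

<!-- Inside the d:envar template: the second indexterm takes its primary
     text from the parameter instead of a hard-coded string. The handling
     of the significance attribute stays as it is and is omitted here. -->
<indexterm>
  <primary><xsl:value-of select="$envar.index.primary"/></primary>
  <secondary><xsl:value-of select="."/></secondary>
</indexterm>
```

A translated document could then override the value from an importing stylesheet or on the command line, for example with the --stringparam option of xsltproc.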

Although this method does not replace hand-written index entries, it can ease the pain. It improves consistency, especially for entries which can be inserted automatically. The method described above can also be implemented for other inline elements, such as persons, functions, etc.
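
As a sketch of such an extension, the following stylesheet applies the same pattern to the function element. It assumes that it is included in add-indexterms.xsl next to the stylesheet from the Solution, so that the named template check.index is available. The file name and the primary text “functions” are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file name: profile-function.xsl -->
<xsl:stylesheet version="1.0"
  xmlns:d="http://docbook.org/ns/docbook"
  xmlns="http://docbook.org/ns/docbook"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="d:function" mode="profile">
    <!-- Reuses check.index from the envar stylesheet (assumed to be
         included in the same main stylesheet) -->
    <xsl:variable name="do.index">
      <xsl:call-template name="check.index"/>
    </xsl:variable>

    <!-- Copy original element -->
    <xsl:copy-of select="."/>

    <xsl:if test="$do.index != 0">
      <!-- Entry under the function name itself -->
      <indexterm>
        <primary><xsl:value-of select="."/></primary>
      </indexterm>
      <!-- Entry below a common primary term (text chosen for this example) -->
      <indexterm>
        <primary>functions</primary>
        <secondary><xsl:value-of select="."/></secondary>
      </indexterm>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

As with envar, condition="noindex" on a function element would suppress the generated entries.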

See Also


[8] Theoretically, you could use any of the common attributes available on every DocBook element. The condition attribute was the one that fit best in the author's mind.

