3.6. Extracting One Element from a DocBook Document

Difficulty: ★☆☆ (easy)

Keywords: extracting one element, rootid, xref, cross-references

Problem

You have a large DocBook document and need to extract a single structural element, such as a chapter or an appendix, to edit or process it separately from the main document.

Solution

For this solution to work, the structural element needs an ID attribute (id or xml:id). If it has one, use the following stylesheet:

Example 3.8. Extracting Stylesheet rootid.xsl

Download File 'extract/rootid.xsl'

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="id" match="*" use="@id|@xml:id"/>
  <!-- Contains the ID attribute of the extracted element: -->
  <xsl:param name="rootid"/>
  <!-- Controls some log messages: 0=off, 1=on -->
  <xsl:param name="rootid.debug" select="0"/>

  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$rootid !=''">
        <xsl:if test="count(key('id',$rootid)) = 0">
          <xsl:message terminate="yes">
            <xsl:text>ID '</xsl:text>
            <xsl:value-of select="$rootid"/>
            <xsl:text>' not found in document.</xsl:text>
          </xsl:message>
        </xsl:if>
        <xsl:call-template name="rootid.debug.message"/>
        <xsl:call-template name="rootid.process"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="normal.process"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="rootid.debug.message">
    <xsl:if test="$rootid.debug != 0">
      <xsl:message>
        <xsl:text>Using ID </xsl:text>
        <xsl:value-of select="concat('&quot;', $rootid, '&quot;')"/>
      </xsl:message>
    </xsl:if>
  </xsl:template>
  
  <xsl:template name="rootid.process">
    <xsl:apply-templates select="key('id',$rootid)" mode="process.root"/>
  </xsl:template>

  <xsl:template name="normal.process">
    <xsl:apply-templates/>
  </xsl:template>
  
  <xsl:template match="node() | @*" mode="process.root">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="process.root"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Pass the ID of the element to your XSLT processor as the rootid parameter, for example:

xsltproc --stringparam rootid intro rootid.xsl XML_FILE

The result contains only the element with the corresponding ID value and everything inside it.
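To make the effect concrete, here is a minimal sketch of an input file. The document is hypothetical: the IDs intro and app.install, the titles, and the text are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input file; all IDs and titles are made up -->
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Example Book</title>
  <chapter xml:id="intro">
    <title>Introduction</title>
    <para>This chapter is extracted.</para>
  </chapter>
  <appendix xml:id="app.install">
    <title>Installation</title>
    <para>This appendix is left out.</para>
  </appendix>
</book>
```

With rootid set to intro, as in the command above, the output should consist of the chapter element only. The book wrapper, its title, and the appendix are not copied.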

Discussion

This solution selects the element with the given ID and copies the element itself and all of its descendants to the output. The copying is done in the process.root mode. The stylesheet does not apply any further processing. This can be a disadvantage, for example when an xref points to a target outside of the extracted element. If the resulting file contains such a cross-reference, it is no longer valid, because the referenced ID does not exist in the file.
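The problem can be illustrated with a small sketch of an extracted file. The chapter, its IDs, and the referenced appendix are hypothetical and only serve as an example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical result of extracting the chapter with xml:id="intro" -->
<chapter xmlns="http://docbook.org/ns/docbook" xml:id="intro">
  <title>Introduction</title>
  <!-- The target app.install was an appendix in the original book.
       It is not part of this file, so the linkend points nowhere. -->
  <para>For setup instructions, see <xref linkend="app.install"/>.</para>
</chapter>
```

A validator that checks ID references would be expected to reject this file, because no element with the ID app.install exists in it.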

It is possible to convert such cross-references into a “resolved form” by using the following code:

Example 3.9. rootid-resolve-xrefs.xsl

Download File 'extract/rootid-resolve-xrefs.xsl'

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:d="http://docbook.org/ns/docbook">
  
  <xsl:import href="rootid.xsl"/>
  
  <xsl:template match="d:xref" mode="process.root">
    <xsl:variable name="xhref" select="@xlink:href"/>
    <!-- is the @xlink:href a local idref link? -->
    <xsl:variable name="xlink.idref">
      <xsl:choose>
        <xsl:when test="starts-with($xhref,'#')">
          <xsl:value-of select="substring($xhref, 2)"/>
        </xsl:when>
        <xsl:when test="contains($xhref, '://')">
          <xsl:message>
            <xsl:text>ERROR: Don't know what to do with @xlink:href: </xsl:text>
            <xsl:value-of select="$xhref"/>
          </xsl:message>
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
    </xsl:variable>
    <xsl:variable name="xlink.targets" select="key('id',$xlink.idref)"/>
    <xsl:variable name="linkend.targets" select="key('id',@linkend)"/>
    <xsl:variable name="target" select="($xlink.targets | $linkend.targets)[1]"/>
    <xsl:variable name="refelem" select="local-name($target)"/>
    
    <xsl:variable name="this.div" 
      select="ancestor-or-self::d:*[@xml:id = $rootid][1]"/>
    <xsl:variable name="target.div"
      select="$target/ancestor-or-self::d:*[@xml:id = $rootid][1]"/>

    <xsl:choose>
      <xsl:when test="generate-id($this.div) = generate-id($target.div)">
        <xsl:copy-of select="."/>
      </xsl:when>
      <xsl:otherwise>
        <phrase xmlns="http://docbook.org/ns/docbook" remap="xref">
          <xsl:choose>
            <xsl:when test="@linkend">
              <xsl:attribute name="role">
                <xsl:value-of select="@linkend"/>
              </xsl:attribute>
            </xsl:when>
            <xsl:when test="$xlink.idref != ''">
              <xsl:attribute name="role">
                <xsl:value-of select="$xlink.idref"/>
              </xsl:attribute>
            </xsl:when>
            <xsl:otherwise>
              <xsl:attribute name="role">
                <xsl:value-of select="$xhref"/>
              </xsl:attribute>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:apply-templates
            select="@*[local-name() != 'linkend' and
                       local-name() != 'href']"
            mode="process.root"/>
          <xsl:apply-templates
            select="($target/ancestor-or-self::d:*[d:title])[last()]/d:title/node()"
            mode="process.root"/>
        </phrase>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

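The stylesheet in Example 3.9, “rootid-resolve-xrefs.xsl” imports rootid.xsl and inherits all of its templates. To implement a different behaviour, we add a new template that matches the xref element in the process.root mode. Because the importing stylesheet has a higher import precedence, this template overrides the generic copy template for xref elements.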
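The same import-and-override pattern can be reduced to a few lines. The following sketch is not part of the original solution: it assumes a hypothetical requirement to drop all remark elements from the extracted part, and it assumes a DocBook 5 document in the DocBook namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://docbook.org/ns/docbook">

  <!-- Inherit the root template and the copy template in process.root mode -->
  <xsl:import href="rootid.xsl"/>

  <!-- Hypothetical override: an empty template in the same mode
       suppresses remark elements and everything inside them -->
  <xsl:template match="d:remark" mode="process.root"/>
</xsl:stylesheet>
```

All other nodes still fall through to the imported copy template. The xref template in Example 3.9 follows the same pattern, only with more logic inside the overriding template.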

The template for xref consists mostly of code from the DocBook XSL stylesheets with some minor changes. Its general behaviour is described in the following sequence:

Check the xlink:href attribute. If it starts with #, the rest of the value is taken as the ID of the target. If it contains a :// string, it points to an external resource that cannot be resolved, and an error message is emitted.
Populate the variables xlink.targets and linkend.targets with the target nodes. The variable xlink.targets is derived from the XLink attribute xlink:href, the variable linkend.targets from the linkend attribute. As an xref can carry only one of these attributes, not both, at most one of the variables contains nodes; the other is an empty node set.
Create the set union of the variables xlink.targets and linkend.targets and select its first node as the target.
Now it gets interesting: the context node is the xref element. We need to find the element whose xml:id attribute equals the rootid parameter. We climb up the tree with the ancestor-or-self axis and select every DocBook element. The predicate [@xml:id = $rootid] filters the node set so that only the elements for which this expression is true remain. Only one node from the node set is selected and saved in the variable this.div.
The same is done for the target node, and the result is saved in the variable target.div.
The two nodes from the previous steps are compared through the generate-id function. That leaves two options:
- Both nodes are identical. The xref points somewhere inside the tree under the rootid element. That means we can copy the xref element unchanged.
- The nodes are not identical. The xref points outside of the rootid element, so target.div is empty. That means we need to “resolve” the xref element to prevent validation errors.
If the xref needs to be resolved, we create a phrase element, store the ID of the target in its role attribute, copy all attributes of the xref (except linkend and xlink:href), and copy the content of the title that belongs to the target node. As the target node may not have a title itself, we again use the ancestor-or-self axis to climb up the tree and select the nearest element that has a title.
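The effect of these steps can be sketched with a small before-and-after fragment. The IDs intro.goals and app.install and the title “Installation” are hypothetical. Namespace declarations that the processor adds to the phrase element are omitted for brevity.

```xml
<!-- Inside the extracted chapter (xml:id="intro"), before resolving: -->
<para>See <xref linkend="intro.goals"/> and <xref linkend="app.install"/>.</para>

<!-- After resolving, assuming intro.goals lies inside the chapter and
     app.install is an appendix titled "Installation" outside of it: -->
<para>See <xref linkend="intro.goals"/> and <phrase remap="xref" role="app.install">Installation</phrase>.</para>
```

The first xref is copied unchanged because its target lies below the rootid element. The second one is replaced by a phrase that keeps the original target ID in its role attribute, so the information about the link is not lost.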

TODO: Add graphic to illustrate the method
