Difficulty: ★★☆ (medium)
Keywords: hyphenation, URLs, zero width space, soft hyphen

Problem

You have URLs or paths that need to break correctly across lines. They should break only at slashes or other designated characters, never inside a word.

Solution

The hyphenation of URLs in the DocBook stylesheets is controlled by two parameters: ulink.hyphenate and ulink.hyphenate.chars. The first parameter, if not empty, turns on hyphenation. Its value is the hyphenation character to insert, usually either a Unicode soft hyphen (U+00AD) or a Unicode zero-width space (U+200B).

The second parameter, ulink.hyphenate.chars, lets you define the allowable hyphenation points. The default value is a slash (/), but URLs can contain other characters at which a break is desirable. For this reason, the DocBook parameter reference recommends the following value:

<xsl:param name="ulink.hyphenate.chars">:/@&amp;?.#</xsl:param>
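The value above only takes effect once ulink.hyphenate is non-empty. As a minimal sketch, assuming you prefer the zero-width space (which adds no visible character at the break), the first parameter could be set like this in your customization layer:

```xml
<!-- Turn URL hyphenation on by giving the parameter a non-empty value.
     U+200B is the zero-width space; use U+00AD instead if you want
     the soft hyphen. -->
<xsl:param name="ulink.hyphenate">&#x200B;</xsl:param>
```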

Discussion

The easiest way is to set the parameters ulink.hyphenate.chars and ulink.hyphenate to the values shown in the previous section and leave it at that. For professional needs, however, this is not enough. The parameters and the built-in algorithm do not take protocols such as http into account. A URL begins with the scheme followed by ://, as in http://. In some (rare) situations, a break can occur between the two slashes or before the colon. Furthermore, according to the Chicago Manual of Style, it is desirable to distinguish between characters that a break should precede and characters that a break should follow.

All these requirements are implemented in the stylesheet shown in Example 4.1, “hyphenate-url.xsl”. It cuts off the protocol together with its ://, then iterates through the remaining characters and checks for each one whether a hyphenation point needs to be inserted before or after it.

Example 4.1. hyphenate-url.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

<xsl:template name="hyphenate-url">
  <xsl:param name="url" select="''"/>

  <!-- Remove the "scheme://" prefix so that it does not disturb the
       algorithm in "hyphenate-url-string" -->
  <xsl:choose>
   <xsl:when test="$ulink.hyphenate = ''">
      <xsl:value-of select="$url"/>
   </xsl:when>
   <xsl:when test="contains($url, '://')">
      <xsl:value-of select="substring-before($url, '://')"/>
      <xsl:text>://</xsl:text>
      <xsl:copy-of select="$ulink.hyphenate"/>
      <xsl:call-template name="hyphenate-url-string">
        <xsl:with-param name="url" select="substring-after($url, '://')"/>
      </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
      <xsl:call-template name="hyphenate-url-string">
        <xsl:with-param name="url" select="normalize-space($url)"/>
      </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
</xsl:template>


<xsl:template name="hyphenate-url-string">
  <xsl:param name="url" select="''"/>
  <xsl:variable name="char" select="substring($url, 1,1)"/>

  <xsl:choose>
   <xsl:when test="$url=''"/>
   <!-- Insert breakpoint _before_ the character -->
   <xsl:when test="contains($ulink.hyphenate.before.chars, $char)">
     <xsl:value-of select="concat($ulink.hyphenate, $char)"/>
     <xsl:call-template name="hyphenate-url-string">
       <xsl:with-param name="url" select="substring($url, 2)"/>
     </xsl:call-template>
   </xsl:when>
   <!-- Insert breakpoint _after_ the character -->
   <xsl:when test="contains($ulink.hyphenate.after.chars, $char)">
     <xsl:value-of select="concat($char, $ulink.hyphenate)"/>
     <xsl:call-template name="hyphenate-url-string">
       <xsl:with-param name="url" select="substring($url, 2)"/>
     </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
     <xsl:value-of select="$char"/>
     <xsl:call-template name="hyphenate-url-string">
       <xsl:with-param name="url" select="substring($url, 2)"/>
     </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
</xsl:template>
</xsl:stylesheet>
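To see where the breakpoints end up, it can help to run the template on its own with a visible marker instead of an invisible character. The following throwaway harness is only a sketch: the file name test-hyphenate.xsl, the marker |, and the sample URL are assumptions made for the demonstration, and the two character lists are the ones used in this recipe.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- test-hyphenate.xsl (hypothetical name): standalone check of
     hyphenate-url.xsl, independent of the DocBook stylesheets -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:include href="hyphenate-url.xsl"/>
  <xsl:output method="text"/>

  <!-- Visible marker so that every breakpoint shows up in the output -->
  <xsl:param name="ulink.hyphenate">|</xsl:param>
  <xsl:param name="ulink.hyphenate.before.chars"
    >.,%?&amp;#\-+{_</xsl:param>
  <xsl:param name="ulink.hyphenate.after.chars"
    >/:@=};</xsl:param>

  <!-- The input document is ignored; only the sample URL is processed -->
  <xsl:template match="/">
    <xsl:call-template name="hyphenate-url">
      <xsl:with-param name="url"
        select="'http://example.org/a/b.html?x=1'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against any well-formed XML file with an XSLT 1.0 processor, for example xsltproc test-hyphenate.xsl dummy.xml, this prints http://|example|.org/|a/|b|.html|?x=|1. The marker follows the scheme prefix, slashes and the equals sign, and precedes dots and the question mark.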

Include hyphenate-url.xsl in your customization layer, together with the following additional parameters:

<!-- Insert breakpoint /before/ the following characters: -->
<xsl:param name="ulink.hyphenate.before.chars"
  >.,%?&amp;#\-+{_</xsl:param>
<!-- Insert breakpoint /after/ the following characters: -->
<xsl:param name="ulink.hyphenate.after.chars"
  >/:@=};</xsl:param>
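Put together, a customization layer could look like the following sketch. It rests on two assumptions that you should check against your installation: that the stock FO stylesheets resolve under the URI shown (for example through an XML catalog), and that they call a named template hyphenate-url when formatting links, so that the included version, which has higher import precedence, replaces the stock one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: adjust this URI or path to your DocBook XSL setup -->
  <xsl:import
    href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- The named templates from Example 4.1; being included here, they
       take precedence over templates of the same name that are imported -->
  <xsl:include href="hyphenate-url.xsl"/>

  <!-- Non-empty value turns hyphenation on (zero-width space) -->
  <xsl:param name="ulink.hyphenate">&#x200B;</xsl:param>

  <!-- Insert breakpoint before the following characters: -->
  <xsl:param name="ulink.hyphenate.before.chars"
    >.,%?&amp;#\-+{_</xsl:param>
  <!-- Insert breakpoint after the following characters: -->
  <xsl:param name="ulink.hyphenate.after.chars"
    >/:@=};</xsl:param>

</xsl:stylesheet>
```

With this layer, ulink.hyphenate.chars is no longer consulted by the replaced template; only the before and after lists decide where a break may occur.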
