You want to split your HTML output into several files, all correctly linked.
The DocBook XSL stylesheets use the term chunking for splitting up your result into different HTML files. A chunk is therefore a single HTML file.
Use the chunk.xsl stylesheet, which is available for all HTML variants. Usually, this is all you need to chunk your result.
To influence the chunking process, use the following parameters:
base.dir
Sets the output directory for all chunks. If not set, the output directory is system dependent. Usually, it is the directory from which you run your XSLT processor.
chunk.section.depth
Sets the depth to which sections should be chunked. Default is 1.
chunk.first.sections
Controls whether the first top-level sect1 or section element is chunked. If non-zero, a separate file (“chunk”) is created; otherwise, the section is included in the component. Default is 0 (zero, no separate chunk is created).
use.id.as.filename
Controls the filename of the chunked element. If non-zero, the filename is derived from the ID of the element. If zero, the filename is generated and numbered according to its position. Default is 0 (= zero, do not use IDs for file names).
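Here is a minimal sketch of passing these parameters on the command line. The following Python snippet builds an xsltproc argument list and uses xsltproc's --stringparam option once per parameter. The stylesheet path and the source file name book.xml are placeholders, not values from this topic, so adjust them to your installation. Other XSLT processors use their own syntax for setting parameters.

```python
import shlex

# Placeholders (assumptions): point these at your own chunk.xsl and source.
CHUNK_XSL = "/path/to/docbook-xsl/html/chunk.xsl"
SOURCE = "book.xml"


def xsltproc_command(params, stylesheet=CHUNK_XSL, source=SOURCE):
    """Build an xsltproc argument list with one --stringparam per parameter."""
    argv = ["xsltproc"]
    for name, value in params.items():
        argv += ["--stringparam", name, str(value)]
    # Stylesheet first, then the DocBook source document.
    argv += [stylesheet, source]
    return argv


cmd = xsltproc_command({
    "base.dir": "html/",
    "chunk.section.depth": 2,
    "chunk.first.sections": 1,
    "use.id.as.filename": 1,
})
print(shlex.join(cmd))
```

To run the transformation, pass cmd to subprocess.run(cmd, check=True) on a system where xsltproc is installed.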
To better understand what the stylesheet creates, let's assume the following book structure:
The following subsections show how to influence the output.
Using the chunk.xsl stylesheet with xsltproc, Saxon, or any other XSLT processor leads to the following file names:
As you can see, the file name consists of several components:
An abbreviation of the chunked element. Each chunked element is assigned an abbreviation, one or two characters long. The available abbreviations are shown in Table 5.1.
Abbreviation | Element |
---|---|
ap | appendix |
ar | article |
bi | bibliography |
bk | book |
ch | chapter |
co | colophon |
go | glossary |
ix | index |
pr | preface |
pt | part |
re | refentry |
rn | reference |
s | section |
se | set |
si | setindex |
to | topic |
A consecutive number. Each chunked component gets a number. For example, the first chapter is ch01, the second chapter ch02, and so on. Appendices are lettered instead, as in apa.
Additional subcomponents. If a component has subcomponents like sections, the abbreviation and number of each subcomponent are appended to the file name. As such, the second section in the first chapter gets the file name ch01s02.html.
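The naming rules above can be modeled in a few lines of Python. This is a sketch of the scheme as described here, not the stylesheets' own code. The function name chunk_filename is hypothetical, and the abbreviation table covers only part of Table 5.1. Two further points are assumptions taken from the examples in this topic: sect1 is treated like section (abbreviation s), and appendices are lettered (apa).

```python
# Subset of the abbreviations from Table 5.1; sect1..sect3 mapped to "s"
# (assumption based on the examples, e.g. ch01s02.html).
ABBREV = {
    "appendix": "ap", "article": "ar", "chapter": "ch", "part": "pt",
    "preface": "pr", "reference": "rn", "refentry": "re",
    "section": "s", "sect1": "s", "sect2": "s", "sect3": "s",
}


def letters(n):
    """Letter numbering: 1 -> 'a', 26 -> 'z', 27 -> 'aa'."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def chunk_filename(path, ext=".html"):
    """path: list of (element, position) pairs from the component downwards."""
    parts = []
    for element, position in path:
        # Assumption: appendices are lettered, everything else gets a
        # two-digit number (ch01, s02).
        number = letters(position) if element == "appendix" else f"{position:02d}"
        parts.append(ABBREV[element] + number)
    return "".join(parts) + ext


print(chunk_filename([("chapter", 1), ("sect1", 2)]))   # ch01s02.html
print(chunk_filename([("appendix", 1), ("sect1", 2)]))  # apas02.html
```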
If you want to have your files in a specific directory, set the parameter base.dir to your preferred value, for example:
    # base.dir=html/
    book            --> html/index.html
      preface       --> html/pr01.html
      chapter       --> html/ch01.html
        sect1
        sect1       --> html/ch01s02.html
      appendix      --> html/apa.html
        sect1
        sect1       --> html/apas02.html
Example 5.5 shows that the first section is not chunked. This is the default behavior. However, if you also want the first section to be written to a separate file, set the chunk.first.sections parameter to 1 to get the following result:
    book            --> index.html
      preface       --> pr01.html
      chapter       --> ch01.html
        sect1       --> ch01s01.html
        sect1       --> ch01s02.html
      appendix      --> apa.html
        sect1       --> apas01.html
        sect1       --> apas02.html
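The effect of chunk.first.sections on the top-level sections of one component can be sketched like this. The helper section_files is hypothetical. It only models the file names shown in this topic and assumes two-digit section numbers. Note that the second section keeps the number s02 even when the first section is not chunked.

```python
def section_files(component, count, first_sections=0):
    """Return the file name for each of `count` top-level sections of a
    component, or None if the section stays in the component's file."""
    files = []
    for position in range(1, count + 1):
        if position == 1 and not first_sections:
            # Default: the first section is included in the component.
            files.append(None)
        else:
            # Numbering follows the position, chunked or not.
            files.append(f"{component}s{position:02d}.html")
    return files


print(section_files("ch01", 2))                    # [None, 'ch01s02.html']
print(section_files("ch01", 2, first_sections=1))  # ['ch01s01.html', 'ch01s02.html']
```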
If we have very deeply nested structures with sections and subsections, we may want to chunk these as well. As an example, let's assume the following chapter with these subsections:
    chapter
      sect1
        sect2
          sect3
            sect4
        sect2
      sect1
By default, only sect1 elements are written to a separate file. Anything below a sect1, like sect2, sect3, etc., is written to the same file that contains the sect1 element.
To control the chunking process for sections, use the parameter chunk.section.depth. By default, the parameter is set to 1, which chunks only level-one sections. Setting chunk.section.depth to 2 has the following effect:
    # chunk.section.depth=2, chunk.first.sections=1
    chapter           --> ch01.html
      sect1           --> ch01s01.html
        sect2         --> ch01s01s01.html
          sect3
            sect4
        sect2         --> ch01s01s02.html
      sect1           --> ch01s02.html
As you can see, with a value of 2, level-two sections (sect2) are written to separate files.
A value of 3 has the following effect:
    # chunk.section.depth=3, chunk.first.sections=1
    chapter           --> ch01.html
      sect1           --> ch01s01.html
        sect2         --> ch01s01s01.html
          sect3       --> ch01s01s01s01.html
            sect4
        sect2         --> ch01s01s02.html
      sect1           --> ch01s02.html
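In other words, the chunk.section.depth parameter cuts the tree at the respective section level. Sections up to that level get their own files, and anything deeper is included in the file of its parent section.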
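The following sketch shows how chunk.section.depth cuts the tree. It walks the example chapter and decides for each section whether it gets its own file. It models the behavior described above and is not the stylesheets' implementation. The function chunk_sections and the tuple-based tree are hypothetical, and the sketch assumes chunk.first.sections=1, as in the examples.

```python
# The example chapter from above as (element, children) tuples.
CHAPTER = ("chapter", [
    ("sect1", [
        ("sect2", [
            ("sect3", [
                ("sect4", []),
            ]),
        ]),
        ("sect2", []),
    ]),
    ("sect1", []),
])


def chunk_sections(node, prefix, depth, level=1, out=None):
    """Return (element, file name or None) for every section in document
    order. None means the section stays in its parent's file.
    Assumes chunk.first.sections=1."""
    if out is None:
        out = []
    _element, children = node
    for position, child in enumerate(children, start=1):
        child_prefix = f"{prefix}s{position:02d}"
        # Sections whose level is at most `depth` get their own file.
        filename = child_prefix + ".html" if level <= depth else None
        out.append((child[0], filename))
        chunk_sections(child, child_prefix, depth, level + 1, out)
    return out


for element, filename in chunk_sections(CHAPTER, "ch01", depth=2):
    print(element, "-->", filename or "(in parent file)")
```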
The previous sections used predictable file names, but not stable ones. If you add or remove a section or chapter, the numbering of the chapters and sections changes, and so do the file names. If you want to share a link, this naming scheme is not useful because the link can break after the next restructuring.
Stable file names are not affected when you restructure your document. If you add or remove a structural element, the file names will still be the same.
To create such stable file names, use the parameter use.id.as.filename. It derives the file name from the xml:id attribute of your component. However, keep the following issues in mind when you use this naming scheme:
Validate your document before you transform it. Validation shows you any problems with IDs, for example, duplicate IDs, missing IDs, and syntactically wrong IDs. This is very useful because the DocBook XSL stylesheets do not check for duplicate file names, which could lead to one file overwriting another.
Set IDs on your components. Components without an ID fall back to the default naming scheme.
Use “speaking” IDs. Some tools generate IDs automatically, which can lead to something like y8w739zya. Such IDs are meaningless: you cannot memorize them, and they give no hint about the content. Avoid them and replace such IDs with meaningful, easy-to-remember names. This also benefits your file names.
Avoid unusual characters in your ID. Although it may be tempting to use umlauts, diacritics, or other Unicode characters, it is recommended to stay within the ASCII character set. Depending on the file system, the tools you use, or the operating system, Unicode characters might not be fully supported and could lead to wrong file names.
Structure your IDs consistently. It is easier to find an HTML file if it is named consistently. For example, if you have an introduction chapter, you could set its ID to intro. Any sections inside this chapter would use it as a prefix and append their own part. A section which describes an overview could have the ID intro.overview. This helps you when you search for a specific HTML file.
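The stylesheets do not report duplicate file names, so a quick pre-flight check of your IDs can help in addition to validation. The following sketch uses Python's xml.etree.ElementTree to flag missing, duplicate, and non-ASCII xml:id values in a DocBook 5 document. The set of checked elements (CHUNKED) and the "safe" ID pattern (SAFE_ID) are assumptions, so adjust them to your documents. The check does not replace validating against the DocBook schema.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

DB_NS = "{http://docbook.org/ns/docbook}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumption: elements you expect to be chunked; adjust as needed.
CHUNKED = {"preface", "chapter", "appendix", "section", "sect1"}
# Assumption: conservative ASCII-only pattern for IDs.
SAFE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_ids(xml_text):
    """Return a list of human-readable problems with component IDs."""
    problems = []
    seen = Counter()
    for elem in ET.fromstring(xml_text).iter():
        name = elem.tag.replace(DB_NS, "")
        xml_id = elem.get(XML_ID)
        if xml_id is None:
            if name in CHUNKED:
                problems.append(f"missing xml:id on <{name}>")
            continue
        seen[xml_id] += 1
        if not SAFE_ID.fullmatch(xml_id):
            problems.append(f"unusual characters in ID {xml_id!r}")
    # Duplicate IDs would lead to duplicate file names.
    problems += [f"duplicate ID {i!r}" for i, n in seen.items() if n > 1]
    return problems


SAMPLE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <chapter xml:id="intro">
    <section xml:id="intro.concept"/>
    <section xml:id="intro.concept"/>
  </chapter>
  <appendix>
    <section xml:id="übersicht"/>
  </appendix>
</book>"""

for problem in check_ids(SAMPLE):
    print(problem)
```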
Let's amend Example 5.4, “Book Structure With Components and Sections”, with IDs:
Transforming it through chunk.xsl leads to the following result:
    # use.id.as.filename=1, chunk.first.sections=1
    book            --> index.html
      preface       --> preface.html
      chapter       --> intro.html
        sect1       --> intro.concept.html
        sect1       --> intro.requirements.html
      appendix      --> app.overview.html
        sect1       --> app.overview.method-a.html
        sect1       --> app.overview.method-b.html
The file name of the book may be a bit surprising: by default, its base name is index. If you want to change that too, set the parameter root.filename to your preferred value (without a file extension).
Of course, you can combine it with all the other parameters which are explained in this topic.
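The rules in this topic for picking a chunk's file name can be summarized in a small sketch. The root element uses root.filename (index by default), as the book in the example above shows. Other components use their ID when use.id.as.filename is non-zero and an ID is present. Everything else falls back to the generated, numbered name. The function chunk_file and its arguments are hypothetical and only mirror the behavior described here. The value "manual" is an illustrative root.filename.

```python
def chunk_file(generated, xml_id=None, is_root=False,
               use_id_as_filename=1, root_filename="index", ext=".html"):
    """Return the file name of a chunk under the rules described above."""
    if is_root:
        # The root element (here: book) is named after root.filename.
        base = root_filename
    elif use_id_as_filename and xml_id:
        # Stable name derived from the component's xml:id.
        base = xml_id
    else:
        # Fallback: generated, position-based name such as ch01.
        base = generated
    return base + ext


print(chunk_file("", is_root=True))                          # index.html
print(chunk_file("", is_root=True, root_filename="manual"))  # manual.html
print(chunk_file("ch01", xml_id="intro"))                    # intro.html
print(chunk_file("ch01"))                                    # ch01.html
```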