You want to split your result HTML into different files, all correctly linked.
The DocBook XSL stylesheets use the term
chunking for splitting up your result into
different HTML files. A chunk is therefore a single
HTML file.
Use the chunk.xsl stylesheet, which is available
for all HTML variants. Usually this is enough to chunk your result.
To influence the chunking process, use the following parameters:
base.dir: Sets the output directory for all chunks. If not set, the output directory is system dependent. Usually it is the current directory from which you ran your XSLT processor.
chunk.section.depth: Sets the depth to which sections are chunked. Default is 1.
chunk.first.sections: Controls whether the first top-level sect1 or section element of a component is chunked. If non-zero, a separate file (“chunk”) is created; otherwise the section is included in the file of its component. Default is 0 (no separate chunk is created).
use.id.as.filename: Controls the file name of the chunked element. If non-zero, the file name is derived from the ID of the element. If zero, the file name is generated and numbered according to the element's position. Default is 0 (do not use IDs for file names).
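These parameters are usually set in a small customization layer. The following is a minimal sketch, not the only way to do it. It assumes that the namespace-aware DocBook 5 stylesheets can be resolved under the URI shown in xsl:import (typically through an XML catalog; substitute a local path otherwise). The value html/ for base.dir is only an example, and the other three parameters simply repeat the defaults listed above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: this URI resolves to your installed stylesheets,
       for example through an XML catalog. Use a local path if not. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/html/chunk.xsl"/>

  <!-- Example value: write all chunks below html/ -->
  <xsl:param name="base.dir">html/</xsl:param>

  <!-- Defaults, repeated here only to make them visible -->
  <xsl:param name="chunk.section.depth" select="1"/>
  <xsl:param name="chunk.first.sections" select="0"/>
  <xsl:param name="use.id.as.filename" select="0"/>

</xsl:stylesheet>
```

Pass this file to your XSLT processor in place of chunk.xsl. With xsltproc, the same parameters can also be given on the command line through the --stringparam option.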
To better understand what the stylesheet creates, let's assume the following book structure:
book
  preface
  chapter
    sect1
    sect1
  appendix
    sect1
    sect1
The following subsections show how to influence the output.
Using the chunk.xsl stylesheet with
xsltproc, saxon, or any other
XSLT processor leads to the following file names:
# No parameters set, default behaviour
book          --> index.html
  preface     --> pr01.html
  chapter     --> ch01.html
    sect1
    sect1     --> ch01s02.html
  appendix    --> apa.html
    sect1
    sect1     --> apas02.html
As you can see, the file name consists of several components:
An abbreviation of the chunked element. Each chunked element is assigned an abbreviation, one or two characters long. The available abbreviations are shown in Table 5.1.
| Abbreviation | Element |
|---|---|
| ap | appendix |
| ar | article |
| bi | bibliography |
| bk | book |
| ch | chapter |
| co | colophon |
| go | glossary |
| ix | index |
| pr | preface |
| pt | part |
| re | refentry |
| rn | reference |
| s | section |
| se | set |
| si | setindex |
| to | topic |
A consecutive number. Each chunked component gets a number. For example,
the first chapter has ch01, the second
chapter ch02, and so on. Appendices are lettered instead, which is why
the appendix in the listing above is named apa.html.
Additional subcomponents. If a component has subcomponents such as sections, the
subcomponent's abbreviation and number are appended to the file name.
As such, the second section in the first chapter gets the file
name ch01s02.html.
If you want to have your files in a specific directory, set the
parameter base.dir to your preferred value,
for example:
# base.dir=html/
book          --> html/index.html
  preface     --> html/pr01.html
  chapter     --> html/ch01.html
    sect1
    sect1     --> html/ch01s02.html
  appendix    --> html/apa.html
    sect1
    sect1     --> html/apas02.html
Example 5.5 showed that the first section is not chunked.
This is the default behavior. However, if you want the first section
to be written to a separate file as well, set the
chunk.first.sections parameter to
1 to get the following result:
book          --> index.html
  preface     --> pr01.html
  chapter     --> ch01.html
    sect1     --> ch01s01.html
    sect1     --> ch01s02.html
  appendix    --> apa.html
    sect1     --> apas01.html
    sect1     --> apas02.html
If we have very deeply nested structures with sections and subsections, we may want to chunk these as well. As an example, let's assume the following chapter with these subsections:
chapter
  sect1
    sect2
      sect3
        sect4
    sect2
  sect1
By default, only sect1 elements are written to a file.
Anything below a sect1 like sect2,
sect3 etc. is written to the same file that contains the
sect1 element.
To control the chunking process for sections, use the parameter
chunk.section.depth. By default, the parameter
is set to 1, which means that only level-one
sections are chunked. Setting chunk.section.depth to
2 has the following effect:
# chunk.section.depth=2, chunk.first.sections=1
chapter         --> ch01.html
  sect1         --> ch01s01.html
    sect2       --> ch01s01s01.html
      sect3
        sect4
    sect2       --> ch01s01s02.html
  sect1         --> ch01s02.html
As you can see, with the value of 2, level two
sections (sect2) are written to a separate file.
A value of 3 has the following effect:
# chunk.section.depth=3, chunk.first.sections=1
chapter         --> ch01.html
  sect1         --> ch01s01.html
    sect2       --> ch01s01s01.html
      sect3     --> ch01s01s01s01.html
        sect4
    sect2       --> ch01s01s02.html
  sect1         --> ch01s02.html
In other words, the chunk.section.depth parameter
cuts at the respective section level.
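The settings used in the listings above can be kept in a customization layer. This is a minimal sketch under the same kind of assumption as any such layer: the URI in xsl:import is assumed to resolve to your installed namespace-aware stylesheets, for example through an XML catalog, and should be replaced by a local path if it does not.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: adjust this URI or path to your installation -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/html/chunk.xsl"/>

  <!-- Chunk sections down to level two (sect2) -->
  <xsl:param name="chunk.section.depth" select="2"/>

  <!-- Also write the first section of each component to its own file -->
  <xsl:param name="chunk.first.sections" select="1"/>

</xsl:stylesheet>
```

Changing the value of chunk.section.depth to 3 in this layer corresponds to the second listing, where sect3 elements get their own files as well.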
The previous sections used predictable file names, but not stable ones. If you add or remove a section or chapter, the numbering of the chapters and sections changes, and with it the file names. Because such names are not stable, this naming scheme is not useful for links that you want to share.
Stable file names are not affected when you restructure your document. If you add or remove a structural element, the file names will still be the same.
To create such stable file names, set the parameter
use.id.as.filename to a non-zero value. The file name
is then derived from the xml:id attribute of your
component. However, keep the following issues in mind when you use
this naming scheme:
Validate your document before you transform it. Validating your document reveals problems with IDs, such as duplicate, missing, or syntactically invalid IDs. This is very useful because the DocBook XSL stylesheets do not check for file names that occur twice, so one file could silently overwrite another.
Assign IDs to your components. A component without an ID falls back to the default naming scheme.
Use “speaking” IDs. Some tools generate IDs automatically, which can lead to
something like y8w739zya. Such IDs are hard to
memorize and give no hint about the content.
Replace them with meaningful names that are easy to
remember. This also benefits your file names.
Avoid unusual characters in your ID. Although it may be tempting to use umlauts, diacritics, or other Unicode characters, it is recommended to stay within the ASCII character set. Depending on the file system, the tools you use, or the operating system, Unicode characters might not be fully supported, which could lead to wrong file names.
Structure your IDs consistently.
It is easier to find an HTML file if it is named consistently.
For example, if you have an introductory chapter, you could
set its ID to intro. Any sections
inside this chapter would use it as a prefix and append their
own name. A section that gives an overview could have the ID
intro.overview.
This helps you when you search for a specific HTML file.
Let's amend Example 5.4, “Book Structure With Components and Sections” with IDs:
book xml:id="book"
  preface xml:id="preface"
  chapter xml:id="intro"
    sect1 xml:id="intro.concept"
    sect1 xml:id="intro.requirements"
  appendix xml:id="app.overview"
    sect1 xml:id="app.overview.method-a"
    sect1 xml:id="app.overview.method-b"
Transforming it through chunk.xsl leads to
the following result:
# use.id.as.filename=1, chunk.first.sections=1
book          --> index.html
  preface     --> preface.html
  chapter     --> intro.html
    sect1     --> intro.concept.html
    sect1     --> intro.requirements.html
  appendix    --> app.overview.html
    sect1     --> app.overview.method-a.html
    sect1     --> app.overview.method-b.html
The file name of the book may be a bit surprising. By default, its
basename is index. If you want to change that too,
set the parameter root.filename to your preferred
value (without a file extension).
Of course, you can combine it with all the other parameters which are explained in this topic.
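As an illustration of such a combination, the following customization layer switches to ID-based file names and renames the root chunk. It is a sketch under stated assumptions: the URI in xsl:import is assumed to resolve to your installed namespace-aware stylesheets (replace it with a local path if it does not), and the value start for root.filename is a hypothetical name chosen for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: adjust this URI or path to your installation -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/html/chunk.xsl"/>

  <!-- Derive file names from xml:id instead of numbering -->
  <xsl:param name="use.id.as.filename" select="1"/>

  <!-- Write the first section of each component to its own file -->
  <xsl:param name="chunk.first.sections" select="1"/>

  <!-- Hypothetical value: basename of the root chunk, without extension -->
  <xsl:param name="root.filename">start</xsl:param>

</xsl:stylesheet>
```

With these settings, the book from the example above should be written to start.html instead of index.html, while all other file names follow the IDs.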