There are a few differences between DocBook XML and DocBook SGML. Handling them should be relatively easy for most small documents, and many authors will not need to change anything other than the XML and DocBook declarations at the start of their document.
For others, here is a list of what to keep in mind when converting your documents from SGML to XML.
An XML element typically has three parts: the start tag, the content (your words), and the end tag. Qualifiers added to the start tag are known as attributes. Each attribute always has a name and a quoted value.
<filename class="directory">/usr/local</filename>
The start tag contains one attribute (class) with a value of “directory”. The end tag (also filename) must not contain any attributes.
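To see those three parts from a parser's point of view, the sketch below feeds the example to Python's standard xml.etree.ElementTree module. Python is only our choice for illustration; the guide does not prescribe a tool. ElementTree checks that text is well-formed XML but does not validate it against the DocBook DTD.

```python
import xml.etree.ElementTree as ET

# Start tag with one attribute, the content, and the end tag.
element = ET.fromstring('<filename class="directory">/usr/local</filename>')
print(element.tag)     # filename
print(element.attrib)  # {'class': 'directory'}
print(element.text)    # /usr/local

# An attribute on the end tag is not well-formed XML, so parsing fails.
try:
    ET.fromstring('<filename>/usr/local</filename class="directory">')
    end_tag_attribute_ok = True
except ET.ParseError:
    end_tag_attribute_ok = False
print(end_tag_attribute_ok)  # False
```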
Element names (tags) and their attributes are case-sensitive; in DocBook XML they are lowercase. The following will not validate because the end tag </PARA> is uppercase:
<para>This part will fail XML validation</PARA>
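A case mismatch is caught even before DTD validation: an XML parser rejects the example as not well-formed. A minimal check using Python's standard library (is_well_formed is a hypothetical helper, not part of any DocBook tool):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if ElementTree (backed by expat) can parse xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<para>This part will fail XML validation</PARA>"))  # False
print(is_well_formed("<para>This part will pass</para>"))                 # True
```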
All attribute values in the start tag must be quoted. You can use either single (') or double (") quotes, but not backticks (`) or typographic “smart quotes”. The quote that opens a name="value" pair must be the same quote that closes the value. In other words, "this" would validate, but 'that" would not.
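The quoting rules can be tried out directly. This sketch, again using Python's ElementTree purely as an illustration, parses the filename example with each quoting style and records which ones an XML parser accepts:

```python
import xml.etree.ElementTree as ET

candidates = {
    "double quotes": '<filename class="directory">/usr/local</filename>',
    "single quotes": "<filename class='directory'>/usr/local</filename>",
    "mismatched": "<filename class='directory\">/usr/local</filename>",
    "backticks": "<filename class=`directory`>/usr/local</filename>",
    "smart quotes": "<filename class=\u201cdirectory\u201d>/usr/local</filename>",
    "unquoted": "<filename class=directory>/usr/local</filename>",
}

results = {}
for label, text in candidates.items():
    try:
        ET.fromstring(text)
        results[label] = True   # well-formed
    except ET.ParseError:
        results[label] = False  # rejected by the parser

for label, ok in results.items():
    print(f"{label:14} {'accepted' if ok else 'rejected'}")
```

Only the double- and single-quoted versions are accepted.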
Elements that have a start tag but no end tag are referred to as “empty” because they do not contain (wrap around) anything. These tags must still be closed with a trailing slash (/). For example, <xref linkend="software"> must be written as <xref linkend="software"/>. You may not have any spaces between the / and the >, although you may have a space after the final attribute: <xref linkend="foo" />.
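Here is a quick way to check the empty-element forms, sketched with Python's ElementTree. Each xref is placed inside a para (our assumption, so that an unclosed tag has an end tag to collide with):

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

forms = [
    '<para>See <xref linkend="software"/>.</para>',   # XML empty-element tag
    '<para>See <xref linkend="foo" />.</para>',       # space after last attribute: allowed
    '<para>See <xref linkend="software">.</para>',    # SGML style: xref never closed
    '<para>See <xref linkend="software"/ >.</para>',  # space between "/" and ">"
]
results = [parses(form) for form in forms]
for ok, form in zip(results, forms):
    print(ok, form)
```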
Processing instructions are passed to the transformation engine (DSSSL or XSLT) and, in XML, must have a question mark at the end of the tag. All processing instructions are removed from the output stream. The XML version of this tag looks like this:
<?dbhtml filename="foo"?>
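To see how an XML parser treats the two forms, the sketch below uses Python's xml.dom.minidom, which keeps processing instructions as nodes. The surrounding chapter and title are assumed context for illustration. The XML form yields a PI with target dbhtml; the SGML form, which ends with a plain >, is rejected because the parser never finds the closing ?>.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString('<chapter><?dbhtml filename="foo"?><title>Foo</title></chapter>')
pis = [
    (node.target, node.data)
    for node in doc.documentElement.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)  # [('dbhtml', 'filename="foo"')]

# SGML-style processing instruction without the trailing question mark.
try:
    minidom.parseString('<chapter><?dbhtml filename="foo"><title>Foo</title></chapter>')
    sgml_style_ok = True
except ExpatError:
    sgml_style_ok = False
print(sgml_style_ok)  # False
```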
If you're converting from SGML to XML, be sure file names refer to .xml files instead of .sgml. Some tools may get confused if a .sgml file contains XML.
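If you have many files to rename, a small script can do it. This is a hypothetical helper (rename_sgml_files), not an LDP tool. It only renames files; any references to .sgml file names inside your documents still need to be updated separately.

```python
import tempfile
from pathlib import Path

def rename_sgml_files(directory):
    """Rename each *.sgml file directly inside `directory` to *.xml.

    Returns the new paths. References inside the files are NOT rewritten.
    """
    renamed = []
    for path in sorted(Path(directory).glob("*.sgml")):
        target = path.with_suffix(".xml")
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append(target)
    return renamed

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "intro.sgml").write_text("<para>Intro</para>\n")
    Path(tmp, "README").write_text("not DocBook\n")
    renamed_names = [p.name for p in rename_sgml_files(tmp)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed_names)  # ['intro.xml']
print(remaining)      # ['README', 'intro.xml']
```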
SGML allowed tag minimization: an end tag could omit the element name instead of writing it out, for example <para>This is foo.</>. Tag minimization is not supported in XML, and its use is discouraged in DocBook.
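If a file you are converting still contains minimized end tags, one way to expand them is to keep a stack of open elements: each </> closes whichever element was opened most recently. The Python sketch below (expand_empty_end_tags is a hypothetical helper) assumes a simple document. It must have no omitted end tags or other SGML minimizations, no CDATA sections, and no < or > inside comments or attribute values.

```python
import re

# Start tags, named end tags, and empty end tags "</>". Processing instructions,
# comments, and declarations also match, but with no element name.
TAG = re.compile(r"<(/?)([A-Za-z][\w.:-]*)?[^<>]*>")

def expand_empty_end_tags(text: str) -> str:
    """Replace each '</>' with the end tag of the innermost open element."""
    open_elements = []

    def fix(match):
        tag, slash, name = match.group(0), match.group(1), match.group(2)
        if slash and name is None:            # "</>": close the innermost element
            if not open_elements:
                raise ValueError("'</>' with no open element")
            return f"</{open_elements.pop()}>"
        if slash:                             # "</name>": ordinary end tag
            if open_elements and open_elements[-1] == name:
                open_elements.pop()
            return tag
        if name is not None and not tag.endswith("/>"):
            open_elements.append(name)        # ordinary start tag
        return tag                            # empty element, PI, comment, ...

    return TAG.sub(fix, text)

print(expand_empty_end_tags("<para>\nThis is foo.</>"))
print(expand_empty_end_tags("<note><para>Content in a note</></>"))
```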
The significant differences between versions of the DocBook DTD involve changes to the elements (tags). Elements may be deprecated (which means they will be removed in a future version), removed, modified, or added. Almost all authors will run into a changed or deprecated tag when moving from an earlier version of DocBook to a later one.
DocBook: The Definitive Guide does an excellent job of showing you how elements fit together. For each element it tells you what the element must contain (its content model) and what it may be contained in (its parents). For example, a note must contain a para. If you try to write <note>Content in a note</note>, your document will not validate.
Learning how elements are assembled will make it a lot easier to understand any validation errors that are thrown at you. If you get truly stuck, you can also email the LDP's DocBook mailing list for extra hints. Information on subscribing is available in Section 2, “Mailing Lists”.
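A real content-model check needs the DocBook DTD and a validating parser. To illustrate the idea only, here is a hand-written Python rule for the note example. BLOCK_CHILDREN is a small, assumed subset of what DocBook allows inside a note, and note_problems is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed subset of block-level children; the DTD allows more than this.
BLOCK_CHILDREN = {"para", "simpara", "itemizedlist", "orderedlist", "programlisting"}

def note_problems(xml_text: str) -> list:
    """Report notes with bare text or without any block-level child."""
    root = ET.fromstring(xml_text)
    problems = []
    for note in root.iter("note"):  # iter() includes root if it is a note
        if note.text and note.text.strip():
            problems.append("note contains bare text; wrap it in <para>")
        if not any(child.tag in BLOCK_CHILDREN for child in note):
            problems.append("note has no block-level child such as <para>")
    return problems

print(note_problems("<note>Content in a note</note>"))
print(note_problems("<note><para>Content in a note</para></note>"))  # []
```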
All tags that have been deprecated or changed for 4.x are listed in DocBook: The Definitive Guide, published by O'Reilly and Associates. The book is also available online from http://www.docbook.org.
Here are a few elements that are of particular relevance to LDP authors:
artheader has been changed to articleinfo.
Most other header elements have been renamed to info.
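Renaming an element throughout a document can be sketched with Python's ElementTree. rename_elements and the RENAMES table are hypothetical. Only the artheader mapping comes from this guide, and other renames would be added by hand. ElementTree drops the DOCTYPE and comments and cannot parse undefined entities, so this suits fragments rather than whole books.

```python
import xml.etree.ElementTree as ET

# Only the artheader -> articleinfo rename is taken from the text above.
RENAMES = {"artheader": "articleinfo"}

def rename_elements(xml_text: str, renames=RENAMES) -> str:
    """Return xml_text with every element named in `renames` renamed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in renames:
            elem.tag = renames[elem.tag]
    return ET.tostring(root, encoding="unicode")

print(rename_elements("<article><artheader><title>T</title></artheader></article>"))
# <article><articleinfo><title>T</title></articleinfo></article>
```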
graphic has been deprecated and will be removed as of DocBook 5.x. To prepare for this, start using mediaobject. There is more information about mediaobject in Section 5, “Inserting Pictures”.
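The basic graphic-to-mediaobject rewrite wraps the file reference in mediaobject, imageobject, and imagedata. Below is a rough Python sketch. graphic_to_mediaobject is a hypothetical helper, and it assumes only the fileref and format attributes need to be carried over. Check the result against Section 5 and the DTD before relying on it.

```python
import xml.etree.ElementTree as ET

def graphic_to_mediaobject(xml_text: str) -> str:
    """Replace each <graphic> with <mediaobject><imageobject><imagedata/>."""
    root = ET.fromstring(xml_text)
    # ElementTree elements have no parent pointer, so map child -> parent first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for graphic in list(root.iter("graphic")):
        parent = parents.get(graphic)
        if parent is None:
            continue  # the root element itself; nothing to replace it in
        # Assumption: only fileref and format are copied onto imagedata.
        attrs = {k: graphic.get(k) for k in ("fileref", "format") if graphic.get(k) is not None}
        media = ET.Element("mediaobject")
        ET.SubElement(ET.SubElement(media, "imageobject"), "imagedata", attrs)
        media.tail = graphic.tail  # keep any text that followed the graphic
        index = list(parent).index(graphic)
        parent.remove(graphic)
        parent.insert(index, media)
    return ET.tostring(root, encoding="unicode")

print(graphic_to_mediaobject(
    '<section><para>Intro</para><graphic fileref="foo.png" format="PNG"/></section>'
))
```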
imagedata file formats (the value of the format attribute) must now be written in UPPERCASE letters. If you use lowercase or mixed-case spellings for your file formats, your document will fail validation.
Valid:
<imagedata format="EPS" fileref="foo.eps"/>
Invalid:
<imagedata format="eps" fileref="foo.eps"/>
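Fixing the case of format values is mechanical. Here is a minimal Python sketch using ElementTree. uppercase_formats is a hypothetical helper. It only changes case, so a value that is not one of the formats the DTD lists will still fail validation. Like other ElementTree round trips, it drops the DOCTYPE and comments.

```python
import xml.etree.ElementTree as ET

def uppercase_formats(xml_text: str) -> str:
    """Upper-case the format attribute of every imagedata element."""
    root = ET.fromstring(xml_text)
    for imagedata in root.iter("imagedata"):
        fmt = imagedata.get("format")
        if fmt is not None and fmt != fmt.upper():
            imagedata.set("format", fmt.upper())
    return ET.tostring(root, encoding="unicode")

fixed = uppercase_formats(
    '<mediaobject><imageobject>'
    '<imagedata format="eps" fileref="foo.eps"/>'
    '</imageobject></mediaobject>'
)
print(fixed)
```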