The biggest known problem with this technology is the lack of XML/XSL expertise, since
both are relatively new formats. Even if it is considered heavy and over-hyped, the
XML/XSL pair will do magic once it receives the widespread public knowledge it deserves.
This information, compiled from many sources, is intended as a small step toward helping
people learn this technology.
Extensible Markup Language
You don't need to be an XML guru to understand the fundamental XML concepts. Let's
first compare XML with the familiar HTML language.
XML is also a tag language, as is HTML. However, the similarity between HTML and XML
ends here. Although XML might look similar to HTML, XML isn't based on a fixed set of
predefined tags. Also, XML tags don't control how a Web browser displays text. XML is a
metalanguage used to create custom markup languages that can define all types of
information or data, such as documents, objects, etc. XML defines information and data
according to purpose rather than presentation so that several applications can use the
information and data in ways that promote diverse application reuse and extensibility.
Elements, Tags, Attributes, and Content:
An element is a pair of identically named tags, a start tag and an end tag, together
with the content between them. The start tag consists of a left angle bracket (<),
followed by a name that is typically a noun (lowercase in these examples; XML names are
case-sensitive), an optional attribute list, and a right angle bracket (>). For example,
the start tag <script language="language"> contains the tag name script and one
attribute definition, language="language". The end tag is similar to the start tag,
except that you preface the tag name with a slash (/) and exclude any attributes (e.g.,
</script>). Attributes are named values, mandatory or optional depending on the markup
language, that provide additional information about the corresponding tag or about the
content that follows. For example, <script language="vbscript"> means that the content
of this element is VBScript source code. You use a slash (/) just before the closing
angle bracket to terminate empty elements. The XML declaration and CDATA section are
standard W3C XML 1.0 markup that affect how the parser handles the document.
Listing 1 shows a small example of an XML document:
Marking up data with XML makes it easy to exchange data between two applications, because
they don't have to understand anything about file formats. All they need to do is
understand the XML tags, which makes the data really easy to parse. There really aren't
any specific tags in XML. You can make up your own XML language with any tags you need to
describe the data. To construct your own XML language, you create a specific document type
definition (DTD). You don't always need a DTD, because the XML parser can figure out
the structure of the document by reading the elements.
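As a minimal sketch of that last point, the Java program below parses a small document
that has no DTD, using the standard JAXP DocumentBuilder. The document and its tag names
(page, title, separator) are invented for illustration; only the script element echoes
the example above. It is one possible way to read elements, attributes, CDATA content,
and an empty element, not the only one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ParseWithoutDtd {
    // Hypothetical sample document: no DTD, made-up tag names.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<page>"
        + "<title>Hello</title>"
        + "<script language=\"vbscript\"><![CDATA[If a < b Then MsgBox \"hi\"]]></script>"
        + "<separator/>"
        + "</page>";

    // Parse the text into a DOM tree; the parser infers the structure from the tags alone.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        Element script = (Element) root.getElementsByTagName("script").item(0);
        Element separator = (Element) root.getElementsByTagName("separator").item(0);

        System.out.println(root.getTagName());               // page
        System.out.println(script.getAttribute("language")); // vbscript
        // The CDATA section lets "<" appear in the content without being read as markup.
        System.out.println(script.getTextContent());
        // An empty element has no children.
        System.out.println(separator.hasChildNodes());       // false
    }
}
```

A validating parser with a DTD would additionally check that the document follows the
rules of your markup language; the sketch above only requires the document to be
well-formed.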
Technologies for transformations
There are two basic technologies we'll use for transforming documents. The first
is the Extensible Stylesheet Language for Transformations, better known as XSL or XSLT.
The second is Java code that uses the methods of the Document Object Model
(DOM) or Simple API for XML (SAX). In some cases you may also
consider using the XMLLight Java API. Most of
the time, it's simpler to write an XSL stylesheet, but there may be times when a
stylesheet can't do what you want. In general, any time you're transforming a
document from one XML vocabulary to another, XSL is probably the best way to go.
On the other hand, if you're transforming an XML document into something special that
isn't a text or markup language, you'll almost certainly want to write Java code
instead. Writing Java code is more difficult, but it gives you complete control over the
Extensible Stylesheet Language (XSL):
Although XML is easy to read and easy to exchange with other applications, it's not
very suitable for displaying on a Web page, precisely because it does not include any
information about what the visual presentation should be. XSL is the language you use to
transform, or format, XML into some other kind of document, such as an HTML page. When you
run the XML document through an XSL processor, the XML information will be evaluated and
completely transformed into something else. An XSL stylesheet contains some number of
templates, each of which describes how to transform a given element in the source
In an XSL transformation, an XSL processor reads both an XML document and an XSL style
sheet. Based on the instructions the processor finds in the XSL style sheet, it outputs a
new XML document or fragment thereof. There's also special support for outputting HTML.
The input must be an XML document. Most of the time the output is also an XML document,
but the result may also be HTML or plain text.
An XML document is a tree. A tree is a data structure composed of connected nodes
beginning with a single node called the root. The root is connected to its child nodes,
each of which may be connected to zero or more children of its own, and so forth.
An XSL document contains a list of templates and other rules. A template rule has a
pattern specifying the trees it applies to and a template to be output when the pattern
is matched. When an XSL processor formats an XML document using an XSL style sheet, it
scans the XML document tree, looking through each sub-tree in turn. As each tree in the
XML document is read, the processor compares it with the pattern of each template rule in
the style sheet. When the processor finds a tree that matches a template rule's pattern,
it outputs the rule's template. This template generally includes some markup, some new
data, and some data copied out of the tree from the original XML document.
XSL uses XML to describe these rules, templates, and patterns. The XSL document itself is
an <xsl:stylesheet> element. Each template rule is an <xsl:template> element. The pattern
of the rule is the value of the match attribute of the <xsl:template> element. The output
template is the content of the <xsl:template> element. All instructions in the template
for doing things like selecting parts of the input tree to include in the output tree are
performed by one or another XSL element. These are identified by the "xsl:" prefix on the
element names. Elements that do not have an "xsl:" prefix are part of the result tree.
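The template-rule mechanism described above can be tried out with the standard
javax.xml.transform API that ships with Java. The following is only a sketch: the input
document (greeting, to) and the two template rules are invented for illustration. Each
<xsl:template> carries its pattern in the match attribute, and the unprefixed elements
(html, body, p) end up in the result tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateRuleDemo {
    // Hypothetical input document.
    static final String XML = "<greeting><to>World</to></greeting>";

    // Hypothetical style sheet with two template rules.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        // Rule 1: matches the greeting element and wraps the result in HTML markup.
        + "<xsl:template match=\"greeting\">"
        + "<html><body><xsl:apply-templates select=\"to\"/></body></html>"
        + "</xsl:template>"
        // Rule 2: matches each to element and copies its text into a paragraph.
        + "<xsl:template match=\"to\">"
        + "<p>Hello, <xsl:value-of select=\".\"/>!</p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Run one XSL transformation and return the output as a string.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The output contains the paragraph <p>Hello, World!</p> inside html/body.
        System.out.println(transform(XML, XSL));
    }
}
```

The xsl:output element is what requests the special HTML support mentioned earlier;
with method="xml" the same rules would produce an XML document instead.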
You can find more information and XSLT samples here:
XSL EASY - what is XSL?
XSL EASY 2 - make Resume using XSLT.
XSL EASY 3 - advanced XSLT samples.
Multi XSL Transformations
If you need to create an output document such as an HTML page or XML document, you may
consider not just one XSL transformation but several consecutive transformations, each of
which uses the information produced in the previous step to move to the next one. In
fact, information processing can almost always be divided into several steps. In the
first step you get the initial data; then, using this data, you create requests to other
data sources to receive additional information; then you process this information and
store some of it in a database; and as a result you create the final document. So, if we
have to deal with information stored in XML format, and the final document must be XML
(or HTML!), why not create a tool that allows us to use XSL stylesheets to transform our
data, store intermediate results in memory, and start another transformation for the
next step? The idea is not new. As an example of this kind of tool, consider Apache's
Cocoon, http://xml.apache.org/cocoon. As a
simpler and more straightforward alternative, we would like to present here Multi XSL
transformations. Click here to find more information about it.
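The general idea of chaining transformations while keeping intermediate results in
memory can be sketched with the standard javax.xml.transform API. This is not the Multi
XSL tool itself, whose interface is not shown here; the class name, the sample order
document, and both style sheets are assumptions made for illustration. Each step writes
to a DOMResult, and that in-memory tree becomes the DOMSource of the next step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChainedTransforms {
    // Hypothetical initial data.
    static final String XML = "<order><item>apple</item><item>pear</item></order>";

    // Step 1 (hypothetical): order/item -> intermediate list/entry vocabulary.
    static final String STEP1 = stylesheet(
        "<xsl:template match=\"order\">"
        + "<list><xsl:apply-templates select=\"item\"/></list></xsl:template>"
        + "<xsl:template match=\"item\">"
        + "<entry><xsl:value-of select=\".\"/></entry></xsl:template>");

    // Step 2 (hypothetical): list/entry -> ul/li markup.
    static final String STEP2 = stylesheet(
        "<xsl:template match=\"list\">"
        + "<ul><xsl:apply-templates select=\"entry\"/></ul></xsl:template>"
        + "<xsl:template match=\"entry\">"
        + "<li><xsl:value-of select=\".\"/></li></xsl:template>");

    static String stylesheet(String templates) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + templates + "</xsl:stylesheet>";
    }

    // Apply the style sheets in order; each result stays in memory as a DOM tree.
    static String run(String xml, String... stylesheets) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Source current = new StreamSource(new StringReader(xml));
        for (String xsl : stylesheets) {
            Transformer step = factory.newTransformer(new StreamSource(new StringReader(xsl)));
            DOMResult intermediate = new DOMResult();
            step.transform(current, intermediate);
            current = new DOMSource(intermediate.getNode());
        }
        // Serialize the final tree with an identity transformer.
        Transformer serializer = factory.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(current, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(XML, STEP1, STEP2));
    }
}
```

Between two steps, ordinary Java code could inspect the intermediate tree, query another
data source, or store something in a database before the next style sheet runs.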
Java, Document Object Model (DOM), Simple API for XML (SAX)
If you want or need complete control over your transformations, you can write Java code
to process XML documents. The good news is that you can do anything
you want; the bad news is that you have to do everything yourself.
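As a small illustration of doing everything yourself, the sketch below uses the standard
SAX API to turn an XML document into a comma-separated summary. The people/person
vocabulary, the class name SaxToCsv, and the output layout are all assumptions made for
this example. The handler receives one callback per start tag and decides on its own
what to write.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxToCsv {
    // Hypothetical input document.
    static final String XML =
        "<people>"
        + "<person name=\"Ann\" age=\"31\"/>"
        + "<person name=\"Bob\" age=\"27\"/>"
        + "</people>";

    // Stream through the document and emit one line per person element.
    // Note: values are written as-is; a real CSV writer would also quote commas.
    static String toCsv(String xml) throws Exception {
        StringBuilder csv = new StringBuilder("name,age\n");
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if ("person".equals(qName)) {
                    csv.append(attributes.getValue("name"))
                       .append(',')
                       .append(attributes.getValue("age"))
                       .append('\n');
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(toCsv(XML));
    }
}
```

SAX never builds the whole tree in memory, which suits large documents; DOM is usually
more convenient when you need to move around the tree or modify it.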
You can find more information about DOM in the tutorial, XML Programming in Java,
available on the XML zone of IBM's developerWorks Web site.
More information about SAX is available at the Megginson Technologies Web site.
XMLLight is an alternative approach which may be useful in low-resource environments
(like applets or some other distributed applications).