SoftCorporation LLC.


XML & XSL technology

The biggest known problem with this technology is the lack of XML/XSL expertise, since both are relatively new formats. Even if it is considered heavy and over-hyped, the XML/XSL pair can work magic once it becomes as widely known as it deserves. This information has been compiled from many sources and is intended as a small step toward helping people learn this technology.

Extensible Markup Language (XML):

You don't need to be an XML guru to understand the fundamental XML concepts. Let's first compare XML with the familiar HTML language.

Like HTML, XML is a tag-based language. However, the similarity between HTML and XML ends there. Although XML might look similar to HTML, it isn't based on a fixed set of predefined tags, and XML tags don't control how a Web browser displays text. XML is a metalanguage: it is used to create custom markup languages that can describe all kinds of information or data, such as documents and objects. XML describes information according to its purpose rather than its presentation, so several applications can use the same information in different ways, which promotes reuse and extensibility.

Elements, Tags, Attributes, and Content:

An element consists of a start tag and an identically named end tag, together with the content between them. The start tag consists of a left angle bracket (<), followed by a name (lowercase by convention and typically a noun; XML names are case-sensitive), an optional list of attributes, and a right angle bracket (>). For example, the start tag <script language="language"> contains the tag name script and one attribute definition, language="language". The end tag is similar to the start tag, except that the tag name is prefaced with a slash (/) and no attributes are allowed (e.g., </script>). Attributes are named values, mandatory or optional depending on the document type, that provide additional information about the corresponding element or about the content that follows. For example, <script language="vbscript"> indicates that the content of this element is VBScript source code. An empty element can be written as a single tag with a slash (/) just before the closing angle bracket, as in <br/>. The XML declaration (<?xml version="1.0"?>) and CDATA sections (<![CDATA[...]]>) are other standard W3C XML 1.0 markup that affect how the parser handles the document: the declaration states the XML version (and optionally the character encoding), and a CDATA section makes the parser treat its content as plain character data rather than as markup.
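To make these terms concrete, here is a minimal Java sketch using the JAXP DOM parser (javax.xml.parsers) that ships with the JDK. The sample markup, including the <page> root and the <br/> element, is hypothetical; the helpers show how a parser reports an attribute value, an element's content, and an empty element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ElementsDemo {

    // A start tag with one attribute, text content and an end tag, plus an
    // empty element written as a single tag ending in "/>".
    // The <page> root and the <br/> element are illustrative names.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<page>"
        + "<script language=\"vbscript\">MsgBox \"Hello\"</script>"
        + "<br/>"
        + "</page>";

    // Parses an XML string into a DOM tree with the JDK's built-in JAXP parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the value of attribute `name` on the first element called `tag`.
    static String attributeOf(String xml, String tag, String name) throws Exception {
        Element e = (Element) parse(xml).getElementsByTagName(tag).item(0);
        return e.getAttribute(name);
    }

    // Returns the text content of the first element called `tag`.
    static String contentOf(String xml, String tag) throws Exception {
        return parse(xml).getElementsByTagName(tag).item(0).getTextContent();
    }

    // True if the first element called `tag` has no child nodes, i.e. it is empty.
    static boolean isEmpty(String xml, String tag) throws Exception {
        return !parse(xml).getElementsByTagName(tag).item(0).hasChildNodes();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attributeOf(XML, "script", "language")); // vbscript
        System.out.println(contentOf(XML, "script"));               // MsgBox "Hello"
        System.out.println(isEmpty(XML, "br"));                     // true
    }
}
```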

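Listing 1 shows a small example of an XML document:

<?xml version="1.0"?>
<!-- A well-formed document needs a single root element; "products" is an illustrative name. -->
<products>
    <description>Model 900</description>
    <description>32X CD-ROM</description>
</products>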

Marking up data with XML makes it easy to exchange data between two applications, because they don't have to agree on a proprietary file format. All they need to understand is the XML tags, and the data is easy to parse with a standard XML parser. XML itself defines no specific tags: you can make up your own XML language with whatever tags you need to describe your data. To formally define your own XML language, you can write a document type definition (DTD). A DTD is not always required, because a non-validating XML parser can read the structure of any well-formed document from its elements alone; a DTD becomes necessary when you want a validating parser to check that a document actually follows your vocabulary.
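The difference between well-formedness and validity can be shown with a minimal Java sketch using JAXP's DocumentBuilderFactory. The internal DTD below, declaring a <products> root that holds one or more <description> elements, is a hypothetical vocabulary made up for this example. A custom ErrorHandler is assumed so that validation errors, which the parser normally only reports, stop the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdDemo {

    // Hypothetical internal DTD: <products> must contain one or more
    // <description> elements, each holding only text.
    static final String DTD =
        "<!DOCTYPE products ["
        + "<!ELEMENT products (description+)>"
        + "<!ELEMENT description (#PCDATA)>"
        + "]>";

    // Returns true if the document parses. With validate=false only
    // well-formedness is checked; with validate=true the DTD is enforced.
    static boolean parses(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of console messages.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<products><description>Model 900</description></products>";
        String bad  = DTD + "<products><price>10</price></products>";
        System.out.println(parses(good, true));  // true: follows the DTD
        System.out.println(parses(bad, true));   // false: <price> is not declared
        System.out.println(parses(bad, false));  // true: well-formed, DTD not enforced
    }
}
```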

Technologies for transformations

There are two basic technologies we’ll use for transforming documents. The first is XSL Transformations, better known as XSLT or simply XSL. The second is Java code that uses the Document Object Model (DOM) or the Simple API for XML (SAX). In some cases you may also consider the XMLLight Java API. Most of the time, it’s simpler to write an XSL stylesheet, but there may be times when a stylesheet can’t do what you want. In general, whenever you’re transforming a document from one XML vocabulary to another, XSL is probably the best way to go.
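As a minimal sketch of a vocabulary-to-vocabulary transformation, the Java code below applies a stylesheet through the standard TrAX API (javax.xml.transform) bundled with the JDK. Both vocabularies are hypothetical: the stylesheet turns <products>/<description> markup into <catalog>/<item> markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class VocabularyDemo {

    // Hypothetical mapping: <products>/<description> becomes <catalog>/<item>.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/products\">"
        + "<catalog><xsl:apply-templates select=\"description\"/></catalog>"
        + "</xsl:template>"
        + "<xsl:template match=\"description\">"
        + "<item><xsl:value-of select=\".\"/></item>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document, both given as strings.
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><description>Model 900</description>"
                   + "<description>32X CD-ROM</description></products>";
        // Prints a <catalog> element with one <item> per <description>.
        System.out.println(transform(xml, XSL));
    }
}
```

All of the mapping logic lives in the stylesheet; the Java side is only generic plumbing that can run any stylesheet.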

On the other hand, if you’re transforming an XML document into something that isn’t text or a markup language, such as a binary format, you’ll almost certainly want to write Java code instead. Writing Java code is more difficult, but it gives you complete control over the transformation.
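For the non-markup case, here is a sketch of Java code with no stylesheet involved: it reads the <description> elements through the DOM and writes them into a small binary record (an int count followed by strings encoded with DataOutputStream.writeUTF). The record layout and the <products> sample data are assumptions for illustration, not a standard format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BinaryDemo {

    // Writes every <description> text into a hypothetical binary record:
    // a 4-byte count, then each string as a 2-byte length plus its bytes.
    static byte[] toBinary(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList items = doc.getElementsByTagName("description");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.getLength());
            for (int i = 0; i < items.getLength(); i++) {
                out.writeUTF(items.item(i).getTextContent());
            }
        }
        return bytes.toByteArray();
    }

    // Decodes the record written by toBinary, to show the format round-trips.
    static List<String> fromBinary(byte[] data) throws Exception {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                result.add(in.readUTF());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><description>Model 900</description>"
                   + "<description>32X CD-ROM</description></products>";
        byte[] data = toBinary(xml);
        System.out.println(data.length + " bytes");  // 27 bytes
        System.out.println(fromBinary(data));        // [Model 900, 32X CD-ROM]
    }
}
```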

Extensible Stylesheet Language (XSL):

Although XML is easy to read and easy to exchange with other applications, it's not very suitable for displaying on a Web page, precisely because it does not include any information about what the visual presentation should be. XSL is the language you use to transform, or format, XML into some other kind of document, such as an HTML page. When you run the XML document through an XSL processor, the XML information will be evaluated and completely transformed into something else. An XSL stylesheet contains some number of templates, each of which describes how to transform a given element in the source document.
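A minimal XML-to-HTML example, assuming the JDK's built-in XSLT processor: the stylesheet has one template for the root element and one for each <description>, and xsl:output method="html" selects HTML serialization. The <products> input and the list layout are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlDemo {

    // <html>, <body>, <ul> and <li> have no "xsl:" prefix, so they are output
    // literally; xsl:apply-templates and xsl:value-of are processor instructions.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        // Template for the root element: build the page skeleton.
        + "<xsl:template match=\"/products\">"
        + "<html><body><ul><xsl:apply-templates select=\"description\"/></ul></body></html>"
        + "</xsl:template>"
        // Template for each <description>: one list item holding its text.
        + "<xsl:template match=\"description\">"
        + "<li><xsl:value-of select=\".\"/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over an XML string and returns the serialized HTML.
    static String toHtml(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><description>Model 900</description>"
                   + "<description>32X CD-ROM</description></products>";
        System.out.println(toHtml(xml));
    }
}
```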

In an XSL transformation, an XSL processor reads both an XML document and an XSL style sheet. Based on the instructions it finds in the style sheet, the processor outputs a new XML document or a fragment of one; there is also special support for outputting HTML. The input must be an XML document. Most of the time the output is also an XML document, but it may also be HTML or plain text.

An XML document is a tree. A tree is a data structure composed of connected nodes, beginning with a single node called the root. The root is connected to its child nodes, each of which may be connected to zero or more children of its own, and so forth.

An XSL document contains a list of template rules and other rules. A template rule has a pattern specifying the trees it applies to and a template to be output when the pattern is matched. When an XSL processor formats an XML document using an XSL style sheet, it scans the XML document tree, looking through each subtree in turn. As each tree in the XML document is read, the processor compares it with the pattern of each template rule in the style sheet. When the processor finds a tree that matches a template rule's pattern, it outputs the rule's template. This template generally includes some markup, some new data, and some data copied out of the tree from the original XML document.

XSL uses XML to describe these rules, templates, and patterns. The root element of an XSL document is an xsl:stylesheet element. Each template rule is an <xsl:template> element. The pattern of the rule is the value of the match attribute of the xsl:template element, and the output template is the content of the <xsl:template> element. All instructions in the template, such as those that select parts of the input tree to include in the output tree, are performed by one XSL element or another. These are identified by the "xsl:" prefix on the element names (the prefix is bound to the XSLT namespace, http://www.w3.org/1999/XSL/Transform). Elements that do not have an "xsl:" prefix are literal result elements: they are written to the result tree as they appear in the template.
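The tree model can be made visible by walking a DOM tree recursively. This Java sketch prints one line per element, indented by depth. It only illustrates the parent/child structure, loosely like the way XSLT's built-in template rules descend into child nodes; it does no pattern matching. The <product>/<part> sample is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeDemo {

    // Visits a node, then each of its children in document order. Element
    // names are appended one per line, indented two spaces per level of depth.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Parses the document and returns an indented outline starting at the root element.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<product><description>Model 900</description>"
                   + "<part><description>32X CD-ROM</description></part></product>";
        System.out.print(outline(xml));
    }
}
```

For the sample input, the outline shows product at the root, description and part one level down, and the nested description two levels down. Text nodes are visited but not printed.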

You can find more information and XSLT samples here:

XSL EASY - what is XSL?

XSL EASY 2 - make Resume using XSLT.

XSL EASY 3 - advanced XSLT samples.

Multi XSL Transformations

If you need to create an output document such as an HTML page or an XML document, you may consider not just one XSL transformation but several consecutive transformations, each of which uses information received in the previous step to move on to the next one. In practice, information processing can almost always be divided into several steps. In the first step you get the initial data; using this data you can then send requests to other data sources to receive additional information; then you process this information and store some of it in a database; and finally you create the resulting document. So, if we have to deal with information stored in XML format, and the final document must be XML (or HTML!), why not create a tool that lets us use XSL stylesheets to transform our data, store intermediate results in memory, and start another transformation for the next step? The idea is not new: Apache Cocoon is one example of this kind of tool. As a simpler and more straightforward alternative, we would like to present Multi XSL transformations here.
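Independently of the Multi XSL tool, the chaining idea can be sketched with plain JAXP: each intermediate result is kept in memory as a DOM tree (DOMResult) and passed to the next stylesheet as a DOMSource, and only the last step is serialized. The two stylesheets (a filter, then an HTML list) and the sample data are hypothetical; this is not the API of Multi XSL or Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChainDemo {

    static final String XSL_NS = "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"";

    // Hypothetical step 1: keep only the descriptions that mention "CD-ROM".
    static final String FILTER =
        "<xsl:stylesheet version=\"1.0\" " + XSL_NS + ">"
        + "<xsl:template match=\"/products\">"
        + "<products><xsl:copy-of select=\"description[contains(., 'CD-ROM')]\"/></products>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical step 2: render whatever step 1 produced as an HTML list.
    static final String TO_HTML =
        "<xsl:stylesheet version=\"1.0\" " + XSL_NS + ">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/products\">"
        + "<ul><xsl:for-each select=\"description\">"
        + "<li><xsl:value-of select=\".\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static Transformer compile(TransformerFactory factory, String xsl) throws Exception {
        return factory.newTransformer(new StreamSource(new StringReader(xsl)));
    }

    // Applies the stylesheets in order. Intermediate results stay in memory as
    // DOM trees; only the output of the last stylesheet is serialized to text.
    static String chain(String xml, String... stylesheets) throws Exception {
        if (stylesheets.length == 0) {
            throw new IllegalArgumentException("at least one stylesheet is required");
        }
        TransformerFactory factory = TransformerFactory.newInstance();
        Source current = new StreamSource(new StringReader(xml));
        for (int i = 0; i < stylesheets.length - 1; i++) {
            DOMResult intermediate = new DOMResult();
            compile(factory, stylesheets[i]).transform(current, intermediate);
            current = new DOMSource(intermediate.getNode());  // input for the next step
        }
        StringWriter out = new StringWriter();
        compile(factory, stylesheets[stylesheets.length - 1])
            .transform(current, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><description>Model 900</description>"
                   + "<description>32X CD-ROM</description></products>";
        System.out.println(chain(xml, FILTER, TO_HTML));
    }
}
```

Keeping the intermediate trees as DOM objects avoids serializing and re-parsing text between steps; a real pipeline could also insert Java steps (database lookups, for example) between the transformations.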

Java, Document Object Model (DOM), Simple API for XML (SAX) and XMLLight

If you want or need complete control over your transformations, you can write Java code to process XML documents. The good news is that you can do anything you want; the bad news is that you have to do everything yourself.
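As a taste of doing everything yourself, here is a minimal SAX sketch using the JDK's SAXParserFactory. The handler receives start-element, character, and end-element events and must keep its own state to collect the text of each <description> element. The sample input is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {

    // Collects the text of every <description> element from the event stream.
    static class DescriptionHandler extends DefaultHandler {
        final List<String> descriptions = new ArrayList<>();
        private StringBuilder text;  // non-null only while inside <description>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("description")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may split one text node across several calls, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("description")) {
                descriptions.add(text.toString());
                text = null;
            }
        }
    }

    static List<String> collectDescriptions(String xml) throws Exception {
        DescriptionHandler handler = new DescriptionHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.descriptions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><description>Model 900</description>"
                   + "<description>32X CD-ROM</description></products>";
        System.out.println(collectDescriptions(xml));  // [Model 900, 32X CD-ROM]
    }
}
```

The DOM version of this task is a single getElementsByTagName call; SAX trades that convenience for never holding the whole document tree in memory.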

You can find more information about DOM in the tutorial, XML Programming in Java, available on the XML zone of IBM’s developerWorks Web site.

More information about SAX is available at the Megginson Technologies Web site.

XMLLight is an alternative approach that may be useful in low-resource environments (such as applets or other distributed applications).


