Sunday, December 9, 2007

The Basics of XML

A Brief History of XML
Extensible Markup Language (XML) is based upon SGML (Standard Generalized Markup Language). The simplest explanation of SGML is that it is a method of writing documents with special formatting instructions, or markup, included. A publishing editor makes notations in the margin of a document to alert an author to changes the document needs. These notations are markup of the document and, indeed, this is where the term "markup" originated. Markup allows an SGML or XML document to be distributed electronically while preserving the format or style of the text. An SGML document contains both the content and the markup. The emphasis is placed on the formatting rather than the content; otherwise you would simply have an ordinary document.

SGML can be used to facilitate the publishing of documents as electronic or printed copy. Some programs that read the markup may also translate the styles, for example, to Braille readers and printers. The same document might be viewed on a smaller screen such as those on personal digital assistants (PDAs) or pagers and cellular telephones. The markup can mean something completely different based upon the final destination of the document and the translation to another format. Using stylesheets or transformation methods, a single document with content and markup can be changed upon output.

Markup Simplified

To help you understand markup, four examples are given in this section. They are based on the same results but have very different means of getting there. The first example illustrates that "there may be more than you see" on a monitor or printed page. The second example uses Rich Text Format (RTF) to show a way to embed formatting in a document for transportability. The third example shows the PostScript file (commands) to produce the desired results consistently on a laser printer. The fourth example uses the nested tag style found in SGML, HTML, and XML documents. You will begin to see how this final markup method can provide the formatting that you don't see, the transportability and the consistency of methods two and three, along with additional information about the document and document contents.

Example 1: Text Containing Bold Formatting

This has bold words in a sentence.

Using a word processor or electronic text editor, you may simply click on the word or phrase and apply the text style with special keystrokes (such as Control+B or Command+B) or choose Bold from a menu. On the word processor or computer screen, you can easily read the text, but you do not see the machine description, or code, describing how this text is to be displayed. You may not care how or why that happens, but the computer needs the instructions to comply with your wishes for a format change.

If you save the document and display or print it later, you want the computer to reproduce the document exactly as you designed it. Your computer knows what the stored code (or character markup) means for that text. A problem may arise if you place that code on another operating system or have a different word processor. There may be a different interpretation of the code that produces undesired results. This markup is consistent only if all other variables are equal. The next example uses a text encoding method to change the machine or application code into something more standard and portable.

Example 2: Revealing the Markup in Some Text Editors


{\rtf
{This has }{\b bold words}{ in a sentence.
\par }}

The above sentence shows Rich Text Format (RTF) markup interspersed with and surrounding the words of a document. The characters "{", "}", and "\" all mean something in this document but have nothing to do with the content. Rich Text Format markup is used by many word processors to change the visual format of the displayed text. As each new style is encountered, the formatting changes without changing the content of the document. A document becomes easily transportable to other word processors by using Rich Text Format. Each application that knows how to interpret Rich Text Format can show the intent of the author. This book was composed on a word processor, saved as RTF, and electronically submitted to the publisher. Regardless of the application, electronic device, or operating system used to create the document, the styling is preserved.

Rich Text Format markup adds no other information about the text. We may not know who wrote the sentence or when it was written. This information can be included as part of the content of the document but may be difficult to extract easily. We may have no control over the formatting or be allowed to change it for use with other devices. Using a translation application, we can convert it to the next example, the commands our printer understands.

Example 3: PostScript Printer Commands for the Document

%!PS-Adobe-3.0
%%Title: ()
%%Creator: ()
%%CreationDate: (10:29 AM Saturday, May 26, 2001)
%%For: ()
%%Pages: 1
%%DocumentFonts: Times-Roman Times-Bold
%%DocumentData: Clean7Bit
%%PageOrder: Ascend
%%Orientation: Portrait
// more code here has been snipped for brevity //
%%EndPageSetup
gS 0 0 2300 3033 rC
250 216 :M
f57 sf
(This has )S
431 216 :M
f84 sf
.032 .003(bold words)J
669 216 :M
f57 sf
( in a sentence.)S
endp
showpage
%%PageTrailer
%%Trailer
end
%%EOF

The third example, above, is the same text used in the previous two examples and printed to a file as a PostScript document. It uses a different markup even though it is the same text and same document. PostScript is a language, developed by Adobe in 1985, that describes the document for printers, imagesetters, and screen displays. These files can also be converted to Adobe Portable Document Format (.pdf). The markup retains the document or image style so that it can be printed exactly the same way every time. It is a language that is specific to these PostScript devices. An application can translate this document to make it portable, too.

Example 4: Rules-based Nested Structure Used for Document Markup




This has bold words in a sentence.


The styling may be lost.


Unlike the Rich Text Format, nested markup may also contain a description of the text contents. The markup is often called a tag and may define various rules for the document. Sometimes the rules are internal, such as "<b>" and "</b>", or external, such as a stylesheet (a set of rules) to apply to the whole document or portions of a document.

There can be rules for characters, words, sentences, paragraphs, and the entire document. Characters inherit the rules of the word they are in. Words inherit the rules of the sentence, and sentences inherit the rules of the paragraph. The rules may not be just the formatting or style of the text but may also allow for flexibility in display.
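The inheritance described above can be illustrated with a small Python sketch. It is not taken from the original text: the element names (paragraph, sentence, word) and the attribute names (color, weight) are hypothetical, chosen only to show how an inner element picks up the rules of its container and may override them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
SAMPLE = (
    '<paragraph color="blue">'
    '<sentence>This has <word weight="bold">bold words</word> in a sentence.</sentence>'
    '<sentence color="red">The styling may be lost.</sentence>'
    '</paragraph>'
)

def effective_rules(element, inherited=None):
    """Yield (tag, rules) pairs, merging each element's own rules over its parent's."""
    rules = dict(inherited or {})
    rules.update(element.attrib)  # inner rules override inherited ones
    yield element.tag, rules
    for child in element:
        yield from effective_rules(child, rules)

resolved = list(effective_rules(ET.fromstring(SAMPLE)))
for tag, rules in resolved:
    print(tag, rules)
```

In this sketch the word element inherits the blue color from its paragraph, while the second sentence replaces it with red without affecting anything outside itself.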

Some markup allows for a change in the document.

Some formatting rules may also be different and change the inherited rules. All of the characters and words in the sentence above have a rule telling them to be blue. The text color can change to red without changing the sentence's blue color. In this nested markup, only the inner tags make the rule change.

Whether you use Rich Text Format or the nested structure found in SGML, HTML, and XML, changing the content of the words and phrases in the document does not change the style, the format, or the rules. Documents created with markup can be consistent. As the content changes, the style, formatting, and rules remain the same. The portability of documents containing markup to various applications and systems makes them very attractive. Standards have been recommended to ensure that every document that uses these standards will maintain portability.

XML Advantages

This section expands upon the goals for XML data exchange and how they can help you as a FileMaker Pro developer. The recommendations for the design of the Extensible Markup Language show some of the advantages this format offers. These XML design goals can be found in the document "Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation 6 October 2000":

  • XML shall be straightforwardly usable over the Internet.

  • XML shall support a wide variety of applications.

  • XML shall be compatible with SGML.

  • It shall be easy to write programs which process XML documents.

  • The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

  • XML documents should be human-legible and reasonably clear.

  • The XML design should be prepared quickly.

  • The design of XML shall be formal and concise.

  • XML documents shall be easy to create.

  • Terseness in XML markup is of minimal importance.

Using HTML and XHTML to Format Web Pages

HTML Document Structure

A proper HTML document begins with a prolog, just like an XML document. This prolog consists of the Document Type Declaration, which comes in three versions (Strict, Transitional, and Frameset in HTML 4.01). Each of these may limit or increase the usage of particular markup. Original versions of HTML used some markup that has since become deprecated (outdated or revised) or obsolete. Any deprecated elements used in this chapter are so noted.

HTML and XHTML documents must have a root element, html, to make them well formed. This root element may have attributes to further define the document. Browsers may render the page differently based upon these attributes. The version attribute specifies the version listed in the DOCTYPE. The lang attribute can list the base language of the page. The dir attribute works with the lang attribute to specify the direction of the language as it is read natively. The value of the dir attribute can be left to right (LTR) or right to left (RTL). The version attribute has been deprecated.


The HEAD Element

Basic HTML documents have two elements, HEAD and BODY. The HEAD portion of this type of document can define the document and contain information about it that may not be displayed in the browser. Search engines often use the contents of the markup in the HEAD. Also contained in the HEAD element are references to other documents and objects that may be used in the document but are not contained in the BODY. The HEAD element is built from a small set of basic elements.
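As a rough illustration of the HEAD and BODY split, the following Python sketch parses a small hand-written page and lists what the HEAD holds. The page itself is hypothetical: the title, the keywords, and the stylesheet name sample.css are made up for this example and do not come from the original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; the title, meta content, and stylesheet name are made up.
PAGE = """<html lang="en" dir="ltr">
  <head>
    <title>Sample Page</title>
    <meta name="keywords" content="xml, filemaker" />
    <link rel="stylesheet" type="text/css" href="sample.css" />
  </head>
  <body>
    <p>Only this part is displayed.</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)
head = root.find("head")
body = root.find("body")

# Everything in the head describes the document or points at other resources.
head_tags = [child.tag for child in head]
title = head.findtext("title")
stylesheet = head.find("link").get("href")

print(root.tag, root.attrib)
print(head_tags, title, stylesheet)
```

The parse only succeeds because the sample is written as well-formed XHTML; ordinary HTML with unclosed tags would need a more forgiving parser.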



The Main BODY of the HTML Document

The document that you see in a web browser or on a mobile telephone is formatted by the elements in the BODY. The BODY element has several attributes and is never empty if you want something to display. Some of these attributes have been deprecated but are listed for reference.


background

An image resource or path to an image to be displayed behind all other items on the page. The image will be displayed with a tiling effect. It will first appear in the upper-left corner and repeat down and to the right. If you make the image sufficiently wide, this effect will not be shown unless the user scrolls past the first repeat. It is also possible to create a small image with repeating patterns that appear to be one big graphic. This attribute has been deprecated for use with XHTML and XML, so use stylesheets to specify a background image.

bgcolor

The background color of the body of the web page. By default, the browser may display white or gray if no background color is specified in this element. This attribute is a solid color and does not have the tiling effect of background. Both attributes may be used, but the background may completely obscure a bgcolor. It may still be useful if an image cannot be found. This attribute is also deprecated in HTML 4.01, so the background color is better specified in a stylesheet.

text

The color of the text on the page, also called the foreground color. If you use a bgcolor of black, for example, you would specify white or another light color for the text. This attribute is also deprecated and is often specified in a stylesheet instead. The default foreground color of the text on the page is black if you do not specify one.

link

Hypertext links have a default of blue underline if you place them in a web page. Once they have been selected and visited, they change color. Your browser can override the defaults, or you can specify the color (or none) by using this deprecated attribute. vlink is the color of visited links and alink is the color of the selected links. These attributes can work together or separately with link, and all have been deprecated.

Other attributes for BODY are id (must be unique in any document), class, lang, title, style, onload, onunload, onclick, ondblclick, onmousedown, onmouseover, onmouseup, onmousemove, onmouseout, onkeypress, onkeydown, and onkeyup. The most common attribute is the onload attribute. As the document loads into the window of the browser, the BODY element can run a script. One example is preloading images for animation effects: the onload attribute calls the script preloadImageJS for the two images next.gif and prev.gif, including the relative path to these images.


The BODY element contains the elements that compose the page. These are text, tables, lists, blocks, anchors, images, objects, and forms. Each of these elements will be described in this chapter.



Extensible Stylesheet Language (XSL) and FileMaker

XSL is XML

XSL documents are written with rules and recommendations that follow the structure of well-formed and valid XML documents. If you open an XSL document with a text editor, you will see the familiar tree-like structure of XML with start, end, and empty markup tags. The transformations that are performed by the Extensible Stylesheet Language can be used to create XML documents, as well as other document formats.

Namespaces in FileMaker Pro 6

The XML that results from a query to a web-published FileMaker Pro database with the -format parameter includes the namespace declaration. Each of the three schema types has a different namespace:

  • -format=-fmp_xml&-view

  • -format=-fmp_xml&-find

  • -format=-fmp_dso&-find
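Because each grammar carries its own namespace, any program that reads the XML has to name that namespace when it looks for elements. The Python sketch below shows the effect. The namespace URI is an assumption on my part (check it against the xmlns attribute in your own output), and the sample document is hand-written and abbreviated, not real Web Companion output.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the FMPXMLRESULT grammar; verify against your own output.
FMP = "http://www.filemaker.com/fmpxmlresult"

# Hand-written, abbreviated sample; not real Web Companion output.
SAMPLE = f"""<FMPXMLRESULT xmlns="{FMP}">
  <METADATA>
    <FIELD NAME="First" TYPE="TEXT"/>
    <FIELD NAME="Last" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET FOUND="1">
    <ROW MODID="0" RECORDID="1">
      <COL><DATA>Ada</DATA></COL>
      <COL><DATA>Lovelace</DATA></COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>"""

root = ET.fromstring(SAMPLE)
ns = {"fmp": FMP}  # the prefix "fmp" is arbitrary; only the URI matters

# A path without the namespace finds nothing, even though ROW is plainly there.
unqualified = root.findall("RESULTSET/ROW")
qualified = root.findall("fmp:RESULTSET/fmp:ROW", ns)

print(root.tag)
print(len(unqualified), len(qualified))
```

The same idea applies in an XSL stylesheet: the source namespace is declared with a prefix, and every XPath step uses that prefix.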



Stylesheet Instruction in XML Documents

The XML information for a database file in the Database Design Report begins with the XML prolog. The second line in the report is a processing instruction specifying the stylesheet to be used with the document.

The processing instruction xml-stylesheet has six attributes. The type attribute is required. If the stylesheet is XSL, the type attribute will have the value "text/xsl". A Cascading Style Sheet has a type attribute value of "text/css". The href attribute is also required in the xml-stylesheet processing instruction. The href attribute may be a relative or absolute URI path to the stylesheet document. Just like the hyperlink reference in HTML, this URI is not a namespace declaration, but the real location of the document. The href can be a fragment path, thus allowing the stylesheet to be a part of the XML document.
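To make the type and href attributes concrete, here is a minimal Python sketch that reads them from a document. The document and the stylesheet name report.xsl are hypothetical. The attributes of a processing instruction are ordinary text rather than true XML attributes, so the sketch pulls them out with a regular expression, which for simplicity handles only double-quoted values.

```python
import re
from xml.dom import minidom

# Hypothetical document; the stylesheet file name is made up.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="report.xsl"?>
<report><title>Sample</title></report>"""

def stylesheet_attributes(xml_text):
    """Return the attributes of the first xml-stylesheet instruction, or {}."""
    document = minidom.parseString(xml_text)
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction body is plain text, so the name="value" pairs
            # must be pulled out by hand (double-quoted values only).
            return dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
    return {}

attrs = stylesheet_attributes(DOC)
print(attrs)
```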

The examples also use JavaScript and the Document Object Model (DOM) to present the XML published by Web Companion. Only the Windows version of Internet Explorer 5 or greater will work properly with the examples. The JavaScript calls ActiveX, which only works on the Windows operating system.

Repeating Elements

The template can be used to set a rule for every element in the source tree. Another element can be used within the template to repeat a rule. This element is xsl:for-each and is never an empty element. The only attribute for this element is select, which has the value of an XPath expression. The xsl:for-each element can be used with xsl:sort to sort the source elements before returning them to the result tree. Literal text, other elements, and rules can be used between the start and end tags of this element, just as for templates. A template written this way can return all the field names from the FMPXMLRESULT grammar.
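The idea of repeating a rule over selected elements, optionally sorted, can be sketched in Python. This is an analogue of the xsl:for-each and xsl:sort pairing, not an XSL stylesheet, and it rests on assumptions: the namespace URI is assumed, and the sample metadata is hand-written.

```python
import xml.etree.ElementTree as ET

FMP = "http://www.filemaker.com/fmpxmlresult"  # assumed namespace URI

# Hand-written, abbreviated sample of the FMPXMLRESULT metadata.
SAMPLE = f"""<FMPXMLRESULT xmlns="{FMP}">
  <METADATA>
    <FIELD NAME="Last" TYPE="TEXT"/>
    <FIELD NAME="First" TYPE="TEXT"/>
    <FIELD NAME="City" TYPE="TEXT"/>
  </METADATA>
</FMPXMLRESULT>"""

def field_names(xml_text, sort=False):
    """Collect the NAME of every FIELD, like a for-each over METADATA/FIELD."""
    root = ET.fromstring(xml_text)
    # The path plays the role of the select attribute.
    fields = root.findall("fmp:METADATA/fmp:FIELD", {"fmp": FMP})
    if sort:
        # Sorting before the loop mirrors placing a sort inside the for-each.
        fields = sorted(fields, key=lambda field: field.get("NAME"))
    return [field.get("NAME") for field in fields]

print(field_names(SAMPLE))
print(field_names(SAMPLE, sort=True))
```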


XSLT Examples for FileMaker Pro XML

Transform FMPDSORESULT into FMPXMLRESULT

We'll use the FMPDSORESULT export along with an XSL stylesheet to transform into an FMPXMLRESULT document. These examples will work in small steps so that you understand how to build an XSL stylesheet.
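Before building the stylesheet, it may help to see the shape of the conversion itself. The Python sketch below is only an analogue of what the stylesheet is meant to do, under several assumptions: both namespace URIs are assumed, the sample export is hand-written, and the output omits header elements that a real FMPXMLRESULT document would carry.

```python
import xml.etree.ElementTree as ET

FMP_DSO = "http://www.filemaker.com/fmpdsoresult"  # assumed namespace URI
FMP_XML = "http://www.filemaker.com/fmpxmlresult"  # assumed namespace URI
ET.register_namespace("", FMP_XML)  # serialize the result with a default namespace

# Hand-written, abbreviated FMPDSORESULT sample.
SAMPLE = f"""<FMPDSORESULT xmlns="{FMP_DSO}">
  <ROW MODID="0" RECORDID="1"><First>Ada</First><Last>Lovelace</Last></ROW>
  <ROW MODID="0" RECORDID="2"><First>Alan</First><Last>Turing</Last></ROW>
</FMPDSORESULT>"""

def q(name):
    """Qualify an element name with the result namespace."""
    return f"{{{FMP_XML}}}{name}"

def dso_to_fmpxmlresult(xml_text):
    source = ET.fromstring(xml_text)
    rows = source.findall(f"{{{FMP_DSO}}}ROW")
    result = ET.Element(q("FMPXMLRESULT"))
    metadata = ET.SubElement(result, q("METADATA"))
    if rows:
        # In the DSO grammar the field names are the element names themselves.
        for field in rows[0]:
            ET.SubElement(metadata, q("FIELD"), NAME=field.tag.split("}", 1)[-1])
    resultset = ET.SubElement(result, q("RESULTSET"), FOUND=str(len(rows)))
    for row in rows:
        out_row = ET.SubElement(resultset, q("ROW"),
                                MODID=row.get("MODID", "0"),
                                RECORDID=row.get("RECORDID", "0"))
        for field in row:
            data = ET.SubElement(ET.SubElement(out_row, q("COL")), q("DATA"))
            data.text = field.text
    return result

result = dso_to_fmpxmlresult(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

The key structural change is that field names move from element names in the source to NAME attributes in METADATA, while the values move into COL and DATA elements in the same order.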

Create a new text document and name it Transform1.xsl.

Add the prolog and the root element for all XSL stylesheets:




The stylesheet element has several attributes that we need to include. The version and XSL namespace for XSL have required values.




We'll add two more attributes. The first one is the namespace for the XML source document elements. The second attribute tells the XSL processor to not include this namespace with any elements we create in the resulting XML.




Add the top-level XSL output element and its attributes. For this example we want to show the result as text, so the method attribute has a value of "text". Later we'll show the result as XML. The output element shows us the version and encoding for the resulting document. The indent attribute probably should be "no" for most result documents. Any indentation in the stylesheet is added for readability in the code listings and should not be included when you create your stylesheets.





Display Something for the Fields

Here, we'll take the above stylesheet, name it transform2.xsl, and add another XSL element to display text for the fields in the export. Just to make it easier to see what is happening, we'll use the name of the field elements as the text to display. Within the xsl:for-each loop for the rows/records, we'll add another xsl:for-each loop. The select attribute tells us to get any child element ("*") of the current path ("."). The XPath expression "name()" is a function that returns the name of each of these child elements.
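The nested loop can be pictured with a short Python analogue: an outer loop over the rows and an inner loop over every child of the current row, reporting each child's name. The namespace URI is assumed and the sample export is hand-written. The helper local_name is a hypothetical stand-in for the XPath name() function.

```python
import xml.etree.ElementTree as ET

FMP_DSO = "http://www.filemaker.com/fmpdsoresult"  # assumed namespace URI

# Hand-written, abbreviated FMPDSORESULT sample.
SAMPLE = f"""<FMPDSORESULT xmlns="{FMP_DSO}">
  <ROW MODID="0" RECORDID="1"><First>Ada</First><Last>Lovelace</Last></ROW>
  <ROW MODID="0" RECORDID="2"><First>Alan</First><Last>Turing</Last></ROW>
</FMPDSORESULT>"""

def local_name(element):
    """Strip the {namespace} part that ElementTree puts in front of tag names."""
    return element.tag.split("}", 1)[-1]

def field_names_per_row(xml_text):
    root = ET.fromstring(xml_text)
    names = []
    for row in root.findall(f"{{{FMP_DSO}}}ROW"):  # outer loop: rows/records
        # Inner loop: every child of the current row, like selecting "./*".
        names.append([local_name(child) for child in row])
    return names

print(field_names_per_row(SAMPLE))
```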





TRANSFORM2.TXT
Find our rows and show the
fields.



We found a row!








XML to HTML

All HTML documents produced can be viewed in a browser. The pages don't need to be served by a web server. You may find that some CSS and JavaScript does not render correctly, depending on the browser version. Test all the examples in this section to see your results.

FMPDSORESULT to HTML

Here the output method will be "html", so the export and transformation will produce an HTML document. The data will be placed into an HTML table similar to the simple_table example. Instead of ROWs, we'll use the HTML element tr; and instead of COLs, we'll use the HTML element td. This example will use the FMPDSORESULT instead of the FMPXMLRESULT export.

Example 1: Create a Simple HTML Table from FMPDSORESULT

  1. First make a copy of the transform3.xsl file and rename it dso2html1.xsl.

  2. Change the output method to "html" and indent to "yes".

  3. We need to make this an HTML document, so add these tags just after :


  4. The HTML document needs to be closed, so add these tags just before :


  5. Just before the first xsl:for-each, add the HTML table start tag. For convenience, we'll show the table borders. Just after the final xsl:for-each end tag, add the table close tag.

  6. Change the ROW element into the tr element. Don't forget the end tag! We won't use the MODID and RECORDID attributes at this time, so delete them from the stylesheet.

  7. Change the elements into the element. Change into the element.

  8. Save the changes to the stylesheet.

  1. Export some fields from any of your databases or use Export.fp5.

  2. Choose File, Export Records and name your export dso2html1.htm.

  3. Select the FMPDSORESULT grammar.

  4. Check the Use XSL style sheet option and click the File button.

  5. Use the stylesheet dso2html1.xsl and click the Open button.

  6. Click the OK button and specify the fields to use in your new HTML table.
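The result of these steps is an HTML table with one tr per record and one td per field. The Python sketch below produces the same kind of output directly and is meant only to show the target structure, not to stand in for the stylesheet. The namespace URI is assumed and the sample export is hand-written.

```python
import xml.etree.ElementTree as ET

FMP_DSO = "http://www.filemaker.com/fmpdsoresult"  # assumed namespace URI

# Hand-written, abbreviated FMPDSORESULT sample.
SAMPLE = f"""<FMPDSORESULT xmlns="{FMP_DSO}">
  <ROW MODID="0" RECORDID="1"><First>Ada</First><Last>Lovelace</Last></ROW>
  <ROW MODID="0" RECORDID="2"><First>Alan</First><Last>Turing</Last></ROW>
</FMPDSORESULT>"""

def dso_to_html_table(xml_text):
    source = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table", border="1")  # show the table borders
    for row in source.findall(f"{{{FMP_DSO}}}ROW"):
        tr = ET.SubElement(table, "tr")   # each ROW becomes a table row
        for field in row:
            td = ET.SubElement(tr, "td")  # each field becomes a table cell
            td.text = field.text
    return ET.tostring(html, encoding="unicode", method="html")

page = dso_to_html_table(SAMPLE)
print(page)
```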






Fixed-width Text Export

FileMaker Pro 6 includes an example stylesheet, fixed_width.xsl, found with the other example stylesheets, that exports text with every column set to the same width. You can change the variable to be any width, but you don't have a way to change each column independently. Our example will pass both the values and the width of each column.
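Fixed-width output comes down to padding each value to a known number of characters. The Python sketch below shows that core step with one width shared by all columns. It is an illustration only: the function name is hypothetical, and it also truncates values longer than the width, which is a choice made here rather than something the original stylesheet is known to do.

```python
def fixed_width_line(values, width):
    """Pad (or truncate) every value to exactly `width` characters and join them."""
    return "".join(str(value)[:width].ljust(width) for value in values)

# Hypothetical records standing in for exported rows.
rows = [["Ada", "Lovelace"], ["Alan", "Turing"]]
lines = [fixed_width_line(row, 10) for row in rows]

for line in lines:
    print(repr(line))
```

Passing a list of widths instead of a single number would be the natural way to size each column independently.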