Extensible Markup Language (XML) is based upon SGML (Standard Generalized Markup Language). The simplest explanation of SGML is that it is a method of writing documents with special formatting instructions, or markup, included. A publishing editor makes notations in the margin of a document to alert an author to changes the document needs. Those notations mark up the document and, indeed, this is where the term "markup" originated. Markup allows an SGML or XML document to be distributed electronically while preserving the format or style of the text. An SGML document contains both the content and the markup. The emphasis is placed on the formatting rather than the content; without the markup, you would simply have an ordinary document.
SGML can be used to facilitate the publishing of documents as electronic or printed copy. Some programs that read the markup may also translate the styles for other devices, such as Braille readers and printers. The same document might be viewed on a smaller screen, such as those on personal digital assistants (PDAs), pagers, and cellular telephones. The markup can mean something completely different depending on the final destination of the document and its translation to another format. Using stylesheets or transformation methods, a single document with content and markup can be changed upon output.
Markup Simplified
To help you understand markup, this section gives four examples. They all produce the same result but get there by very different means. The first example illustrates that "there may be more than you see" on a monitor or printed page. The second example uses Rich Text Format (RTF) to show a way to embed formatting in a document for transportability. The third example shows the PostScript file (commands) that produces the desired results consistently on a laser printer. The fourth example uses the nested tag style found in SGML, HTML, and XML documents. You will begin to see how this final markup method can provide the formatting that you don't see in the first example and the transportability and consistency of the second and third, along with additional information about the document and its contents.
Example 1: Text Containing Bold Formatting
This has bold words in a sentence.
Using a word processor or electronic text editor, you may simply click on the word or phrase and apply the text style with special keystrokes (such as Control+B or Command+B) or choose Bold from a menu. On the word processor or computer screen, you can easily read the text, but you do not see the machine description, or code, describing how this text is to be displayed. You may not care how or why that happens, but the computer needs the instructions to comply with your wishes for a format change.
If you save the document and display or print it later, you want the computer to reproduce the document exactly as you designed it. Your computer knows what the stored code (or character markup) means for that text. A problem may arise if you place that code on another operating system or have a different word processor. There may be a different interpretation of the code that produces undesired results. This markup is consistent only if all other variables are equal. The next example uses a text encoding method to change the machine or application code into something more standard and portable.
Example 2: Revealing the Markup in Some Text Editors
{\rtf
{This has }{\b bold words}{ in a sentence.
\par }}
The above sentence shows Rich Text Format (RTF) markup interspersed with and surrounding the words of a document. The characters "{", "}", and "\" all mean something in this document but have nothing to do with the content. Rich Text Format markup is used by many word processors to change the visual format of the displayed text. As each new style is encountered, the formatting changes without changing the content of the document. A document becomes easily transportable to other word processors by using Rich Text Format. Each application that knows how to interpret Rich Text Format can show the intent of the author. This book was composed on a word processor, saved as RTF, and electronically submitted to the publisher. Regardless of the application, electronic device, or operating system used to create the document, the styling is preserved.
Rich Text Format markup adds no other information about the text. We may not know who wrote the sentence or when it was written. This information can be included as part of the content of the document but may be difficult to extract. We may have no control over the formatting, or we may not be allowed to change it for use with other devices. Using a translation application, we can convert it to the next example, the commands our printer understands.
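To make the separation between RTF markup and content concrete, here is a minimal Python sketch that splits the sample from Example 2 into control words and text runs. It is an illustration under simplifying assumptions, not a real RTF reader: the function name split_rtf and the token pattern are invented for this example, and only the \b (bold) control word is interpreted.

```python
import re

# The Example 2 markup, joined onto one line for convenience.
SAMPLE = r"{\rtf{This has }{\b bold words}{ in a sentence.\par }}"

# Assumed, simplified token pattern: a control word (backslash, letters,
# optional number, optional trailing space), a brace, or a run of text.
TOKEN = re.compile(r"\\[a-z]+-?\d* ?|[{}]|[^\\{}]+")

def split_rtf(source):
    """Return (control_words, runs), where runs are (text, is_bold) pairs."""
    controls = []
    runs = []
    bold_stack = [False]  # one entry per open group
    for token in TOKEN.findall(source):
        if token == "{":
            bold_stack.append(bold_stack[-1])  # a new group inherits the state
        elif token == "}":
            bold_stack.pop()                   # closing a group restores it
        elif token.startswith("\\"):
            word = token.strip()
            controls.append(word)
            if word == "\\b":
                bold_stack[-1] = True
            elif word == "\\b0":
                bold_stack[-1] = False
        else:
            runs.append((token, bold_stack[-1]))
    return controls, runs

controls, runs = split_rtf(SAMPLE)
print(controls)
print(runs)
print("".join(text for text, _ in runs))
```

The control words come out in one list and the readable sentence in another, which is the point of the example: the braces and backslashes carry formatting only, and nothing in them says who wrote the sentence or when.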
Example 3: PostScript Printer Commands for the Document
%!PS-Adobe-3.0
%%Title: ()
%%Creator: ()
%%CreationDate: (10:29 AM Saturday, May 26, 2001)
%%For: ()
%%Pages: 1
%%DocumentFonts: Times-Roman Times-Bold
%%DocumentData: Clean7Bit
%%PageOrder: Ascend
%%Orientation: Portrait
// more code here has been snipped for brevity //
%%EndPageSetup
gS 0 0 2300 3033 rC
250 216 :M
f57 sf
(This has )S
431 216 :M
f84 sf
.032 .003(bold words)J
669 216 :M
f57 sf
( in a sentence.)S
endp
showpage
%%PageTrailer
%%Trailer
end
%%EOF
The third example, above, is the same text used in the previous two examples and printed to a file as a PostScript document. It uses a different markup even though it is the same text and same document. PostScript is a language, developed by Adobe in 1985, that describes the document for printers, imagesetters, and screen displays. These files can also be converted to Adobe Portable Document Format (.pdf). The markup retains the document or image style so that it can be printed exactly the same way every time. It is a language that is specific to these PostScript devices. An application can translate this document to make it portable, too.
Example 4: Rules-based Nested Structure Used for Document Markup
This has bold words in a sentence.
The styling may be lost.
Unlike the Rich Text Format, nested markup may also contain a description of the text contents. A unit of this markup is often called a tag and may define various rules for the document. Sometimes the rules are internal, such as a pair of opening and closing tags around a phrase, and sometimes they are external, such as a stylesheet (a set of rules) that applies to the whole document or to portions of it.
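As a rough illustration of markup that describes the text as well as styling it, the following Python sketch parses a tagged version of the example sentence with the standard library's xml.etree.ElementTree. The element and attribute names (sentence, b, author, written) and the attribute values are hypothetical, chosen only for this example; they are not taken from the original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested markup: the tags and attributes are assumptions.
MARKUP = (
    '<sentence author="A. Writer" written="2001-05-26">'
    'This has <b>bold words</b> in a sentence.'
    '</sentence>'
)

root = ET.fromstring(MARKUP)

# The content, with every tag removed.
plain_text = "".join(root.itertext())

# The phrases that the inner tag marks as bold.
bold_phrases = [b.text for b in root.iter("b")]

# Information about the sentence that is not part of its content.
description = dict(root.attrib)

print(plain_text)
print(bold_phrases)
print(description)
```

The same small document yields the content, the internal formatting rule, and a description of who wrote the sentence and when, each retrievable separately.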
There can be rules for characters, words, sentences, paragraphs, and the entire document. Characters inherit the rules of the word they are in. Words inherit the rules of the sentence, and sentences inherit the rules of the paragraph. The rules may not be just the formatting or style of the text but may also allow for flexibility in display.
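The inheritance described above can be sketched in a few lines of Python. This is one possible illustration, assuming hypothetical tag names (paragraph, sentence, phrase) and a hypothetical color attribute: each piece of text takes the color of the nearest enclosing tag that sets one, so an inner tag can replace the inherited value for its own text only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the paragraph sets a rule, the inner phrase overrides it.
MARKUP = (
    '<paragraph color="blue">'
    '<sentence>This has <phrase color="red">bold words</phrase>'
    ' in a sentence.</sentence>'
    '</paragraph>'
)

def colored_runs(element, inherited=None):
    """Yield (text, color) pairs; inner tags override inherited colors."""
    color = element.get("color", inherited)
    if element.text:
        yield element.text, color
    for child in element:
        yield from colored_runs(child, color)
        if child.tail:
            # Text after a child's closing tag belongs to the parent element.
            yield child.tail, color

runs = list(colored_runs(ET.fromstring(MARKUP)))
for text, color in runs:
    print(repr(text), color)
```

The sentence never states a color of its own; it inherits blue from the paragraph, and only the phrase wrapped in the inner tag becomes red.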
Some markup allows for a change in the document.
Some formatting rules may also be different and change the inherited rules. All of the characters and words in the sentence above have a rule telling them to be blue. The text color of part of the sentence can change to red without changing the sentence's blue color. In this nested markup, only the inner tags make the rule change.
Whether you use Rich Text Format or the nested structure found in SGML, HTML, and XML, changing the words and phrases in the document does not change the style, the format, or the rules. Documents created with markup can therefore be consistent: as the content changes, the style, formatting, and rules remain the same. The portability of documents containing markup to various applications and systems makes them very attractive. Standards have been recommended to ensure that every document that uses these standards will maintain portability.
XML Advantages
This section expands upon the goals for XML data exchange and how they can help you as a FileMaker Pro developer. The recommendations for the design of the Extensible Markup Language show some of the advantages this format offers. These XML design goals can be found in the document "Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation 6 October 2000":
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.