diff options
Diffstat (limited to 'doc/xml.doc')
-rw-r--r-- | doc/xml.doc | 634 |
1 files changed, 634 insertions, 0 deletions
diff --git a/doc/xml.doc b/doc/xml.doc new file mode 100644 index 000000000..b5f63c5c2 --- /dev/null +++ b/doc/xml.doc @@ -0,0 +1,634 @@ +/**************************************************************************** +** +** Documentation on the xml module +** +** Copyright (C) 2005-2008 Trolltech ASA. All rights reserved. +** +** This file is part of the Qt GUI Toolkit. +** +** This file may be used under the terms of the GNU General +** Public License versions 2.0 or 3.0 as published by the Free +** Software Foundation and appearing in the files LICENSE.GPL2 +** and LICENSE.GPL3 included in the packaging of this file. +** Alternatively you may (at your option) use any later version +** of the GNU General Public License if such license has been +** publicly approved by Trolltech ASA (or its successors, if any) +** and the KDE Free Qt Foundation. +** +** Please review the following information to ensure GNU General +** Public Licensing retquirements will be met: +** http://trolltech.com/products/qt/licenses/licensing/opensource/. +** If you are unsure which license is appropriate for your use, please +** review the following information: +** http://trolltech.com/products/qt/licenses/licensing/licensingoverview +** or contact the sales department at sales@trolltech.com. +** +** This file may be used under the terms of the Q Public License as +** defined by Trolltech ASA and appearing in the file LICENSE.QPL +** included in the packaging of this file. Licensees holding valid Qt +** Commercial licenses may use this file in accordance with the Qt +** Commercial License Agreement provided with the Software. +** +** This file is provided "AS IS" with NO WARRANTY OF ANY KIND, +** INCLUDING THE WARRANTIES OF DESIGN, MERCHANTABILITY AND FITNESS FOR +** A PARTICULAR PURPOSE. Trolltech reserves all rights not granted +** herein. +** +**********************************************************************/ + +/*! \page xml.html + +\title XML Module + +\if defined(commercial) +This module is part of the \link commercialeditions.html Qt Enterprise Edition\endlink. +\endif + +\tableofcontents + +\target overview +\section1 Overview of the XML architecture in Qt + +The XML module provides a well-formed XML parser using the SAX2 +(Simple API for XML) interface plus an implementation of the DOM Level +2 (Document Object Model). + +SAX is an event-based standard interface for XML parsers. +The Qt interface follows the design of the SAX2 Java implementation. +Its naming scheme was adapted to fit the Qt naming conventions. +Details on SAX2 can be found at +\link http://www.saxproject.org http://www.saxproject.org \endlink. + +Support for SAX2 filters and the reader factory are under +development. The Qt implementation does not include the SAX1 +compatibility classes present in the Java interface. + +For an introduction to Qt's SAX2 classes see +"\link #sax2 The Qt SAX2 classes \endlink". + +DOM Level 2 is a W3C Recommendation for XML interfaces that maps the +constituents of an XML document to a tree structure. Details and the +specification of DOM Level 2 can be found at +\link http://www.w3.org/DOM/ http://www.w3.org/DOM/ \endlink. +More information about the DOM classes in Qt is provided in the +\link #dom Qt DOM classes \endlink. + + +Qt provides the following XML related classes: + +\table +\header \i Class \i Short description +\row \i \l QDomAttr + \i Represents one attribute of a QDomElement +\row \i \l QDomCDATASection + \i Represents an XML CDATA section +\row \i \l QDomCharacterData + \i Represents a generic string in the DOM +\row \i \l QDomComment + \i Represents an XML comment +\row \i \l QDomDocument + \i The representation of an XML document +\row \i \l QDomDocumentFragment + \i Tree of QDomNodes which is usually not a complete QDomDocument +\row \i \l QDomDocumentType + \i The representation of the DTD in the document tree +\row \i \l QDomElement + \i Represents one element in the DOM tree +\row \i \l QDomEntity + \i Represents an XML entity +\row \i \l QDomEntityReference + \i Represents an XML entity reference +\row \i \l QDomImplementation + \i Information about the features of the DOM implementation +\row \i \l QDomNamedNodeMap + \i Collection of nodes that can be accessed by name +\row \i \l QDomNode + \i The base class for all nodes of the DOM tree +\row \i \l QDomNodeList + \i List of QDomNode objects +\row \i \l QDomNotation + \i Represents an XML notation +\row \i \l QDomProcessingInstruction + \i Represents an XML processing instruction +\row \i \l QDomText + \i Represents textual data in the parsed XML document +\row \i \l QXmlAttributes + \i XML attributes +\row \i \l QXmlContentHandler + \i Interface to report logical content of XML data +\row \i \l QXmlDeclHandler + \i Interface to report declaration content of XML data +\row \i \l QXmlDefaultHandler + \i Default implementation of all XML handler classes +\row \i \l QXmlDTDHandler + \i Interface to report DTD content of XML data +\row \i \l QXmlEntityResolver + \i Interface to resolve extern entities contained in XML data +\row \i \l QXmlErrorHandler + \i Interface to report errors in XML data +\row \i \l QXmlInputSource + \i The input data for the QXmlReader subclasses +\row \i \l QXmlLexicalHandler + \i Interface to report lexical content of XML data +\row \i \l QXmlLocator + \i The XML handler classes with information about the actual parsing position +\row \i \l QXmlNamespaceSupport + \i Helper class for XML readers which want to include namespace support +\row \i \l QXmlParseException + \i Used to report errors with the QXmlErrorHandler interface +\row \i \l QXmlReader + \i Interface for XML readers (i.e. for SAX2 parsers) +\row \i \l QXmlSimpleReader + \i Implementation of a simple XML reader (a SAX2 parser) +\endtable + +\target sax2 +\section1 The Qt SAX2 classes + +\target sax2Intro +\section2 Introduction to SAX2 + +The SAX2 interface is an event-driven mechanism to provide the user with +document information. An "event" in this context means something +reported by the parser, for example, it has encountered a start tag, +or an end tag, etc. + +To make it less abstract consider the following example: +\code +<quote>A quotation.</quote> +\endcode + +Whilst reading (a SAX2 parser is usually referred to as "reader") +the above document three events would be triggered: +\list 1 +\i A start tag occurs (\c{<quote>}). +\i Character data (i.e. text) is found, "A quotation.". +\i An end tag is parsed (\c{</quote>}). +\endlist + +Each time such an event occurs the parser reports it; you can set up +event handlers to respond to these events. + +Whilst this is a fast and simple approach to read XML documents, +manipulation is difficult because data is not stored, simply handled +and discarded serially. The \link #dom DOM interface +\endlink reads in and stores the whole document in a tree structure; +this takes more memory, but makes it easier to manipulate the +document's structure.. + +The Qt XML module provides an abstract class, \l QXmlReader, that +defines the interface for potential SAX2 readers. Qt includes a reader +implementation, \l QXmlSimpleReader, that is easy to adapt through +subclassing. + +The reader reports parsing events through special handler classes: +\table +\header \i Handler class \i Description +\row \i \l QXmlContentHandler + \i Reports events related to the content of a document (e.g. the start tag + or characters). +\row \i \l QXmlDTDHandler + \i Reports events related to the DTD (e.g. notation declarations). +\row \i \l QXmlErrorHandler + \i Reports errors or warnings that occurred during parsing. +\row \i \l QXmlEntityResolver + \i Reports external entities during parsing and allows users to resolve + external entities themselves instead of leaving it to the reader. +\row \i \l QXmlDeclHandler + \i Reports further DTD related events (e.g. attribute declarations). +\row \i \l QXmlLexicalHandler + \i Reports events related to the lexical structure of the + document (the beginning of the DTD, comments etc.). +\endtable + +These classes are abstract classes describing the interface. The \l +QXmlDefaultHandler class provides a "do nothing" default +implementation for all of them. Therefore users only need to overload +the QXmlDefaultHandler functions they are interested in. + +To read input XML data a special class \l QXmlInputSource is used. + +Apart from those already mentioned, the following SAX2 support classes +provide additional useful functionality: +\table +\header \i Class \i Description +\row \i \l QXmlAttributes + \i Used to pass attributes in a start element event. +\row \i \l QXmlLocator + \i Used to obtain the actual parsing position of an event. +\row \i \l QXmlNamespaceSupport + \i Used to implement \link xml.html#namespaces namespace\endlink + support for a reader. Note that namespaces do not change the + parsing behavior. They are only reported through the handler. +\endtable + + +\target sax2Features +\section2 Features + +The behaviour of an XML reader depends on its support for certain +optional features. For example, a reader may have the feature "report +attributes used for \link xml.html#namespaces namespace\endlink +declarations and prefixes along with the local name of a tag". Like +every other feature this has a unique name represented by a URI: it is +called \e http://xml.org/sax/features/namespace-prefixes. + +The Qt SAX2 implementation can report whether the reader has +particular functionality using the \l QXmlReader::hasFeature() +function. Available features can be tested with QXmlReader::feature(), +and switched on or off using \l QXmlReader::setFeature(). + +Consider the example +\code +<document xmlns:book = 'http://trolltech.com/fnord/book/' + xmlns = 'http://trolltech.com/fnord/' > +\endcode +A reader that does not support the \e +http://xml.org/sax/features/namespace-prefixes feature would report +the element name \e document but not its attributes \e xmlns:book and +\e xmlns with their values. A reader with the feature \e +http://xml.org/sax/features/namespace-prefixes reports the namespace +attributes if the \link QXmlReader::feature() feature\endlink is +switched on. + +Other features include \e http://xml.org/sax/features/namespace +(namespace processing, implies \e +http://xml.org/sax/features/namespace-prefixes) and \e +http://xml.org/sax/features/validation (the ability to report +validation errors). + +Whilst SAX2 leaves it to the user to define and implement whatever +features are retquired, support for \e +http://xml.org/sax/features/namespace (and thus \e +http://xml.org/sax/features/namespace-prefixes) is mandantory. +The \l QXmlSimpleReader implementation of \l QXmlReader, +supports them, and can do namespace processing. + +\l QXmlSimpleReader is not validating, so it +does not support \e http://xml.org/sax/features/validation. + + +\target sax2Namespaces +\section2 Namespace support via features + +As we have seen in the \link #sax2Features previous section\endlink +we can configure the behavior of the reader when it comes to namespace +processing. This is done by setting and unsetting the +\e http://xml.org/sax/features/namespaces and +\e http://xml.org/sax/features/namespace-prefixes features. + +They influence the reporting behavior in the following way: +\list 1 +\i Namespace prefixes and local parts of elements and attributes can +be reported. +\i The qualified names of elements and attributes are reported. +\i \l QXmlContentHandler::startPrefixMapping() and \l +QXmlContentHandler::endPrefixMapping() are called by the reader. +\i Attributes that declare namespaces (i.e. the attribute \e xmlns and +attributes starting with \e{xmlns:}) are reported. +\endlist + +Consider the following element: +\code +<author xmlns:fnord = 'http://trolltech.com/fnord/' + title="Ms" + fnord:title="Goddess" + name="Eris Kallisti"/> +\endcode +With \e http://xml.org/sax/features/namespace-prefixes set to TRUE +the reader will report four attributes; but with the \e +namespace-prefixes feature set to FALSE only three, with the \e +xmlns:fnord attribute defining a namespace being "invisible" to the +reader. + +The \e http://xml.org/sax/features/namespaces feature is responsible +for reporting local names, namespace prefixes and URIs. With \e +http://xml.org/sax/features/namespaces set to TRUE the parser will +report \e title as the local name of the \e fnord:title attribute, \e +fnord being the namespace prefix and \e http://trolltech.com/fnord/ as +the namespace URI. When \e http://xml.org/sax/features/namespaces is +FALSE none of them are reported. + +In the current implementation the Qt XML classes follow the definition +that the prefix \e xmlns itself isn't associated with any namespace at all +(see \link http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using +http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using \endlink). +Therefore even with \e http://xml.org/sax/features/namespaces and +\e http://xml.org/sax/features/namespace-prefixes both set to TRUE +the reader won't return either a local name, a namespace prefix or +a namespace URI for \e xmlns:fnord. + +This might be changed in the future following the W3C suggestion +\link http://www.w3.org/2000/xmlns/ http://www.w3.org/2000/xmlns/ \endlink +to associate \e xmlns with the namespace \e http://www.w3.org/2000/xmlns. + + +As the SAX2 standard suggests, \l QXmlSimpleReader defaults to having +\e http://xml.org/sax/features/namespaces set to TRUE and +\e http://xml.org/sax/features/namespace-prefixes set to FALSE. +When changing this behavior using \l QXmlSimpleReader::setFeature() +note that the combination of both features set to +FALSE is illegal. + +For a practical demonstration of how the two features affect the +output of the reader run the \link tagreader-with-features-example.html +tagreader with features example. \endlink + +\target sax2NamespacesSummary +\section3 Summary + +\l QXmlSimpleReader implements the following behavior: + +\table +\header \i (namespaces, namespace-prefixes) + \i Namespace prefix and local part + \i Qualified names + \i Prefix mapping + \i xmlns attributes +\row \i (TRUE, FALSE) \i Yes \i Yes* \i Yes \i No +\row \i (TRUE, TRUE) \i Yes \i Yes \i Yes \i Yes +\row \i (FALSE, TRUE) \i No* \i Yes \i No* \i Yes +\row \i (FALSE, FALSE) \i41 Illegal +\endtable + +<sup>*</sup> The behavior of these entries is not specified by SAX. + +\target sax2Properties +\section2 Properties + +Properties are a more general concept. They have a unique name, +represented as an URI, but their value is \c void*. Thus nearly +anything can be used as a property value. This concept involves some +danger, though: there is no means of ensuring type-safety; the user +must take care that they pass the right type. Properties are +useful if a reader supports special handler classes. + +The URIs used for features and properties often look like URLs, e.g. +\c http://xml.org/sax/features/namespace. This does not mean that the +data retquired is at this address. It is simply a way of defining +unique names. + +Anyone can define and use new SAX2 properties for their readers. +Property support is not mandatory. + +To set or query properties the following functions are provided: \l +QXmlReader::setProperty(), \l QXmlReader::property() and \l +QXmlReader::hasProperty(). + + +\target sax2Reading +\section2 Further reading + +More information about XML (e.g. \link xml.html#namespaces namespaces \endlink) +can be found in the \link xml.html introduction to the Qt XML module. \endlink + + + +\target dom +\section1 The Qt DOM classes + +\target domIntro +\section2 Introduction to DOM + +DOM provides an interface to access and change the content and +structure of an XML file. It makes a hierarchical view of the document +(a tree view). Thus -- in contrast to the SAX2 interface -- an object +model of the document is resident in memory after parsing which makes +manipulation easy. + +All DOM nodes in the document tree are subclasses of \l QDomNode. The +document itself is represented as a \l QDomDocument object. + +Here are the available node classes and their potential child classes: + +\list +\i \l QDomDocument: Possible children are + \list + \i \l QDomElement (at most one) + \i \l QDomProcessingInstruction + \i \l QDomComment + \i \l QDomDocumentType + \endlist +\i \l QDomDocumentFragment: Possible children are + \list + \i \l QDomElement + \i \l QDomProcessingInstruction + \i \l QDomComment + \i \l QDomText + \i \l QDomCDATASection + \i \l QDomEntityReference + \endlist +\i \l QDomDocumentType: No children +\i \l QDomEntityReference: Possible children are + \list + \i \l QDomElement + \i \l QDomProcessingInstruction + \i \l QDomComment + \i \l QDomText + \i \l QDomCDATASection + \i \l QDomEntityReference + \endlist +\i \l QDomElement: Possible children are + \list + \i \l QDomElement + \i \l QDomText + \i \l QDomComment + \i \l QDomProcessingInstruction + \i \l QDomCDATASection + \i \l QDomEntityReference + \endlist +\i \l QDomAttr: Possible children are + \list + \i \l QDomText + \i \l QDomEntityReference + \endlist +\i \l QDomProcessingInstruction: No children +\i \l QDomComment: No children +\i \l QDomText: No children +\i \l QDomCDATASection: No children +\i \l QDomEntity: Possible children are + \list + \i \l QDomElement + \i \l QDomProcessingInstruction + \i \l QDomComment + \i \l QDomText + \i \l QDomCDATASection + \i \l QDomEntityReference + \endlist +\i \l QDomNotation: No children +\endlist + +With \l QDomNodeList and \l QDomNamedNodeMap two collection classes +are provided: \l QDomNodeList is a list of nodes, +and \l QDomNamedNodeMap is used to handle unordered sets of nodes +(often used for attributes). + +The \l QDomImplementation class allows the user to query features of the +DOM implementation. + +\section2 Further reading + +To get started please refer to the \l QDomDocument documentation. + +\target namespaces +\section1 An introduction to namespaces + +Parts of the Qt XML module documentation assume that you are familiar +with XML namespaces. Here we present a brief introduction; skip to +\link #namespacesConventions Qt XML documentation conventions \endlink +if you already know this material. + +Namespaces are a concept introduced into XML to allow a more modular +design. With their help data processing software can easily resolve +naming conflicts in XML documents. + +Consider the following example: + +\code +<document> +<book> + <title>Practical XML</title> + <author title="Ms" name="Eris Kallisti"/> + <chapter> + <title>A Namespace Called fnord</title> + </chapter> +</book> +</document> +\endcode + +Here we find three different uses of the name \e title. If you wish to +process this document you will encounter problems because each of the +\e titles should be displayed in a different manner -- even though +they have the same name. + +The solution would be to have some means of identifying the first +occurrence of \e title as the title of a book, i.e. to use the \e +title element of a book namespace to distinguish it from, for example, +the chapter title, e.g.: +\code +<book:title>Practical XML</book:title> +\endcode + +\e book in this case is a \e prefix denoting the namespace. + +Before we can apply a namespace to element or attribute names we must +declare it. + +Namespaces are URIs like \e http://trolltech.com/fnord/book/. This +does not mean that data must be available at this address; the URI is +simply used to provide a unique name. + +We declare namespaces in the same way as attributes; strictly speaking +they \e are attributes. To make for example \e +http://trolltech.com/fnord/ the document's default XML namespace \e +xmlns we write + +\code +xmlns="http://trolltech.com/fnord/" +\endcode + +To distinguish the \e http://trolltech.com/fnord/book/ namespace from +the default, we must supply it with a prefix: + +\code +xmlns:book="http://trolltech.com/fnord/book/" +\endcode + +A namespace that is declared like this can be applied to element and +attribute names by prepending the appropriate prefix and a ":" +delimiter. We have already seen this with the \e book:title element. + +Element names without a prefix belong to the default namespace. This +rule does not apply to attributes: an attribute without a prefix does +not belong to any of the declared XML namespaces at all. Attributes +always belong to the "traditional" namespace of the element in which +they appear. A "traditional" namespace is not an XML namespace, it +simply means that all attribute names belonging to one element must be +different. Later we will see how to assign an XML namespace to an +attribute. + +Due to the fact that attributes without prefixes are not in any XML +namespace there is no collision between the attribute \e title (that +belongs to the \e author element) and for example the \e title element +within a \e chapter. + +Let's clarify this with an example: +\code +<document xmlns:book = 'http://trolltech.com/fnord/book/' + xmlns = 'http://trolltech.com/fnord/' > +<book> + <book:title>Practical XML</book:title> + <book:author xmlns:fnord = 'http://trolltech.com/fnord/' + title="Ms" + fnord:title="Goddess" + name="Eris Kallisti"/> + <chapter> + <title>A Namespace Called fnord</title> + </chapter> +</book> +</document> +\endcode + +Within the \e document element we have two namespaces declared. The +default namespace \e http://trolltech.com/fnord/ applies to the \e +book element, the \e chapter element, the appropriate \e title element +and of course to \e document itself. + +The \e book:author and \e book:title elements belong to the namespace +with the URI \e http://trolltech.com/fnord/book/. + +The two \e book:author attributes \e title and \e name have no XML +namespace assigned. They are only members of the "traditional" +namespace of the element \e book:author, meaning that for example two +\e title attributes in \e book:author are forbidden. + +In the above example we circumvent the last rule by adding a \e title +attribute from the \e http://trolltech.com/fnord/ namespace to \e +book:author: the \e fnord:title comes from the namespace with the +prefix \e fnord that is declared in the \e book:author element. + +Clearly the \e fnord namespace has the same namespace URI as the +default namespace. So why didn't we simply use the default namespace +we'd already declared? The answer is tquite complex: +\list +\i attributes without a prefix don't belong to any XML namespace at +all, not even to the default namespace; +\i additionally omitting the prefix would lead to a \e title-title clash; +\i writing it as \e xmlns:title would declare a new namespace with the +prefix \e title instead of applying the default \e xmlns namespace. +\endlist + +With the Qt XML classes elements and attributes can be accessed in two +ways: either by refering to their qualified names consisting of the +namespace prefix and the "real" name (or \e local name) or by the +combination of local name and namespace URI. + +More information on XML namespaces can be found at +\l http://www.w3.org/TR/REC-xml-names/. + + +\target namespacesConventions +\section2 Conventions used in Qt XML documentation + +The following terms are used to distinguish the parts of names within +the context of namespaces: +\list +\i The \e {qualified name} + is the name as it appears in the document. (In the above example \e + book:title is a qualified name.) +\i A \e {namespace prefix} in a qualified name + is the part to the left of the ":". (\e book is the namespace prefix in + \e book:title.) +\i The \e {local part} of a name (also refered to as the \e {local + name}) appears to the right of the ":". (Thus \e title is the + local part of \e book:title.) +\i The \e {namespace URI} ("Uniform Resource Identifier") is a unique + identifier for a namespace. It looks like a URL + (e.g. \e http://trolltech.com/fnord/ ) but does not retquire + data to be accessible by the given protocol at the named address. +\endlist + +Elements without a ":" (like \e chapter in the example) do not have a +namespace prefix. In this case the local part and the qualified name +are identical (i.e. \e chapter). +*/ |