NAME

tdom::schema - Create a schema validation command

SYNOPSIS

package require tdom

tdom::schema ?create? cmdName
    

DESCRIPTION

This command creates validation commands with a simple API. A validation command has methods to define a schema and can validate XML or DOM trees (and, to some degree, other kinds of hierarchical data) against this schema.

Additionally, a validation command may be passed as the argument to the -validateCmd option of the dom parse and expat commands to enable validation in addition to the checks they otherwise perform.

The valid methods of the created commands are:

prefixns ?prefixUriList?
This method controls the mapping of prefixes (or abbreviations) to namespace URIs. Wherever a schema command method expects a namespace argument, you may use a prefix that points to a namespace URI in the current prefixUriList set by this method. If the list maps the same prefix to different namespace URIs, the first one wins. If there is no such prefix, the namespace argument is used literally as the namespace URI. If the method is called without an argument, it returns the current prefixUriList. If the method is called with the empty string, all namespace arguments are used literally as namespace URIs. This is the default.
defelement name ?namespace? <definition script>
This method defines the element name (optionally in the namespace namespace) in the schema. The definition script is evaluated and defines the content model of the element. If the namespace argument is given, any element or ref references in the definition script that are not wrapped inside a namespace command are resolved in that namespace. If there is already an element definition for the name/namespace combination, the command raises an error.
defpattern name ?namespace? <definition script>
This method defines a (possibly complex) content particle with the name name (optionally in the namespace namespace) in the schema, to be referenced in other definition scripts with the definition command ref. The definition script is evaluated and defines the content model of the content particle. If the namespace argument is given, any element or ref references in the definition script that are not wrapped inside a namespace command are resolved in that namespace. If there is already a pattern definition for the name/namespace combination, the command raises an error.
define <definition script>
This method defines several elements or patterns, or a whole schema, with one call.
start documentElement ?namespace?
This method defines the name and namespace of the root element of a tree to validate. If this method is used, the root element must match for the document to be valid. If start isn't used, any element defined with defelement may be the root of a valid document. The start method may be used several times with varying arguments during the lifetime of a validation command. If the method is called with just the empty string (and no namespace argument), the validation constraint for the root element is removed and any defined element is valid as the root of a tree to validate.
event (start|end|text) ?event specific data?
This method validates hierarchical data against the content constraints of the validation command. The event types are:
start name ?attributes? ?namespace?
Checks if the current validation state allows the element name in the namespace namespace to start here. It raises an error if not.
end
Checks if the current innermost open element may end here in the current state without violating validation constraints. It raises an error if not.
text text
Checks if the current validation state allows the given text content. It raises an error if not.
validate <XML string> ?objVar?
Returns true if the <XML string> is valid or false otherwise. If validation failed and the optional objVar argument is given, then the variable with that name is set to a validation error message. If the XML string is valid and the optional objVar argument is given, then the variable with that name is set to the empty string.
domvalidate domNode ?objVar?
Returns true if the first argument is a valid tree or false otherwise. If validation failed and the optional objVar argument is given, then the variable with that name is set to a validation error message. If the dom tree is valid and the optional objVar argument is given, then the variable with that name is set to the empty string.
delete
This method deletes the validation command.
state
This method returns the current validation state of the validation command. The possible return values and their meanings are:
READY
The validation command is ready to start validation.
VALIDATING
The validation command is in the process of validating input.
FINISHED
The validation has finished; no further events are expected.
reset
This method resets the validation command into state READY (while preserving the defined grammar).
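
As a minimal sketch of how the methods above fit together, the following script creates a validation command, defines one element, validates two XML strings and then feeds the same structure as events. The command name grammar and the element names doc, item and other are made up for illustration, and the script assumes a tDOM build that provides tdom::schema as described here.

```tcl
package require tdom

# Hypothetical command and element names, chosen for illustration only.
tdom::schema create grammar
grammar defelement doc {
    element item * {text}
}
grammar start doc

# validate returns a boolean; the optional variable receives an error
# message on failure and the empty string on success.
set okGood [grammar validate {<doc><item>a</item><item>b</item></doc>} msgGood]
grammar reset
set okBad [grammar validate {<doc><other/></doc>} msgBad]

puts "expected content: $okGood"
puts "unexpected child: $okBad ($msgBad)"

# The same structure, fed as events instead of an XML string.
# Each event call raises an error if the event is not allowed.
grammar reset
grammar event start doc
grammar event start item
grammar event text "a"
grammar event end
grammar event end
puts "state after events: [grammar state]"

grammar delete
```

The explicit reset calls are a defensive choice here; whether validate resets the command on its own is not stated above.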

Schema definition scripts

Schema definition scripts are ordinary Tcl scripts that are evaluated in the namespace tdom::schema. The schema definition commands listed below, which live in this Tcl namespace, allow you to define a wide variety of document structures. Every schema definition command establishes a validation constraint on the content, which has to match or must be optional for the content to be valid. It is a validation error if the element in the XML source has additional (unmatched) content.

The schema definition commands are:

element name ?quant? ?<definition script>?
If the optional definition script argument isn't given, this command refers to the element defined with defelement with the name name in the current context namespace. If the definition script argument is given, the validation constraint expects an element with the name name in the current namespace, with content defined "locally" by the definition script. Forward references to elements or patterns that are not yet defined, and other local definitions of the same name inside the definition script, are allowed.
ref name ?quant?
This command refers to the content particle defined with defpattern with the name name in the current context namespace. Forward references to a pattern that is not yet defined and recursive references are allowed.
group ?quant? <definition script>
choice ?quant? <definition script>
interleave ?quant? <definition script>
mixed ?quant? <definition script>
text ?<constraint script>|"type" typename?
Without the optional constraint script this validation constraint matches every string (including the empty one). With constraint script or with a given text type argument a text matching this script or the text type is expected.
any ?quant?
The any command matches every element (with whatever attributes) or subtree, whether or not it is known within the schema. Note that this means the quantifiers * and + will consume all elements until the enclosing element ends.
attribute name ?quant? (?<constraint script>|"type" typename?)
The attribute command defines an attribute (in no namespace) of the enclosing element. The first definition of name inside an element definition wins; later definitions of the same name are silently ignored. The name argument may be followed by one of the quantifiers ? or !, which is then used. Otherwise the attribute is required (it must be present in the XML source). If there is one more argument, it is evaluated as a constraint script that defines the value constraints of the attribute. Otherwise, if there are two more arguments and the first of them is the bareword "type", the second is used as a text type name.
nsattribute name namespace ?quant? (?<constraint script>|"type" typename?)
This command does the same as the command attribute, just for the attribute name in the namespace namespace.
namespace uri <definition script>
Evaluates the definition script with the context namespace uri. Every element or ref command name is looked up in the namespace uri, and locally defined elements are in that namespace.
prefixns ?prefixUriList?
This defines a prefix to namespace URI mapping exactly as a schemacmd prefixns call does. It is meant as a top-level command of a schemacmd define script. This command is not allowed nested inside another definition script command and raises an error if called there.
defelement name ?namespace? <definition script>
This defines an element type exactly as a schemacmd defelement call does. It is meant as a top-level command of a schemacmd define script. This command is not allowed nested inside another definition script command and raises an error if called there.
defpattern name ?namespace? <definition script>
This defines a named pattern exactly as a schemacmd defpattern call does. It is meant as a top-level command of a schemacmd define script. This command is not allowed nested inside another definition script command and raises an error if called there.
start name ?namespace?
This command works exactly as a schemacmd start call does. It is meant as a top-level command of a schemacmd define script. This command is not allowed nested inside another definition script command and raises an error if called there.

Quantity specifier

Several schema definition commands expect a quantifier as one of their arguments, which specifies how often the content particle specified by the command is expected. The valid values for a quant argument are:

!
The content particle must occur exactly once in valid documents. This is the default, if a quantifier is omitted.
?
The content particle may occur at most once in valid documents.
*
The content particle may occur zero or more times in a row in valid documents.
+
The content particle may occur one or more times in a row in valid documents.
n
The content particle must occur n times in a row in valid documents. The quantifier must be an integer greater than zero.
{n m}
The content particle must occur n to m times (both inclusive) in a row in valid documents. The quantifier must be a Tcl list with two elements. Both elements must be integers, with n >= 0 and n < m.

If an optional quantifier is not given then it defaults to * in case of the mixed command and to ! for all other commands.
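
The quantifiers can be illustrated with a small sketch. The command name quantDemo and all element names are hypothetical, and every child element is defined locally so that the example does not depend on other definitions. It assumes a tDOM build that provides tdom::schema as described here.

```tcl
package require tdom

# Hypothetical names, for illustration only.
tdom::schema create quantDemo
quantDemo defelement list {
    element head !     {text}   ;# exactly once (also the default)
    element note ?     {text}   ;# optional
    element item {1 3} {text}   ;# one to three times in a row
    element tail *     {text}   ;# zero or more times in a row
}
quantDemo start list

# The first document has one item, the second has four,
# which exceeds the upper bound of the {1 3} quantifier.
set results {}
foreach xml {
    {<list><head>h</head><item>1</item></list>}
    {<list><head>h</head><item>1</item><item>2</item><item>3</item><item>4</item></list>}
} {
    quantDemo reset
    set ok [quantDemo validate $xml msg]
    lappend results $ok
    puts "$ok $msg"
}

quantDemo delete
```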

Text constraint scripts

Text (parsed character data, as XML calls it) must sometimes be of a certain kind or comply with certain rules to be valid.

The text constraint commands are:

isint
fixed value
The text constraint only matches if the text value is string equal to the given value.
tcl tclcmd ?arg arg ...?
Evaluates the Tcl command tclcmd arg arg ... with the text to validate appended to the argument list. The return value of the Tcl command is interpreted as a boolean.
enumeration list
This text constraint matches if the text value is equal to one element (respecting case and any whitespace) of the argument list, which has to be a valid Tcl list.
match ?-nocase? glob style match pattern
This text constraint matches if the text value matches the glob style pattern given as argument. It follows the rules of the Tcl [string match] command, see https://www.tcl.tk/man/tcl8.6/TclCmd/string.htm#M35.
regexp expression
This text constraint matches if the text value matches the regular expression given as argument. The regular expression syntax is described at https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm.
nmtoken
This text constraint matches if the text value matches the XML nmtoken production, see https://www.w3.org/TR/xml/#NT-Nmtoken.
nmtokens
This text constraint matches if the text value matches the XML nmtokens production, see https://www.w3.org/TR/xml/#NT-Nmtokens.
number
boolean
isodate
maxLength length
This text constraint matches if the length of the text value (in characters, not bytes) is at most length. The length argument must be an integer greater than zero.
minLength length
This text constraint matches if the length of the text value (in characters, not bytes) is at least length. The length argument must be an integer greater than zero.
oneOf <constraint script>
This text constraint matches if one of the text constraints defined in the argument constraint script matches the text. It probes the text constraints in the order of definition and stops after the first match.
allOf <constraint script>
This text constraint matches if all of the text constraints defined in the argument constraint script match the text. It probes the text constraints in the order of definition and stops after the first failed match. Since the schema definition command text also expects all text constraints to match, the text constraint allOf is useful mostly together with the oneOf text constraint command.
strip <constraint script>
This text constraint command tests all text constraints in the evaluated <constraint script> against the text to test, stripped of all white space at the start and end.
split ?type ?args?? <constraint script>

This text constraint command splits the text to test into a list of values and tests every element of that list against the text constraints in the evaluated <constraint script>.

The available types are:

whitespace
The text to split is stripped of all white space at the start and end and then split into a list at any run of white space.
tcl tclcmd ?arg ...?
The text to split is handed to the tclcmd, which is evaluated at global level with every given arg appended and the text to split as the last argument. This call must return a valid Tcl list, whose elements are tested.

The default in case no split type argument is given is whitespace.

id
This text constraint command marks the text as a document wide ID (to be referenced by an idref). Every ID value within a document must be unique. It isn't an error if the ID isn't actually referenced within the document.
idref
This text constraint command expects the text to be a reference to an ID within the document. The referenced ID may appear later in the document than the reference. Several references within the document to one ID are possible.
base64
This text constraint matches if the text is valid base64 according to RFC 4648.
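
The tcl text constraint and the tcl split type both call a Tcl command with the text appended as the last argument. The following plain Tcl sketch shows helper procedures of the shape those descriptions call for. The procedure names inRange, splitOn and splitWhitespace are hypothetical, and splitWhitespace only mimics what the whitespace split type is described to do; it is not the library's implementation.

```tcl
# Hypothetical helper, intended for use as: text {tcl inRange 1 99}
# The text to validate arrives as the last argument; the result is
# interpreted as a boolean.
proc inRange {min max text} {
    expr {[string is integer -strict $text] && $text >= $min && $text <= $max}
}

# Hypothetical helper for the tcl split type: the extra argument sep
# comes first, the text to split comes last. Returns a Tcl list.
proc splitOn {sep text} {
    return [split $text $sep]
}

# Mimics the described whitespace split type: leading and trailing
# white space is dropped and the text is split at runs of white space.
proc splitWhitespace {text} {
    return [regexp -all -inline {\S+} $text]
}

puts [inRange 1 99 42]                 ;# 1
puts [inRange 1 99 100]                ;# 0
puts [splitOn , a,b,c]                 ;# a b c
puts [splitWhitespace "  x  y\tz "]    ;# x y z
```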

Local key constraints

Document wide uniqueness and foreign key constraints are available with the text constraint commands id and idref. Keyspaces allow for sub-tree local uniqueness and foreign key constraints.

keyspace <names list> <constraint script>
Any number of keyspaces are possible. A keyspace is either active or not. A keyspace command with the same name, called inside a <constraint script>, does nothing.

These text constraint commands work with keyspaces:

key <name>
If the keyspace with the name <name> is not active, this constraint always matches. If the keyspace is active, it reports an error if there is already a key with the value. Otherwise, it stores the value as a key in this keyspace and matches.
keyref <name>
If the keyspace with the name <name> is not active, this constraint always matches. If the keyspace is active, it reports an error if, at the end of the keyspace <name>, there is still no key with the value. Otherwise it matches.

Examples

The XML Schema Part 0: Primer Second Edition (https://www.w3.org/TR/xmlschema-0/) starts with this example schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
     Purchase order schema for Example.com.
     Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
                   fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice"  type="xsd:decimal"/>
            <xsd:element ref="comment"   minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
    

A somewhat one-to-one translation of that into a tDOM schema definition script would be:

tdom::schema schema
schema define {

    # Purchase order schema for Example.com.
    # Copyright 2000 Example.com. All rights reserved.

    # Global definitions use defelement at the top level of define.
    defelement purchaseOrder {ref PurchaseOrderType}

    defelement comment {text}

    defpattern PurchaseOrderType {
        element shipTo {ref USAddress}
        element billTo {ref USAddress}
        element comment ?
        element items
        attribute orderDate
    }

    defpattern USAddress {
        element name ! {text}
        element street ! {text}
        element city ! {text}
        element state ! {text}
        element zip ! {text number}
        attribute country ! {fixed "US"}
    }

    defelement items {
        element item * {
            element productName ! {text}
            # Integer below 100: the tcl constraint calls
            # ::tcl::mathop::> with 100 and the text as arguments.
            element quantity ! {text {isint; tcl ::tcl::mathop::> 100}}
            element USPrice ! {text number}
            element comment ?
            element shipDate ? {text isodate}
            # Braces keep Tcl from substituting the bracket expression.
            attribute partNum ! {regexp {^\d{3}-[A-Z]{2}$}}
        }
    }
}
      
    

The RELAX NG Tutorial (http://relaxng.org/tutorial-20011203.html) starts with this example:

Consider a simple XML representation of an email address book:

<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>

The DTD would be as follows:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

A RELAX NG pattern for this could be written as follows:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
      
    

This schema definition script will do the same:

tdom::schema schema      
schema define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e {text}
    }
}
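
As a usage sketch, the address book schema can be applied to an XML string with validate and to a parsed DOM tree with domvalidate. The command name abSchema and the variable names are hypothetical; the sample document is written without white space between elements to keep the example independent of how such white space is treated. It assumes a tDOM build that provides tdom::schema as described here.

```tcl
package require tdom

# Same definitions as in the script above, under a hypothetical name.
tdom::schema create abSchema
abSchema define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e {text}
    }
}
abSchema start addressBook

set xml {<addressBook><card><name>John Smith</name><email>js@example.com</email></card></addressBook>}

# Validate the XML string directly.
set validString [abSchema validate $xml errString]

# Validate an already parsed DOM tree, starting at its root element.
abSchema reset
set doc [dom parse $xml]
set validTree [abSchema domvalidate [$doc documentElement] errTree]

puts "string valid: $validString"
puts "tree valid:   $validTree"

$doc delete
abSchema delete
```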