tdom::schema - Create a schema validation command
package require tdom
tdom::schema ?create? cmdName
This command creates validation commands with a simple API. The
validation commands have methods to define a schema and are able
to validate XML or DOM trees (and to some degree other kinds of
hierarchical data) against this schema.
Additionally, a validation command may be used as argument to
the -validateCmd option of the dom parse and the
expat commands to enable validation in addition to what they
otherwise do.
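A minimal sketch of both uses, assuming a tDOM version that provides the methods documented on this page. The command name cardSchema and the sample documents are made up for illustration, and the reset calls before dom parse are only defensive:

```tcl
package require tdom

# "cardSchema" is an arbitrary command name chosen for this sketch.
tdom::schema cardSchema
cardSchema define {
    defelement card {
        element name ! {text}
        element email ! {text}
    }
}

set good {<card><name>John Smith</name><email>js@example.com</email></card>}
set bad  {<card><email>js@example.com</email></card>}

# Standalone validation: a boolean result, the message goes into errMsg.
set goodOk [cardSchema validate $good errMsg]
set badOk  [cardSchema validate $bad errMsg]
puts "rejected: $errMsg"

# Validation during parsing: dom parse raises an error for invalid input.
cardSchema reset   ;# defensive; the command should already be READY
set doc [dom parse -validateCmd cardSchema $good]
set rootName [[$doc documentElement] nodeName]
$doc delete
cardSchema reset
set parseFailed [catch {dom parse -validateCmd cardSchema $bad} parseErr]
cardSchema delete
```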
The valid methods of the created commands are:
-
prefixns ?prefixUriList?
- This method controls a prefix (or abbreviation) to
namespace URI mapping. Everywhere a namespace argument is
expected in the schema command methods you may use a
"prefix" that maps to a namespace URI in the current
prefixUriList set by this method. If the list maps the same
prefix to different namespace URIs, the first one wins. If
there is no such prefix, the namespace argument is used
literally as namespace URI. If the method is called without
argument, it returns the current prefixUriList. If the method
is called with the empty string, any namespace URI arguments
are used literally. This is the default.
-
defelement name ?namespace? <definition script>
- This method defines the element name (optionally in
the namespace namespace) in the schema. The
definition script is evaluated and defines the content
model of the element. If the namespace argument is
given, any element or ref references in the
definition script not wrapped inside a namespace
command are resolved in that namespace. If there is already an
element definition for the name/namespace combination, the
command raises an error.
-
defpattern name ?namespace? <definition script>
- This method defines a (possibly complex) content particle
with the name name (optionally in the namespace
namespace) in the schema, to be referenced in other
definition scripts with the definition command ref. The
definition script is evaluated and defines the content
model of the content particle. If the namespace
argument is given, any element or ref references
in the definition script not wrapped inside a namespace
command are resolved in that namespace. If there is already a
pattern definition for the name/namespace combination, the
command raises an error.
-
define <definition script>
- This method allows defining several elements or patterns,
or a whole schema, with one call.
-
start documentElement ?namespace?
- This method defines the name and namespace of the root
element of a tree to validate. If this method is used, the
root element must match for validity. If start isn't
used, any element defined with defelement may be the
root of a valid document. The start method may be used
several times with varying arguments during the lifetime of a
validation command. If the command is called with just the
empty string (and no namespace argument), the validation
constraint for the root element is removed and any defined
element will be valid as root of a tree to validate.
-
event (start|end|text) ?event specific data?
- This method allows validating hierarchical data against
the content constraints of the validation command.
-
start name ?attributes? ?namespace?
- Checks whether the current validation state allows the
element name (in the namespace namespace) to
start here. It raises an error if not.
- end
- Checks whether the innermost open element may end
here in the current state without violating validation
constraints. It raises an error if not.
-
text text
- Checks whether the current validation state allows the
given text content. It raises an error if not.
-
validate <XML string> ?objVar?
- Returns true if the <XML string> is valid or
false otherwise. If validation failed and the optional
objVar argument is given, then the variable with that
name is set to a validation error message. If the XML string
is valid and the optional objVar argument is given,
then the variable with that name is set to the empty string.
-
domvalidate domNode ?objVar?
- Returns true if the first argument is a valid tree or
false otherwise. If validation failed and the optional
objVar argument is given, then the variable with that
name is set to a validation error message. If the dom tree is
valid and the optional objVar argument is given, then
the variable with that name is set to the empty string.
- delete
- This method deletes the validation command.
- state
- This method returns the validation state of the
command. The possible return values and their meanings
are:
- READY
- The validation command is ready to start
validation
- VALIDATING
- The validation command is in the
process of validating input.
- FINISHED
- The validation has finished; no further
events are expected.
- reset
- This method resets the validation command to state
READY (while preserving the defined grammar).
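The following sketch exercises prefixns, defpattern, defelement, start, event, state, reset and domvalidate together. The prefix ab, the namespace URI and the element names are hypothetical. It also assumes that elements defined locally inside a defpattern with a namespace argument belong to that namespace, which the descriptions above suggest but do not state outright:

```tcl
package require tdom

tdom::schema abook
# "ab" is a hypothetical prefix for a made-up namespace URI.
abook prefixns {ab http://example.com/addressbook}
abook defpattern nameAndEmail ab {
    element name ! {text}
    element email ! {text}
}
abook defelement card ab {
    ref nameAndEmail
}
abook defelement addressBook ab {
    element card *
}
# Only addressBook (in the mapped namespace) is accepted as root.
abook start addressBook ab

# Feed a tree as events; a disallowed event would raise an error.
set ns http://example.com/addressbook
set states [list [abook state]]
abook event start addressBook {} $ns
lappend states [abook state]
abook event start card {} $ns
abook event start name {} $ns
abook event text "John Smith"
abook event end                     ;# closes name
abook event start email {} $ns
abook event text "js@example.com"
abook event end                     ;# closes email
abook event end                     ;# closes card
abook event end                     ;# closes addressBook
lappend states [abook state]
abook reset                         ;# back to READY, definitions kept
lappend states [abook state]

# Validate an already built DOM tree.
set xml "<addressBook xmlns='$ns'><card><name>Fred</name><email>fb@example.net</email></card></addressBook>"
set doc [dom parse $xml]
set domOk [abook domvalidate [$doc documentElement] domErr]
$doc delete
abook delete
```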
Schema definition scripts are ordinary Tcl scripts that are
evaluated in the namespace tdom::schema. The schema definition
commands listed below, defined in this Tcl namespace, allow
defining a wide variety of document structures. Every schema
definition command establishes a validation constraint on the
content, which has to match or must be optional for the content
to be valid. It is a validation error if the element in the XML
source has additional (not matched) content.
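As a small illustration (the schema and the documents are hypothetical), an omitted optional element is fine, while unmatched extra content makes the document invalid. The ? quantifier used here marks body as optional:

```tcl
package require tdom

tdom::schema noteSchema
noteSchema defelement note {
    element to ! {text}
    element body ? {text}
}
# body may be omitted because its quantifier makes it optional
set withoutBody [noteSchema validate {<note><to>Ann</to></note>}]
set withBody    [noteSchema validate {<note><to>Ann</to><body>Hi</body></note>}]
# cc is not matched by any constraint, so the document is invalid
set withExtra   [noteSchema validate {<note><to>Ann</to><cc>Bob</cc></note>} msg]
noteSchema delete
```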
The schema definition commands are:
-
element name ?quant? ?<definition script>?
- If the optional definition script argument isn't
given, this command refers to the element defined with
defelement with the name name in the current
context namespace. If the definition script argument is
given, the validation constraint expects an element with
the name name in the current namespace with content
"locally" defined by the definition script. Forward
references to not yet defined elements or patterns, and other
local definitions of the same name inside the definition
script, are allowed.
-
ref name ?quant?
- This command refers to the content particle defined with
defpattern with the name name in the current context
namespace. Forward references to a not yet defined pattern
and recursive references are allowed.
-
group ?quant? <definition script>
-
choice ?quant? <definition script>
-
interleave ?quant? <definition script>
-
mixed ?quant? <definition script>
-
text ?<constraint script>|"type" typename?
- Without the optional constraint script this validation
constraint matches every string (including the empty one).
With a constraint script, or with "type" and a text type
name, the text is expected to match that script or text
type.
-
any ?quant?
- The any command matches every element (with whatever
attributes) or subtree, no matter if known within the schema
or not. Note that this means the quantifiers * and +
will consume all elements until the enclosing element
ends.
-
attribute name ?quant? (?<constraint script>|"type" typename?)
- The attribute command defines an attribute (in no
namespace) of the enclosing element. The first definition of
name inside an element definition wins; later
definitions of the same name are silently ignored. After the
name argument there may be one of the quantifiers ? or
!. If there is, it will be used. Otherwise the attribute
will be required (must be present in the XML source). If there
is one more argument, it is evaluated as constraint
script, defining the value constraints of the attribute.
Otherwise, if there are two more arguments and the first of
them is the bareword "type", the second is used as a
text type name.
-
nsattribute name namespace ?quant? (?<constraint script>|"type" typename?)
- This command works like the command attribute, but for the attribute name in the namespace namespace.
-
namespace uri <definition script>
- Evaluates the definition script with context namespace uri. Every element or ref command name is looked up in the namespace uri, and locally defined elements are in that namespace.
-
prefixns ?prefixUriList?
- This defines a prefix to namespace URI mapping exactly
like a schemacmd prefixns call. It is meant as a toplevel
command of a schemacmd define script. This command is
not allowed nested in another definition script command and
raises an error if you call it there.
-
defelement name ?namespace? <definition script>
- This defines an element type exactly like a schemacmd
defelement call. It is meant as a toplevel command of a
schemacmd define script. This command is not allowed
nested in another definition script command and raises
an error if you call it there.
-
defpattern name ?namespace? <definition script>
- This defines a named pattern exactly like a schemacmd
defpattern call. It is meant as a toplevel command of a
schemacmd define script. This command is not allowed
nested in another definition script command and raises
an error if you call it there.
-
start name ?namespace?
- This command works exactly like a schemacmd start
call. It is meant as a toplevel command of a schemacmd
define script. This command is not allowed nested in
another definition script command and raises an error if you
call it there.
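A sketch combining several of these commands. The meanings of group, choice, interleave and mixed are not spelled out above; this example assumes the usual schema-language meanings (choice: exactly one alternative; interleave: all particles in any order; mixed: text interleaved with the given particles). The element and attribute names are invented:

```tcl
package require tdom

tdom::schema docSchema
docSchema defelement doc {
    # exactly one of title or heading
    choice {
        element title ! {text}
        element heading ! {text}
    }
    # author and date, in either order
    interleave {
        element author ! {text}
        element date ! {text}
    }
    # character data mixed with <b> elements
    element para * {
        mixed {
            element b ! {text}
        }
    }
    # an optional container that accepts arbitrary elements
    element extension ? {
        any *
    }
    attribute version ? {fixed 1.0}
}

set good {<doc version="1.0"><heading>H</heading><date>2001-12-03</date><author>A</author><para>x <b>y</b> z</para><extension><foo/><bar x="1"/></extension></doc>}
# both alternatives of the choice are present
set bad {<doc><title>T</title><heading>H</heading><author>A</author><date>D</date></doc>}

set goodOk [docSchema validate $good goodErr]
set badOk  [docSchema validate $bad badErr]
docSchema delete
```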
Several schema definition commands expect a quantifier as
one of their arguments, which specifies how often the content
particle specified by the command is expected. The valid values
for a quant argument are:
- !
- The content particle must occur exactly once in valid
documents. This is the default, if a quantifier is
omitted.
- ?
- The content particle must occur at most once in valid
documents.
- *
- The content particle may occur zero or more times in a
row in valid documents.
- +
- The content particle may occur one or more times in a
row in valid documents.
- n
- The content particle must occur n times in a row in
valid documents. The quantifier must be an integer greater
than zero.
- {n m}
- The content particle must occur
n to m times (both inclusive) in a row in valid documents. The
quantifier must be a Tcl list with two elements. Both elements
must be integers, with n >= 0 and n < m.
If an optional quantifier is not given then it defaults to * in
case of the mixed command and to ! for all other commands.
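A sketch of numeric quantifiers; the element names and the helper proc makeList are hypothetical. With an item quantifier of {2 4}, only the documents with two or four items should validate:

```tcl
package require tdom

tdom::schema listSchema
listSchema defelement list {
    # at most one head
    element head ? {text}
    # two to four items
    element item {2 4} {text}
    # exactly two tail elements
    element tail 2 {text}
}

# Hypothetical helper: builds a <list> with n items and two tails.
proc makeList {n} {
    set xml "<list><head>h</head>"
    for {set i 0} {$i < $n} {incr i} {
        append xml "<item>$i</item>"
    }
    append xml "<tail>t</tail><tail>t</tail></list>"
}

set results {}
foreach n {1 2 4 5} {
    # normalize the boolean result to 0/1
    lappend results [expr {[listSchema validate [makeList $n]] ? 1 : 0}]
}
listSchema delete
```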
Text (parsed character data, as XML calls it) sometimes has
to be of a certain kind or comply with certain rules to be
valid.
The text constraint commands are:
- isint
-
fixed value
- The text constraint only matches if the text value is
string equal to the given value.
-
tcl tclcmd ?arg arg ...?
- Evaluates the Tcl command tclcmd arg arg ... with
the text to validate appended to the argument list. The return
value of the Tcl command is interpreted as a boolean.
-
enumeration list
- This text constraint matches if the text value is equal to
one element (respecting case and any whitespace) of the
argument list, which has to be a valid Tcl list.
-
match ?-nocase? glob style match pattern
- This text constraint matches if the text value matches the
glob style pattern given as argument. It follows the rules of
the Tcl [string match] command, see
https://www.tcl.tk/man/tcl8.6/TclCmd/string.htm#M35.
-
regexp expression
- This text constraint matches if the text value matches the
regular expression given as argument. https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm describes the regular expression syntax.
- nmtoken
- This text constraint matches if the text value matches the
XML nmtoken production
https://www.w3.org/TR/xml/#NT-Nmtoken
- nmtokens
- This text constraint matches if the text value matches the
XML nmtokens production
https://www.w3.org/TR/xml/#NT-Nmtokens
- number
- boolean
- isodate
-
maxLength length
- This text constraint matches if the length of the text
value (in characters, not bytes) is at most length. The
length argument must be an integer greater than zero.
-
minLength length
- This text constraint matches if the length of the text
value (in characters, not bytes) is at least length.
The length argument must be an integer greater than zero.
-
oneOf <constraint script>
- This text constraint matches if one of the text
constraints defined in the argument constraint script
matches the text. It probes the text constraints in the order
of definition and stops after the first match.
-
allOf <constraint script>
- This text constraint matches if all of the text
constraints defined in the argument constraint script
match the text. It probes the text constraints in the order
of definition and stops after the first match failure. Since
the schema definition command text also expects all
text constraints to match, the text constraint allOf is
useful mostly together with the oneOf text constraint
command.
-
strip <constraint script>
- This text constraint command tests all text constraints
in the evaluated <constraint script> with the text to
test stripped of all white space at start and end.
-
split ?type ?args?? <constraint script>
-
This text constraint command splits the text to test
into a list of values and tests all elements of that list for
the text constraints in the evaluated <constraint
script>.
The available types are:
- whitespace
- The text to split is stripped of all
white space at start and end and split into a list at any
run of successive white space.
- tcl tclcmd ?arg ...?
- The text to split is
handed to tclcmd, which is evaluated at global
level with every given arg and the text to split appended
as last argument. This call must return a valid Tcl list,
whose elements are tested.
The default in case no split type argument is given is
whitespace.
- id
- This text constraint command marks the text as a
document wide ID (to be referenced by an idref). Every ID
value within a document must be unique. It isn't an error if
the ID isn't actually referenced within the document.
- idref
- This text constraint command expects the text to be a
reference to an ID within the document. The referenced ID may
be later in the document, that the reference. Several
references within the document to one ID are possible.
- base64
- This text constraint matches if the text is valid base64
according to RFC 4648.
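The following hypothetical configuration schema combines several of the text constraints above. The element names and the helper proc ::isEven (used with the tcl constraint, which appends the text as last argument) are made up for illustration:

```tcl
package require tdom

# Hypothetical helper for the tcl text constraint.
proc ::isEven {text} {
    # && short-circuits, so % is only applied to integers
    expr {[string is integer -strict $text] && $text % 2 == 0}
}

tdom::schema cfgSchema
cfgSchema defelement config {
    element color ! {text {enumeration {red green blue}}}
    element file ! {text {match -nocase *.xml}}
    element code ! {text {regexp {^[A-Z]{3}$}}}
    element even ! {text {tcl ::isEven}}
    # "auto" or an integer
    element size ! {text {oneOf {fixed auto; isint}}}
    # a whitespace separated list of NMTOKENs
    element tags ! {text {split {nmtoken}}}
    # 1 to 10 characters after stripping surrounding white space
    element label ! {text {strip {minLength 1; maxLength 10}}}
}

set good {<config><color>green</color><file>Data.XML</file><code>ABC</code><even>42</even><size>auto</size><tags>a b-c d.e</tags><label>  hi  </label></config>}
# "purple" is not in the enumeration
set bad {<config><color>purple</color><file>Data.XML</file><code>ABC</code><even>42</even><size>auto</size><tags>a</tags><label>hi</label></config>}

set goodOk [cfgSchema validate $good goodErr]
set badOk  [cfgSchema validate $bad badErr]
cfgSchema delete
```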
Document wide uniqueness and foreign key constraints are
available with the text constraint commands id and idref.
Keyspaces allow for sub-tree local uniqueness and foreign key
constraints.
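A sketch of document-wide id and idref checks; the element and attribute names are hypothetical:

```tcl
package require tdom

tdom::schema idSchema
idSchema defelement doc {
    element section * {
        attribute id ! {id}
    }
    element link * {
        attribute target ! {idref}
    }
}

# two links to the same ID are fine
set ok        [idSchema validate {<doc><section id="a"/><section id="b"/><link target="b"/><link target="b"/></doc>}]
# the same ID twice
set duplicate [idSchema validate {<doc><section id="a"/><section id="a"/></doc>}]
# a reference to an ID that does not exist
set dangling  [idSchema validate {<doc><section id="a"/><link target="zz"/></doc>}]
idSchema delete
```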
-
keyspace <names list> <constraint script>
- Any number of keyspaces are possible. A keyspace is
either active or not. A nested keyspace call with the same
name inside the <constraint script> does nothing.
These text constraint commands work with keyspaces:
-
key <name>
- If the keyspace with the name <name> is not
active, this always matches. If the keyspace is active, it
reports an error if there is already a key with the value.
Otherwise, it stores the value as key in this keyspace and
matches.
-
keyref <name>
- If the keyspace with the name <name> is not
active, this always matches. If the keyspace is active, it
reports an error if there is still no key with the value at
the end of the keyspace <name>. Otherwise it
matches.
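A sketch of shelf-local keys. It assumes, beyond what the description above states, that a keyspace may wrap element definitions so that key and keyref apply separately inside each enclosing shelf element. The element and attribute names are invented; treat this as an illustration only:

```tcl
package require tdom

tdom::schema libSchema
libSchema defelement library {
    element shelf * {
        # keys must be unique, and keyrefs must resolve, per shelf
        keyspace {bookIds} {
            element book * {
                attribute id ! {key bookIds}
            }
            element loan * {
                attribute book ! {keyref bookIds}
            }
        }
    }
}

# The same id on two different shelves: each shelf is its own keyspace.
set ok [libSchema validate {<library><shelf><book id="1"/><loan book="1"/></shelf><shelf><book id="1"/></shelf></library>}]
# A loan referring to a book that only exists on another shelf.
set crossShelf [libSchema validate {<library><shelf><book id="1"/></shelf><shelf><book id="2"/><loan book="1"/></shelf></library>}]
# Two books with the same id on one shelf.
set dupKey [libSchema validate {<library><shelf><book id="1"/><book id="1"/></shelf></library>}]
libSchema delete
```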
The XML Schema Part 0: Primer Second Edition
(https://www.w3.org/TR/xmlschema-0/) starts with this
example schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
A somewhat one-to-one translation of that into a tDOM schema
definition script would be:
package require tdom
tdom::schema schema
schema define {
    # Purchase order schema for Example.com.
    # Copyright 2000 Example.com. All rights reserved.
    defelement purchaseOrder {ref PurchaseOrderType}
    defelement comment {text}
    defpattern PurchaseOrderType {
        element shipTo ! {ref USAddress}
        element billTo ! {ref USAddress}
        element comment ?
        element items
        attribute orderDate ? {isodate}
    }
    defpattern USAddress {
        element name ! {text}
        element street ! {text}
        element city ! {text}
        element state ! {text}
        element zip ! {text {number}}
        attribute country ? {fixed US}
    }
    defelement items {
        element item * {
            element productName ! {text}
            # positiveInteger below 100, i.e. 1 to 99
            element quantity ! {text {regexp {^[1-9][0-9]?$}}}
            element USPrice ! {text {number}}
            element comment ?
            element shipDate ? {text {isodate}}
            attribute partNum ! {regexp {^\d{3}-[A-Z]{2}$}}
        }
    }
}
The RELAX NG Tutorial
(http://relaxng.org/tutorial-20011203.html) starts with
this example:
Consider a simple XML representation of an email address book:
<addressBook>
<card>
<name>John Smith</name>
<email>js@example.com</email>
</card>
<card>
<name>Fred Bloggs</name>
<email>fb@example.net</email>
</card>
</addressBook>
The DTD would be as follows:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
A RELAX NG pattern for this could be written as follows:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
<zeroOrMore>
<element name="card">
<element name="name">
<text/>
</element>
<element name="email">
<text/>
</element>
</element>
</zeroOrMore>
</element>
This schema definition script will do the same:
tdom::schema schema
schema define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e {text}
    }
}
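To try this schema on the sample document from the tutorial, something like the following should work. The document is written on one line so that the result does not depend on how white space between elements is handled, and the command name abSchema is arbitrary:

```tcl
package require tdom

tdom::schema abSchema
abSchema define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e {text}
    }
}

set xml {<addressBook><card><name>John Smith</name><email>js@example.com</email></card><card><name>Fred Bloggs</name><email>fb@example.net</email></card></addressBook>}
set ok [abSchema validate $xml]
# a card without its email element
set bad {<addressBook><card><name>John Smith</name></card></addressBook>}
set badOk [abSchema validate $bad err]
puts "rejected: $err"
abSchema delete
```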