MSc-IT Study Material
June 2010 Edition

Computer Science Department, University of Cape Town

XML Schema

An XML Schema is an alternative to the DTD for specifying an XML document's structure and data types. It can express everything a DTD can, and more. Similar alternative languages exist, such as RELAX and Schematron, but XML Schema is a W3C standard.

Schema Structure

Elements are defined using <element name="..." type="..." minOccurs="..." maxOccurs="...">, where:

  • name is the tag name of the element.

  • type can be custom-defined or one of the standard types. Common predefined types include string, integer and anyURI.

  • minOccurs and maxOccurs specify the minimum and maximum number of times the element may occur at that position in an XML document. The value unbounded for maxOccurs means there is no upper limit.

Example: <element name="title" type="string" minOccurs="1" maxOccurs="1"/>
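As a further illustration, an element that is optional and repeatable could be declared as sketched below. The element name keyword is hypothetical and not part of the course example. Note that minOccurs and maxOccurs are only permitted on element declarations nested inside a content model (such as a sequence), not on top-level declarations.

```xml
<!-- Hypothetical element: may be omitted entirely or repeated any number of times -->
<element name="keyword" type="string" minOccurs="0" maxOccurs="unbounded"/>
```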

Sequences

Sequences of elements are defined using a complexType container:

<complexType>
   <sequence>
      <element name="title" type="string"/>
      <element name="author" type="string" maxOccurs="unbounded"/>
   </sequence>
</complexType>
      

Note: Defaults for both minOccurs and maxOccurs are 1.

Nested Elements

Instead of specifying an atomic type for an element, its type can be elaborated as a structure. This corresponds to nested XML elements.

<element name="uct">
   <complexType>
      <sequence>
         <element name="title" type="string"/>
         <element name="author" type="string"
                  maxOccurs="unbounded"/>
      </sequence>
   </complexType>
</element>
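
A document fragment that would conform to this declaration might look as follows. This is only a sketch: the text values are made up, and no namespace is used because the fragment above does not declare one. The title must come first and appear exactly once, followed by one or more author elements.

```xml
<!-- Illustrative instance; the title and author values are invented -->
<uct>
   <title>Study Guide</title>
   <author>First Author</author>
   <author>Second Author</author>
</uct>
```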
      

Restrictions

Restrictions are used to place additional constraints on an element's content.

For instance, the content can be restricted to be a value from a given set:

<element name="version">
   <simpleType>
      <restriction base="string">
         <enumeration value="1.0"/>
         <enumeration value="2.0"/>
      </restriction>
   </simpleType>
</element> 

The content can be forced to conform to a regular expression:

<element name="version">
   <simpleType>
      <restriction base="string">
         <pattern value="[1-9]\.[0-9]+"/>
      </restriction>
   </simpleType>
</element>      
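
To make the effect of the pattern concrete, the fragment below lists a few candidate values. A schema pattern must match the whole value, so this one accepts a single digit from 1 to 9, a dot, and then one or more digits. The sample values are chosen for illustration only.

```xml
<!-- Fragment, not a complete document: each line is a separate candidate -->
<version>1.0</version>    <!-- accepted -->
<version>2.15</version>   <!-- accepted: several digits may follow the dot -->
<version>0.9</version>    <!-- rejected: the first digit must be 1 to 9 -->
<version>10.1</version>   <!-- rejected: only one digit may precede the dot -->
```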

Attributes

Attributes can be defined as part of complexType declarations.

<element name="author">
   <complexType>
      <simpleContent>
         <extension base="string">
            <attribute name="email" type="string" 
                       use="required"/>
            <attribute name="office" type="integer" 
                       use="required"/>
            <attribute name="type" type="string"/>
         </extension>
      </simpleContent>
   </complexType>
</element>
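
An element conforming to this declaration could look like the sketch below. The email and office attributes must be present, and office must hold an integer. The type attribute is left out here to show that it is optional, since attributes are optional unless use="required" is given.

```xml
<!-- email and office are required; the optional type attribute is omitted -->
<author email="pat@cs.uct.ac.za" office="410">Pat Pukram</author>
```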
      

Named Types

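Types can be given a name at the top level of the XSD and then referred to by that name in element declarations. In the example below, the uct: prefix refers to the schema's own namespace (see Schema Namespaces).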

<element name="author" type="uct:authorType"/>

<complexType name="authorType">
   <simpleContent>
      <extension base="string">
         <attribute name="email" type="string" 
                    use="required"/>
         <attribute name="office" type="integer" 
                    use="required"/>
         <attribute name="type" type="string"/>
      </extension>
   </simpleContent>
</complexType>
      

Other Content Models

Instead of sequence, other content models may be used:

  • choice means that exactly one of the children must appear.

  • all means that the children may appear in any order, each at most once. A child is optional only if it is declared with minOccurs="0".

Consult the specification for more detail on these and other content models.
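
The two content models can be sketched as follows. The element names email, phone, first and last are hypothetical and were chosen only for illustration. In the first type, a conforming element contains either an email or a phone child, but not both. In the second, first and last may appear in either order, and last may be omitted because it is declared with minOccurs="0".

```xml
<!-- choice: exactly one of the listed children appears -->
<complexType>
   <choice>
      <element name="email" type="string"/>
      <element name="phone" type="string"/>
   </choice>
</complexType>

<!-- all: children in any order, each at most once -->
<complexType>
   <all>
      <element name="first" type="string"/>
      <element name="last" type="string" minOccurs="0"/>
   </all>
</complexType>
```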

Schema Namespaces

Every schema should define a namespace for its elements and for internal references to types. For example:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.uct.ac.za"
        xmlns:uct="http://www.uct.ac.za"> 

<element name="author" type="uct:authorType"/>

<complexType name="authorType">
   <simpleContent>
      <extension base="string">
         <attribute name="email" type="string" 
                    use="required"/>
         <attribute name="office" type="integer" 
                    use="required"/>
         <attribute name="type" type="string"/>
      </extension>
   </simpleContent>
</complexType>

</schema>
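
A common alternative convention, shown here only as a sketch, binds the XML Schema namespace to a prefix (xs is a frequent choice, but any prefix works) instead of making it the default namespace. Every schema keyword and built-in type is then written with that prefix. The type below is cut down to a single attribute to keep the example short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.uct.ac.za"
           xmlns:uct="http://www.uct.ac.za">

<!-- Schema keywords and built-in types carry the xs: prefix;
     types defined in this schema carry the uct: prefix -->
<xs:element name="author" type="uct:authorType"/>

<xs:complexType name="authorType">
   <xs:simpleContent>
      <xs:extension base="xs:string">
         <xs:attribute name="email" type="xs:string" use="required"/>
      </xs:extension>
   </xs:simpleContent>
</xs:complexType>

</xs:schema>
```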

Schema Example

Here is an example of a full Schema:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.uct.ac.za"
        xmlns:uct="http://www.uct.ac.za"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified"
> 

<complexType name="authorType">
   <simpleContent>
      <extension base="string">
         <attribute name="email" type="string" use="required"/>
         <attribute name="office" type="integer" use="required"/>
         <attribute name="type" type="string"/>
      </extension>
   </simpleContent>
</complexType>

<complexType name="versionType">
   <sequence>
      <element name="number">
         <simpleType>
            <restriction base="string">
               <pattern value="[1-9]\.[0-9]+"/>
            </restriction>
         </simpleType>
      </element>
   </sequence>
</complexType>

<complexType name="uctType">
   <sequence>
      <element name="title" type="string"/>
      <element name="author" type="uct:authorType"/>
      <element name="version" type="uct:versionType"/>
   </sequence>
</complexType>

<element name="uct" type="uct:uctType"/>

</schema>
      

Here is an example of an XML document that is valid against the above Schema:

<uct xmlns="http://www.uct.ac.za"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.uct.ac.za 
       http://www.husseinsspace/teaching/uct/2003/csc400dl/uct.xsd"
>

   <title>test XML document</title>
   <author email="pat@cs.uct.ac.za" 
           office="410" 
           type="lecturer">Pat Pukram</author>
   <version>
      <number>1.0</number>
   </version>

</uct>
     

Activity 2: Schema

Write a Schema for the following XML document.

<article xmlns="http://article.com">
  <name>Fermat's Last Theorem</name>
  <date>20010112</date>
  <length unit="pages">11</length>
  <author>
    <first>Jonathan</first>
    <last>Smith</last>
  </author>
  <author>
    <first>Mary</first>
    <last>Carter</last>
  </author>
</article>