Commit ac297930 authored by Daniel Veillard

some cleanups extended the document to cover RelaxNG and tree operations

* relaxng.c: some cleanups
* doc/xmlreader.html: extended the document to cover RelaxNG and
  tree operations
* python/tests/Makefile.am python/tests/reader[46].py: added some
  xmlReader example/regression tests
* result/relaxng/tutor*.err: updated the output of a number of tests
Daniel
parent 62163604
Thu Apr 17 14:51:57 CEST 2003 Daniel Veillard <daniel@veillard.com>
* relaxng.c: some cleanups
* doc/xmlreader.html: extended the document to cover RelaxNG and
tree operations
* python/tests/Makefile.am python/tests/reader[46].py: added some
xmlReader example/regression tests
* result/relaxng/tutor*.err: updated the output of a number of tests
Thu Apr 17 11:35:37 CEST 2003 Daniel Veillard <daniel@veillard.com>
 
* relaxng.c: valgrind pointed out an uninitialized variable error.
......
......@@ -13,6 +13,8 @@ H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }-->
</style>
<title>Libxml2 XmlTextReader Interface tutorial</title>
</head>
......@@ -42,6 +44,9 @@ examples using both C and the Python bindings:</p>
attributes</a></li>
<li><a href="#Validating">Validating a document</a></li>
<li><a href="#Entities">Entities substitution</a></li>
<li><a href="#L1142">Relax-NG Validation</a></li>
<li><a href="#Mixing">Mixing the reader and tree or XPath
operations</a></li>
</ul>
<p></p>
......@@ -147,8 +152,7 @@ def streamFile(filename):
ret = reader.Read()
if ret != 0:
print "%s : failed to parse" % (filename)
</pre>
print "%s : failed to parse" % (filename)</pre>
<p>The only things worth adding are that the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">xmlTextReader
......@@ -390,9 +394,79 @@ the validation feature is just:</p>
<h2><a name="Entities">Entities substitution</a></h2>
<p>@@TODO@@</p>
<p>By default the xmlReader will report entities as such and not replace them
with their content. This default behaviour can however be overridden using:</p>
<p><code>reader.SetParserProp(libxml2.PARSER_SUBST_ENTITIES,1)</code></p>
<h2><a name="L1142">Relax-NG Validation</a></h2>
<p style="font-size: 10pt">Introduced in version 2.5.7</p>
<p>Libxml2 can now validate the document being read by the xmlReader against
Relax-NG schemas. While the Relax NG validator can't always work in a
streamable mode, only the subsets which cannot be reduced to regular
expressions need to have their subtree expanded for validation. In practice
this means that, unless the schema for the top-level element content is not
expressible as a regexp, only chunks of the document need to be parsed while
validating.</p>
<p>The steps to do so are:</p>
<ul>
<li>create a reader working on a document as usual</li>
<li>before any call to Read(), associate it with a Relax NG schema, either
the preparsed schema or the URL of the schema to use</li>
<li>errors will be reported the usual way, and the validity status can be
obtained using the IsValid() interface of the reader, as for DTDs.</li>
</ul>
<p> </p>
<p>Example, assuming the reader has already been created and that the schema
string contains the Relax-NG schema:</p>
<p><code>rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))<br>
rngs = rngp.relaxNGParse()<br>
reader.RelaxNGSetSchema(rngs)<br>
ret = reader.Read()<br>
while ret == 1:<br>
&nbsp;&nbsp;&nbsp;&nbsp;ret = reader.Read()<br>
if ret != 0:<br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Error parsing the document"<br>
if reader.IsValid() != 1:<br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Document failed to validate"</code><br>
See <code>reader6.py</code> in the sources or documentation for a complete
example.</p>
<h2><a name="Mixing">Mixing the reader and tree or XPath operations</a></h2>
<p style="font-size: 10pt">Introduced in version 2.5.7</p>
<p>While the reader is a streaming interface, its underlying implementation
is based on the DOM builder of libxml2. As a result it is relatively simple
to mix operations based on both models under some constraints. To do so the
reader has an Expand() operation which grows the subtree under the current
node. It returns a pointer to a standard node which can be manipulated in
the usual ways. The node will get all its ancestors and the full subtree
available. Usual operations like XPath queries can be used on that reduced
view of the document. Here is an example extracted from reader5.py in the
sources which extracts and prints the bibliography entry for the "Dragon"
compiler book from the XML 1.0 recommendation:</p>
<pre>f = open('../../test/valid/REC-xml-19980210.xml')
input = libxml2.inputBuffer(f)
reader = input.newTextReader("REC")
res=""
while reader.Read():
    while reader.Name() == 'bibl':
        node = reader.Expand()            # expand the subtree
        if node.xpathEval("@id = 'Aho'"): # use XPath on it
            res = res + node.serialize()
        if reader.Next() != 1:            # skip the subtree
            break;</pre>
<p>Note however that the node instance returned by the Expand() call is only
valid until the next Read() operation. The Expand() operation does not
affect the Read() ones; however, once processed, the full subtree is usually
not useful anymore, and the Next() operation allows skipping it completely
and proceeding to the successor, or returns 0 if the document end is
reached.</p>
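<p>For readers without the libxml2 bindings at hand, the same
stream-then-expand pattern can be sketched with <code>xml.dom.pulldom</code>
from the Python standard library. This is a different library, not the
xmlReader API: its <code>expandNode()</code> plays the role of Expand(), and
it also consumes the subtree from the event stream, so no Next() call is
needed. The sample document and the <code>extract_bibl</code> helper are made
up for illustration.</p>

```python
from xml.dom import pulldom

# Hypothetical sample document, standing in for the XML 1.0 recommendation.
SAMPLE = (
    '<spec>'
    '<bibl id="Aho">Aho, Sethi, Ullman. Compilers.</bibl>'
    '<bibl id="Other">Something else.</bibl>'
    '</spec>'
)


def extract_bibl(xml_text, wanted_id):
    """Stream over xml_text and serialize the bibl entries matching wanted_id."""
    res = ""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "bibl":
            # Build the DOM subtree under the current element only; the
            # rest of the document stays unparsed until we iterate further.
            events.expandNode(node)
            if node.getAttribute("id") == wanted_id:
                res += node.toxml()
    return res


result = extract_bibl(SAMPLE, "Aho")
print(result)
```

<p>Only the <code>bibl</code> subtrees are ever materialized as DOM nodes;
everything else is seen as a stream of events, which is the property the
Expand() operation is meant to preserve.</p>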
<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>
......
......@@ -23,6 +23,9 @@ PYTESTS= \
reader.py \
reader2.py \
reader3.py \
reader4.py \
reader5.py \
reader6.py \
ctxterror.py\
readererr.py\
relaxng.py
......
#!/usr/bin/python -u
#
# this tests the basic APIs of the XmlTextReader interface
#
import libxml2
import StringIO
import sys
# Memory debug specific
libxml2.debugMemory(1)
def tst_reader(s):
    f = StringIO.StringIO(s)
    input = libxml2.inputBuffer(f)
    reader = input.newTextReader("tst")
    res = ""
    while reader.Read():
        res=res + "%s (%s) [%s] %d\n" % (reader.NodeType(),reader.Name(),
                                      reader.Value(), reader.IsEmptyElement())
        if reader.NodeType() == 1: # Element
            while reader.MoveToNextAttribute():
                res = res + "-- %s (%s) [%s]\n" % (reader.NodeType(),
                                       reader.Name(),reader.Value())
    return res
expect="""1 (test) [None] 0
1 (b) [None] 1
1 (c) [None] 1
15 (test) [None] 0
"""
res = tst_reader("""<test><b/><c/></test>""")
if res != expect:
    print "Did not get the expected error message:"
    print res
    sys.exit(1)
# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print "OK"
else:
    print "Memory leak %d bytes" % (libxml2.debugMemory(1))
    libxml2.dumpMemory()
#!/usr/bin/python -u
#
# this tests the Relax-NG validation with the XmlTextReader interface
#
import sys
import StringIO
import libxml2
schema="""<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<oneOrMore>
<element name="label">
<text/>
</element>
<optional>
<element name="opt">
<empty/>
</element>
</optional>
<element name="item">
<data type="byte"/>
</element>
</oneOrMore>
</element>
"""
# Memory debug specific
libxml2.debugMemory(1)
#
# Parse the Relax NG Schemas
#
rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
rngs = rngp.relaxNGParse()
del rngp
#
# Parse and validate the correct document
#
docstr="""<foo>
<label>some text</label>
<item>100</item>
</foo>"""
f = StringIO.StringIO(docstr)
input = libxml2.inputBuffer(f)
reader = input.newTextReader("correct")
reader.RelaxNGSetSchema(rngs)
ret = reader.Read()
while ret == 1:
    ret = reader.Read()
if ret != 0:
    print "Error parsing the document"
    sys.exit(1)
if reader.IsValid() != 1:
    print "Document failed to validate"
    sys.exit(1)
#
# Parse and validate the incorrect document
#
docstr="""<foo>
<label>some text</label>
<item>1000</item>
</foo>"""
err=""
expect="""RNG validity error: file error line 3 element text
Type byte doesn't allow value '1000'
RNG validity error: file error line 3 element text
Error validating datatype byte
RNG validity error: file error line 3 element text
Element item failed to validate content
"""
def callback(ctx, str):
    global err
    err = err + "%s" % (str)
libxml2.registerErrorHandler(callback, "")
f = StringIO.StringIO(docstr)
input = libxml2.inputBuffer(f)
reader = input.newTextReader("error")
reader.RelaxNGSetSchema(rngs)
ret = reader.Read()
while ret == 1:
    ret = reader.Read()
if ret != 0:
    print "Error parsing the document"
    sys.exit(1)
if reader.IsValid() != 0:
    print "Document failed to detect the validation error"
    sys.exit(1)
if err != expect:
    print "Did not get the expected error message:"
    print err
    sys.exit(1)
#
# cleanup
#
del f
del input
del reader
del rngs
libxml2.relaxNGCleanupTypes()
# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print "OK"
else:
    print "Memory leak %d bytes" % (libxml2.debugMemory(1))
    libxml2.dumpMemory()
......@@ -8,11 +8,9 @@
/**
* TODO:
* - error reporting
* - handle namespace declarations as attributes.
* - add support for DTD compatibility spec
* http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html
* - report better mem allocations at runtime and abort immediately.
* - report better mem allocations pbms at runtime and abort immediately.
*/
#define IN_LIBXML
......@@ -836,7 +834,6 @@ xmlRelaxNGFreeDefine(xmlRelaxNGDefinePtr define)
* @size: the default size for the container
*
* Allocate a new RelaxNG validation state container
* TODO: keep a pool in the ctxt
*
* Returns the newly allocated structure or NULL in case or error
*/
......@@ -1989,7 +1986,7 @@ xmlRelaxNGGetErrorString(xmlRelaxNGValidErr err, const xmlChar *arg1,
case XML_RELAXNG_ERR_EXTRADATA:
return(xmlCharStrdup("Extra data in the document"));
default:
TODO
return(xmlCharStrdup("Unknown error !"));
}
if (msg[0] == 0) {
snprintf(msg, 1000, "Unknown error code %d", err);
......@@ -2279,12 +2276,6 @@ xmlRelaxNGSchemaTypeCheck(void *data ATTRIBUTE_UNUSED,
xmlSchemaTypePtr typ;
int ret;
/*
* TODO: the type should be cached ab provided back, interface subject
* to changes.
* TODO: handle facets, may require an additional interface and keep
* the value returned from the validation.
*/
if ((type == NULL) || (value == NULL))
return(-1);
typ = xmlSchemaGetPredefinedType(type,
......@@ -2956,9 +2947,9 @@ xmlRelaxNGCompile(xmlRelaxNGParserCtxtPtr ctxt, xmlRelaxNGDefinePtr def) {
case XML_RELAXNG_LIST:
case XML_RELAXNG_PARAM:
case XML_RELAXNG_VALUE:
TODO /* This should not happen and generate an internal error */
printf("trying to compile %s\n", xmlRelaxNGDefName(def));
/* This should not happen and generate an internal error */
fprintf(stderr, "RNG internal error trying to compile %s\n",
xmlRelaxNGDefName(def));
break;
}
return(ret);
......@@ -3302,7 +3293,6 @@ xmlRelaxNGParseValue(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
}
}
}
/* TODO check ahead of time that the value is okay per the type */
return(def);
}
......@@ -4878,10 +4868,9 @@ xmlRelaxNGParseAttribute(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
ctxt->nbErrors++;
break;
case XML_RELAXNG_NOOP:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"Internal error, noop found\n");
"RNG Internal error, noop found in attribute\n");
ctxt->nbErrors++;
break;
}
......@@ -5199,16 +5188,27 @@ xmlRelaxNGParseElement(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
ret->attrs = cur;
break;
case XML_RELAXNG_START:
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, start found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_PARAM:
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, param found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_EXCEPT:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, except found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_NOOP:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"Internal error, noop found\n");
"RNG Internal error, noop found in element\n");
ctxt->nbErrors++;
break;
}
......@@ -5438,9 +5438,6 @@ xmlRelaxNGCheckReference(xmlRelaxNGDefinePtr ref,
name);
ctxt->nbErrors++;
}
/*
* TODO: make a closure and verify there is no loop !
*/
}
/**
......
RNG validity error: file ./test/relaxng/tutor10_7_3.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
RNG validity error: file ./test/relaxng/tutor10_8_3.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email
Expecting element name, got email
RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email
Element card failed to validate content
Did not expect element email there
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element card
Element addressBook has extra content: card
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email
Expecting element name, got email
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email
Element card failed to validate content
RNG validity error: file ./test/relaxng/tutor9_5_2.xml line 2 element card
Element addressBook has extra content: card
Invalid sequence in interleave
RNG validity error: file ./test/relaxng/tutor9_5_2.xml line 2 element card
Element card failed to validate attributes
RNG validity error: file ./test/relaxng/tutor9_5_3.xml line 2 element card
Element addressBook has extra content: card
Invalid attribute error for element card
RNG validity error: file ./test/relaxng/tutor9_6_2.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
RNG validity error: file ./test/relaxng/tutor9_6_3.xml line 2 element card
Element addressBook has extra content: card
Invalid attribute error for element card