<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>The XML C library for Gnome</title>
  <meta name="GENERATOR" content="amaya 5.1">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">The XML C library for Gnome</h1>

<h1>Note: this is the flat content of the <a href="index.html">web
site</a></h1>

<h1 style="text-align: center">libxml, a.k.a. gnome-xml</h1>

<p></p>

<p>Libxml is the XML C library developed for the Gnome project.  XML itself
is a metalanguage for designing markup languages, i.e. text languages where
semantics and structure are added to the content using extra "markup"
information enclosed between angle brackets. HTML is the best-known
markup language. Though the library is written in C, <a href="python.html">a
variety of language bindings</a> make it available in other environments.</p>

<p>Libxml2 implements a number of existing standards related to markup
languages:</p>
<ul>
  <li>the XML standard: <a
    href="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</a></li>
  <li>Namespaces in XML: <a
    href="http://www.w3.org/TR/REC-xml-names/">http://www.w3.org/TR/REC-xml-names/</a></li>
  <li>XML Base: <a
    href="http://www.w3.org/TR/xmlbase/">http://www.w3.org/TR/xmlbase/</a></li>
  <li><a href="http://www.cis.ohio-state.edu/rfc/rfc2396.txt">RFC 2396</a> :
    Uniform Resource Identifiers <a
    href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</a></li>
  <li>XML Path Language (XPath) 1.0: <a
    href="http://www.w3.org/TR/xpath">http://www.w3.org/TR/xpath</a></li>
  <li>HTML4 parser: <a
    href="http://www.w3.org/TR/html401/">http://www.w3.org/TR/html401/</a></li>
  <li>most of XML Pointer Language (XPointer) Version 1.0: <a
    href="http://www.w3.org/TR/xptr">http://www.w3.org/TR/xptr</a></li>
  <li>XML Inclusions (XInclude) Version 1.0: <a
    href="http://www.w3.org/TR/xinclude/">http://www.w3.org/TR/xinclude/</a></li>
  <li>[ISO-8859-1], <a
    href="http://www.cis.ohio-state.edu/rfc/rfc2044.txt">rfc2044</a> [UTF-8]
    and <a href="http://www.cis.ohio-state.edu/rfc/rfc2781.txt">rfc2781</a>
    [UTF-16] core encodings</li>
  <li>part of SGML Open Technical Resolution TR9401:1997</li>
  <li>XML Catalogs Working Draft 06 August 2001: <a
    href="http://www.oasis-open.org/committees/entity/spec-2001-08-06.html">http://www.oasis-open.org/committees/entity/spec-2001-08-06.html</a></li>
  <li>Canonical XML Version 1.0: <a
    href="http://www.w3.org/TR/xml-c14n">http://www.w3.org/TR/xml-c14n</a>
    and the Exclusive XML Canonicalization CR draft <a
    href="http://www.w3.org/TR/xml-exc-c14n">http://www.w3.org/TR/xml-exc-c14n</a></li>
</ul>

<p>In most cases libxml tries to implement the specifications in a relatively
strict way. As of release 2.4.16, libxml2 passes all 1800+ tests from the <a
href="http://www.oasis-open.org/committees/xml-conformance/">OASIS XML Tests
Suite</a>.</p>

<p>To some extent libxml2 provides support for the following other
specifications but doesn't claim to implement them:</p>
<ul>
  <li>Document Object Model (DOM) <a
    href="http://www.w3.org/TR/DOM-Level-2-Core/">http://www.w3.org/TR/DOM-Level-2-Core/</a>:
    libxml2 doesn't implement the API itself; gdome2 does this on top of
  libxml2</li>
  <li><a href="http://www.cis.ohio-state.edu/rfc/rfc959.txt">RFC 959</a> :
    libxml implements basic FTP client code</li>
  <li><a href="http://www.cis.ohio-state.edu/rfc/rfc1945.txt">RFC 1945</a> :
    HTTP/1.0, again basic HTTP client code</li>
  <li>SAX: a minimal SAX implementation compatible with early expat
  versions</li>
  <li>DocBook SGML v4: libxml2 includes a hackish parser to transition to
  XML</li>
</ul>

<p>Libxml2 is known to be very portable: the library should build and work
without serious trouble on a variety of systems (Linux, Unix, Windows,
Cygwin, MacOS, MacOS X, RISC OS, OS/2, VMS, QNX, MVS, ...).</p>

<p>Separate documents:</p>
<ul>
  <li><a href="http://xmlsoft.org/XSLT/">the libxslt page</a> providing an
    implementation of XSLT 1.0 and common extensions like EXSLT for
  libxml2</li>
  <li><a href="http://www.cs.unibo.it/~casarini/gdome2/">the gdome2 page</a>
    : a standard DOM2 implementation for libxml2</li>
  <li><a href="http://www.aleksey.com/xmlsec/">the XMLSec page</a>: an
    implementation of <a href="http://www.w3.org/TR/xmldsig-core/">W3C XML
    Digital Signature</a> for libxml2</li>
</ul>

<h2><a name="Introducti">Introduction</a></h2>

<p>This document describes libxml, the <a
href="http://www.w3.org/XML/">XML</a> C library developed for the <a
href="http://www.gnome.org/">Gnome</a> project. <a
href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based
structured documents/data.</p>

<p>Here are some key points about libxml:</p>
<ul>
  <li>Libxml exports Push (progressive) and Pull (blocking) type parser
    interfaces for both XML and HTML.</li>
  <li>Libxml can do DTD validation at parse time, using a parsed document
    instance, or with an arbitrary DTD.</li>
  <li>Libxml includes complete <a
    href="http://www.w3.org/TR/xpath">XPath</a>, <a
    href="http://www.w3.org/TR/xptr">XPointer</a> and <a
    href="http://www.w3.org/TR/xinclude">XInclude</a> implementations.</li>
  <li>It is written in plain C, making as few assumptions as possible, and
    sticking closely to ANSI C/POSIX for easy embedding. It works on
    Linux/Unix/Windows and has been ported to a number of other platforms.</li>
  <li>Basic support for HTTP and FTP clients, allowing applications to fetch
    remote resources.</li>
  <li>The design is modular; most of the extensions can be compiled out.</li>
  <li>The internal document representation is as close as possible to the <a
    href="http://www.w3.org/DOM/">DOM</a> interfaces.</li>
  <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX
    like interface</a>; the interface is designed to be compatible with <a
    href="http://www.jclark.com/xml/expat.html">Expat</a>.</li>
  <li>This library is released under the <a
    href="http://www.opensource.org/licenses/mit-license.html">MIT
    Licence</a>; see the Copyright file in the distribution for the precise
    wording.</li>
</ul>
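
<p>As a rough illustration of the Push interface mentioned in the list above,
the sketch below feeds a document to the parser in small pieces, the way data
might arrive from a network connection. The helper name
<code>parse_in_chunks</code>, its <code>chunk_size</code> parameter and the
file name <code>chunks.xml</code> are invented for this example; the libxml
calls used are <code>xmlCreatePushParserCtxt()</code>,
<code>xmlParseChunk()</code> and <code>xmlFreeParserCtxt()</code>. Treat it
as a sketch under those assumptions, not as reference code.</p>
<pre>#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * Feed an in-memory XML string to the push parser chunk by chunk.
 * Returns the resulting tree, or NULL if the input was not well-formed.
 * chunk_size is an illustrative parameter, not part of libxml.
 */
static xmlDocPtr
parse_in_chunks(const char *data, int chunk_size)
{
    xmlParserCtxtPtr ctxt;
    xmlDocPtr doc;
    int len, pos = 0, well_formed;

    if (data == NULL || chunk_size &lt;= 0)
        return NULL;
    len = (int) strlen(data);

    /* No SAX handler and no initial data: a tree is built as chunks arrive. */
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, NULL, 0, "chunks.xml");
    if (ctxt == NULL)
        return NULL;

    while (pos &lt; len) {
        int n = (len - pos &lt; chunk_size) ? len - pos : chunk_size;

        xmlParseChunk(ctxt, data + pos, n, 0);
        pos += n;
    }
    /* An empty final chunk with terminate set to 1 ends the document. */
    xmlParseChunk(ctxt, NULL, 0, 1);

    doc = ctxt-&gt;myDoc;
    well_formed = ctxt-&gt;wellFormed;
    xmlFreeParserCtxt(ctxt);

    if (!well_formed) {
        xmlFreeDoc(doc);   /* safe even if doc is NULL */
        return NULL;
    }
    return doc;
}</pre>
<p>With the Pull interface the equivalent is a single blocking call such as
<code>xmlParseFile()</code>, which returns once the whole document has been
read.</p>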

<p>Warning: unless you are forced to because your application links with a
Gnome-1.X library requiring it,  <strong><span
style="background-color: #FF0000">Do Not Use libxml1</span></strong>, use
libxml2</p>

<h2><a name="FAQ">FAQ</a></h2>

<p>Table of Contents:</p>
<ul>
  <li><a href="FAQ.html#Licence">Licence(s)</a></li>
  <li><a href="FAQ.html#Installati">Installation</a></li>
  <li><a href="FAQ.html#Compilatio">Compilation</a></li>
  <li><a href="FAQ.html#Developer">Developer corner</a></li>
</ul>

<h3><a name="Licence">Licence</a>(s)</h3>
<ol>
  <li><em>Licensing Terms for libxml</em>
    <p>libxml is released under the <a
    href="http://www.opensource.org/licenses/mit-license.html">MIT
    Licence</a>; see the file Copyright in the distribution for the precise
    wording.</p>
  </li>
  <li><em>Can I embed libxml in a proprietary application ?</em>
    <p>Yes. The MIT Licence also allows you to keep the changes you make to
    libxml proprietary, but it would be gracious to contribute bug fixes and
    improvements back as patches for possible incorporation in the main
    development tree.</p>
  </li>
</ol>

<h3><a name="Installati">Installation</a></h3>
<ol>
  <li>Unless you are forced to because your application links with a Gnome
    library requiring it,  <strong><span style="background-color: #FF0000">Do
    Not Use libxml1</span></strong>, use libxml2</li>
  <li><em>Where can I get libxml</em> ?
    <p>The original distribution comes from <a
    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a
    href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p>
    <p>Most Linux and BSD distributions include libxml; this is probably the
    safest way for end users.</p>
    <p>David Doolin provides precompiled Windows versions at <a
    href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
  </li>
  <li><em>I see libxml and libxml2 releases, which one should I install ?</em>
    <ul>
      <li>If you have no backward compatibility constraints from existing
        applications, install libxml2 only</li>
      <li>If you are not doing development, you can safely install both.
        Usually the packages <a
        href="http://rpmfind.net/linux/RPM/libxml.html">libxml</a> and <a
        href="http://rpmfind.net/linux/RPM/libxml2.html">libxml2</a> are
        compatible (this is not the case for development packages)</li>
      <li>If you are a developer and your system provides separate packaging
        for shared libraries and the development components, it is possible
        to install libxml and libxml2, and also <a
        href="http://rpmfind.net/linux/RPM/libxml-devel.html">libxml-devel</a>
        and <a
        href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml2-devel</a>
        too for libxml2 &gt;= 2.3.0</li>
      <li>If you are developing a new application, please develop against
        libxml2(-devel)</li>
    </ul>
  </li>
  <li><em>I can't install the libxml package, it conflicts with libxml0</em>
    <p>You probably have an old libxml0 package used to provide the shared
    library for libxml.so.0; you can probably safely remove it. In any case,
    the libxml packages provided on <a
    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> provide
    libxml.so.0</p>
  </li>
  <li><em>I can't install the libxml(2) RPM package due to failed
    dependencies</em>
    <p>The most generic solution is to re-fetch the latest src.rpm and
    rebuild it locally with</p>
    <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
    <p>If everything goes well it will generate two binary RPMs (one providing
    the shared libs and xmllint, and the other one, the -devel package,
    providing includes, static libraries and scripts needed to build
    applications with libxml(2)) that you can install locally.</p>
  </li>
</ol>

<h3><a name="Compilatio">Compilation</a></h3>
<ol>
  <li><em>What is the process to compile libxml ?</em>
    <p>Like most UNIX libraries, libxml follows the "standard" process:</p>
    <p><code>gunzip -c xxx.tar.gz | tar xvf -</code></p>
    <p><code>cd libxml-xxxx</code></p>
    <p><code>./configure --help</code></p>
    <p>to see the options, then the compilation/installation proper</p>
    <p><code>./configure [possible options]</code></p>
    <p><code>make</code></p>
    <p><code>make install</code></p>
    <p>At that point you may have to rerun ldconfig or a similar utility to
    update your list of installed shared libs.</p>
  </li>
  <li><em>What other libraries are needed to compile/install libxml ?</em>
    <p>Libxml does not require any other library; the standard ANSI C API
    should be sufficient (please report any violation of this rule you may
    find).</p>
    <p>However, if they are found at configuration time, libxml will detect
    and use the following libs:</p>
    <ul>
      <li><a href="http://www.info-zip.org/pub/infozip/zlib/">libz</a> : a
        highly portable and widely available compression library</li>
      <li>iconv: a powerful character encoding conversion library. It's
        included by default in recent glibc libraries, so it doesn't need to
        be installed specifically on Linux. It now seems to be <a
        href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
        of the official UNIX</a> specification. Here is one <a
        href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
        of the library</a> whose source can be found <a
        href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li>
    </ul>
  </li>
  <li><em>make check fails on some platforms</em>
    <p>Sometimes the regression test results don't completely match the
    values produced by the parser, and the makefile uses diff to print the
    delta. On some platforms the diff return value breaks the compilation
    process; if the diff is small this is probably not a serious problem.</p>
    <p>Sometimes (especially on Solaris) make check fails due to limitations
    in make. Try using GNU make instead.</p>
  </li>
  <li><em>I use the CVS version and there is no configure script</em>
    <p>The configure script (and the Makefiles) are generated. Use the
    autogen.sh script to regenerate them, like:</p>
    <p><code>./autogen.sh --prefix=/usr --disable-shared</code></p>
  </li>
  <li><em>I have trouble when running make tests with gcc-3.0</em>
    <p>It seems the initial release of gcc-3.0 has a problem with the
    optimizer which miscompiles the URI module. Please use another
    compiler.</p>
  </li>
</ol>

<h3><a name="Developer">Developer</a> corner</h3>
<ol>
  <li><em>xmlDocDump() generates output on one line</em>
    <p>libxml will not <strong>invent</strong> spaces in the content of a
    document since <strong>all spaces in the content of a document are
    significant</strong>. If you build a tree from the API and want
    indentation:</p>
    <ol>
      <li>the correct way is to generate those yourself too</li>
      <li>the dangerous way is to ask libxml to add those blanks to your
        content <strong>modifying the content of your document in the
        process</strong>. The result may not be what you expect. There is
        <strong>NO</strong> way to guarantee that such a modification won't
        impact other parts of the content of your document. See <a
        href="http://xmlsoft.org/html/libxml-parser.html#XMLKEEPBLANKSDEFAULT">xmlKeepBlanksDefault
        ()</a> and <a
        href="http://xmlsoft.org/html/libxml-tree.html#XMLSAVEFORMATFILE">xmlSaveFormatFile
        ()</a></li>
    </ol>
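    <p>A minimal sketch of the second option for a tree built from the API,
    assuming the made-up helper name <code>save_indented</code> and
    illustrative element content: passing a non-zero <code>format</code>
    argument to <code>xmlSaveFormatFile()</code> asks the serializer to add
    line breaks and indentation. For a parsed document,
    <code>xmlKeepBlanksDefault(0)</code> would normally have to be called
    before parsing as well, so that existing blank text nodes do not get in
    the way.</p>
    <pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * Build a tiny tree through the API and save it with indentation.
 * Returns the number of bytes written, or -1 on error.
 * The element names and the path argument are illustrative.
 */
static int
save_indented(const char *path)
{
    xmlDocPtr doc;
    xmlNodePtr root;
    int written;

    doc = xmlNewDoc(BAD_CAST "1.0");
    if (doc == NULL)
        return -1;
    root = xmlNewNode(NULL, BAD_CAST "PLAN");
    if (root == NULL) {
        xmlFreeDoc(doc);
        return -1;
    }
    xmlDocSetRootElement(doc, root);
    xmlNewChild(root, NULL, BAD_CAST "NODE", BAD_CAST "first");
    xmlNewChild(root, NULL, BAD_CAST "NODE", BAD_CAST "second");

    /* format = 1: the saved text gains blanks that are not in the tree. */
    written = xmlSaveFormatFile(path, doc, 1);

    xmlFreeDoc(doc);
    return written;
}</pre>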
  </li>
  <li>Extra nodes in the document:
    <p><em>For an XML file as below:</em></p>
    <pre>&lt;?xml version="1.0"?&gt;
&lt;PLAN xmlns="http://www.argus.ca/autotest/1.0/"&gt;
&lt;NODE CommFlag="0"/&gt;
&lt;NODE CommFlag="1"/&gt;
&lt;/PLAN&gt;</pre>
    <p><em>after parsing it with the function
    pxmlDoc=xmlParseFile(...);</em></p>
    <p><em>I want to get the content of the first node (node with the
    CommFlag="0")</em></p>
    <p><em>so I did it as follows:</em></p>
    <pre>xmlNodePtr pnode;
pnode=pxmlDoc-&gt;children-&gt;children;</pre>
    <p><em>but it does not work. If I change it to</em></p>
    <pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre>
    <p><em>then it works.  Can someone explain it to me.</em></p>
    <p></p>
    <p>In XML all characters in the content of the document are significant
    <strong>including blanks and formatting line breaks</strong>.</p>
    <p>The extra nodes you are wondering about are just that: text nodes with
    the formatting spaces, which are part of the document but which people
    tend to forget. There is a function <a
    href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
    ()</a>  to remove those at parse time, but that's a heuristic, and its
    use should be limited to cases where you are sure there is no
    mixed-content in the document.</p>
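    <p>A way to avoid depending on the formatting blanks at all is to walk
    the children and skip everything that is not an element. A small sketch
    (the helper name <code>first_element_child</code> is made up for this
    example and is not a libxml function):</p>
    <pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * Return the first child of parent that is an element, skipping
 * text nodes (including whitespace-only ones), comments and
 * processing instructions. Returns NULL if there is none.
 */
static xmlNodePtr
first_element_child(xmlNodePtr parent)
{
    xmlNodePtr cur;

    if (parent == NULL)
        return NULL;
    for (cur = parent-&gt;children; cur != NULL; cur = cur-&gt;next) {
        if (cur-&gt;type == XML_ELEMENT_NODE)
            return cur;
    }
    return NULL;
}</pre>
    <p>For the document in the question,
    <code>first_element_child(xmlDocGetRootElement(pxmlDoc))</code> would
    return the NODE element with CommFlag="0", whatever whitespace surrounds
    it.</p>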
  </li>
  <li><em>I get compilation errors in existing code, for example when
    accessing the <strong>root</strong> or <strong>childs</strong> fields of
    nodes</em>
    <p>You are compiling code developed for libxml version 1 and using a
    libxml2 development environment. Either switch back to libxml v1 devel or
    even better fix the code to compile with libxml2 (or both) by <a
    href="upgrade.html">following the instructions</a>.</p>
  </li>
  <li><em>I get compilation errors about non-existent
    <strong>xmlRootNode</strong> or <strong>xmlChildrenNode</strong>
    fields</em>
    <p>The source code you are using has been <a
    href="upgrade.html">upgraded</a> to be able to compile with both libxml
    and libxml2, but you need to install a more recent version:
    libxml(-devel) &gt;= 1.8.8 or libxml2(-devel) &gt;= 2.1.0</p>
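    <p>Code written in the upgraded style goes through the
    <strong>xmlRootNode</strong> and <strong>xmlChildrenNode</strong> macros
    instead of naming the fields directly. The sketch below assumes a libxml
    version that defines both macros (see the versions above); the helper
    name <code>count_children</code> is invented for the example.</p>
    <pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * Count the direct children of the first top-level node of a document,
 * going through the xmlRootNode and xmlChildrenNode macros rather than
 * the version-specific field names.
 */
static int
count_children(xmlDocPtr doc)
{
    xmlNodePtr cur;
    int n = 0;

    if (doc == NULL || doc-&gt;xmlRootNode == NULL)
        return 0;
    for (cur = doc-&gt;xmlRootNode-&gt;xmlChildrenNode; cur != NULL; cur = cur-&gt;next)
        n++;
    return n;
}</pre>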
  </li>
  <li><em>XPath implementation looks seriously broken</em>
    <p>The XPath implementation prior to 2.3.0 was really incomplete; upgrade
    to a recent version. There is no known bug in the current version.</p>
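    <p>To check how the XPath module of an installed version behaves on a
    given expression, a small test along these lines may help. It is only a
    sketch: the helper name <code>count_xpath_matches</code> is illustrative,
    and it reports node-set results only.</p>
    <pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xpath.h&gt;

/*
 * Evaluate an XPath expression against a document and return the
 * number of nodes selected, or -1 on error or if the result is not
 * a node-set.
 */
static int
count_xpath_matches(xmlDocPtr doc, const char *expr)
{
    xmlXPathContextPtr ctx;
    xmlXPathObjectPtr res;
    int count = -1;

    ctx = xmlXPathNewContext(doc);
    if (ctx == NULL)
        return -1;
    res = xmlXPathEvalExpression(BAD_CAST expr, ctx);
    if (res != NULL) {
        if (res-&gt;type == XPATH_NODESET)
            count = (res-&gt;nodesetval != NULL) ? res-&gt;nodesetval-&gt;nodeNr : 0;
        xmlXPathFreeObject(res);
    }
    xmlXPathFreeContext(ctx);
    return count;
}</pre>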
  </li>
  <li><em>The example provided in the web page does not compile</em>
    <p>It's hard to keep the documentation in sync with the code
    &lt;grin/&gt; ...</p>
    <p>Check points 1/ and 2/ raised before, and send
    patches.</p>
  </li>
  <li><em>Where can I get more examples and information than on the web
    page</em>
    <p>Ideally a libxml book would be nice. I have no such plan ... But you
    can:</p>
    <ul>
      <li>check the <a href="html/libxml-lib.html">existing
        generated doc</a> more thoroughly</li>
      <li>look for examples of libxml function usage in the Gnome code; for
        example, the following will query the full Gnome CVS base for uses
        of the <strong>xmlAddChild()</strong> function:
        <p><a
        href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
        <p>This may be slow, a large hardware donation to the gnome project
        could cure this :-)</p>
      </li>
      <li><a
        href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gnome-xml">Browse
        the libxml source</a> , I try to write code as clean and documented
        as possible, so looking at it may be helpful. In particular, the code
        of xmllint.c and of the various testXXX.c test programs should
        provide good examples of how to do things with the library.</li>
    </ul>
  </li>
  <li>What about C++ ?
    <p>libxml is written in pure C in order to allow easy reuse on a number
    of platforms, including embedded systems. I don't intend to convert to
    C++.</p>
    <p>There is however a C++ wrapper provided by Ari Johnson
    &lt;ari@btigate.com&gt; which may fulfill your needs:</p>
    <p>Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
    <p>Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
  </li>
  <li>How to validate a document a posteriori ?
    <p>It is possible to validate documents which had not been validated at
    initial parsing time or documents which have been built from scratch using
    the API. Use the <a
    href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
    function. It is also possible to simply add a Dtd to an existing
    document:</p>
    <pre>xmlDocPtr doc; /* your existing document */
xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */
dtd-&gt;name = xmlStrdup((xmlChar*)"root_name"); /* use the given root */

doc-&gt;intSubset = dtd;
if (doc-&gt;children == NULL)
    xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
else
    xmlAddPrevSibling(doc-&gt;children, (xmlNodePtr)dtd);</pre>
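    <p>For the validation itself, a call to <code>xmlValidateDtd()</code>
    could look like the following sketch. The helper name
    <code>validate_with_dtd</code> and its <code>dtd_path</code> argument are
    assumptions made for this example; using <code>fprintf</code> as the
    error and warning callback sends messages to <code>stderr</code>.</p>
    <pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;
#include &lt;libxml/valid.h&gt;

/*
 * Validate an already-built document against an external DTD file.
 * Returns 1 if the document is valid, 0 if it is not, and -1 if the
 * DTD could not be parsed.
 */
static int
validate_with_dtd(xmlDocPtr doc, const char *dtd_path)
{
    xmlValidCtxt vctxt;
    xmlDtdPtr dtd;
    int ret;

    dtd = xmlParseDTD(NULL, BAD_CAST dtd_path);
    if (dtd == NULL)
        return -1;

    /* Validation context reporting errors and warnings on stderr. */
    memset(&amp;vctxt, 0, sizeof(vctxt));
    vctxt.userData = stderr;
    vctxt.error = (xmlValidityErrorFunc) fprintf;
    vctxt.warning = (xmlValidityWarningFunc) fprintf;

    ret = xmlValidateDtd(&amp;vctxt, doc, dtd);

    xmlFreeDtd(dtd);
    return ret;
}</pre>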
  </li>
  <li>etc ...</li>
</ol>

<p></p>

<h2><a name="Documentat">Documentation</a></h2>

<p>There are some on-line resources about using libxml:</p>
<ol>
  <li>Check the <a href="FAQ.html">FAQ</a></li>
  <li>Check the <a href="http://xmlsoft.org/html/libxml-lib.html">extensive
    documentation</a> automatically extracted from code comments (using <a
    href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gtk-doc">gtk
    doc</a>).</li>
  <li>Look at the documentation about <a href="encoding.html">libxml
    internationalization support</a></li>
  <li>This page provides a global overview and <a href="example.html">some
    examples</a> on how to use libxml.</li>
  <li><a href="mailto:james@daa.com.au">James Henstridge</a> wrote <a
    href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">some nice
    documentation</a> explaining how to use the libxml SAX interface.</li>
  <li>George Lebl wrote <a
    href="http://www-4.ibm.com/software/developer/library/gnome3/">an article
    for IBM developerWorks</a> about using libxml.</li>
  <li>Check <a href="http://cvs.gnome.org/lxr/source/gnome-xml/TODO">the TODO
    file</a></li>
  <li>Read the <a href="upgrade.html">1.x to 2.x upgrade path</a>. If you are
    starting a new project using libxml you should really use the 2.x
  version.</li>
  <li>And don't forget to look at the <a
    href="http://mail.gnome.org/archives/xml/">mailing-list archive</a>.</li>
</ol>

<h2><a name="Reporting">Reporting bugs and getting help</a></h2>

<p>Well, bugs or missing features are always possible, and I will make a
point of fixing them in a timely fashion. The best way to report a bug is to
use the <a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Gnome
bug tracking database</a> (make sure to use the "libxml" module name). I look
at reports there regularly and it's good to have a reminder when a bug is
still open. Be sure to specify that the bug is for the package libxml.</p>

<p>There is also a mailing-list <a
href="mailto:xml@gnome.org">xml@gnome.org</a> for libxml, with an  <a
href="http://mail.gnome.org/archives/xml/">on-line archive</a> (<a
href="http://xmlsoft.org/messages">old</a>). To subscribe to this list,
please visit the <a
href="http://mail.gnome.org/mailman/listinfo/xml">associated Web</a> page and
follow the instructions. <strong>Do not send code, I won't debug it</strong>
(but patches are really appreciated!).</p>

<p>Check the following <strong><span style="color: #FF0000">before
posting</span></strong>:</p>
<ul>
  <li>read the <a href="FAQ.html">FAQ</a></li>
  <li>make sure you are <a href="ftp://xmlsoft.org/">using a recent
    version</a>, and that the problem still shows up in it</li>
  <li>check the <a href="http://mail.gnome.org/archives/xml/">list
    archives</a> to see if the problem was reported already, in which case
    there is probably a fix available; similarly check the <a
    href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered
    open bugs</a></li>
  <li>make sure you can reproduce the bug with xmllint or one of the test
    programs found in source in the distribution</li>
  <li>Please send the command showing the error as well as the input (as an
    attachment)</li>
</ul>

<p>Then send the bug, with the associated information needed to reproduce it,
to the <a
href="mailto:xml@gnome.org">xml@gnome.org</a> list; if it's really libxml
related I will approve it. Please do not send me mail directly: it makes
things much harder to track and in some cases I'm not the best person to
answer a given question, so ask the list instead.</p>

<p>Of course, bugs reported with a suggested patch for fixing them will
probably be processed faster.</p>

<p>If you're looking for help, a quick look at <a
href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually
provide the answer; I usually send source samples when answering libxml usage
questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated
documentation</a> is not as polished as I would like (I need to learn more
about DocBook), but it's a good starting point.</p>

<h2><a name="help">How to help</a></h2>

<p>You can help the project in various ways. The best thing to do first is to
subscribe to the mailing-list as explained before, and check the <a
href="http://mail.gnome.org/archives/xml/">archives</a> and the <a
href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Gnome bug
database</a>:</p>
<ol>
  <li>provide patches when you find problems</li>
  <li>provide the diffs when you port libxml to a new platform. They may not
    be integrated in all cases but help pinpoint portability
  problems</li>
  <li>provide documentation fixes (either as patches to the code comments or
    as HTML diffs).</li>
  <li>provide new documentation pieces (translations, examples, etc ...)</li>
  <li>Check the TODO file and try to close one of the items</li>
  <li>take one of the points raised in the archive or the bug database and
    provide a fix. <a href="mailto:daniel@veillard.com">Get in touch with me
    </a>beforehand to avoid synchronization problems and check that the
    suggested fix will fit in nicely :-)</li>
</ol>

<h2><a name="Downloads">Downloads</a></h2>

<p>The latest versions of libxml can be found on <a
href="ftp://xmlsoft.org/">xmlsoft.org</a> (<a
href="ftp://speakeasy.rpmfind.net/pub/libxml/">Seattle</a>, <a
href="ftp://fr.rpmfind.net/pub/libxml/">France</a>) or on the <a
href="ftp://ftp.gnome.org/pub/GNOME/MIRRORS.html">Gnome FTP server</a> either
as a <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">source
archive</a> or <a
href="ftp://ftp.gnome.org/pub/GNOME/stable/redhat/i386/libxml/">RPM
packages</a>. (NOTE that you need both the <a
href="http://rpmfind.net/linux/RPM/libxml2.html">libxml(2)</a> and <a
href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml(2)-devel</a>
packages installed to compile applications using libxml.) <a
href="mailto:izlatkovic@daenet.de">Igor  Zlatkovic</a> is now the maintainer
of the Windows port, <a
href="http://www.fh-frankfurt.de/~igor/projects/libxml/index.html">he
provides binaries</a>. <a href="mailto:Gary.Pennington@sun.com">Gary
Pennington</a> provides <a href="http://garypennington.net/libxml2/">Solaris
binaries</a>.</p>
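
<p>Once both packages are installed, one quick way to check that the headers
and the library can be used together is to compile a tiny program against
them. The sketch below assumes libxml2 and that the
<code>xml2-config</code> script from the -devel package is in the path, so
that something like <code>cc check.c `xml2-config --cflags --libs`</code>
works; the file name <code>check.c</code> and the function name
<code>report_libxml_version</code> are made up for the example.</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/xmlversion.h&gt;

/*
 * Print the libxml version the program was compiled against and
 * return its numeric form (for example 20417 for 2.4.17).
 */
static int
report_libxml_version(void)
{
    /* Checks that the library found at run time is compatible with
     * the headers used at compile time. */
    LIBXML_TEST_VERSION

    printf("compiled against libxml %s\n", LIBXML_DOTTED_VERSION);
    return LIBXML_VERSION;
}</pre>
<p>Calling <code>report_libxml_version()</code> from <code>main()</code> is
enough to confirm that the program both compiles and links.</p>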

<p><a name="Snapshot">Snapshot:</a></p>
<ul>
  <li>Code from the W3C cvs base libxml <a
    href="ftp://xmlsoft.org/cvs-snapshot.tar.gz">cvs-snapshot.tar.gz</a></li>
  <li>Docs, content of the web site, the list archive included <a
    href="ftp://xmlsoft.org/libxml-docs.tar.gz">libxml-docs.tar.gz</a></li>
</ul>

<p><a name="Contribs">Contributions:</a></p>

<p>I do accept external contributions, especially if compiled on another
platform; get in touch with me to upload the package. Wrappers for various
languages have been provided and can be found in the <a
href="contribs.html">contrib section</a>.</p>

<p>Libxml is also available from CVS:</p>
<ul>
  <li><p>The <a
    href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gnome-xml">Gnome
    CVS base</a>. Check the <a
    href="http://developer.gnome.org/tools/cvs.html">Gnome CVS Tools</a>
    page; the CVS module is <b>gnome-xml</b>.</p>
  </li>
  <li>The <strong>libxslt</strong> module is also present there</li>
</ul>

<h2><a name="News">News</a></h2>

<h3>CVS only : check the <a
href="http://cvs.gnome.org/lxr/source/gnome-xml/ChangeLog">Changelog</a> file
for a really accurate description</h3>

<p>Items floating around but not actively worked on, get in touch with me if
you want to test those</p>
<ul>
  <li>Finishing up <a href="http://www.w3.org/TR/xptr">XPointer</a> and <a
    href="http://www.w3.org/TR/xinclude">XInclude</a></li>
</ul>

<h3>2.4.17: Mar 8 2002</h3>
<ul>
  <li>a lot of bug fixes, including "namespace nodes have no parents in
  XPath"</li>
  <li>fixed/improved the Python wrappers, added more examples and more
    regression tests, XPath extension functions can now return node-sets</li>
  <li>added the XML Canonicalization support from Aleksey Sanin</li>
</ul>

<h3>2.4.16: Feb 20 2002</h3>
<ul>
  <li>a lot of bug fixes, most of them were triggered by the XML Testsuite
    from OASIS and W3C. Compliance has been significantly improved.</li>
  <li>a couple of portability fixes too.</li>
</ul>

<h3>2.4.15: Feb 11 2002</h3>
<ul>
  <li>Fixed the Makefiles, especially the python module ones</li>
  <li>A few bug fixes and cleanup</li>
  <li>Includes cleanup</li>
</ul>

<h3>2.4.14: Feb 8 2002</h3>
<ul>
  <li>Change of Licence to the <a
    href="http://www.opensource.org/licenses/mit-license.html">MIT
    Licence</a> basically for integration in XFree86 codebase, and removing
    confusion around the previous dual-licencing</li>
  <li>added Python bindings, beta software but should already be quite
    complete</li>
  <li>a large number of fixes and cleanups, especially for all tree
    manipulations</li>
  <li>cleanup of the headers, generation of a reference API definition in
  XML</li>
</ul>

<h3>2.4.13: Jan 14 2002</h3>
<ul>
  <li>update of the documentation: John Fleck and Charlie Bozeman</li>
  <li>cleanup of timing code from Justin Fletcher</li>
  <li>fixes for Windows and initial thread support on Win32: Igor and Serguei
    Narojnyi</li>
  <li>Cygwin patch from Robert Collins</li>
  <li>added xmlSetEntityReferenceFunc() for Keith Isdale work on xsldbg</li>
</ul>

<h3>2.4.12: Dec 7 2001</h3>
<ul>
  <li>a few bug fixes: thread (Gary Pennington), xmllint (Geert Kloosterman),
    XML parser (Robin Berjon), XPointer (Danny Jamshy), I/O cleanups
  (robert)</li>
  <li>Eric Lavigne contributed project files for MacOS</li>
  <li>some makefiles cleanups</li>
</ul>

<h3>2.4.11: Nov 26 2001</h3>
<ul>
  <li>fixed a couple of errors in the includes, fixed a few bugs, some code
    cleanups</li>
  <li>xmllint man pages improvement by Heiko Rupp</li>
  <li>updated VMS build instructions from John A Fotheringham</li>
  <li>Windows Makefiles updates from Igor</li>
</ul>

<h3>2.4.10: Nov 10 2001</h3>
<ul>
  <li>URI escaping fix (Joel Young)</li>
  <li>added xmlGetNodePath() (for paths or XPointers generation)</li>
  <li>Fixes namespace handling problems when using DTD and validation</li>
  <li>improvements on xmllint: Morus Walter patches for --format and
    --encode, Stefan Kost and Heiko Rupp improvements on the --shell</li>
  <li>fixes for xmlcatalog linking pointed out by Weiqi Gao</li>
  <li>fixes to the HTML parser</li>
</ul>
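<p>As a rough illustration of the xmlGetNodePath() call added in 2.4.10, the
sketch below parses a small made-up document and prints the path computed for
one of its nodes. The sample document and the way the node is reached are
assumptions chosen for the example; xmlParseMemory() is used as the in-memory
entry point of this generation of the library (later releases also offer
xmlReadMemory()).</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

int main(void) {
    /* Hypothetical sample document, written without blanks so that every
     * child node is an element. */
    const char *buf =
        "&lt;EXAMPLE&gt;&lt;chapter&gt;&lt;p&gt;one&lt;/p&gt;&lt;p&gt;two&lt;/p&gt;&lt;/chapter&gt;&lt;/EXAMPLE&gt;";
    xmlDocPtr doc = xmlParseMemory(buf, (int) strlen(buf));
    xmlNodePtr node;
    xmlChar *path;

    if (doc == NULL)
        return 1;

    /* Walk to the second p: root, then chapter, then first p, then its
     * next sibling. */
    node = xmlDocGetRootElement(doc)-&gt;children-&gt;children-&gt;next;

    path = xmlGetNodePath(node);
    if (path != NULL) {
        printf("%s\n", (const char *) path);
        xmlFree(path);   /* the returned string belongs to the caller */
    }

    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}</pre>
<p>For the node selected here the result should be a path along the lines of
/EXAMPLE/chapter/p[2], which can be fed back to an XPath or XPointer
evaluation.</p>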

<h3>2.4.9: Nov 6 2001</h3>
<ul>
  <li>fixes more catalog bugs</li>
  <li>avoid a compilation problem, improve xmlGetLineNo()</li>
</ul>

<h3>2.4.8: Nov 4 2001</h3>
<ul>
  <li>fixed SGML catalogs broken in previous release, updated xmlcatalog
  tool</li>
  <li>fixed a compile error and some include problems.</li>
</ul>

<h3>2.4.7: Oct 30 2001</h3>
<ul>
  <li>exported some debugging interfaces</li>
  <li>serious rewrite of the catalog code</li>
  <li>integrated Gary Pennington thread safety patch, added configure option
    and regression tests</li>
  <li>removed an HTML parser bug</li>
  <li>fixed a couple of potentially serious validation bugs</li>
  <li>integrated the SGML DocBook support in xmllint</li>
  <li>changed the nanoftp anonymous login passwd</li>
  <li>some I/O cleanup and a couple of interfaces for Perl wrapper</li>
  <li>general bug fixes</li>
  <li>updated xmllint man page by John Fleck</li>
  <li>some VMS and Windows updates</li>
</ul>

<h3>2.4.6: Oct 10 2001</h3>
<ul>
  <li>added updated man pages by John Fleck</li>
  <li>portability and configure fixes</li>
  <li>an infinite loop on the HTML parser was removed (William)</li>
  <li>Windows makefile patches from Igor</li>
  <li>fixed half a dozen bugs reported for libxml or libxslt</li>
  <li>updated xmlcatalog to be able to modify SGML super catalogs</li>
</ul>

<h3>2.4.5: Sep 14 2001</h3>
<ul>
  <li>Remove a few annoying bugs in 2.4.4</li>
  <li>forces the HTML serializer to output decimal charrefs since some
    versions of Netscape can't handle hexadecimal ones</li>
</ul>

<h3>1.8.16: Sep 14 2001</h3>
<ul>
  <li>maintenance release of the old libxml1 branch, couple of bug and
    portability fixes</li>
</ul>

<h3>2.4.4: Sep 12 2001</h3>
<ul>
  <li>added --convert to xmlcatalog, bug fixes and cleanups of XML
  Catalog</li>
  <li>a few bug fixes and some portability changes</li>
  <li>some documentation cleanups</li>
</ul>

<h3>2.4.3:  Aug 23 2001</h3>
<ul>
  <li>XML Catalog support, see the doc</li>
  <li>New NaN/Infinity floating point code</li>
  <li>A few bug fixes</li>
</ul>

<h3>2.4.2:  Aug 15 2001</h3>
<ul>
  <li>adds xmlLineNumbersDefault() to control line number generation</li>
  <li>lots of bug fixes</li>
  <li>the Microsoft MSC project files should now be up to date</li>
  <li>inheritance of namespaces from DTD defaulted attributes</li>
  <li>fixes a serious potential security bug</li>
  <li>added a --format option to xmllint</li>
</ul>
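<p>The line number support mentioned for 2.4.1 and 2.4.2 can be tried with a
sketch along these lines. It assumes a made-up four line document and reads
the recorded positions back with xmlGetLineNo(), the accessor named in the
2.4.9 entry; it is an illustration, not the library's own test code.</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

int main(void) {
    /* Hypothetical sample: each child element sits on its own line. */
    const char *buf = "&lt;doc&gt;\n  &lt;a/&gt;\n  &lt;b/&gt;\n&lt;/doc&gt;";
    xmlDocPtr doc;
    xmlNodePtr cur;

    /* Ask the parser to record line numbers; set this before parsing. */
    xmlLineNumbersDefault(1);

    doc = xmlParseMemory(buf, (int) strlen(buf));
    if (doc == NULL)
        return 1;

    /* Report the recorded line of every element child of the root. */
    for (cur = xmlDocGetRootElement(doc)-&gt;children; cur != NULL;
         cur = cur-&gt;next) {
        if (cur-&gt;type == XML_ELEMENT_NODE)
            printf("%s: line %ld\n", (const char *) cur-&gt;name,
                   xmlGetLineNo(cur));
    }

    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}</pre>
<p>Since the setting is a parser default, it only affects documents parsed
after the call.</p>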

<h3>2.4.1:  July 24 2001</h3>
<ul>
  <li>possibility to keep line numbers in the tree</li>
  <li>some computation NaN fixes</li>
  <li>extension of the XPath API</li>
  <li>cleanup for alpha and ia64 targets</li>
  <li>patch to allow saving through HTTP PUT or POST</li>
</ul>

<h3>2.4.0: July 10 2001</h3>
<ul>
  <li>Fixed a few bugs in XPath, validation, and tree handling.</li>
  <li>Fixed XML Base implementation, added a couple of examples to the
    regression tests</li>
  <li>A bit of cleanup</li>
</ul>

<h3>2.3.14: July 5 2001</h3>
<ul>
  <li>fixed some entities problems and reduced memory requirements when
    substituting them</li>
  <li>lots of improvements in the XPath query interpreter, which can be
    substantially faster</li>
  <li>Makefiles and configure cleanups</li>
  <li>Fixes to XPath variable eval, and compare on empty node set</li>
  <li>HTML tag closing bug fixed</li>
  <li>Fixed a URI reference computation problem when validating</li>
</ul>

<h3>2.3.13: June 28 2001</h3>
<ul>
  <li>2.3.12 configure.in was broken as well as the push mode XML parser</li>
  <li>a few more fixes for compilation on Windows MSC by Yon Derek</li>
</ul>

<h3>1.8.14: June 28 2001</h3>
<ul>
  <li>Zbigniew Chyla gave a patch to use the old XML parser in push mode</li>
  <li>Small Makefile fix</li>
</ul>

<h3>2.3.12: June 26 2001</h3>
<ul>
  <li>lots of cleanup</li>
  <li>a couple of validation fixes</li>
  <li>fixed line number counting</li>
  <li>fixed serious problems in the XInclude processing</li>
  <li>added support for UTF8 BOM at beginning of entities</li>
  <li>fixed strange gcc optimizer bugs in XPath handling of floats, gcc-3.0
    miscompiles uri.c (William), Thomas Leitner provided a fix for the
    optimizer on Tru64</li>
  <li>incorporated Yon Derek and Igor Zlatkovic fixes and improvements for
    compilation on Windows MSC</li>
  <li>update of libxml-doc.el (Felix Natter)</li>
  <li>fixed 2 bugs in URI normalization code</li>
</ul>

<h3>2.3.11: June 17 2001</h3>
<ul>
  <li>updates to trio, Makefiles and configure should fix some portability
    problems (alpha)</li>
  <li>fixed some HTML serialization problems (pre, script, and block/inline
    handling), added encoding aware APIs, cleanup of this code</li>
  <li>added xmlHasNsProp()</li>
  <li>implemented a specific PI for encoding support in the DocBook SGML
    parser</li>
  <li>some XPath fixes (-Infinity, / as a function parameter and namespaces
    node selection)</li>
  <li>fixed a performance problem and an error in the validation code</li>
  <li>fixed XInclude routine to implement the recursive behaviour</li>
  <li>fixed xmlFreeNode problem when libxml is included statically twice</li>
  <li>added --version to xmllint for bug reports</li>
</ul>

<h3>2.3.10: June 1 2001</h3>
<ul>
  <li>fixed the SGML catalog support</li>
  <li>a number of reported bugs got fixed, in XPath, iconv detection,
    XInclude processing</li>
  <li>XPath string functions should now handle Unicode correctly</li>
</ul>

<h3>2.3.9: May 19 2001</h3>

<p>Lots of bugfixes, and added a basic SGML catalog support:</p>
<ul>
  <li>HTML push bugfix #54891 and another patch from Jonas Borgstr&ouml;m</li>
  <li>some serious speed optimisation again</li>
  <li>some documentation cleanups</li>
  <li>trying to get better linking on solaris (-R)</li>
  <li>XPath API cleanup from Thomas Broyer</li>
  <li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed
    xmlValidGetValidElements()</li>
  <li>Added an INSTALL file</li>
  <li>Attribute removal added to API: #54433</li>
  <li>added a basic support for SGML catalogs</li>
  <li>fixed xmlKeepBlanksDefault(0) API</li>
  <li>bugfix in xmlNodeGetLang()</li>
  <li>fixed a small configure portability problem</li>
  <li>fixed an inversion of SYSTEM and PUBLIC identifier in HTML document</li>
</ul>

<h3>1.8.13: May 14 2001</h3>
<ul>
  <li>bugfixes release of the old libxml1 branch used by Gnome</li>
</ul>

<h3>2.3.8: May 3 2001</h3>
<ul>
  <li>Integrated an SGML DocBook parser for the Gnome project</li>
  <li>Fixed a few things in the HTML parser</li>
  <li>Fixed some XPath bugs raised by XSLT use, tried to fix the floating
    point portability issue</li>
  <li>Speed improvement (8M/s for SAX, 3M/s for DOM, 1.5M/s for
    DOM+validation using the XML REC as input and a 700MHz celeron).</li>
  <li>incorporated more Windows cleanup</li>
  <li>added xmlSaveFormatFile()</li>
  <li>fixed problems in copying nodes with entities references (gdome)</li>
  <li>removed some troubles surrounding the new validation module</li>
</ul>

<h3>2.3.7: April 22 2001</h3>
<ul>
  <li>lots of small bug fixes, corrected XPointer</li>
  <li>Non-deterministic content model validation support</li>
  <li>added xmlDocCopyNode for gdome2</li>
  <li>revamped the way the HTML parser handles end of tags</li>
  <li>XPath: corrections of namespaces support and number formatting</li>
  <li>Windows: Igor Zlatkovic patches for MSC compilation</li>
  <li>HTML output fixes from P C Chow and William M. Brack</li>
  <li>Improved validation speed, noticeable for DocBook</li>
  <li>fixed a big bug with ID declared in external parsed entities</li>
  <li>portability fixes, update of Trio from Bjorn Reese</li>
</ul>

<h3>2.3.6: April 8 2001</h3>
<ul>
  <li>Code cleanup using extreme gcc compiler warning options, found and
    cleared half a dozen potential problems</li>
  <li>the Eazel team found an XML parser bug</li>
  <li>cleaned up the use of some of the string formatting functions, used the
    trio library code to provide the ones needed when the platform is missing
    them</li>
  <li>xpath: removed a memory leak and fixed the predicate evaluation
    problem, extended the testsuite and cleaned up the result. XPointer seems
    broken ...</li>
</ul>

<h3>2.3.5: Mar 23 2001</h3>
<ul>
  <li>Biggest change is separate parsing and evaluation of XPath expressions,
    there are some new APIs for this too</li>
  <li>included a number of bug fixes (XML push parser, 51876, notations,
  52299)</li>
  <li>Fixed some portability issues</li>
</ul>

<h3>2.3.4: Mar 10 2001</h3>
<ul>
  <li>Fixed bugs #51860 and #51861</li>
  <li>Added a global variable xmlDefaultBufferSize to allow default buffer
    size to be application tunable.</li>
  <li>Some cleanup in the validation code, still a bug left and this part
    should probably be rewritten to support ambiguous content model :-\</li>
  <li>Fix a couple of serious bugs introduced or raised by changes in 2.3.3
    parser</li>
  <li>Fixed another bug in xmlNodeGetContent()</li>
  <li>Bjorn fixed XPath node collection and Number formatting</li>
  <li>Fixed a loop reported in the HTML parsing</li>
  <li>blank spaces are reported even if the Dtd content model proves that
    they are formatting spaces, this is for XML conformance</li>
</ul>

<h3>2.3.3: Mar 1 2001</h3>
<ul>
  <li>small change in XPath for XSLT</li>
  <li>documentation cleanups</li>
  <li>fix in validation by Gary Pennington</li>
  <li>serious parsing performance improvements</li>
</ul>

<h3>2.3.2: Feb 24 2001</h3>
<ul>
  <li>chasing XPath bugs, found a bunch, completed some TODO</li>
  <li>fixed a Dtd parsing bug</li>
  <li>fixed a bug in xmlNodeGetContent</li>
  <li>ID/IDREF support partly rewritten by Gary Pennington</li>
</ul>

<h3>2.3.1: Feb 15 2001</h3>
<ul>
  <li>some XPath and HTML bug fixes for XSLT</li>
  <li>small extension of the hash table interfaces for DOM gdome2
    implementation</li>
  <li>A few bug fixes</li>
</ul>

<h3>2.3.0: Feb 8 2001 (2.2.12 was on 25 Jan but I didn't keep track)</h3>
<ul>
  <li>Lots of XPath bug fixes</li>
  <li>Add a mode with Dtd lookup but without validation error reporting for
    XSLT</li>
  <li>Add support for text node without escaping (XSLT)</li>
  <li>bug fixes for xmlCheckFilename</li>
  <li>validation code bug fixes from Gary Pennington</li>
  <li>Patch from Paul D. Smith correcting URI path normalization</li>
  <li>Patch to allow simultaneous install of libxml-devel and
  libxml2-devel</li>
  <li>the example Makefile is now fixed</li>
  <li>added HTML to the RPM packages</li>
  <li>tree copying bugfixes</li>
  <li>updates to Windows makefiles</li>
  <li>optimisation patch from Bjorn Reese</li>
</ul>

<h3>2.2.11: Jan 4 2001</h3>
<ul>
  <li>bunch of bug fixes (memory I/O, xpath, ftp/http, ...)</li>
  <li>added htmlHandleOmittedElem()</li>
  <li>Applied Bjorn Reese's IPV6 first patch</li>
  <li>Applied Paul D. Smith patches for validation of XInclude results</li>
  <li>added XPointer xmlns() new scheme support</li>
</ul>

<h3>2.2.10: Nov 25 2000</h3>
<ul>
  <li>Fix the Windows problems of 2.2.8</li>
  <li>integrate OpenVMS patches</li>
  <li>better handling of some nasty HTML input</li>
  <li>Improved the XPointer implementation</li>
  <li>integrate a number of provided patches</li>
</ul>

<h3>2.2.9: Nov 25 2000</h3>
<ul>
  <li>erroneous release :-(</li>
</ul>

<h3>2.2.8: Nov 13 2000</h3>
<ul>
  <li>First version of <a href="http://www.w3.org/TR/xinclude">XInclude</a>
    support</li>
  <li>Patch in conditional section handling</li>
  <li>updated MS compiler project</li>
  <li>fixed some XPath problems</li>
  <li>added a URI escaping function</li>
  <li>some other bug fixes</li>
</ul>

<h3>2.2.7: Oct 31 2000</h3>
<ul>
  <li>added message redirection</li>
  <li>XPath improvements (thanks TOM !)</li>
  <li>xmlIOParseDTD() added</li>
  <li>various small fixes in the HTML, URI, HTTP and XPointer support</li>
  <li>some cleanup of the Makefile, autoconf and the distribution content</li>
</ul>

<h3>2.2.6: Oct 25 2000</h3>
<ul>
  <li>Added a hash table module, migrated a number of internal structures to
    it</li>
  <li>Fixed a posteriori validation problems</li>
  <li>HTTP module cleanups</li>
  <li>HTML parser improvements (tag errors, script/style handling, attribute
    normalization)</li>
  <li>coalescing of adjacent text nodes</li>
  <li>couple of XPath bug fixes, exported the internal API</li>
</ul>

<h3>2.2.5: Oct 15 2000</h3>
<ul>
  <li>XPointer implementation and testsuite</li>
  <li>Lots of XPath fixes, added variable and function registration, more
    tests</li>
  <li>Portability fixes, lots of enhancements toward an easy Windows build
    and release</li>
  <li>Late validation fixes</li>
  <li>Integrated a lot of contributed patches</li>
  <li>added memory management docs</li>
  <li>a performance problem when using large buffers seems fixed</li>
</ul>

<h3>2.2.4: Oct 1 2000</h3>
<ul>
  <li>main XPath problem fixed</li>
  <li>Integrated portability patches for Windows</li>
  <li>Serious bug fixes on the URI and HTML code</li>
</ul>

<h3>2.2.3: Sep 17 2000</h3>
<ul>
  <li>bug fixes</li>
  <li>cleanup of entity handling code</li>
  <li>overall review of all loops in the parsers, all sprintf usage has been
    checked too</li>
  <li>Far better handling of large Dtds. Validating against Docbook XML Dtd
    works smoothly now.</li>
</ul>

<h3>1.8.10: Sep 6 2000</h3>
<ul>
  <li>bug fix release for some Gnome projects</li>
</ul>

<h3>2.2.2: August 12 2000</h3>
<ul>
  <li>mostly bug fixes</li>
  <li>started adding routines to access xml parser context options</li>
</ul>

<h3>2.2.1: July 21 2000</h3>
<ul>
  <li>a purely bug fixes release</li>
  <li>fixed an encoding support problem when parsing from a memory block</li>
  <li>fixed a DOCTYPE parsing problem</li>
  <li>removed a bug in the function that allows overriding the memory
    allocation routines</li>
</ul>

<h3>2.2.0: July 14 2000</h3>
<ul>
  <li>applied a lot of portability fixes</li>
  <li>better encoding support/cleanup and saving (content is now always
    encoded in UTF-8)</li>
  <li>the HTML parser now correctly handles encodings</li>
  <li>added xmlHasProp()</li>
  <li>fixed a serious problem with &amp;#38;</li>
  <li>propagated the fix to FTP client</li>
  <li>cleanup, bugfixes, etc ...</li>
  <li>Added a page about <a href="encoding.html">libxml Internationalization
    support</a></li>
</ul>

<h3>1.8.9:  July 9 2000</h3>
<ul>
  <li>fixed the spec, the RPMs should be better</li>
  <li>fixed a serious bug in the FTP implementation, released 1.8.9 to solve
    rpmfind users' problem</li>
</ul>

<h3>2.1.1: July 1 2000</h3>
<ul>
  <li>fixes a couple of bugs in the 2.1.0 packaging</li>
  <li>improvements on the HTML parser</li>
</ul>

<h3>2.1.0 and 1.8.8: June 29 2000</h3>
<ul>
  <li>1.8.8 is mostly a convenience package for upgrading to libxml2
    according to <a href="upgrade.html">new instructions</a>. It fixes a
    nasty problem with &amp;#38; charref parsing</li>
  <li>2.1.0 also eases the upgrade from libxml v1 to the recent version. It
    also contains numerous fixes and enhancements:
    <ul>
      <li>added xmlStopParser() to stop parsing</li>
      <li>greatly improved parsing speed when there are large CDATA
        blocks</li>
      <li>includes XPath patches provided by Picdar Technology</li>
      <li>tried to fix as many DTD validation and namespace related
        problems as possible</li>
      <li>output to a given encoding has been added/tested</li>
      <li>lots of various fixes</li>
    </ul>
  </li>
</ul>

<h3>2.0.0: Apr 12 2000</h3>
<ul>
  <li>First public release of libxml2. If you are using libxml, it's a good
    idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially
    scheduled for Apr 3 the release occurred only on Apr 12 due to massive
    workload.</li>
  <li>The includes are now located under $prefix/include/libxml (instead of
    $prefix/include/gnome-xml), they also are referenced by
    <pre>#include &lt;libxml/xxx.h&gt;</pre>
    <p>instead of</p>
    <pre>#include "xxx.h"</pre>
  </li>
  <li>a new URI module for parsing URIs and following strictly RFC 2396</li>
  <li>the memory allocation routines used by libxml can now be overloaded
    dynamically by using xmlMemSetup()</li>
  <li>The previously CVS only tool tester has been renamed
    <strong>xmllint</strong> and is now installed as part of the libxml2
    package</li>
  <li>The I/O interface has been revamped. There are now ways to plug in
    specific I/O modules, either at the URI scheme detection level using
    xmlRegisterInputCallbacks() or by passing I/O functions when creating a
    parser context using xmlCreateIOParserCtxt()</li>
  <li>there is a C preprocessor macro LIBXML_VERSION providing the version
    number of the libxml module in use</li>
  <li>a number of optional features of libxml can now be excluded at
    configure time (FTP/HTTP/HTML/XPath/Debug)</li>
</ul>
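<p>A minimal sketch of what the relocated headers and the LIBXML_VERSION macro
look like from application code. It assumes the compiler is pointed at
$prefix/include (or wherever the libxml directory was installed) and that the
macro expands to an integer, as it does in the 2.x headers; treat it as an
illustration rather than a reference.</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/xmlversion.h&gt;   /* new-style include, defines LIBXML_VERSION */

int main(void) {
    /* Version of the headers this program was compiled against. */
    printf("built against libxml version %d\n", (int) LIBXML_VERSION);
    return 0;
}</pre>
<p>Because the value comes from the preprocessor, it can also be used in
#if tests to guard code that depends on a given release.</p>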

<h3>2.0.0beta: Mar 14 2000</h3>
<ul>
  <li>This is a first Beta release of libxml version 2</li>
  <li>It's available only from <a href="ftp://xmlsoft.org/">xmlsoft.org
    FTP</a>, it's packaged as libxml2-2.0.0beta and available as tar and
  RPMs</li>
  <li>This version is now the head in the Gnome CVS base, the old one is
    available under the tag LIB_XML_1_X</li>
  <li>This includes a very large set of changes. From a programmatic point of
    view applications should not have to be modified too much, check the <a
    href="upgrade.html">upgrade page</a></li>
  <li>Some interfaces may change (especially a bit about encoding).</li>
  <li>the updates include:
    <ul>
      <li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly
        handled now</li>
      <li>Better handling of entities, especially well formedness checking
        and proper PEref extensions in external subsets</li>
      <li>DTD conditional sections</li>
      <li>Validation now correctly handles entities content</li>
      <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change
        structures to accommodate DOM</a></li>
    </ul>
  </li>
  <li>Serious progress was made toward compliance, <a
    href="conf/result.html">here are the results of the tests</a> against the
    OASIS testsuite (except the Japanese tests since I don't support that
    encoding yet). This URL is rebuilt every couple of hours using the CVS
    head version.</li>
</ul>

<h3>1.8.7: Mar 6 2000</h3>
<ul>
  <li>This is a bug fix release:</li>
  <li>It is possible to disable the ignorable blanks heuristic used by
    libxml-1.x, a new function xmlKeepBlanksDefault(0) will allow this. Note
    that for adherence to XML spec, this behaviour will be disabled by
    default in 2.x. The same function will allow keeping compatibility for
    old code.</li>
  <li>Blanks in &lt;a&gt;  &lt;/a&gt; constructs are not ignored anymore,
    avoiding heuristics is really the Right Way :-\</li>
  <li>The unchecked use of snprintf which was breaking libxml-1.8.6
    compilation on some platforms has been fixed</li>
  <li>nanoftp.c nanohttp.c: Fixed '#' and '?' stripping when processing
  URIs</li>
</ul>

<h3>1.8.6: Jan 31 2000</h3>
<ul>
  <li>added a nanoFTP transport module, debugged until the new version of <a
    href="http://rpmfind.net/linux/rpm2html/rpmfind.html">rpmfind</a> can use
    it without troubles</li>
</ul>

<h3>1.8.5: Jan 21 2000</h3>
<ul>
  <li>adding APIs to parse a well balanced chunk of XML (production <a
    href="http://www.w3.org/TR/REC-xml#NT-content">[43] content</a> of the
    XML spec)</li>
  <li>fixed a hideous bug in xmlGetProp pointed out by
    Rune.Djurhuus@fast.no</li>
  <li>Jody Goldberg &lt;jgoldberg@home.com&gt; provided another patch trying
    to solve the zlib checks problems</li>
  <li>The current state in gnome CVS base is expected to ship as 1.8.5 with
    gnumeric soon</li>
</ul>

<h3>1.8.4: Jan 13 2000</h3>
<ul>
  <li>bug fixes, reintroduced xmlNewGlobalNs(), fixed xmlNewNs()</li>
  <li>all exit() calls should have been removed from libxml</li>
  <li>fixed a problem with INCLUDE_WINSOCK on WIN32 platform</li>
  <li>added newDocFragment()</li>
</ul>

<h3>1.8.3: Jan 5 2000</h3>
<ul>
  <li>a Push interface for the XML and HTML parsers</li>
  <li>a shell-like interface to the document tree (try tester --shell :-)</li>
  <li>lots of bug fixes and improvements added over the Christmas
    holidays</li>
  <li>fixed the DTD parsing code to work with the xhtml DTD</li>
  <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li>
  <li>Fixed bugs in xmlNewNs()</li>
  <li>External entity loading code has been revamped, now it uses
    xmlLoadExternalEntity(), some fixes on entities processing were added</li>
  <li>cleaned up WIN32 includes of socket stuff</li>
</ul>

<h3>1.8.2: Dec 21 1999</h3>
<ul>
  <li>I got another problem with includes and C++, I hope this issue is fixed
    for good this time</li>
  <li>Added a few tree modification functions: xmlReplaceNode,
    xmlAddPrevSibling, xmlAddNextSibling, xmlNodeSetName and
    xmlDocSetRootElement</li>
  <li>Tried to improve the HTML output with help from <a
    href="mailto:clahey@umich.edu">Chris Lahey</a></li>
</ul>

<h3>1.8.1: Dec 18 1999</h3>
<ul>
  <li>various patches to avoid troubles when using libxml with C++ compilers:
    the "namespace" keyword and C escaping in include files</li>
  <li>a problem in one of the core macros IS_CHAR was corrected</li>
  <li>fixed a bug introduced in 1.8.0 breaking default namespace processing,
    and more specifically the Dia application</li>
  <li>fixed a posteriori validation (validation after parsing, or by using a
    Dtd not specified in the original document)</li>
  <li>fixed a bug in</li>
</ul>

<h3>1.8.0: Dec 12 1999</h3>
<ul>
  <li>cleanup, especially memory wise</li>
  <li>the parser should be more reliable, especially the HTML one, it should
    not crash, whatever the input !</li>
  <li>Integrated various patches, especially a speedup improvement for large
    datasets from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>,
    configure with --with-buffers to enable them.</li>
  <li>attribute normalization, oops should have been added long ago !</li>
  <li>attributes defaulted from Dtds should be available, xmlSetProp() now
    does entities escaping by default.</li>
</ul>

<h3>1.7.4: Oct 25 1999</h3>
<ul>
  <li>Lots of HTML improvements</li>
  <li>Fixed some errors when saving both XML and HTML</li>
  <li>More examples, the regression tests should now look clean</li>
  <li>Fixed a bug with contiguous charref</li>
</ul>

<h3>1.7.3: Sep 29 1999</h3>
<ul>
  <li>portability problems fixed</li>
  <li>snprintf was used unconditionally, leading to link problems on systems
    where it's not available, fixed</li>
</ul>

<h3>1.7.1: Sep 24 1999</h3>
<ul>
  <li>The basic type for strings manipulated by libxml has been renamed in
    1.7.1 from <strong>CHAR</strong> to <strong>xmlChar</strong>. The reason
    is that CHAR was conflicting with a predefined type on Windows. However
    on non WIN32 environments, compatibility is provided by way of a
    <strong>#define</strong>.</li>
  <li>Fixed another error: the use of a structure field called errno,
    leading to trouble on platforms where it's a macro</li>
</ul>

<h3>1.7.0: Sep 23 1999</h3>
<ul>
  <li>Added the ability to fetch remote DTD or parsed entities, see the <a
    href="html/libxml-nanohttp.html">nanohttp</a> module.</li>
  <li>Added an errno to report errors by another means than a simple
    printf-like callback</li>
  <li>Finished ID/IDREF support and checking when validating</li>
  <li>Serious memory leaks fixed (there is now a <a
    href="html/libxml-xmlmemory.html">memory wrapper</a> module)</li>
  <li>Improvement of <a href="http://www.w3.org/TR/xpath">XPath</a>
    implementation</li>
  <li>Added an HTML parser front-end</li>
</ul>

<h2><a name="XML">XML</a></h2>

<p><a href="http://www.w3.org/TR/REC-xml">XML is a standard</a> for
markup-based structured documents. Here is <a name="example">an example XML
document</a>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;EXAMPLE prop1="gnome is great" prop2="&amp;amp; linux too"&gt;
  &lt;head&gt;
   &lt;title&gt;Welcome to Gnome&lt;/title&gt;
  &lt;/head&gt;
  &lt;chapter&gt;
   &lt;title&gt;The Linux adventure&lt;/title&gt;
   &lt;p&gt;bla bla bla ...&lt;/p&gt;
   &lt;image href="linus.gif"/&gt;
   &lt;p&gt;...&lt;/p&gt;
  &lt;/chapter&gt;
&lt;/EXAMPLE&gt;</pre>

<p>The first line specifies that it's an XML document and gives useful
information about its encoding. Then the document is a text format whose
structure is specified by tags between brackets. <strong>Each tag opened has
to be closed</strong>. XML is pedantic about this. However, if a tag is empty
(no content), a single tag can serve as both the opening and closing tag if
it ends with <code>/&gt;</code> rather than with <code>&gt;</code>. Note
that, for example, the image tag has no content (just an attribute) and is
closed by ending the tag with <code>/&gt;</code>.</p>
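<p>The closing rule can be checked with the library itself. The sketch below is
only an illustration under stated assumptions: the two input strings are
made-up fragments modelled on the example above, check() is a hypothetical
helper, and xmlParseMemory() is used as the in-memory entry point of this
generation of the library (later releases also offer xmlReadMemory()).</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: parse a buffer and report whether the parser
 * accepted it as well-formed. */
static void check(const char *label, const char *buf) {
    xmlDocPtr doc = xmlParseMemory(buf, (int) strlen(buf));

    if (doc == NULL) {
        printf("%s: not well-formed\n", label);
        return;
    }
    printf("%s: well-formed, root element is %s\n", label,
           (const char *) xmlDocGetRootElement(doc)-&gt;name);
    xmlFreeDoc(doc);
}

int main(void) {
    /* The image element is empty: one tag both opens and closes it. */
    check("closed",
          "&lt;chapter&gt;&lt;image href=\"linus.gif\"/&gt;&lt;/chapter&gt;");
    /* Here image is opened but never closed, so the parser rejects it. */
    check("unclosed",
          "&lt;chapter&gt;&lt;image href=\"linus.gif\"&gt;&lt;/chapter&gt;");

    xmlCleanupParser();
    return 0;
}</pre>
<p>With the default error handler, the parser also prints a diagnostic for the
second fragment before the helper reports it as not well-formed.</p>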

<p>XML can be applied successfully to a wide range of uses, from long term
structured document maintenance (where it follows in the footsteps of SGML)
to simple data encoding mechanisms like configuration file formatting (glade),
spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where
it is used to encode remote calls between a client and a server.</p>

<h2><a name="XSLT">XSLT</a></h2>

<p>Check <a href="http://xmlsoft.org/XSLT">the separate libxslt page</a></p>

<p><a href="http://www.w3.org/TR/xslt">XSL Transformations</a> is a
language for transforming XML documents into other XML documents (or
HTML/textual output).</p>

<p>A separate library called libxslt is being built on top of libxml2. This
module "libxslt" can be found in the Gnome CVS base too.</p>

<p>You can check the <a
href="http://cvs.gnome.org/lxr/source/libxslt/FEATURES">features</a>
supported and the progress in the <a
href="http://cvs.gnome.org/lxr/source/libxslt/ChangeLog"
name="Changelog">Changelog</a></p>

<h2><a name="Python">Python and bindings</a></h2>

<p>There are a number of language bindings and wrappers available for libxml2,
the list below is not exhaustive. Please contact the <a
href="http://mail.gnome.org/mailman/listinfo/xml-bindings">xml-bindings@gnome.org</a>
(<a href="http://mail.gnome.org/archives/xml-bindings/">archives</a>) in
order to get updates to this list or to discuss the specific topic of libxml2
or libxslt wrappers or bindings:</p>
<ul>
  <li><a href="mailto:ari@lusis.org">Ari Johnson</a> provides a  C++ wrapper
    for libxml:<br>
    Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a><br>
    Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></li>
  <li>There is another <a href="http://libgdome-cpp.berlios.de/">C++ wrapper
    based on the gdome2 </a>bindings maintained by Tobias Peters.</li>
  <li><a
    href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
    Sergeant</a> developed <a
    href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
    libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
    application server</a></li>
  <li><a href="mailto:dkuhlman@cutter.rexx.com">Dave Kuhlman</a> provides an
    earlier version of the libxml/libxslt <a
    href="http://www.rexx.com/~dkuhlman">wrappers for Python</a></li>
  <li>Petr Kozelka provides <a
    href="http://sourceforge.net/projects/libxml2-pas">Pascal units to glue
    libxml2</a> with Kylix, Delphi and other Pascal compilers</li>
  <li>Wai-Sun "Squidster" Chia provides <a
    href="http://www.rubycolor.org/arc/redist/">bindings for Ruby</a>  and
    libxml2 bindings are also available in Ruby through the <a
    href="http://libgdome-ruby.berlios.de/">libgdome-ruby</a> module
    maintained by Tobias Peters.</li>
  <li>Steve Ball and contributors maintain <a
    href="http://tclxml.sourceforge.net/">libxml2 and libxslt bindings for
    Tcl</a></li>
  <li>There is support for libxml2 in the DOM module of PHP.</li>
</ul>

<p>The distribution includes a set of Python bindings, which are guaranteed to
be maintained as part of the library in the future, though the Python
interface has not yet reached the maturity of the C API.</p>

<p>To install the Python bindings there are 2 options:</p>
<ul>
  <li>If you use an RPM based distribution, simply install the <a
    href="http://rpmfind.net/linux/rpm2html/search.php?query=libxml2-python">libxml2-python
    RPM</a> (and if needed the <a
    href="http://rpmfind.net/linux/rpm2html/search.php?query=libxslt-python">libxslt-python
    RPM</a>).</li>
  <li>Otherwise use the <a href="ftp://xmlsoft.org/python/">libxml2-python
    module distribution</a> corresponding to your installed version of
    libxml2 and libxslt. Note that to install it you will need both libxml2
    and libxslt installed and run "python setup.py build install" in the
    module tree.</li>
</ul>

<p>The distribution includes a set of examples and regression tests for the
Python bindings in the <code>python/tests</code> directory. Here are some
excerpts from those tests:</p>

<h3>tst.py:</h3>

<p>This is a basic test of the file interface and DOM navigation:</p>
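<pre>import sys
import libxml2

# parse the file and check the document name
doc = libxml2.parseFile("tst.xml")
if doc.name != "tst.xml":
    print("doc.name failed")
    sys.exit(1)
# the root element is the first child of the document node
root = doc.children
if root.name != "doc":
    print("root.name failed")
    sys.exit(1)
child = root.children
if child.name != "foo":
    print("child.name failed")
    sys.exit(1)
# documents must be freed explicitly
doc.freeDoc()</pre>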

<p>The Python module is called libxml2; parseFile is the equivalent of
xmlParseFile (most of the bindings are automatically generated, the xml
prefix is removed and the casing convention is kept). All nodes seen at the
binding level share the same subset of accessors:</p>
<ul>
  <li><code>name</code> : returns the node name</li>
  <li><code>type</code> : returns a string indicating the node type</li>
  <li><code>content</code> : returns the content of the node; it is based on
    xmlNodeGetContent() and hence is recursive.</li>
  <li><code>parent</code> , <code>children</code>, <code>last</code>,
    <code>next</code>, <code>prev</code>, <code>doc</code>,
    <code>properties</code>: point to the associated element in the tree;
    they may return None if no such link exists.</li>
</ul>
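
<p>The naming rule mentioned before the list (drop the xml prefix, keep the
casing) can be sketched in a few lines of plain Python. This is only an
illustration of the convention, not the actual binding generator; the helper
name <code>python_name</code> is hypothetical, and names such as
xmlXPathNewContext, which the bindings expose as xpathNewContext, follow
extra rules that this sketch does not cover.</p>

```python
def python_name(c_name):
    """Map a libxml2 C function name to the name used by the bindings.

    Simplified rule: strip the leading "xml" and lower-case the first
    remaining letter; everything else keeps its casing.
    """
    prefix = "xml"
    if c_name.startswith(prefix) and len(c_name) > len(prefix):
        stripped = c_name[len(prefix):]
        return stripped[0].lower() + stripped[1:]
    # Names without the prefix are returned unchanged
    return c_name


for c_name in ("xmlParseFile", "xmlFreeDoc", "xmlCleanupParser"):
    print(c_name, "->", python_name(c_name))
```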

<p>Also note the need to explicitly deallocate documents with freeDoc().
Reference counting for libxml2 trees would need quite a lot of work to
function properly, and rather than risk memory leaks if not implemented
correctly it sounds safer to have an explicit function to free a tree. The
wrapper Python objects like doc, root or child are then automatically garbage
collected.</p>
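
<p>Since the free has to be explicit, one way to avoid forgetting it is to
wrap the document in a small context manager. The sketch below is not part of
the bindings: the helper name <code>freeing</code> is hypothetical, and
<code>StubDoc</code> is a stand-in so the example runs without libxml2
installed. The only assumption about the real object is that it has a
freeDoc() method, as shown in the test above.</p>

```python
from contextlib import contextmanager


@contextmanager
def freeing(doc):
    """Yield doc and call its freeDoc() method when the block exits."""
    try:
        yield doc
    finally:
        doc.freeDoc()


class StubDoc:
    """Stand-in for a parsed document so the sketch runs without libxml2."""

    def __init__(self):
        self.freed = False

    def freeDoc(self):
        self.freed = True


stub = StubDoc()
# With the real bindings this would be: freeing(libxml2.parseFile("tst.xml"))
with freeing(stub) as doc:
    still_alive = not doc.freed  # the tree is usable inside the block

print(still_alive, stub.freed)
```

<p>The document is freed even if the body of the <code>with</code> block
raises an exception, which a plain trailing freeDoc() call would not
guarantee.</p>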

<h3>validate.py:</h3>

<p>This test checks the validation interfaces and redirection of error
messages:</p>
<pre>import libxml2

# deactivate error messages from the validation
def noerr(ctx, msg):
    pass

libxml2.registerErrorHandler(noerr, None)

# create a parser context and turn validation on before parsing
ctxt = libxml2.createFileParserCtxt("invalid.xml")
ctxt.validate(1)
ctxt.parseDocument()
doc = ctxt.doc()
valid = ctxt.isValid()
doc.freeDoc()
# the document is invalid, so isValid() is expected to return 0
if valid != 0:
    print("validity check failed")</pre>

<p>The first thing to notice is the call to registerErrorHandler(); it
defines a new error handler global to the library. It is used to avoid seeing
the error messages when trying to validate the invalid document.</p>

<p>The main interest of that test is the creation of a parser context with
createFileParserCtxt() and how the behaviour can be changed before calling
parseDocument(). Similarly, the information resulting from the parsing phase
is also available using context methods.</p>

<p>Contexts, like nodes, are defined as classes, and the libxml2 wrappers map
the C function interfaces in terms of object methods as much as possible. The
best way to get a complete view of what methods are supported is to look at the
libxml2.py module containing all the wrappers.</p>
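
<p>Instead of discarding the messages as noerr does, a handler can collect
them for later inspection. The sketch below assumes the same two-argument
calling convention as noerr, and that the second argument given to
registerErrorHandler() is handed back to the handler as its first parameter.
The name <code>collect</code> and the sample message are made up, and the
handler is called by hand so the example runs without libxml2.</p>

```python
def collect(ctx, msg):
    """Error handler: append each message to the list passed as ctx."""
    ctx.append(msg)


errors = []

# With the bindings installed, the handler would be registered like noerr:
#     libxml2.registerErrorHandler(collect, errors)
# Here it is invoked directly, with a made-up message, to show the data flow.
collect(errors, "sample validity error message\n")

print(len(errors))
```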

<h3>push.py:</h3>

<p>This test shows how to activate the push parser interface:</p>
<pre>import libxml2

ctxt = libxml2.createPushParser(None, "&lt;foo", 4, "test.xml")
ctxt.parseChunk("/&gt;", 2, 1)
doc = ctxt.doc()

doc.freeDoc()</pre>

<p>The context is created with a special call based on
xmlCreatePushParserCtxt() from the C library. The first argument is an optional
SAX callback object, then the initial set of data, the length and the name of
the resource in case URI-References need to be computed by the parser.</p>

<p>Then the data are pushed using the parseChunk() method, the last call
setting the third argument, terminate, to 1.</p>
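
<p>When the data comes from a stream rather than from literal strings, the
same pattern needs to know which chunk is the last one. A minimal sketch,
assuming a callable with the (data, length, terminate) signature of
parseChunk(): the helper <code>push_in_chunks</code> and the recorder
<code>record</code> are hypothetical names, and the recorder stands in for
ctxt.parseChunk so the example runs without libxml2.</p>

```python
import io


def push_in_chunks(parse_chunk, stream, size=4096):
    """Feed stream to parse_chunk(data, length, terminate) piece by piece.

    Only the final call gets terminate=1; returns the number of calls made.
    """
    calls = 0
    pending = stream.read(size)
    while True:
        following = stream.read(size)  # look ahead to detect the last chunk
        last = not following
        parse_chunk(pending, len(pending), 1 if last else 0)
        calls += 1
        if last:
            return calls
        pending = following


# Stand-in for ctxt.parseChunk so the sketch runs without libxml2
recorded = []


def record(data, length, terminate):
    recorded.append((data, length, terminate))


source = "<foo url='tst'>bar</foo>"
count = push_in_chunks(record, io.StringIO(source), size=8)
print(count, [terminate for _, _, terminate in recorded])
```

<p>Reading one chunk ahead lets the last real chunk carry the terminate flag,
mirroring the test above where the final parseChunk() call both delivers data
and ends the document.</p>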

<h3>pushSAX.py:</h3>

<p>This test shows the use of the event based parsing interfaces. In this case
the parser does not build a document, but provides callback information as
it progresses through the data being provided:</p>
<pre>import libxml2
log = ""

class callback:
    def startDocument(self):
        global log
        log = log + "startDocument:"

    def endDocument(self):
        global log
        log = log + "endDocument:"

    def startElement(self, tag, attrs):
        global log
        log = log + "startElement %s %s:" % (tag, attrs)

    def endElement(self, tag):
        global log
        log = log + "endElement %s:" % (tag)

    def characters(self, data):
        global log
        log = log + "characters: %s:" % (data)

    def warning(self, msg):
        global log
        log = log + "warning: %s:" % (msg)

    def error(self, msg):
        global log
        log = log + "error: %s:" % (msg)

    def fatalError(self, msg):
        global log
        log = log + "fatalError: %s:" % (msg)

handler = callback()

ctxt = libxml2.createPushParser(handler, "&lt;foo", 4, "test.xml")
chunk = " url='tst'&gt;b"
ctxt.parseChunk(chunk, len(chunk), 0)
chunk = "ar&lt;/foo&gt;"
ctxt.parseChunk(chunk, len(chunk), 1)

reference = "startDocument:startElement foo {'url': 'tst'}:" + \
            "characters: bar:endElement foo:endDocument:"
if log != reference:
    print("Error got: %s" % log)
    print("Expected: %s" % reference)</pre>

<p>The key object in that test is the handler; it provides a number of entry
points which can be called by the parser as it progresses to indicate
the information set obtained. The full set of callbacks is larger than what
the callback class in that specific example implements (see the SAX
definition for a complete list). The wrapper will only call those supplied by
the object when activated. The startElement callback receives the name of the
element and a dictionary containing the attributes carried by this element.</p>

<p>Also note that the reference string generated from the callback shows a
single characters call even though the string "bar" is passed to the parser
from 2 different calls to parseChunk().</p>
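
<p>A handler does not have to rely on text arriving in one piece: it can
buffer whatever the characters callback receives and join it at endElement.
The sketch below also illustrates the idea of calling only the callbacks a
handler supplies. It is a plain Python illustration, not the libxml2 wrapper
code: <code>TextCollector</code>, <code>dispatch</code> and the hand-written
event list are hypothetical, and the text is deliberately split in two
events.</p>

```python
class TextCollector:
    """Handler with only three callbacks; the others are simply absent.

    Simplified: assumes elements carrying text are not nested.
    """

    def __init__(self):
        self.parts = []
        self.texts = {}

    def startElement(self, tag, attrs):
        self.parts = []

    def characters(self, data):
        # May be called once or several times for the same text
        self.parts.append(data)

    def endElement(self, tag):
        self.texts[tag] = "".join(self.parts)


def dispatch(handler, event, *args):
    """Call handler.<event>(*args) only if the handler defines it."""
    method = getattr(handler, event, None)
    if method is not None:
        method(*args)


handler = TextCollector()
# Hand-written event stream standing in for a parser
events = [
    ("startDocument",),
    ("startElement", "foo", {"url": "tst"}),
    ("characters", "b"),
    ("characters", "ar"),
    ("endElement", "foo"),
    ("endDocument",),
]
for event, *args in events:
    dispatch(handler, event, *args)

print(handler.texts)
```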

<h3>xpath.py:</h3>

<p>This is a basic test of the XPath wrappers support:</p>
<pre>import sys
import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
# "//*" selects every element of the document
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print("xpath query: wrong node set size")
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print("xpath query: wrong node set value")
    sys.exit(1)
# free the document only once the result is no longer needed
doc.freeDoc()
ctxt.xpathFreeContext()</pre>

<p>This test parses a file, then creates an XPath context to evaluate XPath
expressions on it. The xpathEval() method executes an XPath query and returns
the result mapped in a Python way. Strings and numbers are natively converted,
and node sets are returned as a tuple of libxml2 Python node wrappers. Like
the document, the XPath context needs to be freed explicitly. Also note that
the result of the XPath query may point back to the document tree, and hence
the document must be freed only after the result of the query has been used.</p>
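
<p>One way to respect that ordering is to copy what is needed out of the
result into plain Python values before anything is freed. The sketch below
assumes the expression yields a node set and uses only the methods shown in
the test (xpathNewContext(), xpathEval(), xpathFreeContext(), freeDoc()). The
helper <code>xpath_node_names</code> and the <code>Stub*</code> classes are
hypothetical; the stubs merely record the call order so the example runs
without libxml2.</p>

```python
def xpath_node_names(doc, expression):
    """Return the names of the matched nodes as plain strings."""
    ctxt = doc.xpathNewContext()
    try:
        # Copy the names while the context and the tree are still alive
        return [node.name for node in ctxt.xpathEval(expression)]
    finally:
        ctxt.xpathFreeContext()


class StubNode:
    def __init__(self, name):
        self.name = name


class StubContext:
    def __init__(self, log):
        self.log = log

    def xpathEval(self, expression):
        self.log.append("eval")
        return [StubNode("doc"), StubNode("foo")]

    def xpathFreeContext(self):
        self.log.append("free context")


class StubDoc:
    def __init__(self):
        self.log = []

    def xpathNewContext(self):
        return StubContext(self.log)

    def freeDoc(self):
        self.log.append("free document")


doc = StubDoc()
names = xpath_node_names(doc, "//*")
doc.freeDoc()  # only after the names have been copied out
print(names, doc.log)
```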

<h3>xpathext.py:</h3>

<p>This test shows how to extend the XPath engine with functions written in
Python:</p>
<pre>import libxml2

def foo(ctx, x):
    return x + 1

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
libxml2.registerXPathFunction(ctxt._o, "foo", None, foo)
res = ctxt.xpathEval("foo(1)")
if res != 2:
    print("xpath extension failure")
doc.freeDoc()
ctxt.xpathFreeContext()</pre>

<p>Note how the extension function is registered with the context (but that
part is not yet finalized; this may change slightly in the future).</p>

<h3>tstxpath.py:</h3>

<p>This test is similar to the previous one but shows how the extension
function can access the XPath evaluation context:</p>
<pre>def foo(ctx, x):
    global called

    #
    # test access to the XPath evaluation contexts
    #
    pctxt = libxml2.xpathParserContext(_obj=ctx)
    ctxt = pctxt.context()
    called = ctxt.function()
    return x + 1</pre>

<p>The interfaces around the XPath parser (or rather evaluation) context
are not all finalized, but they should be sufficient to do contextual work at
the evaluation point.</p>

<h3>Memory debugging:</h3>

<p>Last but not least, all tests start with the following prologue:</p>
<pre>#memory debug specific
libxml2.debugMemory(1)</pre>

<p>and end with the following epilogue:</p>
<pre>#memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print("OK")
else:
    print("Memory leak %d bytes" % (libxml2.debugMemory(1)))
    libxml2.dumpMemory()</pre>

<p>The prologue activates the memory debugging interface of libxml2, where all
allocated blocks in the library are tracked. The epilogue then cleans up the
library state and checks that all allocated memory has been freed. If not, it
calls dumpMemory(), which saves the list of blocks still allocated in a
<code>.memdump</code> file.</p>

<h2><a name="architecture">libxml architecture</a></h2>

<p>Libxml is made of multiple components; some of them are optional, and most
of the block interfaces are public. The main components are:</p>
<ul>
  <li>an Input/Output layer</li>
  <li>FTP and HTTP client layers (optional)</li>
  <li>an Internationalization layer managing the encodings support</li>