Commit abfca615 authored by Daniel Veillard's avatar Daniel Veillard

applying patch from Mark Vakoc for Windows applied doc fixes from Sven

* win32/Makefile.bcb win32/Makefile.mingw win32/Makefile.msvc:
  applying patch from Mark Vakoc for Windows
* doc/catalog.html doc/encoding.html doc/xml.html: applied doc
  fixes from Sven Zimmerman
Daniel
parent 46da4649
+Thu Jan 8 00:36:00 CET 2004 Daniel Veillard <daniel@veillard.com>
+
+* win32/Makefile.bcb win32/Makefile.mingw win32/Makefile.msvc:
+applying patch from Mark Vakoc for Windows
+* doc/catalog.html doc/encoding.html doc/xml.html: applied doc
+fixes from Sven Zimmerman
+
Tue Jan 6 23:51:46 CET 2004 Daniel Veillard <daniel@veillard.com>
 
* python/libxml2-python-api.xml python/libxml_wrap.h python/types.c
@@ -238,7 +238,7 @@ literature to point at:</p><ul><li>You can find a good rant from Norm Walsh abou
Resolution</a> who maintains XML Catalog, you will find pointers to the
specification update, some background and pointers to others tools
providing XML Catalog support</li>
-<li>Here is a <a href="buildDocBookCatalog">shell script</a> to generate
+<li>There is a <a href="buildDocBookCatalog">shell script</a> to generate
XML Catalogs for DocBook 4.1.2 . If it can write to the /etc/xml/
directory, it will set-up /etc/xml/catalog and /etc/xml/docbook based on
the resources found on the system. Otherwise it will just create
@@ -22,13 +22,13 @@ by using Unicode. Any conformant XML parser has to support the UTF-8 and
UTF-16 default encodings which can both express the full unicode ranges. UTF8
is a variable length encoding whose greatest points are to reuse the same
encoding for ASCII and to save space for Western encodings, but it is a bit
-more complex to handle in practice. UTF-16 use 2 bytes per characters (and
+more complex to handle in practice. UTF-16 use 2 bytes per character (and
sometimes combines two pairs), it makes implementation easier, but looks a
bit overkill for Western languages encoding. Moreover the XML specification
-allows document to be encoded in other encodings at the condition that they
+allows the document to be encoded in other encodings at the condition that they
are clearly labeled as such. For example the following is a wellformed XML
-document encoded in ISO-8859 1 and using accentuated letter that we French
-likes for both markup and content:</p><pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
+document encoded in ISO-8859-1 and using accentuated letters that we French
+like for both markup and content:</p><pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;très&gt;&lt;/très&gt;</pre><p>Having internationalization support in libxml2 means the following:</p><ul><li>the document is properly parsed</li>
<li>informations about it's encoding are saved</li>
<li>it can be modified</li>
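The hunk above (from doc/encoding.html) lists what internationalization support means in practice. As a purely illustrative stdlib analogy (Python's xml.etree rather than libxml2), a parser that honours the encoding declaration hands the application Unicode text regardless of the on-disk encoding:

```python
# Illustration only (Python stdlib, not libxml2): a document declared as
# ISO-8859-1 is decoded while parsing, mirroring libxml2's policy of
# converting every document to a single internal Unicode representation.
import xml.etree.ElementTree as ET

doc = '<?xml version="1.0" encoding="ISO-8859-1"?><très>là</très>'
data = doc.encode("iso-8859-1")   # raw bytes as they would sit on disk

root = ET.fromstring(data)        # parser honours the encoding declaration
print(root.tag)                   # tag arrives as a Unicode string: très
print(root.text)                  # là
```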
@@ -48,9 +48,9 @@ an internationalized fashion by libxml2 too:</p><pre>&lt;!DOCTYPE HTML PUBLIC "-
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;W3C crée des standards pour le Web.&lt;/body&gt;
-&lt;/html&gt;</pre><h3><a name="internal" id="internal">The internal encoding, how and why</a></h3><p>One of the core decision was to force all documents to be converted to a
+&lt;/html&gt;</pre><h3><a name="internal" id="internal">The internal encoding, how and why</a></h3><p>One of the core decisions was to force all documents to be converted to a
default internal encoding, and that encoding to be UTF-8, here are the
-rationale for those choices:</p><ul><li>keeping the native encoding in the internal form would force the libxml
+rationales for those choices:</p><ul><li>keeping the native encoding in the internal form would force the libxml
users (or the code associated) to be fully aware of the encoding of the
original document, for examples when adding a text node to a document,
the content would have to be provided in the document encoding, i.e. the
@@ -79,7 +79,7 @@ rationale for those choices:</p><ul><li>keeping the native encoding in the inter
for using UTF-16 or UCS-4.</li>
<li>UTF-8 is being used as the de-facto internal encoding standard for
related code like the <a href="http://www.pango.org/">pango</a>
-upcoming Gnome text widget, and a lot of Unix code (yep another place
+upcoming Gnome text widget, and a lot of Unix code (yet another place
where Unix programmer base takes a different approach from Microsoft
- they are using UTF-16)</li>
</ul></li>
@@ -92,8 +92,8 @@ rationale for those choices:</p><ul><li>keeping the native encoding in the inter
(internationalization) support get triggered only during I/O operation, i.e.
when reading a document or saving one. Let's look first at the reading
sequence:</p><ol><li>when a document is processed, we usually don't know the encoding, a
-simple heuristic allows to detect UTF-16 and UCS-4 from whose where the
-ASCII range (0-0x7F) maps with ASCII</li>
+simple heuristic allows to detect UTF-16 and UCS-4 from encodings
+where the ASCII range (0-0x7F) maps with ASCII</li>
<li>the xml declaration if available is parsed, including the encoding
declaration. At that point, if the autodetected encoding is different
from the one declared a call to xmlSwitchEncoding() is issued.</li>
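The autodetection step described in the hunk above can be sketched as follows. The byte patterns come from Appendix F of the XML 1.0 specification; the function name is illustrative, not libxml2's:

```python
# Sketch of the autodetection heuristic: before the encoding declaration
# can be read, the first bytes of "<?xm" betray the encoding family,
# because every candidate encoding maps the ASCII range onto ASCII.
def sniff_encoding(data: bytes) -> str:
    prefixes = [
        (b"\x00\x00\x00\x3c", "UCS-4BE"),   # '<' as a 4-byte big-endian unit
        (b"\x3c\x00\x00\x00", "UCS-4LE"),
        (b"\x00\x3c\x00\x3f", "UTF-16BE"),  # '<?' as 2-byte big-endian units
        (b"\x3c\x00\x3f\x00", "UTF-16LE"),
        (b"\x3c\x3f\x78\x6d", "UTF-8"),     # '<?xm' in any ASCII superset
    ]
    for prefix, name in prefixes:
        if data.startswith(prefix):
            return name
    return "unknown"

print(sniff_encoding('<?xml version="1.0"?>'.encode("utf-16-le")))  # UTF-16LE
```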
@@ -121,7 +121,7 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
</li>
<li>From that point the encoder processes progressively the input (it is
plugged as a front-end to the I/O module) for that entity. It captures
-and convert on-the-fly the document to be parsed to UTF-8. The parser
+and converts on-the-fly the document to be parsed to UTF-8. The parser
itself just does UTF-8 checking of this input and process it
transparently. The only difference is that the encoding information has
been added to the parsing context (more precisely to the input
@@ -154,10 +154,10 @@ encoding:</p><ol><li>if no encoding is given, libxml2 will look for an encoding
resume the conversion. This guarantees that any document will be saved
without losses (except for markup names where this is not legal, this is
a problem in the current version, in practice avoid using non-ascii
-characters for tags or attributes names @@). A special "ascii" encoding
+characters for tag or attribute names). A special "ascii" encoding
name is used to save documents to a pure ascii form can be used when
portability is really crucial</li>
-</ol><p>Here is a few examples based on the same test document:</p><pre>~/XML -&gt; ./xmllint isolat1
+</ol><p>Here are a few examples based on the same test document:</p><pre>~/XML -&gt; ./xmllint isolat1
&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;très&gt;&lt;/très&gt;
~/XML -&gt; ./xmllint --encode UTF-8 isolat1
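The "ascii" save fallback described in the hunk above replaces characters outside the target encoding with character references rather than losing them. A stdlib analogy (Python's `xmlcharrefreplace` error handler, not libxml2 itself) shows the same lossless fallback:

```python
# Analogy only (Python stdlib, not libxml2): serialising to a pure-ASCII
# form replaces out-of-range characters with numeric character references,
# the same lossless fallback libxml2 applies for its "ascii" encoding name.
text = "très"
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'tr&#232;s'
```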
@@ -190,7 +190,7 @@ aliases when handling a document:</p><ul><li>int xmlAddEncodingAlias(const char
<li>const char * xmlGetEncodingAlias(const char *alias);</li>
<li>void xmlCleanupEncodingAliases(void);</li>
</ul><h3><a name="extend" id="extend">How to extend the existing support</a></h3><p>Well adding support for new encoding, or overriding one of the encoders
-(assuming it is buggy) should not be hard, just write an input and output
+(assuming it is buggy) should not be hard, just write input and output
conversion routines to/from UTF-8, and register them using
xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they will be
called automatically if the parser(s) encounter such an encoding name
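The extension mechanism described above (register a converter pair under an encoding name, and the parser picks it up automatically) has a close analogue in Python's codec registry, shown here as an illustration; the encoding name "mylatin" is made up, and this is not libxml2's `xmlNewCharEncodingHandler`:

```python
# Analogy for registering an encoder/decoder pair under a new name
# (Python's codecs machinery, not libxml2). The name "mylatin" is
# hypothetical; here we simply reuse ISO-8859-1's converters for it.
import codecs

def search(name):
    if name == "mylatin":
        return codecs.lookup("iso-8859-1")  # reuse latin-1's converter pair
    return None

codecs.register(search)
print("très".encode("mylatin"))  # b'tr\xe8s'
```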
@@ -2773,13 +2773,13 @@ by using Unicode. Any conformant XML parser has to support the UTF-8 and
UTF-16 default encodings which can both express the full unicode ranges. UTF8
is a variable length encoding whose greatest points are to reuse the same
encoding for ASCII and to save space for Western encodings, but it is a bit
-more complex to handle in practice. UTF-16 use 2 bytes per characters (and
+more complex to handle in practice. UTF-16 use 2 bytes per character (and
sometimes combines two pairs), it makes implementation easier, but looks a
bit overkill for Western languages encoding. Moreover the XML specification
-allows document to be encoded in other encodings at the condition that they
+allows the document to be encoded in other encodings at the condition that they
are clearly labeled as such. For example the following is a wellformed XML
-document encoded in ISO-8859 1 and using accentuated letter that we French
-likes for both markup and content:</p>
+document encoded in ISO-8859-1 and using accentuated letters that we French
+like for both markup and content:</p>
<pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;très&gt;là&lt;/très&gt;</pre>
@@ -2813,9 +2813,9 @@ an internationalized fashion by libxml2 too:</p>
<h3><a name="internal">The internal encoding, how and why</a></h3>
-<p>One of the core decision was to force all documents to be converted to a
+<p>One of the core decisions was to force all documents to be converted to a
default internal encoding, and that encoding to be UTF-8, here are the
-rationale for those choices:</p>
+rationales for those choices:</p>
<ul>
<li>keeping the native encoding in the internal form would force the libxml
users (or the code associated) to be fully aware of the encoding of the
@@ -2847,7 +2847,7 @@ rationale for those choices:</p>
for using UTF-16 or UCS-4.</li>
<li>UTF-8 is being used as the de-facto internal encoding standard for
related code like the <a href="http://www.pango.org/">pango</a>
-upcoming Gnome text widget, and a lot of Unix code (yep another place
+upcoming Gnome text widget, and a lot of Unix code (yet another place
where Unix programmer base takes a different approach from Microsoft
- they are using UTF-16)</li>
</ul>
@@ -2871,8 +2871,8 @@ when reading a document or saving one. Let's look first at the reading
sequence:</p>
<ol>
<li>when a document is processed, we usually don't know the encoding, a
-simple heuristic allows to detect UTF-16 and UCS-4 from whose where the
-ASCII range (0-0x7F) maps with ASCII</li>
+simple heuristic allows to detect UTF-16 and UCS-4 from encodings
+where the ASCII range (0-0x7F) maps with ASCII</li>
<li>the xml declaration if available is parsed, including the encoding
declaration. At that point, if the autodetected encoding is different
from the one declared a call to xmlSwitchEncoding() is issued.</li>
@@ -2900,7 +2900,7 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
</li>
<li>From that point the encoder processes progressively the input (it is
plugged as a front-end to the I/O module) for that entity. It captures
-and convert on-the-fly the document to be parsed to UTF-8. The parser
+and converts on-the-fly the document to be parsed to UTF-8. The parser
itself just does UTF-8 checking of this input and process it
transparently. The only difference is that the encoding information has
been added to the parsing context (more precisely to the input
@@ -2937,12 +2937,12 @@ encoding:</p>
resume the conversion. This guarantees that any document will be saved
without losses (except for markup names where this is not legal, this is
a problem in the current version, in practice avoid using non-ascii
-characters for tags or attributes names @@). A special "ascii" encoding
+characters for tag or attribute names). A special "ascii" encoding
name is used to save documents to a pure ascii form can be used when
portability is really crucial</li>
</ol>
-<p>Here is a few examples based on the same test document:</p>
+<p>Here are a few examples based on the same test document:</p>
<pre>~/XML -&gt; ./xmllint isolat1
&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;très&gt;là&lt;/très&gt;
@@ -2996,7 +2996,7 @@ aliases when handling a document:</p>
<h3><a name="extend">How to extend the existing support</a></h3>
<p>Well adding support for new encoding, or overriding one of the encoders
-(assuming it is buggy) should not be hard, just write an input and output
+(assuming it is buggy) should not be hard, just write input and output
conversion routines to/from UTF-8, and register them using
xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they will be
called automatically if the parser(s) encounter such an encoding name
@@ -3563,7 +3563,7 @@ literature to point at:</p>
Resolution</a> who maintains XML Catalog, you will find pointers to the
specification update, some background and pointers to others tools
providing XML Catalog support</li>
-<li>Here is a <a href="buildDocBookCatalog">shell script</a> to generate
+<li>There is a <a href="buildDocBookCatalog">shell script</a> to generate
XML Catalogs for DocBook 4.1.2 . If it can write to the /etc/xml/
directory, it will set-up /etc/xml/catalog and /etc/xml/docbook based on
the resources found on the system. Otherwise it will just create
@@ -148,7 +148,8 @@ XML_OBJS = $(XML_INTDIR)\c14n.obj\
$(XML_INTDIR)\xmlunicode.obj\
$(XML_INTDIR)\xmlwriter.obj\
$(XML_INTDIR)\xpath.obj\
-$(XML_INTDIR)\xpointer.obj
+$(XML_INTDIR)\xpointer.obj\
+$(XML_INTDIR)\xmlstring.obj
# Static libxml object files.
XML_OBJS_A = $(XML_INTDIR_A)\c14n.obj\
@@ -189,7 +190,8 @@ XML_OBJS_A = $(XML_INTDIR_A)\c14n.obj\
$(XML_INTDIR_A)\xmlunicode.obj\
$(XML_INTDIR_A)\xmlwriter.obj\
$(XML_INTDIR_A)\xpath.obj\
-$(XML_INTDIR_A)\xpointer.obj
+$(XML_INTDIR_A)\xpointer.obj\
+$(XML_INTDIR_A)\xmlstring.obj
# Xmllint and friends executables.
UTILS = $(BINDIR)\xmllint.exe\
@@ -138,7 +138,8 @@ XML_OBJS = $(XML_INTDIR)/c14n.o\
$(XML_INTDIR)/xmlunicode.o\
$(XML_INTDIR)/xmlwriter.o\
$(XML_INTDIR)/xpath.o\
-$(XML_INTDIR)/xpointer.o
+$(XML_INTDIR)/xpointer.o\
+$(XML_INTDIR)/xmlstring.o
XML_SRCS = $(subst .o,.c,$(subst $(XML_INTDIR)/,$(XML_SRCDIR)/,$(XML_OBJS)))
@@ -181,7 +182,8 @@ XML_OBJS_A = $(XML_INTDIR_A)/c14n.o\
$(XML_INTDIR_A)/xmlunicode.o\
$(XML_INTDIR_A)/xmlwriter.o\
$(XML_INTDIR_A)/xpath.o\
-$(XML_INTDIR_A)/xpointer.o
+$(XML_INTDIR_A)/xpointer.o\
+$(XML_INTDIR_A)/xmlstring.o
XML_SRCS_A = $(subst .o,.c,$(subst $(XML_INTDIR_A)/,$(XML_SRCDIR)/,$(XML_OBJS_A)))
@@ -127,7 +127,8 @@ XML_OBJS = $(XML_INTDIR)\c14n.obj\
$(XML_INTDIR)\xmlunicode.obj\
$(XML_INTDIR)\xmlwriter.obj\
$(XML_INTDIR)\xpath.obj\
-$(XML_INTDIR)\xpointer.obj
+$(XML_INTDIR)\xpointer.obj\
+$(XML_INTDIR)\xmlstring.obj
# Static libxml object files.
XML_OBJS_A = $(XML_INTDIR_A)\c14n.obj\
@@ -168,7 +169,8 @@ XML_OBJS_A = $(XML_INTDIR_A)\c14n.obj\
$(XML_INTDIR_A)\xmlunicode.obj\
$(XML_INTDIR_A)\xmlwriter.obj\
$(XML_INTDIR_A)\xpath.obj\
-$(XML_INTDIR_A)\xpointer.obj
+$(XML_INTDIR_A)\xpointer.obj\
+$(XML_INTDIR_A)\xmlstring.obj
# Xmllint and friends executables.
UTILS = $(BINDIR)\xmllint.exe\