Libxml2 is the XML C parser and toolkit developed for the GNOME project (but usable outside of the GNOME platform). It is free software available under the MIT License. XML itself is a metalanguage for designing markup languages, i.e. text languages where semantics and structure are added to the content using extra "markup" information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.
Libxml2 is known to be very portable; the library should build and work without serious trouble on a variety of systems (Linux, Unix, Windows, Cygwin, MacOS, RISC OS, OS/2, VMS, QNX, MVS, ...).
Libxml2 implements a number of existing standards related to markup languages:
- the XML 1.0 standard: https://www.w3.org/TR/REC-xml
- Namespaces in XML 1.0: https://www.w3.org/TR/REC-xml-names/
- XML Base: https://www.w3.org/TR/xmlbase/
- RFC 2396: Uniform Resource Identifiers https://www.ietf.org/rfc/rfc2396.txt
- XML Path Language (XPath) 1.0: https://www.w3.org/TR/xpath
- HTML4 parser: https://www.w3.org/TR/html401/
- XML Pointer Language (XPointer) Version 1.0: https://www.w3.org/TR/xptr
- XML Inclusions (XInclude) Version 1.0: https://www.w3.org/TR/xinclude/
- ISO-8859-x encodings, as well as RFC 2044 [UTF-8] and RFC 2781 [UTF-16] Unicode encodings, and more if using iconv support
- part of SGML Open Technical Resolution TR9401:1997
- XML Catalogs Working Draft 06 August 2001: https://www.oasis-open.org/committees/entity/spec-2001-08-06.html
- Canonical XML Version 1.0: https://www.w3.org/TR/xml-c14n and the Exclusive XML Canonicalization CR draft https://www.w3.org/TR/xml-exc-c14n
- Relax NG, ISO/IEC 19757-2:2003, https://www.oasis-open.org/committees/relax-ng/spec-20011203.html
- W3C XML Schemas 1.0: https://www.w3.org/TR/xmlschema-0/
- W3C xml:id Working Draft 7 April 2004
In most cases libxml2 tries to implement the specifications in a strictly compliant way. As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Test Suite.
To some extent libxml2 provides support for the following additional specifications but doesn't claim to implement them completely:
- Document Object Model (DOM) https://www.w3.org/TR/DOM-Level-2-Core/ : libxml2 follows the document model but doesn't implement the API itself; gdome2 does this on top of libxml2
- RFC 959 : libxml2 implements a basic FTP client
- RFC 1945 : HTTP/1.0, again a basic HTTP client
- SAX: a SAX2-like interface and a minimal SAX1 implementation compatible with early expat versions
Here are some key points about libxml2:
- Libxml2 exports Push (progressive) and Pull (blocking) type parser interfaces for both XML and HTML.
- Libxml2 can do DTD validation at parse time, using a parsed document instance, or with an arbitrary DTD.
- Libxml2 includes complete XPath 1.0, XPointer and XInclude 1.0 implementations.
- It is written in plain C, making as few assumptions as possible, and sticking closely to ANSI C/POSIX for easy embedding. It works on Linux, Unix and Windows, and has been ported to a number of other platforms.
- Basic support for HTTP and FTP client allowing applications to fetch remote resources.
- The design is modular; most of the extensions can be compiled out.
- The internal document representation is as close as possible to the DOM interfaces.
- Libxml2 also has a SAX-like interface; the interface is designed to be compatible with Expat.
- This library is released under the MIT License. See the Copyright file in the distribution for the precise wording.
HTML Documentation
Generated HTML documentation is available via GitLab Pages:
Mailing list
There is a mailing list for libxml2, xml@gnome.org, with an online archive. To subscribe to this list, please visit the associated web page and follow the instructions.
Language bindings
There are a number of language bindings and wrappers available for libxml2; the list below is not exhaustive.
- Libxml++ seems to be the most up-to-date C++ binding for libxml2; check the documentation and the examples.
- xmlwrapp, a C++ library built atop libxml2.
- XML::LibXML Perl bindings are available on CPAN, as well as XML::LibXSLT Perl libxslt bindings.
- If you're interested in scripting XML processing, have a look at XSH, an XML editing shell based on the Libxml2 Perl bindings.
- Petr Kozelka provides Pascal units to glue libxml2 with Kylix, Delphi and other Pascal compilers.
- Uwe Fechner also provides idom2, a DOM2 implementation for Kylix2/D5/D6 from Borland.
- There are Ruby bindings for libxml2, and libxml2 is also available in Ruby through Nokogiri.
- Steve Ball and contributors maintain libxml2 and libxslt bindings for Tcl.
- libxml2 and libxslt are the default XML libraries for PHP5.
- LibxmlJ is an effort to create a 100% JAXP-compatible Java wrapper for libxml2 and libxslt as part of the GNU ClasspathX project.
- Patrick McPhee provides Rexx bindings for libxml2 and libxslt; look for RexxXML.
- Satimage provides XMLLib osax. This is an osax for Mac OS X with a set of commands to implement the XML DOM, XPath and XSLT in AppleScript. It also includes commands for property lists (Apple's fast lookup table XML format).
- Francesco Montorsi developed wxXml2 wrappers that interface libxml2, allowing wxWidgets applications to load/save/edit XML instances.
Contributions
- Bjorn Reese, William Brack and Thomas Broyer have provided a number of patches; Gary Pennington worked on the validation API, threading support and the Solaris port.
- John Fleck helps maintain the documentation and man pages.
- Igor Zlatkovic is now the maintainer of the Windows port; he provides binaries.
- Felix Natter and Geert Kloosterman provide an emacs module to look up libxml(2) function documentation.
- Ziying Sherwin provided man pages.
- Dave Kuhlman provided the first version of the libxml/libxslt wrappers for Python.
- Aleksey Sanin implemented XML Canonicalization and XML Digital Signature support for libxml2.