1. 20 Sep, 2012 1 commit
  2. 18 Sep, 2012 1 commit
    • Windows build fixes · bbe19451
      Daniel Richard authored and Daniel Veillard committed
      Building 2.9.0 on MSVC7.1 was failing because HAVE_CONFIG_H is not
      #defined.
      The patch addresses the above, adds testrecurse.exe and the
      standard "make check" suite of tests to the MSVC makefile, and also
      fixes the following (MSVC7.1) warnings:
      buf.c(674) : warning C4028: formal parameter 1 different from
      declaration
      libxml2\timsort.h(71) : warning C4028: formal parameter 1 different from
      declaration
  3. 15 Sep, 2012 1 commit
  4. 14 Sep, 2012 1 commit
  5. 13 Sep, 2012 1 commit
  6. 11 Sep, 2012 6 commits
  7. 08 Sep, 2012 1 commit
  8. 07 Sep, 2012 4 commits
    • Keep non-significant blanks node in HTML parser · f933c898
      Daniel Veillard authored
      For https://bugzilla.gnome.org/show_bug.cgi?id=681822
      
      Regardless of whether the option HTML_PARSE_NOBLANKS is set, blank nodes
      are removed from an HTML document, for example:
      
      <html>
        <head>
          <title>This is a test.</title>
        </head>
        <body>
          <p>This is a test.</p>
        </body>
      </html>
      
      is read as:
      
      <html><head><title>This is a test.</title></head><body>
          <p>This is a test.</p>
        </body></html>
      
      This commit changes the default behaviour so that blank nodes are kept.
      The old behaviour remains available, as expected, through the parser
      flag HTML_PARSE_NOBLANKS.
      
      Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>
      
      * HTMLparser.c: change various places in the parser where ignorable_space
        SAX callback was called without checking for the parser flag preference
      * xmllint.c: make sure we use the new flag even for HTML parsing
      * result/HTML/*: this modifies the output of a number of tests
    • Second round of cleanups for LibXML2 docs/examples · 878ec9db
      Daniel Richard authored and Daniel Veillard committed
      configure.am:
      
      * Explicitly disallow --enable-rebuild-docs when builddir != srcdir, per
         what you said about needing to build docs with an in-source build
      
      doc/Makefile.am:
      
      * Ensure that xmlversion.h is in the source tree before running
         apibuild.py, to avoid generating an incomplete libxml2-api.xml
      
      * Update the .PHONY target (forgot to do this earlier)
      
      doc/devhelp/Makefile.am:
      
      * Wrap the doc-generating rule in an "if REBUILD_DOCS" conditional so it
         doesn't cause trouble for regular users
      
      * Added a handy-dandy "rebuild" target
      
      doc/examples/index.py:
      
      * NOTE: You need to run this script to regenerate the files it creates,
         and then commit the newly-updated files! The generated files currently
         in git master (e.g. doc/examples/Makefile.am) are out of date even
         before this patch!
      
      * index.html really needs to be in EXTRA_DIST
      
      * Wrap the doc-generating rules in an "if REBUILD_DOCS" conditional,
         because they shouldn't be active otherwise
    • Add a forbidden variable error number and message to XPath · 47881284
      Daniel Veillard authored
      Related to https://bugzilla.gnome.org/show_bug.cgi?id=680938
      
      When the XML_XPATH_NOVAR flag is used, it means that
      variables are forbidden, not that they are missing.
    • Support long path names on WNT · 55b899a2
      Michael Stahl authored and Daniel Veillard committed
      so we've got this patch to libxml2 2.7.6 in the LibreOffice code base,
      inherited from OOo.  it fixes a definite problem, which is that Windows
      has a rather low maximum path length restriction, and there is a special
      trick on NT whereby path names can be prefixed with "\\?\", in which
      case the maximum length is 32k, which ought to be sufficient even for
      bloated office suites :)
      
      I'll attach the patch to the xmlCanonicPath function.  note that i
      didn't write this and am by no means an expert on either Microsoftean
      platforms or libxml so maybe it's not the best way to do it.
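
The prefix trick described above can be sketched without any Windows headers. This is an illustration under stated assumptions, not the patch to xmlCanonicPath: `sketch_long_path` and `SKETCH_MAX_PATH` are hypothetical names, and the sketch only handles absolute drive paths written with backslashes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PATH 260  /* classic Windows MAX_PATH limit */

/*
 * Write `path` into `out`, adding the "\\?\" prefix when the path is an
 * absolute drive path that reaches the classic length limit.
 * Returns 0 on success, -1 if `out` is too small.
 */
int sketch_long_path(const char *path, char *out, size_t out_size)
{
    const char *prefix = "";
    size_t len;
    int n;

    assert(path != NULL && out != NULL);
    len = strlen(path);

    /* Only absolute "C:\..." paths are prefixed in this sketch. */
    if (len >= SKETCH_MAX_PATH && path[1] == ':' && path[2] == '\\')
        prefix = "\\\\?\\";

    n = snprintf(out, out_size, "%s%s", prefix, path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Short paths pass through unchanged, so callers that never hit the limit see no difference.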
  9. 05 Sep, 2012 2 commits
  10. 04 Sep, 2012 4 commits
    • Remove all .cvsignore as they are not used anymore · 857104cd
      Daniel Veillard authored
      For https://bugzilla.gnome.org/show_bug.cgi?id=682985
      suggested by Adrian Bunk <bunk@stusta.de>
    • Fix reuse of xmlInitParser · 7a2215db
      Daniel Veillard authored
      While xmlCleanupParser() should not be used unless complete control
      is ensured over the program, making sure libxml2 is not in use anywhere,
      it should still be usable and allow a sequence of
          xmlInitParser();
          xmlCleanupParser();
      calls if needed. The problem is that the thread key wasn't reallocated
      on subsequent xmlInitParser() calls, leading to corruption of pthread
      keys used by the program.
      
      * threads.c: make sure xmlCleanupParser() resets the pthread_once()
                   global variable driving thread key allocation.
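
The init/cleanup cycle described above can be reproduced with plain pthreads. The following is a minimal sketch, not the threads.c code: every `sketch_` name is hypothetical, and the cleanup function assumes no other thread is using the key when it runs.

```c
#include <assert.h>
#include <pthread.h>

static pthread_key_t sketch_key;
static int sketch_key_live = 0;       /* 1 while the key is allocated */
static int sketch_key_creations = 0;  /* how many times the key was created */

static pthread_once_t sketch_once = PTHREAD_ONCE_INIT;
/* Pristine copy: PTHREAD_ONCE_INIT is only guaranteed to work as an initializer. */
static pthread_once_t sketch_once_pristine = PTHREAD_ONCE_INIT;

/* Runs at most once per "armed" once control. */
static void sketch_create_key(void)
{
    int rc = pthread_key_create(&sketch_key, NULL);
    assert(rc == 0);
    (void)rc;
    sketch_key_live = 1;
    sketch_key_creations++;
}

void sketch_init(void)
{
    pthread_once(&sketch_once, sketch_create_key);
}

void sketch_cleanup(void)
{
    if (sketch_key_live) {
        pthread_key_delete(sketch_key);
        sketch_key_live = 0;
    }
    /* Re-arm the once control so the next sketch_init() allocates a fresh key. */
    sketch_once = sketch_once_pristine;
}

int sketch_creations(void)
{
    return sketch_key_creations;
}
```

Without the re-arming assignment in `sketch_cleanup()`, a second `sketch_init()` would skip key creation and later code would use a key that had already been deleted.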
    • Fix a Timsort function helper comment · 510e7583
      Daniel Veillard authored
    • Fix potential crash on entities errors · 28f5e1a2
      Daniel Veillard authored
      Related to https://bugs.launchpad.net/lxml/+bug/502959
      
      Basically the core of the issue is that if an entity references another
      entity, then in case we are replacing entities content, we should always
      do so by copying the referenced content as long as the reference is
      done within the entity. Otherwise, if for some reason there is a later
      parsing error that entity content may be freed.
      
      Complex scenario exposed by command:
      thinkpad:~/XML/diveintopython-5.4/xml -> valgrind --db-attach=yes
      ../../xmllint --loaddtd --noout --noent diveintopython.xml
      
        Document references &a;
        a references &b;
        we reference b's content directly by linking it into a's content
        a has an error further down
        we free a, freeing the chunk from b
        Document references &b; after &a;
        we try to copy b content, but it was freed already => segfault
      
      * parser.c: never reference directly entity content without copying if
        we aren't in the document main entity
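
The copy-instead-of-link rule can be illustrated with a toy model. This sketch assumes a much simplified entity that only owns a string. `SketchEntity` and the `sketch_` functions are hypothetical and unrelated to the real parser structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified entity: a name and an owned content string. */
typedef struct {
    const char *name;
    char *content;   /* owned by this entity, released by sketch_entity_free() */
} SketchEntity;

/* Duplicate a string with malloc. */
static char *sketch_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Expand a reference to `inner` while building `outer`.
 * The outer entity receives its own copy, so freeing it later
 * cannot invalidate the inner entity's content.
 */
int sketch_expand_reference(SketchEntity *outer, const SketchEntity *inner)
{
    assert(outer != NULL && inner != NULL && inner->content != NULL);
    free(outer->content);
    outer->content = sketch_strdup(inner->content);
    return outer->content != NULL ? 0 : -1;
}

/* Release the content owned by an entity. */
void sketch_entity_free(SketchEntity *ent)
{
    free(ent->content);
    ent->content = NULL;
}
```

Had `outer->content` simply been set to `inner->content`, freeing the outer entity after an error would leave the inner entity with a dangling pointer, which is the shape of the crash described above.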
  11. 28 Aug, 2012 3 commits
  12. 27 Aug, 2012 5 commits
  13. 24 Aug, 2012 3 commits
    • Switching XPath node sorting to Timsort · 3e031b7d
      Vojtech Fried authored and Daniel Veillard committed
      I use libxml xpath engine on quite large (and mostly "flat") xml files.
      It seems that Shellsort, which is used in xmlXPathNodeSetSort, is a
      performance bottleneck for my case. I have read some posts about sorting
      in libxml in the libxml archive, but I agree that qsort was not the way
      to go. I experimented with Timsort instead and my results were good for
      me. For about 10000 nodes, my test was about 5x faster with Timsort,
      for 1000 nodes about 10% faster, for small data files, the difference
      was not measurable.
      * timsort.h: the algorithm, kept in a separate header
      * xpath.c: plug in the new algorithm in xmlXPathNodeSetSort
      * Makefile.am: add the header to the EXTRA_DIST
      * doc/apibuild.py: avoid indexing the new header
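
For readers unfamiliar with the algorithm being replaced, here is a minimal Shellsort sketch on plain integers. It assumes the classic gap-halving sequence and is not the code from xpath.c, which sorts node pointers using a document-order comparison. `sketch_shell_sort` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Sort `a[0..n-1]` in ascending order using Shellsort with gap halving. */
void sketch_shell_sort(int *a, size_t n)
{
    size_t gap, i, j;

    assert(a != NULL || n == 0);
    for (gap = n / 2; gap > 0; gap /= 2) {
        /* Gapped insertion sort: each pass sorts elements `gap` apart. */
        for (i = gap; i < n; i++) {
            int tmp = a[i];
            for (j = i; j >= gap && a[j - gap] > tmp; j -= gap)
                a[j] = a[j - gap];
            a[j] = tmp;
        }
    }
}
```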
    • Small cleanup for valgrind target · 73f94c60
      Daniel Veillard authored
    • Optimizing '//' in XPath expressions · 62270539
      Nick Wellnhofer authored and Daniel Veillard committed
      When investigating the libxslt performance problem reported in bug
      #657665, I found that '//' in XPath expressions can be very slow when
      working on large subtrees.
      
      One of the reasons is the seemingly quadratic time complexity of the
      duplicate checks when merging result nodes. The other is a missed
      optimization for expressions of the form
      'descendant-or-self::node()/axis::test'. Since '//' is expanded to
      '/descendant-or-self::node()/', this type of expression is quite common.
      Depending on the axis of the expression following the
      'descendant-or-self' step, the following replacements can be made:
      
      from descendant-or-self::node()/child::test
      to   descendant::test
      
      from descendant-or-self::node()/descendant::test
      to   descendant::test
      
      from descendant-or-self::node()/self::test
      to   descendant-or-self::test
      
      from descendant-or-self::node()/descendant-or-self::test
      to   descendant-or-self::test
      
      'test' can be any kind of node test.
      
      With these replacements the possibly huge result of
      'descendant-or-self::node()' doesn't have to be stored temporarily, but
      can be processed in one pass. If the resulting nodeset is small, the
      duplicate checks aren't a problem.
      
      I found that there already is a function called
      xmlXPathRewriteDOSExpression which performs this optimization for a very
      limited set of cases. It employs a complicated iteration scheme for
      rewritten expressions. AFAICS, this can be avoided by simply changing
      the axis of the expression as described above.
      
      With the attached patch against libxml2 and the files from bug #657665 I
      got the following results.
      
      Before:
      
      $ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
      service-names-port-numbers.xml
      real    2m56.213s
      user    2m56.123s
      sys     0m0.080s
      
      After:
      
      $ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
      service-names-port-numbers.xml
      real    0m3.836s
      user    0m3.764s
      sys     0m0.060s
      
      I also ran the libxml2 and libxslt test suites with the patch and
      couldn't detect any breakage.
      
      Nick
      
      From e0f5a8261760e4f257b90410be27657e984237c8 Mon Sep 17 00:00:00 2001
      From: Nick Wellnhofer <wellnhofer@aevum.de>
      Date: Sun, 19 Aug 2012 18:20:22 +0200
      Subject: [PATCH] Optimizations for descendant-or-self::node()
      
      Currently, the function xmlXPathRewriteDOSExpression optimizes expressions
      of type '//child'. Instead of adding a 'rewriteType' and doing a compound
      traversal, the same can be achieved simply by setting the axis of the node
      test from 'child' to 'descendant'.
      
      There are also many other cases that can be optimized similarly. This
      commit augments xmlXPathRewriteDOSExpression to essentially rewrite the
      following subexpressions:
      
      - descendant-or-self::node()/child:: to descendant::
      - descendant-or-self::node()/descendant:: to descendant::
      - descendant-or-self::node()/self:: to descendant-or-self::
      - descendant-or-self::node()/descendant-or-self:: to descendant-or-self::
      
      Since the '//' shortcut in XPath is translated to
      '/descendant-or-self::node()/', this greatly speeds up expressions using
      '//' on large subtrees.
  14. 23 Aug, 2012 1 commit
  15. 22 Aug, 2012 1 commit
    • Expose xmlBufShrink in the public tree API · 82cdfc4e
      Daniel Veillard authored
      As suggested by Andrew W. Nosenko:
      Proposal: expose the new xmlBufShrink() to the "public" API for
      compatibility with xmlBufUse().
      
      Reason: the following scenario:
      
      1. Read something into  xmlParserInputBuffer (e.g. using
      xmlParserInputBufferRead())
      2. Extract content through xmlBufContent()
      3. Extract content length through xmlBufUse().  The result has type
      'size_t'.
      4. Use this content
      5. Now, you need to shrink the buffer.  How to do it?  Doing that
      through legacy xmlBufferShrink() is unsafe because it uses 'unsigned
      int' and the whole point of introducing the new API was handling the
      cases when 'unsigned int' is not enough.  Therefore, one needs to use
      the new xmlBufShrink().  But it is "private".
      
      Therefore, I propose to expose the new xmlBufShrink() in the same way,
      as xmlBufContent() and xmlBufUse() are exposed.
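
The type mismatch behind this proposal can be shown without libxml2. The sketch below only illustrates why a 'size_t' length cannot always be handed to an API taking 'unsigned int'. `sketch_fits_unsigned_int` and `sketch_narrow_len` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return 1 if a size_t length can be represented as an unsigned int. */
int sketch_fits_unsigned_int(size_t len)
{
    return len <= UINT_MAX;
}

/*
 * Narrow a length for a legacy 'unsigned int' API.
 * A plain cast would silently drop the high bits of a larger value,
 * so this sketch refuses such lengths instead.
 */
unsigned int sketch_narrow_len(size_t len)
{
    assert(sketch_fits_unsigned_int(len));
    return (unsigned int)len;
}
```

On platforms where 'size_t' is wider than 'unsigned int', any length above UINT_MAX fails the check, which is the case the scenario above is concerned with.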
  16. 20 Aug, 2012 1 commit
  17. 17 Aug, 2012 4 commits