1. 04 Mar, 2021 3 commits
  2. 02 Mar, 2021 2 commits
    • Clarify xmlNewDocProp documentation · ad101bb5
      Nick Wellnhofer authored
    • Stop checking attributes for UTF-8 validity · a6e6498f
      Nick Wellnhofer authored
      I can't see a reason to check attribute content for UTF-8 validity.
      Other parts of the API like xmlNewText have always assumed valid UTF-8
      as extra checks only slow down processing.
      
      Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not
      freeing the old encoding would cause a memory leak.
      
      Note that this was last changed in 2008 with commit 6f8611fd which
      removed unnecessary encoding/decoding steps. Setting attributes should
      be even faster now.
      
      Found by OSS-Fuzz.
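Since attribute setters no longer validate their input, a caller holding untrusted bytes may want to check them before building the tree. libxml2 exports xmlCheckUTF8 for this purpose. The standalone sketch below only illustrates what such a check involves. The helper name is_valid_utf8 is hypothetical and is not part of libxml2.

```c
/* Hypothetical helper, not a libxml2 function.
 * Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *s)
{
    while (*s != 0) {
        unsigned char c = *s;
        int extra;          /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point decoded so far */

        if (c < 0x80) {                     /* plain ASCII */
            s++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 2-byte sequence */
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {    /* 3-byte sequence */
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {    /* 4-byte sequence */
            extra = 3; cp = c & 0x07;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        s++;
        for (int i = 0; i < extra; i++, s++) {
            /* A terminating NUL fails this test too, so we never read
             * past the end of the string. */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and values above U+10FFFF. */
        if (extra == 1 && cp < 0x80)
            return 0;
        if (extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;
    }
    return 1;
}
```

A caller would run this kind of check once on input of unknown origin, rather than paying for it on every attribute assignment.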
  3. 01 Mar, 2021 2 commits
    • Reduce some fuzzer timeouts · 8446d459
      Nick Wellnhofer authored
      OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
      several hundred hours without hitting the 20s timeout. It seems that
      most timeouts resulting from accidentally quadratic behavior in the
      HTML parser have been fixed. Start to gradually reduce the timeout to
      find new performance issues.
    • Fix quadratic behavior when looking up xml:* attributes · 688b41a0
      Nick Wellnhofer authored
      Add a special case for the predefined XML namespace when looking up DTD
      attribute defaults in xmlGetPropNodeInternal to avoid calling
      xmlGetNsList.
      
      This fixes quadratic behavior in
      
      - xmlNodeGetBase
      - xmlNodeGetLang
      - xmlNodeGetSpacePreserve
      
      Found by OSS-Fuzz.
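The idea behind this fix can be sketched in isolation: the "xml" prefix is always bound to the same namespace URI, so a lookup for it does not need to collect in-scope namespace declarations from the ancestors. The code below is a simplified illustration, not the libxml2 implementation. The types elem and ns_decl and the function lookup_ns_uri are hypothetical.

```c
#include <string.h>

/* The namespace URI that the "xml" prefix is always bound to. */
#define XML_NS_URI "http://www.w3.org/XML/1998/namespace"

/* Hypothetical, simplified tree types (not libxml2's xmlNode/xmlNs). */
struct ns_decl {
    const char *prefix;         /* NULL means the default namespace */
    const char *uri;
    struct ns_decl *next;
};

struct elem {
    struct elem *parent;
    struct ns_decl *ns_decls;   /* declarations made on this element */
};

static int same_prefix(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Resolve a prefix to a namespace URI, or NULL if it is unbound. */
const char *lookup_ns_uri(const struct elem *node, const char *prefix)
{
    /* Fast path: the predefined XML namespace needs no tree walk. */
    if (prefix != NULL && strcmp(prefix, "xml") == 0)
        return XML_NS_URI;

    /* Slow path: walk up the ancestors, cost proportional to depth. */
    for (; node != NULL; node = node->parent) {
        for (const struct ns_decl *d = node->ns_decls; d != NULL; d = d->next) {
            if (same_prefix(d->prefix, prefix))
                return d->uri;
        }
    }
    return NULL;
}
```

Without the fast path, repeating the lookup for every node in a deeply nested document costs time proportional to depth on each call, which is where the quadratic total comes from.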
  4. 22 Feb, 2021 8 commits
  5. 21 Feb, 2021 1 commit
  6. 20 Feb, 2021 3 commits
    • Fix slow parsing of HTML with encoding errors · dcb80b92
      Nick Wellnhofer authored
      Under certain circumstances, the HTML parser would try to guess and
      switch input encodings multiple times, leading to slow processing of
      documents with encoding errors. The repeated scanning of the input
      buffer when guessing encodings could even lead to quadratic behavior.
      
      The code in htmlCurrentChar probably assumed that if there's an encoding
      handler, it is guaranteed to produce valid UTF-8. This holds true in
      general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
      encoding handler simply invoked memcpy without checking for invalid
      UTF-8. This still must be fixed, preferably by not using this handler
      at all.
      
      Also leave a note that switching encodings twice seems impossible to
      implement correctly. Add a check when handling UTF-8 encoding errors
      in htmlCurrentChar to avoid this situation, even if encoders produce
      invalid UTF-8.
      
      Found by OSS-Fuzz.
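The guard described in this commit message amounts to "switch encodings at most once". The sketch below shows that pattern under stated assumptions and is not the code from htmlCurrentChar. The parser_state struct, its fields, and handle_utf8_error are hypothetical names.

```c
/* Hypothetical outcome of handling one UTF-8 decoding error. */
enum err_action {
    SWITCHED_ENCODING,  /* guessed an encoding and switched to it */
    REPORTED_ERROR      /* kept the current encoding, recorded an error */
};

/* Hypothetical, minimal parser state (not libxml2's parser context). */
struct parser_state {
    int has_encoding_handler;   /* nonzero once an encoding was selected */
    int errors;                 /* number of encoding errors reported */
};

enum err_action handle_utf8_error(struct parser_state *st)
{
    if (!st->has_encoding_handler) {
        /* First error with no encoding chosen yet: guess and switch once. */
        st->has_encoding_handler = 1;
        return SWITCHED_ENCODING;
    }

    /* An encoding handler is already active. Even if it emitted invalid
     * UTF-8, do not guess or rescan the input again; only report. */
    st->errors++;
    return REPORTED_ERROR;
}
```

Because the second and later errors never trigger another guess, the input buffer is not rescanned repeatedly, which is the behavior the message says led to slow or even quadratic processing.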
    • hhb's avatar
      02bee4c4
    • Fix warnings in libxml.m4 with autoconf 2.70+. · 4defa2c2
      Simon Josefsson authored
      Closes #219.
  7. 09 Feb, 2021 1 commit
  8. 08 Feb, 2021 4 commits
  9. 07 Feb, 2021 3 commits
  10. 03 Feb, 2021 1 commit
  11. 15 Jan, 2021 1 commit
  12. 05 Jan, 2021 7 commits
  13. 18 Dec, 2020 4 commits