  1. 07 Aug, 2020 3 commits
  2. 06 Aug, 2020 3 commits
  3. 04 Aug, 2020 1 commit
  4. 03 Aug, 2020 1 commit
  5. 31 Jul, 2020 2 commits
    • Update fuzzing code · 905820a4
      Nick Wellnhofer authored
      - Shorten timeouts
      - Align options from Makefile and options files
      - Add section headers to Makefile
      - Skip invalid UTF-8 in regexp fuzzer
      - Update regexp.dict
      - Generate HTML seed corpus in correct format
    • Fix exponential runtime in xmlFARecurseDeterminism · 68eadabd
      Nick Wellnhofer authored
      In order to prevent visiting a state twice, states must be marked as
      visited for the whole duration of graph traversal because states might
      be reached by different paths. Otherwise state graphs like the
      following can lead to exponential runtime:
      
        ->O-->O-->O-->O-->O->
           \ / \ / \ / \ /
            O   O   O   O
      
      Reset the "visited" flag only after the graph was traversed.
      
      xmlFAComputesDeterminism still has massive performance problems when
      handling fuzzed input. By design, it has quadratic time complexity in
      the number of reachable states. Some issues might also stem from
      redundant epsilon transitions. With this fix, fuzzing regexes with a
      maximum length of 100 at least becomes feasible.
      
      Found with libFuzzer.
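
The effect of keeping the "visited" mark for the whole traversal can be seen in a small stand-alone sketch. It does not use libxml2's data structures: `Graph`, `make_diamond_chain`, `walk` and `count_visits` are hypothetical names, the graph is the diamond chain drawn in the commit message, and the `reset_on_return` mode is only an assumption about how a per-path reset behaves.

```c
#include <string.h>

#define MAX_STATES 64
#define MAX_EDGES  2

/* Hypothetical stand-in for an automaton: every state has at most two
 * outgoing transitions. */
typedef struct {
    int nstates;
    int edges[MAX_STATES][MAX_EDGES];   /* -1 marks an unused slot */
    unsigned char visited[MAX_STATES];
} Graph;

/* Build a chain of n diamonds (n <= 31): top states are 0..n, and
 * bottom state n+1+i sits between top states i and i+1. */
void make_diamond_chain(Graph *g, int n) {
    int i;
    g->nstates = 2 * n + 1;
    memset(g->visited, 0, sizeof g->visited);
    for (i = 0; i < MAX_STATES; i++)
        g->edges[i][0] = g->edges[i][1] = -1;
    for (i = 0; i < n; i++) {
        int bottom = n + 1 + i;
        g->edges[i][0] = i + 1;        /* direct edge to the next top state */
        g->edges[i][1] = bottom;       /* detour through the bottom state */
        g->edges[bottom][0] = i + 1;
    }
}

/* Depth-first walk that returns the number of state visits.
 * reset_on_return != 0: the mark is cleared as soon as the recursion
 * leaves a state, so states reachable by several paths are walked again.
 * reset_on_return == 0: the mark stays set until the caller clears it. */
static unsigned long walk(Graph *g, int s, int reset_on_return) {
    unsigned long visits;
    int i;
    if (g->visited[s])
        return 0;
    g->visited[s] = 1;
    visits = 1;
    for (i = 0; i < MAX_EDGES; i++) {
        int t = g->edges[s][i];
        if (t >= 0)
            visits += walk(g, t, reset_on_return);
    }
    if (reset_on_return)
        g->visited[s] = 0;
    return visits;
}

/* Traverse from state 0, then reset all marks once the traversal is done. */
unsigned long count_visits(Graph *g, int reset_on_return) {
    unsigned long visits = walk(g, 0, reset_on_return);
    memset(g->visited, 0, sizeof g->visited);
    return visits;
}
```

For a chain of 10 diamonds, `count_visits(&g, 0)` visits each of the 21 states once, while `count_visits(&g, 1)` performs 3070 visits; the second number roughly doubles with every additional diamond.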
  6. 28 Jul, 2020 5 commits
  7. 25 Jul, 2020 1 commit
  8. 23 Jul, 2020 2 commits
    • Fix several quadratic runtime issues in HTML push parser · 93ce33c2
      Nick Wellnhofer authored
      Fix a few remaining cases where the HTML push parser would scan more
      content during lookahead than was parsed afterwards.
      
      Make sure that htmlParseDocTypeDecl consumes all content up to the
      final '>' in case of errors. The old comment said "We shouldn't try to
      resynchronize", but ignoring invalid content is also what the HTML5
      spec mandates.
      
      Likewise, make htmlParseEndTag skip to the final '>' in invalid end
      tags even if not in recovery mode. This is probably the most visible
      change in practice and leads to different output for some tests but is
      also more in line with HTML5.
      
      Make sure that htmlParsePI and htmlParseComment don't abort when invalid
      characters are encountered but instead log an error and ignore the
      character.
      
      Change some other end-of-buffer checks to test for a zero byte instead
      of relying on IS_CHAR.
      
      Fix usage of IS_CHAR macro in htmlParseScript.
    • Fix .gitattributes · 10d09472
      Nick Wellnhofer authored
      The files in 'test' and 'result' have mixed line endings, so disable
      end-of-line conversion.
  9. 22 Jul, 2020 1 commit
    • Fix quadratic runtime when push parsing HTML start tags · 173a0830
      Nick Wellnhofer authored
      Make sure that htmlParseStartTag doesn't terminate on characters for
      which IS_CHAR_CH is false, such as control characters.
      
      In htmlParseTryOrFinish, only switch to START_TAG if the next character
      starts a valid name. Otherwise, htmlParseStartTag might return without
      consuming all characters up to the final '>'.
      
      Found by OSS-Fuzz.
  10. 19 Jul, 2020 2 commits
  11. 15 Jul, 2020 5 commits
    • Fix HTML push parser lookahead · 8e219b15
      Nick Wellnhofer authored
      The parsing rules when looking for terminating chars or sequences in
      the push parser differed from the actual parsing code. This could
      cause the lookahead to overshoot and data to be rescanned,
      potentially leading to quadratic runtime.
      
      Comments must never be handled during lookahead. Attribute values must
      only be skipped for start tags and doctype declarations, not for end
      tags, comments, PIs and script content.
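
The rule about attribute values can be sketched as a lookahead helper with a flag. This is an illustration only, not the parser's actual lookup code: `find_tag_end` and `skip_attr_values` are hypothetical names, and quote handling is simplified to "any quote opens a value".

```c
#include <stddef.h>

/* Return the index of the first '>' in buf[0..len) that terminates the
 * construct, or -1 if more data is needed. When skip_attr_values is
 * nonzero (start tags and doctype declarations), a '>' inside a quoted
 * value is ignored. For end tags, comments, PIs and script content the
 * caller passes 0, so quotes have no special meaning. */
long find_tag_end(const char *buf, size_t len, int skip_attr_values) {
    char quote = 0;   /* currently open quote character, or 0 */
    size_t i;

    for (i = 0; i < len; i++) {
        char c = buf[i];
        if (skip_attr_values) {
            if (quote != 0) {
                if (c == quote)
                    quote = 0;      /* closing quote */
                continue;           /* still inside the value */
            }
            if (c == '"' || c == '\'') {
                quote = c;          /* opening quote */
                continue;
            }
        }
        if (c == '>')
            return (long)i;
    }
    return -1;
}
```

With the flag set, `<a title="x>y">` ends at index 14 rather than 11. With the flag wrongly set for an end tag such as `</a ">`, the helper returns -1 and keeps waiting for a closing quote that the parser itself never looks for, which is the kind of overshoot described above.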
    • Make htmlCurrentChar always translate U+0000 · e050062c
      Nick Wellnhofer authored
      The general assumption is that htmlCurrentChar only returns 0 if the
      end of the input buffer is reached. The UTF-8 path already logged an
      error if a zero byte U+0000 was found and returned a space character
      instead. Make the ASCII code path do the same.
      
      htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so
      even if 0 was returned from htmlCurrentChar, the push parser would make
      progress. But rescanning the input could cause performance problems.
      
      Previously, the pull parser would abort parsing. It now handles zero
      bytes in ASCII mode the same way as the push parser and as UTF-8 mode.
      
      It would be better to return the replacement character U+FFFD instead,
      but some of the client code assumes that the UTF-8 length of input and
      output matches.
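
A minimal sketch of the invariant described here: the character accessor returns 0 only at the end of the buffer, and an embedded zero byte is reported and replaced by a space. `current_char` and `consume_all` are hypothetical helpers, not the libxml2 functions, and the error is merely counted instead of being logged.

```c
#include <stddef.h>

/* Return the byte at cur, or 0 only if cur has reached end. An embedded
 * zero byte is counted as an error and translated to a space, which keeps
 * the byte length of input and output identical. */
int current_char(const unsigned char *cur, const unsigned char *end,
                 int *nerrors) {
    if (cur >= end)
        return 0;               /* real end of input */
    if (*cur == 0) {
        (*nerrors)++;           /* stand-in for logging an error */
        return ' ';
    }
    return *cur;
}

/* Consume characters until current_char signals end of input and return
 * how many were consumed. */
size_t consume_all(const unsigned char *buf, size_t len, int *nerrors) {
    const unsigned char *cur = buf;
    const unsigned char *end = buf + len;
    size_t n = 0;

    while (current_char(cur, end, nerrors) != 0) {
        cur++;
        n++;
    }
    return n;
}
```

Because a zero byte never produces a return value of 0, a loop like the one in `consume_all` cannot mistake it for the end of the buffer and stop early.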
    • Rework control flow in htmlCurrentChar · dfd4e330
      Nick Wellnhofer authored
      Don't call xmlCurrentChar after switching encodings. Rearrange code
      blocks and fall through to normal UTF-8 handling.
    • 922bebcc
    • Fix UTF-8 decoder in HTML parser · 1493130e
      Nick Wellnhofer authored
      Reject sequences starting with a continuation byte as well as overlong
      sequences, as the XML parser does.
      
      Also fixes an infinite loop in connection with previous commit 50078922
      since htmlCurrentChar would return 0 even if not at the end of the
      buffer.
      
      Found by OSS-Fuzz.
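
The two rejection rules can be sketched in a stand-alone decoder. This is not the code from the HTML parser; `decode_utf8` is a hypothetical function, and it additionally rejects values above U+10FFFF while leaving surrogates unchecked for brevity.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence from s[0..avail). On success, store the
 * sequence length in *len and return the code point. Return -1 for
 * truncated or malformed input. */
long decode_utf8(const unsigned char *s, size_t avail, size_t *len) {
    unsigned long c;
    size_t n, i;

    if (avail == 0)
        return -1;
    c = s[0];
    if (c < 0x80) {                 /* plain ASCII */
        *len = 1;
        return (long)c;
    }
    if (c < 0xC0)                   /* 10xxxxxx: continuation byte as lead */
        return -1;
    if (c < 0xE0)      { n = 2; c &= 0x1F; }
    else if (c < 0xF0) { n = 3; c &= 0x0F; }
    else if (c < 0xF8) { n = 4; c &= 0x07; }
    else
        return -1;                  /* 0xF8..0xFF never start a sequence */
    if (avail < n)
        return -1;                  /* truncated sequence */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* each trailing byte must be 10xxxxxx */
            return -1;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Overlong: the value would have fit into a shorter sequence. */
    if ((n == 2 && c < 0x80) ||
        (n == 3 && c < 0x800) ||
        (n == 4 && c < 0x10000))
        return -1;
    if (c > 0x10FFFF)
        return -1;

    *len = n;
    return (long)c;
}
```

Returning an error value that is distinct from 0 matters here: the caller can tell malformed input apart from the end of the buffer and still advance past the bad byte.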
  12. 13 Jul, 2020 3 commits
  13. 12 Jul, 2020 7 commits
  14. 11 Jul, 2020 1 commit
  15. 09 Jul, 2020 1 commit
  16. 07 Jul, 2020 1 commit
  17. 06 Jul, 2020 1 commit
    • Limit regexp nesting depth · fc842f6e
      Nick Wellnhofer authored
      Enforce a maximum nesting depth of 50 for regular expressions. Avoids
      stack overflows with deeply nested regexes.
      
      Found by OSS-Fuzz.
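
A depth limit of this kind can be sketched with a toy recursive-descent checker for parenthesised groups. The limit of 50 is taken from the commit message; the `Parser` struct, `parse_expr` and `check_nesting` are hypothetical and far simpler than a real regexp parser.

```c
#include <stddef.h>

#define MAX_DEPTH 50    /* limit named in the commit message */

typedef struct {
    const char *cur;    /* current position in the pattern */
    int depth;          /* number of currently open groups */
    int error;          /* set once on the first error */
} Parser;

/* Parse a sequence of atoms up to ')' or the end of the string. An atom
 * is either a parenthesised group or any other single character. */
static void parse_expr(Parser *p) {
    while (!p->error && *p->cur != '\0' && *p->cur != ')') {
        if (*p->cur == '(') {
            if (p->depth >= MAX_DEPTH) {    /* refuse to recurse deeper */
                p->error = 1;
                return;
            }
            p->cur++;
            p->depth++;
            parse_expr(p);                  /* one stack frame per group */
            p->depth--;
            if (p->error)
                return;
            if (*p->cur != ')') {           /* unterminated group */
                p->error = 1;
                return;
            }
            p->cur++;
        } else {
            p->cur++;
        }
    }
}

/* Return 0 if the pattern is balanced and nested at most MAX_DEPTH deep,
 * -1 otherwise. */
int check_nesting(const char *pattern) {
    Parser p;
    p.cur = pattern;
    p.depth = 0;
    p.error = 0;
    parse_expr(&p);
    if (!p.error && *p.cur != '\0')         /* stray ')' */
        p.error = 1;
    return p.error ? -1 : 0;
}
```

Since each open group costs one recursive call, capping `depth` bounds stack usage regardless of how long the input is.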