  1. 17 Jun, 2017 1 commit
    • Merge duplicate code paths handling PE references · 03904159
      Nick Wellnhofer authored
      xmlParsePEReference is essentially a subset of
      xmlParserHandlePEReference, so make xmlParserHandlePEReference call
      xmlParsePEReference. The code paths in these functions differed
      slightly, but the code from xmlParserHandlePEReference seems more solid
      and tested.
  2. 16 Jun, 2017 1 commit
    • Fix duplicate SAX callbacks for entity content · 3f0627a1
      David Kilzer authored
      Reset 'was_checked' to prevent the entity from being parsed twice, and SAX
      callbacks from being invoked twice, when XML_PARSE_NOENT is set.
      
      This regressed in version 2.9.3 and caused problems with WebKit.
      
      Fixes bug 760367.
  3. 10 Jun, 2017 3 commits
    • Fix potential infinite loop in xmlStringLenDecodeEntities · fb2f518c
      Nick Wellnhofer authored
      Make sure that xmlParseStringPEReference advances the "str" pointer
      even if the parser was stopped. Otherwise xmlStringLenDecodeEntities
      can loop infinitely.
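
The termination argument can be sketched in isolation. This is not the libxml2 code: `demo_ctxt`, `demo_parse_pe_ref` and `demo_decode` are hypothetical names, and the "reference" syntax is reduced to `%name;`. The only point illustrated is that the helper advances the caller's pointer on every path, including when the parser is stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser state: 'stopped' mimics a parser that has been halted. */
typedef struct {
    int stopped;
} demo_ctxt;

/*
 * Hypothetical stand-in for a reference parser. It consumes "%name;" at *str
 * and ALWAYS advances *str, even if the parser is stopped.
 * Returns 1 if a reference was expanded, 0 otherwise.
 */
static int demo_parse_pe_ref(demo_ctxt *ctxt, const char **str) {
    const char *p = *str;
    int terminated;

    assert(*p == '%');
    p++;                                /* skip '%' */
    while (*p != '\0' && *p != ';')
        p++;                            /* skip the name */
    terminated = (*p == ';');
    if (terminated)
        p++;                            /* skip ';' */

    *str = p;                           /* make progress on every path */

    if (ctxt->stopped || !terminated)
        return 0;                       /* nothing expanded, input consumed */
    return 1;
}

/* Counts expanded references; each iteration advances 'str', so it ends. */
static size_t demo_decode(demo_ctxt *ctxt, const char *str) {
    size_t refs = 0;

    while (*str != '\0') {
        if (*str == '%')
            refs += (size_t) demo_parse_pe_ref(ctxt, &str);
        else
            str++;
    }
    return refs;
}
```

If `demo_parse_pe_ref` returned early on `ctxt->stopped` without writing back `*str`, the `while` loop in `demo_decode` would see the same `%` forever.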
    • Remove useless check in xmlParseAttributeListDecl · 4ba8cc85
      Nick Wellnhofer authored
      Since we already successfully parsed the attribute name and other
      items, it is guaranteed that we made progress in the input stream.
      
      Comparing the input pointer to a previous value also looks fragile to
      me. What if the input buffer was reallocated and the new "cur" pointer
      happens to be the same as the old one? There are a couple of similar
      checks which also take "consumed" into account. This seems to be safer
      but I'm not convinced that it couldn't lead to false alarms in rare
      situations.
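
A rough sketch of why a raw pointer comparison is fragile, and what taking "consumed" into account looks like. The struct and function names below are hypothetical and only loosely modeled on a parser input with `base`, `cur` and `consumed` fields; as the message says, even this form may not rule out false alarms in every situation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state, loosely modeled on a parser input buffer. */
typedef struct {
    const char *base;     /* start of the current buffer */
    const char *cur;      /* current read position inside the buffer */
    size_t consumed;      /* bytes already discarded before 'base' */
} demo_input;

/* Absolute position in the stream, independent of where the buffer lives. */
static size_t demo_position(const demo_input *in) {
    assert(in->cur >= in->base);
    return in->consumed + (size_t) (in->cur - in->base);
}

/* Progress check based on the absolute position instead of the raw pointer. */
static int demo_made_progress(const demo_input *in, size_t old_pos) {
    return demo_position(in) > old_pos;
}
```

If the buffer discards already-read bytes and the parser then reads the same number of new bytes, `cur` can end up at its old address even though the stream position moved on; `demo_made_progress` still reports progress in that case.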
    • Fix memory leak in xmlParseEntityDecl error path · bedbef80
      Nick Wellnhofer authored
      When parsing the entity value, it can happen that an external entity
      with an unsupported encoding is loaded and the parser is stopped. This
      would lead to a memory leak.
      
      A custom SAX callback could also stop the parser.
      
      Found with libFuzzer and ASan.
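
The general shape of such a fix can be sketched as follows. Everything here is hypothetical (`demo_ctxt`, `demo_callback`, `demo_parse_entity_value`, and the `demo_live_allocs` counter that exists only to make the leak observable); it is not the code from xmlParseEntityDecl. The idea is that a value allocated before the parser is stopped must be released on the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int demo_live_allocs = 0;   /* hypothetical counter to observe leaks */

typedef struct {
    int stopped;                   /* set when the parser has been halted */
} demo_ctxt;

/* Hypothetical callback that may stop the parser, like a custom SAX handler. */
typedef void (*demo_callback)(demo_ctxt *ctxt);

static void demo_stop(demo_ctxt *ctxt) { ctxt->stopped = 1; }
static void demo_noop(demo_ctxt *ctxt) { (void) ctxt; }

/* Copies 'raw'; returns NULL (and frees the copy) if the parser was stopped. */
static char *demo_parse_entity_value(demo_ctxt *ctxt, const char *raw,
                                     demo_callback cb) {
    size_t len;
    char *value;

    assert(raw != NULL);
    len = strlen(raw);
    value = malloc(len + 1);
    if (value == NULL)
        return NULL;
    demo_live_allocs++;
    memcpy(value, raw, len + 1);

    cb(ctxt);                      /* may stop the parser mid-way */
    if (ctxt->stopped)
        goto error;
    return value;

error:
    free(value);                   /* without this, 'value' would leak */
    demo_live_allocs--;
    return NULL;
}

/* Releases a value returned by demo_parse_entity_value. */
static void demo_free_value(char *value) {
    if (value != NULL) {
        free(value);
        demo_live_allocs--;
    }
}
```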
  4. 06 Jun, 2017 1 commit
  5. 05 Jun, 2017 1 commit
    • Fix handling of parameter-entity references · e2663054
      Nick Wellnhofer authored
      There were two bugs where parameter-entity references could lead to an
      unexpected change of the input buffer in xmlParseNameComplex and to
      xmlDictLookup being called with an invalid pointer.
      
      Percent sign in DTD Names
      =========================
      
      The NEXTL macro used to call xmlParserHandlePEReference. When parsing
      "complex" names inside the DTD, this could result in entity expansion
      which created a new input buffer. The fix is to simply remove the call
      to xmlParserHandlePEReference from the NEXTL macro. This is safe because
      no users of the macro require expansion of parameter entities.
      
      - xmlParseNameComplex
      - xmlParseNCNameComplex
      - xmlParseNmtoken
      
      The percent sign is not allowed in names, which are grammatical tokens.
      
      - xmlParseEntityValue
      
      Parameter-entity references in entity values are expanded but this
      happens in a separate step in this function.
      
      - xmlParseSystemLiteral
      
      Parameter-entity references are ignored in the system literal.
      
      - xmlParseAttValueComplex
      - xmlParseCharDataComplex
      - xmlParseCommentComplex
      - xmlParsePI
      - xmlParseCDSect
      
      Parameter-entity references are ignored outside the DTD.
      
      - xmlLoadEntityContent
      
      This function is only called from xmlStringLenDecodeEntities and
      entities are replaced in a separate step immediately after the function
      call.
      
      This bug could also be triggered with an internal subset and double
      entity expansion.
      
      This fixes bug 766956 initially reported by Wei Lei and independently by
      Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone
      involved.
      
      xmlParseNameComplex with XML_PARSE_OLD10
      ========================================
      
      When parsing Names inside an expanded parameter entity with the
      XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the
      GROW macro if the input buffer was exhausted. At the end of the
      parameter entity's replacement text, this function would then call
      xmlPopInput which invalidated the input buffer.
      
      There should be no need to invoke GROW in this situation because the
      buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and,
      at least for UTF-8, in xmlCurrentChar. This also matches the code path
      executed when XML_PARSE_OLD10 is not set.
      
      This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050).
      Thanks to Marcel Böhme and Thuan Pham for the report.
      
      Additional hardening
      ====================
      
      A separate check was added in xmlParseNameComplex to validate the
      buffer size.
  6. 01 Jun, 2017 3 commits
    • Avoid reparsing in xmlParseStartTag2 · 855c19ef
      Nick Wellnhofer authored
      The code in xmlParseStartTag2 must handle the case where the input
      buffer is grown and reallocated, which can invalidate pointers to
      attribute values. Before, this was handled by detecting changes of
      the input buffer "base" pointer and, in case of a change, jumping
      back to the beginning of the function and reparsing the start tag.
      
      The major problem of this approach is that whether an input buffer is
      reallocated is nondeterministic, resulting in seemingly random test
      failures. See the mailing list thread "runtest mystery bug: name2.xml
      error case regression test" from 2012, for example.
      
      If a reallocation was detected, the code also made no attempt to
      continue parsing in case of errors, which makes a difference in
      the lax "recover" mode.
      
      Now we store the current input buffer "base" pointer for each (not
      separately allocated) attribute in the namespace URI field, which isn't
      used until later. After the whole start tag was parsed, the pointers
      to the attribute values are reconstructed using the offset between the
      new and the old input buffer. This relies on arithmetic on dangling
      pointers which is technically undefined behavior. But it seems like
      the easiest and most efficient fix and a similar approach is used in
      xmlParserInputGrow.
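
The idea of rebuilding attribute value pointers after the buffer has moved can be sketched as below. Note the deliberate difference: the commit stores the old "base" pointer and later applies the pointer difference, whereas this sketch records offsets relative to the base, a variation chosen here only because it stays within defined behavior. All names (`demo_buf`, `demo_attr`, `demo_grow`, `demo_attr_value`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer. */
typedef struct {
    char *base;
    size_t size;
} demo_buf;

/* Hypothetical attribute value, stored as an offset instead of a pointer. */
typedef struct {
    size_t start;   /* offset of the value relative to buf->base */
    size_t len;     /* length of the value in bytes */
} demo_attr;

/* Grows the buffer; the block may move, which would invalidate raw pointers. */
static int demo_grow(demo_buf *buf, size_t new_size) {
    char *p = realloc(buf->base, new_size);
    if (p == NULL)
        return -1;
    buf->base = p;
    buf->size = new_size;
    return 0;
}

/* Reconstructs a usable pointer from the current base and the stored offset. */
static const char *demo_attr_value(const demo_buf *buf, const demo_attr *attr) {
    assert(attr->start + attr->len <= buf->size);
    return buf->base + attr->start;
}
```

Pointers are only materialized once the whole start tag has been read, so no reparsing is needed however often the buffer moved in between.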
      
      This changes the error output of several tests, typically making it
      more verbose because we try harder to continue parsing in case of
      errors.
      
      (Another possible solution is to check not only the "base" pointer
      but the size of the input buffer as well. But this would result in
      even more reparsing.)
    • Simplify control flow in xmlParseStartTag2 · 07b7428b
      Nick Wellnhofer authored
      Remove some goto labels and deduplicate a bit of code after handling
      namespaces.
      
      Before:
      
          loop {
              parseAttribute
              if (ok) {
                  if (defaultNamespace) {
                      handleDefaultNamespace
                      if (error)
                          goto skip_default_ns;
                      handleDefaultNamespace
          skip_default_ns:
                      freeAttr
                      nextAttr
                      continue;
                  }
                  if (namespace) {
                      handleNamespace
                      if (error)
                          goto skip_ns;
                      handleNamespace
          skip_ns:
                      freeAttr
                      nextAttr
                      continue;
                  }
                  handleAttr
              } else {
                  freeAttr
              }
              nextAttr
          }
      
      After:
      
          loop {
              parseAttribute
              if (!ok)
                  goto next_attr;
              if (defaultNamespace) {
                  handleDefaultNamespace
                  if (error)
                      goto next_attr;
                  handleDefaultNamespace
              } else if (namespace) {
                  handleNamespace
                  if (error)
                      goto next_attr;
                  handleNamespace
              } else {
                  handleAttr
              }
          next_attr:
              freeAttr
              nextAttr
          }
    • Avoid spurious UBSan errors in parser.c · 47496724
      Nick Wellnhofer authored
      If available, use a C99 flexible array member to avoid spurious UBSan
      errors.
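
For readers unfamiliar with the construct, here is a minimal sketch of a C99 flexible array member. The struct and helper are hypothetical and unrelated to the actual structure changed in parser.c. The assumption illustrated is the usual one: a trailing array declared with a fixed bound but indexed past it can trip bounds checks, while a trailing `data[]` has no declared bound to exceed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record using a C99 flexible array member. */
typedef struct {
    size_t len;
    unsigned char data[];   /* storage is allocated together with the struct */
} demo_chunk;

/* Allocates the header and 'len' payload bytes in one block. */
static demo_chunk *demo_chunk_new(const void *src, size_t len) {
    demo_chunk *c;

    assert(src != NULL || len == 0);
    c = malloc(sizeof(*c) + len);
    if (c == NULL)
        return NULL;
    c->len = len;
    if (len > 0)
        memcpy(c->data, src, len);
    return c;
}
```

A chunk created with `demo_chunk_new("hello", 5)` is released with a single `free`.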
  7. 27 May, 2017 1 commit
    • Fix memory leak in parser error path · 8627e4ed
      Nick Wellnhofer authored
      Triggered in mixed content ELEMENT declarations if there's an invalid
      name after the first valid name:
      
          <!ELEMENT para (#PCDATA|a|<invalid>)*>
      
      Found with libFuzzer and ASan.
  8. 07 Apr, 2017 2 commits
  9. 23 May, 2016 7 commits
  10. 22 May, 2016 1 commit
  11. 15 Apr, 2016 1 commit
  12. 08 Apr, 2016 1 commit
    • Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8... · 4f8606c1
      David Kilzer authored
      Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183>
      
      * parser.c:
      (xmlCheckCdataPush): Add 'complete' argument to describe whether
      the buffer passed in is the whole CDATA buffer, or if there is
      more data to parse.  If there is more data to parse, don't
      return a negative value for an invalid multi-byte UTF-8
      character that is split between buffers.
      (xmlParseTryOrFinish): Pass 'complete' argument to
      xmlCheckCdataPush() as appropriate.
      
      * result/cdata-2-byte-UTF-8.xml: Added.
      * result/cdata-2-byte-UTF-8.xml.rde: Added.
      * result/cdata-2-byte-UTF-8.xml.rdr: Added.
      * result/cdata-2-byte-UTF-8.xml.sax: Added.
      * result/cdata-2-byte-UTF-8.xml.sax2: Added.
      * result/cdata-3-byte-UTF-8.xml: Added.
      * result/cdata-3-byte-UTF-8.xml.rde: Added.
      * result/cdata-3-byte-UTF-8.xml.rdr: Added.
      * result/cdata-3-byte-UTF-8.xml.sax: Added.
      * result/cdata-3-byte-UTF-8.xml.sax2...
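
The effect of a 'complete' argument like the one added to xmlCheckCdataPush can be sketched with a simplified validator. This is not the libxml2 function: `demo_check_utf8` is a hypothetical name, its return convention is an assumption made for this sketch, and it checks only lead and continuation byte patterns (no overlong or surrogate checks).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the number of leading bytes that form complete sequences, or -1 if
 * the data is malformed. When 'complete' is 0, a multi-byte sequence cut off
 * by the end of the buffer is not an error: the offset where it starts is
 * returned so the caller can retry once more data has arrived.
 */
static int demo_check_utf8(const unsigned char *buf, int len, int complete) {
    int ix = 0;
    int k;

    assert(len >= 0);
    while (ix < len) {
        unsigned char c = buf[ix];
        int need;

        if (c < 0x80)
            need = 1;
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return -1;              /* stray continuation or invalid lead */

        if (ix + need > len) {
            /* Sequence is split: the bytes we do have must still be valid. */
            for (k = ix + 1; k < len; k++)
                if ((buf[k] & 0xC0) != 0x80)
                    return -1;
            return complete ? -1 : ix;
        }
        for (k = 1; k < need; k++)
            if ((buf[ix + k] & 0xC0) != 0x80)
                return -1;
        ix += need;
    }
    return ix;
}
```

With the two-byte character U+00E9 (bytes 0xC3 0xA9) split after its first byte, the call with `complete == 0` reports only the bytes before it as consumable, while the same input with `complete == 1` is rejected.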
  13. 09 Feb, 2016 1 commit
  14. 20 Nov, 2015 6 commits
  15. 09 Nov, 2015 2 commits
  16. 03 Nov, 2015 1 commit
  17. 27 Oct, 2015 1 commit
  18. 23 Oct, 2015 1 commit
  19. 30 Sep, 2015 1 commit
  20. 18 Sep, 2015 1 commit
  21. 15 Sep, 2015 1 commit
  22. 29 Jun, 2015 2 commits