- 04 Mar, 2021 3 commits
-
-
Nick Wellnhofer authored
-
Nick Wellnhofer authored
Switch to binary search.
-
Nick Wellnhofer authored
Switch to binary search. This is the first time bsearch is used in the libxml2 code base. But it's a standard library function since C89 and should be portable.
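As an illustration of the C89 `bsearch` call mentioned above, here is a minimal sketch of a lookup over a sorted string table. The table contents and the names `tagNames`, `compareTagName`, and `lookupTag` are hypothetical and are not taken from the libxml2 sources; the only requirement carried over is that the array must already be sorted in the order the comparator defines.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table; it must be sorted in strcmp order for bsearch to work. */
static const char *const tagNames[] = {
    "a", "body", "div", "head", "html", "p", "span"
};

/* bsearch comparator: key is a C string, elem points at one table entry. */
static int
compareTagName(const void *key, const void *elem) {
    const char *name = (const char *) key;
    const char *const *entry = (const char *const *) elem;

    return strcmp(name, *entry);
}

/* Returns the index of name in tagNames, or -1 if it is not present. */
int
lookupTag(const char *name) {
    const char *const *found = bsearch(name, tagNames,
                                       sizeof(tagNames) / sizeof(tagNames[0]),
                                       sizeof(tagNames[0]), compareTagName);

    return found == NULL ? -1 : (int) (found - tagNames);
}
```

A linear scan over the same table would cost O(n) comparisons per lookup; the binary search needs O(log n), at the price of keeping the table sorted.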
-
- 02 Mar, 2021 2 commits
-
-
Nick Wellnhofer authored
-
Nick Wellnhofer authored
I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing. Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak. Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now. Found by OSS-Fuzz.
-
- 01 Mar, 2021 2 commits
-
-
Nick Wellnhofer authored
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for several hundred hours without hitting the 20s timeout. It seems that most timeouts resulting from accidentally quadratic behavior in the HTML parser have been fixed. Start to gradually reduce the timeout to find new performance issues.
-
Nick Wellnhofer authored
Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList. This fixes quadratic behavior in
  - xmlNodeGetBase
  - xmlNodeGetLang
  - xmlNodeGetSpacePreserve
Found by OSS-Fuzz.
-
- 22 Feb, 2021 8 commits
-
-
Nick Wellnhofer authored
Only run the following tests by default:
  - gcc
  - clang:asan
  - cmake:mingw:w64-x86_64:shared
  - cmake:msvc:v141:x64:shared
-
Nick Wellnhofer authored
  - Add more calls to xmlInitializeCatalog.
  - Call xmlResetLastError after fuzzing each input.
-
Nick Wellnhofer authored
-
Markus Rickert authored
-
Nick Wellnhofer authored
xmlInitializeCatalog is not called from xmlInitParser.
-
Nick Wellnhofer authored
This reverts commit de1b51ed.
-
Nick Wellnhofer authored
-
Nick Wellnhofer authored
Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.
-
- 21 Feb, 2021 1 commit
-
-
Markus Rickert authored
-
- 20 Feb, 2021 3 commits
-
-
Nick Wellnhofer authored
Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior.

The code in htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all.

Also leave a note that switching encodings twice seems impossible to implement correctly.

Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8.

Found by OSS-Fuzz.
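To make concrete what a plain memcpy skips, the following is a minimal, independent sketch of a UTF-8 well-formedness check. It is not the check added to htmlCurrentChar and does not mirror libxml2's internals; the function name `isValidUtf8` is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int
isValidUtf8(const unsigned char *buf, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;
        unsigned long cp, min;

        if (c < 0x80) {                     /* single-byte ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 2-byte sequence */
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* 3-byte sequence */
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* 4-byte sequence */
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return 0;   /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];

            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;   /* out of range or surrogate */

        i += extra + 1;
    }
    return 1;
}
```

A pass-through handler that only copies bytes gives the caller none of these guarantees, which is why the consumer has to tolerate invalid sequences itself.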
-
hhb authored
-
Simon Josefsson authored
Closes #219.
-
- 09 Feb, 2021 1 commit
-
-
Nick Wellnhofer authored
Found by OSS-Fuzz.
-
- 08 Feb, 2021 4 commits
-
-
Nick Wellnhofer authored
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
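The rule from section 4.6 can be sketched as a small validation function. This is an illustrative sketch only, not the code added to libxml2: the names `parseCharRef` and `isValidPredefinedRedecl` are hypothetical, and the sketch assumes the check runs on the entity's replacement text (so `&#60;` rather than the doubly escaped literal `&#38;#60;`). It also relies on section 4.6 allowing gt, apos, and quot to be declared either as the single character or as a character reference to it.

```c
#include <stddef.h>
#include <string.h>

/* Parses s as exactly one character reference such as "&#60;" or "&#x3C;".
 * Returns the code point, or -1 if s is not a single character reference. */
static long
parseCharRef(const char *s) {
    long value = 0;
    int base = 10;
    int digits = 0;

    if (s[0] != '&' || s[1] != '#')
        return -1;
    s += 2;
    if (*s == 'x') {
        base = 16;
        s++;
    }
    for (; *s != ';'; s++) {
        int d;

        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (base == 16 && *s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (base == 16 && *s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            return -1;      /* also catches a missing ';' */
        value = value * base + d;
        if (value > 0x10FFFF)
            return -1;
        digits++;
    }
    if (digits == 0 || s[1] != '\0')
        return -1;          /* empty reference or trailing text */
    return value;
}

/* Returns 1 if declaring entity `name` with replacement text `content`
 * is acceptable under XML 1.0 section 4.6, 0 otherwise. Names that are
 * not predefined are not restricted by this check. */
int
isValidPredefinedRedecl(const char *name, const char *content) {
    static const struct {
        const char *name;
        char ch;
        int refRequired;    /* lt and amp must use a character reference */
    } predefined[] = {
        { "lt", '<', 1 }, { "amp", '&', 1 }, { "gt", '>', 0 },
        { "apos", '\'', 0 }, { "quot", '"', 0 }
    };
    size_t i;

    for (i = 0; i < sizeof(predefined) / sizeof(predefined[0]); i++) {
        if (strcmp(name, predefined[i].name) != 0)
            continue;
        if (parseCharRef(content) == predefined[i].ch)
            return 1;
        return !predefined[i].refRequired &&
               content[0] == predefined[i].ch && content[1] == '\0';
    }
    return 1;
}
```

Under these assumptions, a replacement text of `<` for lt is rejected while `&#60;` and `&#x3C;` are accepted, and gt may be declared either as `>` or as `&#62;`.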
-
SVGAnimate authored
A bug related to PHP's DOMDocument: https://bugs.php.net/bug.php?id=80665 When copying or cloning an HTML document, xmlDoc->type changes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
-
Markus Rickert authored
-
Mike Dalessio authored
Fixes #200
Also see discussions at:
  - #192
  - nwellnhof/libxml2@99bda1e1
  - https://github.com/sparklemotion/nokogiri/issues/2132
-
- 07 Feb, 2021 3 commits
-
-
Nick Wellnhofer authored
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.
-
Nick Wellnhofer authored
The encoding string is unused. Encodings are set by way of the output buffer.
-
Nick Wellnhofer authored
Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
-
- 03 Feb, 2021 1 commit
-
-
Nick Wellnhofer authored
Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.
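The commit message does not describe how the optimized concatenation works, so the following is only a generic sketch of one common way to avoid quadratic cost when text arrives in many small chunks: track length and capacity separately and grow the capacity geometrically. The `TextBuf` type and `textBufAppend` function are hypothetical and do not correspond to libxml2's internal data structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; start with { NULL, 0, 0 }. */
typedef struct {
    char *data;     /* NUL-terminated once anything has been appended */
    size_t len;     /* bytes in use, excluding the terminator */
    size_t cap;     /* bytes allocated */
} TextBuf;

/* Appends chunkLen bytes to buf. Returns 0 on success, -1 on overflow or
 * allocation failure; on failure the existing contents remain valid. */
int
textBufAppend(TextBuf *buf, const char *chunk, size_t chunkLen) {
    size_t needed;

    if (chunkLen > SIZE_MAX - buf->len - 1)
        return -1;                      /* size computation would overflow */
    needed = buf->len + chunkLen + 1;

    if (needed > buf->cap) {
        size_t newCap = buf->cap ? buf->cap : 16;
        char *tmp;

        /* Double the capacity so repeated appends cost amortized O(1). */
        while (newCap < needed) {
            if (newCap > SIZE_MAX / 2) {
                newCap = needed;
                break;
            }
            newCap *= 2;
        }
        tmp = realloc(buf->data, newCap);
        if (tmp == NULL)
            return -1;
        buf->data = tmp;
        buf->cap = newCap;
    }

    memcpy(buf->data + buf->len, chunk, chunkLen);
    buf->len += chunkLen;
    buf->data[buf->len] = '\0';
    return 0;
}
```

Because the stored length is reused, each append copies only the new chunk; recomputing the length and reallocating to the exact size on every append would instead copy the whole accumulated text each time.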
-
- 15 Jan, 2021 1 commit
-
-
Markus Rickert authored
-
- 05 Jan, 2021 7 commits
-
-
Markus Rickert authored
-
Markus Rickert authored
-
Markus Rickert authored
-
Markus Rickert authored
-
Markus Rickert authored
-
Markus Rickert authored
-
Markus Rickert authored
-
- 18 Dec, 2020 4 commits
-
-
Nick Wellnhofer authored
Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
-
Nick Wellnhofer authored
Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
-
Nick Wellnhofer authored
Check for malloc failure to avoid null deref. Found with libFuzzer.
-
Nick Wellnhofer authored
Avoid misdiagnosis in OOM situations.
-