Better protection against amplification attacks related to XML entities
libxml2 has some protections against amplification attacks related to XML entities (aka "billion laughs"), but they were never implemented in an understandable, systematic way. This leaves entity substitution vulnerable to DoS attacks.

In my opinion, the only way to properly fix the issue is to count the number of input bytes consumed and compare it with the number of bytes that would result from serializing the document after entity substitution. If a certain amplification threshold is exceeded, parsing should stop with an error message. A couple of megabytes of expanded output should always be allowed, regardless of the ratio, to account for use cases like DITA which make heavy use of entities (see #294 for example).

This requires quite a few changes, especially with regard to external and parameter entities. Besides, there are many code paths where entities are substituted:
- internal vs. external entities
- general vs. parameter entities
- entities in content, attribute values, entity values
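To make the byte-accounting idea above concrete, here is a minimal sketch in C. It is not existing libxml2 code: the type `AmplCounter`, the functions `amplAddInput` and `amplAddExpanded`, and both limits (a 4 MiB free allowance and a maximum ratio of 10) are hypothetical names and values chosen only for illustration. Every code path listed above would have to report to the same counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits, chosen for illustration only. */
#define AMPL_FREE_ALLOWANCE ((size_t)4 * 1024 * 1024) /* bytes always allowed */
#define AMPL_MAX_FACTOR     ((size_t)10)              /* max expanded/input ratio */

/* Hypothetical per-parser counter; not part of the libxml2 API. */
typedef struct {
    size_t inputBytes;    /* bytes read from the document and external entities */
    size_t expandedBytes; /* bytes produced by entity substitution */
} AmplCounter;

/* Add two sizes, clamping at SIZE_MAX instead of wrapping around. */
static size_t satAdd(size_t a, size_t b) {
    return (b > SIZE_MAX - a) ? SIZE_MAX : a + b;
}

/* Record bytes consumed from any input source. */
static void amplAddInput(AmplCounter *c, size_t n) {
    c->inputBytes = satAdd(c->inputBytes, n);
}

/*
 * Record bytes produced by substituting an entity.
 * Returns false if the amplification limit is exceeded, in which case
 * the caller should stop parsing with an error.
 */
static bool amplAddExpanded(AmplCounter *c, size_t n) {
    c->expandedBytes = satAdd(c->expandedBytes, n);

    /* Small documents may use entities freely. */
    if (c->expandedBytes <= AMPL_FREE_ALLOWANCE)
        return true;

    /* Divide instead of multiplying so the comparison cannot overflow
     * (integer division makes the limit lenient by a few bytes). */
    return c->expandedBytes / AMPL_MAX_FACTOR <= c->inputBytes;
}
```

In this sketch, a 1 KB document may expand to at most 4 MiB, while a 1 MiB document may expand to roughly 10 MiB. Counting bytes read from external entities as input is one possible design choice here, since those bytes were genuinely supplied rather than generated by expansion.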
A closely related issue is detection of recursive entities. Currently, we simply limit the recursion depth, which is fragile. There are better approaches that detect recursion immediately, similar to this Chromium patch.
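One common way to detect recursion immediately is to mark an entity while it is being expanded and to report a loop as soon as a marked entity is entered again. The sketch below illustrates that general technique only; it is not taken from the Chromium patch or from libxml2, and the `Entity` struct, its fields, and `entityHasLoop` are hypothetical, simplified stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8

/* Hypothetical, simplified entity: references are pre-resolved pointers. */
typedef struct Entity {
    const char *name;
    struct Entity *refs[MAX_REFS]; /* entities referenced by the replacement text */
    size_t nrefs;
    bool expanding; /* set while this entity is being expanded */
    bool checked;   /* set once the entity is known to be loop-free */
} Entity;

/*
 * Returns true if expanding ent would, directly or indirectly,
 * reference ent or any other entity that is currently being expanded.
 */
static bool entityHasLoop(Entity *ent) {
    if (ent->checked)
        return false; /* already verified, skip repeated work */
    if (ent->expanding)
        return true;  /* re-entered during its own expansion: a loop */

    ent->expanding = true;
    bool loop = false;
    for (size_t i = 0; i < ent->nrefs && !loop; i++)
        loop = entityHasLoop(ent->refs[i]);
    ent->expanding = false;

    if (!loop)
        ent->checked = true;
    return loop;
}
```

Unlike a depth limit, this check does not depend on a tuned constant: the loop is reported the moment it closes, and deeply nested but legitimate entity chains are not rejected. The `checked` flag keeps the cost linear in the number of references, which matters for the heavily shared entity graphs typical of "billion laughs" documents.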
I'd also love to rewrite xmlParseReference, one of the core functions handling entity expansion. In its current form, it's almost impenetrable.