Unicode characters within a CSS <style> tag are incorrectly HTML entity encoded
The HTML parser incorrectly entity-encodes Unicode characters inside the CSS of an HTML document. See the xmllint example below:
# echo '<?xml encoding="utf-8" ?><style type="text/css">p:after {content: "Ł"; }</style><p>Hello</p>' | xmllint --html -
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml encoding="utf-8" ?><html>
<head><style type="text/css">p:after {content: "&#321;"; }</style></head>
<body>
<p>Hello</p>
</body>
</html>
Notice that the `Ł` character is encoded as `&#321;`, whereas it should be left alone or encoded as the CSS escape `\0141`. Entities are not decoded inside a `<style>` element, so when the markup is rendered by the browser it displays `Hello&#321;` instead of the expected `HelloŁ`.
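Until this is fixed, one possible workaround is to replace non-ASCII characters in the CSS with CSS escapes before the document reaches the parser, so there is nothing left for it to entity-encode. The sketch below is only an illustration: the helper name `css_escape_non_ascii` is hypothetical and not part of libxml2 or xmllint, and it assumes the CSS text is available as a separate string. It emits six-digit escapes (`\000141`), which refer to the same code point as `\0141` but need no terminating space.

```python
def css_escape_non_ascii(css: str) -> str:
    """Replace every non-ASCII character with a six-digit CSS hex escape.

    Hypothetical helper for illustration; not part of libxml2 or xmllint.
    """
    out = []
    for ch in css:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # A six-digit escape is never ambiguous with following characters.
            out.append("\\{:06X}".format(ord(ch)))
    return "".join(out)


if __name__ == "__main__":
    # U+0141 is decimal 321, which is why the entity is &#321;.
    print(css_escape_non_ascii('p:after {content: "Ł"; }'))
```

The escaped stylesheet contains only ASCII, so it should pass through serialization unchanged.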