Pick a Unicode library implementation
We have two backends: libicu, and libunistring (complemented by enca).
Both have pros and cons:
- libicu
  - pros
    - Good encoding detection; reports a confidence percentage
  - cons
    - Works with UTF-16 internally
    - It's huge
- libunistring
  - pros
    - Works with UTF-8 natively
    - Smaller size
  - cons
    - No encoding detection. Enca is used for this, but its detection is slightly worse and it gives no confidence hint for the result.
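To make the UTF-16 versus UTF-8 point concrete, here is a minimal Python sketch of what a UTF-8 application has to do around a backend that works on UTF-16 buffers. It calls neither library; `via_utf16_backend` is a hypothetical wrapper used only for illustration, and the sample string is an assumption.

```python
def via_utf16_backend(utf8_in: bytes) -> bytes:
    """Hypothetical wrapper: transcode UTF-8 -> UTF-16 -> UTF-8 around a backend call."""
    text = utf8_in.decode("utf-8")
    utf16 = text.encode("utf-16-le")  # the buffer a UTF-16 backend would operate on
    # ... a UTF-16 backend would do its work on `utf16` here ...
    return utf16.decode("utf-16-le").encode("utf-8")


name = "Björk"
utf8 = name.encode("utf-8")
utf16 = name.encode("utf-16-le")

# Mostly-ASCII text is smaller in UTF-8 than in UTF-16.
print(len(utf8), len(utf16))  # prints: 6 10

# The round trip is lossless, but it costs two conversions and an extra buffer.
assert via_utf16_backend(utf8) == utf8
```

With a backend that takes UTF-8 directly, both conversions and the intermediate buffer go away, which is the practical meaning of "works with UTF-8 natively" for a UTF-8 codebase.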
It seems we are down to trading dependency size for encoding detection. Given that the latter is mostly useful for ID3v1 tags and other legacy places where encodings are underdefined, the odds seem to favor libunistring.
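As a rough illustration of why ID3v1 text needs detection at all, the Python sketch below decodes one tag field under several candidate encodings. It uses only Python's built-in codecs, not libicu or enca; the helper name `candidate_decodings`, the candidate list, and the sample title are assumptions made for the example.

```python
def candidate_decodings(raw: bytes, encodings=("utf-8", "latin-1", "cp1251")) -> dict:
    """Try each candidate encoding; map its name to the decoded text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# An ID3v1 title field is 30 bytes, padded with NULs, with no encoding marker.
field = "Мир".encode("cp1251").ljust(30, b"\x00")
title_bytes = field.rstrip(b"\x00")

decoded = candidate_decodings(title_bytes)
for enc, text in decoded.items():
    print(enc, repr(text))
```

Here the bytes are not valid UTF-8, yet both Latin-1 and CP1251 decode them without error, and only one result is the intended title. Choosing between such candidates is the job a detector does, and a confidence value helps decide when to trust its guess.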