reinvents the man wheel, and does it badly
Submitted by Colin Watson
Link to original bug (#477788)
Description
Hi, I'm the upstream maintainer of man-db (one of the two major implementations of /usr/bin/man et al. on Linux) and the Debian maintainer of man-db and groff. Occasionally I get questions about why Yelp renders such-and-such a manual page badly. Rather than using groff or even just man to do the job, Yelp implements a complete manual page parser itself.
This is a fundamental design error. *roff is a full typesetting language and manual pages are fully entitled to use just about every bit of it if they so choose. I'm sure Yelp's parser works to some reasonable extent, but you are doomed to forever having to tack extra bits and pieces onto it every time somebody uses something new (bug 349677 was the case I came across recently). Not using groff (or troff if groff isn't available) is a mistake. I realise you want to have formatting appropriate to your frontend, but there are better ways to do that; pinfo and w3mman both do this by parsing the output (w3mman even manages to implement cross-references to other manual pages!), and as a result they do a much better job than Yelp. Admittedly they're text-based, but the same approach should work just as well in a graphical frontend.
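To make "parsing the output" a little more concrete, here is a rough sketch in Python (chosen purely for illustration). It assumes the formatter emits the traditional terminal overstrike convention, where bold is a character struck over itself with a backspace and underline is an underscore followed by a backspace and the character; output using SGR escape sequences would need different handling. All the function names and the cross-reference pattern are made up for this example and are not taken from pinfo, w3mman, or Yelp.

```python
import html
import re
from itertools import groupby

# Hypothetical pattern for cross-references such as ls(1) or printf(3).
XREF = re.compile(r"([A-Za-z0-9_.+-]+)\(([1-9][A-Za-z0-9]*)\)")


def decode_overstrike(text):
    """Return a list of (char, style) cells from overstruck formatter output."""
    cells = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\b":
            if cells and i + 1 < len(text):
                prev, style = cells[-1]
                new = text[i + 1]
                if prev == new:
                    style = "bold"        # X, backspace, X
                elif prev == "_":
                    style = "underline"   # _, backspace, X
                cells[-1] = (new, style)
                i += 2
            else:
                i += 1  # stray backspace with nothing to strike over: drop it
            continue
        cells.append((ch, "plain"))
        i += 1
    return cells


def to_runs(cells):
    """Merge adjacent cells that share a style into (text, style) runs."""
    return [("".join(ch for ch, _ in group), style)
            for style, group in groupby(cells, key=lambda cell: cell[1])]


def runs_to_html(runs):
    """Render runs as a simple HTML fragment; a GUI could map styles differently."""
    tags = {"bold": "b", "underline": "i"}
    parts = []
    for text, style in runs:
        escaped = html.escape(text)
        tag = tags.get(style)
        parts.append(f"<{tag}>{escaped}</{tag}>" if tag else escaped)
    return "".join(parts)


def find_xrefs(plain_text):
    """Return (name, section) pairs for things that look like page references."""
    return XREF.findall(plain_text)
```

For instance, runs_to_html(to_runs(decode_overstrike("N\bNA\bAM\bME\bE"))) gives "<b>NAME</b>". This sketch does not handle a character that is both bold and underlined, which a real frontend would need to cope with.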
Aside from the details of rendering the pages, Yelp (and librarian, but I have to file the bug somewhere) compounds its errors by reinventing man too. man is not as simple as it looks; I've been maintaining man-db for six years so I know what I'm talking about here. Different systems have different weird and wonderful compression schemes (not all of which you successfully handle). The encoding of manual pages is a nasty swamp that is handled differently on different systems; I guarantee that the current code will break as soon as Debian starts supporting UTF-8 manual pages properly, which is going to happen soon (http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2007-09-17-man-db-encodings.html). I'm told that Red Hat has already moved over to UTF-8 manual pages, I think in a somewhat different way, so Yelp's big list of encodings is probably already broken there (bug 473040 confirms my suspicion).
All this would be avoided if you just asked man to render pages for you and postprocessed the output. Yes, I suspect you'd have to do a bit of work to cope with the idiosyncrasies of different man implementations, but this pales in comparison to the horribleness of trying to reinvent the whole stack.
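As a minimal sketch of what "just ask man" might look like from a frontend, again in Python for illustration only: the function name and its run parameter are hypothetical, and it assumes the installed man honours the MANWIDTH and MANPAGER environment variables (man-db does; other implementations may need different handling). The point is that locating the page, decompressing it, and recoding it to the user's locale are all left to man.

```python
import locale
import os
import subprocess


def render_with_man(name, section=None, width=80, run=subprocess.run):
    """Ask man to format a page and return its output as text.

    `run` is injectable only so the function can be exercised without
    a real man binary; by default it is subprocess.run.
    """
    cmd = ["man"]
    if section:
        cmd.append(str(section))
    cmd.append(name)

    # Assumption: MANWIDTH sets the line length and MANPAGER=cat avoids
    # any interactive pager. Stdout is a pipe here, not a terminal.
    env = dict(os.environ, MANWIDTH=str(width), MANPAGER="cat")
    result = run(cmd, env=env, stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE, check=False)
    if result.returncode != 0:
        raise LookupError(result.stderr.decode(errors="replace").strip())

    # man is expected to emit text in the caller's locale encoding,
    # so decode with that rather than guessing per page.
    encoding = locale.getpreferredencoding(False)
    return result.stdout.decode(encoding, errors="replace")
```

A caller would then do something like render_with_man("ls", 1) and postprocess the returned text for display.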
Thanks for your consideration.