Consider replacing GStreamer for media extraction with something simpler
This was discussed in passing at FOSDEM 2023, and now I find myself triaging a hang in tracker-extract during testing, which apparently happens during GStreamer plugin load.
GStreamer is an excellent library, but it is not optimized for the use case of metadata extraction, which leads to major issues.
Pros of using GStreamer for generic media extraction:
- new formats can be handled 'automatically', as long as a suitable plugin is available
- it's already a core dependency of GNOME
- it's based around GObject, just like Tracker itself
Cons of using GStreamer for generic media extraction:
- the tracker-extract-gstreamer module behaves differently depending on which GStreamer plugins are installed and available on the deployed system. This means we can get user bug reports which are very difficult to reproduce, and we cannot ever prove that the tracker-extract-gstreamer module will not hang or busyloop in some situations.
- we cannot prove that the tracker-extract-gstreamer module is safe, as some GStreamer plugins may be unsafe. The tracker-extract SECCOMP sandbox mitigates this to some extent, but is itself a source of bug reports, as SIGSYS can now appear depending on the specifics of a given deployment.
Specific examples of major issues caused by use of GStreamer for media parsing:
Alternatives (some of which are not practical):
- code media parsers for everything ourselves (time consuming)
- TagLib (audio formats only)
- libav (already implemented, but is a difficult dependency partly due to US software patent issues)
- ...
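
To give a rough feel for the "code media parsers ourselves" option, here is a minimal sketch of a hand-written parser for the ID3v1 tag, which is a fixed 128-byte block at the end of an MP3 file. This is only an illustration: the function name `parse_id3v1` is made up for this example, it is written in Python rather than C for brevity, and ID3v1 is about the simplest tag format there is, so real formats would need far more code than this.

```python
import struct
from typing import Optional

ID3V1_SIZE = 128  # an ID3v1 tag is always the last 128 bytes of the file


def _text(raw: bytes) -> str:
    """Decode a fixed-width, NUL-padded ID3v1 text field."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields found at the end of `data`, or None.

    Hypothetical helper for illustration only; not part of Tracker.
    """
    if len(data) < ID3V1_SIZE:
        return None
    block = data[-ID3V1_SIZE:]
    if block[:3] != b"TAG":
        return None  # no ID3v1 tag present
    # Layout: "TAG", title[30], artist[30], album[30], year[4], comment[30], genre[1]
    title, artist, album, year, comment, genre = struct.unpack(
        "<3x30s30s30s4s30sB", block
    )
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre": genre,  # numeric genre index, left uninterpreted here
    }
```

The appeal of this approach is that the parser only ever reads a bounded number of bytes and has no plugin-dependent behaviour, so it is easy to reason about. The cost is that we would need something like this, and usually something much larger, for every format we care about.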