tracker-miner-fs brings system to its knees when indexing - scope for deduplicating hardlinked files?
(I was directed to log this upstream after logging as https://bugs.launchpad.net/ubuntu/+source/tracker-miners/+bug/1826078 )
Since upgrading from Ubuntu 18.10 to 19.04, I've noticed that tracker-miner-fs has been going to town trying to index my filesystem. So much so that I've had to switch off all the individual search options, then switch off search altogether, and also run tracker reset -r, as it seems that just turning off search in the Settings -> Search titlebar doesn't always take effect.
Within my home directory, and in other parts of my filesystem, I have some folders that hold date-stamped rsync backups of remote systems. Identical files are hard-linked between the snapshot directories, so the data is stored only once but is listed in every directory for a date on which the file existed (each folder is a ready-to-go "point in time" snapshot). One such directory tree has over 4 million files in it.
I believe that tracker-miner-fs treats each directory entry as an independent copy because the entries live in separate directories, even though they are hard links to the same file. It then processes the same data again and again, consuming the majority of the IO bandwidth on my drives and growing so large in memory that the system starts to swap.
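To illustrate what "hard-linked" means for an indexer: two directory entries that are hard links to the same file report the same device and inode numbers from stat, while a genuine copy does not. The Python sketch below only demonstrates that property. The file names are invented for the demo, and nothing here reflects how tracker-miner-fs is actually implemented.

```python
import os
import tempfile


def same_file(path_a, path_b):
    """Return True if both paths point at the same inode on the same device."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    """Compare a hard link and a real copy of the same content."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names standing in for two dated snapshots.
        first = os.path.join(root, "2019-04-01.txt")
        second = os.path.join(root, "2019-04-02.txt")
        copy = os.path.join(root, "copy.txt")

        with open(first, "w") as f:
            f.write("same data\n")
        os.link(first, second)          # second name for the same inode
        with open(copy, "w") as f:      # separate inode, identical content
            f.write("same data\n")

        return (
            same_file(first, second),   # hard link: same inode
            same_file(first, copy),     # copy: different inode
            os.stat(first).st_nlink,    # number of names for the inode
        )
```

A link count (st_nlink) greater than 1 is a cheap hint that a file may already have been seen under another name.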
There is a settings UI for GNOME Tracker, but it's not very granular: it has options for whether locations like "Documents" are included, but not for individual folders below them.
Where this enters bug territory is that this behaviour makes the system unusable due to the IO load, with no visible indication of what's going on, and no immediate response to toggling off search functionality.
The fixes might be closer to feature suggestions. Perhaps there is scope for:
- A ".nomedia" or similar marker file, like the one Android uses, to exclude a given directory and its children from indexing
- Tracking of indexed inodes so it can skip files it already knows about
- A visible indication in the notification area that indexing is taking place, with the ability to stop it, perhaps shown when it exceeds a certain IO utilisation?
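The first two suggestions could be sketched roughly as follows. This is a standalone Python illustration, not Tracker code: the function name files_to_index, the marker name ".nomedia" and the in-memory set of seen inodes are all assumptions made for the example. The walk skips any directory that contains the marker file, and yields each regular file only the first time its (device, inode) pair is encountered.

```python
import os
import stat
import tempfile

MARKER = ".nomedia"  # hypothetical marker file name, borrowed from Android


def files_to_index(root, marker=MARKER):
    """Yield one path per unique inode, skipping directories that hold the marker."""
    seen = set()  # (st_dev, st_ino) pairs already yielded
    for dirpath, dirnames, filenames in os.walk(root):
        if marker in filenames:
            dirnames[:] = []   # prune: do not descend into children
            continue           # and skip this directory's own files
        dirnames.sort()        # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue       # file vanished or is unreadable
            if not stat.S_ISREG(st.st_mode):
                continue       # ignore symlinks, sockets, devices
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue       # hard link to a file already yielded
            seen.add(key)
            yield path


def demo():
    """Two snapshots sharing one hard-linked file, plus an excluded directory."""
    with tempfile.TemporaryDirectory() as root:
        for d in ("snap1", "snap2", "excluded"):
            os.mkdir(os.path.join(root, d))
        with open(os.path.join(root, "snap1", "a.txt"), "w") as f:
            f.write("unchanged\n")
        os.link(os.path.join(root, "snap1", "a.txt"),
                os.path.join(root, "snap2", "a.txt"))
        with open(os.path.join(root, "snap2", "b.txt"), "w") as f:
            f.write("new\n")
        open(os.path.join(root, "excluded", MARKER), "w").close()
        with open(os.path.join(root, "excluded", "c.txt"), "w") as f:
            f.write("ignored\n")
        return [os.path.relpath(p, root) for p in files_to_index(root)]
```

In this demo only snap1/a.txt and snap2/b.txt are returned: the second name for a.txt and everything under the marked directory are skipped. A real indexer would presumably need to persist the seen-inode information alongside its index rather than hold it in memory, and decide what to record for the other paths that lead to the same file.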