Improve handling of very large folders
Right now, very large folders expose two different issues in the localsearch-3
indexer process during first-time indexing:
- We handle folders more or less as a unit, which means we queue up all newly discovered files in memory before they are flushed and converted into TrackerResource and database operations. This is one of two identified sources of unbounded memory growth (dependent on the filesystem layout) in the indexer process.
- Even though folders are processed that way, the operations are not flushed to the database in a single batch. If something goes awry in the middle of this processing (an interruption, a process crash), then due to the ordering of our inserted data we may end up in a situation where a folder is detected as up-to-date and not crawled on the next restart despite missing children. This situation would eventually self-correct, but it would be nice not to hit it in the first place. A small sketch illustrating both issues follows this list.
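For illustration only, here is a minimal, self-contained sketch of that pattern. The names are hypothetical and each `printf` stands in for a TrackerResource/database insert; this is not the actual TrackerFileNotifier code:

```c
/* Illustrative sketch only (hypothetical names, not the actual indexer
 * code): every child of a folder is accumulated in memory before a single
 * flush, and the folder's own data lands in an earlier batch than its
 * children, so memory grows with folder size and an interruption between
 * the two can leave a folder that looks up-to-date but is missing
 * children.  Each printf stands in for a database insert. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char **argv)
{
  const char *dir_path = argc > 1 ? argv[1] : ".";
  DIR *dir = opendir (dir_path);
  struct dirent *ent;
  char **queued = NULL;
  size_t n_queued = 0;

  if (!dir)
    return 1;

  /* The folder itself is inserted (and may be committed) up front... */
  printf ("insert folder %s\n", dir_path);

  /* ...while its children pile up in memory, unbounded. */
  while ((ent = readdir (dir)) != NULL)
    {
      if (strcmp (ent->d_name, ".") == 0 || strcmp (ent->d_name, "..") == 0)
        continue;
      queued = realloc (queued, (n_queued + 1) * sizeof *queued);
      queued[n_queued++] = strdup (ent->d_name);
    }
  closedir (dir);

  /* Only now is the queued data flushed; an interruption before this
   * point leaves the already-inserted folder without its children. */
  for (size_t i = 0; i < n_queued; i++)
    {
      printf ("insert child %s/%s\n", dir_path, queued[i]);
      free (queued[i]);
    }
  free (queued);

  return 0;
}
```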
This MR changes the processing of folders so that:
- Handling of folders in TrackerFileNotifier is split into chunks, integrated with the "high water" mechanism we already have in place to limit memory usage.
- The TrackerResource/RDF data is inserted in an order such that folders will not be seen as up-to-date on the next restart until they have been fully processed (i.e. the last related batch was committed to the database). This is done through a new "finish directory" step, which required changing the folder processing order from a breadth-first approach to a depth-first one, so folders are finished as soon as possible and as little as possible is kept in memory at a time. A sketch of this traversal shape follows the list.
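As a rough illustration of the new shape (again with hypothetical names and `printf` standing in for batch commits, not the actual implementation), a depth-first crawl with a high-water flush and a final "finish directory" step could look like this:

```c
/* Illustrative sketch only: hypothetical names, not the actual
 * TrackerFileNotifier code.  Shows a depth-first crawl where pending
 * entries are flushed in bounded batches (a "high water" limit) and a
 * directory is marked finished only after its last child was flushed. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define HIGH_WATER 256          /* flush once this many entries are queued */

static unsigned int pending = 0;

static void
flush_batch (void)
{
  if (pending == 0)
    return;
  /* Stand-in for building TrackerResources and committing a batch. */
  printf ("commit batch of %u entries\n", pending);
  pending = 0;
}

static void
queue_entry (const char *path)
{
  printf ("queue %s\n", path);
  if (++pending >= HIGH_WATER)
    flush_batch ();             /* keeps memory usage bounded */
}

static void
crawl (const char *dir_path)
{
  DIR *dir = opendir (dir_path);
  struct dirent *ent;
  char child[4096];

  if (!dir)
    return;

  while ((ent = readdir (dir)) != NULL)
    {
      struct stat st;

      if (strcmp (ent->d_name, ".") == 0 || strcmp (ent->d_name, "..") == 0)
        continue;

      snprintf (child, sizeof child, "%s/%s", dir_path, ent->d_name);
      if (stat (child, &st) != 0)
        continue;

      queue_entry (child);

      /* Depth-first: recurse immediately so the subtree can be finished
       * (and dropped from memory) before moving on to siblings. */
      if (S_ISDIR (st.st_mode))
        crawl (child);
    }
  closedir (dir);

  /* "Finish directory": only after everything below has been queued and
   * committed is the folder recorded as fully processed. */
  flush_batch ();
  printf ("finish directory %s\n", dir_path);
}

int
main (int argc, char **argv)
{
  crawl (argc > 1 ? argv[1] : ".");
  flush_batch ();
  return 0;
}
```

The point of this shape is that a folder is only recorded as fully processed after the last batch containing its children has been committed, and at most a high-water threshold of entries is ever queued in memory at once.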
In short, these changes make memory usage flatter, and make the indexer more resilient to crashes and outages.