src/libtracker-miner/tracker-sparql-buffer.c · 760cc6b231f92d574ec9a221f8abbbb900355064 · GNOME / tracker-miners

libtracker-miner: Gather as many SPARQL updates as possible for every batch · 760cc6b2

Carlos Garnacho authored Jul 05, 2020

We currently block the processing queue if the parent of the file being
processed is seen in any stage of processing. The situation is unblocked
by flushing early, so that processing can resume after the SPARQL updates
are performed.

This may lead to suboptimal buffer occupation, ultimately dependent on
the filesystem layout.

To improve this situation, rely on blank node labels being stable across
the whole SPARQL update string, and add a blank node labeling scheme that
allows files within the same SPARQL batch to reference each other through
these blank node labels instead of IRIs.
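
As a rough illustration of such a labeling scheme (not the code in this
commit), the sketch below derives a deterministic blank node label from a
file URI, so that every statement in the same update string that mentions
the file ends up with the same label. The helper names file_bnode_label()
and parent_reference(), the "_:file" prefix and the choice of FNV-1a as the
hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive a blank node label from a file URI.
 * The same URI always yields the same label, so two files in one
 * batch can refer to each other without knowing their final IRIs.
 * FNV-1a (64-bit) is used purely for illustration. */
static void
file_bnode_label (const char *uri, char *out, size_t out_len)
{
	uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */
	const unsigned char *p;

	for (p = (const unsigned char *) uri; *p != '\0'; p++) {
		hash ^= *p;
		hash *= 1099511628211ULL; /* FNV prime */
	}

	/* "_:" introduces a blank node label in SPARQL */
	snprintf (out, out_len, "_:file%016llx", (unsigned long long) hash);
}

/* Hypothetical helper: pick the term a child uses to point at its
 * parent. A parent queued in the same batch is referenced by label,
 * a parent that is already stored is referenced by IRI. */
static void
parent_reference (const char *parent_uri, int parent_in_batch,
                  char *out, size_t out_len)
{
	if (parent_in_batch)
		file_bnode_label (parent_uri, out, out_len);
	else
		snprintf (out, out_len, "<%s>", parent_uri);
}
```

Hashing the URI keeps the label a pure function of the file, so no lookup
table of assigned labels is needed while the batch is being assembled.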

This allows maximum buffer occupation regardless of the filesystem layout.
We still have to wait after a SPARQL update if a file being processed
references (i.e. through a child/parent relationship) another file added
in the SPARQL update currently in progress, but that happens once per
batch instead of once per folder.
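
The decision described above can be sketched as a small state check. The
enum and function names here are hypothetical and do not come from the
commit; the sketch only assumes that a parent is in one of three
situations: not pending in any batch, queued in the batch being assembled,
or part of the update currently being executed.

```c
#include <assert.h>

/* Hypothetical states of a parent file relative to the buffer */
typedef enum {
	PARENT_NOT_PENDING, /* not in any pending batch, reference it by IRI */
	PARENT_QUEUED,      /* in the batch still being assembled */
	PARENT_FLUSHING     /* in the SPARQL update currently being executed */
} ParentState;

/* Hypothetical actions for the child being processed */
typedef enum {
	CHILD_USE_IRI,   /* refer to the parent by IRI */
	CHILD_USE_BNODE, /* refer to the parent by its blank node label */
	CHILD_WAIT       /* hold the child until the ongoing update finishes */
} ChildAction;

static ChildAction
decide_child_action (ParentState parent_state)
{
	switch (parent_state) {
	case PARENT_QUEUED:
		/* Same update string: the label resolves to the parent */
		return CHILD_USE_BNODE;
	case PARENT_FLUSHING:
		/* The label belongs to another update string, so wait */
		return CHILD_WAIT;
	case PARENT_NOT_PENDING:
	default:
		return CHILD_USE_IRI;
	}
}
```

Only the PARENT_FLUSHING case stalls the queue, which is why the wait
happens at most once per batch rather than once per folder.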
