Perform DB resource refcounting in code (GNOME / tracker !366)

Carlos Garnacho requested to merge wip/carlosg/refcount-in-code into master on Dec 31, 2020

From the last commit:

Triggers take a performance toll; managing the refcount manually fares
a bit better. There are several reasons for this:

- Adding triggers by the hundreds, as we do, has a cost of its own: e.g.
  adding dumb "SELECT 1" triggers vs. not adding them at all still has a
  visible effect.
- The updates in the triggers are rather dumb, e.g. they execute for a
  property on every insertion, even though that property might be null.
  These queries could be avoided entirely.
- Managing refcounts manually means we can coalesce many references to
  the same resource (e.g. rdf:type relations) into a single update.
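The coalescing point can be illustrated with a small, self-contained C sketch. It is not Tracker's code: the `RefDeltaTable` type and the `ref_delta_add()` / `ref_delta_count_updates()` helpers are hypothetical. Each reference an update produces is first accumulated per resource ID, so many references to the same resource (e.g. repeated rdf:type values) collapse into one net refcount change. That means one UPDATE per resource instead of one trigger-driven UPDATE per reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator (not Tracker's code): one entry per distinct
 * resource ID touched by the current update. */
#define MAX_DELTAS 64

typedef struct {
    int64_t id;     /* ROWID of the referenced resource */
    int     delta;  /* net refcount change collected so far */
} RefDelta;

typedef struct {
    RefDelta entries[MAX_DELTAS];
    size_t   n_entries;
} RefDeltaTable;

/* Record one reference change; repeated IDs are merged into one entry. */
static void
ref_delta_add (RefDeltaTable *table, int64_t id, int delta)
{
    for (size_t i = 0; i < table->n_entries; i++) {
        if (table->entries[i].id == id) {
            table->entries[i].delta += delta;
            return;
        }
    }

    assert (table->n_entries < MAX_DELTAS);
    table->entries[table->n_entries].id = id;
    table->entries[table->n_entries].delta = delta;
    table->n_entries++;
}

/* Number of refcount UPDATEs a flush would issue: one per resource whose
 * net delta is non-zero, however many references were recorded. */
static size_t
ref_delta_count_updates (const RefDeltaTable *table)
{
    size_t n = 0;

    for (size_t i = 0; i < table->n_entries; i++) {
        if (table->entries[i].delta != 0)
            n++;
    }

    return n;
}
```

For example, recording three references to resource 42 and one to resource 7 leaves two entries, and a flush would issue two updates rather than four. The linear scan only keeps the sketch short. With thousands of resources per batch, a hash table keyed by ID (e.g. GLib's GHashTable) would be the more natural choice.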

Do this refcount maintenance in code. In order to stay ABI compatible
and (fingers crossed) avoid DB refcount bugs in the future, the rules
are the same as before:

- Each row in a class table gets a refcount
- Each value in an rdfs:Resource property adds a reference to the
  resource being pointed to.
- In addition, multivalued rdfs:Resource properties also add one
  reference per value to the resource holding the property.
- Not counted: domainIndex properties copied over from superclasses.
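The rules above can be sketched in C as a function that lists the refcount changes caused by inserting one subject. This is an illustration under stated assumptions, not Tracker's implementation. `PropertyValue`, `RefChange` and `collect_refcount_changes()` are hypothetical names. "Each row in a class table gets a refcount" is read as "each class-table row adds one reference to the subject". The last rule is read as "domainIndex copies add no references".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one property value being inserted. */
typedef struct {
    int64_t object_id;          /* ROWID of the value, if it is a resource */
    bool    is_resource;        /* property range is rdfs:Resource */
    bool    multivalued;        /* property allows several values */
    bool    domain_index_copy;  /* copied from a superclass via domainIndex */
} PropertyValue;

/* Hypothetical refcount change for one resource. */
typedef struct {
    int64_t id;
    int     delta;
} RefChange;

/* Applies the refcount rules to one inserted subject. Writes at most
 * max_out changes into out and returns how many were written. Changes to
 * the same ID are not merged here; that is left to a later coalescing step. */
static size_t
collect_refcount_changes (int64_t              subject_id,
                          size_t               n_class_rows,
                          const PropertyValue *values,
                          size_t               n_values,
                          RefChange           *out,
                          size_t               max_out)
{
    size_t n = 0;

    /* Each row in a class table references the subject once. */
    for (size_t i = 0; i < n_class_rows; i++) {
        assert (n < max_out);
        out[n++] = (RefChange) { subject_id, 1 };
    }

    for (size_t i = 0; i < n_values; i++) {
        const PropertyValue *v = &values[i];

        /* Literal values and domainIndex copies leave refcounts alone. */
        if (!v->is_resource || v->domain_index_copy)
            continue;

        /* The pointed-to resource gains one reference per value. */
        assert (n < max_out);
        out[n++] = (RefChange) { v->object_id, 1 };

        /* Multivalued properties also reference the subject, per value. */
        if (v->multivalued) {
            assert (n < max_out);
            out[n++] = (RefChange) { subject_id, 1 };
        }
    }

    return n;
}
```

Consider a subject stored in two class tables, with one single-valued resource value, one multivalued resource value, one literal and one domainIndex copy. Under these assumptions it produces five changes, three of which land on the subject itself.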

This makes insertions and updates noticeably faster, e.g. up to 25%
faster for "INSERT DATA { _:u a rdfs:Resource }" inserted via
TrackerBatch/TrackerResource.

Bonus points: we no longer need to set up those triggers at runtime,
so TrackerSparqlConnection initialization is faster too.

Specifically, this takes the benchmark test app from !345 (merged) from:

$ ~/tracker-benchmark-update batch-resource
175 updates (875000 resources) in 30.087494 seconds

to:

$ ~/tracker-benchmark-update batch-resource
226 updates (1130000 resources) in 30.063409 seconds
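The two benchmark lines can be turned into throughput figures as a quick sanity check. The helper below is a hypothetical illustration and is not part of the benchmark app from !345. It divides resources by elapsed seconds for each run and reports the relative change.

```c
#include <assert.h>

/* Hypothetical helper: relative change in resources/second between two
 * benchmark runs, expressed as a percentage. */
static double
throughput_gain_percent (double before_resources, double before_seconds,
                         double after_resources, double after_seconds)
{
    assert (before_seconds > 0.0 && after_seconds > 0.0);

    double before = before_resources / before_seconds;  /* resources/s */
    double after = after_resources / after_seconds;     /* resources/s */

    return (after / before - 1.0) * 100.0;
}
```

With the figures above, 875000 resources in 30.087494 s is roughly 29,080 resources/s, and 1130000 resources in 30.063409 s is roughly 37,590 resources/s. `throughput_gain_percent (875000, 30.087494, 1130000, 30.063409)` therefore returns about 29.2, which corresponds to about 23% less time spent per resource.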
