Improve performance of database updates

Carlos Garnacho requested to merge wip/carlosg/update-perf into master

This branch contains many optimizations to the update machinery, ranging from micro-optimizations such as avoiding frequent memory allocations to moderate refactors such as improved buffering of changes and caching/lookup of prepared statements for inserts/updates. No stone has been left unturned, with the goal of having most of the CPU time spent in SQLite itself.
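
As a rough illustration of the buffering and prepared-statement reuse described above, here is a minimal sketch. It uses Python's standard `sqlite3` module rather than Tracker's C code, and `BufferedUpdater`, the `resource` table, and `demo` are hypothetical names chosen for the example; it is not how this branch implements things.

```python
import sqlite3
from collections import defaultdict


class BufferedUpdater:
    """Hypothetical helper: buffers rows per SQL text and flushes in batches."""

    def __init__(self, conn, batch_size=5000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = defaultdict(list)  # SQL text -> list of parameter tuples
        self.count = 0      # rows currently buffered
        self.flushes = 0    # number of non-empty flushes performed

    def add(self, sql, params):
        # Only record the change; nothing touches the database yet.
        self.pending[sql].append(params)
        self.count += 1
        if self.count >= self.batch_size:
            self.flush()

    def flush(self):
        if self.count == 0:
            return
        with self.conn:  # one transaction per flush, committed on exit
            for sql, rows in self.pending.items():
                # Same SQL text with bound parameters: the statement is
                # prepared once and re-bound for each buffered row.
                self.conn.executemany(sql, rows)
        self.pending.clear()
        self.count = 0
        self.flushes += 1


def demo(n=12000, batch_size=5000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT)")
    updater = BufferedUpdater(conn, batch_size)
    for i in range(n):
        updater.add("INSERT INTO resource (id, title) VALUES (?, ?)",
                    (i, f"item {i}"))
    updater.flush()  # write whatever is left in the buffer
    rows = conn.execute("SELECT COUNT(*) FROM resource").fetchone()[0]
    return rows, updater.flushes
```

Grouping by SQL text is what lets one prepared statement serve many rows; the batch size here simply mirrors the 5000 used by the benchmark.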

To test this, a few additional cases covering other common scenarios (resources being updated and deleted) have been added to the benchmark utility. Three runs on a master-ish branch produce the following output:

[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		1744492,876	58149,763	15,564 usec	22,935 usec	17,197 usec
     SPARQL batch update (sync)		829438,000	27647,933	34,482 usec	49,075 usec	36,169 usec
   Resource modification (sync)		1168280,369	38942,679	23,892 usec	30,113 usec	25,679 usec
  Resource insert+delete (sync)		483065,452	16102,182	59,739 usec	68,100 usec	62,103 usec
Prepared statement query (sync)		3188442,681	106281,423	8,000 usec	920,000 usec	9,409 usec
            SPARQL query (sync)		467243,361	15574,779	60,000 usec	1,201 msec	64,206 usec
[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		1650610,257	55020,342	16,438 usec	45,125 usec	18,175 usec
     SPARQL batch update (sync)		824167,207	27472,240	34,915 usec	51,269 usec	36,400 usec
   Resource modification (sync)		1158085,993	38602,866	24,254 usec	33,005 usec	25,905 usec
  Resource insert+delete (sync)		476794,732	15893,158	60,505 usec	70,450 usec	62,920 usec
Prepared statement query (sync)		3299845,780	109994,859	8,000 usec	1,067 msec	9,091 usec
            SPARQL query (sync)		466000,270	15533,342	61,000 usec	827,000 usec	64,378 usec
[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		1752330,733	58411,024	15,531 usec	26,183 usec	17,120 usec
     SPARQL batch update (sync)		807359,799	26911,993	35,254 usec	50,398 usec	37,158 usec
   Resource modification (sync)		1175350,002	39178,333	24,298 usec	30,534 usec	25,524 usec
  Resource insert+delete (sync)		474601,050	15820,035	59,429 usec	90,752 usec	63,211 usec
Prepared statement query (sync)		3046379,695	101545,990	8,000 usec	1,502 msec	9,848 usec
            SPARQL query (sync)		468526,672	15617,556	60,000 usec	805,000 usec	64,031 usec

And on top of this branch:

[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		2786198,140	92873,271	9,785 usec	14,579 usec	10,767 usec
     SPARQL batch update (sync)		1016639,532	33887,984	28,263 usec	45,691 usec	29,509 usec
   Resource modification (sync)		1946191,368	64873,046	14,564 usec	23,329 usec	15,415 usec
  Resource insert+delete (sync)		1239900,825	41330,028	23,224 usec	28,889 usec	24,195 usec
Prepared statement query (sync)		3239441,244	107981,375	8,000 usec	970,000 usec	9,261 usec
            SPARQL query (sync)		473613,353	15787,112	59,000 usec	2,615 msec	63,343 usec
[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		2808449,642	93614,988	9,463 usec	17,203 usec	10,682 usec
     SPARQL batch update (sync)		1004343,862	33478,129	28,553 usec	51,018 usec	29,870 usec
   Resource modification (sync)		1953141,846	65104,728	14,483 usec	23,614 usec	15,360 usec
  Resource insert+delete (sync)		1217449,281	40581,643	23,412 usec	29,943 usec	24,642 usec
Prepared statement query (sync)		3231219,354	107707,312	8,000 usec	1,017 msec	9,284 usec
            SPARQL query (sync)		469455,609	15648,520	60,000 usec	1,831 msec	63,904 usec
[carlos@gotera build]$ ./utils/benchmark/tracker-benchmark
Batch size: 5000, Individual test duration: 30 sec
Opening in-memory database…
                           Test		Elements	Elems/sec	Min         	Max         	Avg
   Resource batch update (sync)		2768928,978	92297,633	9,402 usec	14,492 usec	10,835 usec
     SPARQL batch update (sync)		1006464,592	33548,820	28,591 usec	40,461 usec	29,807 usec
   Resource modification (sync)		1934488,844	64482,961	14,750 usec	22,681 usec	15,508 usec
  Resource insert+delete (sync)		1231734,098	41057,803	23,346 usec	28,333 usec	24,356 usec
Prepared statement query (sync)		3197596,147	106586,538	8,000 usec	990,000 usec	9,382 usec
            SPARQL query (sync)		470088,702	15669,623	60,000 usec	1,199 msec	63,818 usec

Apart from SELECT queries staying the same (minus noise), all update operations benefit from this branch, some with more than a 2x improvement (resource insert+delete goes from roughly 16k to 41k elements per second). This brings us much closer to the same ballpark as raw SQLite, especially with APIs that update the database from TrackerResource.
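
For reference, the per-test speedup can be worked out from the Elems/sec column of the runs above. The sketch below averages the three runs on each side and divides; the dictionary and function names are made up for this example, and the decimal commas of the benchmark output are written as points.

```python
from statistics import mean

# Elems/sec for the three master-ish runs, copied from the output above.
MASTER = {
    "Resource batch update": [58149.763, 55020.342, 58411.024],
    "SPARQL batch update": [27647.933, 27472.240, 26911.993],
    "Resource modification": [38942.679, 38602.866, 39178.333],
    "Resource insert+delete": [16102.182, 15893.158, 15820.035],
    "Prepared statement query": [106281.423, 109994.859, 101545.990],
    "SPARQL query": [15574.779, 15533.342, 15617.556],
}

# Elems/sec for the three runs on top of this branch.
BRANCH = {
    "Resource batch update": [92873.271, 93614.988, 92297.633],
    "SPARQL batch update": [33887.984, 33478.129, 33548.820],
    "Resource modification": [64873.046, 65104.728, 64482.961],
    "Resource insert+delete": [41330.028, 40581.643, 41057.803],
    "Prepared statement query": [107981.375, 107707.312, 106586.538],
    "SPARQL query": [15787.112, 15648.520, 15669.623],
}


def speedups(before, after):
    """Ratio of mean throughput after vs. before, per test name."""
    return {name: mean(after[name]) / mean(before[name]) for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(MASTER, BRANCH).items():
        print(f"{name:26s} {ratio:.2f}x")
```

With only three runs per side this is a coarse comparison, but it is enough to separate the update tests from the query tests, whose ratios stay close to 1.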