tracker-store infinite loop in sqlite3 btreeCursor() function
I have found an instance of tracker-store stuck in an infinite loop, causing high CPU usage.
tracker version: master (195cee89)
SQLite version: 3.20.1
OS version: Fedora 27
Attaching GDB shows two threads are active, both in the same place. Here's a backtrace from each one:
```
Thread 7 (Thread 0x7fd435ffb700 (LWP 19015)):
#0 btreeCursor (pCur=0x5578c7ea8360, pKeyInfo=0x0, wrFlag=0, iTable=<optimized out>, p=0x5578c7babfa8) at sqlite3.c:63805
#1 sqlite3BtreeCursor (p=0x5578c7babfa8, iTable=<optimized out>, wrFlag=0, pKeyInfo=0x0, pCur=0x5578c7ea8360) at sqlite3.c:63826
#2 0x00007fd44760be21 in sqlite3VdbeExec (p=p@entry=0x5578c7eacdd8) at sqlite3.c:82393
#3 0x00007fd4476155bf in sqlite3Step (p=0x5578c7eacdd8) at sqlite3.c:77352
#4 sqlite3_step (pStmt=<optimized out>) at sqlite3.c:11880
#5 0x00007fd44b544b89 in db_cursor_iter_next () from /usr/lib64/tracker-2.0/libtracker-data.so.0
#6 0x00005578c70d51e2 in ____lambda11__tracker_store_sparql_query_in_thread ()
#7 0x00005578c70d88cd in _tracker_store_pool_dispatch_cb_gfunc ()
#8 0x00007fd4478f0e50 in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307
#9 0x00007fd4478f0486 in g_thread_proxy (data=0x7fd4380066d0) at gthread.c:784
#10 0x00007fd446e0850b in start_thread () from /lib64/libpthread.so.0
#11 0x00007fd446b4016f in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fd4367fc700 (LWP 19014)):
#0 0x00007fd4475c7a01 in btreeCursor (pCur=0x5578c7ea9170, pKeyInfo=0x0, wrFlag=0, iTable=<optimized out>, p=0x5578c7babfa8) at sqlite3.c:63803
#1 sqlite3BtreeCursor (p=0x5578c7babfa8, iTable=<optimized out>, wrFlag=0, pKeyInfo=0x0, pCur=0x5578c7ea9170) at sqlite3.c:63826
#2 0x00007fd44760be21 in sqlite3VdbeExec (p=p@entry=0x5578c7eaeea8) at sqlite3.c:82393
#3 0x00007fd4476155bf in sqlite3Step (p=0x5578c7eaeea8) at sqlite3.c:77352
#4 sqlite3_step (pStmt=<optimized out>) at sqlite3.c:11880
#5 0x00007fd44b544b89 in db_cursor_iter_next () from /usr/lib64/tracker-2.0/libtracker-data.so.0
#6 0x00005578c70d51e2 in ____lambda11__tracker_store_sparql_query_in_thread ()
#7 0x00005578c70d88cd in _tracker_store_pool_dispatch_cb_gfunc ()
#8 0x00007fd4478f0e50 in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307
#9 0x00007fd4478f0486 in g_thread_proxy (data=0x7fd438006720) at gthread.c:784
#10 0x00007fd446e0850b in start_thread () from /lib64/libpthread.so.0
#11 0x00007fd446b4016f in clone () from /lib64/libc.so.6
```
The SQLite code that is stuck is here: https://github.com/mackyle/sqlite/blob/master/src/btree.c#L4320
`pBt` is a pointer to a `BtShared` instance, which is shared state for the B*Tree structure in our SQLite database. Its `pCursor` member is the head of a linked list of `BtCursor` instances, and somehow this linked list has become circular.
In one case I saw this (one element):
```
(gdb) p pBt->pCursor
$1 = (BtCursor *) 0x5578c7ea9170
(gdb) p pBt->pCursor->pNext
$2 = (BtCursor *) 0x5578c7ea9170
```
In a second case I saw this (two elements):
```
(gdb) p pBt->pCursor
$1 = (BtCursor *) 0x173d670
(gdb) p pBt->pCursor->pNext
$2 = (BtCursor *) 0x17365f0
(gdb) p pBt->pCursor->pNext->pNext
$3 = (BtCursor *) 0x173d670
```
In both cases it's obvious why we are stuck in an infinite loop: walking the list from `pBt->pCursor` never reaches NULL. It's less obvious how we got into this state, though. The fact that two threads are executing the same code against the same `BtShared` makes it very likely that there is some locking issue.
We open the SQLite database with the `SQLITE_OPEN_NOMUTEX` flag, which enables 'multithreaded' mode. The SQLite documentation says:

> In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads.
I haven't yet tried to reproduce this issue, but it should be possible. In one case it was triggered by the test 'functional-09-concurrent-query', and both queries that triggered the condition in that instance were:
```
SELECT ?u WHERE { ?u a nco:PersonContact. FILTER regex (?u, 'test-09:ins')}
```
In the other instance I have seen, the issue was triggered by the test 'functional-ipc-bus-query-cancellation' (recently renamed from 'sparql-gb-737023'), and the query was:
```
SELECT ?urn WHERE {?urn a rdfs:Resource}
```