Skip to content
  • David Woodhouse's avatar
    Add (failing) test for Evolution's address autocompletion query · 5f9f5b52
    David Woodhouse authored
    [00:21] <tristan> dwmw2, if you really think that, then it will have to be
                rethought again, but if we're gonna rethink it again - please
                add a failing test case to test-client-custom-summary.c before
                we even go down that road.
    
    Here you go :)
    
    Note: This fails because we're using an inner join of folder_id with
    folder_id_email_list on uid. Which means that records in folder_id
    which don't *have* an email address aren't considered for the search.
    
    This is the query:
    SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
      FROM  'folder_id' AS summary
      JOIN  'folder_id_email_list' AS email_list
      ON    +email_list.uid = summary.uid
      WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')
          OR (email_list.value IS NOT NULL AND email_list.value LIKE 'p%')
          OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%')
           OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%')
           OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%')
           OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%'))
          OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%'));
    
    You actually need a LEFT OUTER JOIN there, if you're going to do it that
    way.
    
    The above query takes about 1350ms on my data set with 238121 records.
    
    If I change it to a LEFT OUTER JOIN, it looks like it will literally
    take hours, so I didn't let it finish. I'm not sure what the '+' sign
    in 'ON +email_list.uid = summary.uid' is supposed to do, but if I
    *remove* that to make it 'LEFT OUTER JOIN … ON email_list.uid = summary.uid'
    the query then completes in about 1650ms.
    
    I can improve that by restructuring the query to look like this:
    
    SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
      FROM  'folder_id' AS summary
      JOIN  'folder_id_email_list' AS email_list
      ON    +email_list.uid = summary.uid
      WHERE (email_list.value IS NOT NULL AND email_list.value LIKE 'p%')
    UNION
    SELECT summary.uid, summary.vcard, summary.bdata
      FROM  'folder_id' AS summary
      WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')
          OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%')
           OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%')
           OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%')
           OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%'))
          OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%'));
    
    That runs in about 460ms on my data set.
    
    I can get it down to about 400ms by eliding some of the IS NOT NULL checks
    that seem rather gratuitous (although shouldn't sqlite do that for itself
    since it's a fairly bloody obvious optimisation?):
    
    SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
      FROM  'folder_id' AS summary
      JOIN  'folder_id_email_list' AS email_list
      ON    +email_list.uid = summary.uid
      WHERE (email_list.value LIKE 'p%')
    UNION
    SELECT summary.uid, summary.vcard, summary.bdata
      FROM  'folder_id' AS summary
      WHERE ((summary.nickname LIKE 'p%')
          OR (summary.full_name LIKE 'p%')
          OR (summary.family_name LIKE 'p%')
          OR (summary.given_name LIKE 'p%')
          OR (summary.nickname LIKE 'p%')
          OR (summary.file_as LIKE 'p%'));
    
    Finally, if I actually add indices on the fields that need them, such as
    file_as, nickname, etc., I can get the search time down to 5-10ms. This
    *only* works for my refactored queries using 'UNION', and not for the JOIN
    versions. Obviously the indices aren't useful on the results of the JOIN.
    
    (Note: we did already have an index on file_as_localized, but not file_as.
     And likewise for some other fields).
    5f9f5b52