  1. 26 May, 2022 1 commit
    • tracker-miner-fs: Always delete graph nie:InformationElement on create/update · 96143ae5
      Carlos Garnacho authored
      There are some situations where the file monitors cannot distinguish between
      a file being created where none existed before and a newly created file
      replacing a previously existing file.
      
      Treat all create/update events the same WRT trimming the previously
      existing nie:InformationElements, in order to ensure these updates that
      pass as creates also result in the file being reindexed by the
      metadata extractor. This only applies to files that would have metadata
      extracted.
      
      While at it, simplify the SPARQL and move the code so that it is not
      scattered across the function.
  2. 22 May, 2022 1 commit
    • tracker-extract: Check field type on IPTC data embedded in TIFFs · 73637bb3
      Carlos Garnacho authored
      The field containing IPTC metadata can come either as TIFF_LONG (32-bit
      ints, endianness dependent), or TIFF_UNDEFINED (a byte string). We currently
      handle everything as TIFF_LONG, which may cause memory corruption if we
      deal with a file where we must perform endianness swapping and receive
      non-long-aligned data.
      
      Ensure that only these 2 types are handled, as defined for the
      TIFFTAG_RICHTIFFIPTC tag, and only perform byte swapping for non-byte data
      (i.e. TIFF_LONG).
      
      This is more in line with what others (e.g. ImageMagick) do when dealing with
      byte-swapping and IPTC data.
      
      Fixes: https://gitlab.gnome.org/GNOME/tracker/-/issues/364
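
      The type check described in this commit can be illustrated with a small
      Python sketch. This is not Tracker's C code: `normalize_iptc` and its
      `swapped` parameter are hypothetical names, and the sketch assumes the
      caller knows whether the 32-bit words were byte-swapped on read. The
      numeric values of TIFF_LONG and TIFF_UNDEFINED are the ones from the TIFF
      field type table.

```python
import struct

TIFF_LONG = 4       # 32-bit unsigned integers, endianness dependent
TIFF_UNDEFINED = 7  # opaque byte string, must never be byte-swapped


def normalize_iptc(field_type, raw, swapped):
    """Return the IPTC bytes of a field (hypothetical helper).

    `swapped` says whether the 32-bit words in `raw` were byte-swapped
    when the field was read and need to be swapped back.
    """
    if field_type == TIFF_UNDEFINED:
        # Plain bytes: nothing to swap.
        return bytes(raw)
    if field_type != TIFF_LONG:
        # Only the two types defined for the tag are accepted.
        raise ValueError(f"unexpected field type {field_type}")
    if len(raw) % 4 != 0:
        # Refuse non-long-aligned data instead of swapping past the end.
        raise ValueError("TIFF_LONG data is not 4-byte aligned")
    if not swapped:
        return bytes(raw)
    count = len(raw) // 4
    words = struct.unpack(f"<{count}I", raw)
    # Re-packing with the opposite byte order reverses each 4-byte word.
    return struct.pack(f">{count}I", *words)
```

      Rejecting unaligned TIFF_LONG data up front is one way to avoid the
      out-of-bounds access the commit message describes.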
  3. 02 May, 2022 1 commit
    • libtracker-miner: Properly detect fanotify failures · 5db77456
      Ray Strode authored
      Commit 28142bdb makes tracker's file monitoring code fall back to GLib
      file monitors if fanotify doesn't work.
      
      Unfortunately, fanotify_mark failures aren't propagated up,
      so the fallback code doesn't get triggered in some cases.
      
      This commit adds the plumbing to propagate those failures, so that the
      fallback code gets triggered.
      
      Closes #217
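
      As a rough illustration of the propagation idea, here is a Python sketch
      in which a failing mark call surfaces as an exception that the caller
      uses to switch backends. All class and function names are hypothetical,
      and `mark` is only a stand-in for fanotify_mark(); the real change lives
      in Tracker's C monitor code.

```python
class MonitorError(Exception):
    """Raised when a monitor backend cannot watch a path."""


class FanotifyMonitor:
    def __init__(self, mark):
        # `mark` is a hypothetical callable standing in for fanotify_mark();
        # it returns 0 on success and non-zero on failure.
        self._mark = mark

    def add(self, path):
        if self._mark(path) != 0:
            # Propagate the failure instead of silently swallowing it.
            raise MonitorError(f"could not mark {path}")
        return "fanotify"


class GlibMonitor:
    def add(self, path):
        return "glib"


def add_watch(path, primary, fallback):
    """Try the primary backend, fall back when it reports a failure."""
    try:
        return primary.add(path)
    except MonitorError:
        return fallback.add(path)
```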
  4. 24 Apr, 2022 2 commits
  5. 08 Apr, 2022 2 commits
    • tracker: Show full output in "tracker3 status" if redirected to file · 3e14dac8
      Carlos Garnacho authored
      Handle "tracker3 status" output being redirected to a file by showing the
      full error reports for all existing errors, instead of the clamped/paged
      list meant for interactive navigation.
    • tracker: Handle stale reports for disappeared files in "tracker3 status" · 878e7eeb
      Carlos Garnacho authored
      As every race condition is possible in filesystems, it is possible that a
      file disappears from under tracker-extract-3's feet after existence checks, but
      before the extractor module opens the file for metadata extraction.
      
      In that case we could generate error reports for files that no longer
      exist, and those reports would never be deleted automatically.
      
      Ensure these stale reports are trimmed during "tracker3 status" output
      generation, so that something removes them from the filesystem
      and they don't confuse users into thinking they are legitimate errors.
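
      The trimming step could look roughly like the following Python sketch.
      The mapping from report file to reported file is a hypothetical
      simplification; how Tracker actually stores and parses its error reports
      is not shown here.

```python
import os


def trim_stale_reports(reports):
    """Drop reports about files that no longer exist (hypothetical helper).

    `reports` maps the path of a report file to the path of the file the
    report is about. Stale report files are deleted from disk, and the
    remaining entries are returned.
    """
    kept = {}
    for report_path, target_path in reports.items():
        if os.path.exists(target_path):
            kept[report_path] = target_path
            continue
        try:
            os.remove(report_path)
        except FileNotFoundError:
            # The report vanished in the meantime, which is fine.
            pass
    return kept
```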
  6. 16 Mar, 2022 1 commit
    • tracker-extract: Initialize cache for stable content URNs at startup · 4cda983b
      Carlos Garnacho authored
      This cache initialization may incur unexpected ioctls to probe
      floppy/cdrom devices, and is currently left up to the first extractor
      thread that queries a content identifier. This may result in seccomp
      issues.
      
      Perform this initialization on the main thread early during initialization
      so the extractor threads find a populated cache when querying content
      identifiers.
      
      Fixes: tracker#355
  7. 06 Mar, 2022 1 commit
    • libtracker-miner: Handle FAN_DELETE[_SELF] being emitted separately on dirs · 2d230e6f
      Carlos Garnacho authored
      If there is a monitor on a dir and its parent, fanotify used to be able to
      coalesce the event so FAN_DELETE and FAN_DELETE_SELF would both be set on
      a single event.
      
      This no longer seems to be the case on more recent Linux versions, where these
      generate 2 separate events, first FAN_DELETE_SELF for the folder being
      deleted, and then FAN_DELETE from the parent dir to notify of the parent
      folder structure change. This double DELETE event emission is inconsistent
      with TrackerMonitor behavior and causes test failures.
      
      Handle these 2 events by caching both and being able to merge them, but
      flushing on the one that is expected last. Fixes tracker-file-notifier-test
      on recent Linux versions (locally, 5.17.0).
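
      The "cache both, merge, flush on the last one" approach can be sketched
      in Python as follows. The flag values are symbolic stand-ins rather than
      the kernel's constants, `DeleteCoalescer` is a hypothetical name, and the
      sketch assumes both the directory and its parent are monitored, so that
      FAN_DELETE from the parent always arrives.

```python
from enum import Flag, auto


class Ev(Flag):
    DELETE = auto()       # stand-in for FAN_DELETE (reported via the parent)
    DELETE_SELF = auto()  # stand-in for FAN_DELETE_SELF (reported by the dir)


class DeleteCoalescer:
    def __init__(self):
        self._pending = {}  # path -> flags seen so far
        self.emitted = []   # (path, merged flags), one entry per deletion

    def handle(self, path, flags):
        # Merge with whatever was cached for this path.
        merged = self._pending.pop(path, Ev(0)) | flags
        if Ev.DELETE in merged:
            # FAN_DELETE is the event expected last: emit a single event.
            self.emitted.append((path, merged))
        else:
            # Only FAN_DELETE_SELF so far: keep it cached.
            self._pending[path] = merged
```

      An event that already carries both flags, as older kernels sent it, is
      flushed immediately, so both behaviors end up as one emitted deletion.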
  8. 05 Mar, 2022 1 commit
  9. 04 Mar, 2022 1 commit
    • tracker-extract: ignore subtrack titles for videos · 47b13910
      Ignacy Kuchciński authored and Carlos Garnacho committed
      Currently, the resulting nie:title includes titles from the individual
      tracks. This happens for two reasons: gstreamer did not differentiate
      between global/container tags and track tags until
      gst_discoverer_container_info_get_tags() was introduced as new API, and
      the tracker-extract gstreamer backend loops through the available tracks
      and collects all the tags it can find.
      
      As a result, in cases where there is no title tag in the container, but
      there are some title tags in subtitle or audio tracks, they are included
      in the nie:title. This prevents the file-name-based fallback title in grilo
      from being used, and causes the video to show up with nonsense titles in totem.
      
      To fix this, replace gst_discoverer_info_get_tags() with the new API,
      and ignore title tags from subtracks if the file is a video.
      
      Fixes #202
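
      The resulting title selection might be pictured with this Python sketch.
      Tags are modeled as plain dictionaries and `pick_title` is a hypothetical
      helper; the actual code works on GStreamer tag lists in C, and the
      handling of non-video files shown here is an assumption.

```python
def pick_title(container_tags, stream_tags, is_video):
    """Choose the title to report (hypothetical helper).

    `container_tags` is a dict of container-level tags, and `stream_tags`
    is a list of dicts with the tags of each individual track.
    """
    title = container_tags.get("title")
    if title is None and not is_video:
        # Assumption: for non-video files, track titles remain acceptable.
        for tags in stream_tags:
            if "title" in tags:
                title = tags["title"]
                break
    # For videos without a container title this stays None, which leaves
    # room for a file-name-based fallback further up the stack.
    return title
```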
  10. 14 Feb, 2022 1 commit
  11. 13 Feb, 2022 1 commit
  12. 25 Jan, 2022 1 commit
  13. 22 Jan, 2022 1 commit
  14. 18 Jan, 2022 1 commit
  15. 16 Jan, 2022 1 commit
    • libtracker-miners-common: Use better stable filesystem identifiers · f0ab9c97
      Carlos Garnacho authored
      Our new stable identifiers come in the `urn:fileid:$FS_ID:$INODE/$SUFFIX`
      format, where $FS_ID is a stable identifier for the filesystem. We
      currently use G_FILE_ATTRIBUTE_ID_FILESYSTEM for that, but that may differ
      in some circumstances, e.g. removable mounts inserted in different order.
      
      Try harder to obtain a stable identifier for the filesystem, one that will
      not change under these runtime conditions. We prefer identifiers in this
      order:
      
      - If the mount entry node is an actual partition (e.g. /dev/sda3), look
        up the filesystem UUID with blkid
      - If the mount entry points to a non-physical partition (e.g.
        /dev/mapper/luks-$UUID, or $HOST:$FOLDER with NFS), the mount entry
        device name is used.
      - If none of these are found (e.g. tmpfs), we still fall back to
        G_FILE_ATTRIBUTE_ID_FILESYSTEM.
      
      These identifiers are cached for all available mount entries in mtab
      for fast lookups, and are updated on mount entry changes.
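
      The preference order and the identifier format above can be summarized
      in a short Python sketch. Both function names are hypothetical, and the
      inputs are assumed to have been looked up already (the blkid UUID, the
      mount entry device name, and the GIO filesystem ID); no actual blkid or
      mtab access is shown.

```python
def pick_filesystem_id(blkid_uuid, device_name, gio_filesystem_id):
    """Pick the most stable filesystem identifier available.

    Pass None for `blkid_uuid` when the mount entry is not an actual
    partition, and None for `device_name` when there is no meaningful
    device (e.g. tmpfs).
    """
    if blkid_uuid:
        return blkid_uuid
    if device_name:
        return device_name
    return gio_filesystem_id


def make_file_urn(fs_id, inode, suffix=None):
    """Build an identifier in the urn:fileid:$FS_ID:$INODE/$SUFFIX format."""
    urn = f"urn:fileid:{fs_id}:{inode}"
    if suffix:
        urn += f"/{suffix}"
    return urn
```

      For example, `make_file_urn(pick_filesystem_id(None, "host:/export", "l42"), 1234)`
      yields `urn:fileid:host:/export:1234`, where all three input values are
      made up for illustration.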
  16. 15 Jan, 2022 1 commit
  17. 12 Jan, 2022 1 commit
    • libtracker-miner: Do not recurse into mount points unless configured · 629f4646
      Carlos Garnacho authored
      Make TrackerFileNotifier stop recursing at folders that are mount points.
      This makes all files indexed in a TrackerIndexingTree root implicitly
      belong to the same mount point.
      
      If there are mount points in recursively indexed folders that any user
      wants indexed, these will have to be configured as indexed folders
      themselves.
      
      Closes: tracker#85
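
      A comparable crawl that stops at mount points can be sketched in Python
      with `os.walk` and `os.path.ismount`. This only illustrates the pruning
      idea; `crawl` is a hypothetical function, and the `is_mount` parameter
      exists so the behavior can be simulated without real mounts.

```python
import os


def crawl(root, is_mount=os.path.ismount):
    """List files under `root` without descending into mount points."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        found.extend(os.path.join(dirpath, name) for name in filenames)
        # Pruning `dirnames` in place stops os.walk from recursing further.
        dirnames[:] = [
            d for d in dirnames
            if not is_mount(os.path.join(dirpath, d))
        ]
    return found
```

      A mount point that should be indexed would then be crawled as a root of
      its own, mirroring the requirement to configure it as an indexed folder.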
  18. 05 Jan, 2022 2 commits
  19. 30 Dec, 2021 10 commits
    • libtracker-miner: Fix reference count of array · 1657955a
      Carlos Garnacho authored
      We fail to take an extra ref on the SPARQL buffer task
      array. This causes warnings when trying to drop this extra ref
      after a flush error happens.
    • libtracker-miner: Plug string leaks · 3dc37cb5
      Carlos Garnacho authored
      The mimetype/extractorHash info from pre-existing files queried
      during startup would be leaked.
    • libtracker-miner: Change function arguments · b74127dc
      Carlos Garnacho authored
      We no longer need that many details for
      tracker_miner_fs_get_identifier(), so remove them, and make it
      return a const string.
    • tracker-miner-fs: Raise batch size · 39b01fa3
      Carlos Garnacho authored
      Now that crawling is throttled by the amount of items left to process
      and the main culprits of high memory usage on large filesystems are
      gone, we can raise the batch size a bit. We can definitely afford a
      couple extra megabytes in memory now, so raise the batch size to also
      optimize the throughput.
    • libtracker-miner: Add control for the "high water" hint in TrackerMinerFS · cf43cf9c
      Carlos Garnacho authored
      If there are too many files already queued (2 batches' worth), turn the hint
      on. This greatly reduces the peak memory used by tracker-miner-fs-3, especially on
      large filesystems, since that peak used to come from GFileInfos being queued
      up waiting for extraction.
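
      The threshold logic might be modeled like this in Python. `MinerQueue`
      is a hypothetical class, and reading "worth 2 batches" as "at least two
      full batches queued" is an assumption about the exact comparison.

```python
from collections import deque


class MinerQueue:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.items = deque()

    @property
    def high_water(self):
        # Assumption: the hint turns on once two full batches are queued.
        return len(self.items) >= 2 * self.batch_size

    def push(self, item):
        self.items.append(item)

    def take_batch(self):
        # Remove and return up to one batch of queued items.
        count = min(self.batch_size, len(self.items))
        return [self.items.popleft() for _ in range(count)]
```

      A producer would check `high_water` between pushes and pause crawling
      while it is set, resuming once enough batches have been taken.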
    • libtracker-miner: Notify created/updated files soon · 98e5e022
      Carlos Garnacho authored
      Let the files found during crawling (either created or updated, and
      we already know that by merging the information with database queries)
      be notified before waiting for the whole index root to be finished.
      
      This allows early handling from the TrackerMinerFS side, and some way
      to let it control the "high water" hint if there are too many files
      being queued. Otherwise all files found are added at once, and the
      "high water" hint becomes moot.
    • libtracker-miner: Add a "high water" property to TrackerFileNotifier · 99936a2f
      Carlos Garnacho authored
      This is not a panic "stop everything you're doing" call, but rather a
      soft hint that TrackerFileNotifier may stop for a while after finishing
      what it is doing at the moment.
      
      When the value is unset, the notifier can resume its operation (if it
      ever stopped).
    • libtracker-miner: Avoid stopping during flush if possible · a15b62dc
      Carlos Garnacho authored
      Due to the way we asked the store to create the file content URNs
      for us, we fairly often had to stop all the machinery while there was
      a SPARQL batch being inserted, since we need information on files
      that are in the batch being executed before proceeding with files
      being processed now (e.g. a nfo:belongsToContainer relation).
      
      Since the file content URNs are stable and quick to fetch from GIO,
      we no longer need to wait for anything here. File processing is
      able to continue now despite these "dependency" files being in a
      batch that is being inserted.
      
      Still, avoid having more than one batch in flight: if the SPARQL
      buffer gets full again before the prior batch has finished insertion,
      processing will stop until that happens. This still favors memory
      usage over parallelization: if batches are built up quicker than
      they are inserted in the database, there will always be 2 batches' worth
      of data in flight.
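
      The "one batch in flight" rule can be sketched in Python as below.
      `SparqlBuffer` and its methods are hypothetical names chosen for this
      illustration, and the database insertion itself is left out: the caller
      is expected to call `flush_finished()` when it completes.

```python
class SparqlBuffer:
    def __init__(self, limit):
        self.limit = limit
        self.current = []      # batch being built
        self.in_flight = None  # batch being inserted, if any

    def push(self, update):
        """Queue an update; return False if processing must stop for now."""
        if len(self.current) >= self.limit:
            if self.in_flight is not None:
                # Buffer is full and the prior batch is still being inserted.
                return False
            self.flush()
        self.current.append(update)
        return True

    def flush(self):
        # Hand the built batch over for insertion and start a new one.
        self.in_flight, self.current = self.current, []

    def flush_finished(self):
        self.in_flight = None
```

      With this shape, memory is bounded by the batch being inserted plus the
      one being built, which matches the trade-off the message describes.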
    • libtracker-miner: Plug possible leak · 8c0a62be
      Carlos Garnacho authored
      We sometimes create a GFileInfo (since we don't get any), but we
      fail to unref it.
    • libtracker-miner: Remove unused task pool in TrackerMinerFs · 96042233
      Carlos Garnacho authored
      Since the file information extraction in tracker-miner-fs was streamlined
      so the GFileInfo would be obtained from the TrackerFileNotifier and pushed
      to the upper layers, file extraction became no longer asynchronous and this
      task pool became unused (besides, it had a limit of 1 for a long time).
      
      So we were left with the code that did the task pool maintenance, but no
      elements were ever added to it. It looks like this code can be simply
      peeled off.
  20. 29 Dec, 2021 9 commits
    • libtracker-miner: Add Fanotify TrackerMonitor implementation · 661f2bcd
      Carlos Garnacho authored
      Since recent kernel versions, the fanotify file change notification
      API is finally available without CAP_SYS_ADMIN. Add a TrackerMonitor
      implementation that uses this new API.
      
      This object is also used in favor of the GFileMonitor-based
      implementation wherever possible, since it performs far better
      than inotify in the presence of many file monitors. Since there is
      only one file descriptor (and one GSource) as opposed to one per
      monitored directory, this results in faster behavior of tracker-miner-fs
      overall (first time index, second time startups, and tracking of changed
      files) and reduced memory overhead.
    • libtracker-miner: Split TrackerMonitor abstract class and implementation · 25ef84d8
      Carlos Garnacho authored
      Since we want multiple TrackerMonitor implementations around, split
      it into a base abstract TrackerMonitor class, and a TrackerMonitorGlib
      subclass that has the GFileMonitor implementation.
    • tracker-extract: Use the new stable identifiers in tracker-extract-3 · c9b80524
      Carlos Garnacho authored
      Port all modules to use this new stable identifier on all TrackerResources
      that express the nie:InformationElement of the file content(s).
    • libtracker-miner: Use stable URNs for file folders · e1440d85
      Carlos Garnacho authored
      These no longer require querying the database for the given URN
      after insertion, so this simplifies the code a bit.
    • libtracker-miners-common: Add API to get stable URNs for file content · 0668540a
      Carlos Garnacho authored
      This helper function works on top of G_FILE_ATTRIBUTE_ID_FILE, which is
      based on the filesystem ID and inode. This shares the same lifetime
      expectations we have about our own URNs:
      
      - It is unique system-wide.
      - It is persistent as long as there are no file content changes
        (e.g. persists on attribute updates).
      - It is persistent across file moves/renames too.
      
      And extends the benefits a bit:
      
      - It is persistent across reindexes.
      - It is a stat() away, no need to insert a blank node and query for its
        name.
      
      Add this piece of API, which allows generating one of these new
      stable URNs for file content. Optionally, a suffix can be added,
      for the cases where there will be multiple content entities for a
      file data object (e.g. flac).
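
      The "a stat() away" property can be demonstrated with a small Python
      sketch that derives an identifier from the device and inode numbers.
      `content_id` is a hypothetical helper and its output format is made up;
      it only mirrors the idea behind G_FILE_ATTRIBUTE_ID_FILE, not GIO's
      actual string.

```python
import os


def content_id(path):
    """Derive an identifier from the device and inode numbers of a file."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"
```

      Since a rename within the same filesystem keeps the inode, and an
      attribute update such as `os.utime()` does not touch it either, the
      value returned for a file stays the same across both operations.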
    • tracker-extract: Add dummy folder rule · 075fdd3e
      Carlos Garnacho authored
      We want these to have a tracker:extractorHash since there might also be
      changes that apply to them. However we don't want these to be caught by
      the extractor, so make sure that tracker-miner-fs-3 sets the extractor
      hash right away for these.
    • libtracker-extract: Fix typo · 92c36e1a
      Carlos Garnacho authored
      We don't depend on the graph to return the hash here.
    • build: Integrate TrackerMinerFiles extraction into tracker:extractorHash · 4b2a5d3c
      Carlos Garnacho authored
      This will allow file re-extraction if there are changes to the extraction code
      in TrackerMinerFiles.
    • tracker-miner-fs: Separate file data extraction code into its own file · 109a377b
      Carlos Garnacho authored
      These methods will be handy to keep separate from the main TrackerMinerFiles
      code, so we can hook this into the tracker:extractorHash machinery and
      trigger re-extraction after code changes.