Figure out performance requirements for filechooser
I realized that when we replace the filechooser with a listview implementation (and potentially add an icon view), we will have to reevaluate how to display directories in a performant way. The last time significant work was done on this was in 2009, when I added the GtkFileSystemModel object and implemented delayed icon loading. But that was at a time when SSDs weren't common yet.
What follows is a discussion exclusively about local files. I do not think remote filesystems are currently relevant enough, but that's my personal opinion, and we can certainly evaluate those, too.
A filechooser needs to do 2 steps when displaying the contents of a directory:
- Get a sorted list of files
- Display information about the first few files in the list
This means it is not necessary to collect all information about all files up front. It is only necessary to collect enough data to sort the files; after that, we can query the first files for the remaining data. This gets increasingly relevant when listing large directories with thousands of files.
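To make the two steps concrete, here is a rough shell sketch that uses coreutils instead of GIO. It only illustrates the idea and is not how GtkFileSystemModel works: `sorted_names` and `first_page` are hypothetical helper names, the page size stands in for the number of visible rows, and it assumes GNU coreutils (`stat -c`) and filenames without embedded newlines.

```bash
#!/bin/bash
# Hypothetical sketch of the two steps, using coreutils instead of GIO.
# Assumes GNU coreutils and filenames without embedded newlines.

# Step 1: get a sorted list of names. `ls -f` lists entries in directory
# order without sorting them, so plain GNU ls should not need to stat()
# them here. -f implies -a, so "." and ".." are filtered out
# (-F: fixed strings, -x: whole-line match).
sorted_names() {
    ls -f -- "$1" | grep -v -x -F -e . -e .. | LC_ALL=C sort
}

# Step 2: query the remaining data (here just name and size via stat)
# only for the first $2 entries, i.e. the rows that would be visible.
first_page() {
    sorted_names "$1" | head -n "$2" |
    while IFS= read -r name; do
        stat -c '%n %s' -- "$1/$name"
    done
}
```

For example, `first_page /usr/bin 20` sorts all names but runs `stat` only 20 times, no matter how large the directory is.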
I did some preliminary testing and was able to identify 4 performance-relevant stages in collecting data for files:
- The `readdir()` stage
  This stage only returns the filename and the type (regular, directory, symlink, ...). This is barely enough to sort the files by name.
  `gio list -a standard::name $DIR` performs this operation (requires glib!1136). Running `ls` without arguments can be used as a rough equivalent for people without that fix.
- The `stat()` stage
  This stage allows querying a lot of extra information about files, like access/modification time and size, and in particular allows sorting by those.
  `gio list -a standard::name,standard::size $DIR` performs this operation. Running `ls -l` is the equivalent.
- The content-type stage
  Querying the content type provides not just the type, but also the icon name to use for the standard (non-thumbnail) icon. And it obviously allows sorting by type. However, it potentially opens the files to read a bit of data for content type sniffing.
  `gio list -a standard::name,standard::size,standard::content-type $DIR` performs this operation. There is no equivalent `ls` command. This is what the filechooser does today.
- The icon stage
  This requires looking up the thumbnails and `stat()`ing the potential locations to determine whether a thumbnail is valid for the file. But even more importantly, it requires loading the icon or thumbnail image. For standard icons, the same icon can often be reused, but for thumbnails this does not work.
  `gio list -a standard::name,standard::size,thumbnail $DIR` performs the thumbnail lookup for this operation. There is no `gio list` command to test the performance of decoding thumbnails.
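The thumbnail lookup in the icon stage follows the freedesktop.org Thumbnail Managing Standard: a thumbnail lives under `$XDG_CACHE_HOME/thumbnails/<size>/` (default `~/.cache/thumbnails`) and is named after the MD5 hash of the file's URI. Below is a minimal sketch of just the lookup step, with hypothetical function names. It only checks the `normal` and `large` directories, skips percent-encoding the URI (so it is only correct for absolute paths containing no characters that need escaping), and only tests that a candidate exists instead of comparing the stored modification time, which the spec uses to decide validity.

```bash
#!/bin/bash
# Hypothetical sketch of the thumbnail lookup, following the freedesktop.org
# Thumbnail Managing Standard layout:
#   ${XDG_CACHE_HOME:-$HOME/.cache}/thumbnails/<size>/<md5 of URI>.png
# Simplification: the URI is not percent-encoded, so this only works for
# absolute paths made of characters that need no escaping.

thumbnail_name() {
    # md5sum prints "<hash>  -"; keep only the hash.
    printf '%s' "file://$1" | md5sum | cut -d ' ' -f 1
}

# Print every existing thumbnail candidate for the file at path $1.
# Comparing the stored Thumb::MTime with the file's mtime (the spec's
# validity check) would require reading PNG metadata and is left out.
thumbnail_candidates() {
    local base="${XDG_CACHE_HOME:-$HOME/.cache}/thumbnails"
    local name size
    name=$(thumbnail_name "$1")
    for size in normal large; do
        if [ -e "$base/$size/$name.png" ]; then
            echo "$base/$size/$name.png"
        fi
    done
}
```

Each file therefore costs one hash plus one existence check per size directory before any image is decoded, which is the part `gio list -a thumbnail` measures.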
I will compare the performance of these methods on a few large directories on my disk:
- /usr/bin - 2696 binaries and scripts (358 symlinks)
- /usr/lib64 - 4296 libraries and subdirectories (2101 symlinks)
- $HOME - 4931 random files and directories I accumulated over 20 years, fully thumbnailed
There are also 2 different cases for each of those directories: hot cache and cold cache.
For the cold cache case I will be running `sudo bash -c "echo 3 > /proc/sys/vm/drop_caches" && gio list /` to drop all caches and then load `gio` itself back into the cache.
For the hot cache case I will be running the command again right after the cold cache case.
So the actual code run is this:
```bash
for ATTR in "standard::name" "standard::name,standard::size" "standard::name,standard::size,standard::content-type" "standard::name,standard::size,standard::content-type,thumbnail";
do
  for DIR in /usr/bin /usr/lib64 ~;
  do
    # cold cache: drop caches, then reload gio itself
    sudo bash -c "echo 3 > /proc/sys/vm/drop_caches" &&
    gio list / > /dev/null && echo "$DIR - $ATTR" &&
    time gio list -a "$ATTR" "$DIR" > /dev/null &&
    # hot cache: the same command again
    time gio list -a "$ATTR" "$DIR" > /dev/null;
  done;
done
```
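One way to see how much of the content-type cost comes from sniffing file data, as opposed to guessing from the name, would be to also time GIO's `standard::fast-content-type` attribute, which GIO documents as a guess based on the filename only. The following is a sketch along the lines of the loop above; `compare_content_type` is a hypothetical name, and, like the loop, it needs root to drop caches.

```bash
#!/bin/bash
# Hypothetical extension of the benchmark: compare the filename-only guess
# (standard::fast-content-type) with full content-type detection, which may
# read file data. Dropping caches needs root, like the loop above.
compare_content_type() {
    local dir="$1" attr
    for attr in standard::fast-content-type standard::content-type; do
        sudo bash -c "echo 3 > /proc/sys/vm/drop_caches" &&
        gio list / > /dev/null &&
        echo "$dir - standard::name,$attr" &&
        time gio list -a "standard::name,$attr" "$dir" > /dev/null
    done
}
```

Usage would be `compare_content_type /usr/bin`; if the two timings are close, the cost is in the content-type machinery itself rather than in opening files.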
And here is the formatted output of one run, with times in milliseconds:
directory | cache | readdir() | stat() | content-type | thumbnail |
---|---|---|---|---|---|
/usr/bin | cold | 106 | 125 | 857 | 930 |
/usr/bin | hot | 22 | 28 | 364 | 380 |
/usr/lib64 | cold | 112 | 178 | 838 | 871 |
/usr/lib64 | hot | 42 | 51 | 471 | 526 |
$HOME | cold | 65 | 106 | 584 | 571 |
$HOME | hot | 28 | 47 | 238 | 285 |
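Dividing the jump from the `stat()` column to the content-type column by the number of files gives a rough per-file cost of content-type detection. The helper below is a hypothetical one-liner for that arithmetic; fixed costs like `gio` startup appear in both columns and cancel out in the difference.

```bash
#!/bin/bash
# Rough per-file cost (in ms) of content-type detection: the difference
# between the content-type and stat() timings, divided by the file count.
# Fixed costs such as process startup appear in both timings and cancel.
per_file_cost() {
    local files="$1" stat_ms="$2" content_ms="$3"
    LC_ALL=C awk -v n="$files" -v s="$stat_ms" -v c="$content_ms" \
        'BEGIN { printf "%.2f\n", (c - s) / n }'
}

per_file_cost 2696 125 857   # /usr/bin, cold: prints 0.27
per_file_cost 2696 28 364    # /usr/bin, hot: prints 0.12
per_file_cost 4931 47 238    # $HOME, hot: prints 0.04
```

Across all six rows this lands between roughly 0.04 and 0.27 ms per file, which adds up to hundreds of milliseconds for a few thousand files.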
Some thoughts from that run:
- There is a huge difference between checking for the content type and not checking: on the order of half a second to a second in somewhat larger directories.
- `gio list empty-directory` takes around 20ms by itself, so the `readdir()` runs look like they are pretty much instant in the hot case.
- Considering a 60Hz framerate (about 16.7ms per frame), it sounds plausible to target instant loads, i.e. having the files fully listed in the frame in which the filechooser is first shown.
- For cold caches, if we only need to `stat()` the files, we can probably load fast enough to not require any incremental loading. Though really, we'd probably want to benchmark really large directories (~50,000 files) and slow HDDs to see how long this can take. And there are still remote directories to think about.
- To make final judgements, we might need a better benchmark that can measure icon loading and is not affected by application startup overhead.
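Until there is a proper benchmark, the startup overhead could at least be subtracted by timing `gio list` on an empty directory right before the real run, and the result compared against the frame budget of 1000/60 ≈ 16.7ms. The sketch below uses hypothetical helper names, assumes GNU `date` (for the `%N` nanoseconds format), and measures single wall-clock runs, so its numbers are noisy.

```bash
#!/bin/bash
# Hypothetical helpers for a slightly better benchmark.
# Assumes GNU date, which supports %N (nanoseconds).

# Wall-clock time of one command, in whole milliseconds.
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Time `gio list -a ATTRS DIR` minus the time for an empty directory,
# which approximates gio's startup overhead. Clamped at 0 because
# single runs are noisy.
net_list_ms() {
    local dir="$1" attrs="$2" empty base total
    empty=$(mktemp -d)
    base=$(elapsed_ms gio list -a "$attrs" "$empty")
    total=$(elapsed_ms gio list -a "$attrs" "$dir")
    rmdir "$empty"
    echo $(( total > base ? total - base : 0 ))
}

# True if $1 milliseconds fit in one 60Hz frame (1000/60 ms). Bash only
# does integer math, so compare ms * 60 against 1000 instead.
fits_in_frame() {
    (( $1 * 60 <= 1000 ))
}
```

For example: `ms=$(net_list_ms /usr/bin standard::name,standard::size); fits_in_frame "$ms" && echo "fits in one frame"`. Icon decoding would still need an in-process benchmark.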
So the question now becomes: what do we learn from this that can influence the design of the filechooser (both UI-wise and code-wise) when we switch it to listview?