Remove "terms" from index.json, rewrite commandline search to work without it

FeRD (Frank Dana) requested to merge ferdnyc/gi-docgen:no-terms into main

Since at least the merge of !53, if not before, the row-based references in the "terms" list have been invalid. As such, there's no reason to generate them at all.

This MR does away with all index-based symbol lookups, and all parsing and stemming previously used to generate the "terms" list. (The key is preserved in the file as an empty dictionary, just in case anything relies on it being there.)

The Porter stemmer is removed entirely, as are all of the functions in utils that made use of it. The algorithm for the commandline search is replaced with a simple case-insensitive, substring-based one that works without index["terms"]. Support for the constant type is also added to search, and search can now handle types that intentionally don't return useful results (like callback).
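For illustration only, a case-insensitive substring match of this kind could be sketched roughly as follows. This is not the code from this MR: the function name search_symbols, the "symbols" list, and the "type"/"name" fields of each entry are assumptions made for the example, and the sample data is hypothetical.

```python
def search_symbols(index, query, types=None):
    """Return index entries whose name contains `query`, ignoring case.

    Assumes (hypothetically) that index["symbols"] is a list of dicts
    with "type" and "name" keys. index["terms"] is never consulted.
    """
    needle = query.casefold()
    results = []
    for symbol in index.get("symbols", []):
        # Optionally restrict the search to certain symbol types.
        if types is not None and symbol.get("type") not in types:
            continue
        # Plain substring test on the case-folded name: no stemming.
        if needle in symbol.get("name", "").casefold():
            results.append(symbol)
    return results


# Hypothetical sample data, with "terms" kept as an empty dictionary.
sample_index = {
    "symbols": [
        {"type": "class", "name": "Button"},
        {"type": "constant", "name": "BUTTON_PRIMARY"},
        {"type": "callback", "name": "TickCallback"},
    ],
    "terms": {},
}

matches = search_symbols(sample_index, "button")
print([m["name"] for m in matches])  # ['Button', 'BUTTON_PRIMARY']
```

Because the match is a linear scan over symbol names, it needs no precomputed lookup table, which is why the "terms" dictionary can stay empty.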

For the command:

gi-docgen.py gen-index -C examples/gtk4.toml \
  --add-include-path=/usr/share/gir-1.0 /usr/share/gir-1.0/Gtk-4.0.gir

...removing all of the code devoted to generating the index.json "terms" dictionary reduced the runtime from 7.219s to 4.677s on my system (and 4.05s of that is the .gir file parsing)!

Commandline search time is nearly identical (4.644s vs. 4.647s), but now returns correct results.