Ensure DevHelp symbol sections have stable ordering

Alexandre Macabies requested to merge Zopieux/gi-docgen:reproducible-output into main

The NixOS reproducible builds initiative[0] revealed that gi-docgen introduces non-determinism in the ordering of some DevHelp files[1] and of index.json. For the DevHelp files, the cause is the concurrent generation of the various symbol sections: as a performance optimization, threaded workers insert into a 'sections' dict as they complete, so the insertion order depends on thread scheduling and is by nature not reproducible. For the index, I was not able to find the source of the randomness, but it is likely caused by the order of file enumeration.
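To illustrate the mechanism, here is a minimal sketch, not gi-docgen's actual code. It assumes a `ThreadPoolExecutor`-style worker pool; `build_section`, `generate_sections`, and the artificial delays are hypothetical stand-ins for the real per-section work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_section(name, delay):
    # Hypothetical stand-in for generating one symbol section;
    # the delay simulates a varying amount of work per section.
    time.sleep(delay)
    return name, [f"{name}_symbol"]


def generate_sections(delays):
    sections = {}
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(build_section, n, d) for n, d in delays.items()]
        for future in as_completed(futures):
            name, symbols = future.result()
            # Keys are inserted in completion order, not submission order.
            sections[name] = symbols
    return sections


# Same content, different worker timing between two "runs".
run_a = generate_sections({"classes": 0.0, "functions": 0.3})
run_b = generate_sections({"classes": 0.3, "functions": 0.0})

print(list(run_a))  # key order follows whichever worker finished first
print(list(run_b))
```

The two dicts compare equal, yet iterating over them yields the keys in different orders, which is what leaks into the generated files.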

This commit adds a final sort on the relevant dicts and lists so that these structures are iterated in the same order on every run of the program. The exact iteration order does not matter, only the fact that it is stable given the same input. Since Python 3.7, dict iteration order is guaranteed to be insertion order[2], so a dict populated in sorted key order is also iterated in sorted key order.
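The idea can be sketched as follows. This is an illustration under assumptions rather than the code in this commit: `stable_sections` is a hypothetical helper, and the section names and symbols are made up.

```python
def stable_sections(sections):
    """Return a dict with sorted keys; each symbol list is sorted in place."""
    for symbols in sections.values():
        symbols.sort()
    # Dicts preserve insertion order (guaranteed since Python 3.7),
    # so inserting in sorted key order fixes the iteration order.
    return dict(sorted(sections.items()))


# Two runs that produced the same content in a different order.
first_run = {"functions": ["g_free", "g_alloc"], "classes": ["Widget", "Button"]}
second_run = {"classes": ["Button", "Widget"], "functions": ["g_alloc", "g_free"]}

stable_first = stable_sections(first_run)
stable_second = stable_sections(second_run)

print(list(stable_first.items()) == list(stable_second.items()))
```

Comparing `list(d.items())` rather than the dicts themselves matters here, because dict equality ignores ordering.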

This introduces no meaningful performance penalty: sorting the dict items, which are (str, list) tuples, only rearranges references and does not copy the lists, and the lists themselves are sorted in-place.
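A small check of that claim, as a sketch with made-up data: after sorting, the reordered dict still points at the very same list objects, which can be verified with `is`.

```python
sections = {"functions": ["g", "f"], "classes": ["B", "A"]}

# Remember the identity of each list before sorting.
original_lists = dict(sections)

for symbols in sections.values():
    symbols.sort()  # list.sort() reorders the existing list in place

# sorted() builds a new list of (str, list) tuples, but each tuple
# references the existing list object rather than a copy of it.
ordered = dict(sorted(sections.items()))

same_objects = all(ordered[name] is original_lists[name] for name in ordered)
print(same_objects)
```

The remaining cost is the sort itself, which is small compared to generating the sections.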

[0] https://r13y.com/
[1] https://r13y.com/diff/af78aa6744b6df28036f25d6e6cbc4c5dac475a1e91d9c3c2b6532815d66b590-e22063a648e9a7c712ca6aa8b8bb9599ccabeb9f32b45cb24b1dc477af99b882.html
[2] https://docs.python.org/3.7/library/stdtypes.html#typesmapping

Edited by Alexandre Macabies