Compression-friendly struct layout
Crazy idea for a possible tiny speedup, inspired by #2657 (closed), in particular by this comment from @chergert:
"Zero length runs [I assume you meant runs of zero bytes] are often optimized in compression algorithms given they are so common, so some amount of performance difference doesn't surprise me too much."
How about arranging the structures so that likely zero bytes are grouped together?
Could replacing (approximate code):
struct RowRecord {
    guint64 text_offset;
    guint64 attr_offset;
    guint8 flags;
};
with:
struct RowRecord {
    guint32 text_offset_high;
    guint32 attr_offset_high;
    guint32 text_offset_low;
    guint32 attr_offset_low;
    guint8 flags;
};
along with guint64 getter/setter methods result in a performance improvement?
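The current struct is 24 bytes on x86-64: two 8-byte offsets, the flags byte and 7 bytes of trailing padding. Since the offsets are almost always below 4 GiB, their high halves are almost always zero, but in the current layout those zero halves are interleaved with the nonzero low halves, so the zero runs stay short. With the rearrangement, if the record is kept at 24 bytes, the last 7 bytes (5 with the rewrap speedup patch) and the first 8 bytes of the next record would almost always form a run of 15 (13) zero bytes. Two caveats: a struct with only guint32 members has 4-byte alignment, so it would naturally shrink to 20 bytes (3 trailing padding bytes instead of 7) unless padded explicitly; and padding bytes are in practice only zero if the whole record is cleared, e.g. with memset(), before it is written out.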
We could experiment further with splitting the offsets into even smaller (guint16 or guint8) chunks, and/or with moving flags closer to the low bytes (e.g. placing it right before text_offset_low), which could further improve the grouping of likely zeros vs. likely nonzeros on little-endian systems.
I don't know how much we'd win in compression time and how much we'd lose on the getters/setters. We could give it a try.