Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Register
  • Sign in
  • G GLib
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 854
    • Issues 854
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 55
    • Merge requests 55
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Releases
  • Packages and registries
    • Packages and registries
    • Container Registry
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • GNOMEGNOME
  • GLib
  • Issues
  • #2787
Closed
Open
Issue created Oct 21, 2022 by Simon McVittie@smcvMaintainer

new int64, double hash functions always hash to 0 on big-endian

!2924 (merged) reimplemented g_int64_hash() and g_double_hash(). The reimplementation is probably fine on little-endian platforms like x86, but is a pathologically bad hash function on big-endian platforms: everything hashes to 0.

A standalone test-case that can be compiled and run on a big-endian machine without needing all of GLib:

#include <math.h>
#include <stdio.h>
#include <stdint.h>

typedef unsigned guint;
typedef int64_t gint64;
typedef uint64_t guint64;
typedef double gdouble;

uint64_t i64 = 0x1234567890abcdefULL;
double d = M_PI;

static unsigned
old_int64_hash (const void *v)
{
  return (guint) *(const gint64*) v;
}

static unsigned
new_int64_hash (const void *v)
{
  return (guint) ((const guint) (*(guint64 *) v >> 32)) ^ (*(const guint *) v);
}

static unsigned
old_double_hash (const void *v)
{
  return (guint) *(const gdouble*) v;
}

static unsigned
new_double_hash (const void *v)
{
  return (guint) ((const guint) (*(guint64 *) v >> 32)) ^ (*(const guint *) v);
}

int main (void)
{
  printf ("old int64 hash: %x\n", old_int64_hash (&i64));
  printf ("new int64 hash: %x\n", new_int64_hash (&i64));
  printf ("old double hash: %x\n", old_double_hash (&d));
  printf ("new double hash: %x\n", new_double_hash (&d));
  return 0;
}

On Debian's s390x porterbox, this prints:

old int64 hash: 90abcdef
new int64 hash: 0
old double hash: 3
new double hash: 0

which I think is not what we want! This is because the new hash functions take the 32 most significant bits of the 64-bit quantity pointed to by v, and xor them with the first 32 bits. On LE platforms where the least significant bits appear first, this is xor'ing the 32 most significant bits with the 32 least significant, but on BE platforms where the most significant bits appear first, this is xor'ing the 32 most significant bits with themselves.

If instead we use

static unsigned
new_int64_hash (const void *v)
{
  return (guint) ((const guint) (*(guint64 *) v >> 32)) ^ (*(const guint64 *) v & 0xffffffffU);
}

static unsigned
new_double_hash (const void *v)
{
  return (guint) ((const guint) (*(guint64 *) v >> 32)) ^ (*(const guint64 *) v & 0xffffffffU);
}

then that seems more likely to be what @wszqkzqk meant. I'll open a MR for the 2.75.x branch that does that.

For the 2.74.x branch, I'd be tempted to revert !2924 (merged) instead of fixing it: changing the hash function to reduce collisions seems like feature work that is questionable to be doing on a stable-branch, since user code might be (incorrectly) relying on a particular hash table traversal order.

/cc @3v1n0 @pwithnall

Assignee
Assign to
Time tracking