Why does `sxhash` return a constant for all structs?
Asked Answered
S

2

10

When using the Common Lisp sxhash function on structs I'm getting the same value for all structs (in SBCL only structs of the same type). For instance, the following code prints two lists of integers all of which have the same value.

(progn 
  (defstruct foo 
    data)
  (print (mapcar #'sxhash (loop for i below 10 collect (make-foo :data i))))
  (defstruct bar 
    data)
  (print (mapcar #'sxhash (loop for i below 10 collect (make-bar :data i)))))

 ;;; Allegro
 (319 319 319 319 319 319 319 319 319 319) 
 (319 319 319 319 319 319 319 319 319 319) 
 ;;; SBCL
 (22591133455133788 22591133455133788 22591133455133788 22591133455133788
 22591133455133788 22591133455133788 22591133455133788 22591133455133788
 22591133455133788 22591133455133788) 
(21321591953876048 21321591953876048 21321591953876048 21321591953876048
 21321591953876048 21321591953876048 21321591953876048 21321591953876048
 21321591953876048 21321591953876048) 

I've tried this in both Allegro Lisp and SBCL and they both return (different) constants for all structs (of same type in SBCL). On the linked sxhash Hyperspec page there are the following statements:

  1. For any two objects, x and y, both of which are bit vectors, characters, conses, numbers, pathnames, strings, or symbols, and which are similar, (sxhash x) and (sxhash y) yield the same mathematical value even if x and y exist in different Lisp images of the same implementation. See Section 3.2.4 (Literal Objects in Compiled Files).

  2. The hash-code for an object is always the same within a single session provided that the object is not visibly modified with regard to the equivalence test equal. See Section 18.1.2 (Modifying Hash Table Keys).

The latter statement does not specify, but seems to imply, that it would be sensible that two structs which are not equal will have differing hash codes (modulo collision). However, structs are suspiciously absent from the list in the first paragraph. At first I chalked this up to a bug in Allegro Lisp but now that I see it in two different implementations I think there must be something about the spec I don't understand.

Slipshod answered 14/1, 2014 at 23:52 Comment(2)
The one guarantee is that if (equal x y) then (= (sxhash x) (sxhash y)). I seem to recall (but I may be conflating comp.lang.lisp and the hyperspec) that what you are seeing is not uncommon, in that there are different things taht will have the same sxhash hash value.Pani
@Vantine Yes, it appears that at least SBCL and Allegro do this for memory footprint reasons. They can't use the in memory address because it will change, and the hyperspec requires sxhash to be the same for similar objects in the same implementation. Also, they can't do recursive traversal (safely) because the object may be mutated.Slipshod
S
9

I've queried Franz support and this was their response. Presumably SBCL is doing something similar for similar reasons.

The function cl:sxhash always returns the same value for structure objects. The reason for this is because it has no extra space to store a unique hash code within it. As a result, using structures as keys is very inefficient. The excl::hash-table-stats function demonstrates this when given a hash-table with structs used as keys; the histogram becomes the worst case, because every key wants the same index.

The decision was made to keep the same behavior for structure objects, because the automatic inclusion of a hashing slot in all structure objects would have made all structs an average of one word longer. For small structs this is unacceptable for many of our users.

Instead, a user may define a struct with an extra slot, and the constructor for that struct type could store a unique value into that slot (either a random value or a value gotten by incrementing a counter each time the constructor is run). Also, create a hash generating function which accesses this hash-slot to generate its value. If the structs to be hashed are buried inside a list, then this hash function would need to know how to traverse these keys to obtain a unique value. Finally, then, build your hash-table using the documented :hash-function argument to make-hash-table (still using the equal test argument), to create a hash-table which will be well-distributed.

Alternatively, and if you can guarantee that none of the slots in your structures will be changed after they are used as keys in the hash-table, you can use the equalp test function in your make-hash-table call, rather than equal. If you do, however, make sure that these struct objects don't change, because then they may not be found in the hash-table.

Slipshod answered 15/1, 2014 at 1:56 Comment(2)
There's some discussion at marc.info/?l=sbcl-devel&m=121182898514193&w=2 too.Dianoetic
Thanks, I was hoping someone with more knowledge about SBCL would confirm.Slipshod
F
2

There are two important properties of sxhash: If (equal x y) then (= (sxhash x) (sxhash y)) and the value returned by sxhash is always the same for any object (even between lisp images).

Now structures are equal if only if they are eq (i.e. They have the same address) but sxhash cannot simply return the address (or some hash thereof) of the structure because the address can change due to garbage collection. When designing a lisp implementation one must therefore choose whether to have sxhash be the same for every structure or to store some identity in every structure which does not change when the garbage collector moves the structure and so can be used to sxhash the object. Most implementations (including Franz and sbcl) consider adding such a value either a waste of space or useless if only a few spare bits are given to it.

This tradeoff will ultimately only affect a user attempt at implementing hash tables as the implementation's own hash tables can use the address of objects and notify the garbage collector of this so they can rehash when the object moves (I don't know whether or not any implementations do do this). Some implementations (including sbcl) allow you to customise the built-in hash tables with your own comparison/hashing operations. Perhaps if you implemented hashing yourself you could add an extra field to structures for this.

I believe that the result returned by sxhash in sbcl is determined by hashing the name of the type of the structure.

Farandole answered 18/7, 2017 at 15:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.