Overriding GetHashCode in VB without checked/unchecked keyword support?
Asked Answered
F

7

29

So I'm trying to figure out how to correctly override GetHashCode() in VB for a large number of custom objects. A bit of searching leads me to this wonderful answer.

Except there's one problem: VB lacks both the checked and unchecked keyword in .NET 4.0. As far as I can tell, anyways. So using Jon Skeet's implementation, I tried creating such an override on a rather simple class that has three main members: Name As String, Value As Int32, and [Type] As System.Type. Thus I come up with:

Public Overrides Function GetHashCode() As Int32
    Dim hash As Int32 = 17

    hash = hash * 23 + _Name.GetHashCode()
    hash = hash * 23 + _Value
    hash = hash * 23 + _Type.GetHashCode()
    Return hash
End Function

Problem: Int32 is too small for even a simple object such as this. The particular instance I tested has "Name" as a simple 5-character string, and that hash alone was close enough to Int32's upper limit, that when it tried to calc the second field of the hash (Value), it overflowed. Because I can't find a VB equivalent for granular checked/unchecked support, I can't work around this.

I also do not want to remove Integer overflow checks across the entire project. This thing is maybe....40% complete (I made that up, TBH), and I have a lot more code to write, so I need these overflow checks in place for quite some time.

What would be the "safe" version of Jon's GetHashCode version for VB and Int32? Or, does .NET 4.0 have checked/unchecked in it somewhere that I'm not finding very easily on MSDN?


EDIT:
Per the linked SO question, one of the unloved answers at the very bottom provided a quasi-solution. I say quasi because it feels like it's....cheating. Beggars can't be choosers, though, right?

Translated from from C# into a more readable VB and aligned to the object described above (Name, Value, Type), we get:

Public Overrides Function GetHashCode() As Int32
    Return New With { _
        Key .A = _Name, _
        Key .B = _Value, _
        Key .C = _Type
     }.GetHashCode()
End Function

This triggers the compiler apparently to "cheat" by generating an anonymous type, which it then compiles outside of the project namespace, presumably with integer overflow checks disabled, and allows the math to take place and simply wrap around when it overflows. It also seems to involve box opcodes, which I know to be performance hits. No unboxing, though.

But this raises an interesting question. Countless times, I've seen it stated here and elsewhere that both VB and C# generate the same IL code. This is clearly not the case 100% of the time...Like the use of C#'s unchecked keyword simply causes a different opcode to get emitted. So why do I continue to see the assumption that both produce the exact same IL keep getting repeated?  </rhetorical-question>

Anyways, I'd rather find a solution that can be implemented within each object module. Having to create Anonymous Types for every single one of my objects is going to look messy from an ILDASM perspective. I'm not kidding when I say I have a lot of classes implemented in my project.


EDIT2: I did open up a bug on MSFT Connect, and the gist of the outcome from the VB PM was that they'll consider it, but don't hold your breath: https://connect.microsoft.com/VisualStudio/feedback/details/636564/checked-unchecked-keywords-in-visual-basic

A quick look at the changes in .NET 4.5 suggests they've not considered it yet, so maybe .NET 5?

My final implementation, which fits the constraints of GetHashCode, while still being fast and unique enough for VB is below, derived from the "Rotating Hash" example on this page:

'// The only sane way to do hashing in VB.NET because it lacks the
'// checked/unchecked keywords that C# has.
Public Const HASH_PRIME1 As Int32 = 4
Public Const HASH_PRIME2 As Int32 = 28
Public Const INT32_MASK As Int32 = &HFFFFFFFF

Public Function RotateHash(ByVal hash As Int64, ByVal hashcode As Int32) As Int64
    Return ((hash << HASH_PRIME1) Xor (hash >> HASH_PRIME2) Xor hashcode)
End Function

I also think the "Shift-Add-XOR" hash may also apply, but I haven't tested it.

Fulllength answered 11/1, 2011 at 4:37 Comment(7)
So, why not use int64 for the intermediate calculations?Joab
I'd have to downcast back to Int32 to return the value. If the calculated value stored in Int64 is too large, that will overflow in the downcast process. I have to override the original implementation for the .NET framework to be able to properly use it, so I have to have a return type of Int32.Fulllength
Replying to the rhetorical question "why do people say VB and C# generate 100% the same IL?". Either lack of knowledge, or a desire to stifle flame wars. As you say, it's not 100% true. Your question is about something missing from VB, but there's also things missing from C#. For example exception filters which the .Net CLR team think are useful, sort of implying they think C# should have them.Gateway
Yeah, I think it's just a case of where not enough people working in the VB world have requested the feature. I'll need to go look into Connect and suggest the feature (as well as report that line-ending bug I found).Fulllength
Where does INT32_MASK come into play?Weatherbeaten
@Richard: I guess I forgot to reference it in he example it, but my old code used it as the last step once it was done rotating the Int64 value around, it returned just the lower 32bits of that final Int64 value to make VB.NET's GetHashCode happy.Fulllength
Perhaps this could be done using expressions, which has separate factory methods for checked and unchecked operations?Detachment
B
25

Use Long to avoid the overflow:

Dim hash As Long = 17
'' etc..
Return CInt(hash And &H7fffffffL)

The And operator ensures no overflow exception is thrown. This however does lose one bit of "precision" in the computed hash code, the result is always positive. VB.NET has no built-in function to avoid it, but you can use a trick:

Imports System.Runtime.InteropServices

Module NoOverflows
    Public Function LongToInteger(ByVal value As Long) As Integer
        Dim cast As Caster
        cast.LongValue = value
        Return cast.IntValue
    End Function

    <StructLayout(LayoutKind.Explicit)> _
    Private Structure Caster
        <FieldOffset(0)> Public LongValue As Long
        <FieldOffset(0)> Public IntValue As Integer
    End Structure
End Module

Now you can write:

Dim hash As Long = 17
'' etc..
Return NoOverflows.LongToInteger(hash)
Breviary answered 11/1, 2011 at 11:3 Comment(14)
Doesn't this technically nerf the value, only returning the 32bit portion of the long/int64, thus potentially increasing your risks of a possible (yet highly unlikely) case of where two objects could generate the same hashcode?Fulllength
@Hans: "Four billion is a very large number." -- They said the same thing about IPv4 :) The final /8 allocations are due to be issued early this year. Now everyone's on the "340 trillion^3 IPv6 addresses is a large number" bandwagon. Give us time, and we'll all see just how large 340 trillion^3 really is :)Fulllength
@Hans: Anyways, I just wanted to double check. As the old saying goes, all things are possible except skiing through a revolving door. Despite the number of objects in my assembly, I think your solution works well enough. I'll likely go with different primes, though, just to mix things up a bit.Fulllength
Just an update, even the Long/Int64 doesn't help all the time. If a custom class has multiple fields, especially strings, re-doing hash *= (HASH_2 + field.GetHashCode) two many times will still lead to an overflow.Fulllength
Doesn't seem to work 100% of the time. I tested an input string in one of my classes that was a comma-separated list of values, so looping over each value and doing the hash math (hash *= (23 + Values(i).GetHashCode)), after about the 41st iteration (of 78), the hash just wrapped to zero and didn't change anymore. I wonder if I tripped up some kind of bug in the framework?Fulllength
It'd just be easier if there was a solid VB.NET implementation of a Hash formula that didn't rely on C# features. Everyone keeps pointing back to the C# answer as a reference, but that is just incompatible with VB.Fulllength
Clarifying "easier", I have ~80+ classes that I have to implement GetHashCode for, and each of them is slightly different from the other. Trying to account for these differences in each one AND hoping that some kind of input string won't trigger an overflow exception is kind of frustrating. I'd like to not have to resort to using BigInteger and the likes just to provide some guarantee of not causing an exception.Fulllength
This is incorrect. The biggest signed integer is &H7FFFFFFF and that should be the mask value, otherwise you will get an overflow when converting back to integer.Weatherbeaten
@Hans Passant - Really.. I get an overflow for the following code Module Module1 Sub Main() Dim value As Long = 4219274409 Dim intValue As Integer = MaskToInt(value) End Sub Function MaskToInt(value As Long) As Int32 Const mask As Integer = &Hffffffff Return CInt(value And mask) End Function End ModuleWeatherbeaten
@HansPassant: The hex constant &hFFFFFFFF is an int32 with a value of negative one. Trying to use -1 as a mask value will have no effect. Trying to mask with &hFFFFFFFF&` would yield a value that could be converted to a UInt32, but not necessarily to an Int32.Treadway
Reversed my original downvote to an upvote. The castor solution is interesting. Thank you for continuing to maintain this answer.Weatherbeaten
Just want to point out to anyone who hates magic numbers (due to arguments such as these comments), you can do this: Return hash And Int32.MaxValueIleana
Another option is to use BitConverter to avoid losing the 32nd bit. I'm sure it's slower than the NoOverflows approach, but you wouldn't need to introduce a support module.Buhl
IMHO, a slightly better distribution of hash values is achieved if you don't simply throw away the upper 32-bits of the long. Fortunately, .NET has a built-in way to do exactly what we want, once you've computed the long intermediate value: Return hash.GetHashCode(). Yep, its that simple :) (I considered adding that as an alternative in your answer, since it preserves the essential point you make, which is "use Long intermediate". But I wasn't sure if you would consider such an edit rude/inappropriate.)Saunderson
C
12

Here is an implementation combining Hans Passant's answer and Jon Skeet's answer.

It works even for millions of properties (i.e. no integer overflow exceptions) and is very fast (less than 20 ms to generate hash code for a class with 1,000,000 fields and barely measurable for a class with only 100 fields).

Here is the structure to handle the overflows:

<StructLayout(LayoutKind.Explicit)>
Private Structure HashCodeNoOverflow
    <FieldOffset(0)> Public Int64 As Int64
    <FieldOffset(0)> Public Int32 As Int32
End Structure

And a simple GetHashCode function:

Public Overrides Function GetHashCode() As Integer

    Dim hashCode As HashCodeNoOverflow

    hashCode.Int64 = 17

    hashCode.Int64 = CLng(hashCode.Int32) * 23 + Field1.GetHashCode
    hashCode.Int64 = CLng(hashCode.Int32) * 23 + Field2.GetHashCode
    hashCode.Int64 = CLng(hashCode.Int32) * 23 + Field3.GetHashCode

    Return hashCode.Int32

End Function

Or if your prefer:

Public Overrides Function GetHashCode() As Integer

    Dim hashCode = New HashCodeNoOverflow With {.Int32 = 17}

    For Each field In Fields
        hashCode.Int64 = CLng(hashCode.Int32) * 23 + field.GetHashCode
    Next

    Return hashCode.Int32

End Function
Collide answered 7/10, 2015 at 16:43 Comment(1)
Couldn't you just use 23L instead of CLng(hashCode.Int32)?Greenwald
G
5

I had the same problem implementing Mr. Skeet's solution in vb.net. I ended up using the Mod operator to get there. Each Mod by Integer.MaxValue should return just the least significant component up to that point and will always be within Integer.MaxValue and Integer.MinValue -- which should have the same effect as unchecked. You probably don't have to mod as often as I do (it's only when there's a chance of getting bigger than a long (which would mean combining a LOT of hash codes) and then once at the end) but a variant of this works for me (and lets you play with using much bigger primes like some of the other hash functions without worrying).

Public Overrides Function GetHashCode() As Int32
    Dim hash as Int64 = 17
    hash = (hash * 23 + _Name.GetHashCode()) Mod Integer.MaxValue
    hash = (hash * 23 + _Value) Mod Integer.MaxValue
    hash = (hash * 23 + _Type.GetHashCode()) Mod Integer.MaxValue
    Return Convert.ToInt32(hash)
End Function
Ghislainegholston answered 14/9, 2015 at 7:44 Comment(3)
Two things: (1) at what point does overflow get 'checked'? Couldn't it overflow calculating hash * 23 + field.GetHashCode() before applying Mod Integer.MaxValue, or would it automatically cast up to Long then back down to Integer? (2) Does Mod not introduce some bias towards positive or negative numbers, rather than "rolling" around all possible values of Integer? i.e. Applying Mod to a positive n is going to end up with 0 <= n < Integer.MaxValue, while applying to a negative n is going to end up with Integer.MinValue < n <= 0 (with a small bias for -1)Ligni
Hi Jimbo -- (1) You're completely right, I've fixed the code, thanks for that. (2) I don't understand. I've thrown together an excel file, and if I mod by any positive k, I get a nice rolling pattern where the result is from zero to k-1.Ghislainegholston
Hi Jimbo -- (2) I understand now. You're right, there will be some small bias in that I get more zeros than I should (all integer.minvalue and integer.maxvalues will go to zero). Should be two parts in 4.25 billion of bias (roughly) -- I think testing for it might be more expensive than it's worth. My earlier mistake was thinking that Mod meant modulo -- instead the Mod keyword means remainder.Ghislainegholston
R
2

You can implement a suitable hash code helper in a separate assembly either using C# and the unchecked keyword or turning overflow checking of for the entire project (possible in both VB.NET and C# projects). If you want to you can then use ilmerge to merge this assembly to your main assembly.

Reversion answered 11/1, 2011 at 10:25 Comment(1)
There's another SO question on here addressed by this very solution. But that is a tad overkill and a bit excessive, though. It also seems somewhat hacky. I wonder if anyone has suggested to Microsoft that they add checked/unchecked blocks into the next iteration of .NET?Fulllength
K
2

Improved answer Overriding GetHashCode in VB without checked/unchecked keyword support?

Public Overrides Function GetHashCode() as Integer
  Dim hashCode as Long = 0
  If myReplacePattern IsNot Nothing Then _
    hashCode = ((hashCode*397) Xor myField.GetHashCode()) And &HffffffffL
  If myPattern IsNot Nothing Then _
    hashCode = ((hashCode*397) Xor myOtherField.GetHashCode()) And &HffffffffL
  Return CInt(hashCode)
End Function

There is a trimming after each multiplication. And literal is defined explicitly as Long because the And operator with an Integer argument does not zeroize the upper bytes.

Keldon answered 3/8, 2011 at 12:44 Comment(2)
What happens if the last trimming yields a value in the range &h80000000L to &hFFFFFFFFL?Treadway
You could replace the last line with BitConverter.ToInt32(BitConverter.GetBytes(hashCode), 0) to avoid a potential overflow (which is what supercat is referring to in his comment). Or use H7FFFFFFF in the final trimming operation.Buhl
R
2

After researching that VB had not given us anything like unchecked and raging for a bit (c# dev now doing vb), I implemented a solution close to the one Hans Passant posted. I failed at it. Terrible performance. This was certainly due to my implementation and not the solution Hans posted. I could have gone back and more closely copied his solution.

However, I solved the problem with a different solution. A post complaining about lack of unchecked on the VB language feature requests page gave me the idea to use a hash algorithm already in the framework. In my problem, I had a String and Guid that I wanted to use for a dictionary key. I decided a Tupple(Of Guid, String) would be a fine internal data store.

Original Bad Version

Public Structure HypnoKey
  Public Sub New(name As String, areaId As Guid)
    _resourceKey = New Tuple(Of Guid, String)(resourceAreaId, key)
  End Sub

  Private ReadOnly _name As String
  Private ReadOnly _areaId As Guid

  Public ReadOnly Property Name As String
    Get
      Return _name 
    End Get
  End Property

  Public ReadOnly Property AreaId As Guid
    Get
      Return _areaId 
    End Get
  End Property

  Public Overrides Function GetHashCode() As Integer
    'OMFG SO BAD
    'TODO Fail less hard
  End Function

End Structure

Much Improved Version

Public Structure HypnoKey
  Public Sub New(name As String, areaId As Guid)
    _innerKey = New Tuple(Of Guid, String)(areaId , key)
  End Sub

  Private ReadOnly _innerKey As Tuple(Of Guid, String)

  Public ReadOnly Property Name As String
    Get
      Return _innerKey.Item2
    End Get
  End Property

  Public ReadOnly Property AreaId As Guid
    Get
      Return _innerKey.Item1
    End Get
  End Property

  Public Overrides Function GetHashCode() As Integer
    Return _innerKey.GetHashCode() 'wow! such fast (enuf)
  End Function

End Structure

So, while I expect there are far better solutions than this, I am pretty happy. My performance is good. Also, the nasty utility code is gone. Hopefully this is useful to some other poor dev forced to write VB who comes across this post.

Cheers

Ramakrishna answered 25/8, 2014 at 21:43 Comment(0)
M
1

I've also found that RemoveIntegerChecks MsBuild property affects /removeintchecks VB compiler property that prevents compiler from emitting runtime checks:

  <PropertyGroup>
    <RemoveIntegerChecks>true</RemoveIntegerChecks>   
  </PropertyGroup>
Miguelinamiguelita answered 13/5, 2014 at 16:59 Comment(1)
The desire is to be able to write code which works even if the project has integer checks kept. People want the integer checks because they care more a lot about weird strange things happening in their business logic code due to unexpected integer wraparound.Nosh

© 2022 - 2024 — McMap. All rights reserved.