Who knew overriding GetHashCode() was so much fun?

So, I have a project with Code Analysis turned on and it treats warnings as errors. In this project, I have a class that overrides Equals() so that I can use some of the nice collection based linq expression (like Except()). Well, code analysis doesn’t like it if you override Equals, but forget to override GetHashCode().

Needless to say, I have to override GetHashCode(). But how does one do this? Some searching turned up a bunch of interesting resources showing how to do this. However, there isn’t some cut and paste code that solves the problem. You have to really understand the implications and usage of this method. For that purpose, I recommend: MSDN’s Object.GetHashCode Method documentation. I also found some interesting discussions on StackOverflow regarding the issue.

In my case, I have an object that I want to be “Equal” to another if three internal values are the same. This is where things get interesting. GetHashCode is just an integer. As such, how to I get it to return a value that is guaranteed not to have the same value as another object with a different set of 3 properties?

For Example, if I just add the numbers, then I could have a problem like this:

  Object A Object B
Field 1 Hash 1 1
Field 2 Hash 5 100
Field 3 Hash 100 5
Sum of Hashes 106 106

The above sums are equal, but the objects are different!

So, a number of implementations use seeds and factors to adjust the numbers and reduce the collisions. One such example (from the above stack overflow article) does so like this:

int hash = 13; 
hash = (hash * 7) + field1.GetHashCode(); 
hash = (hash * 7) + field2.GethashCode(); 
hash = (hash * 7) + field3.GethashCode(); 
return hash;

I’m not sure I fully understand this yet, but I guess I can see how this would improve the chance that things are not going to collide. In the above scenario, I would end up with something like the following:

  Object A Object B
Field 1 Hash 1 1
Field 2 Hash 5 100
Field 3 Hash 100 5
Sum of Hashes wtih Seed and Factor 91 + 1 + 644 + 5 + 4543 + 100

= 4643

91 + 1 + 644 + 100 + 5208 + 5

= 5213

So this seems better, but we’re still not guaranteed that these are unique (at least as far as my feeble mind can comprehend).

Microsoft’s MSDN documentation uses a bitwise XOR to combine hash codes from multiple contained fields. This too seems a bit risky, but I haven’t explored how it works in detail.

So all you intelligent people out there, please tell me what I’m missing. Is there some “algorithm” or factor + seed magic that does guarantee unique hash codes?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a website or blog at WordPress.com

Up ↑

%d bloggers like this: