Re: [dotNetRDF-develop] Use of string.GetHashCode() for database identity


Hi Michael

All the hash codes for Nodes and Triples in dotNetRDF (and for most other internal objects which need their own hash codes) are based on calling GetHashCode() on a string representation of the object in question.  Yes, these hash codes are not stable across .Net versions and platform architectures, but we do not require them to be: the vast majority of the time they are only used in memory, for storage and lookup in various hash code based data structures and algorithms.  Hash codes just need to be fast and efficient to compute, which the .Net string class's GetHashCode() implementation is regardless of .Net version/architecture.  We also compute each hash code only once, store the value and simply return it when needed.
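To illustrate the compute-once-and-store approach, here is a minimal sketch.  SketchUriNode is a hypothetical stand-in written for this mail, not the actual UriNode class from the library, and it assumes the node's string representation is just its URI.

```csharp
using System;

// Hypothetical stand-in for a node type; NOT dotNetRDF's actual UriNode.
public sealed class SketchUriNode : IEquatable<SketchUriNode>
{
    private readonly string _uri;
    private readonly int _hashCode;

    public SketchUriNode(string uri)
    {
        if (uri == null) throw new ArgumentNullException("uri");
        _uri = uri;
        // Computed once from the string representation and then stored.
        // The value may differ between .Net versions/architectures, which
        // is fine as long as it is only used in memory within one process.
        _hashCode = _uri.GetHashCode();
    }

    public string Uri
    {
        get { return _uri; }
    }

    public bool Equals(SketchUriNode other)
    {
        // Equality is decided by the value itself, never by the hash code.
        return other != null && string.Equals(_uri, other._uri, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as SketchUriNode);
    }

    public override int GetHashCode()
    {
        // Just return the stored value.
        return _hashCode;
    }
}
```

Objects like this behave correctly as keys in a Dictionary or members of a HashSet within a single process, which is all that in-memory use requires.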

While we could potentially design and implement our own hash code algorithm for these things, there is no point reinventing the wheel.  Designing an algorithm which generated codes with sufficient uniqueness and efficiency would almost certainly take far more effort than the potential benefit would justify.

With regard to their use for database identity, this is a pragmatic design decision which trades some data instability for read/write speed.  Hash codes are used as part of database identity only for our own SQL based stores, simply because it makes a significant difference in speed, and most of the time you'll create and access your data on the same architecture so hash code instability won't be an issue.  Since it is a potential issue, the database code is all designed to take account of this hash code instability and work around it automatically and seamlessly.  Actual database identity is based on numeric identifiers; the hash codes are only used as a means to speed up and cache conversions between Nodes and database IDs.
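As a rough illustration of that idea, here is a sketch of one possible way to use a hash code purely as a lookup shortcut while identity stays with a numeric ID.  Everything in it is an assumption made for illustration: SketchNodeIdMap, its in-memory "table" and the injectable hash function are hypothetical and do not reflect the actual store code or schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; NOT dotNetRDF's actual store code or schema.
public sealed class SketchNodeIdMap
{
    // Stand-in for a database table: numeric ID -> node value.
    private readonly IDictionary<long, string> _table;
    // In-memory cache: hash code -> candidate IDs (collisions are possible).
    private readonly Dictionary<int, List<long>> _idsByHash = new Dictionary<int, List<long>>();
    // Injectable so two "architectures" can be simulated over one table.
    private readonly Func<string, int> _hash;
    private long _nextId;

    public SketchNodeIdMap(IDictionary<long, string> table, Func<string, int> hash)
    {
        _table = table;
        _hash = hash;
        _nextId = table.Count == 0 ? 1 : table.Keys.Max() + 1;
    }

    public long GetOrCreateId(string nodeValue)
    {
        int hash = _hash(nodeValue);

        // Fast path: the hash narrows the search, the value confirms the match.
        List<long> candidates;
        if (_idsByHash.TryGetValue(hash, out candidates))
        {
            foreach (long id in candidates)
            {
                if (string.Equals(_table[id], nodeValue, StringComparison.Ordinal)) return id;
            }
        }

        // Slow path: the row may exist but was stored under a different hash
        // (or by another instance), so fall back to matching on the value.
        foreach (KeyValuePair<long, string> row in _table)
        {
            if (string.Equals(row.Value, nodeValue, StringComparison.Ordinal))
            {
                Remember(hash, row.Key);
                return row.Key;
            }
        }

        // Not present at all: allocate a new numeric ID.
        long newId = _nextId++;
        _table[newId] = nodeValue;
        Remember(hash, newId);
        return newId;
    }

    private void Remember(int hash, long id)
    {
        List<long> ids;
        if (!_idsByHash.TryGetValue(hash, out ids))
        {
            ids = new List<long>();
            _idsByHash[hash] = ids;
        }
        ids.Add(id);
    }
}
```

In this sketch a changed hash function only costs a slower first lookup; the numeric ID that comes back is the same, because a match is always confirmed against the stored value.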

Regards,

Rob Vesse

----------------------------------------
From: Michael Friis <fr...@gm...>
Sent: 03 June 2010 16:05
To: dot...@li...
Subject: [dotNetRDF-develop] Use of string.GetHashCode() for database identity

As far as I can determine from the code in e.g. UriNode.cs, dotNetRDF
uses string.GetHashCode() for database identity. This is bad design
because the string hash codes are not stable across .Net versions nor
across architectures:
http://stackoverflow.com/questions/2099998/hash-quality-and-stability-of-string-gethashcode-in-net

Regards
Michael
