Re: [dotNetRDF-develop] Use of string.GetHashCode() for database identity

Great, didn't realise the hashes in the database were used for caching only.

Michael

On Fri, Jun 4, 2010 at 10:36 AM, Rob Vesse <rv...@vd...> wrote:
> Hi Michael
>
> All the hash codes for Nodes and Triples in dotNetRDF (and for most other
> internal objects which need their own hash codes) are computed by calling
> GetHashCode() on a string representation of the object in question.  Yes,
> hash codes are not stable across .NET versions and platform architectures,
> but we do not require them to be: the vast majority of the time these
> hash codes are only used in memory, for storage and lookup in various
> hash-based data structures and algorithms.  Hash codes just need to be
> fast and efficient to compute, which the .NET string class's
> GetHashCode() implementation is regardless of .NET version/architecture,
> and we compute each hash code once, store the value, and just return it
> when needed.
>
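
The "compute once, store, and return" pattern described above can be sketched roughly as follows. This is a minimal illustration only: SketchUriNode is a hypothetical stand-in, not dotNetRDF's actual UriNode, and it assumes the node's string form is simply its URI.

```csharp
using System;

// Hypothetical stand-in for a node type; NOT dotNetRDF's actual UriNode.
public sealed class SketchUriNode
{
    private readonly string _uri;
    private readonly int _hashCode;

    public SketchUriNode(string uri)
    {
        _uri = uri ?? throw new ArgumentNullException(nameof(uri));
        // Hash the string form once, up front, and keep the result.
        _hashCode = _uri.GetHashCode();
    }

    public string Uri => _uri;

    // Returns the stored value; nothing is recomputed per call.
    public override int GetHashCode() => _hashCode;

    // Equality is decided by the string itself, never by the hash alone.
    public override bool Equals(object obj) =>
        obj is SketchUriNode other &&
        string.Equals(_uri, other._uri, StringComparison.Ordinal);
}
```

Within a single process, two such nodes with the same URI compare equal and return the same hash code, which is all that HashSet<T> or Dictionary<TKey, TValue> need. Nothing here relies on the value being the same on another machine or runtime.
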
> While we could potentially design and implement our own hash code algorithm
> for these things, there is no point reinventing the wheel: finding or
> designing an algorithm which generates codes with sufficient uniqueness and
> efficiency would almost certainly take far more effort than the potential
> benefit justifies.
>
> With regards to their use for database identity, this is a pragmatic design
> decision which trades off read/write speed against data instability.  Hash
> codes are used as part of database identity only for our own SQL-based
> stores, simply because it makes a significant difference in speed, and most
> of the time you'll create and access your data on the same architecture, so
> hash code instability won't be an issue.  Since it is a potential issue, the
> database code is all designed to take account of this hash code instability
> and work around it automatically and seamlessly.  Actual database identity
> is based on numeric identifiers, and the hash codes are only used as a means
> to speed up and cache conversions between Nodes and database IDs.
>
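
One way a store could treat the hash as a lookup hint while keeping numeric IDs as the real identity is sketched below. The message does not say how dotNetRDF's SQL stores actually do this, so everything here is an assumption: SketchNodeTable, its in-memory "rows", and the swappable Hash property are hypothetical and exist only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a SQL node table; illustrative only.
public sealed class SketchNodeTable
{
    // Id is the actual identity; Hash is only a lookup hint.
    private sealed class Row { public int Id; public string Value; public int Hash; }

    private readonly List<Row> _rows = new List<Row>();
    private readonly Dictionary<int, List<Row>> _byHash = new Dictionary<int, List<Row>>();

    // Swappable so a change of .NET version/architecture can be simulated.
    public Func<string, int> Hash { get; set; } = s => s.GetHashCode();

    public int GetOrCreateId(string value)
    {
        int hash = Hash(value);

        // Fast path: indexed lookup by hash, confirmed by comparing the value.
        if (_byHash.TryGetValue(hash, out List<Row> bucket))
        {
            foreach (Row row in bucket)
                if (row.Value == value) return row.Id;
        }

        // Slow path: the stored hash may come from a different runtime.
        Row found = _rows.Find(r => r.Value == value);
        if (found == null)
        {
            found = new Row { Id = _rows.Count + 1, Value = value };
            _rows.Add(found);
        }
        else
        {
            _byHash[found.Hash].Remove(found); // drop the stale index entry
        }

        found.Hash = hash; // store the hint as computed by this runtime
        Index(found);
        return found.Id;
    }

    private void Index(Row row)
    {
        if (!_byHash.TryGetValue(row.Hash, out List<Row> bucket))
            _byHash[row.Hash] = bucket = new List<Row>();
        bucket.Add(row);
    }
}
```

In this sketch a stale or colliding hash only costs a slower lookup: the ID handed back for a given value stays the same, because the full value is always compared before a row is accepted.
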
> Regards,
>
> Rob Vesse
>
> ________________________________
> From: Michael Friis <fr...@gm...>
> Sent: 03 June 2010 16:05
> To: dot...@li...
> Subject: [dotNetRDF-develop] Use of string.GetHashCode() for database
> identity
>
> As far as I can determine from the code in e.g. UriNode.cs, dotNetRDF
> uses string.GetHashCode() for database identity. This is bad design
> because string hash codes are not stable across .NET versions or
> across architectures:
> http://stackoverflow.com/questions/2099998/hash-quality-and-stability-of-string-gethashcode-in-net
>
> Regards
> Michael
>

-- 
http://friism.com
(+45) 27122799
Sapere aude