From: <ja...@ca...> - 2002-08-31 20:59:28
Hi!

I think this is a common problem whenever there is no single-server solution. For example, with something like Napster, how can you tell whether the file "song_1.mp3" is the same as "song-1.mp3"? You can't. You could compare the file size, and you could compare the beginning and ending bytes of the files. This example is trivial, but you could have "Brinkhoff's Brauerei" and "Grinkhoff's Brauerei", and I wouldn't be so sure whether they are the same brewery or not. In programs such as Napster and DC there is no solution for duplicates (yet).

*) One solution could be to introduce a standard naming system. Of course, not everyone will follow it.

*) We could also set up a single server where everyone must(!!!) register any new brewery. In return they would receive a unique ID to use. Of course, not everyone has Internet access, and someone could still try to use his own IDs.

*) Hashing comes in handy when algorithms need to be simplified or when the size of the key data has to be reduced. There could be some improvement if names were replaced by their hash values, but I am not sure how that would help to reduce duplicates.

Actually, I think there is no perfect solution for this. Maybe the best approach would be to make it very easy to resolve such conflicts. Still thinking about how to do that ...

Well, ... this looks like a hard one.

Janis

p.s. I like the small logo. It really looks like a beer mat. I am not a specialist in design, but I think that's exactly what was needed. :)

> Hi,
>
> when creating the demo data package, I noticed some weak points in our
> database structure.
>
> Summary of the structure as it is now:
>
> We have bddb_beer linking ID of bddb_brewery.
> We have bddb_series linking ID of bddb_brewery.
> We have bddb_mats linking ID of bddb_brewery and ID of bddb_series and ID
> of bddb_beer.
>
> Obviously, you must be sure that the ID of bddb_brewery always
> represents the same brewery, even if you import external data.
> For now, ID is set to autoincrement and cannot guarantee that it always
> refers to the same brewery (just take a second database, and you will
> get new IDs).
>
> My idea to solve the problem is to create an ID hash from the brewery name
> string. This one will be unique, and identical whenever you hash the same
> brewery.
> I am not an expert in hashing matters, but I think this could be done.
> One more point: it would be good to make sure that small differences in the
> name string produce the same hash (e.g. "Brinkhoff's Brauerei",
> "Brauerei Brinkhoff", "Brinkhoff´s Brauerei"), as they all
> represent the same brewery.
>
> Any ideas? Any comments? I am not sure if this one is the best solution.
>
> Brian.
>
> _______________________________________________
> Phpbddb-dev1 mailing list
> Php...@li...
> https://lists.sourceforge.net/lists/listinfo/phpbddb-dev1
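
To make the hashing idea from the quoted message a bit more concrete, here is a minimal sketch in Python. It is not taken from the phpbddb code; the function names, the stop-word list and the choice of SHA-1 truncated to 16 hex characters are all assumptions made for illustration. The name is normalized first (apostrophe variants unified, possessive "'s" dropped, generic words like "Brauerei" ignored, remaining words sorted) and only then hashed.

```python
import hashlib
import re
import unicodedata

# Hypothetical list of generic words to ignore (an assumption, not from the thread).
STOP_WORDS = {"brauerei", "brewery"}


def normalize_name(name: str) -> str:
    """Reduce a brewery name to a canonical form before hashing."""
    # Unify apostrophe look-alikes (acute accent, right quote, backtick).
    for ch in "\u00b4\u2019`":
        name = name.replace(ch, "'")
    # Strip diacritics, e.g. "ä" becomes "a".
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    name = name.lower()
    # Drop the possessive "'s", then any remaining apostrophes.
    name = re.sub(r"'s\b", "", name).replace("'", "")
    # Keep only alphanumeric words, ignore generic ones, and sort so that
    # word order does not matter.
    words = sorted(w for w in re.findall(r"[a-z0-9]+", name) if w not in STOP_WORDS)
    return " ".join(words)


def brewery_key(name: str) -> str:
    """Return a short hex key derived from the normalized name."""
    digest = hashlib.sha1(normalize_name(name).encode("utf-8")).hexdigest()
    return digest[:16]


if __name__ == "__main__":
    for n in ("Brinkhoff's Brauerei", "Brauerei Brinkhoff",
              "Brinkhoff\u00b4s Brauerei", "Grinkhoff's Brauerei"):
        print(n, "->", brewery_key(n))
```

With this sketch the three spellings from Brian's example give the same key, while "Grinkhoff's Brauerei" gives a different one. So a typo in the name still produces a duplicate, and two different breweries with the same normalized name would collide; hashing alone does not remove the need for an easy way to resolve conflicts.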