From: Egon W. <e.w...@sc...> - 2005-02-16 14:27:56
On Wednesday 16 February 2005 03:15 pm, Peter Murray-Rust wrote:
> [copied to other lists to alert people that we are starting to use
> cml-discuss again and suggest that CML-specific traffic is routed here. We
> shall not crosspost again.]
>
> One of the areas we are still struggling with is default values and
> implicit assumptions. These need to get firmed up in CML as defaults in the
> schema. In many cases they are unknown and can be stamped as "known
> unknowns" (in the Rumsfeld taxonomy of knowledge). In other cases it may be
> less clear

CDK uses the idea that unset is unknown, and avoids defaults as much as
possible. If Objects are used, null is unset. When primitives are used, this
is more troublesome. Some primitives have NaN, which could be used, but others
don't. (E.g. unset for boolean is ???)

> Here are some typical examples

Since the idea of CML is to preserve information (correct?), I would not be
happy with default settings... what would happen if some format X does not
have a way to define or give formal charge; would the default then still be
given in CML?

> atom:
> * formalCharge. Can we assume the default is 0?

Not if you want to ensure input info = output info. Otherwise, zero seems
logical for organic structures, but what about compounds with metals/etc?

> This would assert that if
> the attribute was absent it is known to be zero? This can be set in the
> schema
>
> * occupancy. Similarly can we assume the default is 1.0?

That sounds reasonable, but I am no crystallography expert.

> * x2 (and all other coordinates). Here it is extremely dangerous to assume
> any value, so it is a known unknown. Can we use Double.NaN as marking
> unset? And then can a user definitely write:
>     if (atom.getX2() == Double.NaN) {
>         System.out.println("atom x2 not known: ");
>     }

I think Jmol uses things like x.NaN too, but I believe such a value is not
available for all Java primitives. What would be the schema equivalents of
these x.NaN's?

> * hydrogenCount. Here it is also unreasonable to assume that a missing
> value tells us anything, so it is a known unknown. How do we do that? There
> is no "null" value for an int or NaN so we either have to box it as Integer
> or use some special value (e.g. negative infinity) or use an additional
> variable to indicate state of knowledge.

In Java terms, I think it is common to define constants to say unset, e.g.

    if (atom.getX2() == atom.X2_UNSET) {
        System.out.println("atom x2 has no 2D coordinate");
    }

It's an interesting area, and I'm looking forward to reading other comments.

Egon
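
The three ways of marking "unset" discussed above (NaN for a double, a sentinel constant for an int, and null for a boxed Integer) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the Atom class, its fields, and the HYDROGEN_COUNT_UNSET constant are hypothetical and are not the CDK, Jmol, or CML API. One point worth noting is that in Java a comparison with `== Double.NaN` is always false, because NaN is not equal to anything, itself included; `Double.isNaN()` is the reliable test. For the same reason, a sentinel constant compared with `==` only works if the sentinel is not NaN.

```java
class UnsetDemo {

    /** Hypothetical atom class for illustration; not the CDK, Jmol, or CML API. */
    static class Atom {
        /** Assumed sentinel for an unset int; any value outside the valid range would do. */
        static final int HYDROGEN_COUNT_UNSET = Integer.MIN_VALUE;

        private double x2 = Double.NaN;                    // NaN marks "unset"
        private int hydrogenCount = HYDROGEN_COUNT_UNSET;  // sentinel marks "unset"
        private Integer formalCharge = null;               // boxed: null marks "unset"

        double getX2() { return x2; }
        void setX2(double x2) { this.x2 = x2; }
        boolean hasX2() { return !Double.isNaN(x2); }

        int getHydrogenCount() { return hydrogenCount; }
        void setHydrogenCount(int n) { hydrogenCount = n; }

        Integer getFormalCharge() { return formalCharge; }
        void setFormalCharge(int charge) { formalCharge = charge; }
    }

    public static void main(String[] args) {
        Atom atom = new Atom();

        // NaN never compares equal to anything, so this test cannot detect "unset".
        System.out.println(atom.getX2() == Double.NaN);   // prints false
        // Double.isNaN is the check that works.
        System.out.println(Double.isNaN(atom.getX2()));   // prints true

        // An int has no NaN, so a sentinel constant is compared with ==.
        if (atom.getHydrogenCount() == Atom.HYDROGEN_COUNT_UNSET) {
            System.out.println("hydrogen count not known");
        }
        // A boxed Integer can simply be null.
        if (atom.getFormalCharge() == null) {
            System.out.println("formal charge not known");
        }
    }
}
```

Boxing keeps "unset" out of the value range entirely but costs an object per field; a sentinel is cheaper but relies on every caller remembering to check for it.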