From: Chris M. <cj...@fr...> - 2002-10-31 18:54:01
On Wed, 30 Oct 2002, Hilmar Lapp wrote:

> My $0.02 on this is that I have seen unintuitive and cryptic column
> names causing as much grief as unintuitive and cryptic API method
> names. Intuitive and consistent naming is IMHO a much neglected art,
> but its lack is one of the most annoying (because avoidable)
> barriers to any piece of API or schema.
>
> At first glance, neither pos5 nor fnbeg mean much to me. If you mean
> 5' position, why not say pos_5prime and pos_3prime?
>
> As for start/end being right or wrong in bioperl, my take on this is
> that it depends on your viewpoint and there's no silver bullet that
> kills every bird. If your viewpoint is biological, then a feature
> starts at its 5' end. If your viewpoint is a 1-dimensional axis,
> then it is useful to define that end cannot be smaller than start,
> and strand is the tool to map to the biological viewpoint. Bioperl
> takes the latter viewpoint, which may be good for some and bad for
> others. There's Bio::Coordinate that lets you potentially map
> between any two systems. I have to say that some people here have
> discovered the bioperl way of defining feature boundaries
> independently of bioperl as the most useful one for storing and
> searching genome mappings. To me it seems they've all got their
> downsides and upsides, and one just needs to settle on one and be
> consistent throughout.

i meant the naming is wrong, not necessarily the semantics.

there are two choices of semantics for the two columns, either

  [1] X <= Y

or

  [2] (Y - X) * strand >= 0

(both assuming interbase coordinates).

there is no absolutely correct choice of semantics - like you say, both
have their upsides and downsides. (there is actually another choice,
using offset+length, but i personally don't like it.)

however, start/end are obviously terrible, awful, confusing choices of
attribute *name* for semantics [1], whether you speak biology, vector
math or english. there is no debate on this one, sorry.
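To make the two semantics concrete, here is a minimal Python sketch. It assumes interbase coordinates and a strand of +1 or -1; the names `LowHigh`, `Directional`, `upstream_directional` and so on are hypothetical, not chado or bioperl API. It converts between [1] and [2] and computes the n-base region upstream of a feature both ways: under [2] one arithmetic expression serves both strands, while under [1] the strand has to be tested explicitly.

```python
from typing import NamedTuple


class LowHigh(NamedTuple):
    """Semantics [1]: low <= high; strand (+1/-1) is carried separately."""
    low: int
    high: int
    strand: int


class Directional(NamedTuple):
    """Semantics [2]: (fnend - fnbeg) * strand >= 0; the order encodes direction."""
    fnbeg: int
    fnend: int
    strand: int


def to_directional(loc: LowHigh) -> Directional:
    # On the minus strand the natural begin is the high end.
    if loc.strand < 0:
        return Directional(loc.high, loc.low, loc.strand)
    return Directional(loc.low, loc.high, loc.strand)


def to_low_high(loc: Directional) -> LowHigh:
    # Dropping direction: sort the two ends, keep the strand.
    return LowHigh(min(loc.fnbeg, loc.fnend), max(loc.fnbeg, loc.fnend), loc.strand)


def is_valid_directional(loc: Directional) -> bool:
    # The semantics [2] invariant, written out as a predicate.
    return (loc.fnend - loc.fnbeg) * loc.strand >= 0


def upstream_directional(loc: Directional, n: int) -> Directional:
    # One expression for both strands: step n bases back from the natural begin.
    return Directional(loc.fnbeg - n * loc.strand, loc.fnbeg, loc.strand)


def upstream_low_high(loc: LowHigh, n: int) -> LowHigh:
    # With low/high semantics the strand must be tested explicitly.
    if loc.strand < 0:
        return LowHigh(loc.high, loc.high + n, loc.strand)
    return LowHigh(loc.low - n, loc.low, loc.strand)
```

For a minus-strand feature covering interbase 10..20, the directional form is (20, 10), and the 5-base upstream region comes out as (25, 20), which is low/high (20, 25) by either route.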
we had already made the choice to go with semantics [2] for chado (so
fmin/fmax as column names is not an option). in my opinion this is
generally the more useful semantics, e.g. for getting upstream regions:
a lot more can be expressed as simple arithmetic without resorting to
ugly if/then/case constructs.

given semantics [2], we were deciding on the names for X and Y. I think,
as Dave says, having 5 and 3 in the column names is out. I do think it's
ok for the names to indicate directionality in a mathematical sense -
even though protein features have strands, protein locations are still
equivalent to 1-d vectors with directionality.

you make a good point about cryptic names. i guess i tend towards
shorter names. however, if you come across the name "fnbeg", say
"what's that?" and are forced to read the documentation, then this is a
very good thing, as you then learn the semantics - both that these are
interbase and that they are directional. a cosy, familiar name, on the
other hand, will most likely lead people to assume they know the
semantics and then mess up. this is what happens to people learning
bioperl all the time.

i'm being disingenuous, i know. i guess at the end of the day longer
names are better. but then we have to be consistent within chado...

i'm glad i'm not the only one this pedantic about the naming of these
things. (I don't think the semantics issue is at all pedantic - where
possible these things should have a precise computational definition.)

>
> -hilmar
>
> On Wednesday, October 30, 2002, at 06:45 PM, Scott Cain wrote:
>
> > I am sending this to the gmod-devel list to get the opinion of the
> > larger audience.
> >
> > I am inclined to agree with Colin about nomenclature, though I do agree
> > with you about bioperl's normal/incorrect use of boundaries. Before
> > bioperl came along I did it the way you propose; it caused much
> > confusion when I changed my schema to correspond to the bioperl way.
> >
> > Assuming we use Chris' proposed boundary coordinates, I think using
> > check constraints is a good idea.
> >
> > Other opinions?
> >
> > Scott
> >
> > On Wed, 2002-10-30 at 20:54, Colin Wiel wrote:
> >> I preferred your suggestion of pos5 and pos3, as well as my suggestion
> >> of end5 and end3. I don't think a new chado user will figure out that
> >> fnbeg stands for "feature natural begin" as easily as they would
> >> figure out that pos5 (or end5) is the "position of the 5' end".
> >>
> >> Colin
> >>
> >>> -----Original Message-----
> >>> From: gmo...@li... [mailto:gmod-schema-
> >>> ad...@li...] On Behalf Of Chris Mungall
> >>> Sent: Wednesday, October 30, 2002 4:46 PM
> >>> To: gmo...@li...
> >>> Subject: [Gmod-schema] cvs changes: companalysis module, sequence module
> >>>
> >>> I have reworked the tables in the computational analysis module; they are
> >>> now a little less generic than before. there is some docs included in the
> >>> .sql - more needed though...
> >>>
> >>> the multiple alignment part (eg for clustal results) is still fluid
> >>>
> >>> I have also settled on
> >>>
> >>> fnbeg
> >>> fnend
> >>>
> >>> for specifying coordinates - feature natural begin, feature natural end
> >>>
> >>> ie this is the "real" begin and end
> >>>
> >>> we should also possibly include a check constraint to make this explicit
> >>> eg
> >>>
> >>> fstrand is null OR (fnend - fnbeg) * fstrand >= 0
> >>>
> >>> this is opposed to the normal (erroneous in my opinion) use of start/begin
> >>> end/stop, as used in bioperl, where
> >>>
> >>> start <= end
> >>>
> >>> ie they actually mean (low, high)
> >>>
> >>> how do we feel about check constraints?
> >>>
> >>> -------------------------------------------------------
> >>> This sf.net email is sponsored by: Influence the future
> >>> of Java(TM) technology. Join the Java Community
> >>> Process(SM) (JCP(SM)) program now.
> >>> http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en
> >>> _______________________________________________
> >>> Gmod-schema mailing list
> >>> Gmo...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >>
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.
> > ca...@cs...
> > GMOD Coordinator (http://www.gmod.org/)
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
>
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
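The check constraint proposed in the quoted message (`fstrand is null OR (fnend - fnbeg) * fstrand >= 0`) can be exercised directly. This is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module as a stand-in database; `featureloc_sketch` and the helpers `make_db`/`try_insert` are hypothetical, and the table models only the three columns the proposal names.

```python
import sqlite3
from typing import Optional

# Hypothetical table: only the columns named in the proposal.
SCHEMA = """
CREATE TABLE featureloc_sketch (
    fnbeg   INTEGER NOT NULL,
    fnend   INTEGER NOT NULL,
    fstrand INTEGER,
    CHECK (fstrand IS NULL OR (fnend - fnbeg) * fstrand >= 0)
)
"""


def make_db() -> sqlite3.Connection:
    # In-memory database so the sketch leaves nothing on disk.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, fnbeg: int, fnend: int,
               fstrand: Optional[int]) -> bool:
    """Return True if the row was stored, False if the CHECK rejected it."""
    try:
        # The connection context manager commits on success, rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO featureloc_sketch (fnbeg, fnend, fstrand) VALUES (?, ?, ?)",
                (fnbeg, fnend, fstrand),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an IntegrityError.
        return False
```

A NULL `fstrand` passes because the first disjunct is true; a row whose order contradicts its strand, such as (10, 20, -1), is rejected.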