From: Stan L. <st...@mo...> - 2002-11-15 15:43:36
|
Scott,

I developed a 3D data blade in Illustra (a commercial descendant of postgres) some years ago for MRI images. There is a built-in 2D index using R-tree indexing. However, for the 1D case that bioinformatics is most commonly interested in, all that is overkill. A compound index of the form (contig_id, fmin, fmax) (recently renamed to nbeg and nend) should suffice to make range queries efficient. It may be a good idea to use the cluster command on this index periodically as well, to get good coupling between disk hardware and software query optimization.

You may also need to play with the ways of expressing the query -- I am not sure how smart the postgres optimizer is about knowing which indexes to use when, or whether conjunct order affects this.

I have done this in Sybase in the past and gotten very good performance on both "contains" and "overlaps" queries.

Contains:
    where fmin >= {query_interval_fmin} and fmax <= {query_interval_fmax}

Overlaps:
    where fmin <= {query_interval_fmax} and fmax >= {query_interval_fmin}

But it has to use the index. Someone should experiment with postgres and make sure it does the right thing here.

Note that the proposed policy of having nbeg > nend when on the opposite strand will complicate these queries, possibly making it harder to get the optimizer to use the index. The versions above only work when fmin <= fmax. If people are wedded to the strand/reversal idea, you may want to materialize

    fmin = min(nbeg, nend)
    fmax = max(nbeg, nend)

as extra columns in feature for indexing purposes, unless someone can verify that postgres can efficiently index the nbeg/nend queries with strand reversal.

Cheers,
-Stan
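The "contains" and "overlaps" predicates in the message above, together with the suggestion to materialize fmin and fmax, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than postgres or Sybase, and the table layout, the index name feature_range_idx, and the helper names build_db, contained_in and overlapping are hypothetical. It says nothing about whether a given optimizer will actually choose the index.

```python
import sqlite3

# Hypothetical helper: build an in-memory table with materialized fmin/fmax.
def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (name TEXT, contig_id TEXT, "
        "nbeg INTEGER, nend INTEGER, fmin INTEGER, fmax INTEGER)"
    )
    # Compound index of the form (contig_id, fmin, fmax), as suggested above.
    conn.execute(
        "CREATE INDEX feature_range_idx ON feature (contig_id, fmin, fmax)"
    )
    for name, contig, nbeg, nend in rows:
        # nbeg > nend marks the opposite strand; fmin/fmax are strand-neutral.
        conn.execute(
            "INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)",
            (name, contig, nbeg, nend, min(nbeg, nend), max(nbeg, nend)),
        )
    return conn

CONTAINS_SQL = (
    "SELECT name FROM feature "
    "WHERE contig_id = ? AND fmin >= ? AND fmax <= ? ORDER BY name"
)
OVERLAPS_SQL = (
    "SELECT name FROM feature "
    "WHERE contig_id = ? AND fmin <= ? AND fmax >= ? ORDER BY name"
)

def contained_in(conn, contig, qmin, qmax):
    """Features lying entirely inside the query interval."""
    return [r[0] for r in conn.execute(CONTAINS_SQL, (contig, qmin, qmax))]

def overlapping(conn, contig, qmin, qmax):
    """Features sharing at least one position with the query interval."""
    # Note the swapped bounds: fmin <= qmax and fmax >= qmin.
    return [r[0] for r in conn.execute(OVERLAPS_SQL, (contig, qmax, qmin))]
```

Because the predicates only ever look at fmin and fmax, a reverse-strand feature stored with nbeg > nend is handled the same way as a forward-strand one.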
From: Chris M. <cj...@fr...> - 2002-11-14 20:26:47
|
Another avenue to explore is the postgres range type

also i don't see any problem with materializing helper tables to speed up the queries, something similar to bio-db-gff's bins. of course, there is the danger of redundancy and so forth, but this is fine if you only use gbrowse on 'warehouse' style copies of the db. i think this is the paradigm we will end up using a lot for chado.

this is what ensembl do with their core/mart split (although i would prefer chado to do this via layering, so you still have the core chado tables underneath if you want to get to the normalized data)

sorry i don't have anything more solid to go on

On 14 Nov 2002, Scott Cain wrote:

> Hello all,
>
> As I was putting together queries that I will use to port GBrowse to run
> on top of chado, I realized that I was not getting the kind of response
> that would allow GBrowse to run at a reasonable speed, an observation
> that will probably not surprise anyone.
>
> I noticed that Lincoln precalculates bins for features that greatly
> increase response time and asked him if I should consider doing a
> similar sort of thing for chado. He said I could, but that I might also
> consider an R-tree index. After some preliminary reading, I understand
> that an R-tree index is for use with geometric functions, and since DNA
> is, especially from a feature perspective, just a line, I can see how
> using geometric functions could be very useful. Has anyone considered
> their use in chado, or even better, could anyone provide examples?
> Lacking that, does anyone have a good general reference for writing
> queries using geometric functions?
>
> Thanks much,
> Scott
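For readers unfamiliar with the "bins" idea mentioned above, here is a minimal sketch of one way precalculated bins can work. It is not Bio::DB::GFF's actual scheme: the tier sizes in BIN_SIZES and the function names assign_bin and bins_for_query are assumptions made for illustration, and coordinates are assumed to be non-negative integers.

```python
# Hypothetical bin tiers, smallest first; real systems choose their own sizes.
BIN_SIZES = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]

def assign_bin(fmin, fmax):
    """Return (tier, index) of the smallest bin that fully contains [fmin, fmax]."""
    for tier, size in enumerate(BIN_SIZES):
        if fmin // size == fmax // size:
            return tier, fmin // size
    # Catch-all bin for features that straddle a boundary at every tier.
    return len(BIN_SIZES), 0

def bins_for_query(qmin, qmax):
    """Enumerate every bin that could hold a feature overlapping [qmin, qmax]."""
    bins = [
        (tier, index)
        for tier, size in enumerate(BIN_SIZES)
        for index in range(qmin // size, qmax // size + 1)
    ]
    bins.append((len(BIN_SIZES), 0))
    return bins
```

The bin is stored alongside each feature when it is loaded. A range query then restricts the search to the short list from bins_for_query (an equality lookup on an indexed column) before applying the exact overlap test, which is the trade of redundancy for speed discussed in the message.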
From: Chris M. <cj...@fr...> - 2002-11-14 20:10:24
|
On Thu, 14 Nov 2002, Ewan Birney wrote:

>
> For what it is worth, I think SO belongs in the "output" layer of a
> database such as Ensembl, ie, i don't think we will be assigning SO terms
> to things inside the Ensembl database as the SO terms sort of are "inside"
> the schema.

This is similar to what i was talking about i think

> Instead in Mart (data mining) and other areas we assign SO terms to our
> "hard data model" of Ensembl.
>
>
> This means that we can generate more SO terms if so desired etc or adapt
> the mapping of Ensembl terms to SO flexibly, rather than SO mappings being
> inherently part of our model.

so whereas SO encodes [exon part-of transcript] using a graph type language, ensembl encodes this with both foreign key relationships in the schema, and associations in the object model

both are sensible and have various trade-offs in terms of flexibility and software engineering consequences; it also depends how deep in the SO graph you go - I don't think ensembl goes that deep, and I think what you are saying is that when you do want to get that detailed you switch to a flexible type-system that encodes SO

> Chris and others - do you think this is a crazy thing to do?

Perfectly sane

One day it would be nice to automatically convert between the two representations but for now it's good to try and keep them in sync.
From: Scott C. <ca...@cs...> - 2002-11-14 16:42:42
|
Hello all, As I was putting together queries that I will use to port GBrowse to run on top of chado, I realized that I was not getting the kind of response that would allow GBrowse to run at a reasonable speed, an observation that will probably not surprise anyone. I noticed that Lincoln precalculates bins for features that greatly increase response time and asked him if I should consider doing a similar sort of thing for chado. He said I could, but that I might also consider an R-tree index. After some preliminary reading, I understand that an R-tree index is for use with geometric functions, and since DNA is, especially from a feature perspective, just a line, I can see how using geometric functions could be very useful. Has anyone considered their use in chado, or even better, could anyone provide examples? Lacking that, does anyone have a good general reference for writing queries using geometric functions? Thanks much, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. ca...@cs... GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory |
From: Ewan B. <bi...@eb...> - 2002-11-14 09:56:39
|
For what it is worth, I think SO belongs in the "output" layer of a database such as Ensembl, ie, i don't think we will be assigning SO terms to things inside the Ensembl database as the SO terms sort of are "inside" the schema. Instead in Mart (data mining) and other areas we assign SO terms to our "hard data model" of Ensembl. This means that we can generate more SO terms if so desired etc or adapt the mapping of Ensembl terms to SO flexibly, rather than SO mappings being inherently part of our model. Chris and others - do you think this is a crazy thing to do? |
From: Mark Y. <mya...@fr...> - 2002-11-13 23:00:10
|
Hi Chris,

I like your thinking here; I'm planning on using SO a LOT in the coming months so here's my two-bits worth:

> I have a few questions about how SO is likely to be applied to a feature
> database - this isn't about the content of SO, it's more of a database
> issue.
>
> * How specific should we be when attaching SO terms to features?
>
> e.g. Should we mark exons as "interior exon" or just as "exon"?
>
> * How much redundancy should we use in instantiating features?
>
> Intron features are implicit from exon features;
> splice site features are implicit from introns;
> Coding exons are implicit from exons and ORF boundaries;

It seems that one logical way to go is to ONLY have features that can't be inferred from other features. As you point out, introns can be inferred from exons; UTR can be inferred from exon order and start and stop sites; ditto for the non-coding bases of first and last exons.

>
> People are obviously free to use SO in their databases as they please.
> However, we also want to write interoperable tools around SO, and the
> tools need to know what data to expect.

Yes, and I'm hoping to get some sort of document containing the essential parts of a gene together with the relationships between them. Given such a document I'm looking forward to taking part in writing the functions that will do things like return the coordinates of the non-coding portion of the first exon, etc. I think that these documents together with such code will really enable a whole new level of data-mining, as it will finally make it easy to ask questions like 'I've done a BLAST search against the Fly genome ... does my EST overlap a gene? If so does it contain a portion of the coding sequence of this gene?' The same documents and functions seem like they would also enable some cool graphical interfaces to common search tools as well...

>
> I think for a warehouse/data mining type DB you may well want to
> instantiate everything, to maximum detail. Similarly, if I was providing
> data via DAS or a related protocol, I would use the most specific SO term,
> and possibly provide redundant features as well.
>
> However, for the primary datastore, I think it makes sense to go against
> the GO annotation rule which is "annotate to as fine a level of
> granularity as you can". For SO I would add a qualifier "...so long as the
> finer level of granularity is providing a higher information content.

I think this is sound thinking; so long as the database can spit out documents that contain all of the 'essential' parts information, everything else can be calculated. One thing I've learned about databases: a database capable of answering every question is the answer to none -- so I agree!

>
> Here's the reasoning - if you instantiate introns, then you have to make
> sure you recreate new introns if the exons change. if you add a new exon,
> you have to check if the old internal exons are still the new
> internal exon. This is a large burden to place on any tool that
> manipulates feature data. There is a large danger of leaving a database in
> an inconsistent state.

Well it still needs to be up to date. I agree though -- it just makes good sense to avoid filling it with derivative information, from an update point of view and from every point of view.

>
> It is always possible to logically infer introns or whether an exon is
> internal (see caveat below), so it causes more problems to instantiate
> these extra features in the database than it does to get these
> dynamically.

Yes, dynamically is the way to go; just give me a document with the essentials and I'll just smile all day long ;-)

>
> Of course, we should always use the more specific term if this
> increases information content. If we know a gene is tRNA coding, we should
> obviously use the term "transfer RNA coding gene", as this is providing
> more information than "gene". The tRNA-codingness cannot be logically
> inferred from existing data.

So this seems dangerously close to GO to me. I think we should try to also keep SO pure in the sense that it contains no FUNCTIONAL info. I mean, if I want to know what this collection of exons, a start codon and a stop codon 'does', shouldn't I just turn to GO?

>
> caveat: of course, we can't make the Closed World Assumption: if a curated
> gene model contains 6 exons, there may be another exon awaiting discovery
> via new EST evidence.

Yes, and I plan to use SO as an important adjunct to discovering such things...

>However, I would be wary of using the SO term "5
> prime exon" to indicate that the curator believes that this is the
> definite boundary - this should be a specific attribute of the transcript.
> We can then infer that an exon is a "5 prime exon" from a combination of
> spatial data and this attribute.
>
> There is a strong danger programmers will say "oh, I know what term xxxxx
> means", and go ahead and write some code that infers this; then another
> programmer will write code that uses a subtly different definition, which
> will lead to problems. Therefore the rules for infering specific SO
> features or types should be captured in some declarative language (ie not
> buried in the depths of a perl or java program) and form part of SO
> itself. for example:
>
> ;; **** rule for infering introns ****
> IF instance-of(?ex1, 'exon')
>   AND instance-of(?ex2, 'exon')
>   AND neighbour-on-transcript(?ex1, ?ex2, ?tr)
>
> THEN THERE EXISTS instance-of(?intron, 'intron')
>   ?intron.start == ?ex1.end AND
>   ?intron.end == ?ex2.start
>
>
> ;; **** rule for infering 5 prime ness ****
> IF instance-of(?ex, 'exon')
>   AND five-prime-most-exon-on-transcript(?ex, ?tr)
>   AND is-a-complete-transcript(?tr)
>
> THEN instance-of(?ex, '5 prime exon')
>
> [aside: The second rule can easily be captured by writing SO down as a
> Description Logic (DLs are a common way of representing ontologies -
> DAML+OIL and its successor OWL are DLs. There are some editors out there
> that let you manipulate DLs, such as OilED and GOET)]
>

Sounds good to me, but we should also go the whole way and provide the actual code to recover the implicit objects as well (as a function in a perl module?), as without such things, the whole thing is sort of masturbatory. Of course one of the great things about these formal description languages is that if you can actually find someone who can read these things... then they can turn them into code even if they don't understand the biology. Of course it's the nature of the beast that programmers who actually understand things like description logic syntax consider themselves above instantiating those descriptions, and usually too good for biology as well ;-) Nevertheless, a Perl module wherein the pod documentation contained the formal description of just what a function returned would be very cool, don't you think?

> To summarise - I think it's best to manage feature data by annotating to a
> 'SO-slim' and leave the inference/transformation to 'SO-full' as a
> external step.

I absolutely agree!

>SO-oriented tools could declare themselves compliant with
> either level.

Yes, and I also think the best way to work the kinks out of the ontology is to actually start writing the 'SO-orientated' tools; I would love to have some prototype documents to play with!

--mark y.
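As an illustration of the kind of helper function the message above asks for ("return the coordinates of the non-coding portion of the first exon"), here is a minimal sketch. It is written in Python rather than Perl, and everything about its interface is an assumption: exons are (start, end) pairs in half-open coordinates on the forward strand, cds_start is the position of the first coding base, and the name first_exon_noncoding is hypothetical.

```python
def first_exon_noncoding(exons, cds_start):
    """Return the (start, end) of the non-coding part of the first exon.

    Assumes forward-strand, half-open (start, end) exon coordinates and a
    coding region that begins at cds_start. Returns None when the first
    exon is coding from its very first base.
    """
    first_start, first_end = min(exons)  # 5'-most exon on the forward strand
    if cds_start <= first_start:
        return None
    # If translation starts in a later exon, the whole first exon is non-coding.
    return first_start, min(first_end, cds_start)
```

A reverse-strand version would mirror the comparison around the exon with the largest end coordinate; that case is left out to keep the sketch short.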
From: <SLe...@ao...> - 2002-11-13 13:22:06
|
As long as we are soliciting input, let me add a specific question along these lines: the relationships between gene (genomic DNA spanning transcript), transcript, ORF and protein might be described at both high (abstract) and low (explicit biological process description) levels. Eg.

Relation            High                      Low
-----------         ------------------------  --------------------------
gene->transcript    part-of or produced-by    transcriptional product of
transcript->ORF     part of                   coding region
ORF->protein        produced by               translational product of

The high level reuses a small set of general terms (part of, produced by) whereas the low level refers to explicit biological processes. Chris argues that the high level can always be inferred from the low level plus the types of the actors, using a small set of rules as described below.

Anyone have opinions? Preferences? Even (gasp) arguments?

Cheers,
-Stan

In a message dated 11/12/2002 10:32:41 PM Eastern Standard Time, cj...@fr... writes:

> Subj: [Gmod-schema] Using SO in a feature database
> Date: 11/12/2002 10:32:41 PM Eastern Standard Time
> From: cj...@fr...
> To: so...@ge...
> CC: gmo...@li...
> Sent from the Internet
>
>
> I have a few questions about how SO is likely to be applied to a feature
> database - this isn't about the content of SO, it's more of a database
> issue.
>
> * How specific should we be when attaching SO terms to features?
>
> e.g. Should we mark exons as "interior exon" or just as "exon"?
>
> * How much redundancy should we use in instantiating features?
>
> Intron features are implicit from exon features;
> splice site features are implicit from introns;
> Coding exons are implicit from exons and ORF boundaries;
>
>
> People are obviously free to use SO in their databases as they please.
> However, we also want to write interoperable tools around SO, and the
> tools need to know what data to expect.
>
> I think for a warehouse/data mining type DB you may well want to
> instantiate everything, to maximum detail. Similarly, if I was providing
> data via DAS or a related protocol, I would use the most specific SO term,
> and possibly provide redundant features as well.
>
> However, for the primary datastore, I think it makes sense to go against
> the GO annotation rule which is "annotate to as fine a level of
> granularity as you can". For SO I would add a qualifier "...so long as the
> finer level of granularity is providing a higher information content.
>
> Here's the reasoning - if you instantiate introns, then you have to make
> sure you recreate new introns if the exons change. if you add a new exon,
> you have to check if the old internal exons are still the new
> internal exon. This is a large burden to place on any tool that
> manipulates feature data. There is a large danger of leaving a database in
> an inconsistent state.
>
> It is always possible to logically infer introns or whether an exon is
> internal (see caveat below), so it causes more problems to instantiate
> these extra features in the database than it does to get these
> dynamically.
>
> Of course, we should always use the more specific term if this
> increases information content. If we know a gene is tRNA coding, we should
> obviously use the term "transfer RNA coding gene", as this is providing
> more information than "gene". The tRNA-codingness cannot be logically
> inferred from existing data.
>
> caveat: of course, we can't make the Closed World Assumption: if a curated
> gene model contains 6 exons, there may be another exon awaiting discovery
> via new EST evidence. However, I would be wary of using the SO term "5
> prime exon" to indicate that the curator believes that this is the
> definite boundary - this should be a specific attribute of the transcript.
> We can then infer that an exon is a "5 prime exon" from a combination of
> spatial data and this attribute.
>
> There is a strong danger programmers will say "oh, I know what term xxxxx
> means", and go ahead and write some code that infers this; then another
> programmer will write code that uses a subtly different definition, which
> will lead to problems. Therefore the rules for infering specific SO
> features or types should be captured in some declarative language (ie not
> buried in the depths of a perl or java program) and form part of SO
> itself. for example:
>
> ;; **** rule for infering introns ****
> IF instance-of(?ex1, 'exon')
>   AND instance-of(?ex2, 'exon')
>   AND neighbour-on-transcript(?ex1, ?ex2, ?tr)
>
> THEN THERE EXISTS instance-of(?intron, 'intron')
>   ?intron.start == ?ex1.end AND
>   ?intron.end == ?ex2.start
>
>
> ;; **** rule for infering 5 prime ness ****
> IF instance-of(?ex, 'exon')
>   AND five-prime-most-exon-on-transcript(?ex, ?tr)
>   AND is-a-complete-transcript(?tr)
>
> THEN instance-of(?ex, '5 prime exon')
>
> [aside: The second rule can easily be captured by writing SO down as a
> Description Logic (DLs are a common way of representing ontologies -
> DAML+OIL and its successor OWL are DLs. There are some editors out there
> that let you manipulate DLs, such as OilED and GOET)]
>
> To summarise - I think it's best to manage feature data by annotating to a
> 'SO-slim' and leave the inference/transformation to 'SO-full' as a
> external step. SO-oriented tools could declare themselves compliant with
> either level.
From: Chris M. <cj...@fr...> - 2002-11-13 03:31:51
|
I have a few questions about how SO is likely to be applied to a feature database - this isn't about the content of SO, it's more of a database issue.

* How specific should we be when attaching SO terms to features?

  e.g. Should we mark exons as "interior exon" or just as "exon"?

* How much redundancy should we use in instantiating features?

  Intron features are implicit from exon features;
  splice site features are implicit from introns;
  Coding exons are implicit from exons and ORF boundaries;

People are obviously free to use SO in their databases as they please. However, we also want to write interoperable tools around SO, and the tools need to know what data to expect.

I think for a warehouse/data mining type DB you may well want to instantiate everything, to maximum detail. Similarly, if I was providing data via DAS or a related protocol, I would use the most specific SO term, and possibly provide redundant features as well.

However, for the primary datastore, I think it makes sense to go against the GO annotation rule which is "annotate to as fine a level of granularity as you can". For SO I would add a qualifier "...so long as the finer level of granularity is providing a higher information content".

Here's the reasoning - if you instantiate introns, then you have to make sure you recreate new introns if the exons change. if you add a new exon, you have to check whether the old internal exons are still internal exons. This is a large burden to place on any tool that manipulates feature data. There is a large danger of leaving a database in an inconsistent state.

It is always possible to logically infer introns or whether an exon is internal (see caveat below), so it causes more problems to instantiate these extra features in the database than it does to get these dynamically.

Of course, we should always use the more specific term if this increases information content. If we know a gene is tRNA coding, we should obviously use the term "transfer RNA coding gene", as this is providing more information than "gene". The tRNA-codingness cannot be logically inferred from existing data.

caveat: of course, we can't make the Closed World Assumption: if a curated gene model contains 6 exons, there may be another exon awaiting discovery via new EST evidence. However, I would be wary of using the SO term "5 prime exon" to indicate that the curator believes that this is the definite boundary - this should be a specific attribute of the transcript. We can then infer that an exon is a "5 prime exon" from a combination of spatial data and this attribute.

There is a strong danger programmers will say "oh, I know what term xxxxx means", and go ahead and write some code that infers this; then another programmer will write code that uses a subtly different definition, which will lead to problems. Therefore the rules for inferring specific SO features or types should be captured in some declarative language (ie not buried in the depths of a perl or java program) and form part of SO itself. for example:

;; **** rule for inferring introns ****
IF instance-of(?ex1, 'exon')
  AND instance-of(?ex2, 'exon')
  AND neighbour-on-transcript(?ex1, ?ex2, ?tr)

THEN THERE EXISTS instance-of(?intron, 'intron')
  ?intron.start == ?ex1.end AND
  ?intron.end == ?ex2.start


;; **** rule for inferring 5 prime ness ****
IF instance-of(?ex, 'exon')
  AND five-prime-most-exon-on-transcript(?ex, ?tr)
  AND is-a-complete-transcript(?tr)

THEN instance-of(?ex, '5 prime exon')

[aside: The second rule can easily be captured by writing SO down as a Description Logic (DLs are a common way of representing ontologies - DAML+OIL and its successor OWL are DLs. There are some editors out there that let you manipulate DLs, such as OilED and GOET)]

To summarise - I think it's best to manage feature data by annotating to a 'SO-slim' and leave the inference/transformation to 'SO-full' as an external step. SO-oriented tools could declare themselves compliant with either level.
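The two declarative rules in the message above can be read as the following procedural sketch. This is one possible interpretation and deliberately not a replacement for the declarative form the message argues for. The Exon type and the function names are hypothetical, exons are assumed to lie on the forward strand of a single transcript, and "neighbour-on-transcript" is taken to mean adjacency after sorting by start coordinate.

```python
from typing import NamedTuple, Optional

class Exon(NamedTuple):
    start: int
    end: int

def infer_introns(exons):
    """Rule 1: an intron exists between each pair of neighbouring exons.

    The intron starts where the upstream exon ends and ends where the
    downstream exon starts, as in the rule's ?intron.start / ?intron.end.
    """
    ordered = sorted(exons)
    return [(a.end, b.start) for a, b in zip(ordered, ordered[1:])]

def five_prime_exon(exons, transcript_is_complete) -> Optional[Exon]:
    """Rule 2: the 5'-most exon counts as a '5 prime exon' only when the
    transcript itself is flagged as complete (an attribute of the
    transcript, not of the exon)."""
    if not transcript_is_complete or not exons:
        return None
    return min(exons)
```

Nothing is stored: both results are recomputed from the exons on demand, so editing an exon cannot leave stale introns behind, which is the consistency argument made in the message.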
From: Suzanna L. <su...@fr...> - 2002-11-13 02:00:32
|
for cancer I assume you have already seen this:
http://cgap.nci.nih.gov/ (the Cancer Genome Anatomy Project). They are good folk and would have further leads.

The mouse people also have a nice anatomy and are working on associating mouse phenotypes with human diseases:
http://www.geneontology.org/gobo/anatomy.ontology/Mouse_anatomy_by_time_xproduct

also you might take a look at GALEN
http://www.opengalen.org/

This is just off the top of my head. There are more related resources. It's a tough nut so a lot of people are working on this.

ciao, s

Allen Day wrote:

>It's okay, I'm still making changes to the schema over here.
>
>WRT the cvterm module, I'm looking for ontologies of human anatomy and
>brain cancers. Does anyone know where I can find them, or where I might
>start looking?
>
>-Allen
>
>
>On Tue, 12 Nov 2002, Chris Mungall wrote:
>
>>Hey Allen,
>>
>>I haven't had time to look over these myself yet, but I haven't forgotten
>>about this - more later....
>>
>>On Thu, 7 Nov 2002, Allen Day wrote:
>>
>>>Chris,
>>>
>>>I had a look at the ChaDo expression module, and used it to set up a
>>>database over here. For the next few days, I'll be writing the DBI layer
>>>that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo
>>>records. I can post/commit this code and sample data somewhere if people
>>>are interested to see it.
>>>
>>>I had to make some changes to the schema, and was wondering if these
>>>changes are generically useful enough to augment on to the base expression
>>>module, or whether I should just keep my changes to myself :). They are:
>>>
>>>(1) added an egroup table for tracking groups of related expression
>>>records. egroup and expression are related using an expression_egroup
>>>link table. it's conceivable that users will want to group related
>>>expression records together, and that a single expression record may
>>>belong to multiple groups.
>>>
>>>(2) added an eplatform table, and an eplatform_id FK to expression. i
>>>plan on throwing all my different expression technology data into the
>>>expression table, and need a way to keep track of which record comes from
>>>which platform (just Affymetrix Human U95Av2, U133A, and U133B data right
>>>now).
>>>
>>>(3) made the feature_expression table into a "stack" of tables that
>>>conform to feature_expression interface from your diagram. The table
>>>"stack" allows me to store my platform-specific data in separate tables in
>>>a common database. for instance, instead of having a feature_expression,
>>>i have a feature_expression_cel table for my raw Affymetrix measurements,
>>>and another table called feature_expression_dchip for my normalized
>>>Affymetrix measurements. it's possible to figure out which table in the
>>>"stack" a particular expression record has data in by its eplatform_id.
>>>
>>>------
>>>
>>>Would it be more clear to show you my SQL CREATE statements?
>>>
>>>-Allen
From: Chris M. <cj...@fr...> - 2002-11-12 22:24:31
|
On Tue, 12 Nov 2002, Allen Day wrote:

> here is a document by cjm about the naming conventions.
>
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/schema/chado/modules/naming_conventions.txt?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup
>
> i hope we don't decide to change the naming rules too much, i'm already
> starting to develop based on this schema!

personally i'm not as keen on my rules as i was originally; maybe more underscores would be better. i can always use views in my copy to satisfy my urge for brevity.

whichever rules we use, there are still a few cases where column names are inconsistent with the rules that need changed.

> -Allen
From: ShengQiang S. <ss...@fr...> - 2002-11-12 21:38:03
|
Hi there,

I just want to chime in on the schema. It is very flexible in that you can make any statement about an expression (given the expression table) for a gene. I like it a lot.

The comments for the expression_cvterm table mention that adding a type qualifier was contemplated but never implemented. I strongly suggest adding it to the table. Take a look at the FlyBase anatomy ontology: blastoderm is both a body part and a stage term, depending on the context, which is very ambiguous. A type qualifier would give us some freedom in determining the type of the cvterm being expressed, instead of relying on the ontology. In the BDGP in situ expression database (built on top of the GO schema), I had to add a table to solve that problem.

Shu

Allen Day wrote:
> It's okay, I'm still making changes to the schema over here.
>
> WRT the cvterm module, I'm looking for ontologies of human anatomy and
> brain cancers. Does anyone know where I can find them, or where I might
> start looking?
>
> -Allen
>
>
> On Tue, 12 Nov 2002, Chris Mungall wrote:
>
>
>>Hey Allen,
>>
>>I haven't had time to look over these myself yet, but I haven't forgotten
>>about this - more later....
>>
>>On Thu, 7 Nov 2002, Allen Day wrote:
>>
>>
>>>Chris,
>>>
>>>I had a look at the ChaDo expression module, and used it to set up a
>>>database over here. For the next few days, I'll be writing the DBI layer
>>>that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo
>>>records. I can post/commit this code and sample data somewhere if people
>>>are interested to see it.
>>>
>>>I had to make some changes to the schema, and was wondering if these
>>>changes are generically useful enough to augment on to the base expression
>>>module, or whether I should just keep my changes to myself :). They are:
>>>
>>>(1) added an egroup table for tracking groups of related expression
>>>records. egroup and expression are related using an expression_egroup
>>>link table. it's conceivable that users will want to group related
>>>expression records together, and that a single expression record may
>>>belong to multiple groups.
>>>
>>>(2) added an eplatform table, and an eplatform_id FK to expression. i
>>>plan on throwing all my different expression technology data into the
>>>expression table, and need a way to keep track of which record comes from
>>>which platform (just Affymetrix Human U95Av2, U133A, and U133B data right
>>>now).
>>>
>>>(3) made the feature_expression table into a "stack" of tables that
>>>conform to feature_expression interface from your diagram. The table
>>>"stack" allows me to store my platform-specific data in separate tables in
>>>a common database. for instance, instead of having a feature_expression,
>>>i have a feature_expression_cel table for my raw Affymetrix measurements,
>>>and another table called feature_expression_dchip for my normalized
>>>Affymetrix measurements. it's possible to figure out which table in the
>>>"stack" a particular expression record has data in by its eplatform_id.
>>>
>>>------
>>>
>>>Would it be more clear to show you my SQL CREATE statements?
>>>
>>>-Allen
>>>
>>>
>>>
>>>-------------------------------------------------------
>>>This sf.net email is sponsored by: See the NEW Palm
>>>Tungsten T handheld. Power & Color in a compact size!
>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en
>>>_______________________________________________
>>>Gmod-devel mailing list
>>>Gmo...@li...
>>>https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>>
>>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:
> To learn the basics of securing your web site with SSL,
> click here to get a FREE TRIAL of a Thawte Server Certificate:
> http://www.gothawte.com/rd522.html
> _______________________________________________
> Gmod-devel mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-devel |
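As a rough illustration of the type qualifier suggested in the message above: the sketch below assumes an expression_cvterm link table with an extra cvterm_type column. Only the table name expression_cvterm and the blastoderm example come from the thread; every column name, the cvterm table layout, and the sample values are hypothetical, not taken from chado. It shows the same term attached to one expression record once as an anatomy term and once as a stage term.

```python
import sqlite3

# In-memory database; all column names below are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE expression_cvterm (
    expression_id INTEGER NOT NULL,
    cvterm_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    cvterm_type   TEXT NOT NULL,  -- hypothetical qualifier: 'anatomy', 'stage', ...
    PRIMARY KEY (expression_id, cvterm_id, cvterm_type)
);
""")

# One ontology term, used in two different roles for the same expression record.
conn.execute("INSERT INTO cvterm VALUES (1, 'blastoderm')")
conn.executemany(
    "INSERT INTO expression_cvterm VALUES (?, ?, ?)",
    [(10, 1, "anatomy"), (10, 1, "stage")],
)


def terms_of_type(expression_id, cvterm_type):
    """Return the term names attached to an expression record in a given role."""
    rows = conn.execute(
        "SELECT t.name FROM cvterm t "
        "JOIN expression_cvterm ec ON ec.cvterm_id = t.cvterm_id "
        "WHERE ec.expression_id = ? AND ec.cvterm_type = ? "
        "ORDER BY t.name",
        (expression_id, cvterm_type),
    )
    return [name for (name,) in rows]
```

With the qualifier stored on the link row, the role of the term is read from the data rather than inferred from the ontology.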
From: Allen D. <all...@uc...> - 2002-11-12 20:33:29
|
here is a document by cjm about the naming conventions. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/schema/chado/modules/naming_conventions.txt?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup i hope we don't decide to change the naming rules too much, i'm already starting to develop based on this schema! -Allen On Tue, 12 Nov 2002, Hilmar Lapp wrote: > Just to add my meanwhile worthless DEM 0.02, I once tried that convention too but discontinued it quickly because it falsely indicates some directionality in the association and people started trying to interpret that and arguing why the direction goes in that way and not the other. What I've been happily using since then is > <table1>_<table2>_assoc. > > Regarding schema conventions, is there a page that lists the one to be used? Has there been a discussion? Is this still being debated? If the latter, is there a page that lists different conventions people have been using to vote on? If deemed useful, I could also also throw in the convention I've been using the last couple of years (which happens to be the one we're using here too :) > > -hilmar > > > -----Original Message----- > > From: Aaron J Mackey [mailto:aj...@vi...] > > Sent: Tuesday, November 12, 2002 11:08 AM > > To: Ken Y. Clark > > Cc: Chris Mungall; Allen Day; gmo...@li...; > > gmo...@li... > > Subject: Re: [GMOD-devel] expression module > > > > > > > > On Tue, 12 Nov 2002, Ken Y. Clark wrote: > > > > > I'd like to chime in on the SQL naming convention for linking tables > > > for just a moment. If the "expression_egroup" table links > > the tables > > > "expression" and "egroup," I'd advocate calling the table > > > "expression_to_egroup" (or "egroup_to_expression" if that makes more > > > sense). 
> > > > We also use this convention, but we use the "Prince" typology > > "expression2egroup", being not-quite-as-ugly-as > > "expression_to_egroup"; we > > also then have the rule that "2" cannot occur within any other normal > > table name or field. > > > > But U should do what works 4 U. > > > > -Aaron, who still remembers the 80's. > > > > -- > > Aaron J Mackey > > Pearson Laboratory > > University of Virginia > > (434) 924-2821 > > am...@vi... > > > > > > > > > > ------------------------------------------------------- > > This sf.net email is sponsored by: > > To learn the basics of securing your web site with SSL, > > click here to get a FREE TRIAL of a Thawte Server Certificate: > > http://www.gothawte.com/rd522.html > > _______________________________________________ > > Gmod-devel mailing list > > Gmo...@li... > > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > > |
From: Hilmar L. <hl...@gn...> - 2002-11-12 20:20:08
|
Just to add my meanwhile worthless DEM 0.02, I once tried that convention too but discontinued it quickly because it falsely indicates some directionality in the association and people started trying to interpret that and arguing why the direction goes in that way and not the other. What I've been happily using since then is <table1>_<table2>_assoc.

Regarding schema conventions, is there a page that lists the one to be used? Has there been a discussion? Is this still being debated? If the latter, is there a page that lists different conventions people have been using to vote on? If deemed useful, I could also throw in the convention I've been using the last couple of years (which happens to be the one we're using here too :)

-hilmar

> -----Original Message-----
> From: Aaron J Mackey [mailto:aj...@vi...]
> Sent: Tuesday, November 12, 2002 11:08 AM
> To: Ken Y. Clark
> Cc: Chris Mungall; Allen Day; gmo...@li...;
> gmo...@li...
> Subject: Re: [GMOD-devel] expression module
>
>
>
> On Tue, 12 Nov 2002, Ken Y. Clark wrote:
>
> > I'd like to chime in on the SQL naming convention for linking tables
> > for just a moment. If the "expression_egroup" table links the tables
> > "expression" and "egroup," I'd advocate calling the table
> > "expression_to_egroup" (or "egroup_to_expression" if that makes more
> > sense).
>
> We also use this convention, but we use the "Prince" typology
> "expression2egroup", being not-quite-as-ugly-as "expression_to_egroup"; we
> also then have the rule that "2" cannot occur within any other normal
> table name or field.
>
> But U should do what works 4 U.
>
> -Aaron, who still remembers the 80's.
>
> --
> Aaron J Mackey
> Pearson Laboratory
> University of Virginia
> (434) 924-2821
> am...@vi...
>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:
> To learn the basics of securing your web site with SSL,
> click here to get a FREE TRIAL of a Thawte Server Certificate:
> http://www.gothawte.com/rd522.html
> _______________________________________________
> Gmod-devel mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-devel
> |
From: Aaron J M. <aj...@vi...> - 2002-11-12 19:07:56
|
On Tue, 12 Nov 2002, Ken Y. Clark wrote: > I'd like to chime in on the SQL naming convention for linking tables > for just a moment. If the "expression_egroup" table links the tables > "expression" and "egroup," I'd advocate calling the table > "expression_to_egroup" (or "egroup_to_expression" if that makes more > sense). We also use this convention, but we use the "Prince" typology "expression2egroup", being not-quite-as-ugly-as "expression_to_egroup"; we also then have the rule that "2" cannot occur within any other normal table name or field. But U should do what works 4 U. -Aaron, who still remembers the 80's. -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 am...@vi... |
From: Ken Y. C. <kc...@lo...> - 2002-11-12 19:02:23
|
On Tue, 12 Nov 2002, Chris Mungall wrote:

> Date: Tue, 12 Nov 2002 10:54:01 -0800 (PST)
> From: Chris Mungall <cj...@fr...>
> To: Allen Day <all...@uc...>
> Cc: gmo...@li..., gmo...@li...
> Subject: Re: [GMOD-devel] expression module
>
> Hey Allen,
>
> I haven't had time to look over these myself yet, but I haven't forgotten
> about this - more later....
>
> On Thu, 7 Nov 2002, Allen Day wrote:
>
> > Chris,
> >
> > I had a look at the ChaDo expression module, and used it to set up a
> > database over here. For the next few days, I'll be writing the DBI layer
> > that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo
> > records. I can post/commit this code and sample data somewhere if people
> > are interested to see it.
> >
> > I had to make some changes to the schema, and was wondering if these
> > changes are generically useful enough to augment on to the base expression
> > module, or whether I should just keep my changes to myself :). They are:
> >
> > (1) added an egroup table for tracking groups of related expression
> > records. egroup and expression are related using an expression_egroup
> > link table. it's conceivable that users will want to group related
> > expression records together, and that a single expression record may
> > belong to multiple groups.

I'd like to chime in on the SQL naming convention for linking tables for just a moment. If the "expression_egroup" table links the tables "expression" and "egroup," I'd advocate calling the table "expression_to_egroup" (or "egroup_to_expression" if that makes more sense). I think that joining the table names on the "_to_" token makes the relationships clearer than just using an underscore, as the tables with underscores in their names will start to run together when joined like this with other tables (which also might have underscores).

Just my 2c.

ky |
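To make the linking-table discussion concrete, here is a minimal sketch of a many-to-many link between expression and egroup, using Python's sqlite3 module. Only the three table names come from the thread; the column names (expression_id, egroup_id, name) and the sample rows are assumptions for illustration. The link table could equally be called expression_to_egroup, expression2egroup or expression_egroup_assoc without changing its structure.

```python
import sqlite3

# In-memory database; column names and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expression (
    expression_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE egroup (
    egroup_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Link table: one row per (expression, group) membership.
CREATE TABLE expression_egroup (
    expression_id INTEGER NOT NULL REFERENCES expression (expression_id),
    egroup_id     INTEGER NOT NULL REFERENCES egroup (egroup_id),
    PRIMARY KEY (expression_id, egroup_id)
);
""")

conn.executemany("INSERT INTO expression VALUES (?, ?)", [(1, "chip_a"), (2, "chip_b")])
conn.executemany("INSERT INTO egroup VALUES (?, ?)", [(1, "tumor"), (2, "batch_1")])
# chip_a belongs to two groups, chip_b to one.
conn.executemany("INSERT INTO expression_egroup VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def groups_for(expression_id):
    """Return the names of all groups an expression record belongs to."""
    rows = conn.execute(
        "SELECT g.name FROM egroup g "
        "JOIN expression_egroup eg ON eg.egroup_id = g.egroup_id "
        "WHERE eg.expression_id = ? ORDER BY g.name",
        (expression_id,),
    )
    return [name for (name,) in rows]
```

The composite primary key keeps a record from being added to the same group twice, whichever naming convention is chosen for the table.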
From: Allen D. <all...@uc...> - 2002-11-12 18:59:47
|
It's okay, I'm still making changes to the schema over here. WRT the cvterm module, I'm looking for ontologies of human anatomy and brain cancers. Does anyone know where I can find them, or where I might start looking? -Allen On Tue, 12 Nov 2002, Chris Mungall wrote: > Hey Allen, > > I haven't had time to look over these myself yet, but I haven't forgotten > about this - more later.... > > On Thu, 7 Nov 2002, Allen Day wrote: > > > Chris, > > > > I had a look at the ChaDo expression module, and used it to set up a > > database over here. For the next few days, I'll be writing the DBI layer > > that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo > > records. I can post/commit this code and sample data somewhere if people > > are interested to see it. > > > > I had to make some changes to the schema, and was wondering if these > > changes are generically useful enough to augment on to the base expression > > module, or whether I should just keep my changes to myself :). They are: > > > > (1) added an egroup table for tracking groups of related expression > > records. egroup and expression are related using an expression_egroup > > link table. it's conceivable that users will want to group related > > expression records together, and that a single expression record may > > belong to multiple groups. > > > > (2) added an eplatform table, and an eplatform_id FK to expression. i > > plan on throwing all my different expression technology data into the > > expression table, and need a way to keep track of which record comes from > > which platform (just Affymetrix Human U95Av2, U133A, and U133B data right > > now). > > > > (3) made the feature_expression table into a "stack" of tables that > > conform to feature_expression interface from your diagram. The table > > "stack" allows me to store my platform-specific data in seperate tables in > > a common database. 
for instance, instead of having a feature_expression, > > i have a feature_expression_cel table for my raw Affymetrix measurements, > > and another table called feature_expression_dchip for my normalized > > Affymetrix measurements. it's possible to figure out which table in the > > "stack" a particular expression record has data in by its eplatform_id. > > > > ------ > > > > Would it be more clear to show you my SQL CREATE statements? > > > > -Allen > > > > > > > > ------------------------------------------------------- > > This sf.net email is sponsored by: See the NEW Palm > > Tungsten T handheld. Power & Color in a compact size! > > http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en > > _______________________________________________ > > Gmod-devel mailing list > > Gmo...@li... > > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > > |
From: Chris M. <cj...@fr...> - 2002-11-12 18:54:02
|
Hey Allen, I haven't had time to look over these myself yet, but I haven't forgotten about this - more later.... On Thu, 7 Nov 2002, Allen Day wrote: > Chris, > > I had a look at the ChaDo expression module, and used it to set up a > database over here. For the next few days, I'll be writing the DBI layer > that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo > records. I can post/commit this code and sample data somewhere if people > are interested to see it. > > I had to make some changes to the schema, and was wondering if these > changes are generically useful enough to augment on to the base expression > module, or whether I should just keep my changes to myself :). They are: > > (1) added an egroup table for tracking groups of related expression > records. egroup and expression are related using an expression_egroup > link table. it's conceivable that users will want to group related > expression records together, and that a single expression record may > belong to multiple groups. > > (2) added an eplatform table, and an eplatform_id FK to expression. i > plan on throwing all my different expression technology data into the > expression table, and need a way to keep track of which record comes from > which platform (just Affymetrix Human U95Av2, U133A, and U133B data right > now). > > (3) made the feature_expression table into a "stack" of tables that > conform to feature_expression interface from your diagram. The table > "stack" allows me to store my platform-specific data in seperate tables in > a common database. for instance, instead of having a feature_expression, > i have a feature_expression_cel table for my raw Affymetrix measurements, > and another table called feature_expression_dchip for my normalized > Affymetrix measurements. it's possible to figure out which table in the > "stack" a particular expression record has data in by its eplatform_id. > > ------ > > Would it be more clear to show you my SQL CREATE statements? 
> > -Allen > > > > ------------------------------------------------------- > This sf.net email is sponsored by: See the NEW Palm > Tungsten T handheld. Power & Color in a compact size! > http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en > _______________________________________________ > Gmod-devel mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-devel > |
From: Chris M. <cj...@fr...> - 2002-11-08 22:46:41
|
On Fri, 8 Nov 2002, Lincoln Stein wrote:

> I'm mostly with Colin and Chris on this. However, the bit of information that
> is not represented by [start,end] is distinguishing between a feature that is
> on both strands versus one that is on a single one. So there needs to be a
> field that indicates "stranded" vs "non-stranded."
>
> On a related thread, the topic of interbase coordinates came up on the
> Wormbase mailing list, and Richard Durbin and Ed Griffith weighed in strongly
> that interbase coordinates do *not* allow you to specify the strandedness of
> an insertion site, such as [28,28]. I am puzzled as to why an insertion site
> has strandedness, but they were quite adamant that this is a real issue.

I think we need to differentiate strandedness (biological) vs directionality (mathematical) for want of a better distinction.

I think you want to record the directionality of an insertion event, since the event corresponds to an extra piece of sequence being added to the genome, and you need to know the directionality to know what the new sequence is.

And this is why we need strand! I knew I had it in there for a reason!

> Lincoln
>
> On Friday 08 November 2002 10:50 am, Colin Wiel wrote:
> > I don't think storing start and end (without strand) is like putting
> > unparsed text into a field, if by unparsed text you mean text that can
> > be divided into more than one field, such as "sim4:na_ARGs.dros". In
> > that case, proper database design (third normal form) dictates
> > separating the field into its individual entities, namely analysis and
> > database. There is still no redundant information being stored, and no
> > chance for a data integrity violation. However, in the case of
> > start/end/strand, the (start, end) pair already represent the strand
> > information. Adding strand is storing redundant information.
> >
> > Colin
> >
> > -----Original Message-----
> > From: SLe...@ao... [mailto:SLe...@ao...]
> > Sent: Thursday, November 07, 2002 5:32 PM > > To: cj...@fr... > > Cc: ls...@cs...; hl...@gn...; ca...@cs...; cw...@lb...; > > gmo...@li... > > Subject: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis > > module, sequence... > > > > Its kind of like putting unparsed text into a field. You are loading 3 > > pieces of info into 2 fields in a way that requires extra-schema > > knowledge of conventions, as opposed to making the info explicit. > > > > Cheers, -Stan > > In a message dated 11/7/2002 8:29:48 PM Eastern Standard Time, > > cj...@fr... writes: > > > > > > > > Subj: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis > > module, sequence... > > Date: 11/7/2002 8:29:48 PM Eastern Standard Time > > From: cj...@fr... > > To: SLe...@ao... > > CC: ls...@cs..., hl...@gn..., ca...@cs..., cw...@lb..., > > gmo...@li... > > Sent from the Internet > > > > > > > > > > Sorry, I'm not sure why it's a hack > > > > (i removed gmod-devel from the cc list) > > > > On Thu, 7 Nov 2002 SLe...@ao... wrote: > > >We discussed this some in a meeting with TIGR, etc. folk today. I worry > > > > about > > > > >relying on the min <max convention > > >to indicate strand -- it seems rather a hack, as opposed to making > > > > everything > > > > >crystal clear. Why not just have > > >a strand field, with values +, -, both, unspecified? > > > > > >Cheers, -Stan > > > > > >In a message dated 11/6/2002 9:56:43 PM Eastern Standard Time, > > > > > >cj...@fr... writes: > > >>Subj: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis > > > > module, > > > > >>sequence module > > >> Date: 11/6/2002 9:56:43 PM Eastern Standard Time > > >> From: cj...@fr... > > >> To: ls...@cs... > > >> CC: hl...@gn..., ca...@cs..., cw...@lb..., > > > > <mailto:gmo...@li...> > > > > >>gmo...@li..., gmo...@li... 
> > >> Sent from the Internet > > >> > > >> > > >> > > >>yep, this is correct, we should remove strand, it was there as a > > > > holdover > > > > >>from when we were using min/max > > >> > > >>as for ds features - hmm, i wonder if we can just punt this into SO. > > >> > > >>i think it's easier to make everything directional, that way > > > > everything > > > > >>can be asked for a sequence. what is the sequence of a ds feature? you > > >>have to additionally specify direction. > > >> > > >>i realise for some viewers it doesn't make sense to show certain > > > > features > > > > >>as being on a particular strand - but this true for features with > > >>directionality too. for instance, P insertions and SNPs shouldn't be > > > > shown > > > > >>in a display as affecting one particular strand, but they do have a > > >>strand. > > >> > > >>On Fri, 1 Nov 2002, Lincoln Stein wrote: > > >>>If you are using interbase coordinates, then there is no reason to > > > > have a > > > > >>>Bioperl-style strand field, is there? This is because interbase > > >> > > >>coordinates > > >> > > >>>unambiguously specify the strand even when there's only one base pair > > > > in > > > > >>>question: > > >>> > > >>> 0 1 2 3 > > >>> g a t > > >>> > > >>>If we're speaking of the "a" on the forward strand, the interbase > > >>>representation is [1,2], whereas if we're speaking of its complement > > > > on > > > > >>the > > >> > > >>>reverse strand, the representation is [2,1] > > >>> > > >>>Strand can be calculated as: pos5 >pos3 ? 1 : -1; > > >>> > > >>>You will, however, need a boolean field that indicates whether the > > > > feature > > > > >>is > > >> > > >>>single-stranded (strand -1 or +1) or double-stranded (strand 0). 
> > >>> > > >>>Lincoln > > >>> > > >>>On Thursday 31 October 2002 01:53 pm, Chris Mungall wrote: > > >>>>On Wed, 30 Oct 2002, Hilmar Lapp wrote: > > >>>>>My $0.02 on this is that I have seen unintuitive and cryptic column > > >>>>>names causing as much grief as unintuitive and cryptic API method > > >>>>>names. Intuitive and consistent naming is IMHO a much neglected > > > > art, > > > > >>>>>but its lack is one of the most annoying (because avoidable) > > >>>>>barriers to any piece of API or schema. > > >>>>> > > >>>>>At first glance, neither pos5 nor fnbeg mean much to me. If you > > > > mean > > > > >>>>>5' position, why not say pos_5prime and pos_3prime? > > >>>>> > > >>>>>As for start/end being right or wrong in bioperl, my take on this > > > > is > > > > >>>>>that it depends on your viewpoint and there's no silver bullet that > > >>>>>kills every bird. If your viewpoint is biological, then a feature > > >>>>>starts at its 5' end. If your viewpoint is a 1-dimensional axis, > > >>>>>then it is useful to define that end cannot be smaller than start, > > >>>>>and strand is the tool to map to the biological viewpoint. Bioperl > > >>>>>takes the latter viewpoint, which may be good for some and bad for > > >>>>>others. There's Bio::Coordinate that lets you potentially map > > >>>>>between any two systems. I have to say that some people here have > > >>>>>discovered the bioperl way of defining feature boundaries > > >>>>>independently of bioperl as the most useful one for storing and > > >>>>>searching genome mappings. To me it seems they've all got their > > >>>>>downsides and upsides, and one just needs to settle on one and be > > >>>>>consistent throughout. 
> > >>>> > > >>>>i meant the naming is wrong, not necessarily the semantics > > >>>> > > >>>>there is two choices of semantics for the two columns > > >>>> > > >>>>either > > >>>> > > >>>>[1] X <= Y > > >>>> > > >>>>or > > >>>> > > >>>>[2] (Y - X) * strand >= 0 > > >>>> > > >>>>(both assuming interbase coordintes) > > >>>> > > >>>>there is no absolute correct choice of what semantics to use - like > > > > you > > > > >>>>say, both have their up and downsides. (there is actually another > > > > choice, > > > > >>>>using offset+length, but i personally don't like this) > > >>>> > > >>>>however, start/end are obviously terrible, awful, confusing choices > > > > of > > > > >>>>attribute *name* for semantics [1], whether you speak biology or > > > > vector > > > > >>>>math or english. there is no debate on this one, sorry. > > >>>> > > >>>>we had already made the choice to go with semantics [2] for chado > > > > (so > > > > >>>>fmin/fmax as column names is not an option). my opinion is this is > > >>>>generally a more useful semantics. eg getting upstream regions. a > > > > lot > > > > >>more > > >> > > >>>>is expressible as simple arithmetic statements without restorting to > > > > ugly > > > > >>>>if/then/case constructs. > > >>>> > > >>>>given semantics 2 we were deciding on the names for X and Y. I think > > > > as > > > > >>>>Dave says having 5 and 3 in the column name is out. I do think it's > > > > ok to > > > > >>>>indicate a mathematical notation of directionality - even though > > > > protein > > > > >>>>features have strands, protein locations are still equivalent to 1-d > > >>>>vectors with directionality. > > >>>> > > >>>>you make a good point about cryptic names. i guess i tend towards > > > > shorter > > > > >>>>names. however, if you come across the name "fnbeg" and say "what's > > >> > > >>that?" 
> > >> > > >>>>and are forced to the read the documentation then this is a very > > > > good > > > > >>>>thing, as you then learn the semantics - both that these are > > > > interbase > > > > >>and > > >> > > >>>>directional. whereas a cosy familiar name will most likely lead > > > > people to > > > > >>>>assume they know the semantics and then mess up. this is what > > > > happens to > > > > >>>>people learning bioperl all the time. i'm being disingenuous, i > > > > know. i > > > > >>>>guess at the end of the day longer names are better. but then we > > > > have to > > > > >>>>be consistent within chado.... > > >>>> > > >>>>i'm glad i'm not the only one this pedantic about the naming of > > > > these > > > > >>>>things. (I don;t think the sementics issue is at all pedantinc - > > > > where > > > > >>>>possible these things should have a precise computational > > > > definition) > > > > >>>>> -hilmar > > >>>>> > > >>>>>On Wednesday, October 30, 2002, at 06:45 PM, Scott Cain wrote: > > >>>>>>I am sending this to the gmod-devel list to get the opinion of the > > >>>>>>larger audience. > > >>>>>> > > >>>>>>I am inclined to agree with Colin about nomenclature, though I do > > > > agree > > > > >>>>>>with you about bioperl's normal/incorrect use of boundaries. > > > > Before > > > > >>>>>>bioperl came along I did it the way you propose; it caused much > > >>>>>>confusion when I changed my schema to correspond to the bioperl > > > > way. > > > > >>>>>>Assuming we use Chris' proposed boundary coordinates, I think > > > > using > > > > >>>>>>check constraints is a good idea. > > >>>>>> > > >>>>>>Other opinions? > > >>>>>> > > >>>>>>Scott > > >>>>>> > > >>>>>>On Wed, 2002-10-30 at 20:54, Colin Wiel wrote: > > >>>>>>>I preferred your suggestion of pos5 and pos3, as well as my > > > > suggestion > > > > >>>>>>>of end5 and end3. 
I don't think a new chado user will figure out > > > > that > > > > >>>>>>>fnbeg stands for "feature natural begin" as easily as they would > > >>>>>>>figure > > >>>>>>>out that pos5 (or end5) is the "position of the 5' end". > > >>>>>>> > > >>>>>>>Colin > > >>>>>>> > > >>>>>>>>-----Original Message----- > > >>>>>>>>From: gmo...@li... > > > > [mailto:gmod-schema- > > > > >>>>>>>>ad...@li...] On Behalf Of Chris Mungall > > >>>>>>>>Sent: Wednesday, October 30, 2002 4:46 PM > > >>>>>>>>To: gmo...@li... > > >>>>>>>>Subject: [Gmod-schema] cvs changes: companalysis module, > > > > sequence > > > > >>>>>>>module > > >>>>>>> > > >>>>>>>>I have reworked the tables in the computational analysis module; > > > > they > > > > >>>>>>>are > > >>>>>>> > > >>>>>>>>now a little less generic than before. there is some docs > > > > included in > > > > >>>>>>>the > > >>>>>>> > > >>>>>>>>.sql - more needed though... > > >>>>>>>> > > >>>>>>>>the multiple alignment part (eg for clustal results) is still > > > > fluid > > > > >>>>>>>>I have also settled on > > >>>>>>>> > > >>>>>>>>fnbeg > > >>>>>>>>fnend > > >>>>>>>> > > >>>>>>>>for specifying coordinates - feature natural begin, feature > > > > natural > > > > >>>>>>>end > > >>>>>>> > > >>>>>>>>ie this is the "real" begin and end > > >>>>>>>> > > >>>>>>>>we should also possibly include a check constraint to make this > > >>>>>>> > > >>>>>>>explicit > > >>>>>>> > > >>>>>>>>eg > > >>>>>>>> > > >>>>>>>>fstrand is null OR (fnend - fnbeg) * fstrand >= 0 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>this is opposed to the normal (erroneous in my opinion) use of > > >>>>>>> > > >>>>>>>start/begin > > >>>>>>> > > >>>>>>>>end/stop, as used in bioperl, where > > >>>>>>>> > > >>>>>>>>start <= end > > >>>>>>>> > > >>>>>>>>ie they actually mean (low, high) > > >>>>>>>> > > >>>>>>>>how do we feel about check constraints? 
> > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>------------------------------------------------------- > > >>>>>>>>This sf.net email is sponsored by: Influence the future > > >>>>>>>>of Java(TM) technology. Join the Java Community > > >>>>>>>>Process(SM) (JCP(SM)) program now. > > >>>>>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > > >>>>>>>>_______________________________________________ > > >>>>>>>>Gmod-schema mailing list > > >>>>>>>>Gmo...@li... > > >>>>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema > > >>>>>>> > > >>>>>>>------------------------------------------------------- > > >>>>>>>This sf.net email is sponsored by: Influence the future > > >>>>>>>of Java(TM) technology. Join the Java Community > > >>>>>>>Process(SM) (JCP(SM)) program now. > > >>>>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > > >>>>>>>_______________________________________________ > > >>>>>>>Gmod-schema mailing list > > >>>>>>>Gmo...@li... > > >>>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema > > >>>>>> > > >>>>>>-- > > >>>>>>------------------------------------------------------------------ > > > > ----- > > > > >>>>>>- Scott Cain, Ph. D. > > >>>>>>ca...@cs... > > >>>>>>GMOD Coordinator (http://www.gmod.org/) > > >>>>>>216-392-3087 > > >>>>>>Cold Spring Harbor Laboratory > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>>------------------------------------------------------- > > >>>>>>This sf.net email is sponsored by: Influence the future > > >>>>>>of Java(TM) technology. Join the Java Community > > >>>>>>Process(SM) (JCP(SM)) program now. > > >>>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > > >>>>>>_______________________________________________ > > >>>>>>Gmod-schema mailing list > > >>>>>>Gmo...@li... 
> > >>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema > > >>>>> > > >>>>>-- > > >>>>>------------------------------------------------------------- > > >>>>>Hilmar Lapp email: lapp at gnf.org > > >>>>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > >>>>>------------------------------------------------------------- > > >>>> > > >>>>------------------------------------------------------- > > >>>>This sf.net email is sponsored by: Influence the future > > >>>>of Java(TM) technology. Join the Java Community > > >>>>Process(SM) (JCP(SM)) program now. > > >>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > > >>>>_______________________________________________ > > >>>>Gmod-devel mailing list > > >>>>Gmo...@li... > > >>>>https://lists.sourceforge.net/lists/listinfo/gmod-devel > > >> > > >>------------------------------------------------------- > > >>This sf.net email is sponsored by: See the NEW Palm > > >>Tungsten T handheld. Power &Color in a compact size! > > >>http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en > > >>_______________________________________________ > > >>Gmod-schema mailing list > > >>Gmo...@li... > > >>https://lists.sourceforge.net/lists/listinfo/gmod-schema > > |
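The coordinate semantics debated in this thread can be sketched as follows. This is only an illustration, assuming interbase coordinates and the column names fnbeg, fnend and fstrand from the quoted proposal; the function names are hypothetical and not part of chado or bioperl. The comparison is written so that [1,2] gives +1 and [2,1] gives -1, matching the worked "g a t" example in the quoted text. A zero-length site such as [28,28] yields no direction from its coordinates alone, which is the case where an explicit strand value carries extra information.

```python
from typing import Optional, Tuple

# Complement table for lowercase DNA, as in the "g a t" example.
_COMPLEMENT = str.maketrans("acgt", "tgca")


def implied_strand(fnbeg: int, fnend: int) -> Optional[int]:
    """Direction implied by a directional interbase pair; None if zero-length."""
    if fnbeg < fnend:
        return 1
    if fnbeg > fnend:
        return -1
    return None  # e.g. an insertion site [28,28]: coordinates carry no direction


def check_constraint(fnbeg: int, fnend: int, fstrand: Optional[int]) -> bool:
    """The proposed check: fstrand is null OR (fnend - fnbeg) * fstrand >= 0."""
    return fstrand is None or (fnend - fnbeg) * fstrand >= 0


def to_low_high(fnbeg: int, fnend: int) -> Tuple[int, int]:
    """Convert a directional pair to a (low, high) pair with low <= high."""
    return (min(fnbeg, fnend), max(fnbeg, fnend))


def subsequence(seq: str, fnbeg: int, fnend: int) -> str:
    """Bases between two interbase positions, reverse-complemented if fnbeg > fnend."""
    low, high = to_low_high(fnbeg, fnend)
    bases = seq[low:high]  # interbase positions map directly onto Python slices
    if implied_strand(fnbeg, fnend) == -1:
        return bases.translate(_COMPLEMENT)[::-1]
    return bases
```

For the sequence "gat", subsequence("gat", 1, 2) selects the "a" on the forward strand, while subsequence("gat", 2, 1) selects its complement on the reverse strand. Note that check_constraint accepts any strand for a zero-length feature, so the stored strand is the only record of direction for an insertion site.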
From: Lincoln S. <ls...@cs...> - 2002-11-08 22:13:10
|
I'm mostly with Colin and Chris on this. However, the one piece of information that [start,end] does not represent is whether a feature is on both strands or on a single one. So there needs to be a field that indicates "stranded" vs "non-stranded."

On a related thread, the topic of interbase coordinates came up on the Wormbase mailing list, and Richard Durbin and Ed Griffith weighed in strongly that interbase coordinates do *not* allow you to specify the strandedness of an insertion site, such as [28,28]. I am puzzled as to why an insertion site has strandedness, but they were quite adamant that this is a real issue.

Lincoln

On Friday 08 November 2002 10:50 am, Colin Wiel wrote:
> I don't think storing start and end (without strand) is like putting
> unparsed text into a field, if by unparsed text you mean text that can
> be divided into more than one field, such as "sim4:na_ARGs.dros". In
> that case, proper database design (third normal form) dictates
> separating the field into its individual entities, namely analysis and
> database. There is still no redundant information being stored, and no
> chance for a data integrity violation. However, in the case of
> start/end/strand, the (start, end) pair already represents the strand
> information. Adding strand is storing redundant information.
>
> Colin
>
> -----Original Message-----
> From: SLe...@ao... [mailto:SLe...@ao...]
> Sent: Thursday, November 07, 2002 5:32 PM
> To: cj...@fr...
> Cc: ls...@cs...; hl...@gn...; ca...@cs...; cw...@lb...;
> gmo...@li...
> Subject: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis
> module, sequence...
>
> It's kind of like putting unparsed text into a field. You are loading 3
> pieces of info into 2 fields in a way that requires extra-schema
> knowledge of conventions, as opposed to making the info explicit.
>
> Cheers, -Stan
>
> In a message dated 11/7/2002 8:29:48 PM Eastern Standard Time,
> cj...@fr... writes:
>
> Subj: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis
> module, sequence...
> Date: 11/7/2002 8:29:48 PM Eastern Standard Time
> From: cj...@fr...
> To: SLe...@ao...
> CC: ls...@cs..., hl...@gn..., ca...@cs..., cw...@lb...,
> gmo...@li...
> Sent from the Internet
>
> Sorry, I'm not sure why it's a hack
>
> (i removed gmod-devel from the cc list)
>
> On Thu, 7 Nov 2002 SLe...@ao... wrote:
> >We discussed this some in a meeting with TIGR, etc. folk today. I worry
> >about relying on the min < max convention to indicate strand -- it seems
> >rather a hack, as opposed to making everything crystal clear. Why not
> >just have a strand field, with values +, -, both, unspecified?
> >
> >Cheers, -Stan
> >
> >In a message dated 11/6/2002 9:56:43 PM Eastern Standard Time,
> >cj...@fr... writes:
> >>Subj: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis
> >>module, sequence module
> >>Date: 11/6/2002 9:56:43 PM Eastern Standard Time
> >>From: cj...@fr...
> >>To: ls...@cs...
> >>CC: hl...@gn..., ca...@cs..., cw...@lb..., gmo...@li...,
> >>gmo...@li...
> >>Sent from the Internet
> >>
> >>yep, this is correct, we should remove strand, it was there as a
> >>holdover from when we were using min/max
> >>
> >>as for ds features - hmm, i wonder if we can just punt this into SO.
> >>
> >>i think it's easier to make everything directional, that way everything
> >>can be asked for a sequence. what is the sequence of a ds feature? you
> >>have to additionally specify direction.
> >>
> >>i realise for some viewers it doesn't make sense to show certain
> >>features as being on a particular strand - but this is true for features
> >>with directionality too. for instance, P insertions and SNPs shouldn't
> >>be shown in a display as affecting one particular strand, but they do
> >>have a strand.
> >>
> >>On Fri, 1 Nov 2002, Lincoln Stein wrote:
> >>>If you are using interbase coordinates, then there is no reason to
> >>>have a Bioperl-style strand field, is there? This is because interbase
> >>>coordinates unambiguously specify the strand even when there's only
> >>>one base pair in question:
> >>>
> >>>   0 1 2 3
> >>>    g a t
> >>>
> >>>If we're speaking of the "a" on the forward strand, the interbase
> >>>representation is [1,2], whereas if we're speaking of its complement
> >>>on the reverse strand, the representation is [2,1]
> >>>
> >>>Strand can be calculated as: pos5 < pos3 ? 1 : -1;
> >>>
> >>>You will, however, need a boolean field that indicates whether the
> >>>feature is single-stranded (strand -1 or +1) or double-stranded
> >>>(strand 0).
> >>>
> >>>Lincoln
> >>>
> >>>On Thursday 31 October 2002 01:53 pm, Chris Mungall wrote:
> >>>>On Wed, 30 Oct 2002, Hilmar Lapp wrote:
> >>>>>My $0.02 on this is that I have seen unintuitive and cryptic column
> >>>>>names causing as much grief as unintuitive and cryptic API method
> >>>>>names. Intuitive and consistent naming is IMHO a much neglected art,
> >>>>>but its lack is one of the most annoying (because avoidable)
> >>>>>barriers to any piece of API or schema.
> >>>>>
> >>>>>At first glance, neither pos5 nor fnbeg mean much to me. If you mean
> >>>>>5' position, why not say pos_5prime and pos_3prime?
> >>>>>
> >>>>>As for start/end being right or wrong in bioperl, my take on this is
> >>>>>that it depends on your viewpoint and there's no silver bullet that
> >>>>>kills every bird. If your viewpoint is biological, then a feature
> >>>>>starts at its 5' end. If your viewpoint is a 1-dimensional axis,
> >>>>>then it is useful to define that end cannot be smaller than start,
> >>>>>and strand is the tool to map to the biological viewpoint. Bioperl
> >>>>>takes the latter viewpoint, which may be good for some and bad for
> >>>>>others. There's Bio::Coordinate that lets you potentially map
> >>>>>between any two systems. I have to say that some people here have
> >>>>>discovered the bioperl way of defining feature boundaries
> >>>>>independently of bioperl as the most useful one for storing and
> >>>>>searching genome mappings. To me it seems they've all got their
> >>>>>downsides and upsides, and one just needs to settle on one and be
> >>>>>consistent throughout.
> >>>>
> >>>>i meant the naming is wrong, not necessarily the semantics
> >>>>
> >>>>there are two choices of semantics for the two columns
> >>>>
> >>>>either
> >>>>
> >>>>[1] X <= Y
> >>>>
> >>>>or
> >>>>
> >>>>[2] (Y - X) * strand >= 0
> >>>>
> >>>>(both assuming interbase coordinates)
> >>>>
> >>>>there is no absolute correct choice of what semantics to use - like
> >>>>you say, both have their up and downsides. (there is actually another
> >>>>choice, using offset+length, but i personally don't like this)
> >>>>
> >>>>however, start/end are obviously terrible, awful, confusing choices
> >>>>of attribute *name* for semantics [1], whether you speak biology or
> >>>>vector math or english. there is no debate on this one, sorry.
> >>>>
> >>>>we had already made the choice to go with semantics [2] for chado (so
> >>>>fmin/fmax as column names is not an option). my opinion is this is
> >>>>generally a more useful semantics. eg getting upstream regions. a lot
> >>>>more is expressible as simple arithmetic statements without resorting
> >>>>to ugly if/then/case constructs.
> >>>>
> >>>>given semantics 2 we were deciding on the names for X and Y. I think
> >>>>as Dave says having 5 and 3 in the column name is out. I do think it's
> >>>>ok to indicate a mathematical notation of directionality - even though
> >>>>protein features have strands, protein locations are still equivalent
> >>>>to 1-d vectors with directionality.
> >>>>
> >>>>you make a good point about cryptic names. i guess i tend towards
> >>>>shorter names. however, if you come across the name "fnbeg" and say
> >>>>"what's that?" and are forced to read the documentation then this is a
> >>>>very good thing, as you then learn the semantics - both that these are
> >>>>interbase and directional. whereas a cosy familiar name will most
> >>>>likely lead people to assume they know the semantics and then mess up.
> >>>>this is what happens to people learning bioperl all the time. i'm
> >>>>being disingenuous, i know. i guess at the end of the day longer names
> >>>>are better. but then we have to be consistent within chado....
> >>>>
> >>>>i'm glad i'm not the only one this pedantic about the naming of these
> >>>>things. (I don't think the semantics issue is at all pedantic - where
> >>>>possible these things should have a precise computational definition)
> >>>>
> >>>>> -hilmar
> >>>>>
> >>>>>On Wednesday, October 30, 2002, at 06:45 PM, Scott Cain wrote:
> >>>>>>I am sending this to the gmod-devel list to get the opinion of the
> >>>>>>larger audience.
> >>>>>>
> >>>>>>I am inclined to agree with Colin about nomenclature, though I do
> >>>>>>agree with you about bioperl's normal/incorrect use of boundaries.
> >>>>>>Before bioperl came along I did it the way you propose; it caused
> >>>>>>much confusion when I changed my schema to correspond to the bioperl
> >>>>>>way. Assuming we use Chris' proposed boundary coordinates, I think
> >>>>>>using check constraints is a good idea.
> >>>>>>
> >>>>>>Other opinions?
> >>>>>>
> >>>>>>Scott
> >>>>>>
> >>>>>>On Wed, 2002-10-30 at 20:54, Colin Wiel wrote:
> >>>>>>>I preferred your suggestion of pos5 and pos3, as well as my
> >>>>>>>suggestion of end5 and end3. I don't think a new chado user will
> >>>>>>>figure out that fnbeg stands for "feature natural begin" as easily
> >>>>>>>as they would figure out that pos5 (or end5) is the "position of
> >>>>>>>the 5' end".
> >>>>>>>
> >>>>>>>Colin
> >>>>>>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>From: gmo...@li... [mailto:gmod-schema-ad...@li...]
> >>>>>>>>On Behalf Of Chris Mungall
> >>>>>>>>Sent: Wednesday, October 30, 2002 4:46 PM
> >>>>>>>>To: gmo...@li...
> >>>>>>>>Subject: [Gmod-schema] cvs changes: companalysis module, sequence
> >>>>>>>>module
> >>>>>>>>
> >>>>>>>>I have reworked the tables in the computational analysis module;
> >>>>>>>>they are now a little less generic than before. there are some
> >>>>>>>>docs included in the .sql - more needed though...
> >>>>>>>>
> >>>>>>>>the multiple alignment part (eg for clustal results) is still fluid
> >>>>>>>>
> >>>>>>>>I have also settled on
> >>>>>>>>
> >>>>>>>>fnbeg
> >>>>>>>>fnend
> >>>>>>>>
> >>>>>>>>for specifying coordinates - feature natural begin, feature
> >>>>>>>>natural end
> >>>>>>>>
> >>>>>>>>ie this is the "real" begin and end
> >>>>>>>>
> >>>>>>>>we should also possibly include a check constraint to make this
> >>>>>>>>explicit
> >>>>>>>>
> >>>>>>>>eg
> >>>>>>>>
> >>>>>>>>fstrand is null OR (fnend - fnbeg) * fstrand >= 0
> >>>>>>>>
> >>>>>>>>this is opposed to the normal (erroneous in my opinion) use of
> >>>>>>>>start/begin end/stop, as used in bioperl, where
> >>>>>>>>
> >>>>>>>>start <= end
> >>>>>>>>
> >>>>>>>>ie they actually mean (low, high)
> >>>>>>>>
> >>>>>>>>how do we feel about check constraints?
> >>>>>>>>
> >>>>>>>>-------------------------------------------------------
> >>>>>>>>This sf.net email is sponsored by: Influence the future
> >>>>>>>>of Java(TM) technology. Join the Java Community
> >>>>>>>>Process(SM) (JCP(SM)) program now.
> >>>>>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en
> >>>>>>>>_______________________________________________
> >>>>>>>>Gmod-schema mailing list
> >>>>>>>>Gmo...@li...
> >>>>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >>>>>>
> >>>>>>--
> >>>>>>-----------------------------------------------------------------------
> >>>>>>Scott Cain, Ph. D.
> >>>>>>ca...@cs...
> >>>>>>GMOD Coordinator (http://www.gmod.org/)
> >>>>>>216-392-3087
> >>>>>>Cold Spring Harbor Laboratory
> >>>>>
> >>>>>--
> >>>>>-------------------------------------------------------------
> >>>>>Hilmar Lapp email: lapp at gnf.org
> >>>>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> >>>>>-------------------------------------------------------------
> >>>>
> >>>>_______________________________________________
> >>>>Gmod-devel mailing list
> >>>>Gmo...@li...
> >>>>https://lists.sourceforge.net/lists/listinfo/gmod-devel

--
Lincoln Stein
ls...@cs...
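
To make the convention under discussion concrete, here is a minimal Python sketch of deriving a strand from a directional interbase pair. It assumes the forward strand is the case where the first coordinate is the smaller one, as in the [1,2] versus [2,1] example in the thread. The function names `strand_from_interbase` and `low_high`, and the use of 0 for "cannot be determined", are hypothetical choices for illustration and are not part of chado or Bioperl.

```python
from typing import Tuple


def strand_from_interbase(beg: int, end: int) -> int:
    """Derive a strand from a directional interbase pair (beg, end).

    Returns +1 when beg < end, -1 when beg > end, and 0 when the
    location has zero length, because the pair alone then says nothing
    about direction.
    """
    if beg < end:
        return 1
    if beg > end:
        return -1
    return 0  # assumption: 0 marks "undetermined", e.g. an insertion site


def low_high(beg: int, end: int) -> Tuple[int, int]:
    """Convert a directional pair to the (low, high) form, where low <= high."""
    return (min(beg, end), max(beg, end))


# The "a" in g-a-t sits between interbase positions 1 and 2.
forward_a = strand_from_interbase(1, 2)    # read on the forward strand
reverse_a = strand_from_interbase(2, 1)    # its complement on the reverse strand
insertion = strand_from_interbase(28, 28)  # zero-length insertion site
```

The zero-length case is the one raised for insertion sites: with [28,28] there is no ordering to read a direction from, so any strand for such a feature would have to be stored somewhere other than the coordinate pair.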
From: Colin W. <cw...@fr...> - 2002-11-08 15:54:12
I don't think storing start and end (without strand) is like putting unparsed text into a field, if by unparsed text you mean text that can be divided into more than one field, such as "sim4:na_ARGs.dros". In that case, proper database design (third normal form) dictates separating the field into its individual entities, namely analysis and database. There is still no redundant information being stored, and no chance for a data integrity violation. However, in the case of start/end/strand, the (start, end) pair already represent the strand information. Adding strand is storing redundant information.

Colin

-----Original Message-----
From: SLe...@ao... [mailto:SLe...@ao...]
Sent: Thursday, November 07, 2002 5:32 PM
To: cj...@fr...
Cc: ls...@cs...; hl...@gn...; ca...@cs...; cw...@lb...; gmo...@li...
Subject: Re: [GMOD-devel] Re: [Gmod-schema] cvs changes: companalysis module, sequence...

Its kind of like putting unparsed text into a field. You are loading 3 pieces of info into 2 fields in a way that requires extra-schema knowledge of conventions, as opposed to making the info explicit.

Cheers, -Stan
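
The redundancy argument can be illustrated with the check constraint Chris Mungall proposed earlier in this thread, `fstrand is null OR (fnend - fnbeg) * fstrand >= 0`. The sketch below is only an illustration under stated assumptions: it uses Python's `sqlite3` module with an in-memory database, and the table name `featureloc_demo` and the helper `try_insert` are hypothetical. The column names `fnbeg`, `fnend`, and `fstrand` are the ones from the thread.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE featureloc_demo (
        fnbeg   INTEGER NOT NULL,
        fnend   INTEGER NOT NULL,
        fstrand INTEGER,
        CHECK (fstrand IS NULL OR (fnend - fnbeg) * fstrand >= 0)
    )
    """
)


def try_insert(fnbeg, fnend, fstrand):
    """Return True if the row is accepted, False if the check rejects it."""
    try:
        conn.execute(
            "INSERT INTO featureloc_demo VALUES (?, ?, ?)",
            (fnbeg, fnend, fstrand),
        )
        return True
    except sqlite3.IntegrityError:
        return False


forward = try_insert(1, 2, 1)             # coordinates and strand agree
reverse = try_insert(2, 1, -1)            # coordinates and strand agree
contradiction = try_insert(2, 1, 1)       # strand disagrees with coordinates
insertion_site = try_insert(28, 28, -1)   # zero length: any strand passes
no_strand = try_insert(5, 9, None)        # strand left unspecified
```

A separate strand column makes the contradictory row possible in the first place, which is why the constraint is needed to guard it. The same column is also the only place a strand can be recorded for a zero-length location such as [28,28].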
From: <SLe...@ao...> - 2002-11-08 01:33:06
Its kind of like putting unparsed text into a field. You are loading 3 pieces of info into 2 fields in a way that requires extra-schema knowledge of conventions, as opposed to making the info explicit.

Cheers, -Stan

In a message dated 11/7/2002 8:29:48 PM Eastern Standard Time, cj...@fr... writes:
> Sorry, I'm not sure why it's a hack
>
> (i removed gmod-devel from the cc list)
>
> On Thu, 7 Nov 2002 SLe...@ao... wrote:
> >We discussed this some in a meeting with TIGR, etc. folk today. I worry
> >about relying on the min <max convention to indicate strand -- it seems
> >rather a hack, as opposed to making everything crystal clear. Why not
> >just have a strand field, with values +, -, both, unspecified?
> >
> >Cheers, -Stan
> >>>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema > >>>>>> > >>>>>>-- > >>>>>> > ----------------------------------------------------------------------- > >>>>>>- Scott Cain, Ph. D. > >>>>>>ca...@cs... > >>>>>>GMOD Coordinator (http://www.gmod.org/) > >>>>>>216-392-3087 > >>>>>>Cold Spring Harbor Laboratory > >>>>>> > >>>>>> > >>>>>> > >>>>>>------------------------------------------------------- > >>>>>>This sf.net email is sponsored by: Influence the future > >>>>>>of Java(TM) technology. Join the Java Community > >>>>>>Process(SM) (JCP(SM)) program now. > >>>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > >>>>>>_______________________________________________ > >>>>>>Gmod-schema mailing list > >>>>>>Gmo...@li... > >>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema > >>>>> > >>>>>-- > >>>>>------------------------------------------------------------- > >>>>>Hilmar Lapp email: lapp at gnf.org > >>>>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > >>>>>------------------------------------------------------------- > >>>> > >>>>------------------------------------------------------- > >>>>This sf.net email is sponsored by: Influence the future > >>>>of Java(TM) technology. Join the Java Community > >>>>Process(SM) (JCP(SM)) program now. > >>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en > >>>>_______________________________________________ > >>>>Gmod-devel mailing list > >>>>Gmo...@li... > >>>>https://lists.sourceforge.net/lists/listinfo/gmod-devel > >>> > >>> > >> > >> > >> > >>------------------------------------------------------- > >>This sf.net email is sponsored by: See the NEW Palm > >>Tungsten T handheld. Power &Color in a compact size! > >>http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en > >>_______________________________________________ > >>Gmod-schema mailing list > >>Gmo...@li... > >>https://lists.sourceforge.net/lists/listinfo/gmod-schema > >> > > > > > > |
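A minimal Python sketch (not part of the original thread) of the interbase convention quoted above. The names fnbeg and fnend follow Chris's proposal; the helper functions are hypothetical. Because the forward-strand example is written [1,2], the sketch returns +1 when the first coordinate is the smaller one (the one-liner as quoted reads the other way round).

```python
from typing import Tuple

# Sketch only: these helpers are illustrative, not Chado or BioPerl API.


def strand(fnbeg: int, fnend: int) -> int:
    """Derive strand from a directional interbase pair.

    Returns +1 if the feature runs left to right, -1 if right to left,
    and 0 for a zero-length location, where no direction can be inferred.
    """
    if fnbeg < fnend:
        return 1
    if fnbeg > fnend:
        return -1
    return 0


def length(fnbeg: int, fnend: int) -> int:
    """With interbase coordinates the length is a plain difference."""
    return abs(fnend - fnbeg)


def upstream(fnbeg: int, fnend: int, n: int) -> Tuple[int, int]:
    """Directional location of the n bases upstream of the feature.

    The same arithmetic serves both strands; no if/then/case on strand.
    """
    s = strand(fnbeg, fnend)
    return (fnbeg - s * n, fnbeg)


# Lincoln's example: interbase positions 0..3 around the bases g, a, t.
print(strand(1, 2))       # the "a" on the forward strand
print(strand(2, 1))       # its complement on the reverse strand
print(upstream(1, 2, 1))  # one base upstream of the forward-strand "a"
print(upstream(2, 1, 1))  # one base upstream of the reverse-strand feature
```

The upstream helper is one example of the "simple arithmetic" argument made for semantics [2]: the strand enters only as a sign. A zero-length location gives strand 0 here, which is the case where the coordinates alone do not carry a direction.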
From: Chris M. <cj...@fr...> - 2002-11-08 01:28:57
|
Sorry, I'm not sure why it's a hack.

(I removed gmod-devel from the cc list.)

On Thu, 7 Nov 2002 SLe...@ao... wrote:
> We discussed this some in a meeting with TIGR, etc. folk today. I worry about
> relying on the min < max convention to indicate strand -- it seems rather a
> hack, as opposed to making everything crystal clear. Why not just have
> a strand field, with values +, -, both, unspecified?
>
> Cheers, -Stan
|
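The check constraint proposed earlier in the thread, `fstrand is null OR (fnend - fnbeg) * fstrand >= 0`, can be tried out in a small sketch. This one uses Python's built-in sqlite3 module with an in-memory database purely for convenience. The table name featureloc_demo and the helper try_insert are made up for illustration and are not part of the Chado schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE featureloc_demo (
        fnbeg   INTEGER NOT NULL,
        fnend   INTEGER NOT NULL,
        fstrand INTEGER,
        -- the constraint as proposed in the thread
        CHECK (fstrand IS NULL OR (fnend - fnbeg) * fstrand >= 0)
    )
    """
)


def try_insert(fnbeg, fnend, fstrand):
    """Return True if the row satisfies the constraint, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO featureloc_demo (fnbeg, fnend, fstrand) VALUES (?, ?, ?)",
            (fnbeg, fnend, fstrand),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 2, 1))     # forward feature, coordinates agree with strand
print(try_insert(2, 1, -1))    # reverse feature, coordinates agree with strand
print(try_insert(1, 2, -1))    # direction contradicts strand
print(try_insert(5, 5, -1))    # zero-length: only fstrand carries the direction
print(try_insert(1, 2, None))  # strand unknown, always accepted
```

The zero-length row is worth noting: the constraint accepts any strand value there, so for such features the coordinates by themselves do not determine a direction.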
From: <SLe...@ao...> - 2002-11-08 00:21:24
|
We discussed this some in a meeting with TIGR, etc. folk today. I worry about relying on the min < max convention to indicate strand -- it seems rather a hack, as opposed to making everything crystal clear. Why not just have a strand field, with values +, -, both, unspecified?

Cheers,
-Stan
|
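Stan's alternative, an explicit strand field with the values +, -, both, and unspecified, can be set beside the directional pair in a short sketch. Everything here (the Strand enum, the Loc class, its method names) is hypothetical and only illustrates the trade-off under discussion; fmin and fmax are borrowed from the column names mentioned in the thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Strand(Enum):
    PLUS = "+"
    MINUS = "-"
    BOTH = "both"
    UNSPECIFIED = "unspecified"


@dataclass(frozen=True)
class Loc:
    """Interbase location stored as (low, high) plus an explicit strand."""

    fmin: int
    fmax: int
    strand: Strand

    def __post_init__(self):
        if self.fmin > self.fmax:
            raise ValueError("fmin must not exceed fmax")

    def directional(self) -> Tuple[int, int]:
        """Convert to a (begin, end) pair; only defined for + and -."""
        if self.strand is Strand.PLUS:
            return (self.fmin, self.fmax)
        if self.strand is Strand.MINUS:
            return (self.fmax, self.fmin)
        raise ValueError(f"no direction for strand {self.strand.value!r}")

    @classmethod
    def from_directional(cls, begin: int, end: int) -> "Loc":
        """Convert back; a zero-length pair has no recoverable strand."""
        if begin < end:
            return cls(begin, end, Strand.PLUS)
        if begin > end:
            return cls(end, begin, Strand.MINUS)
        return cls(begin, end, Strand.UNSPECIFIED)


print(Loc(1, 2, Strand.MINUS).directional())
print(Loc.from_directional(2, 1))
```

Under these assumptions the two representations convert cleanly for ordinary single-stranded features. The values "both" and "unspecified" have no directional equivalent, and a zero-length directional pair cannot say which strand it is on.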
From: Allen D. <all...@uc...> - 2002-11-07 03:08:00
|
Chris,

I had a look at the ChaDo expression module, and used it to set up a database over here. For the next few days, I'll be writing the DBI layer that lets me convert Bio::Expression::MicroarrayI objects to/from ChaDo records. I can post/commit this code and sample data somewhere if people are interested to see it.

I had to make some changes to the schema, and was wondering if these changes are generically useful enough to add to the base expression module, or whether I should just keep my changes to myself :). They are:

(1) added an egroup table for tracking groups of related expression records. egroup and expression are related using an expression_egroup link table. It's conceivable that users will want to group related expression records together, and that a single expression record may belong to multiple groups.

(2) added an eplatform table, and an eplatform_id FK to expression. I plan on throwing all my different expression technology data into the expression table, and need a way to keep track of which record comes from which platform (just Affymetrix Human U95Av2, U133A, and U133B data right now).

(3) made the feature_expression table into a "stack" of tables that conform to the feature_expression interface from your diagram. The table "stack" allows me to store my platform-specific data in separate tables in a common database. For instance, instead of having a feature_expression, I have a feature_expression_cel table for my raw Affymetrix measurements, and another table called feature_expression_dchip for my normalized Affymetrix measurements. It's possible to figure out which table in the "stack" a particular expression record has data in by its eplatform_id.

------

Would it be clearer to show you my SQL CREATE statements?

-Allen
|
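Until the actual CREATE statements are posted, changes (1) and (2) might look roughly like the sketch below. It runs against an in-memory SQLite database from Python for convenience. Only the table names and eplatform_id come from the message; every other column, and the sample rows, are guesses for illustration. This is not Allen's actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- (2) one row per expression technology
    CREATE TABLE eplatform (
        eplatform_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE      -- assumed column
    );

    CREATE TABLE expression (
        expression_id INTEGER PRIMARY KEY,
        description   TEXT,                    -- assumed column
        eplatform_id  INTEGER NOT NULL REFERENCES eplatform (eplatform_id)
    );

    -- (1) groups of related expression records
    CREATE TABLE egroup (
        egroup_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL                -- assumed column
    );

    -- link table: one expression record may belong to several groups
    CREATE TABLE expression_egroup (
        expression_id INTEGER NOT NULL REFERENCES expression (expression_id),
        egroup_id     INTEGER NOT NULL REFERENCES egroup (egroup_id),
        PRIMARY KEY (expression_id, egroup_id)
    );
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO eplatform VALUES (1, 'Affymetrix Human U95Av2')")
conn.execute("INSERT INTO expression VALUES (1, 'sample A', 1)")
conn.executemany(
    "INSERT INTO egroup VALUES (?, ?)", [(1, "time course"), (2, "replicates")]
)
conn.executemany("INSERT INTO expression_egroup VALUES (?, ?)", [(1, 1), (1, 2)])

# All groups that expression record 1 belongs to.
rows = conn.execute(
    """
    SELECT g.name
    FROM expression_egroup AS eg
    JOIN egroup AS g ON g.egroup_id = eg.egroup_id
    WHERE eg.expression_id = 1
    ORDER BY g.name
    """
).fetchall()
print(rows)
```

For change (3), the same eplatform_id could be used to pick the platform-specific table (for example feature_expression_cel or feature_expression_dchip) at query time; how that lookup is stored is left open here.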
From: Chris M. <cj...@fr...> - 2002-11-07 02:55:25
|
yep, this is correct, we should remove strand, it was there as a holdover
from when we were using min/max

as for ds features - hmm, i wonder if we can just punt this into SO.

i think it's easier to make everything directional, that way everything
can be asked for a sequence. what is the sequence of a ds feature? you
have to additionally specify direction.

i realise for some viewers it doesn't make sense to show certain features
as being on a particular strand - but this is true for features with
directionality too. for instance, P insertions and SNPs shouldn't be shown
in a display as affecting one particular strand, but they do have a
strand.

On Fri, 1 Nov 2002, Lincoln Stein wrote:
> If you are using interbase coordinates, then there is no reason to have a
> Bioperl-style strand field, is there? This is because interbase coordinates
> unambiguously specify the strand even when there's only one base pair in
> question:
>
>   0 1 2 3
>    g a t
>
> If we're speaking of the "a" on the forward strand, the interbase
> representation is [1,2], whereas if we're speaking of its complement on the
> reverse strand, the representation is [2,1]
>
> Strand can be calculated as: pos5 > pos3 ? 1 : -1;
>
> You will, however, need a boolean field that indicates whether the feature is
> single-stranded (strand -1 or +1) or double-stranded (strand 0).
>
> Lincoln
>
> On Thursday 31 October 2002 01:53 pm, Chris Mungall wrote:
> > On Wed, 30 Oct 2002, Hilmar Lapp wrote:
> > > My $0.02 on this is that I have seen unintuitive and cryptic column
> > > names causing as much grief as unintuitive and cryptic API method
> > > names. Intuitive and consistent naming is IMHO a much neglected art,
> > > but its lack is one of the most annoying (because avoidable)
> > > barriers to any piece of API or schema.
> > >
> > > At first glance, neither pos5 nor fnbeg mean much to me. If you mean
> > > 5' position, why not say pos_5prime and pos_3prime?
> > >
> > > As for start/end being right or wrong in bioperl, my take on this is
> > > that it depends on your viewpoint and there's no silver bullet that
> > > kills every bird. If your viewpoint is biological, then a feature
> > > starts at its 5' end. If your viewpoint is a 1-dimensional axis,
> > > then it is useful to define that end cannot be smaller than start,
> > > and strand is the tool to map to the biological viewpoint. Bioperl
> > > takes the latter viewpoint, which may be good for some and bad for
> > > others. There's Bio::Coordinate that lets you potentially map
> > > between any two systems. I have to say that some people here have
> > > discovered the bioperl way of defining feature boundaries
> > > independently of bioperl as the most useful one for storing and
> > > searching genome mappings. To me it seems they've all got their
> > > downsides and upsides, and one just needs to settle on one and be
> > > consistent throughout.
> >
> > i meant the naming is wrong, not necessarily the semantics
> >
> > there are two choices of semantics for the two columns
> >
> > either
> >
> > [1] X <= Y
> >
> > or
> >
> > [2] (Y - X) * strand >= 0
> >
> > (both assuming interbase coordinates)
> >
> > there is no absolute correct choice of what semantics to use - like you
> > say, both have their up and downsides. (there is actually another choice,
> > using offset+length, but i personally don't like this)
> >
> > however, start/end are obviously terrible, awful, confusing choices of
> > attribute *name* for semantics [1], whether you speak biology or vector
> > math or english. there is no debate on this one, sorry.
> >
> > we had already made the choice to go with semantics [2] for chado (so
> > fmin/fmax as column names is not an option). my opinion is this is
> > generally a more useful semantics. eg getting upstream regions. a lot more
> > is expressible as simple arithmetic statements without resorting to ugly
> > if/then/case constructs.
> >
> > given semantics 2 we were deciding on the names for X and Y. I think as
> > Dave says having 5 and 3 in the column name is out. I do think it's ok to
> > indicate a mathematical notation of directionality - even though protein
> > features have strands, protein locations are still equivalent to 1-d
> > vectors with directionality.
> >
> > you make a good point about cryptic names. i guess i tend towards shorter
> > names. however, if you come across the name "fnbeg" and say "what's that?"
> > and are forced to read the documentation then this is a very good
> > thing, as you then learn the semantics - both that these are interbase and
> > directional. whereas a cosy familiar name will most likely lead people to
> > assume they know the semantics and then mess up. this is what happens to
> > people learning bioperl all the time. i'm being disingenuous, i know. i
> > guess at the end of the day longer names are better. but then we have to
> > be consistent within chado....
> >
> > i'm glad i'm not the only one this pedantic about the naming of these
> > things. (I don't think the semantics issue is at all pedantic - where
> > possible these things should have a precise computational definition)
> >
> > > -hilmar
> > >
> > > On Wednesday, October 30, 2002, at 06:45 PM, Scott Cain wrote:
> > > > I am sending this to the gmod-devel list to get the opinion of the
> > > > larger audience.
> > > >
> > > > I am inclined to agree with Colin about nomenclature, though I do agree
> > > > with you about bioperl's normal/incorrect use of boundaries. Before
> > > > bioperl came along I did it the way you propose; it caused much
> > > > confusion when I changed my schema to correspond to the bioperl way.
> > > > Assuming we use Chris' proposed boundary coordinates, I think using
> > > > check constraints is a good idea.
> > > >
> > > > Other opinions?
> > > >
> > > > Scott
> > > >
> > > > On Wed, 2002-10-30 at 20:54, Colin Wiel wrote:
> > > >> I preferred your suggestion of pos5 and pos3, as well as my suggestion
> > > >> of end5 and end3. I don't think a new chado user will figure out that
> > > >> fnbeg stands for "feature natural begin" as easily as they would
> > > >> figure out that pos5 (or end5) is the "position of the 5' end".
> > > >>
> > > >> Colin
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: gmo...@li... [mailto:gmod-schema-
> > > >>> ad...@li...] On Behalf Of Chris Mungall
> > > >>> Sent: Wednesday, October 30, 2002 4:46 PM
> > > >>> To: gmo...@li...
> > > >>> Subject: [Gmod-schema] cvs changes: companalysis module, sequence
> > > >>> module
> > > >>>
> > > >>> I have reworked the tables in the computational analysis module; they
> > > >>> are now a little less generic than before. there are some docs
> > > >>> included in the .sql - more needed though...
> > > >>>
> > > >>> the multiple alignment part (eg for clustal results) is still fluid
> > > >>>
> > > >>> I have also settled on
> > > >>>
> > > >>> fnbeg
> > > >>> fnend
> > > >>>
> > > >>> for specifying coordinates - feature natural begin, feature natural
> > > >>> end
> > > >>>
> > > >>> ie this is the "real" begin and end
> > > >>>
> > > >>> we should also possibly include a check constraint to make this
> > > >>> explicit
> > > >>>
> > > >>> eg
> > > >>>
> > > >>> fstrand is null OR (fnend - fnbeg) * fstrand >= 0
> > > >>>
> > > >>>
> > > >>> this is opposed to the normal (erroneous in my opinion) use of
> > > >>> start/begin end/stop, as used in bioperl, where
> > > >>>
> > > >>> start <= end
> > > >>>
> > > >>> ie they actually mean (low, high)
> > > >>>
> > > >>> how do we feel about check constraints?
> > > >>>
> > > >>> -------------------------------------------------------
> > > >>> This sf.net email is sponsored by: Influence the future
> > > >>> of Java(TM) technology. Join the Java Community
> > > >>> Process(SM) (JCP(SM)) program now.
> > > >>> http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en
> > > >>> _______________________________________________
> > > >>> Gmod-schema mailing list
> > > >>> Gmo...@li...
> > > >>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> > > >
> > > > --
> > > > ------------------------------------------------------------------------
> > > > Scott Cain, Ph. D.
> > > > ca...@cs...
> > > > GMOD Coordinator (http://www.gmod.org/)
> > > > 216-392-3087
> > > > Cold Spring Harbor Laboratory
> > >
> > > --
> > > -------------------------------------------------------------
> > > Hilmar Lapp email: lapp at gnf.org
> > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > > -------------------------------------------------------------
> >
> > _______________________________________________
> > Gmod-devel mailing list
> > Gmo...@li...
> > https://lists.sourceforge.net/lists/listinfo/gmod-devel
|
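To make the interbase and "semantics [2]" discussion in this thread concrete, here is a small sketch in Python. The column names fnbeg, fnend and fstrand and the check expression are taken from the thread; the function names, the use of Python rather than SQL, and the lowercase "gat" test sequence handling are assumptions made for illustration, not anything committed to chado.

```python
from typing import Optional, Tuple

# Complement table for DNA bases; used when reading the reverse strand.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def strand_from_interbase(fnbeg: int, fnend: int) -> int:
    """Direction implied by a directional interbase pair.

    Returns +1 when fnbeg < fnend, -1 when fnbeg > fnend, and 0 for a
    zero-length location, where the pair alone cannot tell.
    """
    return (fnend > fnbeg) - (fnend < fnbeg)


def satisfies_check(fnbeg: int, fnend: int, fstrand: Optional[int]) -> bool:
    """The proposed constraint: fstrand is null OR (fnend - fnbeg) * fstrand >= 0."""
    return fstrand is None or (fnend - fnbeg) * fstrand >= 0


def to_low_high(fnbeg: int, fnend: int) -> Tuple[int, int]:
    """Convert to the (low, high) pair that semantics [1] would store."""
    return (min(fnbeg, fnend), max(fnbeg, fnend))


def subsequence(seq: str, fnbeg: int, fnend: int) -> str:
    """Bases between two interbase positions, read from fnbeg towards fnend."""
    low, high = to_low_high(fnbeg, fnend)
    piece = seq[low:high]
    if fnbeg <= fnend:
        return piece
    # Reverse direction: reverse the slice and complement each base.
    return piece[::-1].translate(COMPLEMENT)


def upstream(fnbeg: int, strand: int, n: int) -> Tuple[int, int]:
    """The n bases just upstream of fnbeg, as a directional interbase pair.

    Plain arithmetic on the strand value, with no per-strand branching.
    """
    return (fnbeg - n * strand, fnbeg)
```

With the "g a t" example from the thread, `subsequence("gat", 1, 2)` reads the "a" on the forward strand, while `subsequence("gat", 2, 1)` reads its complement on the reverse strand. Because Python slices are themselves interbase (zero-based, half-open), no off-by-one adjustment is needed in this sketch.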