From: Harry M. <man...@ho...> - 2001-02-08 21:14:10
This was sent to me directly and it contains enough ?s for everyone now :)
Answer what you want. I'll try to fill in the missing bits.

-------- Original Message --------
Subject: Re: Fwd: Re: Fwd: Re: infos
Date: Thu, 8 Feb 2001 13:49:56 +0100
From: Andrea Splendiani <an...@bt...>
To: man...@ho...
References: <010...@lo...>

> Subject: Re: Fwd: Re: infos
> Date: Mon, 05 Feb 2001 10:00:17 -0800
> From: Harry Mangalam <man...@ho...>
> To: ser...@ti...,
>     genexdev at SF <gen...@li...>

> > The problem is that if I have some information represented in the DB in a
> > non-structured way (e.g. experiments described in human language), and
> > then I have it structured in XML (e.g. an experiment is ID + #Arrays
> > etc.), I may get into trouble.
> > From what I've seen it may be a problem, since many fields are left to
> > "human description".

> If I understand you, it won't necessarily be a problem to represent the
> unstructured text in XML - it can just be represented as TEXT. But as such,
> if you want to be able to extract information from it in the DB, you'll
> have to come up with a way of indexing it in software - something like
> having a script that does a daily text-index of those fields so that they
> can also be searchable (but outside the DB system). Most DB systems allow
> things like regex searches, but (I think) most such systems don't have
> contextual text searches, such as a glimpse / isite search capability.

The problem I was wondering about is this: when I store data in the DB, I
want to store a description of the experiment too (plus other metadata). I
may store it in a more or less structured way, and there may be a trade-off
between how much of the data is structured and how flexible the DB is. The
problem is then to find the best trade-off, so that a query does not have to
rely on a text search over descriptions written perhaps without even a
standard glossary.
As an example, see the EBI schema, in which an experiment consisting of ten
samples separated by 10 minutes may in part be described in a structured way:
ten sample "entities" linked by a relation "treatment" with value "delay".
This is easier to analyze than a text description in an "experiment" field
like "the experiment consists of ten samples taken at 10 minute
intervals....". Now, if I have this semi-structured view of an experiment and
I want to exchange it in an XML document, or if, for example, I receive a
document in which the experiment is a text description like "the experiment
consists of...", how can I put it in my DB, where such information is more
structured? I hope I've been clear, I fear I haven't! Anyway, I'll take a
better look at the proposed schemas and the XML languages, and I'll see how
things are done and whether such problems really exist.

...

> > Why do you state your DB runs on PostgreSQL and not MySQL? Do you use SQL
> > functions not provided by MySQL? Which are they?
> > MySQL should be the fastest one, and it should have all the functionality
> > needed by a project like a DB for gene expression data (mainly storage,
> > no transactions, scarce writing etc.).

> We originally considered MySQL but decided on Postgres because of the lack
> of some consistency checks, constraints, and transactions in MySQL. While I
> think MySQL is a good system for a lab DB, if it has to scale quite large,
> Postgres is a better choice. (At least it was - I've heard that MySQL now
> supports some of the features that we needed. If we revisit this, we may
> add support for MySQL as well. If you want to try to implement GeneX on
> MySQL, FANTASTIC!! We certainly would like to support both, but there are
> constraints on the number of people we can throw at it.)

I'll start to work actively on the DB after the 20th of this month.
I'll probably have a look at GeneX, but I probably will not follow it
entirely, also because here they are quite specialized (only Affymetrix,
only murine, only oncology...), and they may want an additional set of
features for their own functional annotation... Anyway, I'll see. Maybe I'll
use GeneX, and in that case I'll surely try MySQL with it. I'll let you
know...

> > Thanks, I had a look at all that (briefly for now, but I'll get back to
> > it!)
> > Well, I have to set up a DB for microarray experiment data. Size will
> > start from 100 experiments, but it's expected to grow fast.
> > All the experiments will be on Affymetrix equipment, so I have no
> > problems in representing the technological platform, since I may provide
> > some template to XML and make the DB compliant even with DBs designed to
> > take care of data from different technologies.
> > But I may have trouble designing the DB schema, because I'm not a
> > biologist and the biologists here are not informaticians... so I see
> > there may be some communication problems, possibly leading to an
> > incomplete schema.
> > So I decided to start from a good base, such as the EBI schema (but I'll
> > think about yours too, which one do you suggest?) and customize it.

> Well, we obviously like OUR schema, but if you're a DB person looking at
> schemas to evaluate, you might also look at Stanford's (as a guide, but it
> is apparently based on and quite heavily dependent on Oracle technology):

We are quite oriented to open source technology here... so I don't think
we'll use Oracle!

> > The goal would be to have a standard interface to share data with other
> > DBs and with analytical tools too.
> > Which interface do you use for your tools? Are they based on your DB
> > schema, or on GEML?
> > Is there any infrastructure to share data among DBs?

> Ours are based on the GeneXML spec (but when MAML gets firmed up enough to
> use, we'll certainly convert to using that).
> We also provide tools written in Perl to support GeneXML exchange among
> GeneX databases and our Curation Tool (also Open Source).

Whatever schema the DB is based on, it will probably feature your XML
interface (whatever! :).

> You might also want to join the genex-dev mailing list to see how things
> are moving - and certainly if you want to use GeneX as a basis for your own
> system. We'd especially like feedback from people who have database
> experience. Sign up via:

Thank you very much for your help. After the 20th, I'm sure I'll subscribe
and ask, and ask, and ask....

Andrea Splendiani
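
The structured-versus-free-text trade-off raised in this thread can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the element and attribute names (experiment, sample, treatment,
type, value, unit) are hypothetical and are not taken from GeneXML, MAML, or
the EBI schema, and placing the first sample at 0 minutes is likewise an
assumption. It builds the "ten samples, 10 minutes apart" experiment as
structured data, round-trips it through XML, and contrasts that with pulling
the interval out of a free-text description by regex.

```python
import re
import xml.etree.ElementTree as ET


def build_experiment(exp_id, n_samples, interval_min):
    """Build a structured experiment: one <sample> per time point.

    All element/attribute names here are hypothetical, chosen only for
    illustration; they do not follow any real microarray XML spec.
    """
    exp = ET.Element("experiment", id=exp_id)
    for i in range(n_samples):
        sample = ET.SubElement(exp, "sample", id=f"{exp_id}-s{i + 1}")
        # Each sample carries a "treatment" of type "delay", in minutes.
        ET.SubElement(
            sample,
            "treatment",
            type="delay",
            unit="min",
            value=str(i * interval_min),  # assumes the first sample is at t=0
        )
    return exp


def delays(exp):
    """Return the delay of every sample, read from the structured form."""
    return [
        int(t.get("value"))
        for t in exp.iter("treatment")
        if t.get("type") == "delay"
    ]


# Structured route: serialize to XML, parse it back, query it directly.
exp = build_experiment("E1", 10, 10)
xml_text = ET.tostring(exp, encoding="unicode")
roundtrip = ET.fromstring(xml_text)
sample_delays = delays(roundtrip)

# Free-text route: the same fact has to be guessed with a pattern, and the
# pattern breaks as soon as the wording changes ("every ten min", etc.).
free_text = "the experiment consists of ten samples taken at 10 minutes intervals"
match = re.search(r"(\d+)\s+minutes?", free_text)
interval_from_text = int(match.group(1)) if match else None
```

The structured form answers "which delays were used?" with a plain lookup,
while the free-text form only yields the interval, and only for descriptions
that happen to match the pattern. Going the other way (free text into a
structured DB) would still need manual curation or a much more careful parser.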