| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | 135 | 57 | 84 | 43 | 77 | 51 | 21 | 55 | 37 | 56 | 75 | 23 |
| 2002 | 32 | 174 | 121 | 70 | 55 | 20 | 23 | 15 | 12 | 58 | 203 | 90 |
| 2003 | 37 | 15 | 14 | 57 | 7 | 40 | 36 | 1 | 56 | 38 | 105 | 2 |
| 2004 | – | 117 | 69 | 160 | 165 | 35 | 7 | 80 | 47 | 23 | 8 | 42 |
| 2005 | 19 | 2 | – | – | – | – | – | – | – | – | – | – |
| 2007 | – | – | – | – | – | 2 | – | – | – | – | – | – |
| 2010 | – | 2 | – | – | – | – | – | – | – | – | – | – |
From: <ja...@op...> - 2004-05-11 15:23:13

ja...@op... (Jason E. Stewart) writes:

> According to the following, LyX supports Docbook SGML:
>
> http://bgu.chez.tiscali.fr/doc/db4lyx/
>
> And the output is well-formed XML with an SGML header - so my little
> XML => HTML processing script could be modified to remove the SGML
> header and add an XML header before it transforms it to HTML.
>
> I'm installing it

Unfortunately, it is export only. This means that the files would have to be versioned in the LaTeX or LyX format, and then exported to Docbook when we wanted to publish them in HTML or PDF. Only half the battle.

LyX is an OK editor for the WYSIWYG crowd, and I certainly know LaTeX inside and out, but I'm still not really thrilled.

Cheers,
jas.
From: SourceForge.net <no...@so...> - 2004-05-11 14:10:18

Bugs item #951918, was opened at 2004-05-11 08:10
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=951918&group_id=16453

Category: Bio::Genex Perl API
Group: Genex-2
Status: Open
Resolution: None
Priority: 8
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: linking table entries getting inserted multiple times

Initial Comment:
There is no checking mechanism to see if a linking table entry is already in the DB, so entries will get inserted multiple times (especially if there is no unique index on the table). A quick fix is to simply check if dbh() is set. A more interesting fix is to add support for oids - that way even linking table entries would have pkeys. This would fix support for applications like mason-edit-app.pl.
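[Editor's note: as a stopgap at the application level, the insert path could test for an existing row before writing. A minimal DBI sketch; the linking table ArrayLink and its fkey columns array_fk/design_fk are hypothetical names invented for illustration, not part of the Bio::Genex schema as shown in this thread:

```perl
use strict;
use warnings;
use DBI;

# Connection parameters are placeholders.
my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Insert a linking-table row only if the (array_fk, design_fk) pair is
# not already present. Table and column names are hypothetical.
sub insert_link_once {
    my ($dbh, $array_fk, $design_fk) = @_;
    my ($exists) = $dbh->selectrow_array(
        'SELECT 1 FROM ArrayLink WHERE array_fk = ? AND design_fk = ?',
        undef, $array_fk, $design_fk);
    return 0 if $exists;    # already linked, skip the duplicate
    $dbh->do('INSERT INTO ArrayLink (array_fk, design_fk) VALUES (?, ?)',
             undef, $array_fk, $design_fk);
    return 1;
}
```

A UNIQUE index on the fkey pair would enforce the same invariant in the schema itself, which is the more robust fix the report alludes to.]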
From: <ja...@op...> - 2004-05-11 13:46:39

Diane Trout <di...@ca...> writes:

> On Tue, May 11, 2004 at 01:29:19PM +0530, Jason E. Stewart wrote:
>> We don't need 1/100 the power of Docbook; something as simple as the
>> Perl POD format would even work.
>
> Have you looked at reStructuredText? It's a markup language similar in
> style to what's used by wikis. There are some Python tools that can
> dump it to HTML, XML, and TeX.
>
> http://docutils.sf.net
>
> I've looked at the markup before, but I haven't really tried pushing the
> tools to see how well it works.

Ah, thanks. Yes, it looks promising. This might be the simplest approach.

According to the following, LyX supports Docbook SGML:

http://bgu.chez.tiscali.fr/doc/db4lyx/

And the output is well-formed XML with an SGML header - so my little XML => HTML processing script could be modified to remove the SGML header and add an XML header before it transforms it to HTML.

I'm installing it
From: Diane T. <di...@ca...> - 2004-05-11 09:04:29

On Tue, May 11, 2004 at 01:29:19PM +0530, Jason E. Stewart wrote:
> We don't need 1/100 the power of Docbook; something as simple as the
> Perl POD format would even work.

Have you looked at reStructuredText? It's a markup language similar in style to what's used by wikis. There are some Python tools that can dump it to HTML, XML, and TeX.

http://docutils.sf.net

I've looked at the markup before, but I haven't really tried pushing the tools to see how well it works.

diane
From: <ja...@op...> - 2004-05-11 07:59:44

Harry Mangalam <hj...@ta...> writes:

> Not a bad idea, especially with lyx. Can OOo export TeX? Only
> partly joking..

I'd rather find another way. TeX is a format with no future. Very powerful, but only useful for geeks.

> Actually OOo can export PDF and HTML as well, so that becomes a
> secondary option I guess.. for having non-geeks assist with docs..

We really want some kind of markup. Plain text is really, well, plain. PDF and HTML are pretty, but HTML has no semantics - it's just presentation.

>>> DO we leave them as plain text? HTML-ize them? What? There are
>>> certainly advantages to making them html, but once done, they
>>> become harder to update.

I really think we have two choices:

* plain text
* some kind of semantic markup

We don't need 1/100 the power of Docbook; something as simple as the Perl POD format would even work (a short sample follows this message).

>>> I think we gave up on the Docbook approach, right?

I was just checking with LyX. There is a way to get LyX to do docbook, but until 1.2 it isn't straightforward. More in a bit.

jas.
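[Editor's note: for readers unfamiliar with POD, here is a short sample of the kind of lightweight semantic markup being proposed. The document content is invented for illustration; the pod2html, pod2text, and pod2man converters ship with Perl and render POD to the formats discussed in this thread:

```perl
=head1 NAME

DataLoading-HOWTO - loading array data into GeneX2

=head1 DESCRIPTION

GeneX2 stores expression data in the database rather than in flat
files. This HOWTO walks through loading a B<MeasuredBioAssay> from a
tab-delimited file.

=head2 Prerequisites

=over 4

=item * a running PostgreSQL server with the GeneX2 schema installed

=item * the C<Bio::Genex> Perl modules

=back

=cut
```

The markup carries structure (headings, lists, emphasis) without dictating presentation, which is the "semantic markup" property being argued for above.]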
From: Harry M. <hj...@ta...> - 2004-05-11 04:13:56

Not a bad idea, especially with lyx. Can OOo export TeX? Only partly joking..

Actually OOo can export PDF and HTML as well, so that becomes a secondary option I guess.. for having non-geeks assist with docs..

hjm

Brandon King wrote:
> Hey Guys,
> Maybe I should suggest LaTeX? If you check out the Docs module from
> Pymerase cvs, you can use the LaTeX there as a reference and then you
> can have HTML, plain text, and pdf. Just a thought.
>
> -Brandon King
>
> Harry J Mangalam wrote:
>> DO we leave them as plain text? HTML-ize them? What? There are
>> certainly advantages to making them html, but once done, they become
>> harder to update.
>> I think we gave up on the Docbook approach, right?
>>
>> What say ye?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: Brandon K. <ki...@ca...> - 2004-05-10 23:39:21

Hey Guys,

Maybe I should suggest LaTeX? If you check out the Docs module from Pymerase cvs, you can use the LaTeX there as a reference and then you can have HTML, plain text, and pdf. Just a thought.

-Brandon King

Harry J Mangalam wrote:
> DO we leave them as plain text? HTML-ize them? What? There are certainly
> advantages to making them html, but once done, they become harder to update.
> I think we gave up on the Docbook approach, right?
>
> What say ye?
From: Harry J M. <hj...@ta...> - 2004-05-10 22:28:17

DO we leave them as plain text? HTML-ize them? What? There are certainly advantages to making them html, but once done, they become harder to update.

I think we gave up on the Docbook approach, right?

What say ye?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: Harry J M. <hj...@ta...> - 2004-05-10 22:21:56

Hi All,

I went thru the docs dir and did some housecleaning/consolidation. I include the following as an overview of what has been done and what needs to be done. Please review to see if you agree.

2004-05-10 Harry Mangalam <hj...@ta...>

* del doc/Array-model.txt (merged into glossary)
* updated GeneX-ODBC.HOWTO
* added scratch.tables.and.filters.txt (from GAIM conversation)
* updated glossary
* recommend merging Brandon's howto-new_genex2_file_format.txt into Jason's DataLoading-HOWTO.html, or perhaps better into ArrayDesignCreation-HOWTO.html, which itself should be a part of the DataLoading-HOWTO.html. These xml-derived files need to be edited in XML to retain xml->html congruence, so wait til Jason responds about how best to do this.
* delete the old Adding_New_GeneX-Apps.txt, as it's now being re-done via protocols (and thus will be described in what will be the expanded version started in 'scratch.tables.and.filters.txt').
* codingGuideline.txt will remain as is.
* Contributing is slightly dated and mostly refers to providing patches to the .in files, which Jason has indicated are no longer going to be used very much - this file should probably be retired.
* Cutting_a_Release.notes refers entirely to CVS, which we no longer use, and dates back to the NCGR days. I think we'll be using svn to tag releases and release straight from svn rather than go back to the way we used to do releases at NCGR, no? This file should be removed.
* Darwin_HOWTO_Template.xml - this looks to be the XML template for the Apple Darwin HOWTOs. Interesting, but not really relevant to GeneX docs, I think - this should be removed from the svn repo.
* docbook2html.pl - ditto this. This is more a doc tool, and users shouldn't really see it. Maybe we should start a useful_tools directory where we can stash these kinds of tools that we want carried along with the genex code, but more out of sight?
* GeneSpring_GeneX.database is an example 'database' file that GeneSpring needs to connect to a remote DB. It travels along with 'GeneSpring.notes', but both are dated and I can't update them until we get a new GS license. I'd say leave them for now, but if they age too much, we may have to remove them.
* GeneX_DB_tuning.txt is more a tech note to myself than documentation. It involves the FLOAT -> INT conversion and other tuning tricks. If it remains in the svn repo, it should go into a developer docs dir rather than a user docs dir. I've left it alone for now, but it should be removed before release.
* genex-ents.xml has nothing to do with Tolkien characters, but seems to be a fragment of XML dealing with maillists. Should be removed.
* HOWTO_Load_QuantArray_Data.txt is my old HOWTO, but I'm not sure it has anything to do with the current state of reality. Jason? Can this even be updated, or is it completely out-of-sync?
* Porting.txt - extremely old, short description of how to port GeneX. So old and worn that it's not worth keeping. No special info in it anyway.
* ReleaseNotes.txt - very old, but DOES need to be updated to reflect the current GeneX2 release. On my list.
* Another of Jason's Docbook samples (nothing to do with RNA samples or biomaterials). Should go in the Dev_docs dir if we decide to do that.
* Using.CPAN is the somewhat old, but still valuable, description of how to use CPAN. This actually is of use to the general GeneX2 user, so I'll update it and keep it.
* Security.txt looks to be an older doc that talks about the security model in GeneX. There's no date on it, so I don't know how old it is; Jason should take a look at it and see if it's still valid.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: Harry J M. <hj...@ta...> - 2004-05-10 21:08:01

Hi All,

This doc is synthesized from a GAIM chat that Jason and I had a while back. Please read through it and see if it reflects your vision of reality; if not, let me know.

Brief explanation of how the scratch table works:

The scratch table is completely generic; it has columns like INT1, INT2, FLOAT1, FLOAT2, etc. In order for a user to use it properly, a genex admin (a member of the genex_admin group) must create a VIEW onto the table that maps the generic columns to specific names, like 'ch1_intensity => float1, ch2_intensity => float2', etc. (see the sketch after this message). This is how the RAD DB at UPenn works. They use it for all their data; we just use it for Derived BioAssay Data. (I'll have to give a short writeup of how to use the Mason app I created to do the mapping - I think it will have to be a *much* simpler process for DB admins to understand it.)

This mapping can be made permanent by writing it to the DB - it's a view in the DB, so it is permanent until you drop the DB. It is also written into the TableDef table as a scratch view, so all the Mason apps know about it. Further, since it involves only the scratch table and views, it doesn't impinge on or affect the actual DB schema.

The real trick to the scratch table - and maybe the thing that will make it too much of a pain to use - is that when you hook an analysis app up to the DB, you have to define a *destination view*. This destination view is where the analysis app will write its output. It is a view on the scratch table that an admin has already created; it is a formal DB view, as in 'CREATE VIEW ... AS ...'. All scratch table views are formal DB views that you can see under psql using '\dv'.

The person who hooks up the app has to choose a view whose columns match the output columns of the app: if the app produces two columns of data, he must choose a view that can hold those two columns. The app does not have to actually connect to the DB and write the output INTO the scratch table; it just has to create its output in a format that is compatible with the VIEW that is going to hold it.

Hooking up analysis tools to GeneX: 101

1) We want to make it really easy for people to hook up external tools for analysis.
2) Therefore we can't expect them to learn and understand the Genex Perl API before they can use the tool with Genex - the API is great, but it's complicated.
3) We want to encourage users to keep all data stored in the DB - for archival purposes, so data is traced and not lost, and so it is verifiable (we're in science, after all).
4) Therefore we want to discourage Schlauchism - exporting data as tab files and littering tens of zip disks with data.
5) To this end we need to create a smoke-and-mirrors illusion that apps which know nothing about the DB can actually pull data from the DB, do their analysis, and write it back to the DB.

When I was working with the Avestha people I tested out some ideas, and it turned out to be really trivial to accomplish this, at least in the limited efforts I had time to make.

There are two classes of apps:

1) DB-aware apps - they can already pull data from the DB directly and therefore don't need any help at all. These are easy - we just patch them to talk to Genex and voilà!
2) DB-stupid apps - they need disk files as input sources, not DB connectivity. To make this class of apps work, we need to subclass: subclass 1) allows specifying both --input and --output; subclass 2) doesn't allow determining both input and output file names, e.g. maybe the output is written to STDOUT or something. This isn't really so hard; we just have to run it within a wrapper that understands '--input' and '--output', runs the application, and moves files around where needed.

So subclass 1 and subclass 2 are really the same, except that subclass 2 needs to be run inside a (probably thin) wrapper that remaps the output/input - and that wrapper needs to be written separately for each app.

To make this work we use our brand-new Protocol mechanism in the DB. This was the bit that Michael Pear wanted to do for the grant, and the piece I stole the tables from ESTAP to implement. I've added a bunch of Mason code to make it all work: you define a PROTOCOL that says what table (or query) to take the data from, which app to run on the input, and what scratch view to put the data into once it's finished. [A concrete example would be very useful here.]

When the user wants to run an analysis, he does the following:
1) chooses an experiment to analyze;
2) chooses the BioAssays to be analyzed from that experiment;
3) chooses the analysis protocol to execute on the BioAssays;
4) goes and has coffee.

The DB code reads the protocol info from the DB and does the following:
1) exports the BioAssay data to a tab-delimited file using an I/O filter (see below);
2) starts the app, using --input to tell it the input file name and --output to tell it where to write the output;
3) when the app is finished, the wrapping script transfers the data from the output file into the chosen destination view;
4) alerts the user that her data is waiting;
5) smokes a cigarette.

This is reasonably straightforward - not too different from the approach we used in GeneX1 with rcluster, cybert, etc.

The I/O filter has to formalize what we were thinking about for GeneX1 - a general way to provide standardized inputs and outputs for any wild weasel of an app that wants to chew on GeneX data. It should enable anyone to write an I/O filter to massage the data on output from the DB, or on input to the DB. So if a particular app wants its NULL values in some heinous fashion, it can be done.

The filter idea is incorporated into the PROTOCOL approach - *PROTOCOLS* are the core of the genex analysis approach. The trick to using the protocols is that each one defines the input and output tables (views) as part of the protocol. You want to store your input data into a scratch view that is useful for your protocols, and you want to have scratch views that can handle the output of the protocols. The source view and destination view can be the same or different; the only real issue is whether they have the correct number of columns.

We don't necessarily want to design the Ginsu knife of filters, but rather provide a way that users can simply add their own wrapping filter code so that they can run the apps they want to. So you can see why scratch views are so important - they have to match both the input and the output of the analysis apps.

The trick is to change the user's mind as to what they expect the data to look like. For example, people familiar with spreadsheet files are always thinking in terms of data matrices, which is fine, but that's not how the data is stored in the DB. In a data matrix, your columns are different BioAssays (one column of intensity data per BioAssay) and the rows are the gene names, with expression (intensity) for each BioAssay. But in the DB, the values for the separate BioAssays are stored in different rows. So when a user wants to do an analysis they can't think "I want you to run a new analysis on my favorite data matrix". Instead they have to think "I want you to run an analysis on this list of BioAssays" (unless you have your favorite 'conversion to a data matrix' stored as an input filter - essentially a reasonably complex query).

So the idea would be that a user could define a set of bioassays upon which to do other operations; that wouldn't be a filter, though - it would just be a user preference that we could track, and we could provide some Mason app that allows users to define BioAssay collections for later processing.

In the way I've described it, the filter is just a mapping tool that takes a column of data about to be written to a text file, applies a Perl regex to that column of numbers, and, based on that regex match, does something to the data. In contrast, the BioAssay set is a collection of favorite data that the user wants to analyze a number of different times.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
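[Editor's note: to make the view-mapping and wrapper ideas above concrete, here are two minimal sketches. The generic column names (float1, float2) come from the description above, while the table name gnx_scratch, the view name ch1_ch2_view, the connection details, and the wrapped command some_analysis_app are all invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex_admin', '',
                       { RaiseError => 1 });

# Map the generic scratch columns to the names an analysis app
# expects. 'gnx_scratch' and 'ch1_ch2_view' are hypothetical names.
$dbh->do(<<'SQL');
CREATE VIEW ch1_ch2_view AS
  SELECT spot_fk,
         float1 AS ch1_intensity,
         float2 AS ch2_intensity
    FROM gnx_scratch
SQL
```

And a thin wrapper of the kind described for "subclass 2" apps, giving a STDOUT-only tool a uniform --input/--output interface:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

my ($input, $output);
GetOptions('input=s' => \$input, 'output=s' => \$output)
    or die "usage: $0 --input FILE --output FILE\n";

# Run the app on the exported tab file and capture its STDOUT.
open my $out, '>', $output or die "can't write $output: $!";
open my $app, '-|', 'some_analysis_app', $input
    or die "can't run some_analysis_app: $!";
print {$out} $_ while <$app>;
close $app;
close $out;
```
]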
From: Harry J M. <hj...@ta...> - 2004-05-10 20:35:08

Hi J & J,

I've reviewed Brandon's doc included here - it looks like it should really be merged into Jason's DataLoading-HOWTO.html. Is there a reason not to do this? Is anyone busy updating that file (DataLoading-HOWTO.html)? If not, I'll do this part at any rate.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: <ja...@op...> - 2004-05-10 17:12:46

Harry Mangalam <hj...@ta...> writes:

> This sounds good to me. Oooh - getting close to naildown.

Yeah - I'm busting my nuts trying to get all these changes done. Even though I'm a big fat control freak, it's really frustrating for me being the bottleneck on this release. There are just way too many things for which I'm the only person suited to get the job done. I really want to get this release made so that people are using the system in production mode....

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-05-10 16:39:33

This sounds good to me. Oooh - getting close to naildown.

hjm

Jason E. Stewart wrote:
> Hey Harry,
>
> First, an email to deal with the high-level stuff.
>
> I think that it is probably a good idea to do the float => int
> conversion for all the expression data:
> 1) MBA data tables
> 2) scratch table
> 3) DBA data table
>
> 1) and 3) are archival tables - they exist so that researchers have a
> permanent record of their experiment; they are not for analysis and
> processing. That's why I was focusing on 2) - the scratch table is
> where all the processing will be taking place.
>
> Besides these data tables, there are only three other float columns in
> the DB:
> * default_spot_conc in *both* Array and ArrayDesign - this is an oops.
> * tolerance in ArrayManufacture.
>
> As I wrote earlier, I would like to remove the default_spot_conc
> column until someone screams for its return - that leaves only the
> tolerance column.
>
> This is *not* likely to be a very heavily used column - it is there
> only for MAGE compatibility. I don't think we should play games
> converting it back and forth.
>
> So in principle, I think that converting all data table floats to ints
> is fine.
>
> Cheers,
> jas.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: <ja...@op...> - 2004-05-10 16:10:48

Hey Harry,

Here's the technical email. I think your idea is fine. Once the data is in ints, shuffling it around is no trouble. The problem is how to properly get a float into a DB column that only stores ints - not rocket science, I know, but still an issue. There is a first loading step in which the numbers are floats, and we have to convert them to ints. We can push that moment as far back in the process as loading the initial MBA data, but at some point we have to take a float and store it as an int.

I was hoping that we could use DB triggers to do the black magic for us. That way the users would all think the DB stored floats - they entered floats into the table, and when they queried the table they got floats back. What could be easier? But that didn't work. By the time the trigger sees the floating point number, it's already been truncated and the data is lost. So we must use a different approach.

Since we're going to have *all* the data float columns stored as ints, we only need to do the conversion when the data is loaded into the DB - this simplifies everything - so it makes sense for the data loader to do the conversion when reading the data in. So loading data in is not such a big deal.

What about exporting data? Should we provide a mechanism so that users can get back their floats? Or are they going to be satisfied with doing everything as ints? We don't have to spend lots of time on it now, but I think there should be a simple mechanism - such as setting some global flag - that enables a user to toggle whether the data matrix comes out as ints or the original floats.

So, to recap: we will store all data matrix values as ints, and the data will be returned to the users as ints. In the near future we will think about some way in which the data can be converted to floats for export.

Cheers,
jas.
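[Editor's note: a minimal sketch of the loader-side conversion and export toggle being proposed here. The 1000x scale factor is the one mentioned elsewhere in this thread; the function names and the $as_floats flag are invented for illustration:

```perl
use strict;
use warnings;

use constant SCALE => 1000;   # scale factor discussed in this thread

# At load time: turn an incoming float into the int the DB stores.
# sprintf rounds rather than truncates, so 1.2346 => 1235, not 1234.
sub float_to_db_int {
    my ($value) = @_;
    return sprintf('%.0f', $value * SCALE);
}

# At export time: optionally recover a float approximation.
# $as_floats stands in for the proposed global toggle.
sub db_int_to_export {
    my ($value, $as_floats) = @_;
    return $as_floats ? $value / SCALE : $value;
}

print float_to_db_int(1.2346), "\n";      # 1235
print db_int_to_export(1235, 1), "\n";    # 1.235
```

Note the round trip is lossy below the third decimal place, which is why the original supplied values would need to be recoverable from annotations if exact export matters.]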
From: <ja...@op...> - 2004-05-10 16:10:23

ja...@op... (Jason E. Stewart) writes:

> I'm committing my latest checkpoint code to SVN. It works, it should
> install, but it has the broken float => int conversion code. I want to
> archive this code for future reference before I strip it all out and
> redo it.
>
> You can do an SVN update, but there are still schema changes pending.

I've committed all my changes, and I've tested the code on genex2 - it installs cleanly - but 'make uninstall' didn't run to completion: I didn't have the proper conditional on one of the pieces. I checked in a fix, but I have no way of testing it, because I have no way of installing the old DB code in order to test it...

Cheers,
jas.
From: SourceForge.net <no...@so...> - 2004-05-10 13:54:31

Bugs item #848007, was opened at 2003-11-23 23:06
Message generated for change (Comment added) made by jason_e_stewart
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=848007&group_id=16453

Category: Installer
Group: Genex-2
Status: Open
Resolution: None
Priority: 9
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: fix scratch views with lookup table security permissions

Initial Comment:
Need a mechanism to create scratch views with lookup table security permissions based on dba_fk.

----------------------------------------------------------------------

>Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-05-10 07:54
The scratch views are now being created using genex_scratch_view, but genex_scratch_view does not have the proper WHERE clause to use dba_fk as a lookup table fkey.

----------------------------------------------------------------------

Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-04-30 10:54
This is true, the proper GRANTs are not being run, but even with permission, users not in the security group are allowed to edit data. This is a bad security hole; bump priority to max.

----------------------------------------------------------------------

Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-04-30 10:44
The mechanism exists, but views are not being given the proper GRANT, so members of genex_user are being refused. Bumped priority to get this fixed pre-beta.
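[Editor's note: for readers following along, the class of fix discussed in these comments looks roughly like the following. This is a sketch only - the view name is hypothetical, the group names mirror those in the comments, and the exact GRANTs GeneX issues are not shown in this thread:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex_admin', '',
                       { RaiseError => 1 });

# Let ordinary users read the scratch view, but reserve writes for
# the admin group ('ch1_ch2_view' is a hypothetical view name).
$dbh->do('GRANT SELECT ON ch1_ch2_view TO GROUP genex_user');
$dbh->do('GRANT INSERT, UPDATE ON ch1_ch2_view TO GROUP genex_admin');
```

The second half of the bug - row-level filtering by dba_fk - would additionally need the WHERE clause the first comment describes, which GRANTs alone cannot provide.]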
From: SourceForge.net <no...@so...> - 2004-05-10 13:48:08

Bugs item #848040, was opened at 2003-11-24 00:26
Message generated for change (Comment added) made by jason_e_stewart
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=848040&group_id=16453

Category: DB Schema
Group: Genex-2
>Status: Open
>Resolution: Accepted
Priority: 9
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: Scratch wants to have its float fields turned into ints

Initial Comment:
Scratch wants to have its float fields turned into ints - this is for indexing performance reasons (how many do we need? Just one, I think). Perhaps create a SELECT trigger so that the columns are returned correctly as floats? Probably not, since that would probably really confuse the DBI.

----------------------------------------------------------------------

>Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-05-10 07:48
Whoops! This didn't get fixed properly. It's back, and this time the DB needs a major overhaul. We're going to store all data table floats as ints. The data loader needs to do the conversion from floats => ints at load time.
From: SourceForge.net <no...@so...> - 2004-05-10 13:46:24

Bugs item #945353, was opened at 2004-04-30 09:02
Message generated for change (Comment added) made by jason_e_stewart
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=945353&group_id=16453

Category: Mason GUI
Group: Genex-2
>Status: Open
>Resolution: Accepted
>Priority: 6
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: 'no child processes' error in copy-to-scratch.html

Initial Comment:
When copying an MBA to a Scratch view, an error is caught after copy-to-scratch.pl is called, even though the process exits normally.

----------------------------------------------------------------------

>Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-05-10 07:46
This bug is back, but I've commented out the error checking code for now, so I'm setting this as a low priority.

----------------------------------------------------------------------

Comment By: Jason E. Stewart (jason_e_stewart) Date: 2004-04-30 09:48
It was a valid error. Due to a switch in the CGI parameter name, the hidden variable was no longer being stored if the page was called from the old application route through exp-analyze.mason. This old route has been disabled.
From: <ja...@op...> - 2004-05-10 13:36:08

Hey All,

I'm committing my latest checkpoint code to SVN. It works, it should install, but it has the broken float => int conversion code. I want to archive this code for future reference before I strip it all out and redo it.

You can do an SVN update, but there are still schema changes pending.

Cheers,
jas.
From: <ja...@op...> - 2004-05-10 12:51:10

Hey Harry,

First, an email to deal with the high-level stuff.

I think that it is probably a good idea to do the float => int conversion for all the expression data:
1) MBA data tables
2) scratch table
3) DBA data table

1) and 3) are archival tables - they exist so that researchers have a permanent record of their experiment; they are not for analysis and processing. That's why I was focusing on 2) - the scratch table is where all the processing will be taking place.

Besides these data tables, there are only three other float columns in the DB:
* default_spot_conc in *both* Array and ArrayDesign - this is an oops.
* tolerance in ArrayManufacture.

As I wrote earlier, I would like to remove the default_spot_conc column until someone screams for its return - that leaves only the tolerance column.

This is *not* likely to be a very heavily used column - it is there only for MAGE compatibility. I don't think we should play games converting it back and forth.

So in principle, I think that converting all data table floats to ints is fine.

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-05-09 19:05:22

Hi Jason, et al,

OK - I follow this, but I guess my question is why are we storing the numbers as floats in the DB at all? I understood that we were going to change EVERYTHING to ints, not just the scratch table. (And note the conversion in the annotations, so that if a user wanted the data exported as originally supplied, the export routine could do a (float(#)/1000) on the way out.)

If I understand your note, GeneX originally stored the data as floats and then copied them to ints in the scratch table. This is one way of doing it, but it seems to beg the question: this approach introduces data bloat where we could really benefit from storing the incoming/original data as ints. Then we can simply ignore any internal conversions back and forth. And I would guess that eventually, if genex starts gaining users, people are going to want to start querying the DB WITHOUT copying to a scratch file, simply because the data involved will start to get very large, so the preparation time (conversion and copying to the scratch) will start to become inhibiting.

I say we convert EVERYTHING to INTs on data ENTRY and leave it as such, for the reasons I've mentioned before:
- integer comparisons are faster
- you can generate indices, which is a huge perf gain
- integer math is 2-5x faster
- integer storage is 1/2 that of floats (and we will hopefully be storing lots of #s, so it's not a trivial advantage).

I can see where this might be a problem - you will start to lose accuracy on chained analyses using integer division unless you're careful to capture the results as floats, but that's where the annotation of protocols comes in. And I'm assuming that GeneX's main role is storing/retrieving/presenting the data, not the analyses themselves so much; but even so, the analyses will have to track the math transforms, and as such, the analytical routines will have to deal with this.

I admit that I don't know how excel treats numbers in the spreadsheet, but I would think it's just like OOo, which treats them as floats, so there is a semi-automatic conversion that goes on there. This does need to be thought out with regard to DB input and output, but since the performance and storage advantages are very large, and since any storage of results has to be documented anyway, I see it as an overall advantage. I see nothing wrong with telling people that we're doing this upfront - in fact we have to, so they can comment on it.

Jason and Jenn, what are the implications for the schema if EVERYTHING is an int? Anything more than doing a bulk FLOAT -> INT replacement thru the code tree? Well, obviously kidding, but what are the overall implications? I'm thinking back to that >1000-fold increase in performance I saw when I hacked the tables to INTs.

hjm

Jason E. Stewart wrote:
> Hey Harry,
>
> My simple-minded approach to converting float => int in the scratch
> table broke.
>
> I created an INSERT rule that simply multiplied the floats by 1000,
> and a SELECT rule that divided them by 1000. This was working
> fine. But then I realized that I hadn't actually changed the schema -
> the columns were still defined as floats. So I modified the schema and
> retested, and suddenly it wasn't working correctly.
>
> During the insert we were losing all the data after the decimal point
> - Postgres was rounding it before multiplying by 1000. I realized why
> - because the columns are defined as ints, my INSERT trigger
> fires *after* postgres has tried to store a float into an integer
> column, and was forced to truncate it. This really surprised me,
> actually - because of how views work, I thought Postgres would handle
> that.
>
> So that leaves us with a couple of choices as I see it:
> 1) Find a new way to keep the conversion transparent - try doing
> something with temporary floating point columns. I don't have a
> clue how to do this, though.
> 2) Admit that we can't hide it from the application developers, and
> force them to do the multiplication themselves.
> 3) Do something fancy at the API level.
>
> I think that I will likely do both 2) and 3) - provide some API
> mechanism that people can use if they want to, but let application
> developers do it however they want.
>
> This is kind of nice, because now if an application is fetching data
> from a scratch table and it doesn't need to convert the int back to
> a float, fetching the data is much faster.
>
> I'll get working on it tomorrow.
>
> Cheers,
> jas.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
From: <ja...@op...> - 2004-05-09 17:28:50

Hey Harry,

My simple-minded approach to converting float => int in the scratch table broke.

I created an INSERT rule that simply multiplied the floats by 1000, and a SELECT rule that divided them by 1000. This was working fine. But then I realized that I hadn't actually changed the schema - the columns were still defined as floats. So I modified the schema and retested, and suddenly it wasn't working correctly.

During the insert we were losing all the data after the decimal point - Postgres was rounding it before multiplying by 1000. I realized why: because the columns are defined as ints, my INSERT trigger fires *after* postgres has tried to store a float into an integer column and was forced to truncate it. This really surprised me, actually - because of how views work, I thought Postgres would handle that.

So that leaves us with a couple of choices as I see it:
1) Find a new way to keep the conversion transparent - try doing something with temporary floating point columns. I don't have a clue how to do this, though.
2) Admit that we can't hide it from the application developers, and force them to do the multiplication themselves.
3) Do something fancy at the API level.

I think that I will likely do both 2) and 3) - provide some API mechanism that people can use if they want to, but let application developers do it however they want.

This is kind of nice, because now if an application is fetching data from a scratch table and it doesn't need to convert the int back to a float, fetching the data is much faster.

I'll get working on it tomorrow.

Cheers,
jas.
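[Editor's note: one common Postgres workaround for option 1) above - not what the GeneX code does, just a sketch - is to keep the base column an int and expose the float interface through a separate view. Because NEW.float1 then takes the *view's* float-valued column type, the INSERT rule sees the value before any int coercion, so nothing is truncated. The table and view names here are invented:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex_admin', '',
                       { RaiseError => 1 });

# Hypothetical int-typed base table (only the relevant columns shown).
$dbh->do('CREATE TABLE scratch_demo (spot_fk integer, float1 integer)');

# A float-typed facade: SELECTs divide on the way out...
$dbh->do(<<'SQL');
CREATE VIEW scratch_demo_f AS
  SELECT spot_fk, float1 / 1000.0 AS float1 FROM scratch_demo
SQL

# ...and an INSERT rule multiplies on the way in, rounding rather
# than truncating, before the value ever touches the int column.
$dbh->do(<<'SQL');
CREATE RULE scratch_demo_ins AS ON INSERT TO scratch_demo_f
  DO INSTEAD
  INSERT INTO scratch_demo
  VALUES (NEW.spot_fk, round(NEW.float1 * 1000))
SQL
```

Apps that want raw ints query scratch_demo directly; apps that want the float illusion go through scratch_demo_f - which matches the 2)+3) plan of making the conversion available but optional.]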
From: <ja...@op...> - 2004-05-09 17:15:17

Hey All,

When the DB is created during the installation, a full set of test data is now loaded for the entire BioAssay hierarchy. Three MBAs are loaded, which creates three PBAs during the load. Then one of those MBAs is copied into the Scratch table, which creates a DBA.

Doing this uprooted a whole slew of issues, which are now all resolved (as you may have noticed with all the bug reports flying past). So we now get a DB that you can actually do something with - unfortunately, about all that means is you can copy data into the scratch table; there are still no processing tools hooked up during the install. But at least there is data!!!

Sadly, the data is only artificial 20 spot/chip data, but that can easily be changed to load a yeast data set on install once the system is working properly (see next email).

Cheers,
jas.
From: SourceForge.net <no...@so...> - 2004-05-09 17:09:06

Bugs item #950867, was opened at 2004-05-09 11:09
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=950867&group_id=16453

Category: Bio::Genex Perl API
Group: Genex-2
Status: Open
Resolution: None
Priority: 7
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: should use Diane's UML mechanism for source/target fkeys

Initial Comment:
In tables like BioAssayLink that have two fkeys to the same table, one a 'child' and the other a 'parent', we need to rename the fkeys to something appropriate so that when navigating the hierarchy we use consistent naming.
From: SourceForge.net <no...@so...> - 2004-05-09 17:06:38

Bugs item #950864, was opened at 2004-05-09 11:06
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=950864&group_id=16453

Category: Mason GUI
Group: Genex-2
Status: Open
Resolution: None
Priority: 7
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: obj2table.mason should show MTO fkey name, not table name

Initial Comment:
In the MTO fkey columns, it is showing the table name - when a table has multiple fkeys to the same table, it is impossible to determine which column corresponds to which fkey.