From: SourceForge.net <no...@so...> - 2004-04-27 05:23:06
Bugs item #942793 was opened at 2004-04-26 23:23
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=942793&group_id=16453

Category: Administrative Apps
Group: Genex-2
Status: Open
Resolution: None
Priority: 8
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: eliminate warnings on array-design-load.pl

Initial Comment:
The program spews huge amounts of warnings about not finding the location or the zone for a feature, for every spot on the chip.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=942793&group_id=16453
From: <ja...@op...> - 2004-04-27 05:14:34
Hey Harry, Great feedback! It's excellent finally having someone rip apart all this before it becomes permanantly embedded into the system. Harry J Mangalam <hj...@ta...> writes: > Couldn't find location for feature with reporter: YPR204W > at /usr/local/genex/bin/array-design-insert.pl line 271. > Couldn't find zone for feature with reporter: YPR204W These are both warnings, and they should be disabled unless --debug is set, otherwise as you note, the applications spews output: warn "Couldn't find location for feature with reporter: $rep_name"; warn "Couldn't find zone for feature with reporter: $rep_name"; I will fix this immediately. > at /usr/local/genex/bin/array-design-insert.pl line 293. > Handling FR Maps took: 2 wallclock secs ( 0.76 usr + 0.21 sys = 0.97 CPU) > 1 ArrayDesigns to insert > Inserting array design: gmu.edu/ArrayDesign/Affymetrix/YE6100/2003-11-03 > Found 6191 features. > In Bio::Genex::ArrayDesign::insert_db: pkey = > DBD::Pg::db do failed: ERROR: duplicate key violates unique constraint > "genex_array_design_name_key" > at /usr/local/share/perl/5.8.3/Bio/Genex/ArrayDesign.pm line 1049. > Bio::Genex::ArrayDesign::insert_db: SQL=[[INSERT INTO GENEX_ARRAY_DESIGN > (technology_type,provider_con_fk,name,ad_pk) VALUES > ('Spotted','253','gmu.edu/ArrayDesign/Affymetrix/YE6100/2003-11-03','38067')]], > DBI=[[ERROR: duplicate key violates unique constraint > "genex_array_design_name_key"]] > at /usr/local/share/perl/5.8.3/Bio/Genex/ArrayDesign.pm line 1053. > /usr/local/genex/bin/array-design-insert.pl: couldn't insert record for > layout: gmu.edu/ArrayDesign/Affymettrix/YE6100/2003-11-03, DBI=[[ERROR: > duplicate key violates unique constraint > "genex_array_design_name_key"]] Looks like you tried to load the same array design twice, which violates the unique clause on the 'name' column. The name for the ArrayDesign is take from the XML file: <PhysicalArrayDesign identifier="MAGE:LSID:gmu.edu:ArrayDesign:test" This will be used for the name: # use the identifier as the name (for now) my $id = $mage_ad->getIdentifier(); my $name = $id; $name =~ s/^MAGE:LSID://; So we strip off the 'MAGE:LSID' part and use the rest as the name. It is very simple to add a new field to the loader application that enables a user to over-ride the name in the file and use their own name. I will fix this immediately. > Loading qt-dim file > I copied the qtdim-mas5-short.xml file to the upload dir and could see it as > one of the choices, but when I tried to load it, I got the following error: > ERROR: Page: qtdim-load.html:title not found > positioned above the expts that I had already loaded > does this error mean that it really failed or is this more like a compiler > warning? This is a wierd Mason problem (can't call it a bug). For all the <form>'s, I used to hard-code that 'action' URL, and so whenever I had to move things around, I had to go and change all the URL's - this gets to be a pain. So I switched to using some of the Mason internals that in turn use the mod_perl internals to tell you what the current component path is. This helped a lot. Unfortunately, I just discovered that Mason seems to be adding extra information to the component path, so instead of 'qtdim-load.html' you get 'qtdim-load.html:title' ... because the qtdim-load.html component defines a 'title' method??!!? So I have to use a different Mason approach that doesn't add this extra info to the path. I will fix this immediately. 
Also - all the qtdim files are pre-loaded at DB installation time - if you check the output of make install (or if you look in Perl/scripts/gendb.pl - the script that actually does all the work), you will see which qtdim files get loaded. > While I think I could have entered the metadata required for loadin the data, > I gave up after one too many partial data entries was destroyed by having to > dig one layer deeper to initialize one more dropdown entry (fill this out and > then come back and hit reload to lose all your previous work). Why is the data getting lost on reload? What a pain. On Mozilla and Galeon, partially filled-in form data is preserved on reload. > While I know this data has to be filled out, having the user fill it > out on the fly, losing his previous data in the process is not the > way to do it.Most of the pages leading to these extraq-work options > are available on the 'Annotation tab/page' sidebar and all we have > to do is make sure that the user has to fill out the proper pages > before she starts loading the expt. I agree there are *lots* of other ways to do it, and that we need to test all this stuff out and come up with better ways. Most of these applications were written by me more than a year ago and have been sitting slowly suffering from code rot. I am not wedded to any of the existing applications, in fact I think most of it sucks, but they were all written by me very hastily in order to scratch a particularily important itch, and then they were left unused. I discussed this issue a while back. The existing data insertion applications are very low-level - they let you enter a single row in a single table. There is no workflow tool that guides you step-by-step through the process. This is even more obvious when you enter a protocol into the DB, in that case you must enter data into the following tables: * Protocol * ProtocolStep * Procedure * Parameter - this one has many entries So you get to fill out too many forms. A higher-level workflow is needed. > Jason - is it possible to present the links to the data pages with a suffix > that shows how many entries are associated with each Expt, Group, > Experimental Factor, Citation, etc) - ie: > > Experiment (45) > Group (7) > Experimental Factor ((8) > Citation (3) > etc Sorry, Harry, I don't understand this. Could you use a single form and give me the example using that, say Array insert. > and hilite the ones that still need to be filled out. (I guess this is the > way you're using the red/black labels in the data loading pages - red = needs > to be filled out; black = optional. Yes - the red fields are listed as NOT NULL in the DB, so the application requires that they be filled in. > Also, it would be helpful to show (in a top panel) those entries that already > exist for each 'Insert new ... ' page, so that new users don;t try to > duplicate already existing entries. This could also be used as a path to > edit the entries already in the DB if required. The problem with this is that as the DB fills up with data, this list of existing entries becomes huge. I want the insert new entry pages to be just that. We can certainly create a different component that combines the table browser (which shows the existing entries) together with a second half which is the 'insert new entry' portion. But I would still like to have one component that is just a clean insert only component. At the moment, I don't think the insert forms are embeddable, but that is easy to fix. 
Thanks again for plugging through this and thinking critically how we can improve the existing tools. Cheers, jas. |
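A minimal, standalone sketch of the fix described above - emitting the per-feature warnings only when a debug flag is set, so normal loader runs stay quiet. The option handling, loop, and feature names here are illustrative stand-ins, not the actual array-design-insert.pl code.

  #!/usr/bin/perl
  # Sketch: gate noisy per-feature warnings behind a --debug flag.
  use strict;
  use warnings;
  use Getopt::Long;

  my $debug = 0;
  GetOptions('debug' => \$debug) or die "usage: $0 [--debug]\n";

  # Hypothetical per-feature loop standing in for the loader's real one.
  for my $rep_name (qw(YPR204W YPR201W)) {
      my ($location, $zone) = (undef, undef);   # pretend both lookups failed
      # Only complain when --debug is given.
      warn "Couldn't find location for feature with reporter: $rep_name\n"
          if $debug and not defined $location;
      warn "Couldn't find zone for feature with reporter: $rep_name\n"
          if $debug and not defined $zone;
  }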
From: Harry J M. <hj...@ta...> - 2004-04-26 22:01:03
taking off again where I left off - loading more data. When I loaded the arraydesign xml, it spews out a huge amount of error messages (so much so that konqueror becomes very sluggish in scrolling, selecting, etc) and ends in the following error: .. Couldn't find location for feature with reporter: YPR204W at /usr/local/genex/bin/array-design-insert.pl line 271. Couldn't find zone for feature with reporter: YPR204W at /usr/local/genex/bin/array-design-insert.pl line 293. Handling FR Maps took: 2 wallclock secs ( 0.76 usr + 0.21 sys = 0.97 CPU) 1 ArrayDesigns to insert Inserting array design: gmu.edu/ArrayDesign/Affymetrix/YE6100/2003-11-03 Found 6191 features. In Bio::Genex::ArrayDesign::insert_db: pkey = DBD::Pg::db do failed: ERROR: duplicate key violates unique constraint "genex_array_design_name_key" at /usr/local/share/perl/5.8.3/Bio/Genex/ArrayDesign.pm line 1049. Bio::Genex::ArrayDesign::insert_db: SQL=[[INSERT INTO GENEX_ARRAY_DESIGN (technology_type,provider_con_fk,name,ad_pk) VALUES ('Spotted','253','gmu.edu/ArrayDesign/Affymetrix/YE6100/2003-11-03','38067')]], DBI=[[ERROR: duplicate key violates unique constraint "genex_array_design_name_key"]] at /usr/local/share/perl/5.8.3/Bio/Genex/ArrayDesign.pm line 1053. /usr/local/genex/bin/array-design-insert.pl: couldn't insert record for layout: gmu.edu/ArrayDesign/Affymettrix/YE6100/2003-11-03, DBI=[[ERROR: duplicate key violates unique constraint "genex_array_design_name_key"]] Loading qt-dim file I copied the qtdim-mas5-short.xml file to the upload dir and could see it as one of the choices, but when I tried to load it, I got the following error: ERROR: Page: qtdim-load.html:title not found positioned above the expts that I had already loaded does this error mean that it really failed or is this more like a compiler warning? While I think I could have entered the metadata required for loadin the data, I gave up after one too many partial data entries was destroyed by having to dig one layer deeper to initialize one more dropdown entry (fill this out and then come back and hit reload to lose all your previous work). While I know this data has to be filled out, having the user fill it out on the fly, losing his previous data in the process is not the way to do it.Most of the pages leading to these extraq-work options are available on the 'Annotation tab/page' sidebar and all we have to do is make sure that the user has to fill out the proper pages before she starts loading the expt. Jason - is it possible to present the links to the data pages with a suffix that shows how many entries are associated with each Expt, Group, Experimental Factor, Citation, etc) - ie: Experiment (45) Group (7) Experimental Factor ((8) Citation (3) etc and hilite the ones that still need to be filled out. (I guess this is the way you're using the red/black labels in the data loading pages - red = needs to be filled out; black = optional. Also, it would be helpful to show (in a top panel) those entries that already exist for each 'Insert new ... ' page, so that new users don;t try to duplicate already existing entries. This could also be used as a path to edit the entries already in the DB if required. hjm -- Cheers, Harry Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>> |
From: <ja...@op...> - 2004-04-26 16:47:18
Harry writes:

> - I'm just about to sign up to support the draconian penalties for
> messing with email, just like messing with the US mail. I now get
> more 'the mail from xxx was refused because it contained a virus'
> than real mail by ~10x.

It's amazing how bad things have gotten so quickly. Check out the procmail-sanitizer project. I have an ISP that gives me a BSD shell account, and I have a bunch of filters that I run. When SoBig was at its height I was getting something like 200 viral emails a day - and most of those were bounce messages from email servers that detected the virus but didn't bother stripping the virus from the bounce!!! After installing the sanitizer, I get almost nothing.

But... I still get flooded by SPAM. SpamAssassin has cut it immensely. It sorts into three piles (most-likely-spam, probably-spam, the-rest). I've only found a few false positives in a few thousand emails so far. I only get a handful of spam that gets through the heuristics, but I have to check the spam for false positives every few days (yuck!).

Here are the latest statistics:

  1233 /home/jasons/procmail/spool.spool
   317 almost-certainly-spam.spool
   401 probably-spam.spool

so I get around 1200 real messages for every 700 spam, but that's because 90% of the 'real' email is from mailing lists.

I'm thinking about using the bogofilter bayesian filter - the numbers look even better than SpamAssassin.

Cheers,
jas.
From: <ja...@op...> - 2004-04-26 16:44:53
Harry Mangalam <hj...@ta...> writes:

> In terms of output, I guess we can present the user a choice - have
> fast output as native ints or slower output as transformed fps.
> Hopefully the amount of numbers flowing out of the db will be orders
> of mag less than what went in, so this shouldn't be a huge
> computational overhead.

Ok. Seems like users will want a choice (you know what users are like).

> I sent out an email (direct to you I now see, not to the dev list) -
> did you get it?

Sorry, was that 'you' as in me, or 'you' as in Jennifer?

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-04-26 16:14:02
Hi All, For optimum storage savings, we would use regular ints (2^32, as Jason indicated) which gives us a range of ~4,000,000,000 (or -2B to + 2B) - I think that would be enough precision, even for radiolabel counts (cpm - what kind of numbers do you get for those kinds of counts?) There is also the matter of integer (int) vs float point (fp) math operations - int operations are about 2-7 times faster than fp operations, so if we keep the numbers as ints, we gain significantly in storage AND math speed, not to mention db indexing ops. In terms of ratios and other native fp representations, I like the idea of having a per-array or per-expt annotations that describes the transform just like any other piece of metadata. This could be represented as a 'to_original' protocol, just like any other transformation. In terms of output, I guess we can present the user a choice - have fast output as native ints or slower output as transformed fps. Hopefully the amount of numbers flowing out of the db will be orders of mag less than what went in, so this shouldn't be a huge computational overhead. I sent out an email (direct to you I now see, not to the dev list) - did you get it? My email has been amazingly screwed up, much like UCI's and GMU's I understand, with delays of up to several days, mail returned tho I know the addressee is correct, domains, filtered out due to spamming - I'm just about to sign up to support the draconian penalties for messing with email, just like messing with the US mail. I now get more 'the mail from xxx was refused because it contained a virus' than real mail by ~10x. Harry Jason E. Stewart wrote: > Hi Jennifer, > > Excellent feedback, thanks!! > > jw...@gm... writes: > > >>Representing the data as integers is OK as long as we store the >>information about the transformation that was used and also maintain >>information about the units of measurement and precision of >>measurement. > > > We have two choices: > 1) Advertise that we do this, and return all floats as ints whenever a > query is made. > 2) Don't advertise that we do this, and return floats whenever a query > is made. It is posible for us to make all of this totally > transparently so that the users never know. > >>From my brief investigation 2) actually seems easier to implement than > 1) seems. A deeper look my prove me a fool. > > I haven't thought about units - is there such a thing for this type of > data? I've never seen anyone use units (but my knowledge is agreeably > small). > > If we advertise what we do, then we have to be clear about the > precision. > > >>I submit that the units and precision are both annotations of the >>measurement that apply to all spots on a chip. They would apply to >>all chips that belong to one 'physical' experiment, but not to >>virtual experiments, and since we want to allow such entities, I >>think the information should get stored on a per-chip basis. > > > The agreement within MGED is that there really is no such thing as a > physical-experiment or virtual-experiment - that the basic unit is a > single hybridization measurement and all experiments are virtual - > comprised of mixing and combining hybridizations. > > >>Is the transformation to an integer an analytical process that gets >>stored as a protocol associated with the value? It will be a >>chip-wide protocol. > > > <designer-hat-on> > This does seem like a nice, clean idea. 
> </designer-hat-on> > > <developer-hat-on> > It could be possible to do it this way, but from looking at how the > tables are constructed it looks pretty tough and messy to do it this > way. It would take a major redesign of some basic stuff, and that > always invites the potential of adding bugs that break too many > critical pieces. > </developer-hat-on> > >>I cannot determine if even two places after the decimal is realistic >>(maybe for some radiation-based assays), but we could set that value >>as the default. > > > Yeah, hard to tell. > > >>So we could have a default transformation of multiplication by 100 >>and trim post-punct, and for the data size itslef a default integer >>string of 8 positions for the value itself (up to 10,000,000 >>fluorescent units, cpm, ...). This takes care of raw data. Almost >>all munged data is much smaller and does a better job of reflecting >>actual measurement precision. > > > I'm not quite sure I understand. We would multiply by 100, but why > worry about how many positions the data takes up? We will just use > integer storage which gives us 2^32 or about 10^10 (10 positions) > without having to think. > > >>People will use the munged data much more often than the raw data, so >>after the first round of standardization and normalization (which >>might even become an automated pipeline as part of upload) it seems >>like most people will query for normalized data sets - that is, this >>is where indexing will make a difference. However, a lot of normalized >>data is stored as a ratio or a log ratio of some type, represented by >>decimal fractions and this is what the analysis tools that expect to >>use those values need- we can store as integers if we remember what >>transformation we use and convert back on display and export. I don't >>know the 'cost' of doing this process over and over versus the clost >>of storing the data as floats. > > > Yeah, I see. With ratio's that's a big difference. > > >>Up for discussion. Not that I'm one to ask for feedback. I have read >>through the discussion list mail and it is good to catch up on what >>you've all been doing - great work folks! > > > ;-) > > Thank you!! > > >>2. I do not understand all of the implications of using the MAGE >>OntologyEntry model. > > > Welcome to the club :-( > > >>I have a sense of how to use controlled vocabularies since I have >>put some minimal information in to the first version of genex. The >>annotation systems I have used include Ontology as one type of a >>controlled vocabulary - often not very complete (like GO), that can >>be added to using specific lists relevant to the subfield. >> >>The DatabaseEntry link is really useful for taxonomy, GO, and PIR >>among others and we need it regardless of the acceptance of full Mage >>compatibility. >> >>Is MO another way of saying MGED Ontology? > > > Yes, sorry. > > >>I only check it occasionally, but it doesn't seem to be maturing >>very fast. > > > I don't follow it at all, because I find it very confusing, but it is > the *only* ontology available for the MAGE data. So we get it, like it > or not. > > >>Are there tools that we could apply based on this that could be >>easily adapted if we had the same data model? > > > There was a tool just posted today to the Ontology mailing list called > 'Pedro' developed by Kevin Gardner in the UK. > > >>Are there examples to look at? > > > I believe so. 
All the info is available of the OWG site: > > mged.sf.net > > >>I have to say that if ArrayExpress is the working model I really >>dislike it. > > > AE is AE. They're just a DB that makes public expression data > available. They also have people who push on MAGE and MO a lot, so > they drive the standards decisions quite a bit. > > >>I have learned to navigate it well, but a lot of the representations >>are quite unintuitive to anyone who hasn't has quite a bit of >>exposure to MAGE and MAGE-ML. > > > I think that goes hand-in-hand with the complexity of MAGE. MAGE sucks > so any system built using a one-to-one reprentation of MAGE is going > to fight pretty hard not to suck, too. > > >>I would like to be sure that implementing the MAGE OntologyEntry >>part of the model in GeneX does not constrain the way controlled >>vocabularies can be used to annotate sequences > > > The opposite - it opens up how annotations can be abused. But after > thinking about this for a week, I realize that this is probably not a > big issue. People will be dependant on the tools we make available, so > we just won't enable any tool that is too confusing. > > >>if this is not a real problem then making the schema change sounds >>reasonable since it will let people interested in that aspect of the >>system have something to develop on. > > > I agree. > > >>3. I totally agree that now that the schema changes are nearly done >>and the installation has been cleaned up we should perfect the data >>loading steps next. > > > OK, this is what I've been working on with Harry - but I got seriously > stuck trying to figure out how to quickly implement the float => int > conversion. There is no simple hack, sadly. > > It seems that making it transparent is the easiest way: > 1) any data inserted into a 'float' column in the scratch table is > auto-multiplied by our precision factor (default 1000) and > rounded. > 2) any data queried from the table is auto-divided by the factor. > > That way data comes in as a float and leaves as a float. This is ugly > for those cases when users don't mind the data coming back as integers > - it means they get forced to do divides for a gazillion columns even > though they don't need/want it. > > >>I have to find Jason's document from a couple of months back and >>re-read it. > > > An updated version should be available off the documentation tab of > the workspace. If it's not there complain, and I will rectify it > immediately - upgrading the workspace is a simply upgrade that loses > no data. > > >>My first question today has been, where is the software expecting to >>find the Array Design, QT Dimension and actual data files, > > > If you go to the 'MBA Loader' (assuming that you are logged in as > someone with CURATOR privelege), it will guide you through all the > steps (in no particular order): > > * make an experiment to hold the data > * make an Array for the hybridization - which requires: > * an ArrayManufacture - with optional Biomaterial information about > the spotted stuff on the physical chip > * an ArrayDesign - the blueprint of the design with all the DB > annotations for the spotted stuff on the chip > * the FeatureExtractionSoftware used > > There are two MAGE-ML files required in all this: > 1) ArrayDesign - for the ArrayDesign > 2) QuantitationType Dimension - for the FE Software > > I have some (overly) simplistic tools for doing ArrayDesign's - and > have not yet tested whether the new schema changes enable us to > finally store a full Affy ArrayDesign. 
> > There is a nice Java tool for writing QT Dimensions that is part of > the MAGE Java package - I have been meaning to add this to the genex > subversion repository for quite some time, but we progress on the Java > API was stopped I forgot/lost interest in this. The tool is nice if > your doing small sized QT Dimensions, but I think anyone who wants to > make a QT Dimension of any large size (say 20 columns of data) is > going to be really frustrated having to do it point and click - > remember the curation tool? > > By far, the simplest way that Brandon and I discovered was to write > small Python or Perl scripts that use MAGEstk to write the MAGE-ML for > you based on a simple tab-delimited text file. > > >>and what types of names are expected of these files? > > > There is no longer a restriction on names. You simply move any files > you want to load into your user-defined upload directory (e.g. if > logged in as jweller: /usr/local/genex/uploads/jweller) > > >>And what is the non-Mason way to load data? > > > For each of the Mason GUI applications: 'MBA Load', 'QT Dimension Load', and > 'ArrayDesign Load' there is a perl script underneath that actually > does all the work. The GUI's just collect the user input and call the > perl script. So if someone wants to bulk load a whole bunch of stuff, > it can all be called on the command line. Or if someone wants to write > a Java GUI, or a Python GUI, etc they can all call the perl script to > automate the actual loading. > > >>Jason has provided a number these files for us over the years and so >>the first thing I want to try is using them to reload the data sets >>that worked before. > > > The test files are all there, but I admit sheepishly they are badly > scattered. I will take a few minutes tomorrow and document exactly > what they all are. > > >>I remember having problems keeping track of various types of Affy >>ArrayDesign files, because it was hard to tell what organism it was >>for and what subset of the data was meant. (AffyShortMAS5, or >>something like that). > > > Agreed. It was all done rather haphazardly at the last minute because > of some crisis need for data to present to someone, so the > organization suffered rather badly. > > >>So even though it sounds nitpicky maybe we should decide on a naming >>strategy and then a way to let folks know what they are picking >>(link to a description, I assume). Or has this happened in my >>absence? > > > I think it's an excellent idea! One thing that may help is if I add a > link to 'Sample Data' under the 'Documenation' section? > > >>The next phase will need to be to allow people to modify the Array >>Design and QT Dimension files themselves as required by new layouts, >>both for new arrays from commercial sources and the in-house types. >>Since Jason has the most experience with this, I would really like to >>hear from him about how he thinks this should be developed. > > > See the above about QT Dimensions. From experience, I believe that we > (the genex developers) can provide default QT Dimensions for all > common FE Software packages, including the templates in a > tab-delimited data file, and if users want to modify it, they can > simply change the template in Excel and re-run the QT Dimension output > program to produce MAGE-ML, and then re-run the FE Software loader. > > For ArrayDesign's we will have to study this a *lot* more than I > have. It is very easy to make simple ArrayDesign's - and that is all > my tools have every done. 
Making complex ArrayDesign's that include > Reporter's and CompositeSequence entries that actually include links > to GenBank or EMBL accession numbers requires some standardization on > our part. I need some examples of how people other than Affy encode > this information. > > >>It seems like having Hrishi and Durga follow through on the prototype >>directions really put the pressure on to find all kinds of bugs and >>problems that didn't happen when everyone was focused in different >>areas. > > > Yes - having people use the system is the best way to find bugs. > > >>Any objections to using the same model to get data loading >>working really well? > > > None at all. > > >>We have half a dozen different types of data already in hand (well >>on disc actually), although we don't have ArrayDesigns for all of >>them. > > > All of the data that Karen provided has ArrayDesign's. If there is > more data that I don't know about, then it probably doesn't have > ArrayDesign's. > > >>I'd like to know what you all think - let me know what information is >>needed, suggested priorities and activities and let's come up with the >>kind of task lists we used in March. > > > OK. More on this later. > > Thanks again for all the feedback!! > > Cheers, > jas. > > > ------------------------------------------------------- > This SF.net email is sponsored by: The Robotic Monkeys at ThinkGeek > For a limited time only, get FREE Ground shipping on all orders of $35 > or more. Hurry up and shop folks, this offer expires April 30th! > http://www.thinkgeek.com/freeshipping/?cpg=12297 > _______________________________________________ > Genex-dev mailing list > Gen...@li... > https://lists.sourceforge.net/lists/listinfo/genex-dev > -- Cheers, Harry Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>> |
From: <ja...@op...> - 2004-04-26 15:23:43
Harry Mangalam <hj...@ta...> writes:

> sorry for the delay in checking - on a complete re-install, this does
> still not work for me. I still get the same error with r1143.
> hjm
>
>>> error: Error during compilation of
>>> /var/www/genex/mason/nologin/autohandler:
>>> Global symbol "$genex_dir" requires explicit package name at
>>> /var/www/genex/mason/nologin/autohandler line 80.
                        ^^^^^^^^^

Doh!! (this is an internal message to myself to actually *read* the bug reports). I fixed login/autohandler but forgot nologin/autohandler.

Fixed in r1147.... no really...

Cheers,
jas.
From: <jw...@gm...> - 2004-04-26 14:38:17
Hi again Jason,

1. I have looked at something else called Pedro that has to do with mass spec data and protein annotation; I'll check out this one and get back to the list. It sounds as if adding the OntologyEntry entity to the genex schema is reasonable. Is this the last major change (in addition to the integer type changes) that you foresee for now? It would be good to test and document, with a lot of data loading, for a version 2 (real) release.

2. Karen has some Nimblegen data, which uses short oligos in a similar manner to Affymetrix; we can get you the encoding information for that. And I will see what I can get from Motorola, which provides 35-mers. I have quite a bit of C. elegans data, some from SMD and some from the AE sites. They provide very different levels of detail in their array design information, but I can certainly forward it, post it, or point to it, depending on preferences. As usual with GeneX, there are parallel needs - support for lab people making their own arrays, and support for analysts who want to combine local datasets with those of others in order to test or extend conclusions. Let me think about it a little and propose a set useful for this release that everyone can critique. I think that what we have now is actually pretty useful if we add Affy data at the probe level, but it is a bit scattered in my records (and mind) at the moment.

3. Interface design. I can forward or post a PDF of user documentation I provided on the ESTAP pipeline. It worked well for our users, so it might be a design guide for the genex interface - in this case I mean a lot of the metadata, especially contact information and experimental protocols. If posted, where should it go? It is not directly relevant to GeneX so you may not want me to subvert subversion. In the meantime I will e-mail it to you and Harry and hope I do not fill up your mailboxes.

Cheers,
Jennifer

PS I just got called into a meeting; it will follow in about an hour.
From: <ja...@op...> - 2004-04-26 14:14:23
Hey All,

I replaced Connect.pm.in with Connect.pm, so if you do an svn update you will get an error that Connect.pm is in the way. So if you do:

  rm Perl/Bio-Genex/Genex/Connect.pm
  svn update

it will work fine.

Sorry for the inconvenience,
jas.
From: <jw...@gm...> - 2004-04-26 14:04:52
Hi Jason,

1. Here I'm going on about virtual experiments: how does the MGED group propose to rationalize experiments performed on different platforms if units are not recorded? There is a big difference between counts per minute, fluorescent units, and molecules per um^2 (all of which are possible units). If all experiments were recorded as concentrations you would still need to record the units, or convert to a common unit and allow conversions (but this does not happen). The unit is associated with the physical bioassay but carried into the measured bioassay, and may be part of the derived bioassay unless ratios are used. I realize that a lot of folks are sloppy about this, but it is an important concept. Each measurement should have its units and precision associated with it; units, and most often precision, will be the same for all measurements from a chip, so storing them per spot measurement is ridiculous. It is possible that the precision might change if more than one sensitivity setting is used.

2. Non-advertising of conversion - I agree that this can be an 'internal' process. It is likely that some users will want to apply the correct precision limits, but I think that can be done by querying for the precision, applying a normalization protocol, and perhaps storing that data set in the working space if this is something that will be needed often. A couple of students have asked me about this: the values we get in output files most often come from software that takes the output of scanners and massages it - various sums and trimming of extreme values of the signal - to give a spot value. Most of these software programs do not take the measurement precision into account and just provide a value that has been truncated for convenience. So the decimal places of the measured value relate to the software, not the measurement device.

3. The only reason for checking the size of the data is to see if it is sensible - there are values that can't happen, and it seems reasonable to keep clearly flawed results from being stored. We also have to remember that there will be negative values in the raw data.

4. You mentioned that storing a transformation protocol for every conversion to an integer value would be messy. Is it less messy to store it as a system-wide protocol? Or how are you thinking it should be handled?

More in next note.

Cheers,
Jennifer
From: <ja...@op...> - 2004-04-25 18:53:51
Hi Jennifer, Excellent feedback, thanks!! jw...@gm... writes: > Representing the data as integers is OK as long as we store the > information about the transformation that was used and also maintain > information about the units of measurement and precision of > measurement. We have two choices: 1) Advertise that we do this, and return all floats as ints whenever a query is made. 2) Don't advertise that we do this, and return floats whenever a query is made. It is posible for us to make all of this totally transparently so that the users never know. From my brief investigation 2) actually seems easier to implement than 1) seems. A deeper look my prove me a fool. I haven't thought about units - is there such a thing for this type of data? I've never seen anyone use units (but my knowledge is agreeably small). If we advertise what we do, then we have to be clear about the precision. > I submit that the units and precision are both annotations of the > measurement that apply to all spots on a chip. They would apply to > all chips that belong to one 'physical' experiment, but not to > virtual experiments, and since we want to allow such entities, I > think the information should get stored on a per-chip basis. The agreement within MGED is that there really is no such thing as a physical-experiment or virtual-experiment - that the basic unit is a single hybridization measurement and all experiments are virtual - comprised of mixing and combining hybridizations. > Is the transformation to an integer an analytical process that gets > stored as a protocol associated with the value? It will be a > chip-wide protocol. <designer-hat-on> This does seem like a nice, clean idea. </designer-hat-on> <developer-hat-on> It could be possible to do it this way, but from looking at how the tables are constructed it looks pretty tough and messy to do it this way. It would take a major redesign of some basic stuff, and that always invites the potential of adding bugs that break too many critical pieces. </developer-hat-on> > I cannot determine if even two places after the decimal is realistic > (maybe for some radiation-based assays), but we could set that value > as the default. Yeah, hard to tell. > So we could have a default transformation of multiplication by 100 > and trim post-punct, and for the data size itslef a default integer > string of 8 positions for the value itself (up to 10,000,000 > fluorescent units, cpm, ...). This takes care of raw data. Almost > all munged data is much smaller and does a better job of reflecting > actual measurement precision. I'm not quite sure I understand. We would multiply by 100, but why worry about how many positions the data takes up? We will just use integer storage which gives us 2^32 or about 10^10 (10 positions) without having to think. > People will use the munged data much more often than the raw data, so > after the first round of standardization and normalization (which > might even become an automated pipeline as part of upload) it seems > like most people will query for normalized data sets - that is, this > is where indexing will make a difference. However, a lot of normalized > data is stored as a ratio or a log ratio of some type, represented by > decimal fractions and this is what the analysis tools that expect to > use those values need- we can store as integers if we remember what > transformation we use and convert back on display and export. 
I don't > know the 'cost' of doing this process over and over versus the clost > of storing the data as floats. Yeah, I see. With ratio's that's a big difference. > Up for discussion. Not that I'm one to ask for feedback. I have read > through the discussion list mail and it is good to catch up on what > you've all been doing - great work folks! ;-) Thank you!! > 2. I do not understand all of the implications of using the MAGE > OntologyEntry model. Welcome to the club :-( > I have a sense of how to use controlled vocabularies since I have > put some minimal information in to the first version of genex. The > annotation systems I have used include Ontology as one type of a > controlled vocabulary - often not very complete (like GO), that can > be added to using specific lists relevant to the subfield. > > The DatabaseEntry link is really useful for taxonomy, GO, and PIR > among others and we need it regardless of the acceptance of full Mage > compatibility. > > Is MO another way of saying MGED Ontology? Yes, sorry. > I only check it occasionally, but it doesn't seem to be maturing > very fast. I don't follow it at all, because I find it very confusing, but it is the *only* ontology available for the MAGE data. So we get it, like it or not. > Are there tools that we could apply based on this that could be > easily adapted if we had the same data model? There was a tool just posted today to the Ontology mailing list called 'Pedro' developed by Kevin Gardner in the UK. > Are there examples to look at? I believe so. All the info is available of the OWG site: mged.sf.net > I have to say that if ArrayExpress is the working model I really > dislike it. AE is AE. They're just a DB that makes public expression data available. They also have people who push on MAGE and MO a lot, so they drive the standards decisions quite a bit. > I have learned to navigate it well, but a lot of the representations > are quite unintuitive to anyone who hasn't has quite a bit of > exposure to MAGE and MAGE-ML. I think that goes hand-in-hand with the complexity of MAGE. MAGE sucks so any system built using a one-to-one reprentation of MAGE is going to fight pretty hard not to suck, too. > I would like to be sure that implementing the MAGE OntologyEntry > part of the model in GeneX does not constrain the way controlled > vocabularies can be used to annotate sequences The opposite - it opens up how annotations can be abused. But after thinking about this for a week, I realize that this is probably not a big issue. People will be dependant on the tools we make available, so we just won't enable any tool that is too confusing. > if this is not a real problem then making the schema change sounds > reasonable since it will let people interested in that aspect of the > system have something to develop on. I agree. > 3. I totally agree that now that the schema changes are nearly done > and the installation has been cleaned up we should perfect the data > loading steps next. OK, this is what I've been working on with Harry - but I got seriously stuck trying to figure out how to quickly implement the float => int conversion. There is no simple hack, sadly. It seems that making it transparent is the easiest way: 1) any data inserted into a 'float' column in the scratch table is auto-multiplied by our precision factor (default 1000) and rounded. 2) any data queried from the table is auto-divided by the factor. That way data comes in as a float and leaves as a float. 
This is ugly for those cases when users don't mind the data coming back as integers - it means they get forced to do divides for a gazillion columns even though they don't need/want it. > I have to find Jason's document from a couple of months back and > re-read it. An updated version should be available off the documentation tab of the workspace. If it's not there complain, and I will rectify it immediately - upgrading the workspace is a simply upgrade that loses no data. > My first question today has been, where is the software expecting to > find the Array Design, QT Dimension and actual data files, If you go to the 'MBA Loader' (assuming that you are logged in as someone with CURATOR privelege), it will guide you through all the steps (in no particular order): * make an experiment to hold the data * make an Array for the hybridization - which requires: * an ArrayManufacture - with optional Biomaterial information about the spotted stuff on the physical chip * an ArrayDesign - the blueprint of the design with all the DB annotations for the spotted stuff on the chip * the FeatureExtractionSoftware used There are two MAGE-ML files required in all this: 1) ArrayDesign - for the ArrayDesign 2) QuantitationType Dimension - for the FE Software I have some (overly) simplistic tools for doing ArrayDesign's - and have not yet tested whether the new schema changes enable us to finally store a full Affy ArrayDesign. There is a nice Java tool for writing QT Dimensions that is part of the MAGE Java package - I have been meaning to add this to the genex subversion repository for quite some time, but we progress on the Java API was stopped I forgot/lost interest in this. The tool is nice if your doing small sized QT Dimensions, but I think anyone who wants to make a QT Dimension of any large size (say 20 columns of data) is going to be really frustrated having to do it point and click - remember the curation tool? By far, the simplest way that Brandon and I discovered was to write small Python or Perl scripts that use MAGEstk to write the MAGE-ML for you based on a simple tab-delimited text file. > and what types of names are expected of these files? There is no longer a restriction on names. You simply move any files you want to load into your user-defined upload directory (e.g. if logged in as jweller: /usr/local/genex/uploads/jweller) > And what is the non-Mason way to load data? For each of the Mason GUI applications: 'MBA Load', 'QT Dimension Load', and 'ArrayDesign Load' there is a perl script underneath that actually does all the work. The GUI's just collect the user input and call the perl script. So if someone wants to bulk load a whole bunch of stuff, it can all be called on the command line. Or if someone wants to write a Java GUI, or a Python GUI, etc they can all call the perl script to automate the actual loading. > Jason has provided a number these files for us over the years and so > the first thing I want to try is using them to reload the data sets > that worked before. The test files are all there, but I admit sheepishly they are badly scattered. I will take a few minutes tomorrow and document exactly what they all are. > I remember having problems keeping track of various types of Affy > ArrayDesign files, because it was hard to tell what organism it was > for and what subset of the data was meant. (AffyShortMAS5, or > something like that). Agreed. 
It was all done rather haphazardly at the last minute because of some crisis need for data to present to someone, so the organization suffered rather badly. > So even though it sounds nitpicky maybe we should decide on a naming > strategy and then a way to let folks know what they are picking > (link to a description, I assume). Or has this happened in my > absence? I think it's an excellent idea! One thing that may help is if I add a link to 'Sample Data' under the 'Documenation' section? > The next phase will need to be to allow people to modify the Array > Design and QT Dimension files themselves as required by new layouts, > both for new arrays from commercial sources and the in-house types. > Since Jason has the most experience with this, I would really like to > hear from him about how he thinks this should be developed. See the above about QT Dimensions. From experience, I believe that we (the genex developers) can provide default QT Dimensions for all common FE Software packages, including the templates in a tab-delimited data file, and if users want to modify it, they can simply change the template in Excel and re-run the QT Dimension output program to produce MAGE-ML, and then re-run the FE Software loader. For ArrayDesign's we will have to study this a *lot* more than I have. It is very easy to make simple ArrayDesign's - and that is all my tools have every done. Making complex ArrayDesign's that include Reporter's and CompositeSequence entries that actually include links to GenBank or EMBL accession numbers requires some standardization on our part. I need some examples of how people other than Affy encode this information. > It seems like having Hrishi and Durga follow through on the prototype > directions really put the pressure on to find all kinds of bugs and > problems that didn't happen when everyone was focused in different > areas. Yes - having people use the system is the best way to find bugs. > Any objections to using the same model to get data loading > working really well? None at all. > We have half a dozen different types of data already in hand (well > on disc actually), although we don't have ArrayDesigns for all of > them. All of the data that Karen provided has ArrayDesign's. If there is more data that I don't know about, then it probably doesn't have ArrayDesign's. > I'd like to know what you all think - let me know what information is > needed, suggested priorities and activities and let's come up with the > kind of task lists we used in March. OK. More on this later. Thanks again for all the feedback!! Cheers, jas. |
From: <jw...@gm...> - 2004-04-24 17:54:32
1. I spoke to Harry about this last week but did not get the topic written to the list. Representing the data as integers is OK as long as we store the information about the transformation that was used and also maintain information about the units of measurement and precision of measurement. I submit that the units and precision are both annotations of the measurement that apply to all spots on a chip. They would apply to all chips that belong to one 'physical' experiment, but not to virtual experiments, and since we want to allow such entities, I think the information should get stored on a per-chip basis. Is the transformation to an integer an analytical process that gets stored as a protocol associated with the value? It will be a chip-wide protocol. Harry suggested using a factor of 1000 to convert from float to integer. I have done a quick survey of raw output from several types of playforms, and so far the highest values seem to be below 100,000 'units'. Some output shows up to four places after the decimal, but there is no indication that this reflects measurement precision. I cannot determine if even two places after the decimal is realistic (maybe for some radiation-based assays), but we could set that value as the default. So we could have a default transformation of multiplication by 100 and trim post-punct, and for the data size itslef a default integer string of 8 positions for the value itself (up to 10,000,000 fluorescent units, cpm, ...). This takes care of raw data. Almost all munged data is much smaller and does a better job of reflecting actual measurement precision. People will use the munged data much more often than the raw data, so after the first round of standardization and normalization (which might even become an automated pipeline as part of upload) it seems like most people will query for normalized data sets - that is, this is where indexing will make a difference. However, a lot of normalized data is stored as a ratio or a log ratio of some type, represented by decimal fractions and this is what the analysis tools that expect to use those values need- we can store as integers if we remember what transformation we use and convert back on display and export. I don't know the 'cost' of doing this process over and over versus the clost of storing the data as floats. Up for discussion. Not that I'm one to ask for feedback. I have read through the discussion list mail and it is good to catch up on what you've all been doing - great work folks! 2. I do not understand all of the implications of using the MAGE OntologyEntry model. I have a sense of how to use controlled vocabularies since I have put some minimal information in to the first version of genex. The annotation systems I have used include Ontology as one type of a controlled vocabulary - often not very complete (like GO), that can be added to using specific lists relevant to the subfield. The DatabaseEntry link is really useful for taxonomy, GO, and PIR among others and we need it regardless of the acceptance of full Mage compatibility. Is MO another way of saying MGED Ontology? I only check it occasionally, but it doesn't seem to be maturing very fast. Are there tools that we could apply based on this that could be easily adapted if we had the same data model? Are there examples to look at? I have to say that if ArrayExpress is the working model I really dislike it. 
I have learned to navigate it well, but a lot of the representations are quite unintuitive to anyone who hasn't has quite a bit of exposure to MAGE and MAGE-ML. I would like to be sure that implementing the MAGE OntologyEntry part of the model in GeneX does not constrain the way controlled vocabularies can be used to annotate sequences - if this is not a real problem then making the schema change sounds reasonable since it will let people interested in that aspect of the system have something to develop on. 3. I totally agree that now that the schema changes are nearly done and the installation has been cleaned up we should perfect the data loading steps next. I have to find Jason's document from a couple of months back and re-read it. My first question today has been, where is the software expecting to find the Array Design, QT Dimension and actual data files, and what types of names are expected of these files? And what is the non-Mason way to load data? Jason has provided a number these files for us over the years and so the first thing I want to try is using them to reload the data sets that worked before. I remember having problems keeping track of various types of Affy ArrayDesign files, because it was hard to tell what organism it was for and what subset of the data was meant. (AffyShortMAS5, or something like that). So even though it sounds nitpicky maybe we should decide on a naming strategy and then a way to let folks know what they are picking (link to a description, I assume). Or has this happened in my absence? The next phase will need to be to allow people to modify the Array Design and QT Dimension files themselves as required by new layouts, both for new arrays from commercial sources and the in-house types. Since Jason has the most experience with this, I would really like to hear from him about how he thinks this should be developed. It seems like having Hrishi and Durga follow through on the prototype directions really put the pressure on to find all kinds of bugs and problems that didn't happen when everyone was focused in different areas. Any objections to using the same model to get data loading working really well? We have half a dozen different types of data already in hand (well on disc actually), although we don't have ArrayDesigns for all of them. I'd like to know what you all think - let me know what information is needed, suggested priorities and activities and let's come up with the kind of task lists we used in March. Cheers, Jennifer |
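A rough back-of-the-envelope check of the headroom argument above, assuming signed 32-bit integer storage (maximum about 2.1 billion) and the ~100,000-unit raw maximum from the platform survey; the factors are the ones under discussion, not settled defaults.

  # Headroom check for scaled-integer storage of raw expression values.
  use strict;
  use warnings;

  my $int_max = 2**31 - 1;     # largest signed 32-bit integer, ~2.147e9
  my $raw_max = 100_000;       # highest raw value seen in the survey above

  for my $factor (100, 1000, 10_000) {
      my $largest_storable = $int_max / $factor;
      printf "factor %6d: largest storable raw value ~%.0f (%.0fx over %d)\n",
          $factor, $largest_storable, $largest_storable / $raw_max, $raw_max;
  }
  # factor 100   -> ~21.5 million storable, ~215x headroom
  # factor 1000  -> ~2.15 million storable, ~21x headroom
  # factor 10000 -> ~215 thousand storable, ~2x headroom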
From: Harry M. <hj...@ta...> - 2004-04-23 16:40:10
sorry for the delay in checking - on a complete re-install, this does still not work for me. I still get the same error with r1143. hjm Jason E. Stewart wrote: > Harry J Mangalam <hj...@ta...> writes: > > >> System error >> >>error: >>Error during compilation of /var/www/genex/mason/nologin/autohandler: >>Global symbol "$genex_dir" requires explicit package name >>at /var/www/genex/mason/nologin/autohandler line 80. >> >>context: ... >> >> >>76: </%method> >>77: <%method footer> >>78: </%method> >>79: <%method stylesheet> >>80: @import "/<% $genex_dir %>/css/workspace.css"; >>81: </%method> >>82: > > > Damn. I did a speed-checkin on this file (i.e. no testing - shame on > me). > > Fixed in r1143. > > jas. > > PS. the problem was a vaiable scoping issue. $genex_dir was set during > a Mason <%init> section and I was attempting to access in a mason > <%method> section - variables defined in <%init> are not visible in a > <%method>, so I had to move it into a <%shared> section, and viola, > problem solved. > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IBM Linux Tutorials > Free Linux tutorial presented by Daniel Robbins, President and CEO of > GenToo technologies. Learn everything from fundamentals to system > administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click > _______________________________________________ > Genex-dev mailing list > Gen...@li... > https://lists.sourceforge.net/lists/listinfo/genex-dev > -- Cheers, Harry Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>> |
From: <ja...@op...> - 2004-04-22 08:01:46
Harry J Mangalam <hj...@ta...> writes:

> System error
>
> error:
> Error during compilation of /var/www/genex/mason/nologin/autohandler:
> Global symbol "$genex_dir" requires explicit package name
> at /var/www/genex/mason/nologin/autohandler line 80.
>
> context: ...
>
> 76: </%method>
> 77: <%method footer>
> 78: </%method>
> 79: <%method stylesheet>
> 80: @import "/<% $genex_dir %>/css/workspace.css";
> 81: </%method>
> 82:

Damn. I did a speed-checkin on this file (i.e. no testing - shame on me).

Fixed in r1143.

jas.

PS. The problem was a variable scoping issue. $genex_dir was set during a Mason <%init> section and I was attempting to access it in a Mason <%method> section - variables defined in <%init> are not visible in a <%method>, so I had to move it into a <%shared> section, and voila, problem solved.
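A minimal Mason illustration of the scoping rule behind the fix: a lexical declared in <%shared> is visible both to the component body and to its <%method> blocks, whereas one declared only in <%init> is not. This is a contrived stand-in component, not the actual autohandler.

  <%shared>
  # Declared in <%shared>, so the body and every <%method> can see it.
  my $genex_dir = 'genex';
  </%shared>

  <%method stylesheet>
  @import "/<% $genex_dir %>/css/workspace.css";
  </%method>

  % # The main component body can also use it:
  Base directory is <% $genex_dir %>.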
From: SourceForge.net <no...@so...> - 2004-04-22 03:59:52
|
Bugs item #939774, was opened at 2004-04-21 21:59
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=939774&group_id=16453

Category: Mason GUI
Group: Genex-2
Status: Open
Resolution: None
Priority: 8
Submitted By: Jason E. Stewart (jason_e_stewart)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: array-edit accepts a blank name

Initial Comment:
the name attribute is required, but array-edit is accepting the empty string as a valid name

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=939774&group_id=16453
|
From: Harry J M. <hj...@ta...> - 2004-04-21 19:24:36
|
Jason,

After a completely new install, I unexpectedly get the following error at the bottom when going to the default page. It looks like recent Mason code is broken. The direct cause looks like an undeclared var, but where it came from I'm not sure. The line referred to is:

...
<%method footer>
</%method>
<%method stylesheet>
@import "/<% $genex_dir %>/css/workspace.css";   <----------------
</%method>
<END OF FILE>

System error

error:
Error during compilation of /var/www/genex/mason/nologin/autohandler:
Global symbol "$genex_dir" requires explicit package name
at /var/www/genex/mason/nologin/autohandler line 80.

context: ...

76: </%method>
77: <%method footer>
78: </%method>
79: <%method stylesheet>
80: @import "/<% $genex_dir %>/css/workspace.css";
81: </%method>
82:

code stack:
/usr/share/perl5/HTML/Mason/Interp.pm:317
/usr/share/perl5/HTML/Mason/Interp.pm:481
/usr/share/perl5/HTML/Mason/Component.pm:323
/usr/share/perl5/HTML/Mason/Request.pm:314

raw error

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
|
From: <ja...@op...> - 2004-04-21 17:37:40
|
Hey All,

To ensure getting a proper update, please do the following *before* running 'svn update':

  $ rm -rf Perl/scripts/
  $ rm -rf Mason
  $ svn update

and you will have a working set of source code.

I repeat: if you do not remove the Perl/scripts and Mason directories, you will get an svn error and you will have a broken source code distribution.

The reason for this is that I've removed all the old unnecessary .in files from the distribution, and so there are name clashes. You can either check out a new version (which is more work), or you can remove the files that clash as I suggested above.

My apologies for doing this in the middle of your working time - just as I was committing my work I lost my internet connection after I had committed half the files, and it only just came back. Again my apologies, but all should be stable at this point. Thanks for bearing with me.

Cheers,
jas.
|
From: <ja...@op...> - 2004-04-21 12:07:13
|
ja...@op... (Jason E. Stewart) writes:

> I'm checking in some unstable code for testing on genex2 - please
> don't commit or update for another 10 minutes or so until I mail the
> list again.

OK. I've moved all .pl.in files in the Perl/scripts directory to be just plain old .pl files. SVN will complain if you attempt to do an update because the old .pl files will be in the way, so do this:

  $ rm Perl/scripts/*
  $ svn update

and everything should be fine.

I've made all substitution parameters in the .pl files be runtime calls to Config.pm, and they all use /usr/bin/perl in the shebang line. I added code to Install that checks whether the install is using /usr/bin/perl and, if so, just copies all .pl files to their installation directory as-is. If a different interpreter is being used, the shebang line is modified during the copy.

This way all Perl scripts are usable as-is without needing to do a 'make substitute' - that removes a big headache for us developers (me??), who were far too often editing the .pl file instead of the .pl.in file and losing the changes on the next 'make substitute'.

My next target is all the .in files in the Mason directory.

Cheers,
jas.
|
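A rough sketch of the copy-with-shebang-rewrite step described above. This is not the actual Install code; the subroutine name and the $interpreter argument are illustrative only:

  use strict;
  use warnings;
  use File::Copy qw(copy);

  # Hypothetical helper: copy a .pl script into the install tree, rewriting
  # the shebang line only when the target perl is not /usr/bin/perl.
  sub install_script {
      my ($src, $dest, $interpreter) = @_;

      if ($interpreter eq '/usr/bin/perl') {
          copy($src, $dest) or die "copy $src -> $dest failed: $!";
      }
      else {
          open my $in,  '<', $src  or die "can't read $src: $!";
          open my $out, '>', $dest or die "can't write $dest: $!";
          while (my $line = <$in>) {
              # Replace the shebang on the first line only.
              $line = "#!$interpreter\n" if $. == 1 && $line =~ /^#!/;
              print {$out} $line;
          }
          close $in;
          close $out;
      }
      chmod 0755, $dest or die "chmod $dest failed: $!";
  }
|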
From: <ja...@op...> - 2004-04-21 11:50:32
|
Hey All,

I'm checking in some unstable code for testing on genex2 - please don't commit or update for another 10 minutes or so until I mail the list again.

Sorry,
jas.
|
From: <ja...@op...> - 2004-04-21 07:42:04
|
hde...@gm... writes:

> Also the reason i wrote it this way b'coz i have gone through the
> INSTALL .doc and checked the 'logic' and i guess my recipe/go and do
> this and then this recipe/steps helps,if i am not wrong Durga was
> working on this,so this might help her arrange/word the INSTALL.doc in
> a better way.

I agree with Harry, and I would like INSTALL to be a text-only document. I'm just old fashioned and grew up in a GNU world - every project has a README (text file) and an INSTALL (text file).

It may be possible to make an HTML tutorial for installing the system - a more verbose version of the INSTALL file - but I would like the main INSTALL file to be text only. HTML, text, and XML all version well in Subversion; with binary formats like .doc you're at the mercy of Microsoft's internal versioning system, which is not as flexible.

Cheers,
jas.
|
From: <ja...@op...> - 2004-04-20 17:35:19
|
Harry J Mangalam <hj...@ta...> writes:

> !! System Error: No such file or directory @ line: 263 (cp
> Mason/syshandler /var/www/genex/mason)
>
> !! System Error: No such file or directory @ line: 263 (cp
> Mason/comps/autohandler-top.mason /var/www/genex/mason/comps)
>
> !! System Error: No such file or directory @ line: 263 (cp
> Mason/comps/autohandler-bottom.mason /var/www/genex/mason/comps)

already corrected in r1119

> Were these removed and the above is a result of not removing them from the
> MANIFEST? Looks like it but I'll wait for your answer before removing them
> from MANIFEST.

Correct. Already fixed.

> ALSO!! the mason workspace URLs have been incompletely changed to reflect the
> latest changes in hierarchy. Login is fine, but clicking on the 'Home' link in
> the footer tries to go to the old link:
> http://bodi.tacgi.com/genex/mason/workspace/workspace.html
> not where it should:
> http://bodi.tacgi.com/genex/mason/login/workspace/workspace.html

Correct. Fixed in r1117.

> and in fact many of the URLs in the sidebar (all the ones in the Admin tab)
> link to old URLs that use 'workspace' to root off of:
> http://bodi.tacgi.com/workspace/edit/groupsec-edit.html?mode=edit
>
> and none of the sidebar links in the Docs tab work either - the docs are
> there, but the links don't point to them.
>
> Ditto the Annotations tab.

Correct. Fixed in r1117.

Run an 'svn update' and let me know if anything is still broken.

Cheers,
jas.

PS. If we configure SVN to email the genex-dev list whenever we do commits, we'll avoid this problem. Shall I look into that?
|
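One way the commit-notification idea in the PS above could look - a sketch only, not anything in the GeneX repository. The list address, sendmail path, and subject format are all assumptions; Subversion passes the repository path and revision number to the post-commit hook:

  #!/usr/bin/perl
  # post-commit hook sketch: mail a summary of each commit to the dev list.
  use strict;
  use warnings;

  my ($repos, $rev) = @ARGV;

  # svnlook ships with Subversion and reports details of a given revision.
  my $author  = `svnlook author "$repos" -r $rev`;
  my $log     = `svnlook log "$repos" -r $rev`;
  my $changed = `svnlook changed "$repos" -r $rev`;
  chomp $author;

  # Assumed list address and sendmail location - adjust for the real setup.
  open my $mail, '|-', '/usr/sbin/sendmail -t'
      or die "can't run sendmail: $!";
  print {$mail} "To: genex-dev\@lists.sourceforge.net\n";
  print {$mail} "Subject: [svn commit] r$rev by $author\n\n";
  print {$mail} "$log\nChanged paths:\n$changed";
  close $mail;
|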
From: Harry J M. <hj...@ta...> - 2004-04-20 17:15:56
|
Hi Jas,

As above, after running the uninstall and then installing from scratch, I got only these errors:

!! System Error: No such file or directory @ line: 263 (cp
Mason/syshandler /var/www/genex/mason)

!! System Error: No such file or directory @ line: 263 (cp
Mason/comps/autohandler-top.mason /var/www/genex/mason/comps)

!! System Error: No such file or directory @ line: 263 (cp
Mason/comps/autohandler-bottom.mason /var/www/genex/mason/comps)

Were these removed, and is the above a result of not removing them from the MANIFEST? Looks like it, but I'll wait for your answer before removing them from MANIFEST.

ALSO!! The mason workspace URLs have been incompletely changed to reflect the latest changes in hierarchy. Login is fine, but clicking on the 'Home' link in the footer tries to go to the old link:

http://bodi.tacgi.com/genex/mason/workspace/workspace.html

not where it should:

http://bodi.tacgi.com/genex/mason/login/workspace/workspace.html

and in fact many of the URLs in the sidebar (all the ones in the Admin tab) link to old URLs that use 'workspace' to root off of:

http://bodi.tacgi.com/workspace/edit/groupsec-edit.html?mode=edit

and none of the sidebar links in the Docs tab work either - the docs are there, but the links don't point to them.

Ditto the Annotations tab.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>>
|
From: <ja...@op...> - 2004-04-20 17:14:57
|
Harry Mangalam <hj...@ta...> writes:

> the crontab mention is a left-over from the analysis stuff that was
> cleaned out. I'll modify that to remove mention of it.

Hey Harry,

I think we should keep the genex_reaper.pl around. The Mason apps generate quite a bit of output in /tmp that needs to get cleaned up.

Cheers,
jas.
|
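A sketch of what such a reaper cron job might look like - this is not the actual genex_reaper.pl; the 'genex_*' temp-file prefix and the two-day cutoff are made up for illustration:

  #!/usr/bin/perl
  # Remove stale GeneX/Mason temp files from /tmp. Intended to run from cron.
  use strict;
  use warnings;

  my $max_age_days = 2;   # assumed cutoff

  for my $file (glob '/tmp/genex_*') {
      next unless -f $file;
      # -M gives the file's age in days relative to script start time.
      if (-M $file > $max_age_days) {
          unlink $file or warn "could not remove $file: $!";
      }
  }

A crontab entry along the lines of '0 4 * * * /usr/local/genex/bin/genex_reaper.pl' (path assumed) would then keep /tmp from filling up.
|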
From: <ja...@op...> - 2004-04-20 17:12:13
|
Hey Harry,

One function that didn't make it from the old Configure script into recon.pl yet is the ability to do an update using an old config. For us developers it isn't an issue - we just keep doing 'svn update' to pull down the new files on top of the old ones. But users will be downloading tarballs and running 'make configure' from the new directory.

Since we already have all the things we need in the Config.pm file, I'd rather not have to keep the GeneX.config file around from version to version. Since we have Config.pm, all we need to do is something like:

  eval "require Bio::Genex::Config";
  if ($@) {
    # no Config.pm => new install
  } else {
    # Config.pm already exists => update
  }

The only tricky bit happens if we add new options to the GeneX.config file. We can either have the users fill out a new GeneX.config file, which I'd rather not do, or we'll have to prompt the users for the info dynamically.

Cheers,
jas.
|
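To illustrate the "prompt dynamically" option above - a sketch only. The option names and the get_all_options accessor are hypothetical, not the real Bio::Genex::Config interface:

  use strict;
  use warnings;

  # Options this release of the configure step knows about (illustrative names).
  my @required_options = qw(genex_dir db_name db_user smtp_host);

  my %config;
  eval "require Bio::Genex::Config";
  unless ($@) {
      # Previous install found: start from the old values.
      # 'get_all_options' is a hypothetical accessor, not the real API.
      %config = Bio::Genex::Config->get_all_options();
  }

  # Prompt only for options that are new in this release (or for a fresh install).
  for my $opt (@required_options) {
      next if defined $config{$opt};
      print "Value for '$opt': ";
      chomp(my $answer = <STDIN>);
      $config{$opt} = $answer;
  }
|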
From: <ja...@op...> - 2004-04-20 17:02:17
|
Hey All,

I've just added the schema diagrams under the documentation tab. Do an 'svn update' and a 'make install_files' as root, then go to:

http://<host>/genex/mason/nologin/docs/docs.html

and check out the diagrams under the 'Implementation Documents' nav section.

Cheers,
jas.
|