From: <ja...@op...> - 2004-08-28 14:57:39
Hey All,

There are different levels of errors in Genex. We need a consistent means of handling each type, primarily so that users (and testers) are given as much information and as little confusion as possible.

Type-1 errors are situations under user control. An example is a required input that was not provided. This can be fixed by prompting the user to supply the missing field.

Type-2 errors are situations out of user control. An example is the bug Hrishi found, in which the GUI called a command line application and the command line application caused an error.

My problem is that sometimes things that look like Type-2 errors may actually be Type-1 errors. For example, if Hrishi selects the incorrect ArrayManufacture when loading an MBA file, the loader will find the incorrect number of spots. When Hrishi sees that error it could tell him enough to realize his mistake. So I have chosen to give *more* information rather than less.

However, Mason screen dumps are really only useful to programmers, not users. I have them activated to help debugging, and they can be turned off whenever we need to.

What should we do with Type-2 errors? Should I simply send the user to a page that says: "We're sorry, but you've encountered an error in the Genex system, your admin is already working on the problem", and then email an error report to the admin address (which they could forward to the genex-user list if they so choose)? Or should we give the full error to the user and let them decide?

Cheers,
jas.
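One way to picture the Type-1 / Type-2 split is a small routing function. This is only a sketch in Python, not Genex's Mason/Perl code; `ErrorKind`, `Response` and `handle_error` are hypothetical names, and the message wording is invented. It takes the middle road between the two options asked about above: the user always sees a short summary (which may be enough to spot a wrong ArrayManufacture choice), while the full detail goes only to the admin report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    USER = 1    # "Type-1": the user can fix it, e.g. a missing required field
    SYSTEM = 2  # "Type-2": outside user control, e.g. a failed helper program


@dataclass
class Response:
    user_message: str            # text shown in the browser
    admin_report: Optional[str]  # text to email to the admin, or None


def handle_error(kind: ErrorKind, summary: str, detail: str = "") -> Response:
    """Route an error (hypothetical helper, not part of Genex)."""
    if kind is ErrorKind.USER:
        # Type-1: prompt the user again, nothing goes to the admin.
        return Response(f"Please correct your input: {summary}", None)
    # Type-2: keep the short summary visible to the user, send the
    # programmer-level detail (e.g. a screen dump) only to the admin.
    user_message = (
        f"Genex hit an internal error: {summary}. Your admin has been notified."
    )
    admin_report = f"{summary}\n{detail}".strip()
    return Response(user_message, admin_report)
```

With this shape, turning the detailed dumps on or off for users becomes a single decision inside `handle_error` rather than something each page has to handle.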
From: <ja...@op...> - 2004-08-28 05:11:19
jw...@gm... writes:

>> jason stewart writes:
>>
>> The location itself is unique - two locations cannot exist at the same place. I recommend using feature id's just using location information. Also, you can spot multiple Reporter's on a single Feature if you have a multi-wavelength detection system.
>
> This is why I said that location plus reporter is unique rather than location or reporter alone. Each is 'unique' but neither alone conveys unambiguously what you are referring to. Or not?

For the Feature the location is the only piece which matters - it is unique and unambiguous. Adding the reporter name is fine if that's what people choose to do, there is nothing wrong with it, it is just extra information.

>> CompositeSequence's can also be compiled upon one another if desired - e.g. a set of Reporter's can represent the same exon, then the exons can represent the same gene, then different genes can represent the same gene-family, etc.
>
> OK, I didn't include this but I will add a sentence to this effect. How does one distinguish what level is being used? If nothing has yet been defined do you have a suggestion? As you know my own interests lie more in the deconstruction than the super-construction of data layers, so I don't even have in mind a use case that would inform a standard.

I don't have a use-case either. I think that adding an ontology 'type' field to CompositeSequence should work. The idea would be that some analysis tool would have to deconstruct the information. For example, one analysis run would simply investigate the expression level of individual exons in every gene to look for patterns, then a second analysis would look for expression at the gene level.

The steps would be:

* copy the data set to scratch table (Feature level)
* run data processing on data set (Reporter level)
* assign exons (CompositeSequence level 1) to each Reporter for the data set
* run exon analysis tool and generate report
* assign gene ids (CompositeSequence level 2) to data set
* run gene analysis tool and generate report

Having a type field would make this easier. MAGE doesn't support this, but it seems useful.

>>> You must be able to associate an Array Design with your experiment.
>
>> This is not pedantically true...
>>
>> Each MBA that is loaded must be associated with an 'Array' - an Array is a concrete implementation of an ArrayDesign - it is unique and must have some identifier (like a barcode), and it must have an ArrayManufacture => information about who made it, when it was made, and all of the optional Biomaterial references to each Feature (all the LIMS information).
>
> Now I'm confused: is it possible to load a data file into GeneX without an associated Array Design? It seems like part of the required process. If I load the indicated information and there is a QTD you seem to be saying that a data file can be loaded. Should I try this?

Yes an ArrayDesign must exist, but when you load an MBA you associate it with an Array - not an ArrayDesign. It is the Array that must be associated with the ArrayDesign (see below). The Design is just a blueprint, the Array is the actual instance of the blueprint used for that MBA. So each MBA is likely to have a different Array instance (unless an Array was re-hybridized), but each Array is likely to come from the same ArrayDesign.

The original data model missed this and because of this, manufacturing information had to be associated with the ArrayDesign - but what if there are multiple batches made? MAGE caught this omission and included the Array and ArrayManufacture concepts.

For any instance of an ArrayDesign you need to be able to track the LIMS information for creating each batch of Array's. But since this information is duplicated across all Array's from a given batch, MAGE includes the ArrayManufacture concept. All the LIMS info (which Biomaterial's were spotted on which Feature's) is included in the ArrayManufacture as is the ArrayDesign, and then each Array is linked to the ArrayManufacture (and thus to the ArrayDesign).

I included an email about this some time ago. It can easily be added to the Curator docs. I thought I added it to the docs/ dir but I can't find it. I'll check the email archives.

>> Once again, being pedantic - for loading data, the QT Dim is not required, only the FE Software is. It is the FE Software which *requires* the QT Dimension.
>>
>> The user just selects which FE Software the data is coming from - and the QT Dimension and DB data storage table are already determined from the FE Software.
>
> This means that the QTD has to be available for selection, but so does the FE Software - the user doesn't have to provide it but needs to be sure it is in the system if s/he wants to load data - is this correct?

That wasn't my plan originally. I thought that the FE Software name was sufficient. But the problem is that there may be multiple QTD's for any given FE SW. To accomplish this in the current data model, you must create two FE SW entries with different names, e.g. QuantArray-jstewart and QuantArray-jweller. Each one can have a different QTD. So when loading data the user needs to know which name is which.

I think the current model is broken - but not badly so. I think it would make things cleaner to separate the FE SW from the QTD, but the real issue is what will be easier for the user? Choosing a single combined FE SW + QTD pair, or having to choose two separate things?

Even if we did separate the two, we could still provide a single drop-down with the combined FE SW + QTD pairs, instead of two drop-down menus if that is easier for the users.

> It sounds like I need to refine the write-up a bit as far as work-flow goes, to clarify Curator-required activities on which a user's success will depend. I mentioned this in a previous e-mail anyway.

Yes, curator docs are *critical* for anyone to be able to use Genex.

Cheers,
jas.
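The exon-then-gene steps listed earlier in this message can be sketched as two passes of the same roll-up, each driven by a mapping that carries a 'type' label. This is a toy Python illustration, not Genex or MAGE code: the identifiers (`r1`, `exonA`, `gene1`), the dictionary layout, and the use of a plain mean as the summary are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Reporter-level values after data processing.
reporter_signal = {"r1": 10.0, "r2": 14.0, "r3": 30.0}

# Each mapping names its level through a 'type' field, which is the
# proposed addition to CompositeSequence (MAGE itself has no such field).
exon_map = {"type": "exon", "members": {"r1": "exonA", "r2": "exonA", "r3": "exonB"}}
gene_map = {"type": "gene", "members": {"exonA": "gene1", "exonB": "gene1"}}


def roll_up(values, mapping):
    """Summarise child values under their parent id (mean is an assumption)."""
    groups = defaultdict(list)
    for child, value in values.items():
        groups[mapping["members"][child]].append(value)
    return {parent: mean(vals) for parent, vals in groups.items()}


# CompositeSequence level 1: Reporters grouped into exons.
exon_signal = roll_up(reporter_signal, exon_map)
# CompositeSequence level 2: exons grouped into genes.
gene_signal = roll_up(exon_signal, gene_map)
```

Because every mapping states its own type, an analysis tool can pick the level it wants (exon or gene) without guessing how many layers were stacked.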
From: <ja...@op...> - 2004-08-28 05:10:43
Harry Mangalam <hj...@ta...> writes:

>> Harry Mangalam <hj...@ta...> writes:
>> > This is the default Debian config b/c it's required to install/update it automagically. However, it's a nasty security hole if you allow TCP access to the server (which many will want to do) and so I've added a strong warning to the INSTALL doc.
>> >
>> > I don't know if we want to add a warning about this to the end of the Install script or not...
>>
>> Hey Harry,
>>
>> I believe the new authentication system in 7.4 can distinguish between a local Unix socket connection and a TCP connection. You can set local connections as 'postgres' to be w/o password, but set all remote connections to require a password.
>>
>> Is this true or am I hallucinating?
>
> No, you're right (the local socket connection can be set up to authenticate differently than the tcp socket - you may also be hallucinating - I can't speak to your mental state right now),

Probably wise...

> but many users will want to enable tcp sockets to allow others to access the DB directly - so it's not a critical flaw, but one that I thought should be noted in the INSTALL doc (and now it is). I realized it after running a nessus scan against my system.

I would go one step further: our instructions for opening TCP sockets should include a line that refuses remote connections as postgres. Only someone with root on the machine should be allowed that type of connection.

Cheers,
jas.
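The suggestion above relies on PostgreSQL using the first matching record in pg_hba.conf. The Python sketch below only simulates that first-match logic so the intended rule order is easy to see; it is not a pg_hba.conf file and not part of the Genex installer. The three rules, and the method names `trust`, `reject` and `md5`, are assumptions that should be checked against the pg_hba.conf documentation for the installed PostgreSQL version.

```python
from ipaddress import ip_address, ip_network

# (connection type, user, client network or None for a Unix socket, method)
# Order matters: the first matching rule decides.
RULES = [
    ("local", "postgres", None, "trust"),        # local socket as postgres: no password
    ("host", "postgres", "0.0.0.0/0", "reject"),  # postgres over TCP: refused
    ("host", "all", "0.0.0.0/0", "md5"),          # everyone else over TCP: password
]


def auth_method(conn_type, user, client_ip=None):
    """Return the method of the first matching rule (simulation only)."""
    for rule_type, rule_user, network, method in RULES:
        if rule_type != conn_type:
            continue
        if rule_user not in ("all", user):
            continue
        if network is not None and ip_address(client_ip) not in ip_network(network):
            continue
        return method
    return "reject"  # no matching record means the connection is refused
```

If the `reject` rule were listed after the general `md5` rule, remote `postgres` logins would match the password rule first, so the refusing line has to come before it.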
From: Hrishikesh D. <hde...@gm...> - 2004-08-27 16:47:16
Hi All,

Here is the error which I get when I try to load a QTD file, logged in as hrishi (admin, user) and genex_test_curator.

The "values" for the fields on the form are:

public, public, quantarray, 3.0, 38, 5413, quantarray.xml

38 and 5413 are the rows from data file 1200-mangle.txt, 38 being the start and 5413 the end of data.

Are the "values" which I am using the problem? I think I have already filed this as a bug.

Thanks,
Hrishi

Genex Job Status Page

Your job is finished. The status is: ERROR
The error status is: Inappropriate ioctl for device
Output of the program:

Must specify --abbrev_name
USAGE: /usr/local/genex/bin/qtdim-insert.pl [OPTIONS] file_to_read
Options:
    --username=name : the DB username to login as
    --password=word : the DB password to login with
    --name=name : the name of the FeatureExtraction SW program
    --version=num : the version of the FeatureExtraction SW program
    --feature_identifier_string=string : the string that defines how which columns to use to create a feature identifier from the input
    --data_start_regex1=regex : regular expression indicating where to begin reading data
    --data_start_regex2=regex : in case regex1 is not sufficient
    --data_end_regex=regex : regex indicating when to stop reading
optional parameters:
    --qtdim_only : only insert the QuantitationType info, not anything else
    --no_db : only output the .xml and .pm files (no DB insert)
    --ro_groupname : the read only security group NAME
    --rw_groupname : the read write security group NAME
    --ro_group_id : the read only security group ID
    --rw_group_id : the read write security group ID
    --write_xml : leave the Table.xml file in the temp directory
    --dbname=name : the name of the DB to use
    --verbose=n : Print out lots of diagnostic info if 1 and more if 2
    --debug : does a rollback instead of a commit
    --help : print this message
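The usage text describes the data block by start and end regular expressions, whereas the form values above are row numbers (38 and 5413). As a rough illustration of regex-delimited reading, here is a Python sketch; it is not the logic of qtdim-insert.pl, which is Perl and whose internals are not shown in this thread. The marker strings `Begin Data` and `End Data`, the tab separator, and the choice to exclude the marker lines are all assumptions.

```python
import re


def extract_data_rows(lines, start_regex, end_regex):
    """Return the tab-split rows found between a start and an end marker."""
    start = re.compile(start_regex)
    end = re.compile(end_regex)
    rows = []
    in_data = False
    for line in lines:
        if not in_data:
            if start.search(line):
                in_data = True  # the marker line itself is not treated as data
            continue
        if end.search(line):
            break
        rows.append(line.rstrip("\n").split("\t"))
    return rows


# Invented miniature file: a header, a data block, and a trailer.
sample = [
    "Title\tDemo\n",
    "Begin Data\n",
    "1\t1\t523\n",
    "1\t2\t611\n",
    "End Data\n",
    "footer\n",
]
data = extract_data_rows(sample, r"^Begin Data", r"^End Data")
```

Whether the web form is expected to translate row numbers into such regexes, or to pass them through some other option, cannot be told from the output above.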
From: <ja...@op...> - 2004-08-27 05:32:05
Harry Mangalam <hj...@ta...> writes:

> OK - I added the nec bits to redirect from
> http://<your_host>/genex
> to
> http://<your_host>/genex/Mason/nologin/docs/docs.html
>
> It gets picked up and installs ok on my system. Let me know if this is NOT a good place to redirect users to. (should we redirect to the login screen? - it can be gotten to from the above)

I think the docs page is the best one. It doesn't require a login - so the users can read any docs they want.

Cheers,
jas.

> On Wednesday 25 August 2004 8:17 pm, you wrote:
>> Harry Mangalam <hj...@ta...> writes:
>> > The URL:
>> > http://matrix.binf.gmu.edu/genex/mason/login/workspace/workspace.html
>> >
>> > is the one that is supposed to be used. Where did you see/get this one:
>> >
>> > http://matrix.binf.gmu.edu/genex/mason/workspace/workspace.html
>> >
>> > This is the reason I posted that maybe we should put a referer link at
>> >
>> > http://host/genex/index.html
>> >
>> > Otherwise it can be difficult to find the right starting URL.
>>
>> Agreed. At the moment I don't install *any* pure HTML outside the Mason hierarchy, and I think Harry's suggestion is a good one. So perhaps if someone is willing to create an el-simplo page with a short description and the starting link that would be good.
>>
>> Cheers,
>> jas.
>
> --
> Cheers, Harry
> Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
> <<plain text preferred>>
From: <jw...@gm...> - 2004-08-26 17:36:57
Hi Jason,

Thanks for reading through the documentation, I'm glad it is mostly OK. I extracted the parts that still seem to need discussion below.

******************************

The location itself is unique - two locations cannot exist at the same place. I recommend using feature id's just using location information. Also, you can spot multiple Reporter's on a single Feature if you have a multi-wavelength detection system.

> This is why I said that location plus reporter is unique rather than location or reporter alone. Each is 'unique' but neither alone conveys unambiguously what you are referring to. Or not?

CompositeSequence's can also be compiled upon one another if desired - e.g. a set of Reporter's can represent the same exon, then the exons can represent the same gene, then different genes can represent the same gene-family, etc.

> OK, I didn't include this but I will add a sentence to this effect. How does one distinguish what level is being used? If nothing has yet been defined do you have a suggestion? As you know my own interests lie more in the deconstruction than the super-construction of data layers, so I don't even have in mind a use case that would inform a standard.

>> You must be able to associate an Array Design with your experiment.

This is not pedantically true...

Each MBA that is loaded must be associated with an 'Array' - an Array is a concrete implementation of an ArrayDesign - it is unique and must have some identifier (like a barcode), and it must have an ArrayManufacture => information about who made it, when it was made, and all of the optional Biomaterial references to each Feature (all the LIMS information).

> Now I'm confused: is it possible to load a data file into GeneX without an associated Array Design? It seems like part of the required process. If I load the indicated information and there is a QTD you seem to be saying that a data file can be loaded. Should I try this?

Once again, being pedantic - for loading data, the QT Dim is not required, only the FE Software is. It is the FE Software which *requires* the QT Dimension. The user just selects which FE Software the data is coming from - and the QT Dimension and DB data storage table are already determined from the FE Software.

> This means that the QTD has to be available for selection, but so does the FE Software - the user doesn't have to provide it but needs to be sure it is in the system if s/he wants to load data - is this correct? It sounds like I need to refine the write-up a bit as far as work-flow goes, to clarify Curator-required activities on which a user's success will depend. I mentioned this in a previous e-mail anyway.

Just to be super-clear - the primary data values are referred to as MeasuredSignal and any data derived from manipulating the primary numbers is a DerivedSignal.

> Good point, I am trying to gently introduce the most important parts of the vocabulary, I will add this.

Once again, for each MBA file loaded there must be an exact 'Array' and an FE Software file. That means the person who does the hybridization must notate which Biomaterial (sample) was hybridized to which Array. This means that the GUI data loader can only load one MBA at a time - this is cumbersome, but required. If users can stick to some sort of naming convention for their data, we can easily write a batch loader that can automatically associate an MBA data file with the correct Array, but they *must* stick to the convention.

> I will try to write up an explanation that includes this: to be honest I am a little confused myself still, so working this out is important. I think it is the work flow that I am not quite clear on.

Thanks for the comments, I'll revise and send out another version tonight.

Cheers,
Jennifer
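The batch-loader idea mentioned above depends entirely on the file naming convention. As a hedged Python sketch of what such a convention could look like: the pattern `<array-barcode>_<fe-software>.txt`, the function name `match_file_to_array`, and the sample barcode are all invented here, since no convention has been agreed on in this thread.

```python
import re

# Hypothetical convention: <array-barcode>_<fe-software>.txt,
# for example "A00123_quantarray.txt".
FILENAME_PATTERN = re.compile(
    r"^(?P<barcode>[A-Za-z0-9]+)_(?P<fe_software>[A-Za-z0-9.-]+)\.txt$"
)


def match_file_to_array(filename, known_barcodes):
    """Map an MBA data file name to (Array barcode, FE Software name)."""
    m = FILENAME_PATTERN.match(filename)
    if m is None:
        # Files that break the convention are rejected rather than guessed at.
        raise ValueError(f"{filename!r} does not follow the naming convention")
    barcode = m.group("barcode")
    if barcode not in known_barcodes:
        raise LookupError(f"no Array with barcode {barcode!r} is in the database")
    return barcode, m.group("fe_software")
```

Refusing non-conforming names outright reflects the point that users *must* stick to the convention; a loader that guesses would risk attaching data to the wrong Array.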
From: Harry M. <hj...@ta...> - 2004-08-26 17:28:32
On Wednesday 25 August 2004 8:14 pm, Jason E. Stewart wrote:

> Harry, did a 'make uninstall' get run before the latest install? It seems like there are old pages left. If you try to run the old URL you should just get a generic 'page not found error'. I have a handler that grabs bad pages under genex/mason/login/* but perhaps I should add one for just genex/mason/*.

No, I don't think so - but according to the installation script, all the previous dirs should have been rm'ed so effectively it should have replaced everything, no?

I'll make a note to run a make uninstall before the next installation.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: Harry M. <hj...@ta...> - 2004-08-26 17:15:05
No, you're right (the local socket connection can be set up to authenticate differently than the tcp socket - you may also be hallucinating - I can't speak to your mental state right now), but many users will want to enable tcp sockets to allow others to access the DB directly - so it's not a critical flaw, but one that I thought should be noted in the INSTALL doc (and now it is). I realized it after running a nessus scan against my system.

hjm

On Thursday 26 August 2004 1:56 am, Jason E. Stewart wrote:

> Harry Mangalam <hj...@ta...> writes:
> > This is the default Debian config b/c it's required to install/update it automagically. However, it's a nasty security hole if you allow TCP access to the server (which many will want to do) and so I've added a strong warning to the INSTALL doc.
> >
> > I don't know if we want to add a warning about this to the end of the Install script or not...
>
> Hey Harry,
>
> I believe the new authentication system in 7.4 can distinguish between a local Unix socket connection and a TCP connection. You can set local connections as 'postgres' to be w/o password, but set all remote connections to require a password.
>
> Is this true or am I hallucinating?
>
> Cheers,
> jas.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-08-26 09:05:33
Harry Mangalam <hj...@ta...> writes:

> This is the default Debian config b/c it's required to install/update it automagically. However, it's a nasty security hole if you allow TCP access to the server (which many will want to do) and so I've added a strong warning to the INSTALL doc.
>
> I don't know if we want to add a warning about this to the end of the Install script or not...

Hey Harry,

I believe the new authentication system in 7.4 can distinguish between a local Unix socket connection and a TCP connection. You can set local connections as 'postgres' to be w/o password, but set all remote connections to require a password.

Is this true or am I hallucinating?

Cheers,
jas.
From: <ja...@op...> - 2004-08-26 04:04:13
Hey Jennifer,

jw...@gm... writes:

> Test: Logout and browser window
> Error: closing browser window does not logout user.
> Actions: Pull up login screen and enter name and password
> Close window without logging out
> Open a new browser window.
> Result: you are given an active window showing you being logged in. It requires active logging out.

All session management is done via cookies - this is just like hotmail or any other WWW based application with authentication. In order to have killing the browser window log you out of Genex, I suspect I would also have to make other events do the same thing - e.g. hitting the reload button or hitting the back button.

Since that is undesirable, I think that we should just let users know if they are worried about security they should log themselves out.

If Harry or anyone else knows how to implement this in a better way, I'd love to know, but currently my knowledge of CGI apps is limited.

Cheers,
jas.
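One commonly used approach to the question above is to leave the expiry off the cookie. A cookie sent without an `Expires` or `Max-Age` attribute is a session cookie, which browsers have traditionally dropped when the browser exits, while reload and back keep working. The Python sketch below only shows the difference in the `Set-Cookie` header; it is not Genex's Mason code, the cookie name `genex_session` and path `/genex` are invented, and the thread does not say how Genex currently sets its cookies. Browser behaviour also varies (closing one window is not always the same as exiting the browser), so this is a possibility to test, not a guaranteed fix.

```python
from http.cookies import SimpleCookie


def session_cookie(session_id):
    """Header for a cookie with no expiry: kept only for the browser session."""
    cookie = SimpleCookie()
    cookie["genex_session"] = session_id  # hypothetical cookie name
    cookie["genex_session"]["path"] = "/genex"
    return cookie.output()


def persistent_cookie(session_id, max_age_seconds):
    """Header for a cookie that survives browser restarts until it expires."""
    cookie = SimpleCookie()
    cookie["genex_session"] = session_id
    cookie["genex_session"]["path"] = "/genex"
    cookie["genex_session"]["max-age"] = max_age_seconds
    return cookie.output()
```

A server-side idle timeout on the session would be a complementary safeguard for users who never close their browser.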
From: <ja...@op...> - 2004-08-26 03:19:38
Harry Mangalam <hj...@ta...> writes:

> The URL:
> http://matrix.binf.gmu.edu/genex/mason/login/workspace/workspace.html
>
> is the one that is supposed to be used. Where did you see/get this one:
>
> http://matrix.binf.gmu.edu/genex/mason/workspace/workspace.html
>
> This is the reason I posted that maybe we should put a referer link at
>
> http://host/genex/index.html
>
> Otherwise it can be difficult to find the right starting URL.

Agreed. At the moment I don't install *any* pure HTML outside the Mason hierarchy, and I think Harry's suggestion is a good one. So perhaps if someone is willing to create an el-simplo page with a short description and the starting link that would be good.

Cheers,
jas.
From: <ja...@op...> - 2004-08-26 03:15:58
Hey Jennifer,

jw...@gm... writes:

> I'm not sure where I got the original link, both Hrishi and I had it bookmarked.

That was the *old* URL - but then I needed to split up the applications between those which required authentication (all the apps that live under */login/*) and those which don't (all the apps that live under */nologin/*). So the old URL is useless.

> It is certainly something that seems reasonable to try if you end up at the matrix mason index page.

Harry, did a 'make uninstall' get run before the latest install? It seems like there are old pages left. If you try to run the old URL you should just get a generic 'page not found error'. I have a handler that grabs bad pages under genex/mason/login/* but perhaps I should add one for just genex/mason/*.

> The other thing you have to be aware of is that even though you get the error message screen, you are in fact logged in, and since closing the browser window does not log you off, you will stay logged in.
>
> Anyway since the screen looks right, it isn't necessarily clear to a casual user what has gone wrong.
>
> Maybe this isn't a 'bug' but it does seem user-relevant.

Logging out wouldn't really help anything if the URL is bookmarked wrong. There needs to be a simple URL to the main install genex page that users can use. Also, we aren't likely to be changing URL's again, so that confusion shouldn't happen again. And it is a good warning to be aware of in the future.

Cheers,
jas.
From: <ja...@op...> - 2004-08-26 03:15:48
Hey Jennifer,

jw...@gm... writes:

> Test: Logout and browser window
> Error: closing browser window does not logout user.
> Actions: Pull up login screen and enter name and password
> Close window without logging out
> Open a new browser window.
> Result: you are given an active window showing you being logged in. It requires active logging out.

All session management is done via cookies - this is just like hotmail or any other WWW based application with authentication. In order to have killing the browser window log you out of Genex, I suspect I would also have to make other events do the same thing - e.g. hitting the reload button or hitting the back button.

Since that is undesirable, I think that we should just let users know if they are worried about security they should log themselves out.

If Harry or anyone else knows how to implement this in a better way, I'd love to know, but currently my knowledge of CGI apps is limited.

Cheers,
jas.
From: <ja...@op...> - 2004-08-26 03:14:29
Hey Jennifer,

jw...@gm... writes:

> 1. Array Design
> 2. QT Dimension
> 3. Data file
> * What is the name of each
> * What bits do I want to keep
> * Where are the bits location and
> * what do they mean I want to store in this particular data file.
> Why?
>
> When you run a microarray experiment, you are making use of the microarray design output of a synthesis/software system like GenePix, and the signal detection/data extraction design of a detector/software system like Quantarray. Some providers integrate both, like Affymetrix. The microarray design coordinates information about the physical entity at a location on a chip (e.g. the DNA on a spot) and what it is supposed to be interrogating (e.g. the gene). The signal detection system collects all the output, usually as an image, and the associated data extraction software presents an intensity value for a particular wavelength of light at a particular location (the spot value) and may also automatically compute values such as ratios. The data extraction software usually replicates some of the information from the array design such as the probe name and a gene identifier. To interpret the data from your microarray experiment you need to associate both of these types of information with the data set.
>
> 1. The Array Design should be provided by the array manufacturer. Affymetrix provides the .cdf and .gif files, GenePix provides a text file whose contents will depend on the designer of the custom array. There are three levels of information that may possibly be provided in such a file. The most basic and absolutely required is the Feature level, which gives the spot location as a column each for the s and y coordinates, or sometimes as a subarray with its own row

's' => 'x' (typo)

> and column identifier which requires another two columns, and the name assigned to the attached DNA.
> Since two things cannot exist at the same place at the same time, the combination of the location and the name will be unique, even if the name itself is not.

The location itself is unique - two locations cannot exist at the same place. I recommend using feature id's just using location information. Also, you can spot multiple Reporter's on a single Feature if you have a multi-wavelength detection system.

> A second level of information is called the Reporter although it is more properly the BioSequence information, the nucleotide sequence of the probe resident at a location. For oligonucleotide-based and PCR-product based probes this sequence should be complete and unambiguous;

in many cases all we will get are unambiguous probe names - this is sufficient - although the complete sequence is best.

> in some cases on cDNA arrays the precise sequence of an insert has not been determined. The third level of information is about the Composite Sequence, referring to the expected target (mRNA or gene).

CompositeSequence's can also be compiled upon one another if desired - e.g. a set of Reporter's can represent the same exon, then the exons can represent the same gene, then different genes can represent the same gene-family, etc. Most commonly you'll only have one level.

> Some platforms, such as the Affymetrix GeneChip, use a set of oligonucleotides to measure the presence of a target; this set comprises a composite sequence and the user may extract the intensity for each member of the set or a single weighted value across the set.

Correct. Each PM/MM pair is a single Reporter, and the group of Reporter's comprise the CompositeSequence for the gene.
> There is increasing uncertainty at each level: something with a particular name has been attached at a given location unless the chemistry has failed altogether; the attached entity has particular physical/chemical characteristics unless reagents have been contaminated or the synthesis device was misaligned; the attached entity provides information about a particular gene if no competing sequences exist in the target pool. Only rigorous quality control and empirical validation allow the investigator to make an interpretation at the third level as if it were a direct measurement at the first level.
>
> You must be able to associate an Array Design with your experiment.

This is not pedantically true...

Each MBA that is loaded must be associated with an 'Array' - an Array is a concrete implementation of an ArrayDesign - it is unique and must have some identifier (like a barcode), and it must have an ArrayManufacture => information about who made it, when it was made, and all of the optional Biomaterial references to each Feature (all the LIMS information).

> Most users will rely on a programmer or the Curator to properly extract a usable Array Design and insert it into the database. Instructions are given in the GeneX Documentation, Array Design Creation HOWTO and sample programs and files are provided for guidance. This need only be done once for each build of an array or chip. When a separate array design file is not available from the manufacturer it is often possible to extract the minimally needed information from the output of the signal detection software (e.g. the Quantarray file).

This feels like the appropriate amount of information to give normal users - just enough so they get the big picture, but not too much to freak them out.

> An experiment involves hybridizing labeled target to a microarray of some particular design.
> After washing the microarray is inserted into a device that can excite the reporter (if necessary) and measure, at the required resolution, the signal output of whatever reporter molecule was attached to the target (usually 32P or one or more fluorescent dyes). The instrument collects the output signal as an image. While the image can be stored it is not useful data in this form, so software is used to extract the signal intensity at each spot and output it as a unique row in an output file. We refer to this software program as the Feature Extraction Software. It is possible to store the original image of the entire array, unextracted (usually as a 16-but .tif file, but Affymetrix also outputs the proprietary .cdf

'but' => 'bit' (typo)

> file). Thus it is possible to re-extract the signal from each spot using some other image analysis program or parameter settings than was part of the original platform. For this reason the GeneX form requires that the user provide information about the version and name of the feature extraction software used to produce a particular data set.
>
> 2. The Quantitation Type Dimension represents the categories of information the investigator wants to retain from the extracted data file.

Once again, being pedantic - for loading data, the QT Dim is not required, only the FE Software is. It is the FE Software which *requires* the QT Dimension.

The user just selects which FE Software the data is coming from - and the QT Dimension and DB data storage table are already determined from the FE Software.

> The output of a scanner, or reader, is the product of some software package, such as QuantArray 3.0. There are usually two major parts to these data files. At the top, or 'header' section, you usually see some rows of general information, perhaps an experiment title and date and other high-level data. There might also be information about instrument settings.
Farther down will be the data matrix, > which can have a row for every probe or CompositeProbe and from a > few to very many columns. Typically a given row of data for a spot > has columns that give spot identifiers like the x and y position, > the name for the probe and perhaps associated information like a > LocusLink identifier or GenBank accession number, and then numbers > indicating signal intensity values followed by various manipulations > of those values, which we refer to collectively as derived data. Just to be super-clear - the primary data values are referred to as MeasuredSignal and any data derived from manipulating the primary numbers is a DerivedSignal. In order to make QT Dimensions maximally useful the MAGE spec includes a restriction that you can only define one MeasuredSignal for each combination of background/foreground and channel. So you can have a cy3 foreground MeasuredSignal and a cy3 background MeasuredSignal, but you cannot define *two* MeasuredSignals for cy3 background. This is to enable automatic software processing. If you have two columns that you want to store for cy3 background, you have to choose one of them to represent the MeasuredSignal, and the rest must be defined as one of the other QuantitationTypes, e.g. Ratio, or SpecializedQuantitationType - meaning that generic software can't automatically know its purpose, but it is still included in the DB. > The intensities are generally given for each wavelength or channel > measured (e.g. for the Cy3 and Cy5 emission optima) for both the > foreground (or spot) and the background. Derived values are > potentially limitless but nearly always include values such as > background subtracted signal, and signal ratios as well as > statistical measures in profusion. 
Not only will each system have a > different way of doing the calculations and presenting the results, > but many are user-configurable, so that exactly what is calculated > and the order in which information is presented will differ between > users or over time for one user of the same system. The GeneX > database has to be told which of these values you want to store and > the meaning of those values. Thus it is necessary that you sit down > before you do the experiment and familiarize yourself with the > output, and make sure that you know what parts of it you wish to > store and what they mean (i.e. how they are derived). It is possible > to request more output data than you intend to store, of course, but > it is necessary that you describe for the GeneX database exactly > what parts you want to store. This information is what goes into the > Quantitation Type Dimension file. This will be produced by a > programmer or curator to your specification. The instructions are > given in the GeneX Documentation Tab as QT Dimension Creation HOWTO. > > Why not have a completely simple and generic QTD? After all, you would > expect to get a probe_id, Channel 1 signal, Channel 1 background from > any system that exists, with the logical additional categories for > multi-channel systems. It is true that background-subtracted signal > and measures such as variance can be reproduced if the original data > exists, but it is often more convenient to retain values that have > already been computed. Because many derived values are unique to > particular system, the QT Dimension will in fact most often be linked > to a Feature Extraction Software package and a particular user. 
For > all columns you must specify the type and format of data (the location > as x,y for example) and for derived data types a definition that > defines relationships, including those among other columns: so the > `signal ratio' is more precisely defined as [Channel 1 signal - > Channel 1 background] divided by [Channel 2 signal - Channel 2 > background]. Without this level of detail a new user (or the same user > six months later) will not be clear as to whether or not the > background was subtracted when the ratio was made and which channel > was used as the numerator and which the denominator. Since producing > these files require advance planning and the assistance of the Curator > it is recommended that a single useful format be determined and used > consistently by a user or lab for one platform. Very nice. > 3. Data file output organization can change fairly quickly, sometimes > with every software upgrade. This means that the information included > in the header may take fewer lines or more lines this month than > last. It also means that for programs that are not user-configurable > the position of particular columns (Ch1 signal and Ch1 background, for > instance) may alter. Thus, the investigator must inspect the spot > intensity output file and at data upload describe for the GeneX system > at what row the actual data starts and where it ends, as well as the > relative position of the columns described by the QTD file. > > Preparing to upload data: Ahead of time arrange with the Curator to > provide an Array Design file and a Quantitation Type Dimension file to > the GeneX system. Once again, for each MBA file loaded there must be an exact 'Array' and an FE Software file. That means the person who does the hybridization must notate which Biomaterial (sample) was hybridized to which Array. This means that the GUI data loader can only load one MBA at a time - this is cumbersome, but required. 
If users can stick to some sort of naming convention for their data, we can easily write a batch loader that can automatically associate an MBA data file with the correct Array, but they *must* stick to the convention. > Know what version of feature extraction software you will use and > any important parameter settings. When you are ready to upload data > check the data file and know what columns represent the values the > QTD expects to store. Know at what row in the file the actual data > begins, and where it ends. From here you can follow the simple > instructions in the Data Loading HOWTO (see Documentation Tab). Thanks Jennifer. I think the level of detail hits the sweet spot - not too much and not too little. Cheers, jas. |
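A minimal sketch of the `signal ratio' definition quoted above, in Python. The function name and argument names are hypothetical and are not part of GeneX; the sketch only spells out which channel is the numerator and that background is subtracted before dividing.

```python
def signal_ratio(ch1_signal, ch1_background, ch2_signal, ch2_background):
    """Background-subtracted ratio, channel 1 over channel 2.

    Mirrors the definition in the draft documentation:
    [Ch1 signal - Ch1 background] / [Ch2 signal - Ch2 background].
    """
    numerator = ch1_signal - ch1_background
    denominator = ch2_signal - ch2_background
    if denominator == 0:
        # The draft does not say how to treat this case; refusing is an assumption.
        raise ZeroDivisionError("channel 2 background-subtracted signal is zero")
    return numerator / denominator


example = signal_ratio(1200.0, 200.0, 700.0, 200.0)
```

Writing the definition down this way is the point of the QT Dimension description: a reader six months later can see that background was subtracted and which channel was on top.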
From: <ja...@op...> - 2004-08-26 03:13:33
|
Hey Jennifer, jw...@gm... writes: > I have not had any comments back about the draft of user documentation > for uploading data - probably no other users are out there at the > moment. A couple of things do come up that relate to how we handle > associated files, and this will impact developers and administrators > so I have pulled out that part and summarized my suggestions below. > > I would like to suggest a naming convention for array designs: > clearly, when something is commercial it should have the name of the > company and array including a version number (default is 1), > e.g. AffyHu95Av1. I suggest for custom arrays that we use the name of > the lab PI, four letters for the species, and a date that is > year+month (e.g. CushmanMecr-200209). This is good. Perhaps we can keep the names synchronized, i.e. AffyHu95A-1 instead of AffyHu95Av1. One issue we have not discussed is the use of MAGE identifiers. MAGE requires that any object in a MAGE-ML file that can be referenced by another part of a MAGE-ML file (e.g. an ArrayDesign or a Biomaterial) must have a *unique* identifier so that a single large MAGE-ML file can be split across multiple, smaller files. MAGE does not define *how* to construct these identifiers, but it does suggest using the LSID spec from the I3C. > Is my interpretation of the QT Dimension correct, that is, while it > may relate to the feature extraction software it is not actually > dependent on it? Correct. FE Software is just one type of entity in Genex that has a QT Dim associated with it. QT Dims are used whenever a data matrix is to be stored in Genex. The current configuration of Genex is a bit broken with respect to FE Software and QT Dimension. They should be independent - a single FE software could be used by multiple users and each user might want to save a different set of columns. That would mean when loading data a user must choose both the FE Software *and* the QT Dimension. Currently it doesn't work this way. 
Currently, a given FE Software package has a hard-coded QT Dimension. So that if two users want to have different QT Dimension's, the second will have to create a different FE Software entry, e.g. QuantArray-3.0-Weller and QuantArray-3.0-Stewart. The reason I do this is the data loader needs to know in which DB table to store the data matrix. I did not make this an attribute of the QT Dimension, instead in my ignorance I made it an attribute of the FE Software - thinking that the output for each software was fixed. I suggest we change this, and make the FE Software and QT Dim independent of one another. That means the data_table_fk column will move from the FE Software table into the QT Dim table. > It seems that there could (and should?) be both quite > generic QT Dimension files and some that are very tailored to the > feature extraction software? It is harder to come up with a meaningful > shorthand for something with so much leeway. Absolutely. In fact, Genex currently installs two very generic QT Dimensions: one with a single value, and one with two values. These are used by the example data files in the scratch table. > It would be very useful to know at least how many columns of data it > handles and which column(s) has the raw signal. When it derives from > a particular feature extraction package that does need to be in the > name, otherwise perhaps it can take the last name of the person who > designed it. And a date would always be useful. So one could have > MAS4d5s3s0-20020823 for a MAS4 derived file with data in 5 columns, > the signal for one channel in the third column, no second channel > and a date - the detail of the date may be overkill, I do think a > year is minimal to account for changes to software packages and > multipl Since the lifetime of the *files* is limited, you can set whatever convention you prefer. In this case the MAGE identifier for the data in the DB is more important. 
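The proposed decoupling of FE Software and QT Dimension could be sketched roughly as below. This is an illustration only, not the Genex schema: the class names, field names and table names are hypothetical, and `data_table` merely stands in for the data_table_fk column that the message suggests moving to the QT Dim table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FESoftware:
    """A feature extraction package, shared by every user of that package."""
    name: str
    version: str


@dataclass(frozen=True)
class QTDimension:
    """A set of columns to store; it owns the storage table reference."""
    name: str
    columns: Tuple[str, ...]
    data_table: str  # hypothetical stand-in for the data_table_fk column


def storage_table(fe_software: FESoftware, qt_dimension: QTDimension) -> str:
    """Return the table for a load; it depends only on the QT Dimension."""
    return qt_dimension.data_table


# One FE Software entry, two users with different column sets (all names invented).
quantarray = FESoftware(name="QuantArray", version="3.0")
weller_qtd = QTDimension("weller-2col", ("ch1_signal", "ch1_background"), "data_weller")
stewart_qtd = QTDimension("stewart-1col", ("ch1_signal",), "data_stewart")
```

Under this arrangement the loader would ask for both the FE Software and the QT Dimension, and per-user entries such as QuantArray-3.0-Weller would no longer be needed.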
> Jason asked what sorts of associated files we would want to be able to > access and where such files would be available. Certainly the raw data > should be linked, I would think through the experiment name - when > Browsing there should be an option to open or download the associated > files. Originally, I would have agreed that they should be under the experiment, but the schema now allows a single MBA to belong to multiple Experiments (in accord with MAGE). This is not a problem, we could do one of two things: duplicate the stored files in both places, or make symbolic links from one to the other. Duplicating the information is usually bad - if one gets changed the other may get out of sync. So I would suggest using symbolic links. Or we can associate the archived data with each MBA. I would prefer to do it on an Experiment basis, but I can also see wanting to do it on the level of the MBA. > The raw data constitutes the image file and the extracted > tabular data (should be image file name and a date). Hopefully the > header in the tabular data tells what image extraction analysis package > and parameter settings were used - we are not a LIMS system and I do > not propose that we try to store all of that. Well - we might not be able to extract such details automatically, but that type of information *should* be entered manually as part of the protocol info for the MBA. > For an Array Design there should be a brief description ("this is a > long-oligo array with 6500 spots that cover 5000 genes and a set of > controls for the organism Mesembryanthemum crystallinum, made by the > UNR genomics core for Dr. John Cushman in August 2002") and also the > actual tab-delimited text file that constitutes the array layout, with > whatever the supporting lab has provided. If people want the layout archived, ok. > Again, we don't enforce what has to be in this file, that is up to > the people who think they may want to refer to it later. 
I think > these files should be available through links in the Documentation > menu, by Browsing, and also from the drop-down list where one can > select them when setting up GeneX. Since the Curator has to add this > file, the associated files and link could be provided at the same > time. Whatever access point people want, that shouldn't be a problem. We can have multiple access points as well. For example, we should have some sort of report page for every Experiment, and part of that page could be a summary of all the archived files that are part of the Experiment. > For a QT Dimension, again there should be a description ("this is a > generic quantitation type dimension file that takes 7 columns of data, > signal from two channels, background from two channels, > background-subtracted signal from each channel and the ratio of > background-subtracted signal using channel 1 as the numerator and > channel 2 as the denominator"). I don't see the need to archive these. They are stored in sufficient detail in the DB, and can be re-created as MAGE-ML very simply if needed. But, if people want them archived, that's fine too. People should be able to archive *any* file they wish to. > If I read the documentation correctly you are supposed to assign the > column where each type of data occurs in the interface, so this does > not have to be given explicitly here. This should also be available > from the Documentation menu, and as a link from the drop-down menu > where you are allowed to select the file. As above, since the Curator > has to add the QTD s/he can add this file and link at the same time. > > This depends on my having understood how these two files are made and > used, Jason may need to correct this. Your understanding seems right on the mark. Both ArrayDesigns and QT Dimensions are *public* in Genex. I don't know whether this is appropriate for ArrayDesigns - some researchers may not want others to see what probes, etc, they are using in their experiments. 
If this creates a problem, we may want to address it sooner rather than later. Cheers, jas. |
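As a rough illustration of the custom array naming convention discussed in this message (lab PI name, four letters for the species, then year+month), a name could be checked in Python as below. The function, the class and the exact regular expression are assumptions made for the sketch; the thread does not fix the capitalisation or separator rules beyond the CushmanMecr-200209 example.

```python
import re
from typing import NamedTuple


class CustomArrayName(NamedTuple):
    pi: str
    species: str
    year: int
    month: int


# Assumed shape: capitalised PI name, capitalised four-letter species code,
# a hyphen, then YYYYMM. Inferred from the single example in the thread.
_CUSTOM_NAME = re.compile(
    r"^(?P<pi>[A-Z][a-z]+)(?P<species>[A-Z][a-z]{3})-(?P<year>\d{4})(?P<month>\d{2})$"
)


def parse_custom_array_name(name: str) -> CustomArrayName:
    """Split a custom array design name into its parts, or raise ValueError."""
    match = _CUSTOM_NAME.match(name)
    if match is None:
        raise ValueError(f"not a custom array design name: {name!r}")
    month = int(match.group("month"))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {name!r}")
    return CustomArrayName(
        match.group("pi"), match.group("species"), int(match.group("year")), month
    )


parsed = parse_custom_array_name("CushmanMecr-200209")
```

A check like this says nothing about MAGE identifiers, which must still be unique regardless of how the human-readable name is built.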
From: Harry J M. <hj...@ta...> - 2004-08-25 21:13:32
|
Ooops - this was written several days ago but not sent. Use it as you will. These chapters are from a book on how to customize OOo. The book tells how to insert menu items using the Configure GUI and how to use OOo's built-in BASIC IDE to write the macros that call out to various other languages. These 2 chapters were useful in discovering WHERE THE HELL THE DOCUMENTATION IS and what this stuff means. I've also been trying to use the OOo2 (aka m638) build to try some of the new advanced stuff, but it's still too unstable for useful work. However the combination of the 2 approaches leads me to think that what we want to be done CAN be done. For example, it is quite easy to insert menu items and bind them to BASIC macros (which in turn can call other languages). You can also use this approach to insert completely new top-level menu entries (as is done in the Thessalonica approach) so we can populate them with R/Bioconductor calls. http://www.hentzenwerke.com/samplechapters/oome_sc01.pdf and http://www.hentzenwerke.com/samplechapters/oome_sc15.pdf -- Cheers, Harry Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta... <<plain text preferred>> |
From: SourceForge.net <no...@so...> - 2004-08-24 18:41:22
|
Bugs item #1015474, was opened at 2004-08-24 18:41 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=116453&aid=1015474&group_id=16453 Category: Mason GUI Group: None Status: Open Resolution: None Priority: 5 Submitted By: Hrishikesh Deshmukh (hdeshmuk) Assigned to: Jason E. Stewart (jason_e_stewart) Summary: Must specify --abbrev_name for QTD files Initial Comment: Hrishikesh Deshmukh <gen...@ya...> writes: > Now loading the array design file was successful, i > have selected dataset 4 (one of your emails dated: 27 > April,04 has the details). What did you do to fix it? > Second thing which i wanted to do was loading QTD > file, i selected the quantarray.xml file, the data set > file is 1200-mangle-.txt, the values used were: > Must specify --abbrev_name Ok. This is a bug. Could you file a bug report so this doesn't get lost. Cheers, jas. |
From: Harry M. <hj...@ta...> - 2004-08-24 15:58:08
|
Sorry J, I missed looking thru that one - I'll go thru it today for clarity and style, but since I don't have a data set to load nor the mindset of a loader, I may not be able to fine-tune it. hjm On Tuesday 24 August 2004 7:55 am, jw...@gm... wrote: > I have not had any comments back about the draft of user documentation for > uploading data - probably no other users are out there at the moment. -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: <jw...@gm...> - 2004-08-24 15:49:16
|
I have not had any comments back about the draft of user documentation for uploading data - probably no other users are out there at the moment. A couple of things do come up that relate to how we handle associated files, and this will impact developers and administrators so I have pulled out that part and summarized my suggestions below. I would like to suggest a naming convention for array designs: clearly, when something is commercial it should have the name of the company and array including a version number (default is 1), e.g. AffyHu95Av1. I suggest for custom arrays that we use the name of the lab PI, four letters for the species, and a date that is year+month (e.g. CushmanMecr-200209). Is my interpretation of the QT Dimension correct, that is, while it may relate to the feature extraction software it is not actually dependent on it? It seems that there could (and should?) be both quite generic QT Dimension files and some that are very tailored to the feature extraction software? It is harder to come up with a meaningful shorthand for something with so much leeway. It would be very useful to know at least how many columns of data it handles and which column(s) has the raw signal. When it derives from a particular feature extraction package that does need to be in the name, otherwise perhaps it can take the last name of the person who designed it. And a date would always be useful. So one could have MAS4d5s3s0-20020823 for a MAS4 derived file with data in 5 columns, the signal for one channel in the third column, no second channel and a date - the detail of the date may be overkill, I do think a year is minimal to account for changes to software packages and multipl Jason asked what sorts of associated files we would want to be able to access and where such files would be available. Certainly the raw data should be linked, I would think through the experiment name - when Browsing there should be an option to open or download the associated files. 
The raw data constitutes the image file and the extracted tabular data (should be image file name and a date). Hopefully the header in the tabular data tells what image extraction analysis package and parameter settings were used - we are not a LIMS system and I do not propose that we try to store all of that. For an Array Design there should be a brief description ("this is a long-oligo array with 6500 spots that cover 5000 genes and a set of controls for the organism Mesembryanthemum crystallinum, made by the UNR genomics core for Dr. John Cushman in August 2002") and also the actual tab-delimited text file that constitutes the array layout, with whatever the supporting lab has provided. Again, we don't enforce what has to be in this file, that is up to the people who think they may want to refer to it later. I think these files should be available through links in the Documentation menu, by Browsing, and also from the drop-down list where one can select them when setting up GeneX. Since the Curator has to add this file, the associated files and link could be provided at the same time. For a QT Dimension, again there should be a description ("this is a generic quantitation type dimension file that takes 7 columns of data, signal from two channels, background from two channels, background-subtracted signal from each channel and the ratio of background-subtracted signal using channel 1 as the numerator and channel 2 as the denominator"). If I read the documentation correctly you are supposed to assign the column where each type of data occurs in the interface, so this does not have to be given explicitly here. This should also be available from the Documentation menu, and as a link from the drop-down menu where you are allowed to select the file. As above, since the Curator has to add the QTD s/he can add this file and link at the same time. This depends on my having understood how these two files are made and used, Jason may need to correct this. Cheers, Jennifer |
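The QT Dimension shorthand suggested in this message (MAS4d5s3s0-20020823) could be unpacked along the following lines. This is only one reading of the single example given: `d` followed by the number of data columns, two `s` fields giving the signal column for each channel with 0 meaning no such channel, then a date. The function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class QTDimName(NamedTuple):
    source: str                      # FE package, or designer's last name
    n_columns: int
    channel1_column: int
    channel2_column: Optional[int]   # None when the name says 0 (no channel)
    date: str                        # kept as the YYYYMMDD string from the name


# Assumed layout, inferred from the one example in the message.
_QTD_NAME = re.compile(
    r"^(?P<source>.+?)d(?P<ncols>\d+)s(?P<ch1>\d+)s(?P<ch2>\d+)-(?P<date>\d{8})$"
)


def parse_qtd_name(name: str) -> QTDimName:
    """Unpack a QT Dimension shorthand name, or raise ValueError."""
    match = _QTD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a QT Dimension shorthand name: {name!r}")
    ch2 = int(match.group("ch2"))
    return QTDimName(
        source=match.group("source"),
        n_columns=int(match.group("ncols")),
        channel1_column=int(match.group("ch1")),
        channel2_column=ch2 or None,  # 0 is read as "no second channel"
        date=match.group("date"),
    )


mas4 = parse_qtd_name("MAS4d5s3s0-20020823")
```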
From: Harry M. <hj...@ta...> - 2004-08-24 03:35:59
|
Hi All, I did make a curator-specific user for JWW on matrix - jwwcurat. AFAICT, it's still got curator permissions. Or are you talking about genex2? And if so, why..? hjm On Monday 23 August 2004 10:15 am, Jason E. Stewart wrote: > jw...@gm... writes: > But if you ever run one of the Mason apps as an admin and it gives > some kind of permission error - that is a bug. > > > I am not sure that I have a > > curator-specific persona, but I will ask Harry to create one, it > > would be good for testing purposes. > > Yup. I agree. > -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: Harry M. <hj...@ta...> - 2004-08-24 03:28:07
|
This is the default Debian config b/c it's required to install/update it automagically. However, it's a nasty security hole if you allow TCP access to the server (which many will want to do) and so I've added a strong warning to the INSTALL doc. I don't know if we want to add a warning about this to the end of the Install script or not... -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: Harry M. <hj...@ta...> - 2004-08-24 03:25:36
|
OK - sorry for the delay - MANIFEST.in is changed, and GeneX installs without errors now. On Sunday 22 August 2004 1:16 am, Jason E. Stewart wrote: > Hi Harry, > > Correct - that is the error you will get if the MANIFEST lists a file > that has been removed. > > My bad. If you could remove the offending line from the file - I would > be grateful. > > Cheers, > jas. -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: <ja...@op...> - 2004-08-23 17:34:31
|
Hrishikesh Deshmukh <gen...@ya...> writes: > Now loading the array design file was successful, i > have selected dataset 4 (one of your emails dated: 27 > April,04 has the details). What did you do to fix it? > Second thing which i wanted to do was loading QTD > file, i selected the quantarray.xml file, the data set > file is 1200-mangle-.txt, the values used were: > Must specify --abbrev_name Ok. This is a bug. Could you file a bug report so this doesn't get lost. Cheers, jas. |
From: <ja...@op...> - 2004-08-23 17:28:27
|
jw...@gm... writes: > For the most part I tested everything both logged in as a user > and as an administrator. Do the administrator privileges subsume > the curator privileges? They should... But there is one issue. A lot of the navigation bar boxes with curator tasks will only appear if the user has CURATOR privileges - so even though an admin should be able to do it, you may have to type in the URL to get access to the Mason app. But if you ever run one of the Mason apps as an admin and it gives some kind of permission error - that is a bug. > I am not sure that I have a > curator-specific persona, but I will ask Harry to create one, it > would be good for testing purposes. Yup. I agree. > I think that we will have to make the user documents more specific > about the curator-specific tasks, from 'this is something the > curator should do' to ' this is something the curator will have to > do'. Yes. I agree. Cheers, jas. |
From: <ja...@op...> - 2004-08-23 17:28:11
|
Hrishikesh Deshmukh <gen...@ya...> writes: > I logged in as genes_test_curator and tried loading AD > file,here is the "error" which i get: > Genex Job Status Page Without the output in the apache error log, I can't tell what happened. Cheers, jas. |