From: <ja...@op...> - 2004-12-31 02:05:55

Hrishikesh Deshmukh <hde...@gm...> writes:

> 1) I assume for the options exp, ad, qtdim, fe the primary keys are the
> accession numbers which are generated once we use the GUI to insert
> experiment, ad and qt-dim?!

Yes, correct. If this isn't clear in the documentation for the script, let me know and suggest some wording as to how to improve it, and I will be happy to make it more clear.

> 2) Since for Affy we load the CDF and that's our AD as well, I just take
> the accession number from the CDF table (there is none in the browsing
> interface).

CDF files are just Affy's way of specifying an ArrayDesign. So there is no 'CDF' table; it is entered into the ArrayDesign table.

> 3) Correct me on this one: I have to use the GUI to insert exp, ad,
> qt-dim and fe, and then use the command line interface to load the
> "bulk" files?! Basically create the file as shown in the example with
> PBA and MBA...

Correct. The FE SW must already be in the system - no one but a developer can add new ones. After the trouble I had explaining the regexp-based system, I realized it was too complicated, so now all the information is hard-coded in the file parsing system. I am still (slowly) working on the generic tool to enable loading of arbitrary files; I have only one more step to complete, and then I can fully test it, document it, and release it to you for more testing.

Creating an Experiment is something that is done once for each, well, experiment. So once it is created, you can continue to load new data into it as more hybridizations are completed in the lab. The AD and QT Dim will only be loaded once, and then re-used again and again. Once these pieces are in the DB (using the GUI tools), you can use either the GUI loader to load multiple files (more work, I believe) or the command line tool together with the load file (less work when there are many arrays to load, I believe).

> Sorry, the last question sounds really stupid, but I want to make sure
> that the workflow part can be smoothed out in case there is any need,
> and the same questions can go in the FAQ/docs.

No problems! Happy New Year (from Chennai),
jas.
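A minimal sketch of the two-phase workflow described above; the accession numbers are the ones from the run-mbad-insert.pl examples elsewhere in this thread and are specific to that installation:

  # Phase 1 (once per experiment): create the Experiment, ArrayDesign,
  # and QT Dim through the GUI, and note their accession numbers.
  #
  # Phase 2 (repeated as new hybridizations arrive): bulk-load from the
  # command line, re-using those accession numbers:
  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679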
From: Hrishikesh D. <hde...@gm...> - 2004-12-30 16:42:46

Hi Jason,

I have some questions regarding the workflow of using the command line interface to load "bulk" data.

1) I assume for the options exp, ad, qtdim, fe the primary keys are the accession numbers which are generated once we use the GUI to insert experiment, ad and qt-dim?!

2) Since for Affy we load the CDF and that's our AD as well, I just take the accession number from the CDF table (there is none in the browsing interface).

3) Correct me on this one: I have to use the GUI to insert exp, ad, qt-dim and fe, and then use the command line interface to load the "bulk" files?! Basically create the file as shown in the example with PBA and MBA...

Sorry, the last question sounds really stupid, but I want to make sure that the workflow part can be smoothed out in case there is any need, and the same questions can go in the FAQ/docs.

Cheers,
Hrishi

Jason E. Stewart wrote:
> Hrishikesh Deshmukh <hde...@gm...> writes:
>
>> I hope there were no problems in Madurai because of the tsunami?
>
> Actually, I've been in the Pondicherry/Auroville area for the past 6
> weeks. I just left last week and went south to Nagapattnam (the huge
> church at Vaillankani), and then went inland to Thanjavor. Nothing got
> this far in India, so yes, I am fine.
>
>> I want to give a shot at using the command line interface to load
>> "bulk" data. Could you please explain this part:
>>
>> You use it like:
>>
>>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>>
>> 1) What are these numbers? I understand "ad" stands for ArrayDesign,
>> but what are "ro" and "rw"? While you are at it, could you explain
>> them all; I will doc them.
>
> They are documented already:
>
>   run-mbad-insert.pl --man
>
> for verbose details, and
>
>   run-mbad-insert.pl --help
>
> for a brief description. All my commonly used scripts support the new
> --man flag that gives full details. All the rest support the --help
> flag, which gives brief output.
>
> Please read the documentation and I will answer anything that isn't
> clear.
>
> Also, please note that Perl scripts that use the Getopt::Long system
> (which all the Genex scripts do) can abbreviate any command line
> option down to any unique piece. This is just to save typing on the
> command line. For example, '--qtdim_pk' in the above example was just
> listed as '--qt' - because --qt is unambiguous to the option parser.
>
>> 2) How do I get them?
>
> You can either use the command line DB tool, psql, and type SQL
> queries, or you can use the genex WWW table browser ("Browse by Table"
> under the "Browsing" tab) - I find this convenient. Just choose the
> table you want, e.g. Experiment, ArrayDesign, GroupSec, etc., and you
> will be given a table of the first 100 DB entries (need to fix that).
> The 'Accession Number' column is what you want to use - those are the
> primary key values.
>
>> 3) I assume the .CEL files (they are not binary) which I have do not
>> need to be renamed to .txt
>
> Correct, names do not matter.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-29 02:40:48

Hrishikesh Deshmukh <hde...@gm...> writes:

> I hope there were no problems in Madurai because of the tsunami?

Actually, I've been in the Pondicherry/Auroville area for the past 6 weeks. I just left last week and went south to Nagapattnam (the huge church at Vaillankani), and then went inland to Thanjavor. Nothing got this far in India, so yes, I am fine.

> I want to give a shot at using the command line interface to load
> "bulk" data. Could you please explain this part:
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> 1) What are these numbers? I understand "ad" stands for ArrayDesign,
> but what are "ro" and "rw"? While you are at it, could you explain
> them all; I will doc them.

They are documented already:

  run-mbad-insert.pl --man

for verbose details, and

  run-mbad-insert.pl --help

for a brief description. All my commonly used scripts support the new --man flag that gives full details. All the rest support the --help flag, which gives brief output.

Please read the documentation and I will answer anything that isn't clear.

Also, please note that Perl scripts that use the Getopt::Long system (which all the Genex scripts do) can abbreviate any command line option down to any unique piece. This is just to save typing on the command line. For example, '--qtdim_pk' in the above example was just listed as '--qt' - because --qt is unambiguous to the option parser.

> 2) How do I get them?

You can either use the command line DB tool, psql, and type SQL queries, or you can use the genex WWW table browser ("Browse by Table" under the "Browsing" tab) - I find this convenient. Just choose the table you want, e.g. Experiment, ArrayDesign, GroupSec, etc., and you will be given a table of the first 100 DB entries (need to fix that). The 'Accession Number' column is what you want to use - those are the primary key values.

> 3) I assume the .CEL files (they are not binary) which I have do not
> need to be renamed to .txt

Correct, names do not matter.

Cheers,
jas.
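A minimal standalone sketch of the prefix abbreviation that Getopt::Long provides by default; the option names mirror mbad-insert.pl's documented parameters, but this script itself is hypothetical, not Genex code:

  use strict;
  use warnings;
  use Getopt::Long;

  my %opt;
  GetOptions(\%opt,
      'ad_pk=i',      # --ad is a unique prefix
      'exp_pk=i',     # --exp
      'qtdim_pk=i',   # --qt
      'fe_sw_pk=i',   # --fe
  ) or die "bad options\n";

  die "need --qtdim_pk\n" unless defined $opt{qtdim_pk};
  print "qtdim_pk = $opt{qtdim_pk}\n";

Running `perl sketch.pl --qt 756` fills in $opt{qtdim_pk} exactly as `--qtdim_pk 756` would; an ambiguous prefix is rejected with an error rather than silently matching.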
From: Hrishikesh D. <hde...@gm...> - 2004-12-28 16:13:46

Jason,

I hope there were no problems in Madurai because of the tsunami?

I want to give a shot at using the command line interface to load "bulk" data. Could you please explain this part:

You use it like:

  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679

1) What are these numbers? I understand "ad" stands for ArrayDesign, but what are "ro" and "rw"? While you are at it, could you explain them all; I will doc them.

2) How do I get them?

3) I assume the .CEL files (they are not binary) which I have do not need to be renamed to .txt

Eagerly waiting for your reply.

Thanks,
Hrishi

Jason E. Stewart wrote:
> Hi Hrishi,
>
> Sorry, I let this one slip.
>
> hde...@gm... writes:
>
>> Can one do a "bulk" upload of data?
>> I have like 200 .CEL files for the dataset!
>
> Yes.
>
> You can either select 200 files in the GUI - maybe a bit tedious - or
> you can use the command line interface. To learn how to use it, type:
>
>   /usr/local/genex/bin/mbad-insert.pl --man
>
> This spits out the documentation (you can use --help for the brief
> docs if you just want a reminder of what the required options are).
> All the genex Perl scripts honor the --help flag; some of the more
> important ones now honor --man. I am converting the scripts a little
> at a time to the new self-documentation system.
>
> Anyway, the required parameters are:
>
>   --username=name  : the DB username to login as
>   --password=word  : the DB password to login with
>   --mba_name=name  : the MeasuredBioAssay name (list)
>   --pba_name=name  : the PhysicalBioAssay name (list)
>   --ad_pk=pk       : the ArrayDesign used
>   --exp_pk=pk      : the primary key of the Experiment
>   --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
>   --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
>   --ro_group_id=pk : the primary key of the read-only group
>   --rw_group_id=pk : the primary key of the read/write group
>
> You need to use the genex browsing interface to get all the pkey
> numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and
> write groups. The only tricky bit is --mba_name and --pba_name. These
> need to be specified once for each input file - in the same order
> that you list the files.
>
> To make things easier, I just wrote a helper utility,
> run-mbad-insert.pl, that reads the name and file information in from
> a tab-delimited text file.
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> Here is an example input file:
>
>   /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
>   /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
>   /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2
>
> Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is
> installed and tested on genex2 in /usr/local/genex/bin.
>
> BTW - the svn checkout directory on genex2 is:
>
>   /home/jasons/work/genex-server
>
> in case you need to update something.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-21 15:33:40

dcarr2 <dc...@gm...> writes:

> key 'DBNAME' does not exist at Perl/scripts/tabledef.pl line 25
> Died at /usr/local/genex/bin/create-genex-db.pl line 319, <IN> chunk 5.
> Died at Perl/scripts/gendb.pl line 115.
>
> FATAL ERROR

I was finally able to switch Bio::Genex::Config to use StrictHash - which gives a fatal error if you try to access a hash key that doesn't exist - and in so doing I've uncovered some errors, such as the one above, which used to fail silently but now cause a fatal error. I'm more than a little confused how this one snuck through the cracks... It must mean that I did not run a test install on either my laptop or on genex2... Very embarrassing.

I've just committed revision 1725, which fixes the problem.

Cheers,
jas.
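The thread doesn't show Bio::Genex::Config's StrictHash itself; a minimal sketch of the same fail-fast behavior using core Perl's Hash::Util restricted hashes (the config keys here are made up):

  use strict;
  use warnings;
  use Hash::Util qw(lock_keys);

  my %config = (DBNAME => 'genex', DBHOST => 'localhost');
  lock_keys(%config);            # freeze the set of legal keys

  print $config{DBNAME}, "\n";   # fine: key exists
  print $config{DBUSER}, "\n";   # dies: "Attempt to access disallowed key 'DBUSER' ..."

A key that would previously return undef and fail somewhere far downstream now dies at the point of access, which is exactly the class of error the 'DBNAME' report above exposed.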
From: dcarr2 <dc...@gm...> - 2004-12-20 20:16:14

Harry and Jason,

Due to Wellerlab2's crash a couple of weeks ago, I have had a chance to go back through the full installation process with the latest from genex2 on a 'virgin' system. I had fewer problems this time and got to the installation point, i.e. 'as root, make install'. I have received the following fatal error... I have also attached the install-errors file.

Thanks,
Andrew

---
key 'DBNAME' does not exist at Perl/scripts/tabledef.pl line 25
Died at /usr/local/genex/bin/create-genex-db.pl line 319, <IN> chunk 5.
Died at Perl/scripts/gendb.pl line 115.

FATAL ERROR
I got an error when I ran the DB installer.

!! System Error: @ line: 380
(su -c '/usr/bin/perl /home/acarr/genex-server/Perl/scripts/db.pl' postgres)

FATAL ERROR
I got an error when I ran the DB installer.
make: *** [install] Error 9
---
From: Harry M. <hj...@ta...> - 2004-12-17 05:27:24

On Thursday 16 December 2004 6:58 pm, Jason E. Stewart wrote:
> the main issue is the .ini-type format with the different header
> sections [CEL], [INTENSITY], etc. And there needs to be a blank
> line between sections. If you look in
> Perl/Bio-Genex/Genex/Connect.pm in the load_data_cel() method, you
> will see which sections I actually use, and which lines are used.
>
> Currently only the INTENSITY section is critical. I don't know what
> the MASKS, OUTLIERS, and MODIFIED sections are for, so I don't use
> them or store them in the DB.

OK - I've gotten that wrangled fine (and found out what the other sections are for as well, not that we need them). I'm just adding some globbing stuff so that it can be used to process multiple files at once, and I'll send it to you to see how it works. Still haven't gotten it to work in Inline, but there were other things wrong at the time as well (learning C++ as I go - seems to me that C++ is to C as Java is to Perl; hint - it's not a compliment...).

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-17 03:01:56

Harry J Mangalam <hj...@ta...> writes:

> I've been playing with the Affy code mentioned recently and have
> gotten it to compile, and validated that the CEL file stuff seems to
> work OK.
>
> I'm going to try hacking it a bit to produce the same kind of text
> output that the GCOS system exports (it runs faster than the GCOS
> export, and we can script it to do multiple files at once - an issue
> for Hrishi).
>
> Does the GeneX loader need the exact format that the current format
> uses, or if not, what are the critical bits that do need to be parsed
> out to load into GeneX? I'll try to duplicate the format as much as
> possible, but what's critical and what's not?

The main issue is the .ini-type format with the different header sections [CEL], [INTENSITY], etc. And there needs to be a blank line between sections. If you look in Perl/Bio-Genex/Genex/Connect.pm in the load_data_cel() method, you will see which sections I actually use, and which lines are used.

Currently only the INTENSITY section is critical. I don't know what the MASKS, OUTLIERS, and MODIFIED sections are for, so I don't use them or store them in the DB.

Cheers,
jas.
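A minimal sketch of parsing just the critical [INTENSITY] section out of a text CEL file - not the actual load_data_cel() code; the column layout follows the example file quoted later in this thread:

  use strict;
  use warnings;

  my $file = shift or die "usage: $0 file.CEL\n";
  open my $fh, '<', $file or die "open $file: $!";

  my $section = '';
  while (my $line = <$fh>) {
      chomp $line;
      if ($line =~ /^\[(\w+)\]/) {        # section header, e.g. [INTENSITY]
          $section = $1;
          next;
      }
      next unless $section eq 'INTENSITY';
      next if $line =~ /^(?:NumberCells|CellHeader)=/;   # section metadata
      next unless $line =~ /\S/;                          # blank line separates sections
      my ($x, $y, $mean, $stdv, $npixels) = split ' ', $line;
      print "spot ($x,$y): mean=$mean stdv=$stdv npixels=$npixels\n";
  }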
From: Harry J M. <hj...@ta...> - 2004-12-16 15:59:22

Hi Jason,

I've been playing with the Affy code mentioned recently and have gotten it to compile, and validated that the CEL file stuff seems to work OK.

I'm going to try hacking it a bit to produce the same kind of text output that the GCOS system exports (it runs faster than the GCOS export, and we can script it to do multiple files at once - an issue for Hrishi).

Does the GeneX loader need the exact format that the current format uses:

[CEL]
Version=3

[HEADER]
Cols=1164
Rows=1164
TotalX=1164
TotalY=1164
OffsetX=0
OffsetY=0
GridCornerUL=231 244
GridCornerUR=8422 219
GridCornerLR=8452 8419
GridCornerLL=261 8444
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[23..65534] 100704A-01_OLDTR7_+DOX:CLS=8660 RWS=8660 XIN=1 YIN=1 VE=30 2.0 11/02/04 12:41:30 50102870 M10 HG-U133_Plus_2.1sq 6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000

[INTENSITY]
NumberCells=1354896
CellHeader=X Y MEAN STDV NPIXELS
0 0 130.0 23.8 25
1 0 17316.0 2283.8 25
2 0 186.0 51.9 25
3 0 17501.0 2171.7 25
4 0 180.0 31.4 25
<<snip>>
1161 1163 240.0 58.7 25
1162 1163 17202.0 2896.0 25
1163 1163 176.0 34.6 25

[MASKS]
NumberCells=0
CellHeader=X Y

[OUTLIERS]
NumberCells=155
CellHeader=X Y
331 0
435 0
651 0
809 0
831 0
893 0
969 0
987 0
<<snip>>
439 1163
553 1163
627 1163

[MODIFIED]
NumberCells=0
CellHeader=X Y ORIGMEAN

or if not, what are the critical bits that do need to be parsed out to load into GeneX? I'll try to duplicate the format as much as possible, but what's critical and what's not?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-14 16:28:16

Harry Mangalam <hj...@ta...> writes:

> I had found this code this morning, and as you phoned I was watching
> the zipfile expand. Code looks pretty concise. I'll play around with
> it and see if I can get it to sit up and bark.
>
> re: Perl - XS or Inline?

XS is too much of a pain. Either Inline or SWIG.

Cheers,
jas.
From: SourceForge.net <no...@so...> - 2004-12-14 15:46:02

Bugs item #1085191, was opened at 2004-12-14 15:46
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=1085191&group_id=16453

Category: Administrative Apps
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Harry Mangalam (mangalam)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: loading cel files takes too much mem

Initial Comment:
During a load of a text-formatted CEL file from a u133_plus_2 array, after repeatedly crashing on a 512M/512M (real/swap) machine (load going up to ~20), it ran on a 2G/4G machine without problems, although the perl process ate almost 800MB. Watching the load, it was the perl script, not the postgres process, that ate the mem. Jason has said that the problem is due to perl building a huge data struct in mem, THEN writing it to the DB, instead of doing it on a smaller basis. Apparently an easy fix, but medium priority at this point, unless it impinges on loading multiple files at once. All the test machines (except my laptop) have enough mem to do this.
From: <ja...@op...> - 2004-12-14 01:33:44

hrishikesh deshmukh <d_h...@ya...> writes:

> --- "Wittner, Ben" <Wit...@mg...> wrote:
>
>> From: "Wittner, Ben" <Wit...@mg...>
>> To: "'bio...@st...'" <bio...@st...>
>> Date: Mon, 13 Dec 2004 08:04:12 -0500
>> Subject: [BioC] LGPL C++ source code available from Affy for reading
>> their file formats
>>
>> I think Bioconductor packages have not been able to read the newer
>> Affy file formats (for good reasons explained in a post by Rafael
>> I.). But I just got an e-mail from Affy saying that they are making
>> a library available under LGPL (see below). So, just in case anyone
>> who should know does not already know...
>>
>> File Parsers SDK - OPEN SOURCE
>> We are pleased to announce the release of C++ source code for the
>> parsing of Affymetrix CEL, CHP, CDF, BAR and BPMAP files. This source
>> code is being provided under the GNU Lesser General Public License
>> (LGPL).
>>
>> Download the SDK:
>> http://www.affymetrix.com/redirect/email.jsp?source=nl200412adn&dest=/support/developer/filesdk/index.affx

Ah, thanks Hrishi. This is definitely useful. I'll have to take a look. The only concern is that it is Windows-based code and will have to be modified for Unix. I'll toss it at SWIG and we'll have a Perl binding for it, and that will be useful for everyone.

Cheers,
jas.
From: <ja...@op...> - 2004-12-14 01:27:55

Harry Mangalam <hj...@ta...> writes:

> Great that you discovered it - that's exactly what it looked like,
> watching it run. I'd say that now that we know what it's doing, it's
> a medium priority to fix, not high. I'll just load on my larger
> system. I think that all the machines we're using have enough mem to
> hold this in place while loading.

Okee-dokee. Harry, can you mark this as a bug on SF so that we don't forget it?

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-12-13 19:01:31

Great that you discovered it - that's exactly what it looked like, watching it run. I'd say that now that we know what it's doing, it's a medium priority to fix, not high. I'll just load on my larger system. I think that all the machines we're using have enough mem to hold this in place while loading.

hjm

On Monday 13 December 2004 8:08 am, Jason E. Stewart wrote:
> ja...@op... (Jason E. Stewart) writes:
>> Harry Mangalam <hj...@ta...> writes:
>>> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>>>> So I'm guessing it's Postgres. We are running inside a
>>>> transaction, but so what. Could it be that the index is getting
>>>> so large that it needs its memory??? Not after half a million
>>>> rows, it better not be.
>>>
>>> I was watching both top and the KDE system guard on this pretty
>>> closely (activity by process) and it was clearly the perl
>>> process that was eating memory. Postgres/postmaster topped out
>>> at about 20MB.
>>
>> Well, that's good news I suppose... Fixing the Perl code is
>> easier than fixing Postgres. Let me investigate.
>
> OK. Sorry about that. It's definitely my script. Obviously, two
> weeks is too long a period for me to remember my own code. The CDF
> loader reads the whole file, builds a big array, and then inserts
> the ArrayDesign object all at once.
>
> So basically, it's a big Perl array that's sucking up memory. There
> is definitely a fix.
>
> Is this high priority or medium priority?
>
> Cheers,
> jas.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-13 16:11:46

ja...@op... (Jason E. Stewart) writes:

> Harry Mangalam <hj...@ta...> writes:
>
>> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>>
>>> So I'm guessing it's Postgres. We are running inside a transaction,
>>> but so what. Could it be that the index is getting so large that it
>>> needs its memory??? Not after half a million rows, it better not be.
>>
>> I was watching both top and the KDE system guard on this pretty
>> closely (activity by process) and it was clearly the perl process
>> that was eating memory. Postgres/postmaster topped out at about
>> 20MB.
>
> Well, that's good news I suppose... Fixing the Perl code is easier
> than fixing Postgres. Let me investigate.

OK. Sorry about that. It's definitely my script. Obviously, two weeks is too long a period for me to remember my own code. The CDF loader reads the whole file, builds a big array, and then inserts the ArrayDesign object all at once.

So basically, it's a big Perl array that's sucking up memory. There is definitely a fix.

Is this high priority or medium priority?

Cheers,
jas.
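The fix itself isn't shown in the thread; a sketch of the general technique - executing a prepared INSERT per parsed line instead of accumulating one big array - with hypothetical table and column names:

  use strict;
  use warnings;
  use DBI;

  my ($file, $ad_pk) = @ARGV;
  my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex', 'XXX',
                         { AutoCommit => 0, RaiseError => 1 });
  my $sth = $dbh->prepare(
      'INSERT INTO arraydesign_feature (ad_fk, x, y, name) VALUES (?, ?, ?, ?)');

  open my $fh, '<', $file or die "open $file: $!";
  while (my $line = <$fh>) {
      chomp $line;
      my ($x, $y, $name) = split /\t/, $line;
      # each row goes straight to the DB; Perl never holds more than one line,
      # so memory stays flat no matter how large the CDF file is
      $sth->execute($ad_pk, $x, $y, $name);
  }
  $dbh->commit;    # still one transaction around the whole load
  $dbh->disconnect;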
From: <hde...@gm...> - 2004-12-13 16:00:29

Has the utility been updated on genex2, or do I need to copy it to genex2 from the path that is given?

Thanks,
Hrishi

____________________________________________________________________
'Life's battles don't always go to the stronger or faster man. But sooner or later the man who wins is the one who thinks he can.' - Walter D. Wintle

----- Original Message -----
From: ja...@op... (Jason E. Stewart)
Date: Monday, December 13, 2004 10:54 am
Subject: Re: [GeneX-dev] Re:

> Hi Hrishi,
>
> Sorry, I let this one slip.
>
> hde...@gm... writes:
>
>> Can one do a "bulk" upload of data?
>> I have like 200 .CEL files for the dataset!
>
> Yes.
>
> You can either select 200 files in the GUI - maybe a bit tedious - or
> you can use the command line interface. To learn how to use it, type:
>
>   /usr/local/genex/bin/mbad-insert.pl --man
>
> This spits out the documentation (you can use --help for the brief
> docs if you just want a reminder of what the required options are).
> All the genex Perl scripts honor the --help flag; some of the more
> important ones now honor --man. I am converting the scripts a little
> at a time to the new self-documentation system.
>
> Anyway, the required parameters are:
>
>   --username=name  : the DB username to login as
>   --password=word  : the DB password to login with
>   --mba_name=name  : the MeasuredBioAssay name (list)
>   --pba_name=name  : the PhysicalBioAssay name (list)
>   --ad_pk=pk       : the ArrayDesign used
>   --exp_pk=pk      : the primary key of the Experiment
>   --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
>   --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
>   --ro_group_id=pk : the primary key of the read-only group
>   --rw_group_id=pk : the primary key of the read/write group
>
> You need to use the genex browsing interface to get all the pkey
> numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and
> write groups. The only tricky bit is --mba_name and --pba_name. These
> need to be specified once for each input file - in the same order
> that you list the files.
>
> To make things easier, I just wrote a helper utility,
> run-mbad-insert.pl, that reads the name and file information in from
> a tab-delimited text file.
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> Here is an example input file:
>
>   /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
>   /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
>   /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2
>
> Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is
> installed and tested on genex2 in /usr/local/genex/bin.
>
> BTW - the svn checkout directory on genex2 is:
>
>   /home/jasons/work/genex-server
>
> in case you need to update something.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-13 16:00:28

Harry Mangalam <hj...@ta...> writes:

> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>
>> So I'm guessing it's Postgres. We are running inside a transaction,
>> but so what. Could it be that the index is getting so large that it
>> needs its memory??? Not after half a million rows, it better not be.
>
> I was watching both top and the KDE system guard on this pretty
> closely (activity by process) and it was clearly the perl process
> that was eating memory. Postgres/postmaster topped out at about
> 20MB.

Well, that's good news I suppose... Fixing the Perl code is easier than fixing Postgres. Let me investigate.

Cheers,
jas.
From: <ja...@op...> - 2004-12-13 15:57:44

Hi Hrishi,

Sorry, I let this one slip.

hde...@gm... writes:

> Can one do a "bulk" upload of data?
> I have like 200 .CEL files for the dataset!

Yes.

You can either select 200 files in the GUI - maybe a bit tedious - or you can use the command line interface. To learn how to use it, type:

  /usr/local/genex/bin/mbad-insert.pl --man

This spits out the documentation (you can use --help for the brief docs if you just want a reminder of what the required options are). All the genex Perl scripts honor the --help flag; some of the more important ones now honor --man. I am converting the scripts a little at a time to the new self-documentation system.

Anyway, the required parameters are:

  --username=name  : the DB username to login as
  --password=word  : the DB password to login with
  --mba_name=name  : the MeasuredBioAssay name (list)
  --pba_name=name  : the PhysicalBioAssay name (list)
  --ad_pk=pk       : the ArrayDesign used
  --exp_pk=pk      : the primary key of the Experiment
  --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
  --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
  --ro_group_id=pk : the primary key of the read-only group
  --rw_group_id=pk : the primary key of the read/write group

You need to use the genex browsing interface to get all the pkey numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and write groups. The only tricky bit is --mba_name and --pba_name. These need to be specified once for each input file - in the same order that you list the files.

To make things easier, I just wrote a helper utility, run-mbad-insert.pl, that reads the name and file information in from a tab-delimited text file.

You use it like:

  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679

Here is an example input file (fields are tab-separated):

  /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
  /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
  /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2

Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is installed and tested on genex2 in /usr/local/genex/bin.

BTW - the svn checkout directory on genex2 is:

  /home/jasons/work/genex-server

in case you need to update something.

Cheers,
jas.
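A small helper of the sort that could generate run-mbad-insert.pl's tab-delimited input file from a directory of uploaded files - a sketch, not part of Genex, and the PBA/MBA naming scheme is invented here:

  use strict;
  use warnings;
  use File::Basename qw(basename);

  # one line per data file: <path> TAB <PBA name> TAB <MBA name>
  # (spaces are allowed in the names, tabs are not)
  for my $path (sort glob '/usr/local/genex/uploads/genex/*.txt') {
      (my $stem = basename($path)) =~ s/\.txt\z//;
      print join("\t", $path, "PBA $stem", "MBA $stem"), "\n";
  }

Redirect the output to a file (`perl make-load-list.pl > /tmp/file.txt`) and pass that path to --file.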
From: Harry M. <hj...@ta...> - 2004-12-13 15:06:48

On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:

> So I'm guessing it's Postgres. We are running inside a transaction,
> but so what. Could it be that the index is getting so large that it
> needs its memory??? Not after half a million rows, it better not be.

I was watching both top and the KDE system guard on this pretty closely (activity by process) and it was clearly the perl process that was eating memory. Postgres/postmaster topped out at about 20MB. Here's my hand-typed log of the load:

---
running it on bonk, load is ~1 but mem usage is creeping up: 250M, 374, 386, 398, 410, 422, 434, 446, 458, 470, 482, 494, 507, 519, 525, 544, 555, 567, 579, 591, 603 (typed #s as fast as I could)
load still only 1.5, 652, no swap yet, 687, 700, 731, 735, now loading
postmaster mem static at 735; postmaster is at 18M; load is down a bit at 1.2
mem increases slightly at 737; now load is up to 1.9
now postgres is 80% of load (but only 20M mem usage), perl is 9-11%
web page reports 1.2M reporters found, same # of features; load is up to 2.4, perl mem is up to 740, postgres is around 12-18M, still no swapping
postgres is now 88% of load, perl about 13%, load down to 1.7 (bouncing but not above 3)
ok - loaded the cdf file in 5881 sec
looks like it finished correctly, although we still got an ERROR msg "Can't ignore signal CHLD, forcing to default"
---

Could it have been that DBI was caching all this stuff internally? I don't think postgres memory would be showing up under a perl process unless it truly was using it as a library and 'inherited' the memory use.

> Harry, can you post a message about this to the postgres-users list?

Sure, but how can we make sure that it's not a runaway perl array first? The processes seem to indicate that it's perl, not postgres. Is there magic to monitor sizes of perl data structures without explicitly tracking each one?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
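There is such magic, although the thread never names it: the CPAN module Devel::Size reports the deep size of any structure. A sketch - the growing array here is a stand-in for whatever the loader actually builds:

  use strict;
  use warnings;
  use Devel::Size qw(total_size);

  my @rows;
  while (my $line = <STDIN>) {
      push @rows, [ split /\t/, $line ];
      # print the deep size of the structure every 100,000 rows
      printf STDERR "rows=%d total_size=%.1f MB\n",
          scalar @rows, total_size(\@rows) / 2**20
              if @rows % 100_000 == 0;
  }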
From: <hde...@gm...> - 2004-12-13 13:57:54

Jason,

Can one do a "bulk" upload of data? I have like 200 .CEL files for the dataset! Waiting for your reply.

Thanks,
Hrishi
From: <ja...@op...> - 2004-12-13 12:13:54

Harry Mangalam <hj...@ta...> writes:

> On Sunday 12 December 2004 6:01 am, Jason E. Stewart wrote:
>> FYI, in honor of Harry's valiant
>
> thanks for not accurately saying "stupid and pointless struggle"

It reminded me of the time Peter Hraber and I spent two days trying to load the very first data set into genex1 - we simply could not believe the number of discrepancies in the data file: yeast ORF names that didn't exist, rows with different numbers of columns, etc., etc. GIGO...

>> Harry, is this possibly the cause of the massive thrashing you
>> experienced when loading the test data? I was surprised by the
>> memory requirements, but I don't see how this could be related
>> (just thrashing for a cause).
>
> NO - I'm pretty sure, after watching the lappie die repeatedly and
> then watching it go to completion on the 2GB machine, that it was
> running out of mem. It is a /100MB/ CDF file, and I can send you my
> (typed) log of the thing as it was running. IIRC, mem usage for it
> went to ~800MB (real RAM), which it could, as I had 2GB available. It
> never went to swap.
>
> On the lappie, once it started swapping, the game was over. (I have
> 512MB real but only 512MB swap on the lappie, so what with the other
> things running, it was very close to the limit, and while it was
> negotiating with itself for the last few bytes of vmem, it just
> stopped responding to anything.)

I don't know why this is happening, but it's bad, whatever the reason. I'd like to debug this a bit. My code parses the .CDF file and sends an INSERT directly to the DB => one line of the file, one INSERT. Perl isn't storing *any* data at all (except for a line-number counter - but that ain't chewing up >800MB).

So I'm guessing it's Postgres. We are running inside a transaction, but so what. Could it be that the index is getting so large that it needs its memory??? Not after half a million rows, it better not be.

Harry, can you post a message about this to the postgres-users list?

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-12-12 17:20:58

On Sunday 12 December 2004 6:01 am, Jason E. Stewart wrote:

> FYI, in honor of Harry's valiant

thanks for not accurately saying "stupid and pointless struggle"

> struggle against the binary .CEL
> files, I've added an error check to see if the files are actually
> text files or binary ... interestingly enough, the unix 'file'
> utility seems to think that text .CEL files are 'data' and not text,
> so I have to be more clever than just using 'file' to figure it
> out...
>
> Harry, is this possibly the cause of the massive thrashing you
> experienced when loading the test data? I was surprised by the
> memory requirements, but I don't see how this could be related
> (just thrashing for a cause).

NO - I'm pretty sure, after watching the lappie die repeatedly and then watching it go to completion on the 2GB machine, that it was running out of mem. It is a /100MB/ CDF file, and I can send you my (typed) log of the thing as it was running. IIRC, mem usage for it went to ~800MB (real RAM), which it could, as I had 2GB available. It never went to swap.

On the lappie, once it started swapping, the game was over. (I have 512MB real but only 512MB swap on the lappie, so what with the other things running, it was very close to the limit, and while it was negotiating with itself for the last few bytes of vmem, it just stopped responding to anything.) If the CDF file were smaller, or I had allocated 1GB of swap, it probably could have made it.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <hde...@gm...> - 2004-12-12 16:00:41

Hi Jason,

I assume I have to load one .CEL at a time; is there a way to load in bulk?

Thanks,
Hrishi

____________________________________________________________________
'Life's battles don't always go to the stronger or faster man. But sooner or later the man who wins is the one who thinks he can.' - Walter D. Wintle

----- Original Message -----
From: ja...@op... (Jason E. Stewart)
Date: Sunday, December 12, 2004 9:01 am
Subject: [GeneX-dev] Re: How to create AD file for Affymetrix data

> Harry Mangalam <hj...@ta...> writes:
>
>> The problem on my end, I've confirmed, is that the CEL files were in
>> binary format, not text (what an idiot) and so completely confused
>> the loader.
>
> FYI, in honor of Harry's valiant struggle against the binary .CEL
> files, I've added an error check to see if the files are actually
> text files or binary ... interestingly enough, the unix 'file'
> utility seems to think that text .CEL files are 'data' and not text,
> so I have to be more clever than just using 'file' to figure it
> out...
>
>> I've been fighting with samba and disk space all morning to finally
>> discover that while I have all the right programs installed, the
>> 600MB of space I have left is nowhere near enough to extract the
>> text data files, so I'm going to have to go down to my wife's lab
>> and do it there. I hope to have this done sometime today, but I
>> dunno when.
>
> Harry, is this possibly the cause of the massive thrashing you
> experienced when loading the test data? I was surprised by the
> memory requirements, but I don't see how this could be related
> (just thrashing for a cause).
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-12 14:05:08

Harry Mangalam <hj...@ta...> writes:

> The problem on my end, I've confirmed, is that the CEL files were in
> binary format, not text (what an idiot) and so completely confused
> the loader.

FYI, in honor of Harry's valiant struggle against the binary .CEL files, I've added an error check to see if the files are actually text files or binary ... interestingly enough, the unix 'file' utility seems to think that text .CEL files are 'data' and not text, so I have to be more clever than just using 'file' to figure it out...

> I've been fighting with samba and disk space all morning to finally
> discover that while I have all the right programs installed, the
> 600MB of space I have left is nowhere near enough to extract the text
> data files, so I'm going to have to go down to my wife's lab and do
> it there. I hope to have this done sometime today, but I dunno when.

Harry, is this possibly the cause of the massive thrashing you experienced when loading the test data? I was surprised by the memory requirements, but I don't see how this could be related (just thrashing for a cause).

Cheers,
jas.
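Perl's built-in -T/-B file tests are one way to be 'more clever' than file(1); a sketch of that kind of check, not the actual Genex code:

  use strict;
  use warnings;

  my $path = shift or die "usage: $0 file.CEL\n";

  # -B reads the first block of the file and guesses 'binary' if it
  # sees NUL bytes or too many non-printable characters; -T is the
  # complementary 'looks like text' heuristic.
  die "$path looks like a binary .CEL file - export it as text first\n"
      if -B $path;
  print "$path looks like text; OK to hand to the loader\n";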
From: <ja...@op...> - 2004-12-11 17:37:04

Hrishikesh Deshmukh <hde...@gm...> writes:

> Here is the snapshot where I have to confirm QT. I looked at the
> files you had in /home/jasons/data/Lymph-data/ and my .CEL files look
> the same.

Yup, they are kosher.

> So please take a look at the snapshot and tell me whether this is
> right/wrong?!

They are right.

> Harry, is this what you see, or is there some difference?

Nope, Harry had binary .CEL files.

> All I did was insert the experiment and then MBAD, and then select
> the right parameters from the drop-down list, and here is the
> output/result:
>
>   Genex Job Status Page
>
>   Your job is finished. The status is: SUCCESS

That means your data is in the DB, all safe and sound. If we had a nice report sheet, you could actually see how many spots were entered, etc. But we don't. I'd love for people to give some ideas for useful reports. Apparently BASE has some really nice ones.

Cheers,
jas.