From: <ja...@op...> - 2004-12-31 02:05:55

Hrishikesh Deshmukh <hde...@gm...> writes:

> 1) I assume for the options exp, ad, qtdim, fe the primary keys are the
> accession numbers which are generated once we use the GUI to insert
> experiment, ad and qt-dim?!

Yes, correct. If this isn't clear in the documentation for the script, let me know and suggest some wording as to how to improve it, and I will be happy to make it more clear.

> 2) Since for Affy we load the CDF and that's our AD as well, I just take
> the accession number from the CDF table (there is none in the browsing
> interface).

CDF files are just Affy's way of specifying an ArrayDesign. So there is no 'CDF' table; it is entered into the ArrayDesign table.

> 3) Correct me on this one: I have to use the GUI to insert exp, ad,
> qt-dim and fe, and then use the command line interface to load the
> "bulk" files?! Basically create the file as shown in the example with
> PBA and MBA...

Correct. The FE SW must already be in the system - no one but a developer can add new ones. After the trouble I had explaining the regexp-based system, I realized it was too complicated, so now all the information is hard-coded in the file parsing system. I am still (slowly) working on the generic tool to enable loading of arbitrary files; I have only one more step to complete, and then I can fully test it, document it, and release it to you for more testing.

Creating an Experiment is something that is done once for each, well, experiment. So once it is created, you can continue to load new data into it as more hybridizations are completed in the lab. The AD and QT Dim will only be loaded once, and then re-used again and again. Once these pieces are in the DB (using the GUI tools), you can use either the GUI loader to load multiple files (more work, I believe) or the command line tool together with the load file (less work when there are many arrays to load, I believe).

> Sorry, the last question sounds really stupid, but I want to make sure
> that the workflow part can be smoothed out in case there is any need,
> and the same questions can go in the FAQ/docs.

No problems! Happy New Year (from Chennai),
jas.
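A minimal sketch of the two-phase workflow described above; the accession numbers are the ones from the run-mbad-insert.pl examples elsewhere in this thread and are specific to that installation:

  # Phase 1 (once per experiment): create the Experiment, ArrayDesign,
  # and QT Dim through the GUI, and note their accession numbers.
  #
  # Phase 2 (repeated as new hybridizations arrive): bulk-load from the
  # command line, re-using those accession numbers:
  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679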
From: Hrishikesh D. <hde...@gm...> - 2004-12-30 16:42:46

Hi Jason,

I have some questions regarding the workflow of using the command line interface to load "bulk" data.

1) I assume for the options exp, ad, qtdim, fe the primary keys are the accession numbers which are generated once we use the GUI to insert experiment, ad and qt-dim?!

2) Since for Affy we load the CDF and that's our AD as well, I just take the accession number from the CDF table (there is none in the browsing interface).

3) Correct me on this one: I have to use the GUI to insert exp, ad, qt-dim and fe, and then use the command line interface to load the "bulk" files?! Basically create the file as shown in the example with PBA and MBA...

Sorry, the last question sounds really stupid, but I want to make sure that the workflow part can be smoothed out in case there is any need, and the same questions can go in the FAQ/docs.

Cheers,
Hrishi

Jason E. Stewart wrote:
> Hrishikesh Deshmukh <hde...@gm...> writes:
>
>> I hope there were no problems in Madurai because of the tsunami?
>
> Actually, I've been in the Pondicherry/Auroville area for the past 6
> weeks. I just left last week and went south to Nagapattnam (the huge
> church at Vaillankani), and then went inland to Thanjavor. Nothing got
> this far in India, so yes, I am fine.
>
>> I want to give a shot at using the command line interface to load
>> "bulk" data. Could you please explain this part:
>>
>> You use it like:
>>
>>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>>
>> 1) What are these numbers? I understand "ad" stands for ArrayDesign,
>> but what are "ro" and "rw"? While you are at it, could you explain
>> them all; I will doc them.
>
> They are documented already:
>
>   run-mbad-insert.pl --man
>
> for verbose details, and
>
>   run-mbad-insert.pl --help
>
> for a brief description. All my commonly used scripts support the new
> --man flag that gives full details. All the rest support the --help
> flag, which gives brief output.
>
> Please read the documentation and I will answer anything that isn't
> clear.
>
> Also, please note that Perl scripts that use the Getopt::Long system
> (which all the Genex scripts do) can abbreviate any command line
> option down to any unique piece. This is just to save typing on the
> command line. For example, '--qtdim_pk' in the above example was just
> listed as '--qt' - because --qt is unambiguous to the option parser.
>
>> 2) How do I get them?
>
> You can either use the command line DB tool, psql, and type SQL
> queries, or you can use the genex WWW table browser ("Browse by Table"
> under the "Browsing" tab) - I find this convenient. Just choose the
> table you want, e.g. Experiment, ArrayDesign, GroupSec, etc., and you
> will be given a table of the first 100 DB entries (need to fix that).
> The 'Accession Number' column is what you want to use - those are the
> primary key values.
>
>> 3) I assume the .CEL files (they are not binary) which I have do not
>> need to be renamed to .txt
>
> Correct, names do not matter.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-29 02:40:48

Hrishikesh Deshmukh <hde...@gm...> writes:

> I hope there were no problems in Madurai because of the tsunami?

Actually, I've been in the Pondicherry/Auroville area for the past 6 weeks. I just left last week and went south to Nagapattnam (the huge church at Vaillankani), and then went inland to Thanjavor. Nothing got this far in India, so yes, I am fine.

> I want to give a shot at using the command line interface to load
> "bulk" data. Could you please explain this part:
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> 1) What are these numbers? I understand "ad" stands for ArrayDesign,
> but what are "ro" and "rw"? While you are at it, could you explain
> them all; I will doc them.

They are documented already:

  run-mbad-insert.pl --man

for verbose details, and

  run-mbad-insert.pl --help

for a brief description. All my commonly used scripts support the new --man flag that gives full details. All the rest support the --help flag, which gives brief output.

Please read the documentation and I will answer anything that isn't clear.

Also, please note that Perl scripts that use the Getopt::Long system (which all the Genex scripts do) can abbreviate any command line option down to any unique piece. This is just to save typing on the command line. For example, '--qtdim_pk' in the above example was just listed as '--qt' - because --qt is unambiguous to the option parser.

> 2) How do I get them?

You can either use the command line DB tool, psql, and type SQL queries, or you can use the genex WWW table browser ("Browse by Table" under the "Browsing" tab) - I find this convenient. Just choose the table you want, e.g. Experiment, ArrayDesign, GroupSec, etc., and you will be given a table of the first 100 DB entries (need to fix that). The 'Accession Number' column is what you want to use - those are the primary key values.

> 3) I assume the .CEL files (they are not binary) which I have do not
> need to be renamed to .txt

Correct, names do not matter.

Cheers,
jas.
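A minimal standalone sketch of the prefix abbreviation that Getopt::Long provides by default; the option names mirror mbad-insert.pl's documented parameters, but this script itself is hypothetical, not Genex code:

  use strict;
  use warnings;
  use Getopt::Long;

  my %opt;
  GetOptions(\%opt,
      'ad_pk=i',      # --ad is a unique prefix
      'exp_pk=i',     # --exp
      'qtdim_pk=i',   # --qt
      'fe_sw_pk=i',   # --fe
  ) or die "bad options\n";

  die "need --qtdim_pk\n" unless defined $opt{qtdim_pk};
  print "qtdim_pk = $opt{qtdim_pk}\n";

Running `perl sketch.pl --qt 756` fills in $opt{qtdim_pk} exactly as `--qtdim_pk 756` would; an ambiguous prefix is rejected with an error rather than silently matching.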
From: Hrishikesh D. <hde...@gm...> - 2004-12-28 16:13:46

Jason,

I hope there were no problems in Madurai because of the tsunami?

I want to give a shot at using the command line interface to load "bulk" data. Could you please explain this part:

You use it like:

  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679

1) What are these numbers? I understand "ad" stands for ArrayDesign, but what are "ro" and "rw"? While you are at it, could you explain them all; I will doc them.

2) How do I get them?

3) I assume the .CEL files (they are not binary) which I have do not need to be renamed to .txt

Eagerly waiting for your reply.

Thanks,
Hrishi

Jason E. Stewart wrote:
> Hi Hrishi,
>
> Sorry, I let this one slip.
>
> hde...@gm... writes:
>
>> Can one do a "bulk" upload of data?
>> I have like 200 .CEL files for the dataset!
>
> Yes.
>
> You can either select 200 files in the GUI - maybe a bit tedious - or
> you can use the command line interface. To learn how to use it, type:
>
>   /usr/local/genex/bin/mbad-insert.pl --man
>
> This spits out the documentation (you can use --help for the brief
> docs if you just want a reminder of what the required options are).
> All the genex Perl scripts honor the --help flag; some of the more
> important ones now honor --man. I am converting the scripts a little
> at a time to the new self-documentation system.
>
> Anyway, the required parameters are:
>
>   --username=name  : the DB username to login as
>   --password=word  : the DB password to login with
>   --mba_name=name  : the MeasuredBioAssay name (list)
>   --pba_name=name  : the PhysicalBioAssay name (list)
>   --ad_pk=pk       : the ArrayDesign used
>   --exp_pk=pk      : the primary key of the Experiment
>   --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
>   --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
>   --ro_group_id=pk : the primary key of the read-only group
>   --rw_group_id=pk : the primary key of the read/write group
>
> You need to use the genex browsing interface to get all the pkey
> numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and
> write groups. The only tricky bit is --mba_name and --pba_name. These
> need to be specified once for each input file - in the same order
> that you list the files.
>
> To make things easier, I just wrote a helper utility,
> run-mbad-insert.pl, that reads the name and file information in from
> a tab-delimited text file.
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> Here is an example input file:
>
>   /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
>   /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
>   /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2
>
> Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is
> installed and tested on genex2 in /usr/local/genex/bin.
>
> BTW - the svn checkout directory on genex2 is:
>
>   /home/jasons/work/genex-server
>
> in case you need to update something.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-21 15:33:40

dcarr2 <dc...@gm...> writes:

> key 'DBNAME' does not exist at Perl/scripts/tabledef.pl line 25
> Died at /usr/local/genex/bin/create-genex-db.pl line 319, <IN> chunk 5.
> Died at Perl/scripts/gendb.pl line 115.
>
> FATAL ERROR

I was finally able to switch Bio::Genex::Config to use StrictHash - which gives a fatal error if you try to access a hash key that doesn't exist - and in so doing I've uncovered some errors, such as the one above, which used to fail silently but now cause a fatal error. I'm more than a little confused how this one snuck through the cracks... It must mean that I did not run a test install on either my laptop or on genex2... Very embarrassing.

I've just committed revision 1725, which fixes the problem.

Cheers,
jas.
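The thread doesn't show Bio::Genex::Config's StrictHash itself; a minimal sketch of the same fail-fast behavior using core Perl's Hash::Util restricted hashes (the config keys here are made up):

  use strict;
  use warnings;
  use Hash::Util qw(lock_keys);

  my %config = (DBNAME => 'genex', DBHOST => 'localhost');
  lock_keys(%config);            # freeze the set of legal keys

  print $config{DBNAME}, "\n";   # fine: key exists
  print $config{DBUSER}, "\n";   # dies: "Attempt to access disallowed key 'DBUSER' ..."

A key that would previously return undef and fail somewhere far downstream now dies at the point of access, which is exactly the class of error the 'DBNAME' report above exposed.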
From: dcarr2 <dc...@gm...> - 2004-12-20 20:16:14

Harry and Jason,

Due to Wellerlab2's crash a couple of weeks ago, I have had a chance to go back through the full installation process with the latest from genex2 on a 'virgin' system. I had fewer problems this time and got to the installation point, i.e. 'as root, make install'. I have received the following fatal error... I have also attached the install-errors file.

Thanks,
Andrew

---
key 'DBNAME' does not exist at Perl/scripts/tabledef.pl line 25
Died at /usr/local/genex/bin/create-genex-db.pl line 319, <IN> chunk 5.
Died at Perl/scripts/gendb.pl line 115.

FATAL ERROR
I got an error when I ran the DB installer.

!! System Error: @ line: 380
(su -c '/usr/bin/perl /home/acarr/genex-server/Perl/scripts/db.pl' postgres)

FATAL ERROR
I got an error when I ran the DB installer.
make: *** [install] Error 9
---
From: Harry M. <hj...@ta...> - 2004-12-17 05:27:24

On Thursday 16 December 2004 6:58 pm, Jason E. Stewart wrote:
> the main issue is the .ini-type format with the different header
> sections [CEL], [INTENSITY], etc. And there needs to be a blank
> line between sections. If you look in
> Perl/Bio-Genex/Genex/Connect.pm in the load_data_cel() method, you
> will see which sections I actually use, and which lines are used.
>
> Currently only the INTENSITY section is critical. I don't know what
> the MASKS, OUTLIERS, and MODIFIED sections are for, so I don't use
> them or store them in the DB.

OK - I've gotten that wrangled fine (and found out what the other sections are for as well, not that we need them). I'm just adding some globbing stuff so that it can be used to process multiple files at once, and I'll send it to you to see how it works. Still haven't gotten it to work in Inline, but there were other things wrong at the time as well (learning C++ as I go - seems to me that C++ is to C as Java is to Perl; hint - it's not a compliment...).

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-17 03:01:56

Harry J Mangalam <hj...@ta...> writes:

> I've been playing with the Affy code mentioned recently and have
> gotten it to compile, and validated that the CEL file stuff seems to
> work OK.
>
> I'm going to try hacking it a bit to produce the same kind of text
> output that the GCOS system exports (it runs faster than the GCOS
> export, and we can script it to do multiple files at once - an issue
> for Hrishi).
>
> Does the GeneX loader need the exact format that the current format
> uses, or if not, what are the critical bits that do need to be parsed
> out to load into GeneX? I'll try to duplicate the format as much as
> possible, but what's critical and what's not?

The main issue is the .ini-type format with the different header sections [CEL], [INTENSITY], etc. And there needs to be a blank line between sections. If you look in Perl/Bio-Genex/Genex/Connect.pm in the load_data_cel() method, you will see which sections I actually use, and which lines are used.

Currently only the INTENSITY section is critical. I don't know what the MASKS, OUTLIERS, and MODIFIED sections are for, so I don't use them or store them in the DB.

Cheers,
jas.
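A minimal sketch of parsing just the critical [INTENSITY] section out of a text CEL file - not the actual load_data_cel() code; the column layout follows the example file quoted later in this thread:

  use strict;
  use warnings;

  my $file = shift or die "usage: $0 file.CEL\n";
  open my $fh, '<', $file or die "open $file: $!";

  my $section = '';
  while (my $line = <$fh>) {
      chomp $line;
      if ($line =~ /^\[(\w+)\]/) {        # section header, e.g. [INTENSITY]
          $section = $1;
          next;
      }
      next unless $section eq 'INTENSITY';
      next if $line =~ /^(?:NumberCells|CellHeader)=/;   # section metadata
      next unless $line =~ /\S/;                          # blank line separates sections
      my ($x, $y, $mean, $stdv, $npixels) = split ' ', $line;
      print "spot ($x,$y): mean=$mean stdv=$stdv npixels=$npixels\n";
  }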
From: Harry J M. <hj...@ta...> - 2004-12-16 15:59:22

Hi Jason,

I've been playing with the Affy code mentioned recently and have gotten it to compile, and validated that the CEL file stuff seems to work OK.

I'm going to try hacking it a bit to produce the same kind of text output that the GCOS system exports (it runs faster than the GCOS export, and we can script it to do multiple files at once - an issue for Hrishi).

Does the GeneX loader need the exact format that the current format uses:

[CEL]
Version=3

[HEADER]
Cols=1164
Rows=1164
TotalX=1164
TotalY=1164
OffsetX=0
OffsetY=0
GridCornerUL=231 244
GridCornerUR=8422 219
GridCornerLR=8452 8419
GridCornerLL=261 8444
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[23..65534] 100704A-01_OLDTR7_+DOX:CLS=8660 RWS=8660 XIN=1 YIN=1 VE=30 2.0 11/02/04 12:41:30 50102870 M10 HG-U133_Plus_2.1sq 6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000

[INTENSITY]
NumberCells=1354896
CellHeader=X Y MEAN STDV NPIXELS
0 0 130.0 23.8 25
1 0 17316.0 2283.8 25
2 0 186.0 51.9 25
3 0 17501.0 2171.7 25
4 0 180.0 31.4 25
<<snip>>
1161 1163 240.0 58.7 25
1162 1163 17202.0 2896.0 25
1163 1163 176.0 34.6 25

[MASKS]
NumberCells=0
CellHeader=X Y

[OUTLIERS]
NumberCells=155
CellHeader=X Y
331 0
435 0
651 0
809 0
831 0
893 0
969 0
987 0
<<snip>>
439 1163
553 1163
627 1163

[MODIFIED]
NumberCells=0
CellHeader=X Y ORIGMEAN

or if not, what are the critical bits that do need to be parsed out to load into GeneX? I'll try to duplicate the format as much as possible, but what's critical and what's not?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-14 16:28:16

Harry Mangalam <hj...@ta...> writes:

> I had found this code this morning, and as you phoned I was watching
> the zipfile expand. Code looks pretty concise. I'll play around with
> it and see if I can get it to sit up and bark.
>
> re: Perl - XS or Inline?

XS is too much of a pain. Either Inline or SWIG.

Cheers,
jas.
From: SourceForge.net <no...@so...> - 2004-12-14 15:46:02

Bugs item #1085191, was opened at 2004-12-14 15:46
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116453&aid=1085191&group_id=16453

Category: Administrative Apps
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Harry Mangalam (mangalam)
Assigned to: Jason E. Stewart (jason_e_stewart)
Summary: loading cel files takes too much mem

Initial Comment:
During a load of a text-formatted CEL file from a u133_plus_2 array, after repeatedly crashing on a 512M/512M (real/swap) machine (load going up to ~20), it ran on a 2G/4G machine without problems, although the perl process ate almost 800MB. Watching the load, it was the perl script, not the postgres process, that ate the mem. Jason has said that the problem is due to perl building a huge data struct in mem, THEN writing it to the DB, instead of doing it on a smaller basis. Apparently an easy fix, but medium priority at this point, unless it impinges on loading multiple files at once. All the test machines (except my laptop) have enough mem to do this.
From: <ja...@op...> - 2004-12-14 01:33:44

hrishikesh deshmukh <d_h...@ya...> writes:

> --- "Wittner, Ben" <Wit...@mg...> wrote:
>
>> From: "Wittner, Ben" <Wit...@mg...>
>> To: "'bio...@st...'" <bio...@st...>
>> Date: Mon, 13 Dec 2004 08:04:12 -0500
>> Subject: [BioC] LGPL C++ source code available from Affy for reading
>> their file formats
>>
>> I think Bioconductor packages have not been able to read the newer
>> Affy file formats (for good reasons explained in a post by Rafael
>> I.). But I just got an e-mail from Affy saying that they are making
>> a library available under LGPL (see below). So, just in case anyone
>> who should know does not already know...
>>
>> File Parsers SDK - OPEN SOURCE
>> We are pleased to announce the release of C++ source code for the
>> parsing of Affymetrix CEL, CHP, CDF, BAR and BPMAP files. This source
>> code is being provided under the GNU Lesser General Public License
>> (LGPL).
>>
>> Download the SDK:
>> http://www.affymetrix.com/redirect/email.jsp?source=nl200412adn&dest=/support/developer/filesdk/index.affx

Ah, thanks Hrishi. This is definitely useful. I'll have to take a look. The only concern is that it is Windows-based code and will have to be modified for Unix. I'll toss it at SWIG and we'll have a Perl binding for it, and that will be useful for everyone.

Cheers,
jas.
From: <ja...@op...> - 2004-12-14 01:27:55

Harry Mangalam <hj...@ta...> writes:

> Great that you discovered it - that's exactly what it looked like,
> watching it run. I'd say that now that we know what it's doing, it's
> a medium priority to fix, not high. I'll just load on my larger
> system. I think that all the machines we're using have enough mem to
> hold this in place while loading.

Okee-dokee. Harry, can you mark this as a bug on SF so that we don't forget it?

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-12-13 19:01:31

Great that you discovered it - that's exactly what it looked like, watching it run. I'd say that now that we know what it's doing, it's a medium priority to fix, not high. I'll just load on my larger system. I think that all the machines we're using have enough mem to hold this in place while loading.

hjm

On Monday 13 December 2004 8:08 am, Jason E. Stewart wrote:
> ja...@op... (Jason E. Stewart) writes:
>> Harry Mangalam <hj...@ta...> writes:
>>> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>>>> So I'm guessing it's Postgres. We are running inside a
>>>> transaction, but so what. Could it be that the index is getting
>>>> so large that it needs its memory??? Not after half a million
>>>> rows, it better not be.
>>>
>>> I was watching both top and the KDE system guard on this pretty
>>> closely (activity by process) and it was clearly the perl
>>> process that was eating memory. Postgres/postmaster topped out
>>> at about 20MB.
>>
>> Well, that's good news I suppose... Fixing the Perl code is
>> easier than fixing Postgres. Let me investigate.
>
> OK. Sorry about that. It's definitely my script. Obviously, two
> weeks is too long a period for me to remember my own code. The CDF
> loader reads the whole file, builds a big array, and then inserts
> the ArrayDesign object all at once.
>
> So basically, it's a big Perl array that's sucking up memory. There
> is definitely a fix.
>
> Is this high priority or medium priority?
>
> Cheers,
> jas.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <ja...@op...> - 2004-12-13 16:11:46

ja...@op... (Jason E. Stewart) writes:

> Harry Mangalam <hj...@ta...> writes:
>
>> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>>
>>> So I'm guessing it's Postgres. We are running inside a transaction,
>>> but so what. Could it be that the index is getting so large that it
>>> needs its memory??? Not after half a million rows, it better not be.
>>
>> I was watching both top and the KDE system guard on this pretty
>> closely (activity by process) and it was clearly the perl process
>> that was eating memory. Postgres/postmaster topped out at about
>> 20MB.
>
> Well, that's good news I suppose... Fixing the Perl code is easier
> than fixing Postgres. Let me investigate.

OK. Sorry about that. It's definitely my script. Obviously, two weeks is too long a period for me to remember my own code. The CDF loader reads the whole file, builds a big array, and then inserts the ArrayDesign object all at once.

So basically, it's a big Perl array that's sucking up memory. There is definitely a fix.

Is this high priority or medium priority?

Cheers,
jas.
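The fix itself isn't shown in the thread; a sketch of the general technique - executing a prepared INSERT per parsed line instead of accumulating one big array - with hypothetical table and column names:

  use strict;
  use warnings;
  use DBI;

  my ($file, $ad_pk) = @ARGV;
  my $dbh = DBI->connect('dbi:Pg:dbname=genex', 'genex', 'XXX',
                         { AutoCommit => 0, RaiseError => 1 });
  my $sth = $dbh->prepare(
      'INSERT INTO arraydesign_feature (ad_fk, x, y, name) VALUES (?, ?, ?, ?)');

  open my $fh, '<', $file or die "open $file: $!";
  while (my $line = <$fh>) {
      chomp $line;
      my ($x, $y, $name) = split /\t/, $line;
      # each row goes straight to the DB; Perl never holds more than one line,
      # so memory stays flat no matter how large the CDF file is
      $sth->execute($ad_pk, $x, $y, $name);
  }
  $dbh->commit;    # still one transaction around the whole load
  $dbh->disconnect;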
From: <hde...@gm...> - 2004-12-13 16:00:29

Has the utility been updated on genex2, or do I need to copy it to genex2 from the path that is given?

Thanks,
Hrishi

____________________________________________________________________
'Life's battles don't always go to the stronger or faster man. But sooner or later the man who wins is the one who thinks he can.' - Walter D. Wintle

----- Original Message -----
From: ja...@op... (Jason E. Stewart)
Date: Monday, December 13, 2004 10:54 am
Subject: Re: [GeneX-dev] Re:

> Hi Hrishi,
>
> Sorry, I let this one slip.
>
> hde...@gm... writes:
>
>> Can one do a "bulk" upload of data?
>> I have like 200 .CEL files for the dataset!
>
> Yes.
>
> You can either select 200 files in the GUI - maybe a bit tedious - or
> you can use the command line interface. To learn how to use it, type:
>
>   /usr/local/genex/bin/mbad-insert.pl --man
>
> This spits out the documentation (you can use --help for the brief
> docs if you just want a reminder of what the required options are).
> All the genex Perl scripts honor the --help flag; some of the more
> important ones now honor --man. I am converting the scripts a little
> at a time to the new self-documentation system.
>
> Anyway, the required parameters are:
>
>   --username=name  : the DB username to login as
>   --password=word  : the DB password to login with
>   --mba_name=name  : the MeasuredBioAssay name (list)
>   --pba_name=name  : the PhysicalBioAssay name (list)
>   --ad_pk=pk       : the ArrayDesign used
>   --exp_pk=pk      : the primary key of the Experiment
>   --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
>   --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
>   --ro_group_id=pk : the primary key of the read-only group
>   --rw_group_id=pk : the primary key of the read/write group
>
> You need to use the genex browsing interface to get all the pkey
> numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and
> write groups. The only tricky bit is --mba_name and --pba_name. These
> need to be specified once for each input file - in the same order
> that you list the files.
>
> To make things easier, I just wrote a helper utility,
> run-mbad-insert.pl, that reads the name and file information in from
> a tab-delimited text file.
>
> You use it like:
>
>   run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
>       --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679
>
> Here is an example input file:
>
>   /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
>   /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
>   /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2
>
> Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is
> installed and tested on genex2 in /usr/local/genex/bin.
>
> BTW - the svn checkout directory on genex2 is:
>
>   /home/jasons/work/genex-server
>
> in case you need to update something.
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-13 16:00:28

Harry Mangalam <hj...@ta...> writes:

> On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:
>
>> So I'm guessing it's Postgres. We are running inside a transaction,
>> but so what. Could it be that the index is getting so large that it
>> needs its memory??? Not after half a million rows, it better not be.
>
> I was watching both top and the KDE system guard on this pretty
> closely (activity by process) and it was clearly the perl process
> that was eating memory. Postgres/postmaster topped out at about
> 20MB.

Well, that's good news I suppose... Fixing the Perl code is easier than fixing Postgres. Let me investigate.

Cheers,
jas.
From: <ja...@op...> - 2004-12-13 15:57:44

Hi Hrishi,

Sorry, I let this one slip.

hde...@gm... writes:

> Can one do a "bulk" upload of data?
> I have like 200 .CEL files for the dataset!

Yes.

You can either select 200 files in the GUI - maybe a bit tedious - or you can use the command line interface. To learn how to use it, type:

  /usr/local/genex/bin/mbad-insert.pl --man

This spits out the documentation (you can use --help for the brief docs if you just want a reminder of what the required options are). All the genex Perl scripts honor the --help flag; some of the more important ones now honor --man. I am converting the scripts a little at a time to the new self-documentation system.

Anyway, the required parameters are:

  --username=name  : the DB username to login as
  --password=word  : the DB password to login with
  --mba_name=name  : the MeasuredBioAssay name (list)
  --pba_name=name  : the PhysicalBioAssay name (list)
  --ad_pk=pk       : the ArrayDesign used
  --exp_pk=pk      : the primary key of the Experiment
  --qtdim_pk=pk    : the primary key of the QuantitationTypeDimension
  --fe_sw_pk=pk    : the primary key of the FeatureExtraction SW
  --ro_group_id=pk : the primary key of the read-only group
  --rw_group_id=pk : the primary key of the read/write group

You need to use the genex browsing interface to get all the pkey numbers for Experiment, ArrayDesign, QTD, FE SW, and the read and write groups. The only tricky bit is --mba_name and --pba_name. These need to be specified once for each input file - in the same order that you list the files.

To make things easier, I just wrote a helper utility, run-mbad-insert.pl, that reads the name and file information in from a tab-delimited text file.

You use it like:

  run-mbad-insert.pl --file /tmp/file.txt --user genex --pass XXX \
      --ad 1133 --exp 1195 --qt 756 --fe 752 --ro 671 --rw 679

Here is an example input file (fields are tab-separated):

  /usr/local/genex/uploads/genex/200-short.txt PBA 200-short2 MBA 200-short2
  /usr/local/genex/uploads/genex/201-short.txt PBA 201-short2 MBA 201-short2
  /usr/local/genex/uploads/genex/202-short.txt PBA 202-short2 MBA 202-short2

Spaces are allowed in the names, just not tabs. run-mbad-insert.pl is installed and tested on genex2 in /usr/local/genex/bin.

BTW - the svn checkout directory on genex2 is:

  /home/jasons/work/genex-server

in case you need to update something.

Cheers,
jas.
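A small helper of the sort that could generate run-mbad-insert.pl's tab-delimited input file from a directory of uploaded files - a sketch, not part of Genex, and the PBA/MBA naming scheme is invented here:

  use strict;
  use warnings;
  use File::Basename qw(basename);

  # one line per data file: <path> TAB <PBA name> TAB <MBA name>
  # (spaces are allowed in the names, tabs are not)
  for my $path (sort glob '/usr/local/genex/uploads/genex/*.txt') {
      (my $stem = basename($path)) =~ s/\.txt\z//;
      print join("\t", $path, "PBA $stem", "MBA $stem"), "\n";
  }

Redirect the output to a file (`perl make-load-list.pl > /tmp/file.txt`) and pass that path to --file.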
From: Harry M. <hj...@ta...> - 2004-12-13 15:06:48

On Monday 13 December 2004 4:10 am, Jason E. Stewart wrote:

> So I'm guessing it's Postgres. We are running inside a transaction,
> but so what. Could it be that the index is getting so large that it
> needs its memory??? Not after half a million rows, it better not be.

I was watching both top and the KDE system guard on this pretty closely (activity by process) and it was clearly the perl process that was eating memory. Postgres/postmaster topped out at about 20MB. Here's my hand-typed log of the load:

---
running it on bonk, load is ~1 but mem usage is creeping up: 250M, 374, 386, 398, 410, 422, 434, 446, 458, 470, 482, 494, 507, 519, 525, 544, 555, 567, 579, 591, 603 (typed #s as fast as I could)
load still only 1.5, 652, no swap yet, 687, 700, 731, 735, now loading
postmaster mem static at 735; postmaster is at 18M; load is down a bit at 1.2
mem increases slightly at 737; now load is up to 1.9
now postgres is 80% of load (but only 20M mem usage), perl is 9-11%
web page reports 1.2M reporters found, same # of features; load is up to 2.4, perl mem is up to 740, postgres is around 12-18M, still no swapping
postgres is now 88% of load, perl about 13%, load down to 1.7 (bouncing but not above 3)
ok - loaded the cdf file in 5881 sec
looks like it finished correctly, although we still got an ERROR msg "Can't ignore signal CHLD, forcing to default"
---

Could it have been that DBI was caching all this stuff internally? I don't think postgres memory would be showing up under a perl process unless it truly was using it as a library and 'inherited' the memory use.

> Harry, can you post a message about this to the postgres-users list?

Sure, but how can we make sure that it's not a runaway perl array first? The processes seem to indicate that it's perl, not postgres. Is there magic to monitor sizes of perl data structures without explicitly tracking each one?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
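There is such magic, although the thread never names it: the CPAN module Devel::Size reports the deep size of any structure. A sketch - the growing array here is a stand-in for whatever the loader actually builds:

  use strict;
  use warnings;
  use Devel::Size qw(total_size);

  my @rows;
  while (my $line = <STDIN>) {
      push @rows, [ split /\t/, $line ];
      # print the deep size of the structure every 100,000 rows
      printf STDERR "rows=%d total_size=%.1f MB\n",
          scalar @rows, total_size(\@rows) / 2**20
              if @rows % 100_000 == 0;
  }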
From: <hde...@gm...> - 2004-12-13 13:57:54

Jason,

Can one do a "bulk" upload of data? I have like 200 .CEL files for the dataset! Waiting for your reply.

Thanks,
Hrishi
From: <ja...@op...> - 2004-12-13 12:13:54

Harry Mangalam <hj...@ta...> writes:

> On Sunday 12 December 2004 6:01 am, Jason E. Stewart wrote:
>> FYI, in honor of Harry's valiant
>
> thanks for not accurately saying "stupid and pointless struggle"

It reminded me of the time Peter Hraber and I spent two days trying to load the very first data set into genex1 - we simply could not believe the number of discrepancies in the data file: yeast ORF names that didn't exist, rows with different numbers of columns, etc., etc. GIGO...

>> Harry, is this possibly the cause of the massive thrashing you
>> experienced when loading the test data? I was surprised by the
>> memory requirements, but I don't see how this could be related
>> (just thrashing for a cause).
>
> NO - I'm pretty sure, after watching the lappie die repeatedly and
> then watching it go to completion on the 2GB machine, that it was
> running out of mem. It is a /100MB/ CDF file, and I can send you my
> (typed) log of the thing as it was running. IIRC, mem usage for it
> went to ~800MB (real RAM), which it could, as I had 2GB available. It
> never went to swap.
>
> On the lappie, once it started swapping, the game was over. (I have
> 512MB real but only 512MB swap on the lappie, so what with the other
> things running, it was very close to the limit, and while it was
> negotiating with itself for the last few bytes of vmem, it just
> stopped responding to anything.)

I don't know why this is happening, but it's bad, whatever the reason. I'd like to debug this a bit. My code parses the .CDF file and sends an INSERT directly to the DB => one line of the file, one INSERT. Perl isn't storing *any* data at all (except for a line-number counter - but that ain't chewing up >800MB).

So I'm guessing it's Postgres. We are running inside a transaction, but so what. Could it be that the index is getting so large that it needs its memory??? Not after half a million rows, it better not be.

Harry, can you post a message about this to the postgres-users list?

Cheers,
jas.
From: Harry M. <hj...@ta...> - 2004-12-12 17:20:58

On Sunday 12 December 2004 6:01 am, Jason E. Stewart wrote:

> FYI, in honor of Harry's valiant

thanks for not accurately saying "stupid and pointless struggle"

> struggle against the binary .CEL
> files, I've added an error check to see if the files are actually
> text files or binary ... interestingly enough, the unix 'file'
> utility seems to think that text .CEL files are 'data' and not text,
> so I have to be more clever than just using 'file' to figure it
> out...
>
> Harry, is this possibly the cause of the massive thrashing you
> experienced when loading the test data? I was surprised by the
> memory requirements, but I don't see how this could be related
> (just thrashing for a cause).

NO - I'm pretty sure, after watching the lappie die repeatedly and then watching it go to completion on the 2GB machine, that it was running out of mem. It is a /100MB/ CDF file, and I can send you my (typed) log of the thing as it was running. IIRC, mem usage for it went to ~800MB (real RAM), which it could, as I had 2GB available. It never went to swap.

On the lappie, once it started swapping, the game was over. (I have 512MB real but only 512MB swap on the lappie, so what with the other things running, it was very close to the limit, and while it was negotiating with itself for the last few bytes of vmem, it just stopped responding to anything.) If the CDF file were smaller, or I had allocated 1GB of swap, it probably could have made it.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
From: <hde...@gm...> - 2004-12-12 16:00:41

Hi Jason,

I assume I have to load one .CEL at a time; is there a way to load in bulk?

Thanks,
Hrishi

____________________________________________________________________
'Life's battles don't always go to the stronger or faster man. But sooner or later the man who wins is the one who thinks he can.' - Walter D. Wintle

----- Original Message -----
From: ja...@op... (Jason E. Stewart)
Date: Sunday, December 12, 2004 9:01 am
Subject: [GeneX-dev] Re: How to create AD file for Affymetrix data

> Harry Mangalam <hj...@ta...> writes:
>
>> The problem on my end, I've confirmed, is that the CEL files were in
>> binary format, not text (what an idiot) and so completely confused
>> the loader.
>
> FYI, in honor of Harry's valiant struggle against the binary .CEL
> files, I've added an error check to see if the files are actually
> text files or binary ... interestingly enough, the unix 'file'
> utility seems to think that text .CEL files are 'data' and not text,
> so I have to be more clever than just using 'file' to figure it
> out...
>
>> I've been fighting with samba and disk space all morning to finally
>> discover that while I have all the right programs installed, the
>> 600MB of space I have left is nowhere near enough to extract the
>> text data files, so I'm going to have to go down to my wife's lab
>> and do it there. I hope to have this done sometime today, but I
>> dunno when.
>
> Harry, is this possibly the cause of the massive thrashing you
> experienced when loading the test data? I was surprised by the
> memory requirements, but I don't see how this could be related
> (just thrashing for a cause).
>
> Cheers,
> jas.
From: <ja...@op...> - 2004-12-12 14:05:08

Harry Mangalam <hj...@ta...> writes:

> The problem on my end, I've confirmed, is that the CEL files were in
> binary format, not text (what an idiot) and so completely confused
> the loader.

FYI, in honor of Harry's valiant struggle against the binary .CEL files, I've added an error check to see if the files are actually text files or binary ... interestingly enough, the unix 'file' utility seems to think that text .CEL files are 'data' and not text, so I have to be more clever than just using 'file' to figure it out...

> I've been fighting with samba and disk space all morning to finally
> discover that while I have all the right programs installed, the
> 600MB of space I have left is nowhere near enough to extract the text
> data files, so I'm going to have to go down to my wife's lab and do
> it there. I hope to have this done sometime today, but I dunno when.

Harry, is this possibly the cause of the massive thrashing you experienced when loading the test data? I was surprised by the memory requirements, but I don't see how this could be related (just thrashing for a cause).

Cheers,
jas.
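Perl's built-in -T/-B file tests are one way to be 'more clever' than file(1); a sketch of that kind of check, not the actual Genex code:

  use strict;
  use warnings;

  my $path = shift or die "usage: $0 file.CEL\n";

  # -B reads the first block of the file and guesses 'binary' if it
  # sees NUL bytes or too many non-printable characters; -T is the
  # complementary 'looks like text' heuristic.
  die "$path looks like a binary .CEL file - export it as text first\n"
      if -B $path;
  print "$path looks like text; OK to hand to the loader\n";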
From: <ja...@op...> - 2004-12-11 17:37:04

Hrishikesh Deshmukh <hde...@gm...> writes:

> Here is the snapshot where I have to confirm QT. I looked at the
> files you had in /home/jasons/data/Lymph-data/ and my .CEL files look
> the same.

Yup, they are kosher.

> So please take a look at the snapshot and tell me whether this is
> right/wrong?!

They are right.

> Harry, is this what you see, or is there some difference?

Nope, Harry had binary .CEL files.

> All I did was insert the experiment and then MBAD, and then select
> the right parameters from the drop-down list, and here is the
> output/result:
>
>   Genex Job Status Page
>
>   Your job is finished. The status is: SUCCESS

That means your data is in the DB, all safe and sound. If we had a nice report sheet, you could actually see how many spots were entered, etc. But we don't. I'd love for people to give some ideas for useful reports. Apparently BASE has some really nice ones.

Cheers,
jas.