From: Chris S. <sto...@pc...> - 2007-10-05 02:01:22
Hi Haiming,

Another view of NAFeature used to handle GenBank qualifiers (that we were not using at the time the DoTS schema was created) is DoTS.Miscellaneous:
http://www.gusdb.org/SchemaBrowser/table.htm?schema=DoTS&table=Miscellaneous

Note that most of the columns are of type string so some of this information could be packed into EVIDENCE or PCR_CONDITIONS. Between that view and DoTS.Source you might find a field to put things. Not a clean solution but may be sufficient for now.

What is the anticipated use of those pieces of information? Is this primarily archival? descriptive? Or will there be queries of the nature: provide all isolates from the USA collected after the year 2000? For the first two cases, proper structuring is less urgent but for the latter we would certainly want to add the appropriate attributes. Frank is working on a 3.6 version of GUS and this could be a candidate for inclusion.

Cheers,
Chris

Chris Stoeckert, Ph.D.
Research Professor, Dept. of Genetics
1415 Blockley Hall, Center for Bioinformatics
423 Guardian Dr., University of Pennsylvania
Philadelphia, PA 19104
Ph: 215-573-4409 FAX: 215-573-3111
http://www.cbil.upenn.edu

On Oct 4, 2007, at 11:08 AM, Haiming Wang wrote:

> Hi,
>
> I'm loading isolate genbank records into GUS using
> GUS::Supported::Plugin::InsertSequenceFeatures. It works well
> except some qualifiers (under source) do not have corresponding
> columns in DOTS.SOURCE table. Those qualifiers include "country,
> collection_date, collected_by, environmental_sample, PCR_primers,
> virion...", such as
>
> /organism="Cryptosporidium environmental sequence"
> /mol_type="genomic DNA"
> /isolation_source="environmental water sample"
> /db_xref="taxon:310753"
> /clone="JF#1"
> /environmental_sample
> /country="USA: Massachusetts"
> /collection_date="Jun-2001"
> /collected_by="Kristen Jellison"
> PCR_primers=fwd_name: CPB-DIAGF, rev_name:CPB-DIAGR"
> ...
> > Please refer to the attached genbank file - 117671303.gb, for details > > I'd like to know if we intend to add those qualifier-match columns > in DOTS.SOURCE table. Otherwise, we need several special handlers > to deal with them. Thanks for your inputs! > > -Haiming > > > p.s. all the qualifiers in Genbank source > > Feature Key source > > Mandatory qualifiers /organism="text" > /mol_type="genomic DNA", "genomic RNA", "mRNA"... > Optional qualifiers /cell_line="text" > /cell_type="text" > /chromosome="text" > /citation=[number] > /clone="text" > /clone_lib="text" > /collected_by="text" > /collection_date="text" > /country="<country_value>[:<region>][, > <locality>]" > /cultivar="text" > /db_xref="<database>:<identifier>" > /dev_stage="text" > /ecotype="text" > /environmental_sample > /focus > /frequency="text" > /germline > /haplotype="text" > /identified_by="text" > /isolate="text" > /isolation_source="text" > /label=feature_label > /lab_host="text" > /lat_lon="text" > /macronuclear > /map="text" > /note="text" > /organelle=<organelle_value> > /PCR_primers="[fwd_name: XXX, ]fwd_seq: xxxxx, > [rev_name: YYY, ]rev_seq: yyyyy" > /plasmid="text" > /pop_variant="text" > /proviral > /rearranged > /segment="text" > /serotype="text" > /serovar="text" > /sex="text" > /specimen_voucher="text" > /specific_host="text" > /strain="text" > /sub_clone="text" > /sub_species="text" > /sub_strain="text" > /tissue_lib="text" > /tissue_type="text" > /transgenic > /variety="text" > /virion > LOCUS EF060289 425 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#1 18S > small subunit > ribosomal RNA gene, partial sequence. > ACCESSION EF060289 > VERSION EF060289.1 GI:117671303 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. 
> REFERENCE 1 (bases 1 to 425) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 425) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..425 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#1" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Jun-2001" > /collected_by="Kristen Jellison" > /PCR_primers="fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>425 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gctaatattc tatgtaaatg cttttgcctt > tacatgttgt > 61 tagccttccc cgtattactg cttcagtatg cggattttac tttgagaaaa > ttagagtgct > 121 taaagcaggc ttttgccttg aatactccag catggaataa tattaaagat > ttttatcttt > 181 cttattggtt ctaagataaa aataatgatt aatagggaca gttgggggca > tttgtattta > 241 acagtcagag gtgaaattct tagatttgtt aaagacaaac tagtgcgaaa > gcatttgcca > 301 aggatgtttt cattaatcaa gaacgaaagt taggggatcg aagacgatca > gataccgtcg > 361 tagtcttaac cataaactat gccaactaga gattggaggt tgttccttac > tccttcagca > 421 cctta > // > LOCUS EF060290 425 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#2 18S > small subunit > ribosomal RNA gene, partial sequence. > ACCESSION EF060290 > VERSION EF060290.1 GI:117671304 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. 
> REFERENCE 1 (bases 1 to 425) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 425) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..425 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#2" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Jun-2001" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>425 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gctaatattc tatgtaaatg cttttgcctt > tacatgttgt > 61 tagccttccc cgtattactg cttcagtatg cggattttac tttgagaaaa > ttagagtgct > 121 taaagcaggc tcttgccttg aatactccag catggaataa taccaaggat > ttttgtcctt > 181 cttattggtt ctaggataga aataatgatt aatagggaca gttgggggca > tttgtattta > 241 acagccagag gtgaaattct tagacttgtt aaagacaaac tagtgcgaaa > gcatttgcca > 301 aggatgtttt cattaatcaa gaacgaaagt taggggatcg aagacgatca > gataccgtcg > 361 tagtcttaac cataaactat gccaactaga gattggaggt tgttccttac > tccttcagca > 421 cctta > // > LOCUS EF060291 429 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#3 18S > small subunit > ribosomal RNA gene, partial sequence. > ACCESSION EF060291 > VERSION EF060291.1 GI:117671305 > KEYWORDS ENV. 
> SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 429) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 429) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..429 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#3" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Aug-2001" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>429 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttagtgtct tatatcttgt actttattgt > acttgtatag > 61 tactaacata attcatatta ctttttagta ttatgaaatt ttactttgag > aaaattagag > 121 tgcttaaagc aggcttttgc cttgaatact ccagcatgga ataatacgaa > ggatttttat > 181 ctttcttatt ggttctaaga taaaaataat ggttaatagg aacagttggg > ggcatttgta > 241 tttaacagtc agaggtgaaa ttcttagatt tgttaaagac aaactaatgc > gaaagcattt > 301 gccaaggatg ttttcattaa tcaagaacga aagttagggg atcgaagacg > atcagatacc > 361 gtcgtagtct taaccataaa ctatgccaac tagagattgg aggttgttcc > ttactccttc > 421 agcacctta > // > LOCUS EF060292 430 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#4 18S > small subunit > ribosomal RNA gene, partial sequence. 
> ACCESSION EF060292 > VERSION EF060292.1 GI:117671306 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 430) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 430) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..430 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#4" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Nov-2001" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>430 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttaattgtt tatatgattt atcttataat > atttatatag > 61 cattaacata attcatatta ctatttttag tatatgaaat tttactttga > gaaaattaga > 121 gtgcttaaag caggcttttg ccttgaatac tccagcatgg aataatatta > aagattttta > 181 tctttcttat tggttctaag atagaaataa tgattaatag ggacagttgg > gggcatttgt > 241 atttaacagt cagaggtgaa attcttagat ttgttaaaga caaactagtg > cgaaagcatt > 301 tgccaaggat gttttcatta atcaagaacg aaagttaggg gatcgaagac > gatcagatac > 361 cgtcgtagac ttaaccataa actatgccaa ctagagattg gaggttgttc > cttactcctt > 421 cagcacctta > // > LOCUS EF060293 432 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#5 18S > small subunit > ribosomal RNA gene, partial 
sequence. > ACCESSION EF060293 > VERSION EF060293.1 GI:117671307 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 432) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 432) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..432 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#5" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Nov-2001" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>432 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttaattgtt tatatttgtt atctattgat > acttgtataa > 61 cactaacata attcatatta ctatttatct agtatatgaa agtttacttt > gagaaaatta > 121 gagtgcttaa agcaggcttt tgccttgaat actccaacat ggaataatat > aaaagatttt > 181 tatctttctt attggttcta agatagaaat aatgattaat agggacagtt > gggggcattt > 241 gtatttaaca gtcagaggtg aaattcttag atttgttaaa gacaaactag > tgcgaaagca > 301 tttgccaagg atgttttcat taatcaagaa cgaaagttag gggatcgaag > acgatcagat > 361 accgccgtag tcttaaccat aaactatgcc aactagagat tggaggttgt > tccttactcc > 421 ttcagcacct ta > // > LOCUS EF060294 432 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone JF#6 18S > small subunit > ribosomal RNA 
gene, partial sequence. > ACCESSION EF060294 > VERSION EF060294.1 GI:117671308 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 432) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 2 (bases 1 to 432) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..432 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="JF#6" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Nov-2001" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>432 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttaattgtt tatatttgtt atctattgat > acttgtataa > 61 cattaacata attcatatta ctatttatct agtatatgaa attttacttt > gagaaaatta > 121 gagtgcttaa agcaggcttt tgccttgaat actccagcat ggaataatat > taaagatttt > 181 tatcattctt attggttcta agatagaaat aatgattaat agggacagtt > gggggcattt > 241 gtatttaaca gtcagaggtg aaattcttag atttgttaaa gacaaactag > tgcgaaagca > 301 tttgccaagg atgttttcat taatcaagaa cgaaagttag gggatcgaag > acgatcagat > 361 accgtcgtag tcataaccat aaactatgcc aactagagat tggaggttgt > tccttactcc > 421 ttcagcacct ta > // > LOCUS EF060295 425 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone Cow 18S > small subunit > 
ribosomal RNA gene, partial sequence. > ACCESSION EF060295 > VERSION EF060295.1 GI:117671309 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 425) > AUTHORS Jellison,K.L., Hemond,H.F. and Schauer,D.B. > TITLE Sources and species of Cryptosporidium oocysts in the > Wachusett > Reservoir watershed > JOURNAL Appl. Environ. Microbiol. 68 (2), 569-575 (2002) > PUBMED 11823192 > REFERENCE 2 (bases 1 to 425) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 3 (bases 1 to 425) > AUTHORS Jellison,K.L., Distel,D.L. and Hemond,H.F. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. 
Packer > Avenue, > Bethlehem, PA 18015, United States > FEATURES Location/Qualifiers > source 1..425 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="adult Bos taurus" > /db_xref="taxon:310753" > /clone="Cow" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Jun-2000" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>425 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttaattttt atatataata tcacgatatt > tatataatat > 61 taacataatt catattactt tttagtatat gaaactttac tttgagaaaa > ttagagtgct > 121 taaagcaggc tattgccttg aatactccag catggaataa tattaaggat > ttttattctt > 181 cttattggtt ctagaataaa aatgatgatt aatagggaca gttgggggca > tttgtattta > 241 acagtcagag gtgaaattct tagatttgtt aaagacaaac tactgcgaaa > gcatttgcca > 301 aggatgtttt cattaatcaa gaacgaaagt taggggatcg aagacgatca > gataccgtcg > 361 tagtcttaac cattaactat gccaactaga gattggaggt tgttccttac > tccttcagca > 421 cctta > // > LOCUS EF060296 431 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone Manure 18S > small > subunit ribosomal RNA gene, partial sequence. > ACCESSION EF060296 > VERSION EF060296.1 GI:117671310 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 431) > AUTHORS Jellison,K.L., Hemond,H.F. and Schauer,D.B. > TITLE Sources and species of Cryptosporidium oocysts in the > Wachusett > Reservoir watershed > JOURNAL Appl. Environ. Microbiol. 68 (2), 569-575 (2002) > PUBMED 11823192 > REFERENCE 2 (bases 1 to 431) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. 
oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 3 (bases 1 to 431) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..431 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="farm manure pit" > /db_xref="taxon:310753" > /clone="Manure" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Jun-2000" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>431 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gttgtataat ttataatatt accaaggtaa > ttattatatt > 61 atcaacatcc ttcctattat attctaaata tataggaaat tttactttga > gaaaattaga > 121 gtgcttaaag caggcaactg ccttgaatac tccagcatgg aataataagt > aaggactttt > 181 gtctttctta ttggttctag gacaaaagta atggttaata gggacagttg > ggggcattcg > 241 tatttaacag ccagaggtga aattcttaga tttgttaaag acgaactact > gcgaaagcat > 301 ttgccaagga tgttttcatt aatcaagaac gaaagttagg ggatcgaaga > cgatcagata > 361 ccgtcgtagt cttaaccata aactatgccg actagagatt ggaggttgtt > ccttactcct > 421 tcagcacctt a > // > LOCUS EF060297 431 bp DNA linear ENV > 14-NOV-2006 > DEFINITION Cryptosporidium environmental sequence clone SF 18S > small subunit > ribosomal RNA gene, partial sequence. > ACCESSION EF060297 > VERSION EF060297.1 GI:117671311 > KEYWORDS ENV. > SOURCE Cryptosporidium environmental sequence > ORGANISM Cryptosporidium environmental sequence > Eukaryota; Alveolata; Apicomplexa; Coccidia; > Eucoccidiorida; > Eimeriorina; Cryptosporidiidae; Cryptosporidium; > environmental > samples. > REFERENCE 1 (bases 1 to 431) > AUTHORS Jellison,K.L., Hemond,H.F. and Schauer,D.B. > TITLE Sources and species of Cryptosporidium oocysts in the > Wachusett > Reservoir watershed > JOURNAL Appl. 
Environ. Microbiol. 68 (2), 569-575 (2002) > PUBMED 11823192 > REFERENCE 2 (bases 1 to 431) > AUTHORS Jellison,K.L., Distel,D.L., Hemond,H.F. and Schauer,D.B. > TITLE Phylogenetic analysis implicates birds as an important > source of > Cryptosporidium spp. oocysts in agricultural watersheds > JOURNAL Unpublished > REFERENCE 3 (bases 1 to 431) > AUTHORS Jellison,K.L. > TITLE Direct Submission > JOURNAL Submitted (12-OCT-2006) Civil and Environmental > Engineering, Lehigh > University, Fritz Engineering Laboratory, 13 E. Packer > Avenue, > Bethlehem, PA 18015, USA > FEATURES Location/Qualifiers > source 1..431 > /organism="Cryptosporidium environmental > sequence" > /mol_type="genomic DNA" > /isolation_source="environmental water sample" > /db_xref="taxon:310753" > /clone="SF" > /environmental_sample > /country="USA: Massachusetts" > /collection_date="Mar-1999" > /collected_by="Kristen Jellison" > /note="PCR_primers=fwd_name: CPB-DIAGF, rev_name: > CPB-DIAGR" > rRNA <1..>431 > /product="18S small subunit ribosomal RNA" > ORIGIN > 1 aagctcgtag ttggatttct gctgtataat ttataatatt accaaggtaa > ttattatatt > 61 atcaacatcc ttcctattat attctaaata tataggaaat tttactttga > gaaaattaga > 121 gtgcttaaag caggcaactg ccttgaatac tccagcatgg aataataagt > aaggactttt > 181 gtctttctta ttggttctag gacaaaagta atggttaata gggacagttg > ggggcattcg > 241 tatttaacag ccagaggtga agttcttaga tttgttaaag acgaactact > gcgaaagcat > 301 ttgccaagga tgttttcatt aatcaagaac gaaagttagg ggatcgaaga > cgatcagata > 361 ccgtcgtagt cttaaccata aactatgccg actagagatt ggaggttgtt > ccttactcct > 421 tcagcacctt a > // > > _______________________________________________ > ApiDB mailing list > Ap...@pc... > https://mail.pcbi.upenn.edu/mailman/listinfo/apidb |
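To make the question in this thread concrete, here is a minimal Python sketch (not part of GUS, whose plugins are written in Perl) that reads the source qualifiers quoted above, reports which ones have no target column, and answers the "isolates from the USA collected after 2000" style of query. The set `ASSUMED_SOURCE_COLUMNS` is purely hypothetical; the real DoTS.Source columns have to be checked in the schema browser. The parser also assumes one qualifier per line and does not handle values wrapped across lines.

```python
import re

# Hypothetical: qualifiers assumed to already have a column in DoTS.Source.
# Check the real column list in the GUS schema browser before relying on it.
ASSUMED_SOURCE_COLUMNS = {"organism", "mol_type", "isolation_source", "clone", "db_xref"}

# /name="quoted value", /name=bare_value, or a valueless flag such as /virion
QUALIFIER = re.compile(r'^/(\w+)(?:=(?:"([^"]*)"|(\S+)))?$')


def parse_source_qualifiers(lines):
    """Return {qualifier: value}; valueless flags map to True."""
    qualifiers = {}
    for line in lines:
        match = QUALIFIER.match(line.strip())
        if not match:
            continue  # not a single-line qualifier; skipped in this sketch
        name, quoted, bare = match.groups()
        if quoted is not None:
            qualifiers[name] = quoted
        elif bare is not None:
            qualifiers[name] = bare
        else:
            qualifiers[name] = True
    return qualifiers


def split_by_column(qualifiers, columns=ASSUMED_SOURCE_COLUMNS):
    """Separate qualifiers that have a target column from those that do not."""
    mapped = {k: v for k, v in qualifiers.items() if k in columns}
    unmapped = {k: v for k, v in qualifiers.items() if k not in columns}
    return mapped, unmapped


def collected_in_country_after(qualifiers, country, year):
    """True if /country names `country` and /collection_date has a later year."""
    place = qualifiers.get("country")
    date = qualifiers.get("collection_date")
    if not isinstance(place, str) or not isinstance(date, str):
        return False
    found = re.search(r"\b(\d{4})\b", date)  # first four-digit number is the year
    if found is None:
        return False
    return place.split(":")[0].strip() == country and int(found.group(1)) > year


example = [
    '/organism="Cryptosporidium environmental sequence"',
    '/mol_type="genomic DNA"',
    '/isolation_source="environmental water sample"',
    '/db_xref="taxon:310753"',
    '/clone="JF#1"',
    '/environmental_sample',
    '/country="USA: Massachusetts"',
    '/collection_date="Jun-2001"',
    '/collected_by="Kristen Jellison"',
]
quals = parse_source_qualifiers(example)
mapped, unmapped = split_by_column(quals)
print(sorted(unmapped))
print(collected_in_country_after(quals, "USA", 2000))
```

If the unmapped values are only packed into free-text string columns, a query like the last line needs this kind of parsing at query time, which is the argument for adding proper attributes when such queries are expected.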
From: Margarita R. <rui...@ya...> - 2007-10-03 20:36:26
Hello GusDev members,

I need to load CD-HIT result files into GUS. If somebody has already done something similar, any pointers would help me.

My idea is as follows. The input file format (CD-HIT results) is:

    >Cluster 0
    > 0 17392aa, >AAZ14281.1... *
    > 1 17392aa, >XP_843163.1... at 100%
    > 2 17392aa, >XP_843163.1... at 100%
    >Cluster 1
    > 0 10589aa, >AAN35571.1... *
    > 1 10589aa, >XP_001347658.1... at 100%
    > 2 10589aa, >XP_001347658.1... at 100%
    >Cluster 2
    > 0 10287aa, >XP_966264.1... *
    > 1 10287aa, >XP_966264.1... at 100%
    >Cluster 3
    > 0 10061aa, >CAD51479.1... *
    > 1 10061aa, >XP_001351672.1... at 100%
    > 2 10061aa, >XP_001351672.1... at 100%

I plan to use three tables for this: Dots.sequencesequencegroup, Dots.sequencegroup and Dots.seqgroupexperiment.

In the Dots.seqgroupexperiment table, I'll put the description of the CD-HIT run that was executed. Ex.:

    Dots.SeqGroupExperiment.description = "tcruzi vs tcruzi 100%"
    Dots.SeqGroupExperiment.sequence_source = "tcruzi"
    Dots.SeqGroupExperiment.percent_identity = "1"

For the groups, I'll use Dots.sequencegroup. Ex.:

    Dots.SequenceGroup.number_of_members = 3
    Dots.SequenceGroup.number_of_taxa = 1
    Dots.SequenceGroup.min_percent_match = 1
    Dots.SequenceGroup.max_percent_match = 1

And in Dots.sequencesequencegroup, I'll put the sequences of each group. Ex.:

    sequence_id = aa_sequence_id of the sequence
    sequence_group_id = the identifier of the group
    source_table_id = in this case the identifier of Dots.TranslatedAASequence

Thanks for your help,

Margarita Ruiz
Oswaldo Cruz Institute
Rio de Janeiro, Brazil
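The parsing step of the plan above can be sketched in Python as follows. This is only an illustration under stated assumptions, not an existing GUS plugin: `parse_clusters` and `group_row` are hypothetical helper names, member lines are accepted with or without the leading ">" shown in the message, and the representative sequence (marked `*`) is assumed to count as 100% identical to itself. The resulting dictionaries mirror the attribute names proposed for Dots.SequenceGroup, with identity stored as a fraction as in the example values.

```python
import re

CLUSTER_HEADER = re.compile(r"^>Cluster\s+(\d+)\s*$")
# e.g. "> 1 17392aa, >XP_843163.1... at 100%" or "0 17392aa, >AAZ14281.1... *"
MEMBER = re.compile(r"^>?\s*(\d+)\s+(\d+)aa,\s+>(.+?)\.\.\.\s+(\*|at\s+([\d.]+)%)\s*$")


def parse_clusters(text):
    """Parse CD-HIT cluster text into a list of {"cluster", "members"} dicts."""
    clusters = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = CLUSTER_HEADER.match(line)
        if header:
            current = {"cluster": int(header.group(1)), "members": []}
            clusters.append(current)
            continue
        member = MEMBER.match(line)
        if member and current is not None:
            _, length, source_id, flag, percent = member.groups()
            current["members"].append({
                "source_id": source_id,
                "length": int(length),
                "representative": flag == "*",
                # Assumption: the representative is 100% identical to itself.
                "percent_identity": 100.0 if flag == "*" else float(percent),
            })
    return clusters


def group_row(cluster):
    """Summarise one non-empty cluster with the attributes proposed above."""
    fractions = [m["percent_identity"] / 100.0 for m in cluster["members"]]
    return {
        "number_of_members": len(fractions),
        "min_percent_match": min(fractions),
        "max_percent_match": max(fractions),
    }


sample = """>Cluster 0
> 0 17392aa, >AAZ14281.1... *
> 1 17392aa, >XP_843163.1... at 100%
> 2 17392aa, >XP_843163.1... at 100%
>Cluster 1
> 0 10589aa, >AAN35571.1... *
> 1 10589aa, >XP_001347658.1... at 100%
> 2 10589aa, >XP_001347658.1... at 100%
>Cluster 2
> 0 10287aa, >XP_966264.1... *
> 1 10287aa, >XP_966264.1... at 100%
"""
clusters = parse_clusters(sample)
for cluster in clusters:
    print(cluster["cluster"], group_row(cluster))
```

Looking up the aa_sequence_id for each `source_id` and counting taxa still has to happen against the database; that part is left out because it depends on how the sequences were loaded.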
From: Elisabetta M. <man...@pc...> - 2007-09-15 10:09:06
A clarification regarding point (2) and my response to it below.

LoadBatchArrayResults is more flexible regarding input format than LoadArrayResults. LoadArrayResults requires the data_file to be in the format specified in its documentation:
https://www.cbil.upenn.edu/svn/gus/GusAppFramework/trunk/Supported/doc/LoadArrayResults.html
so the original software output typically needs some parsing before it can be input into this plugin.

LoadBatchArrayResults assumes the software output is tab-delimited text. Output from programs like MAS4, MAS5, RMAExpress, MOID, GenePix or ArrayVision can typically be used as is, but the user needs to provide an xml_file which tells the plugin how this output should be reformatted before the plugin calls LoadSimpleArrayResults (a simplified version of LoadArrayResults that requires a similar data_file input format) to load it into RAD. The files LoadBatchArrayResults*.xml in
https://www.cbil.upenn.edu/svn/gus/GusAppFramework/trunk/Community/config/
are examples of such specifications. Basically they tell the plugin how to map the columns of the software output to columns whose headers are acceptable as data_file input for LoadSimpleArrayResults, i.e. columns compatible with the fields of the view to be populated. It is also possible in this xml file (see the GenePix example) to specify how to transform a subset of these columns through a function (e.g. see coordGenePix2RAD).

Thus, if your APT files for RMA are tab-delimited text, you can load them through this plugin by providing an xml_file which tells LoadBatchArrayResults how to read them and map their columns to fields in the RAD.RMAExpress view. (The xml file will be similar to the RMAExpress.xml example at the website above, but you might have to adjust the input header names according to those in your file.)

For Plier, as mentioned, there is no view yet. Once the view is created, if the APT output is tab-delimited, you could use LoadBatchArrayResults, but first you need to extend its code to accommodate the Plier protocol (the list of protocols currently accepted by LoadBatchArrayResults is at https://www.cbil.upenn.edu/svn/gus/GusAppFramework/trunk/Community/doc/LoadBatchArrayResults.html) and second you will need to create the appropriate xml_file which describes how the APT output should be mapped.

As mentioned below though, if in your situation you expect to always use only a couple of software packages for summarization, always with the same type of format, it might be more efficient to write a specific (and simpler) plugin that deals directly only with those.

Elisabetta

On Fri, 14 Sep 2007, Elisabetta Manduchi wrote:

>
> Hi Dave,
> I'll respond to 2 and 4. For (1) I defer to Junmin.
> For (3) all I can say is that it is in our lab's plans to release bug-fixes
> and new releases of GUS, however this keeps being postponed due to other
> priorities. In the meantime for PostgreSQL questions re GUS, John Iodice might
> be able to help you.
> Getting back to your question (2), first of all, as mentioned in my previous
> email we currently have a view for RMA results, but we do not have a view for
> Plier results. If you need a view for Plier in your instance of the DB
> though, you can simply create such a view with the attributes you need in
> your own instance. It would be a view of RAD.CompositeElementResultImp. Once
> created, remember to update Core.TableInfo and rebuild GUS, so that the
> objects for the new view are in place.
> The current available plugins to load data into RAD.CompositeElementResultImp
> views are: LoadArrayResults (in Supported), which loads the results of one
> assay at a time, and LoadBatchArrayResults, which we have already discussed.
> The documentation of these plugins, available from svn, illustrates what the
> input format should be.
> The idea guiding the design of these plugins we made available was that
> they would be *generic*, i.e. they would be able to take data from a wide
> variety of quantification software and load them into RAD. So we opted for
> one generic code at the expense of some work to put the input into the
> appropriate format.
> If a project/lab typically gets files in a particular data format, then it
> might be worth for them to write a plugin which is specific to that rather
> than using the generic plugin. This way they can use the output as spit out
> by the software they use. It is fairly simple to write a plugin specific to
> one's needs using the Plugin package. So if you expect to deal most of the
> time with a particular type of output (e.g. from APT) you might consider
> writing a specific plugin.
>
> Regarding your question (4), the answer is no. We do not store images in GUS.
> For certain types of images, like microarray images (e.g. files resulting
> from scanning, like .TIF or .DAT) we store in the db their uri to the
> fileserver (in RAD.Acquisition.uri).
> Hope this helps,
> Elisabetta
>
> ---
>
> On Fri, 14 Sep 2007, Dave Hau wrote:
>
>> Junmin and Elisabetta, thanks again for your helpful comments.
>>
>> Couple of questions.
>>
>> 1. The HG-U133_Plus_2 array annotation file I downloaded from Affymetrix
>> is an xml file in MAGE-ML format. On the RAD download page (
>> http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool called
>> mage2tab-v0.9, which I assume would be able to convert the annotation file
>> to MAGE-TAB format. Then in order to load this MAGE-TAB file into GUS, I
>> noticed on the CBIL Lab Meetings web page, for Thursday March 15, 2007,
>> Junmin gave a talk on MR-Ti, and the description mentions the loadMageDoc
>> GUS plugin. I notice (and have downloaded) a file on the RAD download
>> page called "MR_T_ForGUS35.tar.gz" but the loadMageDoc plugin is not in
>> there. Is there a way for me to obtain this plugin?
>>
>> 2. I ran "apt-probeset-summarize" in the Affymetrix Power Tools (APT)
>> package (
>> http://www.affymetrix.com/support/developer/powertools/index.affx ) and
>> obtained probe set data for my .CEL files, one set for RMA and another set
>> for PLIER. Is there a plugin that will readily load these APT output
>> files into GUS as probe set data?
>>
>> 3. The GUS installation I'm using is top of trunk from the CBIL svn
>> repository. This is because I'm using postgresql on the back end, and the
>> 3.5 GUS package gave me a lot of problems. These seem to have been fixed
>> in the top of trunk. However, in order to use existing plugins, would it
>> be advisable to use top of trunk (including the new schema changes for new
>> features that Elisabetta mentioned)? If not, is there, or do you plan on
>> releasing a bug-fix version of 3.5 that contains bug fixes back-ported to
>> 3.5, but does not contain any of the new features not yet released?
>>
>> 4. Is there any way in RAD or GUS to load pathological images (e.g.
>> associated with biosamples used for hybridization) into the GUS database?
>>
>> Thanks very much,
>> Dave
>>
>>
>>
>> Junmin Liu wrote:
>> > Hi, Dave,
>> > Again in line:
>> >
>> > > > The consensus not to load CEL files into the database - is it
>> > > > because we only
>> > > > query for probe set data based on the gene, but not for probe cell
>> > > > data? If I
>> > >
>> > > yes typically people query the summarized results at the probe set
>> > > level.
>> >
>> > Generally speaking, schema design and data management have to be in the
>> > context of contract or any requirements you are obligated to.
>> >
>> > Ask the question what is the next if you load CEL? or what is the next
>> > if you load array data and etc?
>> >
>> > GUS and its app stacks certainly will allow you do those things, but it
>> > is critical you have some judgement calls. And the cost of loading raw
>> > data then querying them out is pretty expensive.
>> >
>> > > There are multiple choices for where to store array annotation at the
>> > > moment.
>> > > 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have
>> > > been
>> > > added to more quickly annotate Affy data with Entrez Genes and RefSeq
>> > > info
>> > > respectively.
>> > > 2. Another possibility is to use the external_database_release_id and
>> > > source_id pair in RAD.ShortOligoFamily to point to one preferred
>> > > annotation for each probe set (but you would have to choose one).
>> > > 3. Another, less structured possibility, is to use
>> > > RAD.CompositeElementAnnotation, where you use the attribute 'name' to
>> > > denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
>> > > attribute 'value' for the annotation (e.g. entrez gene id, or refseq
>> > > id,
>> > > etc.) itself. This is less structured but it will allow you to load
>> > > as
>> > > many annotations as you like.
>> >
>> > I normally favor the consistent data management policy, that means, you
>> > don't need documentation somewhere saying "case 1, load data into table
>> > a, b, c; case 2, load data into table d, e, f; case 3, load data into
>> > table g, h, i", which not only make you data loading tough, also will
>> > make you app code built on top db stink.
>> >
>> > We didn't manage our own db perfectly either. But hopefully our
>> > experiences could prove useful to you.
>> >
>> > I strongly suggest you look at the MAGE-Tab spec for raw/processed data
>> > and ADF spec for array data on ArrayExpress site, for MAGE-Tab and ADF
>> > are proved to be very effective for large db like AE. If you can make
>> > your app/db align to the standards as we are trying to do also, it
>> > certainly give you a safe edge.
>> >
>> > ---junmin
>> >
>> >

--
Elisabetta Manduchi
Computational Biology and Informatics Laboratory
Center for Bioinformatics
University of Pennsylvania
1428 Blockley Hall
423 Guardian Drive
Philadelphia, PA 19104-6021
phone: 215-573-4408
fax: 215 573-3111
email: man...@pc...
web: http://www.cbil.upenn.edu/~manduchi
From: Junmin L. <ju...@pc...> - 2007-09-14 21:08:05
|
Hi, Dave,

The LoadMageDoc plugin is the replacement for the RAD Study Annotator, if you have heard of it. It loads MAGE-ML or MAGE-TAB, and only the metadata part. By metadata I mean information about protocols, samples, assays, acquisition, quantification, study design, study factors, etc., excluding the array annotation, raw data, and processed data.

MAGE-ML can contain array design/reporter/feature info, but people normally separate that out into a single tab-delimited file called an ADF. Try to find, or ask ArrayExpress for, the ADF file for that Affy chip.

The MR_TforGUS35 package only contains code for exporting data from RAD to MAGE-ML and its associated data files (raw/processed data). The RAD/MR_T/lib/perl/MageImport trunk contains all of the Perl packages that the LoadMageDoc plugin depends on.

MR_Ti itself, as a toolkit, can be downloaded here: https://www.cbil.upenn.edu/magewiki/index.php/mage2tab

Sorry that we have poor documentation on those things; this plugin is purely in-house for now. We only expose the toolkit to the community, including non-GUS users.

---junmin |
From: Dave H. <doc...@gm...> - 2007-09-14 19:51:14
|
My bad... I just noticed the LoadMageDoc plugin is in the community plugin directory.

Thanks Elisabetta for your prompt reply.

- Dave |
From: Elisabetta M. <man...@pc...> - 2007-09-14 19:43:31
|
Hi Dave,

I'll respond to (2) and (4). For (1) I defer to Junmin. For (3), all I can say is that our lab plans to release bug fixes and new releases of GUS, but this keeps being postponed due to other priorities. In the meantime, for PostgreSQL questions regarding GUS, John Iodice might be able to help you.

Getting back to your question (2): first of all, as mentioned in my previous email, we currently have a view for RMA results, but we do not have a view for Plier results. If you need a view for Plier, you can simply create one, with the attributes you need, in your own instance of the DB. It would be a view of RAD.CompositeElementResultImp. Once it is created, remember to update Core.TableInfo and rebuild GUS, so that the objects for the new view are in place.

The currently available plugins for loading data into RAD.CompositeElementResultImp views are LoadArrayResult (in Supported), which loads the results of one assay at a time, and LoadBatchResult, which we have already discussed. The documentation of these plugins, available from svn, illustrates what the input format should be. The idea guiding the design of these plugins was that they would be *generic*, i.e. able to take data from a wide variety of quantification software and load them into RAD. So we opted for one generic code base, at the expense of some work to put the input into the appropriate format.

If a project/lab typically gets files in a particular data format, it might be worthwhile for them to write a plugin specific to that format rather than using the generic plugin. This way they can use the output exactly as produced by the software they use. It is fairly simple to write a plugin specific to one's needs using the Plugin package. So if you expect to deal most of the time with a particular type of output (e.g. from APT), you might consider writing a specific plugin.

Regarding your question (4), the answer is no. We do not store images in GUS. For certain types of images, like microarray images (e.g. files resulting from scanning, like .TIF or .DAT), we store in the db the URI of the file on the fileserver (in RAD.Acquisition.uri).

Hope this helps,
Elisabetta |
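A minimal sketch of the kind of reformatting discussed above for the APT case: splitting a multi-sample probe set summary into one table per assay, since LoadArrayResult takes the results of one assay at a time. It assumes the summary file is tab-delimited, with comment lines starting with `#` and a header row whose first column is `probeset_id` followed by one column per .CEL file. That layout, the function names, and the two-column output are assumptions made for illustration; the actual input format expected by the plugins should be taken from their documentation in svn.

```python
import csv
import io
from collections import OrderedDict


def split_summary_by_assay(lines):
    """Split a probe set summary table into one {probeset_id: value} map per assay.

    `lines` is an iterable of text lines. Lines starting with '#' are treated
    as comments (assumed layout); the first remaining line is the header.
    """
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    assay_names = header[1:]  # first column holds the probe set identifier
    per_assay = OrderedDict((name, OrderedDict()) for name in assay_names)
    for row in reader:
        probeset_id, values = row[0], row[1:]
        if len(values) != len(assay_names):
            raise ValueError("row for %s has %d values, expected %d"
                             % (probeset_id, len(values), len(assay_names)))
        for name, value in zip(assay_names, values):
            per_assay[name][probeset_id] = float(value)
    return per_assay


def write_assay_table(values, out):
    """Write one assay's values as two tab-separated columns (hypothetical layout)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["probeset_id", "value"])
    for probeset_id, value in values.items():
        writer.writerow([probeset_id, value])


if __name__ == "__main__":
    # Made-up example input, not real expression values.
    example = [
        "#%chip_type=HG-U133_Plus_2\n",
        "probeset_id\tsample1.CEL\tsample2.CEL\n",
        "probe_set_A\t10.25\t9.5\n",
        "probe_set_B\t7.0\t7.75\n",
    ]
    tables = split_summary_by_assay(example)
    buf = io.StringIO()
    write_assay_table(tables["sample1.CEL"], buf)
    print(buf.getvalue(), end="")
```

A format-specific plugin, as suggested in the message, would avoid this intermediate step by reading the software's output directly.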
From: Dave H. <doc...@gm...> - 2007-09-14 19:02:13
|
Junmin and Elisabetta, thanks again for your helpful comments.

A couple of questions:

1. The HG-U133_Plus_2 array annotation file I downloaded from Affymetrix is an XML file in MAGE-ML format. On the RAD download page ( http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool called mage2tab-v0.9, which I assume would be able to convert the annotation file to MAGE-TAB format. As for loading this MAGE-TAB file into GUS, I noticed on the CBIL Lab Meetings web page that on Thursday March 15, 2007, Junmin gave a talk on MR-Ti, and the description mentions the loadMageDoc GUS plugin. I notice (and have downloaded) a file on the RAD download page called "MR_T_ForGUS35.tar.gz", but the loadMageDoc plugin is not in there. Is there a way for me to obtain this plugin?

2. I ran "apt-probeset-summarize" in the Affymetrix Power Tools (APT) package ( http://www.affymetrix.com/support/developer/powertools/index.affx ) and obtained probe set data for my .CEL files, one set for RMA and another set for PLIER. Is there a plugin that will readily load these APT output files into GUS as probe set data?

3. The GUS installation I'm using is top of trunk from the CBIL svn repository. This is because I'm using PostgreSQL on the back end, and the 3.5 GUS package gave me a lot of problems. These seem to have been fixed in the top of trunk. However, in order to use existing plugins, would it be advisable to use top of trunk (including the new schema changes for new features that Elisabetta mentioned)? If not, is there, or do you plan on releasing, a bug-fix version of 3.5 that contains bug fixes back-ported to 3.5 but none of the new features not yet released?

4. Is there any way in RAD or GUS to load pathological images (e.g. associated with biosamples used for hybridization) into the GUS database?

Thanks very much,
Dave |
From: Junmin L. <ju...@pc...> - 2007-09-14 16:36:23
|
Hi, Dave,

Again in line:

>> The consensus not to load CEL files into the database - is it because we only
>> query for probe set data based on the gene, but not for probe cell data? If I
>
> yes typically people query the summarized results at the probe set
> level.

Generally speaking, schema design and data management have to be considered in the context of the contract or any requirements you are obligated to meet.

Ask yourself: what is the next step if you load CEL files? What is the next step if you load array data, and so on?

GUS and its application stack will certainly allow you to do those things, but it is critical that you make some judgement calls. And the cost of loading raw data and then querying it back out is pretty high.

> There are multiple choices for where to store array annotation at the
> moment.
> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have been
> added to more quickly annotate Affy data with Entrez Genes and RefSeq info
> respectively.
> 2. Another possibility is to use the external_database_release_id and
> source_id pair in RAD.ShortOligoFamily to point to one preferred
> annotation for each probe set (but you would have to choose one).
> 3. Another, less structured possibility, is to use
> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
> denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
> attribute 'value' for the annotation (e.g. entrez gene id, or refseq id,
> etc.) itself. This is less structured but it will allow you to load as
> many annotations as you like.

I normally favor a consistent data management policy. That means you don't need documentation somewhere saying "case 1, load data into tables a, b, c; case 2, load data into tables d, e, f; case 3, load data into tables g, h, i", which not only makes your data loading tough, but also makes the app code built on top of the db stink.

We didn't manage our own db perfectly either. But hopefully our experiences will prove useful to you.

I strongly suggest you look at the MAGE-TAB spec for raw/processed data and the ADF spec for array data on the ArrayExpress site; MAGE-TAB and ADF have proved to be very effective for a large db like AE. If you can align your app/db with the standards, as we are also trying to do, it will certainly put you on safer ground.

---junmin |
From: Elisabetta M. <man...@pc...> - 2007-09-14 14:30:20
|
Hi Dave,

in line:

> Thanks Junmin and Elisabetta for your helpful comments.
>
> The consensus not to load CEL files into the database - is it because we only
> query for probe set data based on the gene, but not for probe cell data? If I

Yes, typically people query the summarized results at the probe set level.

> store the CEL file in the filesystem and only store a file URI in the
> database, does RAD provide a way to run summarization algorithms (e.g. RMA,
> Plier) on those files?

Not currently. RAD provides the database where the results of such algorithms can be stored. One could certainly write a plugin that goes to the .CEL file indicated by the URI and runs the summarization algorithm of choice on it. However, we do not currently have any such plugin in Supported or Community.

> Can I load multiple sets of probe set data for a
> single set of probe cell data (e.g. one for RMA, one for Plier)?

Certainly. You would create as many entries in RAD.Quantification as the number of summarization protocols you run (e.g. MAS 5, RMA, Plier) on the same .CEL file; each such entry will point to the appropriate summarization protocol. You would additionally have a quantification referring to the .CEL file. In RAD.RelatedQuantification you can connect the .CEL quantification to each of the summarization quantifications that used that .CEL file. Then you can load the results of the summarization algorithms into the corresponding views of RAD.CompositeElementResultImp. Currently we have views for MAS4, MAS5, RMAExpress (which will simply be renamed RMA in the next release, and which accommodates RMA, gcRMA, etc.) and MOID. But it's easy to create additional views of the same table in your own instance to accommodate other summarization programs.
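The bookkeeping described above can be sketched as follows. This is only an in-memory illustration of which entries would exist and how they relate, not GUS object-layer code: the `Quantification` class, its fields, and the protocol label used for the .CEL entry are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantification:
    """Stand-in for one RAD.Quantification entry (fields are illustrative only)."""
    name: str
    protocol: str
    uri: str = ""


def plan_quantifications(cel_uri, summarization_protocols):
    """Return (cel_quant, summary_quants, related_pairs) for one .CEL file.

    related_pairs plays the role of RAD.RelatedQuantification: each pair links
    the .CEL quantification to one summarization that used that file.
    """
    # One quantification referring to the .CEL file itself (label is made up).
    cel_quant = Quantification(name=cel_uri, protocol="probe cell intensities", uri=cel_uri)
    # One quantification per summarization protocol run on that file.
    summary_quants = [
        Quantification(name="%s [%s]" % (cel_uri, protocol), protocol=protocol)
        for protocol in summarization_protocols
    ]
    related_pairs = [(cel_quant, summary) for summary in summary_quants]
    return cel_quant, summary_quants, related_pairs


if __name__ == "__main__":
    cel, summaries, related = plan_quantifications("sample1.CEL", ["RMA", "Plier"])
    for raw, summary in related:
        print(raw.name, "->", summary.protocol)
```

Each summarization entry would then have its results loaded into the matching view of RAD.CompositeElementResultImp.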
> Also, according to the instructions in the RAD website on how to load a
> complete microarray study into the GUS database, the first step mentions
> "Further array annotation can be loaded via
> GUS::Community::Plugin::InsertArray2DbRefAndNaSeq". I tried to run this
> plugin, but got this error:
>
> FATAL: Can't locate GUS/Model/RAD/CompositeElementDbRef.pm in @INC
>
> Do you know where I can find this CompositeElementDbRef.pm file?

I think this is because the tables RAD.(Composite)ElementDbRef and RAD.(Composite)ElementNASequence were added after the last official GUS release. They are scheduled for the next GUS release (which probably won't occur in the near future). We have added them to our own instance of GUS at CBIL. So, if you want to use these tables, you first need to add those 4 tables to your db instance (you can find the latest sql for GUS in the GusSchema svn at https://www.cbil.upenn.edu/svn/gus/GusSchema/trunk/Definition/config/gus_schema.xml; note that this also contains other modifications made to tables after the 3.5 GUS release). Then you need to populate Core.TableInfo with entries for these new tables. Finally, you need to rebuild GUS, forcing a rebuild of the objects. This way the code generator will see the new tables and create the corresponding objects, including the one you are referring to above.

> I would like to load the annotation file I obtained from the Affymetrix
> website for the HG-U133_Plus_2 array into the GUS database. What's the best
> way to go about this?

There are multiple choices for where to store array annotation at the moment.

1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have been added to more quickly annotate Affy data with Entrez Genes and RefSeq info respectively.
2. Another possibility is to use the external_database_release_id and source_id pair in RAD.ShortOligoFamily to point to one preferred annotation for each probe set (but you would have to choose one).
3. Another, less structured possibility is to use RAD.CompositeElementAnnotation, where you use the attribute 'name' to denote the kind of annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the attribute 'value' for the annotation itself (e.g. the Entrez Gene id, or RefSeq id, etc.). This is less structured, but it will allow you to load as many annotations as you like.

Elisabetta |
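To make the third option more concrete, here is a rough Python sketch of the name/value idea behind RAD.CompositeElementAnnotation. It is an in-memory illustration only: the class `AnnotationStore`, its methods, the probe set label and the placeholder ids are all hypothetical, and nothing here reflects the real table definition beyond the 'name' and 'value' attributes mentioned above.

```python
from collections import defaultdict


class AnnotationStore:
    """Keeps any number of (name, value) annotations per probe set."""

    def __init__(self):
        self._rows = defaultdict(list)

    def add(self, probe_set, name, value):
        # 'name' says what kind of annotation this is, 'value' holds the id.
        self._rows[probe_set].append((name, value))

    def values(self, probe_set, name):
        # Return every value of the given kind; an unknown probe set gives [].
        return [v for n, v in self._rows.get(probe_set, []) if n == name]


store = AnnotationStore()
# Placeholder identifiers, not real Entrez Gene or RefSeq ids.
store.add("probe_set_A", "Entrez Gene", "placeholder-gene-id")
store.add("probe_set_A", "RefSeq", "placeholder-refseq-id-1")
store.add("probe_set_A", "RefSeq", "placeholder-refseq-id-2")

print(store.values("probe_set_A", "RefSeq"))
```

The trade-off is the one described above: nothing constrains what goes into 'name' or 'value', but one probe set can carry several annotations of the same kind, which the single external_database_release_id/source_id pair of option 2 cannot.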
From: Dave H. <doc...@gm...> - 2007-09-13 23:34:19
|
Thanks Junmin and Elisabetta for your helpful comments.

The consensus not to load CEL files into the database - is it because we only query for probe set data based on the gene, but not for probe cell data? If I store the CEL file in the filesystem and only store a file URI in the database, does RAD provide a way to run summarization algorithms (e.g. RMA, Plier) on those files? Can I load multiple sets of probe set data for a single set of probe cell data (e.g. one for RMA, one for Plier)?

Also, according to the instructions on the RAD website on how to load a complete microarray study into the GUS database, the first step mentions "Further array annotation can be loaded via GUS::Community::Plugin::InsertArray2DbRefAndNaSeq". I tried to run this plugin, but got this error:

FATAL: Can't locate GUS/Model/RAD/CompositeElementDbRef.pm in @INC

Do you know where I can find this CompositeElementDbRef.pm file?

I would like to load the annotation file I obtained from the Affymetrix website for the HG-U133_Plus_2 array into the GUS database. What's the best way to go about this?

Thanks very much for your help.

Best regards,
Dave

Junmin Liu wrote: > Hi, Dave, > I had couple discussion with other people in ArrayExpress and Joe > white from Harvard in terms of raw data loading in previous MGED > workshops. > > The consensus is that especially for the CEL file, people don't load > them into database, unless you got some convincing use cases or strong > needs to load cel file into database. > > So give it a second thought before you even proceed. > ---junmin > > > > On Wed, 12 Sep 2007, Dave Hau wrote: > >> Elisabetta, >> >> Thanks for your and John Brestelli's (via personal email) very >> informative replies. They are very helpful indeed. >> >> Regarding loading .CEL files (probe cell data, not probe set data), John >> mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which >> I had noticed too. 
The help page for this plugin mentions a number of >> quantification protocols supported including mas4/mas5 (Affymetrix MAS >> 4.0 and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix >> MAS 4.0 and 5.0 Probe Cell quantification protocol). It seems that >> cel4/cel5 would correspond to the .CEL files I need to load (i.e. probe >> *cell* data). Is this correct? I was wondering because you mentioned in >> your reply that there's no plugin available for loading probe cell data. >> >> Also, in the Affymetrix file format description document ( >> http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two >> file formats are described: Version 3 files (text data) generated by the >> MAS software, and version 4 files (binary data) generated by the GCOS >> software. So both cel4 and cel5 for the plugin would correspond to >> Version 3 files, right? That means the LoadBatchArrayResults plugin does >> not support the Version 4 (binary) file format, correct? >> >> Thanks again for your help. >> >> Best regards, >> Dave Hau >> >> >> Elisabetta Manduchi wrote: >>> >>> Hi Dave, >>> let me clarify GUS vs Affy. >>> Affymetrix quantified results are of two types, corresponding to 2 >>> different level of analysis: >>> >>> (i) probe-cell level results (e.g. from .CEL files), which contain >>> intensity values for each individual probe cell on the chip; and >>> (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in >>> the .CHP files, or from RMA or gcRMA) which contain *summarized* >>> intensities for probe sets on the chip. >>> >>> The GUS schema in principle supports storage of both: >>> >>> (i) the probe cell results would go into a view of >>> RAD.ElementResultImp (in fact there is a view to this end called >>> RAD.AffymetrixCEL); >>> (ii) the probe set results would go to view of >>> RAD.CompositeElementResultImp. 
For the latter, currently we have views >>> to accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and >>> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed >>> RAD.RMA in the next GUS release). >>> >>> Now, here at CBIL, we do not store or support loading of the .CEL file >>> data in the database, because we really only use the probe-set level >>> results in our applications, so we have no need to store .CEL in the >>> db. >>> So the way we do it is as follows: >>> * for every Affymetrix assay, we have TWO related quantifications, one >>> corresponding to the .CEL quantification and the other corresponding >>> to whatever summarization quantification was created (e.g. with MAS4, >>> MAS5, RMA); >>> * we place 2 entries in RAD.Quantifications, one pointing to the uri >>> of the .CEL file (which we keep on our server) and one pointing to the >>> uri of the probe-set level result file >>> * we however do not store the data from the .CEL file in >>> RAD.AffymetrixCEL >>> * we only store the data from the probe-set level results in one of >>> the RAD.CompositeElementResultImp views mentioned above. >>> >>> The current plugin in GUS::Supported, as Junmin mentioned in the >>> posting you are referring to, can be used to populate the data for the >>> probe-set level results. As far as I know, we do not have currently a >>> plugin to store the .CEL files in the db. >>> So the db allows for the latter, but you'd have to write your own >>> plugin. We didn't find useful to store .CEL results in GUS, but again >>> this depends on the type of applications you might be interested in. >>> Hope this helps, >>> Elisabetta >>> >>> >>> On Tue, 28 Aug 2007, Dave Hau wrote: >>> >>>> I would like to import a number of Affymetrix .CEL files into the GUS >>>> database, which was installed from top of trunk from the GUS svn >>>> repository. The CEL files each have some text headers, and then binary >>>> data afterwards. 
So I suppose they are in CEL Version 4 format. >>>> >>>> Doing some search on previous posts, I came across this one: >>>> >>>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu >>>> >>>> >>>> >>>> It seems that at the time of the post (12/2005), the way these .CEL >>>> files would be imported was that the headers would go to one of the >>>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), >>>> the actual file would sit in the file system, and we'd insert a row to >>>> the RAD.Quantification table with a URI pointing to the location of >>>> the >>>> .CEL file. >>>> >>>> Also, looking through the different plugins in both the Supported and >>>> Community folders, it seems LoadBatchArrayResults supports the cel4 >>>> format. Is this the plugin I should use? >>>> >>>> Any help would be much appreciated. Thanks. >>>> >>>> Best regards, >>>> Dave Hau >>> >> > |
From: Junmin L. <ju...@pc...> - 2007-09-13 14:09:23
|
Hi, Dave,

I had a couple of discussions with people at ArrayExpress and Joe White from Harvard about raw data loading at previous MGED workshops.

The consensus is that, especially for CEL files, people don't load them into the database unless there is a convincing use case or a strong need to do so.

So give it a second thought before you proceed.
---junmin

On Wed, 12 Sep 2007, Dave Hau wrote: > Elisabetta, > > Thanks for your and John Brestelli's (via personal email) very > informative replies. They are very helpful indeed. > > Regarding loading .CEL files (probe cell data, not probe set data), John > mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which > I had noticed too. The help page for this plugin mentions a number of > quantification protocols supported including mas4/mas5 (Affymetrix MAS > 4.0 and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix > MAS 4.0 and 5.0 Probe Cell quantification protocol). It seems that > cel4/cel5 would correspond to the .CEL files I need to load (i.e. probe > *cell* data). Is this correct? I was wondering because you mentioned in > your reply that there's no plugin available for loading probe cell data. > > Also, in the Affymetrix file format description document ( > http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two > file formats are described: Version 3 files (text data) generated by the > MAS software, and version 4 files (binary data) generated by the GCOS > software. So both cel4 and cel5 for the plugin would correspond to > Version 3 files, right? That means the LoadBatchArrayResults plugin does > not support the Version 4 (binary) file format, correct? > > Thanks again for your help. > > Best regards, > Dave Hau > > > Elisabetta Manduchi wrote: >> >> Hi Dave, >> let me clarify GUS vs Affy. >> Affymetrix quantified results are of two types, corresponding to 2 >> different level of analysis: >> >> (i) probe-cell level results (e.g. 
from .CEL files), which contain >> intensity values for each individual probe cell on the chip; and >> (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in >> the .CHP files, or from RMA or gcRMA) which contain *summarized* >> intensities for probe sets on the chip. >> >> The GUS schema in principle supports storage of both: >> >> (i) the probe cell results would go into a view of >> RAD.ElementResultImp (in fact there is a view to this end called >> RAD.AffymetrixCEL); >> (ii) the probe set results would go to view of >> RAD.CompositeElementResultImp. For the latter, currently we have views >> to accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and >> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed >> RAD.RMA in the next GUS release). >> >> Now, here at CBIL, we do not store or support loading of the .CEL file >> data in the database, because we really only use the probe-set level >> results in our applications, so we have no need to store .CEL in the db. >> So the way we do it is as follows: >> * for every Affymetrix assay, we have TWO related quantifications, one >> corresponding to the .CEL quantification and the other corresponding >> to whatever summarization quantification was created (e.g. with MAS4, >> MAS5, RMA); >> * we place 2 entries in RAD.Quantifications, one pointing to the uri >> of the .CEL file (which we keep on our server) and one pointing to the >> uri of the probe-set level result file >> * we however do not store the data from the .CEL file in >> RAD.AffymetrixCEL >> * we only store the data from the probe-set level results in one of >> the RAD.CompositeElementResultImp views mentioned above. >> >> The current plugin in GUS::Supported, as Junmin mentioned in the >> posting you are referring to, can be used to populate the data for the >> probe-set level results. As far as I know, we do not have currently a >> plugin to store the .CEL files in the db. 
>> So the db allows for the latter, but you'd have to write your own >> plugin. We didn't find useful to store .CEL results in GUS, but again >> this depends on the type of applications you might be interested in. >> Hope this helps, >> Elisabetta >> >> >> On Tue, 28 Aug 2007, Dave Hau wrote: >> >>> I would like to import a number of Affymetrix .CEL files into the GUS >>> database, which was installed from top of trunk from the GUS svn >>> repository. The CEL files each have some text headers, and then binary >>> data afterwards. So I suppose they are in CEL Version 4 format. >>> >>> Doing some search on previous posts, I came across this one: >>> >>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu >>> >>> >>> It seems that at the time of the post (12/2005), the way these .CEL >>> files would be imported was that the headers would go to one of the >>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), >>> the actual file would sit in the file system, and we'd insert a row to >>> the RAD.Quantification table with a URI pointing to the location of the >>> .CEL file. >>> >>> Also, looking through the different plugins in both the Supported and >>> Community folders, it seems LoadBatchArrayResults supports the cel4 >>> format. Is this the plugin I should use? >>> >>> Any help would be much appreciated. Thanks. >>> >>> Best regards, >>> Dave Hau >> > |
From: Elisabetta M. <man...@pc...> - 2007-09-13 00:14:19
|
Hi Dave,

I just went back and quickly looked over the LoadBatchArrayResult code, which refreshed my memory...

First, one correction: below I said that this enters 2 quantifications. Actually the quantifications are assumed to have already been entered, corresponding to the protocols provided (for cel and probe set); what LoadBatchArrayResult does is relate the .cel and probe set quantifications (populating RAD.RelatedQuantification).

Second: independently of whether or not LoadSimpleArrayResult can load .cel data, LoadBatchArrayResult only calls it, in the case of Affy data, to load *probe set* data. In fact, from the code, the list of possible views that this plugin will populate is given in:

    $globalRef->{'resultSubclassView'} = {
        'mas4'        => 'AffymetrixMAS4',
        'mas5'        => 'AffymetrixMAS5',
        'genepix'     => 'GenePixElementResult',
        'arrayvision' => 'ArrayVisionElementResult',
        'rmaexpress'  => 'RMAExpress',
        'moid'        => 'MOIDResult',
    };

So for Affy data, the relevant ones are those corresponding to the keys mas4, mas5, rmaexpress and moid. All of these are for probe set results. Thus, as is, this plugin won't load .cel data. It could be that the auxiliary Community plugin LoadSimpleArrayResult is able to load .cel data, but this is what I'm deferring to Junmin for.

Elisabetta

On Wed, 12 Sep 2007, Elisabetta Manduchi wrote: > > Hi Dave, > the LoadBatchArrayResult wants to know the cel protocols because it > enters 2 entries in RAD.Quantification per assay: one for the .CEL > quantification and one for the probe set quantification. What's entered in > Quantification are just the protocol references (e.g. reference to entries > in RAD.Protocol describing the CEL 4, MAS 5, RMA protocols) > and the uri with the path to the actual data files on the fileserver). > Then LoadBatchArrayResult calls LoadSimpleArrayResults which actually > takes care of entering the quantified data in views of > RAD.ElementResultImp or RAD.CompositeElementResultImp. 
> Now, definitely the latter plugin will populate views such as > AffymetrixMAS4 and AFFymetrixMAS5 and RMAExpress, which corresponds to > probe set quantified data. I believe from earlier correspondence Junmin > (here cc-ed), who wrote that LoadSimpleArrayResult, said that doesn't > support loading of AffymetrixCel. But I see this view mentioned in the > code of that plugin, so I'm deferring to Junmin to double-check on that. > The plugin *only accepts text files* as data files. So files like the > Metrics files (the .txt correspondent of the .CHP MAS4/5 files) will do, > as well as RMA like text files. I believe with GCOS it is possible to > export the data as metrics (txt) files corresponding to quantifications > using the MAS5 algorithm. > Elisabetta > > --- > > On Wed, 12 Sep 2007, Dave Hau wrote: > >> Elisabetta, >> >> Thanks for your and John Brestelli's (via personal email) very informative >> replies. They are very helpful indeed. >> >> Regarding loading .CEL files (probe cell data, not probe set data), John >> mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which I >> had noticed too. The help page for this plugin mentions a number of >> quantification protocols supported including mas4/mas5 (Affymetrix MAS 4.0 >> and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix MAS 4.0 >> and 5.0 Probe Cell quantification protocol). It seems that cel4/cel5 would >> correspond to the .CEL files I need to load (i.e. probe *cell* data). Is this >> correct? I was wondering because you mentioned in your reply that there's no >> plugin available for loading probe cell data. >> >> Also, in the Affymetrix file format description document ( >> http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two file >> formats are described: Version 3 files (text data) generated by the MAS >> software, and version 4 files (binary data) generated by the GCOS software. 
>> So both cel4 and cel5 for the plugin would correspond to Version 3 files, >> right? That means the LoadBatchArrayResults plugin does not support the >> Version 4 (binary) file format, correct? >> >> Thanks again for your help. >> >> Best regards, >> Dave Hau >> >> >> Elisabetta Manduchi wrote: >>> >>> Hi Dave, >>> let me clarify GUS vs Affy. >>> Affymetrix quantified results are of two types, corresponding to 2 >>> different level of analysis: >>> >>> (i) probe-cell level results (e.g. from .CEL files), which contain >>> intensity values for each individual probe cell on the chip; and >>> (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in the >>> .CHP files, or from RMA or gcRMA) which contain *summarized* intensities >>> for probe sets on the chip. >>> >>> The GUS schema in principle supports storage of both: >>> >>> (i) the probe cell results would go into a view of RAD.ElementResultImp (in >>> fact there is a view to this end called RAD.AffymetrixCEL); >>> (ii) the probe set results would go to view of >>> RAD.CompositeElementResultImp. For the latter, currently we have views to >>> accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and >>> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed RAD.RMA >>> in the next GUS release). >>> >>> Now, here at CBIL, we do not store or support loading of the .CEL file data >>> in the database, because we really only use the probe-set level results in >>> our applications, so we have no need to store .CEL in the db. >>> So the way we do it is as follows: >>> * for every Affymetrix assay, we have TWO related quantifications, one >>> corresponding to the .CEL quantification and the other corresponding to >>> whatever summarization quantification was created (e.g. 
with MAS4, MAS5, >>> RMA); >>> * we place 2 entries in RAD.Quantifications, one pointing to the uri of the >>> .CEL file (which we keep on our server) and one pointing to the uri of the >>> probe-set level result file >>> * we however do not store the data from the .CEL file in RAD.AffymetrixCEL >>> * we only store the data from the probe-set level results in one of the >>> RAD.CompositeElementResultImp views mentioned above. >>> >>> The current plugin in GUS::Supported, as Junmin mentioned in the posting >>> you are referring to, can be used to populate the data for the probe-set >>> level results. As far as I know, we do not have currently a plugin to store >>> the .CEL files in the db. >>> So the db allows for the latter, but you'd have to write your own plugin. >>> We didn't find useful to store .CEL results in GUS, but again this depends >>> on the type of applications you might be interested in. >>> Hope this helps, >>> Elisabetta >>> >>> >>> On Tue, 28 Aug 2007, Dave Hau wrote: >>> >>>> I would like to import a number of Affymetrix .CEL files into the GUS >>>> database, which was installed from top of trunk from the GUS svn >>>> repository. The CEL files each have some text headers, and then binary >>>> data afterwards. So I suppose they are in CEL Version 4 format. >>>> >>>> Doing some search on previous posts, I came across this one: >>>> >>>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu >>>> >>>> It seems that at the time of the post (12/2005), the way these .CEL >>>> files would be imported was that the headers would go to one of the >>>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), >>>> the actual file would sit in the file system, and we'd insert a row to >>>> the RAD.Quantification table with a URI pointing to the location of the >>>> .CEL file. 
>>>> >>>> Also, looking through the different plugins in both the Supported and >>>> Community folders, it seems LoadBatchArrayResults supports the cel4 >>>> format. Is this the plugin I should use? >>>> >>>> Any help would be much appreciated. Thanks. >>>> >>>> Best regards, >>>> Dave Hau >>> >> > -- Elisabetta Manduchi Computational Biology and Informatics Laboratory Center for Bioinformatics University of Pennsylvania 1428 Blockley Hall 423 Guardian Drive Philadelphia, PA 19104-6021 phone: 215-573-4408 fax: 215 573-3111 email: man...@pc... web: http://www.cbil.upenn.edu/~manduchi --- |
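The lookup quoted from the plugin code in the message above can be mirrored in a short Python sketch to show why a cel4 or cel5 key leads nowhere. The dictionary entries are copied from the quoted Perl hash; everything else (the names `RESULT_SUBCLASS_VIEW`, `AFFY_PROBE_SET_KEYS` and `view_for`, and the idea of returning `None` for an unknown key) is an assumption for illustration and is not how the Perl plugin itself is written.

```python
# Entries copied from the 'resultSubclassView' hash quoted in the message.
RESULT_SUBCLASS_VIEW = {
    "mas4": "AffymetrixMAS4",
    "mas5": "AffymetrixMAS5",
    "genepix": "GenePixElementResult",
    "arrayvision": "ArrayVisionElementResult",
    "rmaexpress": "RMAExpress",
    "moid": "MOIDResult",
}

# The keys the message identifies as relevant to Affy data; all are probe set level.
AFFY_PROBE_SET_KEYS = {"mas4", "mas5", "rmaexpress", "moid"}


def view_for(protocol_key):
    """Return the result view for a protocol key, or None if the key is not mapped."""
    return RESULT_SUBCLASS_VIEW.get(protocol_key)


for key in ("mas5", "rmaexpress", "cel4", "cel5"):
    print(key, "->", view_for(key))
```

Under this reading, cel4 and cel5 name protocols that the batch plugin needs in order to relate quantifications, but they have no result view to load into, which matches the conclusion that the plugin as is won't load .cel data.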
From: Elisabetta M. <man...@pc...> - 2007-09-12 23:52:53
|
Hi Dave,

the LoadBatchArrayResult wants to know the cel protocols because it enters 2 entries in RAD.Quantification per assay: one for the .CEL quantification and one for the probe set quantification. What's entered in Quantification are just the protocol references (e.g. references to entries in RAD.Protocol describing the CEL 4, MAS 5, RMA protocols) and the uri with the path to the actual data files on the fileserver.

Then LoadBatchArrayResult calls LoadSimpleArrayResults, which actually takes care of entering the quantified data in views of RAD.ElementResultImp or RAD.CompositeElementResultImp.

Now, the latter plugin will definitely populate views such as AffymetrixMAS4, AffymetrixMAS5 and RMAExpress, which correspond to probe set quantified data. I believe from earlier correspondence that Junmin (here cc-ed), who wrote LoadSimpleArrayResult, said that it doesn't support loading of AffymetrixCel. But I see this view mentioned in the code of that plugin, so I'm deferring to Junmin to double-check on that.

The plugin *only accepts text files* as data files. So files like the Metrics files (the .txt counterpart of the .CHP MAS4/5 files) will do, as well as RMA-like text files. I believe with GCOS it is possible to export the data as metrics (txt) files corresponding to quantifications using the MAS5 algorithm.

Elisabetta

---

On Wed, 12 Sep 2007, Dave Hau wrote: > Elisabetta, > > Thanks for your and John Brestelli's (via personal email) very informative > replies. They are very helpful indeed. > > Regarding loading .CEL files (probe cell data, not probe set data), John > mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which I > had noticed too. The help page for this plugin mentions a number of > quantification protocols supported including mas4/mas5 (Affymetrix MAS 4.0 > and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix MAS 4.0 > and 5.0 Probe Cell quantification protocol). 
It seems that cel4/cel5 would > correspond to the .CEL files I need to load (i.e. probe *cell* data). Is this > correct? I was wondering because you mentioned in your reply that there's no > plugin available for loading probe cell data. > > Also, in the Affymetrix file format description document ( > http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two file > formats are described: Version 3 files (text data) generated by the MAS > software, and version 4 files (binary data) generated by the GCOS software. > So both cel4 and cel5 for the plugin would correspond to Version 3 files, > right? That means the LoadBatchArrayResults plugin does not support the > Version 4 (binary) file format, correct? > > Thanks again for your help. > > Best regards, > Dave Hau > > > Elisabetta Manduchi wrote: >> >> Hi Dave, >> let me clarify GUS vs Affy. >> Affymetrix quantified results are of two types, corresponding to 2 >> different level of analysis: >> >> (i) probe-cell level results (e.g. from .CEL files), which contain >> intensity values for each individual probe cell on the chip; and >> (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in the >> .CHP files, or from RMA or gcRMA) which contain *summarized* intensities >> for probe sets on the chip. >> >> The GUS schema in principle supports storage of both: >> >> (i) the probe cell results would go into a view of RAD.ElementResultImp (in >> fact there is a view to this end called RAD.AffymetrixCEL); >> (ii) the probe set results would go to view of >> RAD.CompositeElementResultImp. For the latter, currently we have views to >> accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and >> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed RAD.RMA >> in the next GUS release). 
>> >> Now, here at CBIL, we do not store or support loading of the .CEL file data >> in the database, because we really only use the probe-set level results in >> our applications, so we have no need to store .CEL in the db. >> So the way we do it is as follows: >> * for every Affymetrix assay, we have TWO related quantifications, one >> corresponding to the .CEL quantification and the other corresponding to >> whatever summarization quantification was created (e.g. with MAS4, MAS5, >> RMA); >> * we place 2 entries in RAD.Quantifications, one pointing to the uri of the >> .CEL file (which we keep on our server) and one pointing to the uri of the >> probe-set level result file >> * we however do not store the data from the .CEL file in RAD.AffymetrixCEL >> * we only store the data from the probe-set level results in one of the >> RAD.CompositeElementResultImp views mentioned above. >> >> The current plugin in GUS::Supported, as Junmin mentioned in the posting >> you are referring to, can be used to populate the data for the probe-set >> level results. As far as I know, we do not have currently a plugin to store >> the .CEL files in the db. >> So the db allows for the latter, but you'd have to write your own plugin. >> We didn't find useful to store .CEL results in GUS, but again this depends >> on the type of applications you might be interested in. >> Hope this helps, >> Elisabetta >> >> >> On Tue, 28 Aug 2007, Dave Hau wrote: >> >>> I would like to import a number of Affymetrix .CEL files into the GUS >>> database, which was installed from top of trunk from the GUS svn >>> repository. The CEL files each have some text headers, and then binary >>> data afterwards. So I suppose they are in CEL Version 4 format. 
>>> >>> Doing some search on previous posts, I came across this one: >>> >>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu >>> >>> It seems that at the time of the post (12/2005), the way these .CEL >>> files would be imported was that the headers would go to one of the >>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), >>> the actual file would sit in the file system, and we'd insert a row to >>> the RAD.Quantification table with a URI pointing to the location of the >>> .CEL file. >>> >>> Also, looking through the different plugins in both the Supported and >>> Community folders, it seems LoadBatchArrayResults supports the cel4 >>> format. Is this the plugin I should use? >>> >>> Any help would be much appreciated. Thanks. >>> >>> Best regards, >>> Dave Hau >> > |
From: Dave H. <doc...@gm...> - 2007-09-12 23:13:25
|
Elisabetta, Thanks for your and John Brestelli's (via personal email) very informative replies. They are very helpful indeed. Regarding loading .CEL files (probe cell data, not probe set data), John mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which I had noticed too. The help page for this plugin mentions a number of quantification protocols supported including mas4/mas5 (Affymetrix MAS 4.0 and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix MAS 4.0 and 5.0 Probe Cell quantification protocol). It seems that cel4/cel5 would correspond to the .CEL files I need to load (i.e. probe *cell* data). Is this correct? I was wondering because you mentioned in your reply that there's no plugin available for loading probe cell data. Also, in the Affymetrix file format description document ( http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two file formats are described: Version 3 files (text data) generated by the MAS software, and version 4 files (binary data) generated by the GCOS software. So both cel4 and cel5 for the plugin would correspond to Version 3 files, right? That means the LoadBatchArrayResults plugin does not support the Version 4 (binary) file format, correct? Thanks again for your help. Best regards, Dave Hau Elisabetta Manduchi wrote: > > Hi Dave, > let me clarify GUS vs Affy. > Affymetrix quantified results are of two types, corresponding to 2 > different level of analysis: > > (i) probe-cell level results (e.g. from .CEL files), which contain > intensity values for each individual probe cell on the chip; and > (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in > the .CHP files, or from RMA or gcRMA) which contain *summarized* > intensities for probe sets on the chip. 
> > The GUS schema in principle supports storage of both: > > (i) the probe cell results would go into a view of > RAD.ElementResultImp (in fact there is a view to this end called > RAD.AffymetrixCEL); > (ii) the probe set results would go to view of > RAD.CompositeElementResultImp. For the latter, currently we have views > to accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and > RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed > RAD.RMA in the next GUS release). > > Now, here at CBIL, we do not store or support loading of the .CEL file > data in the database, because we really only use the probe-set level > results in our applications, so we have no need to store .CEL in the db. > So the way we do it is as follows: > * for every Affymetrix assay, we have TWO related quantifications, one > corresponding to the .CEL quantification and the other corresponding > to whatever summarization quantification was created (e.g. with MAS4, > MAS5, RMA); > * we place 2 entries in RAD.Quantifications, one pointing to the uri > of the .CEL file (which we keep on our server) and one pointing to the > uri of the probe-set level result file > * we however do not store the data from the .CEL file in > RAD.AffymetrixCEL > * we only store the data from the probe-set level results in one of > the RAD.CompositeElementResultImp views mentioned above. > > The current plugin in GUS::Supported, as Junmin mentioned in the > posting you are referring to, can be used to populate the data for the > probe-set level results. As far as I know, we do not have currently a > plugin to store the .CEL files in the db. > So the db allows for the latter, but you'd have to write your own > plugin. We didn't find useful to store .CEL results in GUS, but again > this depends on the type of applications you might be interested in. 
> Hope this helps, > Elisabetta > > > On Tue, 28 Aug 2007, Dave Hau wrote: > >> I would like to import a number of Affymetrix .CEL files into the GUS >> database, which was installed from top of trunk from the GUS svn >> repository. The CEL files each have some text headers, and then binary >> data afterwards. So I suppose they are in CEL Version 4 format. >> >> Doing some search on previous posts, I came across this one: >> >> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu >> >> >> It seems that at the time of the post (12/2005), the way these .CEL >> files would be imported was that the headers would go to one of the >> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), >> the actual file would sit in the file system, and we'd insert a row to >> the RAD.Quantification table with a URI pointing to the location of the >> .CEL file. >> >> Also, looking through the different plugins in both the Supported and >> Community folders, it seems LoadBatchArrayResults supports the cel4 >> format. Is this the plugin I should use? >> >> Any help would be much appreciated. Thanks. >> >> Best regards, >> Dave Hau > |
From: Elisabetta M. <man...@pc...> - 2007-08-29 00:23:37
|
Hi Dave,

let me clarify GUS vs Affy.
Affymetrix quantified results are of two types, corresponding to two different levels of analysis:

(i) probe-cell level results (e.g. from .CEL files), which contain intensity values for each individual probe cell on the chip; and
(ii) probe-set level results (e.g. obtained from MAS4 or MAS5 and in the .CHP files, or from RMA or gcRMA), which contain *summarized* intensities for probe sets on the chip.

The GUS schema in principle supports storage of both:

(i) the probe-cell results would go into a view of RAD.ElementResultImp (in fact there is a view to this end called RAD.AffymetrixCEL);
(ii) the probe-set results would go into a view of RAD.CompositeElementResultImp. For the latter, we currently have views to accommodate MAS4 or MAS5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed RAD.RMA in the next GUS release).

Now, here at CBIL, we do not store or support loading of the .CEL file data in the database, because we really only use the probe-set level results in our applications, so we have no need to store .CEL data in the db. So the way we do it is as follows:

* for every Affymetrix assay, we have TWO related quantifications, one corresponding to the .CEL quantification and the other corresponding to whatever summarization quantification was created (e.g. with MAS4, MAS5, RMA);
* we place two entries in RAD.Quantification, one pointing to the uri of the .CEL file (which we keep on our server) and one pointing to the uri of the probe-set level result file;
* however, we do not store the data from the .CEL file in RAD.AffymetrixCEL;
* we only store the data from the probe-set level results in one of the RAD.CompositeElementResultImp views mentioned above.

The current plugin in GUS::Supported, as Junmin mentioned in the posting you are referring to, can be used to populate the data for the probe-set level results. As far as I know, we do not currently have a plugin to store the .CEL files in the db. So the db allows for the latter, but you'd have to write your own plugin. We didn't find it useful to store .CEL results in GUS, but again this depends on the type of applications you might be interested in.

Hope this helps,
Elisabetta

On Tue, 28 Aug 2007, Dave Hau wrote:

> I would like to import a number of Affymetrix .CEL files into the GUS
> database, which was installed from top of trunk from the GUS svn
> repository. The CEL files each have some text headers, and then binary
> data afterwards. So I suppose they are in CEL Version 4 format.
>
> Doing some search on previous posts, I came across this one:
>
> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu
>
> It seems that at the time of the post (12/2005), the way these .CEL
> files would be imported was that the headers would go to one of the
> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL),
> the actual file would sit in the file system, and we'd insert a row to
> the RAD.Quantification table with a URI pointing to the location of the
> .CEL file.
>
> Also, looking through the different plugins in both the Supported and
> Community folders, it seems LoadBatchArrayResults supports the cel4
> format. Is this the plugin I should use?
>
> Any help would be much appreciated. Thanks.
>
> Best regards,
> Dave Hau |
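The bookkeeping described in this reply (two related quantification records per assay, with only the probe-set results actually loaded) can be sketched roughly as follows. This is an illustration only: `QuantificationRecord` and `quantifications_for_assay` are hypothetical names, not part of GUS, and the fields are a simplified stand-in for a row of the quantification table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantificationRecord:
    """Hypothetical, simplified stand-in for one quantification row."""
    assay: str
    level: str          # "probe-cell" or "probe-set"
    protocol: str       # e.g. "CEL", "MAS5", "RMA"
    uri: str            # location of the result file on the file server
    load_results: bool  # whether per-element values are loaded into the db


def quantifications_for_assay(assay: str, cel_uri: str, summary_uri: str,
                              summary_protocol: str) -> List[QuantificationRecord]:
    """Return the two related records kept for a single Affymetrix assay."""
    return [
        # The .CEL file is only referenced by URI; its data stay on disk.
        QuantificationRecord(assay, "probe-cell", "CEL", cel_uri, False),
        # The probe-set file is referenced by URI and its values are loaded
        # into one of the composite-element result views.
        QuantificationRecord(assay, "probe-set", summary_protocol,
                             summary_uri, True),
    ]
```

A site that does want probe-cell values in the database would flip `load_results` for the first record and supply its own loader, as the reply suggests.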
From: Dave H. <doc...@gm...> - 2007-08-28 21:03:35
|
I would like to import a number of Affymetrix .CEL files into the GUS database, which was installed from top of trunk from the GUS svn repository. The CEL files each have some text headers, and then binary data afterwards. So I suppose they are in CEL Version 4 format. Doing some search on previous posts, I came across this one: http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu It seems that at the time of the post (12/2005), the way these .CEL files would be imported was that the headers would go to one of the Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL), the actual file would sit in the file system, and we'd insert a row to the RAD.Quantification table with a URI pointing to the location of the .CEL file. Also, looking through the different plugins in both the Supported and Community folders, it seems LoadBatchArrayResults supports the cel4 format. Is this the plugin I should use? Any help would be much appreciated. Thanks. Best regards, Dave Hau |
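Rather than guessing the CEL version from how a file looks in an editor, the first few bytes can be checked directly. The sketch below is a hedged illustration: it assumes, from my recollection of the Affymetrix file format description, that Version 3 text files begin with a `[CEL]` section header and that Version 4 binary files begin with two little-endian 32-bit integers, the magic number 64 followed by the version number 4. Please verify both against the format document before relying on it. The function name `sniff_cel_format` is hypothetical.

```python
import struct


def sniff_cel_format(path: str) -> str:
    """Guess the CEL file flavour from its first bytes (assumptions above)."""
    with open(path, "rb") as fh:
        head = fh.read(16)

    # Assumed Version 3 layout: INI-style text starting with "[CEL]".
    if head.lstrip().startswith(b"[CEL]"):
        return "version 3 (text)"

    # Assumed Version 4 layout: int32 magic == 64, then int32 version == 4.
    if len(head) >= 8:
        magic, version = struct.unpack("<ii", head[:8])
        if magic == 64 and version == 4:
            return "version 4 (binary)"

    return "unknown"
```

Anything reported as "unknown" may be a newer format or a damaged file and would need a closer look.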
From: Steve F. <sfi...@pc...> - 2007-08-12 19:38:15
|
michael- let's set up a time to have a phone meeting. i think it may be easier to help that way. send me a few times that might work for you next week. steve mro...@cs... wrote: > Hello GusDev members, > > I need you help..... > > Using he InsertSequenceFeatures plugin, i loaded the nov 2006 data with NO > problems. > > I went to NCBI and found a newer version dec 22 2006 > > so I unloaded the nov data using the InsertSequenceFeaturesUndo plugin, > and tried to load the dec data getting this error at record 3700: > > In feature map XML file '/home/biorg-srv-2/mrobi002/gus/pseudomonas > aeruginosa/pa01/genbank2gus.xml' <feature name="CDS"> does not have a > <qualifier> for 'ribosomal_slippage', which is found in the input. > > Please edit file '/home/biorg-srv-2/mrobi002/gus/pseudomonas > aeruginosa/pa01/genbank2gus.xml', adding a <qualifer> tag under <feature > name="CDS">. > > If you want to ignore the input data add <qualifier > name="ribosomal_slippage" ignore="true"/>. > > If you want to handle the data, you will need to add a special case > handler (please see the plugin documentation). 
> > So for the moment I modified genbank2gus.xml adding > > <qualifier name="ribosomal_slippage" ignore="true"/> > > > I run InsertSequenceFeaturesUndo with NO problem and then I reloaded the > dec data now getting this error at record 4700: > > ERROR: > Map XML file '/home/biorg-srv-2/mrobi002/gus/pseudomonas > aeruginosa/pa01/genbank2gus.xml' does not contain a <feature > name="pseudogene">, which is found in the input > > Lookin into genbank2gus.xml I find that each feature name block has: > > <feature name="GC_signal" table="DoTS::DnaRegulatory" so="GC_rich_region"> > <qualifier name="allele" ignore="true"/> > <qualifier name="citation"/> > <qualifier name="evidence"/> > <qualifier name="gene" handler="standard" method="gene"/> > <qualifier name="label"/> > <qualifier name="locus_tag" column="source_id"/> > <qualifier name="map"/> > <qualifier name="note" handler="standard" method="note"/> > <qualifier name="old_locus_tag" ignore="true"/> > <qualifier name="usedin"/> > <qualifier name="db_xref" handler="standard" method="dbXRef"/> > </feature> > > > Therefore I need a table name, so, and qualifier names > > Could you please tell me where I could find the information needed for the > <feature name="pseudogene" > > > Thanks > > Michael Robinson > Bioinformatics Research Group (Biorg) > Florida International University > ECS 254 > Tel 786-543-3553 > email mro...@cs... > Miami, Florida, USA 33199 > > > > > |
From: <mro...@cs...> - 2007-08-11 21:11:31
|
Hello GusDev members,

I need your help.

Using the InsertSequenceFeatures plugin, I loaded the Nov 2006 data with NO problems.

I went to NCBI and found a newer version (Dec 22, 2006), so I unloaded the Nov data using the InsertSequenceFeaturesUndo plugin and tried to load the Dec data, getting this error at record 3700:

In feature map XML file '/home/biorg-srv-2/mrobi002/gus/pseudomonas aeruginosa/pa01/genbank2gus.xml' <feature name="CDS"> does not have a <qualifier> for 'ribosomal_slippage', which is found in the input.

Please edit file '/home/biorg-srv-2/mrobi002/gus/pseudomonas aeruginosa/pa01/genbank2gus.xml', adding a <qualifer> tag under <feature name="CDS">.

If you want to ignore the input data add <qualifier name="ribosomal_slippage" ignore="true"/>.

If you want to handle the data, you will need to add a special case handler (please see the plugin documentation).

So for the moment I modified genbank2gus.xml, adding

<qualifier name="ribosomal_slippage" ignore="true"/>

I ran InsertSequenceFeaturesUndo with NO problem and then reloaded the Dec data, now getting this error at record 4700:

ERROR:
Map XML file '/home/biorg-srv-2/mrobi002/gus/pseudomonas aeruginosa/pa01/genbank2gus.xml' does not contain a <feature name="pseudogene">, which is found in the input

Looking into genbank2gus.xml, I find that each feature block looks like this:

<feature name="GC_signal" table="DoTS::DnaRegulatory" so="GC_rich_region">
  <qualifier name="allele" ignore="true"/>
  <qualifier name="citation"/>
  <qualifier name="evidence"/>
  <qualifier name="gene" handler="standard" method="gene"/>
  <qualifier name="label"/>
  <qualifier name="locus_tag" column="source_id"/>
  <qualifier name="map"/>
  <qualifier name="note" handler="standard" method="note"/>
  <qualifier name="old_locus_tag" ignore="true"/>
  <qualifier name="usedin"/>
  <qualifier name="db_xref" handler="standard" method="dbXRef"/>
</feature>

Therefore I need a table name, an so value, and qualifier names.

Could you please tell me where I could find the information needed for the <feature name="pseudogene"> block?

Thanks

Michael Robinson
Bioinformatics Research Group (Biorg)
Florida International University
ECS 254
Tel 786-543-3553
email mro...@cs...
Miami, Florida, USA 33199 |
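Since the load stops at the first unmapped item each time (record 3700, then record 4700), it may help to list every feature key and qualifier in the GenBank file that the map file does not cover before loading again. The following is a rough sketch, not part of GUS. It assumes the standard GenBank flat-file layout (feature keys indented 5 spaces, qualifiers indented 21 spaces) and that the map file uses `<feature name="...">` elements with `<qualifier name="...">` children, as in the excerpt above. All function names are hypothetical, and it does not answer which table or so term a new feature should use.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S")          # feature key line
QUALIFIER_RE = re.compile(r"^ {21}/([A-Za-z0-9_]+)")  # qualifier line


def features_in_genbank(lines):
    """Map each feature key in the FEATURES tables to its qualifier names."""
    seen = defaultdict(set)
    in_table, current = False, None
    for line in lines:
        if line.startswith("FEATURES"):
            in_table, current = True, None
            continue
        if in_table and line and not line[0].isspace():
            in_table = False  # ORIGIN, CONTIG, // and so on end the table
        if not in_table:
            continue
        m = FEATURE_RE.match(line)
        if m:
            current = m.group(1)
            seen[current]  # record the feature even with no qualifiers
            continue
        m = QUALIFIER_RE.match(line)
        if m and current is not None:
            seen[current].add(m.group(1))
    return seen


def features_in_map(xml_text):
    """Map each <feature name> in the map file to its <qualifier name>s."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): {q.get("name") for q in f.iter("qualifier")}
            for f in root.iter("feature")}


def unmapped(genbank_lines, xml_text):
    """Return (missing feature keys, {feature: missing qualifiers})."""
    mapped = features_in_map(xml_text)
    missing_features, missing_qualifiers = set(), {}
    for feat, quals in features_in_genbank(genbank_lines).items():
        if feat not in mapped:
            missing_features.add(feat)
        elif quals - mapped[feat]:
            missing_qualifiers[feat] = quals - mapped[feat]
    return missing_features, missing_qualifiers
```

Running `unmapped(open(genbank_path), open(map_path).read())` would report all gaps in one pass. Continuation lines of a long qualifier value that happen to start with `/` would be miscounted as qualifiers, so the output is a checklist to review rather than a definitive result.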
From: <mro...@cs...> - 2007-08-11 20:32:35
|
Sorry, I forgot to fill in the subject, so here I go again.

---------------------------- Original Message ----------------------------
Subject:
From: mro...@cs...
Date: Wed, August 8, 2007 6:37 pm
To: gus...@li...
Cc: "Steve Fischer" <sfi...@pc...>
-------------------------------------------------------------------------

Hello GUS group,

I am running GUS version 3.5.

In November 2006 I used InsertSequenceFeatures.pm successfully to load a complete bacteria file downloaded from NCBI.

Since then NCBI published a new version of the same bacteria.

To delete the data from the tables and get ready to load the new data, I ran InsertSequenceFeaturesUndo.pm, also successfully, except for:

- sres.reference, which still has 3 (three) records
- sres.externaldatabaserelease, with 1 (one) record
- sres.externaldatabase, with 1 (one) record

Reading the InsertSequenceFeaturesUndo.pm source code, it does not mention the above tables in any of the delete methods.

My question is: should I delete the data from those tables manually, or is there a new version of InsertSequenceFeaturesUndo.pm?

Thanks for your help

Michael Robinson
Bioinformatics Research Group (Biorg)
Florida International University
ECS 254
Tel 786-543-3553
Miami, Florida, USA 33199 |
From: <mro...@cs...> - 2007-08-08 22:37:13
|
Hello GUS group,

I am running GUS version 3.5.

In November 2006 I used InsertSequenceFeatures.pm successfully to load a complete bacteria file downloaded from NCBI.

Since then NCBI published a new version of the same bacteria.

To delete the data from the tables and get ready to load the new data, I ran InsertSequenceFeaturesUndo.pm, also successfully, except for:

- sres.reference, which still has 3 (three) records
- sres.externaldatabaserelease, with 1 (one) record
- sres.externaldatabase, with 1 (one) record

Reading the InsertSequenceFeaturesUndo.pm source code, it does not mention the above tables in any of the delete methods.

My question is: should I delete the data from those tables manually, or is there a new version of InsertSequenceFeaturesUndo.pm?

Thanks for your help

Michael Robinson
Bioinformatics Research Group (Biorg)
Florida International University
ECS 254
Tel 786-543-3553
Miami, Florida, USA 33199 |
From: davila <da...@io...> - 2007-07-30 12:14:57
|
Dear All,

I just downloaded and installed ApiCommonData plugins to our GUS 3.5 Postgres install, then ran:

nohup ga ApiCommonData::Load::Plugin::InsertGOTermsFromObo --oboFile /home/user/GO/gene_ontology_edit.obo.2007-07-01 --extDbRlsName "Gene Ontology" --extDbRlsVer 5.375 --commit > /home/user/log/ga_ApiCommonData_InsertGOTermsFromObo_GeneOntology.log &

It loaded many GO terms into the SRes.GOTerms table (not sure if all of them) but also generated a huge log file (bigger than 25GB) with the errors listed below... any ideas on how to solve this?

Thanks, Alberto.

[user@server log]$ more ga_ApiCommonData_InsertGOTermsFromObo_GeneOntology.log
Wed Jul 25 13:06:50 2007 DSN dbi:Pg:dbname=mydb
Wed Jul 25 13:06:50 2007 PLUGIN ApiCommonData::Load::Plugin::InsertGOTermsFromObo
Wed Jul 25 13:06:50 2007 ARG algoinvo 1
Wed Jul 25 13:06:50 2007 ARG comment
Wed Jul 25 13:06:50 2007 ARG commit 1
Wed Jul 25 13:06:50 2007 ARG debug 0
Wed Jul 25 13:06:50 2007 ARG extDbRlsName Gene Ontology
Wed Jul 25 13:06:50 2007 ARG extDbRlsVer 5.375
Wed Jul 25 13:06:50 2007 ARG group
Wed Jul 25 13:06:50 2007 ARG gusconfigfile $GUS_HOME/config/gus.config
Wed Jul 25 13:06:50 2007 ARG help
Wed Jul 25 13:06:50 2007 ARG helpHTML
Wed Jul 25 13:06:50 2007 ARG oboFile /home/user/GO/gene_ontology_edit.obo.2007-07-01
Wed Jul 25 13:06:50 2007 ARG project
Wed Jul 25 13:06:50 2007 ARG sqlVerbose 0
Wed Jul 25 13:06:50 2007 ARG user
Wed Jul 25 13:06:50 2007 ARG verbose 0
Wed Jul 25 13:06:50 2007 ARG veryVerbose 0
Wed Jul 25 13:06:50 2007 AlgInvocationId 461
Wed Jul 25 13:06:50 2007 COMMIT commit on
Processed 500 terms
Processed 1000 terms
Processed 1500 terms
Processed 2000 terms
Processed 2500 terms
Processed 3000 terms
Processed 3500 terms
Processed 4000 terms
Processed 4500 terms
Processed 5000 terms
Processed 5500 terms
Processed 6000 terms
Processed 6500 terms
Processed 7000 terms
Processed 7500 terms
Processed 8000 terms
Processed 8500 terms
Processed 9000 terms
Processed 9500 terms
Processed 10000 terms
Processed 10500 terms
Processed 11000 terms
Processed 11500 terms
Processed 12000 terms
Processed 12500 terms
Processed 13000 terms
Processed 13500 terms
Processed 14000 terms
Processed 14500 terms
Processed 15000 terms
Processed 15500 terms
Processed 16000 terms
Processed 16500 terms
Processed 17000 terms
Processed 17500 terms
Processed 18000 terms
Processed 18500 terms
Processed 19000 terms
Processed 19500 terms
Processed 20000 terms
Processed 20500 terms
Processed 21000 terms
Processed 21500 terms
Processed 22000 terms
Processed 22500 terms
Processed 23000 terms
Processed 23500 terms
Processed 24000 terms
DBD::Pg::db do failed: ERROR: table "go_tc" does not exist
DBD::Pg::db do failed: ERROR: syntax error at or near "(" at character 47
DBD::Pg::db do failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
DBD::Pg::db do failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
DBD::Pg::db selectrow_array failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
DBD::Pg::db selectrow_array failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
GO Terms: DBD::Pg::db selectrow_array failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
Relationships: starting size: DBD::Pg::st execute failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
DBD::Pg::st fetchrow_array failed: no statement executing
Transitive closure (length 2): added 0 edges
DBD::Pg::st execute failed: ERROR: prepared statement "dbdpg_2" does not exist
DBD::Pg::st fetchrow_array failed: no statement executing
Transitive closure (length 3): added 0 edges
DBD::Pg::st execute failed: ERROR: prepared statement "dbdpg_2" does not exist
DBD::Pg::st fetchrow_array failed: no statement executing
Transitive closure (length 4): added 0 edges
DBD::Pg::st execute failed: ERROR: prepared statement "dbdpg_2" does not exist |
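With a log of this size (over 25GB), paging through it with `more` is impractical, and most of it is likely the same few messages repeated. A minimal sketch for summarizing it in a single streaming pass is shown below. The function name `summarize_errors` is hypothetical, and collapsing digits to `N` is just one assumed way of grouping messages that differ only in a counter.

```python
import re
from collections import Counter
from typing import Iterable

_DIGITS = re.compile(r"\d+")


def summarize_errors(lines: Iterable[str], marker: str = "ERROR:") -> Counter:
    """Count distinct error messages, reading one line at a time."""
    counts = Counter()
    for line in lines:
        if marker in line:
            # Replace runs of digits so that e.g. "dbdpg_2" and "dbdpg_3"
            # are counted as the same kind of message.
            counts[_DIGITS.sub("N", line.strip())] += 1
    return counts
```

It could be used as `with open(log_path, errors="replace") as fh: print(summarize_errors(fh).most_common(10))`, which never holds more than one line of the file in memory. In the excerpt above, the earliest distinct message (the missing "go_tc" table) is probably the one to chase first, since the "current transaction is aborted" messages are typical follow-on errors once a statement in a Postgres transaction has failed.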
From: Chris S. <sto...@pc...> - 2007-06-28 02:40:30
|
Hi Henrique, The person working on that here has left so not sure what the status is. I can say that we have identified making OrthoMCL work as a production system with GUS a high priority and will be addressing this next month if we are successful in a hire. Sorry for the delay but hope to get back to you soon. Cheers, Chris On Jun 26, 2007, at 4:54 PM, Henrique Jucá wrote: > Hi folks, > > I've been wondering if anyone has been using the plugin created > (OrthologGroupsMCL.pm) to insert OrthoMCL results into GUS tables. > It's somewhat old, considering the current version so, has anyone > tried it with the current GUS version? > > > Cheers, > > > Henrique Cesar Lemos Jucá > > "The techniques of the Way of Peace change constantly; every > encounter is unique, and the appropriate response should emerge > naturally. Today's techniques will be different tomorrow. Do not > get caught up with the form and appearance of a challenge. The Art > of Peace has no form - it is the study of the spirit." > > The Art of Peace - Morihei Ueshiba, O-Sensei, (1883-1969) > ------------------------------------------------------------------------- > This SF.net email is sponsored by DB2 Express > Download DB2 Express C - the FREE version of DB2 express and take > control of your XML. No limits. Just data. Click to get it now. > http://sourceforge.net/powerbar/db2/ > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
From: <mro...@cs...> - 2007-06-27 20:58:54
|
Hello gusdev members, I finally got the WDK toy model version 1.11 working using the version 1.12 documentation. I would like to call Perl or Java programs from inside the WDK system. Is that possible, and if someone has done it, could I get some help please? Thanks Michael Robinson Bioinformatics Research Group (Biorg) Florida International University ECS 254 Miami, Florida, USA 33199 |
From: <hen...@gm...> - 2007-06-26 20:54:07
|
Hi folks, I've been wondering if anyone has been using the plugin created (OrthologGroupsMCL.pm) to insert OrthoMCL results into GUS tables. It's somewhat old, considering the current version so, has anyone tried it with the current GUS version? Cheers, Henrique Cesar Lemos Jucá "The techniques of the Way of Peace change constantly; every encounter is unique, and the appropriate response should emerge naturally. Today's techniques will be different tomorrow. Do not get caught up with the form and appearance of a challenge. The Art of Peace has no form - it is the study of the spirit." The Art of Peace - Morihei Ueshiba, O-Sensei, (1883-1969) |
From: <mro...@cs...> - 2007-05-31 00:47:58
|
Thank you for your answers, well we decided to do a clean re-install and we got up to: wdkCache -model toyModel -new log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax). log4j:WARN Please initialize the log4j system properly. Making cache table gusdb.QueryInstance Creating sequence gusdb.QueryInstance_pkseq Done Command succeeded in 0.348 seconds On the old notes we noticed that in the last install, at the same place (creating the wdkCache) we got the same warnings and we ignored them, this time we want to fix them before we continue, There is this file called log4j.properties at gus_home/config ### direct log messages to stdout ### log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.Target=System.out log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n #log4j.appender.file=org.apache.log4j.FileAppender #log4j.appender.file.File= #log4j.appender.file.Append = false #log4j.appender.file.layout=org.apache.log4j.PatternLayout #log4j.appender.file.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n ### set log levels - for more verbose logging change 'info' to 'debug' ### log4j.rootLogger=info, stdout #log4j.category.org.apache.commons=warn, stdout #log4j.category.org.gusdb.dba.model=debug, file #log4j.category.org.gusdb.dba.util=debug, stdout, file #log4j.category.org.gusdb.dba.reader=debug, stdout, file #log4j.category.org.gusdb.dba.writer=debug, stdout, file which seems to be the one causing the problem, We are looking for instructions on how to modify this file for the wdkToySite, please help. Thank you very much Michael > hi michael- > > i suspect tomcat is not seeing your controller jar file. > > steve > > mro...@cs... wrote: >> Hello folks; >> >> We installed wdk, tomcat 6.0.10 and we are trying to get the wdktoysite >> to >> work. >> >> Everything compiled and all data files got create. 
>> >> Now when we try to start the site inside tomcat we get the following >> errors: >> >> in catalina.log >> May 16, 2007 7:06:44 PM org.apache.catalina.core.StandardContext start >> SEVERE: Error listenerSta >> May 16, 2007 7:06:44 PM org.apache.catalina.core.StandardContext start >> SEVERE: Context [/wdktoysite] startup failed due to previous errors >> >> >> in localhost.log >> May 16, 2007 6:46:51 PM org.apache.catalina.core.StandardContext >> listenerStart >> SEVERE: Error configuring application listener of class >> org.gusdb.wdk.controller.ApplicationInitListener >> java.lang.ClassNotFoundException: >> org.gusdb.wdk.controller.ApplicationInitListener >> >> and then all other errors have the same explanation >> >> java.lang.ClassNotFoundException: >> org.gusdb.wdk.controller.ApplicationInitListener >> >> Has anybody run into this problem and how was it solved. >> >> >> >> Thanks >> >> Michael Robinson >> Bioinformatics Research Group (Biorg) >> Florida International University >> ECS 254 >> Miami, Florida, USA 33199 >> > |