Thread: [XMLPipeDB-developer] UniProt_All issues
From: Dahlquist, K. D. <Kam...@lm...> - 2014-07-01 21:30:17
Hi,

The export of H. pylori from the UniProt_All postgres database crashed at the relations tables. It was stuck, so I clicked the cancel button, and the Fatal error message shown on the wiki came up:

http://xmlpipedb.sourceforge.net/wiki/index.php/Kdahlquist#2014-06-27

I also note that the UniProt table had only about 500 records when ~1500 were expected (judging by the single-proteome export). I am sure this is because the TrEMBL records were missing. I looked into downloading TrEMBL, but it is ~47 GB compressed! Even if we wanted to deal with a file that large, chances are that an export from it would have many more records than the proteome and not be suitable for our needs.

So it looks like we are stuck with the single-species proteomes for now.

However, we should maybe still look into this bug because it might affect other exports when multiple species are loaded.

I was thinking that I might start loading multiple species' proteomes into the same postgres database so that I don't have to keep loading/processing GO all the time.

Best,
Kam
From: John D. N. D. <do...@lm...> - 2014-07-01 21:35:50
Yes, it seems that the species filtering did not go all the way to the relationship exporting code. I will have to look at this more closely.

Wow, 47 GB compressed... now *that* is getting to "big data." And I agree it will probably not end up exporting a match for our needs.

John David N. Dionisio, PhD
Associate Professor, Computer Science
Associate Director, University Honors Program
Loyola Marymount University
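To illustrate the suspected problem, here is a minimal Python sketch of a species filter that is carried through to the relations step, so that relation rows belonging to other species loaded in the same database are skipped. This is not XMLPipeDB code: the `Record` and `Relation` types, the function names, and the accession and GO IDs are all hypothetical placeholders, and the real exporter may be structured quite differently.

```python
# Hypothetical illustration only: none of these names come from XMLPipeDB.
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str  # placeholder UniProt-style ID
    species: str


@dataclass(frozen=True)
class Relation:
    accession: str  # refers to a Record by its accession
    go_id: str      # placeholder GO term ID


def export_records(records, species):
    """Keep only the records for the requested species."""
    return [r for r in records if r.species == species]


def export_relations(relations, exported_records):
    """Keep only relations whose record survived the species filter."""
    kept = {r.accession for r in exported_records}
    return [rel for rel in relations if rel.accession in kept]


# Two species loaded into the same (in-memory) "database".
records = [
    Record("A1", "Helicobacter pylori"),
    Record("B2", "Escherichia coli"),
]
relations = [
    Relation("A1", "GO:0000001"),
    Relation("B2", "GO:0000002"),
]

exported = export_records(records, "Helicobacter pylori")
filtered_relations = export_relations(relations, exported)
```

The point of the sketch is that the relations step is driven by the set of already-filtered records rather than by the full relations table, which is one plausible way to keep a multi-species database from leaking other species' rows into a single-species export.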