From: Steve F. <sfi...@pc...> - 2004-05-12 23:14:07
madhura, paul-

looking at paul's fix, it is not clear to me why it should work. my understanding of the nameForDb hash is that it maps incoming names to ones in the database in order to locate the proper external database with which to make the relationship. i suspect that those records that are not in the hash will not have the external links written. while this might not be essential data, if that is so, then it subverts the design of the plugin.

if i understand the situation correctly, it is not simple. it is a problem of data integration. the heart of the problem is that this part of the plugin was not written in a portable way.

we are loading data that links to entries in a set of external databases. we want to make sure that we correctly capture that information so that we can query on it later. to do so, we need to know:
 - all the databases referred to in the file, using the file's naming structure
 - for all those names, what is the GUS external database that it refers to in the actual instance of GUS that the plugin is running under.

for a proper solution what we need is:
 - a configuration file (instead of a hard-coded hash) that maps the Pfam names to names in the DB
 - a pre-process run of the plugin which:
    - confirms that all GUS names in the config file are actually in the database
    - confirms that all names in the Pfam file are in the config file
    - puts out proper error messages so that the config file and/or database can be corrected.

we could provide with the code a sample config file which holds the mapping used by CBIL.

if you want to hack a solution, i think you should do what i described to madhura in a separate mail:
 - use UNIX's grep and sort -u to find all the names in the pfam file.
 - discover which of them are not in the hash
 - add entries in the db for them, and also, put them in the hash.
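
a minimal sketch of the pre-process check described above, in Python rather than the plugin's Perl. it assumes that database references in the Pfam flat file sit on lines of the form "#=GF DR   NAME; id;" and that the config file holds one tab-separated "pfam_name, gus_name" pair per line. the function names, the config layout and the GUS-side names are hypothetical, and the set of names present in the database is passed in rather than queried from GUS.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumption: cross-reference lines look like "#=GF DR   PROSITE; PDOC00633;"
DR_LINE = re.compile(r"^#=GF\s+DR\s+([^;\s]+)\s*;")


def names_in_pfam(lines: Iterable[str]) -> Set[str]:
    """Collect the unique database names referenced in the Pfam file
    (the same result as grep piped through sort -u)."""
    names = set()
    for line in lines:
        match = DR_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def read_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Parse the hypothetical config file: pfam_name<TAB>gus_name per line.
    Blank lines and lines starting with '#' are skipped."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pfam_name, gus_name = line.split("\t", 1)
        mapping[pfam_name.strip()] = gus_name.strip()
    return mapping


def check(pfam_names: Set[str], mapping: Dict[str, str],
          gus_names: Set[str]) -> List[str]:
    """Return one error message per problem; an empty list means the
    config file, the Pfam file and the database agree."""
    errors = []
    # every name used in the Pfam file must be in the config file
    for name in sorted(pfam_names - set(mapping)):
        errors.append(
            f"Pfam file refers to '{name}', which is not in the config file")
    # every GUS name in the config file must exist in the database
    for pfam_name, gus_name in sorted(mapping.items()):
        if gus_name not in gus_names:
            errors.append(
                f"config maps '{pfam_name}' to '{gus_name}', "
                "which is not in the database")
    return errors
```

the load itself would only start once check() returns an empty list. otherwise the messages say which side to correct, the config file or the database.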

steve

Madhura Sharangpani wrote:

>Hi Steve
>
>I was also coordinating with Paul Mooney and he found out a bug in
>LoadPfam.pm which was causing the problem, he has fixed the bug and today he
>also checked in the revised version of LoadPfam.pm into cvs repository. I
>used that version and was able to successfully load Pfam-A.full file into
>GUS.
>
>Thanks for your help!
>
>Best
>
>Madhura
>
>----- Original Message -----
>From: "Steve Fischer" <sfi...@pc...>
>To: "Madhura Sharangpani" <sma...@st...>
>Sent: Wednesday, May 12, 2004 10:15 AM
>Subject: Re: Problem Loading pfam data into Gus
>
>>madhura-
>>
>>i think i have an idea why this is happening. on line 97 there is a
>>hash mapping from names found in the file to names found in our db.
>>probably the name in the file (on or near line 14034 in the pfam file)
>>is not in that hash.
>>
>>the use of that hash is not robust code. the mapping should be
>>provided in a config file.
>>
>>i think the thing to do is discover all the DBs that your pfam file
>>references. do that by using unix grep on the file, then pipe the
>>output to unix's 'sort -u' which will make the list unique. compare
>>that list to the hash. let me know what is not there.
>>
>>steve
>>
>>Madhura Sharangpani wrote:
>>
>>>Hi Steve
>>>
>>>Thanks for your reply!
>>>On observing ExternalDatabase.xml and ExternalDatabaseRelease.xml files I
>>>observed that ExternalDatabase.xml file has entries for id 3001,3002 and
>>>3003 vs ExternalDatabaseRelease.xml file does not have them
>>>I inserted those entries and then ran the command to load pfam-A.full file
>>>again, it made progress as compared to before, the previous error
>>>did not occur
>>>But it is still giving following error:
>>>
>>>***********************************************************
>>>elaine7:~/RA/pfam> ga GUS::Common::Plugin::LoadPfam --verbose --parse_only --flat_file=Pfam-A.full.gz --release=13.0
>>>Reading properties from /afs/ir/users/s/m/smadhura/RA/GUS/config/GUS-PluginMgr.prop
>>>Reading properties from /afs/ir/users/s/m/smadhura/RA/GUS/.gus.properties
>>>ImportPfam: COMMIT OFF
>>>ImportPfam: reading Pfam release 13.0 from Pfam-A.full.gz
>>>In readExternalDbReleases()
>>>0: PF00244.7
>>>1: PF00389.14
>>>2: PF02826.5
>>>3: PF00198.10
>>>4: PF04029.4
>>>5: PF02834.4
>>>6: PF03171.7
>>>7: PF03475.3
>>>8: PF06983.1
>>>9: PF06052.1
>>>10: PF01612.10
>>>11: PF00803.8
>>>12: PF01073.7
>>>13: PF06725.1
>>>14: PF02829.4
>>>15: PF00725.10
>>>16: PF02737.5
>>>17: PF05902.1
>>>18: PF02446.5
>>>19: PF04419.3
>>>20: PF03061.8
>>>21: PF01812.9
>>>22: PF06189.1
>>>23: PF01367.9
>>>24: PF02739.5
>>>25: PF05761.3
>>>26: PF02872.7
>>>27: PF03491.3
>>>28: PF02096.8
>>>Unable to find most recent ExternalDatabaseRelease for ExternalDatabase:**** and Id:**** at /afs/ir/users/s/m/smadhura/RA/GUS/lib/perl/GUS/Common/Plugin/LoadPfam.pm line 420, <PFAM> line 14034.
>>>
>>>gunzip: stdout: Broken pipe
>>>elaine7:~/RA/pfam>
>>>***********************************************************
>>>
>>>Note that I had put the ** before and after $name field and $relId field at
>>>line 420 in LoadPfam.pm to be able to read the name more clearly ( code
>>>line: die "Unable to find most recent ExternalDatabaseRelease for
>>>ExternalDatabase :**$name** and Id:**$relId**" if (not defined($relId)); )
>>>and I see that for this error there is no name, $name field is null/empty
>>>and hence $relId is also empty
>>>
>>>Can you suggest any reason why this is happening?
>>>
>>>Thanks!
>>>
>>>Madhura