Re: [XMLPipeDB-developer] M smeg source data files?
From: Kam D. <kda...@lm...> - 2011-03-03 22:19:27
Hi,

I have resolved the outstanding issues for Mycobacterium smegmatis.

1. The import/export I completed with the 11/28 versions of the files retrieved from the S120 Mac matched both the 20101130 and 20110128 versions of the gdb. Thus, the mistake was in the version of the source files posted to the A-team page; it *WAS* the 11/28 versions that were used to generate the gdb's.

2. I was able to correctly state the version of the source files for the ReadMe; we are going to release the 20110128 version (with the corrected link-out).

3. There is an issue with the TallyEngine for M. smegmatis with regard to the number of RefSeq and GeneId IDs. TallyEngine reports 6720 for both the XML and the database, while there are only 6716 in the gdb. I ran a match on the XML for the RefSeq and GeneId patterns and found that there are only 6716 of each of those IDs in the XML. So I don't know why the TallyEngine is counting four extra. Since this only affects the TallyEngine and not the gdb, I think we are OK to release.

I'm going to send the files in a separate e-mail.

Cheers,
Kam

>On Mar 1, 2011, at 4:09 PM, Kam Dahlquist <kda...@lm...> wrote:
>
> > Hi,
> >
> > I've got an update on this.
> >
> > I went and retrieved the source files from the Mac Rich was using in
> > Seaver 120. It appears that while the 11/28/10 versions of the
> > source files were uploaded to the wiki, the 11/16/10 versions of the
> > files were the ones actually used to export the deliverable gdb.
> >
> > This doesn't affect the versioning info on the ReadMe for the UniProt
> > XML because that is only updated once a month and appears to be the
> > same file on both of the above dates. However, both the GO and GOA
> > files are different. We do not have the version info for the 11/16/10
> > GO OBO-XML file (at least I don't know where it is). And the version
> > info for the GOA file in the ReadMe is incorrect: it does not report
> > the right information for GOA, but repeats what was said for the
> > UniProt XML.
> >
> > So at this point, we can't release the M. smegmatis gdb until we get
> > the versioning issues resolved and are sure that the ReadMe is correct.
> >
> > I decided to try downloading fresh source files and just
> > importing/exporting a new gdb. However, the GO OBO-XML gave an
> > error. Either this particular GO file is bad or there is a change in
> > the format. I'm going to put this to the side for now, until I can
> > work with Rich to resolve the above versioning issues, and move on to
> > working on the other gdb's.
> >
> > I've been annotating the task list on the wiki with my progress.
> >
> > https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Task_List
> >
> > Best,
> > Dr. D
> >
> > At 06:52 PM 2/28/2011, Kam Dahlquist wrote:
> >> Hi Rich,
> >>
> >> I ran an export of the Mycobacterium smegmatis gdb today with the
> >> data source files (UniProt XML, GO OBO-XML, GOA) found on the A team
> >> deliverables page. However, I didn't get the same number of GO
> >> terms that you did when you last exported an M smeg gdb on
> >> 2011-01-28 or as your team turned in for the deliverables. The 1/28
> >> version was identical to the deliverables version, but my version
> >> had a different number of GO terms. I wanted to double-check with
> >> you that those were the correct GO XML and GOA files to use.
> >>
> >> Thanks,
> >> Dr. D
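
For anyone who wants to repeat the count check from item 3 of the latest message (matching the RefSeq and GeneId patterns in the XML), here is a minimal sketch of one way to do it. It assumes the IDs appear as dbReference elements with type and id attributes, as in UniProt XML. The function name and the sample data are made up for illustration, and this is not how the TallyEngine itself counts. It reports both total occurrences and distinct IDs, since the two can differ.

```python
import io
import xml.etree.ElementTree as ET


def count_db_references(source, types=("RefSeq", "GeneID")):
    """Return {type: (total occurrences, distinct ids)} for dbReference elements.

    Hypothetical helper: assumes <dbReference type="..." id="..."/> elements.
    """
    seen = {t: [] for t in types}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace so the tag compares as plain "dbReference".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "dbReference" and elem.get("type") in seen:
            seen[elem.get("type")].append(elem.get("id"))
    return {t: (len(ids), len(set(ids))) for t, ids in seen.items()}


# Made-up sample data; the IDs below are placeholders, not real records.
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <dbReference type="RefSeq" id="HYPOTHETICAL_1"/>
    <dbReference type="GeneID" id="1001"/>
  </entry>
  <entry>
    <dbReference type="RefSeq" id="HYPOTHETICAL_1"/>
    <dbReference type="GeneID" id="1002"/>
    <dbReference type="GO" id="GO:0000001"/>
  </entry>
</uniprot>"""

counts = count_db_references(io.BytesIO(SAMPLE))
for ref_type, (total, distinct) in counts.items():
    print(f"{ref_type}: {total} occurrences, {distinct} distinct")
```

To run it against a real file, pass the path of the XML file instead of the in-memory sample, e.g. `count_db_references("uniprot.xml")` (the file name here is a placeholder).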