Thread: [XMLPipeDB-developer] Leishmania IDs


Hi Dondi,

Here's a rundown of what we talked about yesterday with regard to
the Leishmania major IDs and some explanations:

1.  LmjF##.#### This form is "old" according to GeneDB; we are going 
to keep it because it is used in our microarray data.

2.  LMJF.##.####  This form is the "official" new form according to 
GeneDB.  Thus we need it.

3.  LMJF_##_#### This form actually appears in UniProt records 
although it is not in our microarray data or GeneDB.  We are going to 
keep it because it's used by UniProt so somebody out there might be 
using it, too. (We just figured this out this morning.)

4.  "straggler IDs"  These are not of the above three forms, but are 
coming from UniProt.  They are synonyms of other genes that have one 
of the above three forms.  We are automatically capturing them with 
our code, so we are going to just keep them instead of trying to weed 
them out.  It's possible that somebody out there is using them since 
they are in UniProt.  Dondi, if you would construct a query to find 
out which UniProt records have these extra IDs, that would be 
great.  We just want it for our documentation.

In summary, we are going to have triplicates of all the IDs (forms
1, 2, and 3), plus some number of stragglers, in our
OrderedLocusNames table.
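
To make the triplicate idea concrete, here is a rough Python sketch of expanding any one of forms 1, 2, or 3 into all three. It is only an illustration, not the code we are actually running. The function name id_triplicate is made up, and it assumes that "##" is always two digits and "####" is always four digits, which is a literal reading of the forms above and may not hold for every ID.

```python
import re


def id_triplicate(gene_id, prefix="LmjF"):
    """Return (form 1, form 2, form 3) for an ID in any of the three forms.

    Returns None when the ID matches none of the forms, i.e. a "straggler".
    Assumption: "##" is two digits and "####" is four digits.
    """
    # Accept an optional separator after the prefix (form 1 has none) and
    # either "." or "_" between the two numeric parts, ignoring case.
    pattern = re.compile(
        rf"^{re.escape(prefix)}[._]?(\d{{2}})[._](\d{{4}})$",
        re.IGNORECASE,
    )
    match = pattern.match(gene_id.strip())
    if match is None:
        return None
    chrom, number = match.groups()
    upper = prefix.upper()
    return (
        f"{prefix}{chrom}.{number}",    # form 1: LmjF##.####
        f"{upper}.{chrom}.{number}",    # form 2: LMJF.##.####
        f"{upper}_{chrom}_{number}",    # form 3: LMJF_##_####
    )
```

Anything that comes back as None would be one of the stragglers, so the same check could help list them for the documentation. The prefix argument is there in case another species uses the same three forms with a different prefix.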

Let's make sure that we are capturing IDs from both the Ordered Locus 
tag and the ORF tag (I think that's what you've been doing, but I 
just wanted to make sure).  You might be converting IDs to one of
these forms; if so, could you let us know what conversions you are
doing so we can have that for our documentation?

Kevin is right now in the process of documenting ID forms for 
Leishmania infantum.  Right off the bat, it looks like we are going 
to have the same forms of 1, 2, and 3, except the prefix will be LinJ 
instead of LmjF.  Also, it looks like the IDs are appearing under the 
ORF tag instead of the OrderedLocus tag (an export with a generic 
profile turned up 0 OrderedLocus IDs), but we'd better capture from 
both tags, just in case.

Also, we are fast-tracking the creation of an L. infantum gdb because 
it turns out that the vast majority of IDs from the microarray data
are L. infantum IDs rather than L. major IDs.  Once we have an infantum
gdb, we can create the combined gdb.  The next step would be a 
further customization that relates the L. major IDs to the L. infantum IDs.

Finally, in the microarray data, each gene has duplicate spots 
(duplicate rows).  Ideally, we would like to average the log fold 
changes for these as technical replicates before doing the 
statistics.  I think I have a script in Matlab that Dr. Fitzpatrick 
wrote for me to split my yeast data.  I'm going to try it on the
Leishmania data to see if it works.  If so, we would not need an 
additional script to split the data.
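
For reference, the averaging step itself is small. Below is a minimal Python sketch of it, not the Matlab script mentioned above. The function name average_replicates and the (gene ID, log fold change) row layout are assumptions for illustration; the real export will have more columns.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(rows):
    """Average log fold changes over duplicate spots of the same gene.

    rows: iterable of (gene_id, log_fold_change) pairs, one per spot.
    Returns a dict mapping each gene_id to its mean log fold change.
    """
    grouped = defaultdict(list)
    for gene_id, log_fc in rows:
        grouped[gene_id].append(log_fc)
    # Each gene's duplicate spots are treated as technical replicates.
    return {gene_id: mean(values) for gene_id, values in grouped.items()}
```

The averaged values would then go into the statistics in place of the individual spot rows.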

Best,
Kam
