Thread: [XMLPipeDB-developer] Leishmania IDs
From: Kam D. <kda...@lm...> - 2014-02-25 19:41:51
Hi Dondi,

Here's a rundown of what we talked about yesterday with regard to the Leishmania major IDs, with some explanations:

1. LmjF##.####
   This form is "old" according to GeneDB; we are going to keep it because it is used in our microarray data.

2. LMJF.##.####
   This form is the "official" new form according to GeneDB. Thus we need it.

3. LMJF_##_####
   This form actually appears in UniProt records, although it is not in our microarray data or GeneDB. We are going to keep it because it's used by UniProt, so somebody out there might be using it, too. (We just figured this out this morning.)

4. "straggler IDs"
   These are not of the above three forms, but are coming from UniProt. They are synonyms of other genes that have one of the above three forms. We are automatically capturing them with our code, so we are going to just keep them instead of trying to weed them out. It's possible that somebody out there is using them since they are in UniProt. Dondi, if you would construct a query to find out which UniProt records have these extra IDs, that would be great. We just want it for our documentation.

In summary, we are going to have triplicates of all the IDs with forms 1, 2, and 3, plus some extra number of stragglers, in our OrderedLocusNames table.

Let's make sure that we are capturing IDs from both the Ordered Locus tag and the ORF tag (I think that's what you've been doing, but I just wanted to make sure). You might be converting IDs to one of these forms; if so, could you let us know what conversions you are doing so we can have that for our documentation?

Kevin is currently documenting ID forms for Leishmania infantum. Right off the bat, it looks like we are going to have the same forms 1, 2, and 3, except the prefix will be LinJ instead of LmjF. Also, it looks like the IDs are appearing under the ORF tag instead of the OrderedLocus tag (an export with a generic profile turned up 0 OrderedLocus IDs), but we'd better capture from both tags, just in case.
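As an illustration of the three ID forms listed above, here is a minimal Python sketch that recognizes an ID in any of the three forms and returns the full triplicate. It is not the XMLPipeDB code: the function name `id_forms` is hypothetical, each `#` is assumed to stand for exactly one digit, the prefix for forms 2 and 3 is assumed to be the upper-cased form 1 prefix (so LINJ for L. infantum), and the sample IDs are made up for the example. Anything that matches none of the forms is treated as a straggler.

```python
import re


def id_forms(identifier, old_prefix="LmjF"):
    """Return (form 1, form 2, form 3) for a recognized ID, else None.

    Assumptions (not confirmed by the thread): "##" is two digits,
    "####" is four digits, and forms 2 and 3 use the upper-cased prefix.
    """
    new_prefix = old_prefix.upper()
    patterns = (
        rf"{re.escape(old_prefix)}(\d{{2}})\.(\d{{4}})",    # form 1: LmjF##.####
        rf"{re.escape(new_prefix)}\.(\d{{2}})\.(\d{{4}})",  # form 2: LMJF.##.####
        rf"{re.escape(new_prefix)}_(\d{{2}})_(\d{{4}})",    # form 3: LMJF_##_####
    )
    for pattern in patterns:
        match = re.fullmatch(pattern, identifier)
        if match:
            first, second = match.groups()
            return (
                f"{old_prefix}{first}.{second}",
                f"{new_prefix}.{first}.{second}",
                f"{new_prefix}_{first}_{second}",
            )
    # Not one of the three forms: a "straggler" in the sense of item 4.
    return None
```

Calling `id_forms("LinJ01.0010", old_prefix="LinJ")` would apply the same logic to the L. infantum prefix, under the same assumptions.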
Also, we are fast-tracking the creation of an L. infantum gdb because it turns out that the vast majority of IDs from the microarray data are for L. infantum rather than L. major. Once we have an infantum gdb, we can create the combined gdb. The next step would be a further customization that relates the L. major IDs to the L. infantum IDs.

Finally, in the microarray data, each gene has duplicate spots (duplicate rows). Ideally, we would like to average the log fold changes for these as technical replicates before doing the statistics. I think I have a script in Matlab that Dr. Fitzpatrick wrote for me to split my yeast data. I'm going to try it on the Leishmania data to see if it works. If so, we would not need an additional script to split the data.

Best,
Kam
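The averaging of duplicate spots described above can be sketched in a few lines of Python. This is only an illustration, not the Matlab script mentioned in the message: the function name `average_duplicate_spots` and the `(gene_id, log_fold_change)` row layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def average_duplicate_spots(rows):
    """Average log fold changes over rows that share a gene ID.

    `rows` is an iterable of (gene_id, log_fold_change) pairs; the
    layout is hypothetical and stands in for the real microarray file.
    """
    grouped = defaultdict(list)
    for gene_id, log_fold_change in rows:
        grouped[gene_id].append(log_fold_change)
    # One averaged value per gene, treating duplicates as technical replicates.
    return {gene_id: mean(values) for gene_id, values in grouped.items()}
```

A gene that appears only once simply keeps its single value, so the same function works whether or not every gene is spotted in duplicate.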
From: John D. N. D. <do...@lm...> - 2014-02-25 20:24:05
Hi Kam,

Sounds good, thanks for the updates and details. I'll put the queries together, then work on new versions of the profiles (a new profile in the case of infantum).

I had ended my office hours a few minutes early due to an 11am meeting in UHall. But that's over now :)

John David N. Dionisio, PhD
Associate Professor, Computer Science
Associate Director, University Honors Program
Loyola Marymount University