gusdev-gusdev Mailing List for Genomic Unified Schema Development (Page 52)


paul-  see in line

steve

Paul Mooney wrote:

>
> On 10 Dec 2004, at 12:52, Steve Fischer wrote:
>
>> paul-
>>
>> ok, i see.
>>
>> are there any other examples besides curation in which you have 
>> placed structured data in qualifiers?     are there examples of 
>> standard embl qualifiers in which you expect to find structured data 
>> and parse it?
>>
>
> After talking with Arnaud it seems we can take each 
> qualifier/structured field and create a new feature, with each one of 
> its qualifiers holding one piece of data. This would fit into your 
> mapping scheme.
>
ok.  great.  i was wondering about that.

so does that mean that we can expect that no qualifiers will contain 
structured data that needs to be parsed?
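
For concreteness, here is a rough Python sketch of the transformation described above, using the /GO value quoted further down the thread. It is not BioPerl or plugin code: the function name and the dict layout are made up for illustration, and it assumes fields are separated by ";" and each field contains a single "key=value" pair.

```python
def explode_structured_qualifier(name, value):
    """Turn one structured qualifier into a feature-like dict whose
    qualifiers each hold a single piece of data (hypothetical layout)."""
    qualifiers = {}
    for field in value.split(";"):
        field = field.strip()
        if not field:
            continue
        # Split on the first "=" only, so values such as "GO:0006810" survive.
        key, _, val = field.partition("=")
        qualifiers[key.strip()] = val.strip()
    return {"feature": name, "qualifiers": qualifiers}


go_value = ("aspect=process; GOid=GO:0006810; term=transport; "
            "evidence=ISS; db_xref=GOC:unpublished; with=SPTR:Q9UQ36; "
            "date=20001122")
go_feature = explode_structured_qualifier("GO", go_value)
```

With the data laid out this way, each qualifier of the new feature could be mapped on its own, with no further parsing of the value.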

>> in the case of curation, where do you put that info in GUS?
>
>
> It will probably end up as a note, for now at least.
>
>>
>> about systematic_ids, i understand what you've said.   one thing 
>> though.  how do they relate to gene names?
>
>
ok, but, what i'm driving at is that the unflattener uses gene name 
(/gene=) to decide what features go together in one gene model.   
really, it wouldn't matter what the value of the /gene= is, as long as 
it is identical for all features that belong to the gene.   is that 
consistent with your use of /gene?
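
To illustrate the point, a minimal Python sketch of grouping by a single qualifier follows. This is not the BioPerl unflattener, which uses gene name only as one clue among others; the feature dicts, the function name and the gene names "abc1" and "xyz2" are invented for the example.

```python
from collections import defaultdict


def group_by_qualifier(features, key="gene"):
    """Group features that share the same value for one qualifier.
    The value itself is irrelevant; it only has to be identical
    across all features of one gene model."""
    groups = defaultdict(list)
    for feat in features:
        value = feat["qualifiers"].get(key)
        if value is not None:
            groups[value].append(feat)
    return dict(groups)


features = [
    {"type": "gene", "qualifiers": {"gene": "abc1"}},
    {"type": "mRNA", "qualifiers": {"gene": "abc1"}},
    {"type": "CDS", "qualifiers": {"gene": "abc1"}},
    {"type": "CDS", "qualifiers": {"gene": "xyz2"}},
]
models = group_by_qualifier(features)
```

Features lacking the chosen qualifier are simply left ungrouped in this sketch.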

> They are the gene names :)
> Standard EMBL uses a /gene qualifier for the gene symbol and 
> /standard_name for the human readable name.
> During sequencing and annotation using a single /gene conveys no 
> meaning as to how stable/temporary the ID is.
>
>> steve
>>
>> Paul Mooney wrote:
>>
>>>
>>> On 9 Dec 2004, at 23:21, Steve Fischer wrote:
>>>
>>>> paul-
>>>>
>>>> let me start digesting this by email.
>>>>
>>>> about your extensions to EMBL.  the bioPerl model we are parsing 
>>>> into is based on generic features, tags and annotation.  as long as 
>>>> the extensions can be parsed into those objects we're half way 
>>>> there.   are the extensions syntactically consistent w/ standard 
>>>> embl files, but varying only in the particulars of what the data is 
>>>> called?
>>>
>>>
>>>
>>> We have additional qualifiers with values. The values hold 
>>> structured information (say key=value pairs).
>>> Bioperl will quite happily parse them into tags and values.
>>> What controls the mapping of a tag to a GUS object(s)?
>>> What parses the structured information out to populate the object(s) 
>>> and fill in the objects fields (which is another mapping)?
>>>
>>> Something like this non-EMBL standard entry, curation, has several 
>>> values in a fixed field format;
>>>
>>>     /curation="name; origin; date; permission; type; dbref; notes ..."
>>> i.e.
>>>     /curation="Matt Berriman; genedb; 20020128; public; comment"
>>>
>>> How do we specify where to put this in GUS? It's very PSU specific. 
>>> Perhaps some sort of hook with specifying some perl code elsewhere 
>>> to handle it?
>>> We currently store GO annotation in EMBL like this;
>>>
>>>     /GO="aspect=process; GOid=GO:0006810; term=transport; evidence=ISS; db_xref=GOC:unpublished; with=SPTR:Q9UQ36; date=20001122"
>>>
>>> as EMBL only has the format /db_xref="GO:00123" but I hope there is 
>>> a GO flat file loader so we don't have to worry about this in the 
>>> future.
>>>
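One way to picture the "hook" idea for the /curation example above is a table of per-qualifier handlers. The following is a Python sketch under stated assumptions, not part of any existing plugin: the field names come from the /curation template quoted above, while `CURATION_FIELDS`, `HANDLERS` and both function names are hypothetical, and any fields hidden behind the "..." in the template are not handled.

```python
# Field order taken from the /curation template in the message above.
CURATION_FIELDS = ("name", "origin", "date", "permission",
                   "type", "dbref", "notes")


def parse_curation(value):
    """Split a fixed-field /curation value into named fields.
    Trailing fields that are absent are simply omitted."""
    parts = [p.strip() for p in value.split(";")]
    return dict(zip(CURATION_FIELDS, parts))


# Hypothetical registry: qualifier name -> centre-specific handler.
HANDLERS = {"curation": parse_curation}


def handle_qualifier(name, value):
    """Run the registered handler, or pass the value through untouched."""
    handler = HANDLERS.get(name)
    return handler(value) if handler else value


curation = handle_qualifier(
    "curation", "Matt Berriman; genedb; 20020128; public; comment")
```

A centre could register its own handlers without touching the generic code, which is roughly what a hook would buy.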
>>>> about building the hierarchy.  if you looked at the bioperl api for 
>>>> the unflattener, you'd see that its unflattening uses gene name as 
>>>> a clue to deciding what features go together in a particular gene 
>>>> model.
>>>>
>>>> can gene name be relied upon to identify all the features that are 
>>>> associated with this gene?
>>>
>>>
>>>
>>> You can switch to use any qualifier you like to identify groups, but 
>>> you can only specify *one*.
>>> We can have 2 :)
>>> In the same sequence a gene may be identified by systematic_id.
>>> Another gene in the same sequence may be identified by 
>>> temporary_systematic_id.
>>> Eventually all genes will get a systematic_id but not straight away.
>>>
>>> In theory it should be easy to modify the unflattener to use a 'best 
>>> name first' policy.
>>>
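A 'best name first' policy could look roughly like the Python sketch below. It is an illustration only, not a patch to the BioPerl unflattener: the priority order (with /gene placed last) is an assumption, and the function name and the identifiers "ID_A", "TMP_B" and "abc1" are invented.

```python
# Assumed priority: the stable ID first, then the temporary one, then /gene.
NAME_PRIORITY = ("systematic_id", "temporary_systematic_id", "gene")


def best_name(qualifiers, priority=NAME_PRIORITY):
    """Return the value of the first qualifier in `priority` that is
    present, or None if the feature carries none of them."""
    for key in priority:
        if key in qualifiers:
            return qualifiers[key]
    return None


stable = best_name({"systematic_id": "ID_A", "gene": "abc1"})
temporary = best_name({"temporary_systematic_id": "TMP_B"})
```

The grouping step would then key on the value returned by `best_name` instead of on one fixed qualifier.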
>>> For TIGR XML you'd have PUB_LOCUS and LOCUS as the best names, in 
>>> that order. They too mix identifiers, but since the XML already has 
>>> a hierarchy you might get away with it?
>>>
>>>
>>>> finally, about the GO stuff, yes, we can probably reuse your code.
>>>>
>>>> steve
>>>>
>>>>
>>>> Paul Mooney wrote:
>>>>
>>>>>
>>>>> On 9 Dec 2004, at 19:31, Steve Fischer wrote:
>>>>>
>>>>>> paul-
>>>>>>
>>>>>> hey.  do you want to set up a time to chat so i can catch you up 
>>>>>> on what we have in mind?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At the moment I'm curious how much can be achieved via a generic 
>>>>> plugin. I think the plugin will need plugins to do specialised 
>>>>> parts :) However I'd be glad to give my assistance to the effort. 
>>>>> Below are my random thoughts I've just had on the matter;
>>>>>
>>>>>
>>>>> Here at the PSU we store an awful lot of info that cannot be 
>>>>> stored in a standard EMBL file, hence we have extended it to fit 
>>>>> our own needs. As an example we use several name qualifiers for 
>>>>> genes;
>>>>>
>>>>>     . systematic_id           - the name cast in stone
>>>>>     . temporary_systematic_id - the name as it is currently known
>>>>>     . previous_systematic_id  - as it was known
>>>>>     . gene                    - EMBL standard qualifier
>>>>>
>>>>> Hence just trying to unflatten the EMBL file is tricky because 
>>>>> systematic_ids and temporary_systematic_ids are mixed in the same 
>>>>> sequence, so building the hierarchy would need specialised 
>>>>> code. TIGR XML has the same issue though, so maybe it's not too 
>>>>> specialised after all :/ (PUB_LOCUS and LOCUS have a direct mapping 
>>>>> to systematic_id and temporary_systematic_id).
>>>>>
>>>>> Something like this entry;
>>>>>     /curation="name; origin; date; permission; type; dbref; notes ..."
>>>>> i.e.
>>>>>     /curation="Matt Berriman; genedb; 20020128; public; comment"
>>>>> is unique to the PSU and I'm not sure where it fits in GUS.
>>>>>
>>>>> However;
>>>>>
>>>>> I have code that creates GO entries - supply a high level function 
>>>>> with all the standard GO fields and it creates the 5 rows (?) in 
>>>>> the different tables as required. This is definitely something 
>>>>> that can be shared across centres, perhaps in a code library. All 
>>>>> your code has to do is parse out the GO fields from the data. No 
>>>>> reason why it couldn't accept a GO Bioperl object (I presume one 
>>>>> exists).
>>>>>
>>>>> Perhaps the parsing needs to be a super class for each data source 
>>>>> and then sub-classed by each centre?
>>>>>
>>>>> Ok, enough ramblings. Does any of this make sense?
>>>>> Paul.
>>>>>
>>>>>> steve
>>>>>>
>>>>>> Chris Stoeckert wrote:
>>>>>>
>>>>>>> Hi Steve,
>>>>>>> Thanks for putting this out on gusdev. Marie-Adele indicated 
>>>>>>> that Paul Mooney was very interested in this and I will likely 
>>>>>>> meet with him about this when I visit in January. Please include 
>>>>>>> him in email correspondence when not addressed to the general 
>>>>>>> gusdev list.
>>>>>>> Thanks,
>>>>>>> Chris
>>>>>>>
>>>>>>> On Dec 9, 2004, at 2:11 PM, Steve Fischer wrote:
>>>>>>>
>>>>>>>> folks-
>>>>>>>>
>>>>>>>> the UGA folks and CBIL folks have started collaborating on a 
>>>>>>>> new plugin called LoadAnnotatedSeqs.   It will use BioPerl to 
>>>>>>>> parse the input data.
>>>>>>>>
>>>>>>>> We expect it to take annotated sequences (NA at first) in 
>>>>>>>> genbank, tigr xml and embl formats (plus any others supported 
>>>>>>>> by the bioPerl parser).
>>>>>>>>
>>>>>>>> It will take an XML file that describes the mapping from the 
>>>>>>>> input features to GUS features, and SO features.
>>>>>>>> It will also hard code special cases to handle qualifier data 
>>>>>>>> that is distributed to tables outside of the NAFeature tables.
>>>>>>>>
>>>>>>>> For our projects we will be developing a mapping that unifies 
>>>>>>>> the semantics of the data we are getting from our different 
>>>>>>>> sources and formats.
>>>>>>>> (we plan to work with the PSU folks to incorporate the 
>>>>>>>> knowledge they have acquired in their work to make an EMBL parser)
>>>>>>>>
>>>>>>>> ideas and suggestions are encouraged.
>>>>>>>>
>>>>>>>> steve
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>



gusdev-gusdev — Topics concerning GUS development