If we want to make RDML files compatible with public repositories that may arise in the future, it could be a good idea to add some kind of unique RDML ID. This ID could be composed of an instance/publisher/issuer (I don't know the appropriate English term) and a serial number, the combination of which should be unique. The publisher could be something like 'RDML consortium', NCBI, or ProgramX-MAC address.
In the future we could even add some kind of hash code or key to allow evaluation of the validity and integrity of the RDML file. This would be an additional benefit of RDML for organisations working according to the FDA's 21 CFR Part 11 regulations.
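To make the two ideas above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RdmlId` type, the helper names, and the choice of SHA-256 are not part of RDML. It keeps publisher and serial number as separate fields, so only the pair has to be unique, and it computes a digest over the raw file bytes that can be compared later.

```python
import hashlib
from typing import NamedTuple


class RdmlId(NamedTuple):
    """Hypothetical compound ID: only the (publisher, serial) pair must be unique."""
    publisher: str
    serial: str


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected_digest: str) -> bool:
    """Check the file bytes against a digest that was recorded earlier."""
    return file_digest(data) == expected_digest


# Two IDs with the same serial but different publishers are distinct.
id_a = RdmlId("NCBI", "42")
id_b = RdmlId("RDML consortium", "42")

# Made-up file content, used only to show the integrity check.
original = b"<rdml><version>1.0</version></rdml>"
digest = file_digest(original)
tampered = original.replace(b"1.0", b"1.1")

print(is_unchanged(original, digest))
print(is_unchanged(tampered, digest))
```

Note that a bare digest only detects that the bytes changed; it says nothing about who produced the file. It would also have to be stored outside the data it covers, since a digest cannot include itself.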
Ok, sounds good. I agree. Can you make an example?
It would look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Unique RDML ID: a publisher plus an ID that is unique for that publisher -->
  <xs:complexType name="rdml_id_type">
    <xs:sequence>
      <xs:element name="publisher" type="xs:string"/>
      <xs:element name="id" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="rdml">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="version" type="xs:string"/>
        <!-- experiments_folder_type and third_party_extensions_type are not defined in this excerpt -->
        <xs:element name="experiments_folder" type="experiments_folder_type"/>
        <xs:element name="third_party_extensions" type="third_party_extensions_type" minOccurs="0"/>
        <!-- At least one ID is required; a file may carry IDs from several publishers -->
        <xs:element name="rdml_id" type="rdml_id_type" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
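For anyone who wants to see what a file following this schema might contain, here is a rough Python sketch that builds one with the standard library. The `build_rdml` helper, the version string and the sample ID are made up for illustration. The `experiments_folder` element is left empty because its type is not shown in this thread.

```python
import xml.etree.ElementTree as ET


def build_rdml(version, ids):
    """Build an <rdml> element with children in the order the schema's sequence gives."""
    root = ET.Element("rdml")
    ET.SubElement(root, "version").text = version
    # Required by the schema; its content model is not shown here, so it stays empty.
    ET.SubElement(root, "experiments_folder")
    # third_party_extensions is optional (minOccurs="0") and is omitted.
    for publisher, serial in ids:
        rdml_id = ET.SubElement(root, "rdml_id")
        ET.SubElement(rdml_id, "publisher").text = publisher
        ET.SubElement(rdml_id, "id").text = serial
    return root


# Hypothetical values, chosen only to show the structure.
root = build_rdml("1.0", [("RDML consortium", "000123")])
print(ET.tostring(root, encoding="unicode"))
```

This only shows the element layout. It does not validate against the XSD, which would need a schema-aware library.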
Yeah, an ID is a crucial thing, especially as people are bound to have bags full of instances of this format lying around if it goes well.
For example, the PRIDE database at EBI (proteomics) takes XML submissions, but they must have that unique ID. The discussion of how to achieve uniqueness is ongoing, iirc (I think PRIDE still does it itself at the moment), but the intent is to set up an authority to dish them out (basically a server that keeps track of assigned accessions and can provide unassigned ones singly or as a block [think curation]).
(For reference: http://www.ebi.ac.uk/pride/)
Cheers, Chris.
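The kind of authority described above (something that tracks assigned accessions and hands out unused ones singly or as a block) could be sketched roughly as below. This is an in-memory toy, not how PRIDE does it: the class name, the zero-padded six-digit format and the starting number are all assumptions.

```python
from typing import List


class AccessionAuthority:
    """In-memory stand-in for a server that hands out unused accessions."""

    def __init__(self, publisher: str, start: int = 1) -> None:
        self.publisher = publisher
        self._next = start  # lowest serial number not yet assigned

    def assign_block(self, count: int) -> List[str]:
        """Reserve `count` consecutive accessions and return them."""
        if count < 1:
            raise ValueError("count must be at least 1")
        block = [f"{self._next + i:06d}" for i in range(count)]
        self._next += count
        return block

    def assign(self) -> str:
        """Reserve and return a single accession."""
        return self.assign_block(1)[0]


authority = AccessionAuthority("RDML consortium")
single = authority.assign()
block = authority.assign_block(3)  # e.g. for a curator preparing several files
print(single, block)
```

A real service would also need persistent storage and some locking so that two clients can never be given the same accession.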