Dear MIRIAM-team,
several leading metabolomics groups (Halle/DE, UC Davis/US, EAWAG/CH, ...) are proposing a "key" for spectra, the SPLASH, which is currently implemented for mass spectra. Key metabolite spectral databases (HMDB and MassBank among them) have adopted or are adopting the SPLASH; the database in the example link is one of them, and there are several others. I will add further resources that use this identifier as comments (or by email). MetaboLights is an example of a resource at the EMBL-EBI that is looking into adopting it on their main website.
Information, source code, etc. can be found on these websites: http://splash.fiehnlab.ucdavis.edu/, https://github.com/berlinguyinca/spectra-hash
Having the SPLASH in MIRIAM will help us get the community moving towards making mass spectral data more interoperable. Please email me if you have questions.
Greetings,
Egon
The example URL is for the MoNA resource, which has these metadata:
Description: MassBank of North America (MoNA)
Access URL: http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/$1
Institution: UC Davis
Website: http://mona.fiehnlab.ucdavis.edu/
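For readers unfamiliar with the "$1" placeholder in the Access URL above, here is a minimal sketch of how such a template could be resolved to a concrete link. The function name `resolve_access_url` and the identifier in the usage line are hypothetical and only for illustration; the identifier is not a real spectrum hash.

```python
# Access URL template as given in the request; "$1" marks where the identifier goes.
ACCESS_URL_TEMPLATE = "http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/$1"


def resolve_access_url(identifier: str, template: str = ACCESS_URL_TEMPLATE) -> str:
    """Substitute the identifier for the "$1" placeholder (hypothetical helper)."""
    return template.replace("$1", identifier)


# Made-up identifier, shaped like a SPLASH but not taken from any database.
print(resolve_access_url("splash10-0000000000-00000000000000000000"))
```

The substitution is plain string replacement; any escaping a resolver might apply is left out of this sketch.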
Dear MIRIAM team, it seems that the other resources that are adopting the SPLASH do not have "Access URLs" yet. I will add these as soon as I get them. The request for entering into MIRIAM still stands: it would really help us.
Hi Egon,
I've been trying to find a bit more information on how the hash is generated, and what form the final identifier takes (does it always begin with "splash10"? will that ever change? how many dash-delimited alphanumeric strings are there?). If you have any more information, that would be very useful,
thanks for the request!
cheers
Nick
Yes, "splash" is fixed now (to allow recognition, like "InChI=")... the next digit is for the spectral type (1=MS), then a version number, and then the spectral info in the two parts that follow. I just discussed this, and the following regular expression is the most appropriate for now and fairly future-proof:
^splash\d.-[0-9A-z]+-[0-9A-z]+$
(it's case insensitive, but not sure how to encode that in a regular expression...)
Any further info you need? I already had the feeling the form did not have all the fields...
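On the case-insensitivity question above: many regex engines take this as a flag rather than as part of the pattern. A minimal sketch in Python, using the regular expression exactly as posted; whether the registry's own engine supports such a flag is an assumption not covered in this thread, and the test strings are made up.

```python
import re

# Pattern exactly as posted in the thread; re.IGNORECASE makes matching
# case-insensitive without changing the pattern itself.
SPLASH_PATTERN = re.compile(r"^splash\d.-[0-9A-z]+-[0-9A-z]+$", re.IGNORECASE)


def looks_like_splash(candidate: str) -> bool:
    """Return True if the whole string matches the posted pattern."""
    return SPLASH_PATTERN.fullmatch(candidate) is not None


# Hypothetical identifiers, not real spectrum hashes.
print(looks_like_splash("splash10-0000000000-00000000000000000000"))
print(looks_like_splash("SPLASH10-0000000000-00000000000000000000"))
print(looks_like_splash("splash-abc-def"))  # no type digit after "splash"
```

In engines that accept inline flags (Python among them), prefixing the pattern with `(?i)` has the same effect as passing the flag.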
Hi Egon, thanks for the explanation. That makes sense.
(I assume the \d. should be a \d+ in the regex.)
And yes, the form is not exhaustive. We thought that asking users to complete long forms would likely be off-putting. To be honest, for most well-documented or established sources, it's normally quite easy to distill an accurate regex. Of course, that's more difficult for newer or more complex efforts and identifiers such as this one.
Again, thanks for your help. Feel free to let me know if anything here is suboptimal and I will update the record if needed: http://identifiers.org/splash/
cheers
Nick
Actually, the \d. was correct :)
The first of the two characters is a digit, but the second can be a digit or an alphabetical character; together they are always exactly two characters.
Ahh ok, no worries.
Is the alphabetical character lower case, or can it be either?
I'd really rather avoid a '.', as that matches almost anything.
I will ask, but given that the rest is case insensitive, I suggest [0-9A-z] then.
Changed the regex, and completed as above.
cheers
Nick
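A note on the character class suggested above: in ASCII, the range A-z also covers the six punctuation characters that sit between "Z" and "a". The sketch below shows this, with a stricter variant for comparison. Both patterns are assumptions reconstructed from this thread (the "." replaced by the suggested class); they are not copied from the registry record, and the test identifiers are made up.

```python
import re

# Assumed pattern after the change discussed in the thread.
LOOSE = re.compile(r"^splash\d[0-9A-z]-[0-9A-z]+-[0-9A-z]+$")

# Stricter variant (an assumption, not the registry's pattern): letters and
# digits only, with case-insensitivity handled by a flag.
STRICT = re.compile(
    r"^splash\d[0-9a-z]-[0-9a-z]+-[0-9a-z]+$",
    re.IGNORECASE | re.ASCII,
)


def extra_chars_in_upper_a_to_lower_z() -> list:
    """List the non-alphanumeric characters that fall inside the A-z range."""
    return [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalnum()]


print(extra_chars_in_upper_a_to_lower_z())  # ['[', '\\', ']', '^', '_', '`']

# Hypothetical identifier containing an underscore.
with_underscore = "splash10-abc_123-def456"
print(LOOSE.fullmatch(with_underscore) is not None)   # True
print(STRICT.fullmatch(with_underscore) is not None)  # False
```

Whether this matters in practice depends on whether such characters can ever appear in a SPLASH, which is not settled in this thread.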