
#42 De-colonization of Sarracenia

Milestone: Sarra Beta
Status: closed
Owner: nobody
Labels: None
Priority: 7
Updated: 2018-06-02
Created: 2016-06-21
Creator: psilva
Private: No

We have populated a tree for use by sarracenia tools, but we are still using the file names from sundew.
Those file names carry a set of colon-separated fields, called an 'extension' in sundew parlance. The extension is used for some routing, so those who route still need it. Getting rid of colons is a design goal:

-- colons in file names are very bad on windows (hint: drive letter separator)
-- many comments received from potential adopters along the lines of 'can I get rid of the colon?'
-- many users deterred from using sundew in the first place by the requirement to create 'PDS names'
-- burying the file extension in the middle of the name (before the sundew suffix) makes it invisible
to many programs that process the files. Examples: any program receiving the data, but also apache itself.

Avoiding use of the sundew extensions as part of the file name is an important improvement
that is part of transition to sarracenia.

Suggestion #1: sundew_ext AMQP header, put the colons in there.

modify the notify.py scripts used in sundew as follows:

hdr['sundew_ext']=<the colon separated fields of the file name>

this should create an AMQP header in the messages called 'sundew_ext'
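
As a rough illustration of the header assignment above, the sketch below splits a sundew-style name at its first colon and keeps the remainder as `sundew_ext`. The helper name `split_sundew_name`, the sample file name, and the plain dict standing in for the message headers are all assumptions made for illustration; this is not the actual notify.py code.

```python
def split_sundew_name(filename):
    """Return (base_name, extension) for a sundew-style file name.

    Everything after the first colon is treated as the extension;
    a name without a colon gets an empty extension.
    """
    base, _, ext = filename.partition(":")
    return base, ext


# Hypothetical sundew-style name, for illustration only.
name = "SACN31_CWAO_021200:SA:UKMET-BACKUP:3:20160621120000"

hdr = {}  # stands in for the AMQP headers of the notification
base, hdr["sundew_ext"] = split_sundew_name(name)
```

With this split, the colon-free `base` can serve as the file name while the colon-separated fields travel only in the header.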

then it is easy to write in a config file:

msg_sundew_ext_match .UKMET-BACKUP.
on_msg msg_sundew_ext

so msg_sundew_ext is a plugin that compiles the regex and compares it to the header.
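
A minimal sketch of that idea, assuming the plugin sees the headers as a dict. The class name `SundewExtMatch` and its `on_message` signature are hypothetical and do not reproduce the real sarracenia plugin interface; the header values are made up.

```python
import re


class SundewExtMatch:
    """Hypothetical matcher: keep a message when its sundew_ext header matches."""

    def __init__(self, pattern):
        # Compile once, when the configuration is read.
        self.regex = re.compile(pattern)

    def on_message(self, headers):
        # Return True to keep the message, False to drop it.
        ext = headers.get("sundew_ext", "")
        return self.regex.search(ext) is not None


matcher = SundewExtMatch(r".UKMET-BACKUP.")
keep = matcher.on_message({"sundew_ext": "SA:UKMET-BACKUP:3:20160621120000"})
skip = matcher.on_message({"sundew_ext": "SA:OTHER:3:20160621120000"})
```

A message that carries no `sundew_ext` header at all is dropped in this sketch; whether that is the right default is a design choice left open here.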

a receiver that wants the extension fields back in the file name can add them using
an on_file plugin, which we could also create... optionally, extended attributes could be used
for that, so it would work on windows also.
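
The renaming step such an on_file plugin would perform could look roughly like this. The function name `restore_sundew_name` is hypothetical, and the sketch only computes the new name; the actual rename and the plugin wiring are left out.

```python
def restore_sundew_name(filename, headers):
    """Append the sundew_ext header back onto a received file name."""
    ext = headers.get("sundew_ext")
    if not ext:
        return filename  # no extension in the headers: leave the name alone
    return filename + ":" + ext


# Made-up values, for illustration only.
restored = restore_sundew_name(
    "SACN31_CWAO_021200",
    {"sundew_ext": "SA:UKMET-BACKUP:3:20160621120000"},
)
```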

Suggestion #2: a bunch of sundew_ext headers...

instead of one header, name each of the fields, and get rid of the colons entirely

say we name the fields:
sundew_type:sundew_circuit:sundew_pri:sundew_rxstamp:

is the txstamp on the names in some cases?

sundew_type
sundew_circuit
sundew_pri
sundew_rxstamp
sundew_txstamp

then have a plugin:

msg_sundew_pri 5
msg_sundew_circuit UKMET-BACKUP
on_message msg_sundew_ext

So instead of using regexes on the whole extension, you can just say the value of the field of interest.
Usually there is only one field of interest.

One might just match the specific value in the extension, or could use regexes for matching each value also.
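
A sketch of per-field matching under these assumptions: the extension is laid out in the order of the field names listed above, and each configured value is treated as a regex that must match the whole field (so a literal value works too). The function names and sample values are hypothetical.

```python
import re

# Field order taken from the list above; treating it as the layout of
# the extension is an assumption for this sketch.
FIELDS = ("sundew_type", "sundew_circuit", "sundew_pri",
          "sundew_rxstamp", "sundew_txstamp")


def extension_to_headers(ext):
    """Split a colon-separated extension into one header per named field."""
    return dict(zip(FIELDS, ext.split(":")))


def fields_match(headers, wanted):
    """True when every wanted field fully matches its regex or literal value."""
    return all(
        re.fullmatch(pattern, headers.get(field, "")) is not None
        for field, pattern in wanted.items()
    )


headers = extension_to_headers("SA:UKMET-BACKUP:3:20160621120000")
wanted = {"sundew_circuit": "UKMET-BACKUP"}
matched = fields_match(headers, wanted)
```

Because `zip` stops at the shorter input, an extension without a txstamp simply produces no `sundew_txstamp` header.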

Suggestion #3:

replace the colon with § ... this Unicode character is not a colon, so it does not suffer from the problems of colons; on the other hand, the file name extensions are still buried.

Discussion

  • psilva

    psilva - 2016-06-22

    A drawback shared by all of the suggestions that move the colon-separated fields into headers: if the file is on disk and one doesn't have the header, the extension can't be recovered, which matters for re-transmission and such.
    It might be a good idea to save the extension in an extended attribute, but I dunno if we can do that in a sender, which is the source of everything from metpx... might be able to use a plugin in a shovel.

     
  • psilva

    psilva - 2016-06-24

    to deal with the issue above, we can store the header in file extended attributes. so clients that write
    files would just need to persist the header to the file in the form of an extended attribute, and post/watch etc. could read it when needed. this means using the pyxattr library, which is already in the 16.04 repos, and this feature is likely to be used by user-written plugins anyway.

    so option 1 (single variable) is likely the most compatible (doesn't split anything up.)
    add extended attributes (probably not initially required, but eventually good to have)
    problem... have to worry about namespace for attribute.
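
A rough sketch of persisting the header as an extended attribute. It uses the standard library's `os.setxattr` / `os.getxattr` (available on Linux only) rather than pyxattr, and the attribute name `user.sundew_ext` is a hypothetical choice: on Linux, unprivileged processes generally have to stay within the `user.` namespace, which is one answer to the namespace concern above.

```python
import os

# "user." is the namespace Linux lets unprivileged processes write to;
# the attribute name itself is a hypothetical choice.
XATTR_NAME = "user.sundew_ext"


def save_ext(path, ext):
    """Try to persist the extension on the file; return True on success."""
    if not hasattr(os, "setxattr"):  # os.setxattr exists on Linux only
        return False
    try:
        os.setxattr(path, XATTR_NAME, ext.encode("utf-8"))
        return True
    except OSError:  # e.g. file system without extended attribute support
        return False


def load_ext(path):
    """Return the stored extension, or None when it is unavailable."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, XATTR_NAME).decode("utf-8")
    except OSError:
        return None
```

Both helpers fail softly, so a writer can call `save_ext` unconditionally and a reader can fall back to the message header when `load_ext` returns None.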

     
  • psilva

    psilva - 2016-06-27

    new driver: extensions make the file names longer, and we are having issues with path names exceeding 255 characters...

     
  • psilva

    psilva - 2016-10-05

    Jun has deployed sundew_extension on all the sundew scripts. Now need to build reference to those options into the sender.

     
  • psilva

    psilva - 2016-10-05

    Seeing lots of cases of file names being too long. Bug#42 fix truncates the strings used in headers, which at least lets the data flow, but getting shorter names, by ditching the colons, would be a good thing.
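
    For illustration, truncating a header value so that it fits a byte limit might look like the sketch below. The 255-byte limit, the constant and the function name are assumptions for this example; this is not the code of the actual fix.

```python
MAX_HEADER_BYTES = 255  # assumed limit for this sketch


def truncate_header(value, limit=MAX_HEADER_BYTES):
    """Shorten value so that its UTF-8 encoding fits in limit bytes."""
    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return value
    # errors="ignore" drops a multi-byte character cut in half at the end.
    return raw[:limit].decode("utf-8", errors="ignore")
```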

     
  • psilva

    psilva - 2016-11-11

    just committed 776b41a5513c4fac935cdc1aaedca9743464ef8b which adds code to include sundew_extension in sr_subscribe accept/reject, so no longer need the full long file names to have patterns that filter by extension.

     
  • psilva

    psilva - 2016-12-03

    next step is to start specifying WHATFN on some feeds to confirm that all the routing works as before even though the colons are no longer in the file names. first choice would be NSWOB.

    downstream users that want the colons back just add:

    filename NONE

    I think.

    note: retransmission still not addressed, but not sure it is a real problem.

     
  • psilva

    psilva - 2018-06-02
    • status: open --> closed
     