From: Klaus Z. <kla...@fm...> - 2010-05-02 13:02:44
Hello all,
On Friday, 30.04.2010, at 21:40 +0200, Rolf Wuerdemann wrote:
> On Thu, 29 Apr 2010 07:34:09 +0200, "Dr. Andreas W. Liehr"
> <ob...@fm...> wrote:
> > There is no doubt that the quantities module should belong to the
> > FMF package. But what about DataContainers? Pyphant as such does
> > not place any restrictions on the data objects to be transferred
> > between the workers, but on the other hand it uses DataContainers
> > heavily for visualisation and the Knowledge Cloud.
> >
> > Of course an FMF module could return a dictionary of dictionaries
> > containing the metadata and a set of numpy arrays containing the
> > data columns.
[...]
> Klaus also proposed the very good idea of using a SAX-like
> approach where we generate events to help people parse the files.
> I would suggest giving people both possibilities: "in memory"
> (dict type) and "event driven" (SAX type).
To formalize my proposal a little more:
Say someone wanted to read in the simple fmf file
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
[*reference]
author: Max Mustermann
place: Musterstadt
[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)
[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
She would give it to the parser. The parser would start forming tokens
in a stream-like model. It would then call, in order:
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
start_section("*reference")
item("author", "Max Mustermann")
item("place", "Musterstadt")
start_data_definition("")
column("wavelength", "\lambda", Quantity("1 nm"))
column("absorption", "A", 1)
start_data("")
cell(500)
cell(.2)
new_row()
cell(510)
cell(.4)
.
.
.
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
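A minimal Python sketch of such an event-driven parser, assuming a
handler object whose method names follow the calls listed above. The
names parse_fmf and Recorder are hypothetical and not part of any
existing FMF library. For brevity, column() receives the raw definition
string instead of a separate symbol and Quantity, and every cell is
converted with float().

```python
import re

SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]$")


def parse_fmf(lines, handler):
    """Walk the lines once and report everything found via handler calls."""
    mode = None
    first_row = True
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SECTION.match(line)
        if match:
            name = match.group("name")
            if name == "*data definition":
                mode = "definition"
                handler.start_data_definition()
            elif name == "*data":
                mode = "data"
                first_row = True
                handler.start_data()
            else:
                mode = "items"
                handler.start_section(name)
        elif mode == "data":
            # new_row() separates rows, as in the call sequence above.
            if not first_row:
                handler.new_row()
            first_row = False
            for token in line.split():
                handler.cell(float(token))
        else:
            key, _, value = line.partition(":")
            if mode == "definition":
                handler.column(key.strip(), value.strip())
            else:
                handler.item(key.strip(), value.strip())


class Recorder:
    """Hypothetical handler that simply records every event it receives."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # Any unknown method name becomes an event of that name.
        def record(*args):
            self.events.append((name,) + args)
        return record


SAMPLE = r"""[*reference]
author: Max Mustermann
place: Musterstadt
[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)
[*data]
500 .2
510 .4
"""

recorder = Recorder()
parse_fmf(SAMPLE.splitlines(), recorder)
for event in recorder.events:
    print(event)
```

The parser itself never builds a dictionary or an array; whatever
structure is wanted lives entirely in the handler.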
This model, while at first glance perhaps a bit complicated, has some
intriguing features:
1) No dependence on any data structure.
While there are associative arrays in many languages, like dicts in
Python, in equally many languages these are available only through
non-standard libraries or not at all. Moreover, arbitrary data
structures may be used to store the information retrieved by the parser
without the need to copy data in memory.
2) Applicability to many languages.
While data structures may vary, many languages have the concept of
procedures and strings.
3) I/O Symmetry.
Want to write an fmf file? Just call the same sequence on the writer!
This also makes it easy to use as a network streaming protocol.
4) Extensibility.
The functions do not have to be hard-coded. Even the names can be
allowed to vary. Just provide the parser with a list of function
pointers. If you want to add a new item type, provide a dispatch
function. A set of item types could be comfortably managed with the use
of regular expressions, for example.
5) Back and forward compatibility.
If one of the above-mentioned mechanisms is exploited to provide an
extension, this would not break compatibility on a lower level. At the
very least every parser would still be capable of extracting pairs of
strings, equating to level 0.
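To illustrate the I/O symmetry of point 3, here is a rough Python sketch
of a writer that accepts the same sequence of calls. FMFWriter is a
hypothetical name, column() takes the raw definition string as a
simplification, and the added close() call, which flushes the last row,
is an assumption that is not part of the call sequence above.

```python
import io


class FMFWriter:
    """Hypothetical writer exposing the same calls the parser would emit."""

    def __init__(self, stream):
        self.stream = stream
        self.row = []

    def start_section(self, name):
        self._flush_row()
        self.stream.write(f"[{name}]\n")

    def item(self, key, value):
        self.stream.write(f"{key}: {value}\n")

    def start_data_definition(self):
        self.start_section("*data definition")

    def column(self, name, definition):
        self.item(name, definition)

    def start_data(self):
        self.start_section("*data")

    def cell(self, value):
        # Cells are buffered until the row is complete.
        self.row.append(format(value, "g"))

    def new_row(self):
        self._flush_row()

    def close(self):
        self._flush_row()

    def _flush_row(self):
        if self.row:
            self.stream.write(" ".join(self.row) + "\n")
            self.row = []


out = io.StringIO()
writer = FMFWriter(out)
writer.start_section("*reference")
writer.item("author", "Max Mustermann")
writer.start_data_definition()
writer.column("wavelength", r"\lambda [nm]")
writer.column("absorption", r"A(\lambda)")
writer.start_data()
writer.cell(500)
writer.cell(0.2)
writer.new_row()
writer.cell(510)
writer.cell(0.4)
writer.close()
text = out.getvalue()
print(text)
```

Because the stream can be any object with a write() method, the same
writer could in principle target a file or a socket wrapper.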
How does all of this relate to the opening question about
DataContainers? Such a parser, if available in an fmf library, could
still do most of the difficult work, such as converting the various
entries to appropriate objects in the respective language. It would,
however, have a greatly reduced set of dependencies, which further
simplifies its use, and it would provide a powerful means for
constructing application-specific front ends, be it a VBA macro that
feeds the information into an Excel sheet or an embedded C library that
configures a microcontroller on the fly without ever storing the
transmitted information.
So what do you think?
Cheers,
Klaus
From: Dr. A. W. L. <ob...@fm...> - 2010-05-03 04:22:27
Dear *,
great ideas. But remember our experiments with ANTLR: chopping the file
by means of a grammar slowed the parsing down extremely, especially for
large data sections. Therefore we decided to preparse the FMF files,
such that the [*data] section is parsed by NumPy's I/O functions and
all other sections by an INI parser.
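The preparse idea can be sketched roughly as follows. This is not the
actual Pyphant code; read_fmf is a hypothetical name, and the sketch
assumes that "[*data]" sits on a line of its own and is the last
section of the file. It uses configparser from the Python standard
library as the INI parser and numpy.loadtxt for the data block.

```python
import configparser
import io

import numpy as np


def read_fmf(text):
    """Split at the data marker, then hand each part to a fast parser."""
    header, _, body = text.partition("[*data]\n")

    # All sections before the data block look like an INI file.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep keys exactly as written
    config.read_string(header)
    metadata = {name: dict(config[name]) for name in config.sections()}

    # The data block is a whitespace-separated numeric table.
    data = np.loadtxt(io.StringIO(body), ndmin=2)
    return metadata, data


SAMPLE = r"""[*reference]
author: Max Mustermann
place: Musterstadt
[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)
[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
"""

metadata, data = read_fmf(SAMPLE)
print(metadata["*reference"]["author"])
print(data.shape)
```

The numeric rows are never tokenized line by line in Python code, which
is where the grammar-based approach lost most of its time.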
All the best,
Andreas
From: Rolf W. <ro...@di...> - 2010-05-06 22:10:02
Klaus Zimmermann wrote:
> Hello all,
>
> On Friday, 30.04.2010, at 21:40 +0200, Rolf Wuerdemann wrote:
>> On Thu, 29 Apr 2010 07:34:09 +0200, "Dr. Andreas W. Liehr"
>> <ob...@fm...> wrote:
>>> [...]
>
> So what do you think?
We have discussed this topic several times. My choice would be a
hybrid: a set of hooks with predefined functions. Users can use the
functions we supply and get the data in a language-appropriate manner
(whatever this means). They can also supply their own functions and
store the data in any way they like (but are then also responsible for
the storage).
Why would I choose this way? For most applications, "in memory" storage
would be nice (especially for smaller applications). If we force users
to set up their own data storage, they can (and will) also write their
own libraries and will perhaps use other formats (CSV?). I see great
potential for the use of fmf, and for that reason I would like to keep
the entry barrier for using fmf as low as possible. From this point of
view I would recommend the hybrid mentioned above, which everyone can
use in the way he or she wants, rather than an exclusive-or solution.
> Cheers,
> Klaus
Kind regards,
Rolf
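One possible reading of the hybrid approach, sketched in Python under
the assumption that the hooks are methods of a handler class. All names
here (InMemoryHandler, RunningMaximum, replay) are hypothetical, and
replay() only stands in for a real parser by emitting a few events by
hand.

```python
class InMemoryHandler:
    """Supplied default hooks: collect everything into plain objects."""

    def __init__(self):
        self.metadata = {}
        self.columns = []
        self.rows = [[]]
        self._section = None

    def start_section(self, name):
        self._section = self.metadata.setdefault(name, {})

    def item(self, key, value):
        self._section[key] = value

    def column(self, name, definition):
        self.columns.append((name, definition))

    def cell(self, value):
        self.rows[-1].append(value)

    def new_row(self):
        self.rows.append([])


class RunningMaximum(InMemoryHandler):
    """User-supplied hooks: track the largest value of the second column
    without storing any rows."""

    def __init__(self):
        super().__init__()
        self.maximum = None
        self._index = 0

    def cell(self, value):
        if self._index == 1 and (self.maximum is None or value > self.maximum):
            self.maximum = value
        self._index += 1

    def new_row(self):
        self._index = 0


def replay(handler):
    """Stand-in for the parser: emit events for part of the example file."""
    handler.start_section("*reference")
    handler.item("author", "Max Mustermann")
    handler.column("wavelength", r"\lambda [nm]")
    handler.column("absorption", r"A(\lambda)")
    for index, row in enumerate([(500, 0.2), (510, 0.4), (520, 0.6)]):
        if index:
            handler.new_row()
        for value in row:
            handler.cell(value)


default = InMemoryHandler()
replay(default)
custom = RunningMaximum()
replay(custom)
print(default.rows)
print(custom.maximum)
```

Users who are happy with the in-memory result never see the hooks, while
those who override them take over responsibility for storage.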