From: Dr. A. W. L. <ob...@fm...> - 2010-05-03 04:22:27
|
Dear *,
Great ideas. But remember our experiments with ANTLR: chopping up the file
by means of a grammar slowed down the parsing extremely,
especially for large data sections. Therefore we decided to
preparse the FMF files, so that the [*data] section is parsed
by numpy's IO functions and all other sections by an INI parser.
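A minimal Python sketch of this preparsing idea, assuming (as in the example file further down this thread) that [*data] is the last section and holds whitespace-separated numbers. The standard `configparser` module reads the `key: value` sections and `numpy.loadtxt` reads the whole data block in one call. The names `split_fmf` and `read_fmf` are hypothetical and this is not Pyphant's actual reader.

```python
import configparser
import io

import numpy as np


def split_fmf(text):
    """Split FMF text at the [*data] header into (header_text, data_text).

    Assumption: [*data] is the last section, as in the example in this thread.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "[*data]":
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    raise ValueError("no [*data] section found")


def read_fmf(text):
    """Metadata via an INI parser, data columns via numpy in one bulk read."""
    header, data = split_fmf(text)
    ini = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    ini.optionxform = str  # keep keys as written (the default lowercases them)
    ini.read_string(header)
    # Dictionary of dictionaries: section name -> {key: value}
    metadata = {name: dict(ini[name]) for name in ini.sections()}
    table = np.loadtxt(io.StringIO(data), ndmin=2)  # ndmin=2: 2-D even for one row
    return metadata, [table[:, j] for j in range(table.shape[1])]


SAMPLE = r"""[*reference]
author: Max Mustermann
place: Musterstadt

[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)

[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
"""

metadata, (wavelength, absorption) = read_fmf(SAMPLE)
print(metadata["*reference"]["author"])  # Max Mustermann
```

The split is a plain line scan, so the slow part (the numbers) never goes through a grammar.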
All the best,
Andreas
On 02.05.10 15:02, Klaus Zimmermann wrote:
> Hello all,
>
>
> On Friday, 30.04.2010, 21:40 +0200, Rolf Wuerdemann wrote:
>> On Thu, 29 Apr 2010 07:34:09 +0200, "Dr. Andreas W. Liehr"
>> <ob...@fm...> wrote:
>>> There is no doubt that the quantities module should belong to the
>>> FMF package. But what about DataContainers? Pyphant as such does
>>> not impose any restrictions on the data objects to be transferred
>>> between the workers, but on the other hand it uses DataContainers
>>> heavily for visualisation and the Knowledge Cloud.
>>>
>>> Of course an FMF module could return a dictionary of dictionaries
>>> containing the metadata and a set of numpy arrays containing the
>>> data columns.
> [...]
>> Klaus also put forward the very good idea of using a SAX-like
>> approach where we generate events to help people parse the files.
>> I would suggest giving people both possibilities: "in memory"
>> (dict type) and "event driven" (SAX type).
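One way to offer both modes without maintaining two parsers is to make the "in memory" form just one consumer of the events. A minimal Python sketch, assuming a handler object with the callback names from Klaus's proposal in this thread (start_section, item, column, cell, new_row, ...); `DictBuilder` is a hypothetical name, and the hand-written calls below stand in for a real parser.

```python
class DictBuilder:
    """Event handler that assembles the "in memory" form:
    a dict of dicts for the metadata plus one list per data column."""

    def __init__(self):
        self.metadata = {}  # section name -> {key: value}
        self.columns = {}   # column name -> list of cell values
        self._section = None
        self._order = []    # column names in definition order
        self._index = 0     # position of the next cell within the current row

    def start_section(self, name):
        self._section = self.metadata.setdefault(name, {})

    def item(self, key, value):
        self._section[key] = value

    def start_data_definition(self, name):
        self._section = self.metadata.setdefault("*data definition", {})

    def column(self, name, symbol, unit):
        self._section[name] = (symbol, unit)  # keep symbol and unit as metadata
        self._order.append(name)
        self.columns[name] = []

    def start_data(self, name):
        self._index = 0

    def cell(self, value):
        self.columns[self._order[self._index]].append(value)
        self._index += 1

    def new_row(self):
        self._index = 0


builder = DictBuilder()
builder.start_section("*reference")
builder.item("author", "Max Mustermann")
builder.item("place", "Musterstadt")
builder.start_data_definition("")
builder.column("wavelength", r"\lambda", "nm")
builder.column("absorption", "A", 1)
builder.start_data("")
for i, row in enumerate([(500, .2), (510, .4)]):
    if i:
        builder.new_row()  # emitted between rows, as in the proposed trace
    for value in row:
        builder.cell(value)

print(builder.metadata["*reference"])  # {'author': 'Max Mustermann', 'place': 'Musterstadt'}
print(builder.columns)                 # {'wavelength': [500, 510], 'absorption': [0.2, 0.4]}
```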
>
> To formalize my proposal a little more:
> Say someone wanted to read in the simple fmf file
>
> ---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
> [*reference]
> author: Max Mustermann
> place: Musterstadt
>
> [*data definition]
> wavelength: \lambda [nm]
> absorption: A(\lambda)
>
> [*data]
> 500 .2
> 510 .4
> 520 .6
> 530 .4
> 540 .2
> ---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
>
> She would give it to the parser. The parser would start forming tokens
> in a stream-like manner. It would then call, in order:
>
> ---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
> start_section("*reference")
> item("author", "Max Mustermann")
> item("place", "Musterstadt")
> start_data_definition("")
> column("wavelength", "\lambda", Quantity("1 nm"))
> column("absorption", "A", 1)
> start_data("")
> cell(500)
> cell(.2)
> new_row()
> cell(510)
> cell(.4)
> .
> .
> .
> ---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
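A minimal Python sketch of such a push parser, assuming a handler object that provides the method names used in the trace above. `Quantity("1 nm")` is replaced by the plain unit string "nm", the column layout `symbol`, `symbol(args)` with an optional ` [unit]` is a simplifying assumption, and `parse_fmf` and `TraceHandler` are hypothetical names rather than an existing FMF API.

```python
import re


class TraceHandler:
    """Records each callback as a string, e.g. "item('place', 'Musterstadt')"."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Invoked only for attributes that do not exist, i.e. for every
        # callback name; the returned function logs the call and its args.
        def record(*args):
            self.calls.append(f"{name}({', '.join(map(repr, args))})")
        return record


def parse_fmf(lines, handler):
    """Push parser: call handler methods for each FMF element, in file order."""
    section = None
    first_row = True
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment (';' prefix assumed)
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            section = header.group(1)
            if section == "*data definition":
                handler.start_data_definition("")
            elif section == "*data":
                first_row = True
                handler.start_data("")
            else:
                handler.start_section(section)
        elif section == "*data":
            if not first_row:
                handler.new_row()  # emitted between rows, as in the trace
            first_row = False
            for token in line.split():
                handler.cell(float(token))
        elif section == "*data definition":
            name, _, spec = line.partition(":")
            # Assumed layout: "symbol" or "symbol(args)", optionally " [unit]".
            m = re.fullmatch(r"([^\s(\[]+)(?:\(.*\))?\s*(?:\[(.+)\])?", spec.strip())
            if m is None:
                raise ValueError(f"cannot parse column definition: {line!r}")
            handler.column(name.strip(), m.group(1), m.group(2) or 1)
        else:
            key, _, value = line.partition(":")
            handler.item(key.strip(), value.strip())


SAMPLE = r"""[*reference]
author: Max Mustermann
place: Musterstadt

[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)

[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
"""

handler = TraceHandler()
parse_fmf(SAMPLE.splitlines(), handler)
for call in handler.calls[:10]:
    print(call)
```

The parser never builds a table; whatever the handler does with each call (store, forward, discard) is up to the application.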
>
> This model, while perhaps a bit complicated at first glance, has some
> intriguing features:
> 1) No dependence on any data structure.
> While associative arrays exist in many languages, like dicts in
> Python, in equally many languages they are available only through
> non-standard libraries or not at all. Moreover, arbitrary data structures
> may be used to store the information retrieved by the parser without
> copying data around in memory.
> 2) Applicability to many languages.
> While data structures may vary, many languages have the concept of
> procedures and strings.
> 3) I/O Symmetry.
> Want to write an FMF file? Just call the same sequence on the writer!
> This also makes it easy to use as a network streaming protocol.
> 4) Extensibility.
> The functions do not have to be hard-coded. Even their names can be
> allowed to vary. Just provide the parser with a list of function
> pointers. If you want to add a new item type, provide a dispatch
> function. A set of item types could be managed comfortably using
> regular expressions, for example.
> 5) Backward and forward compatibility.
> If one of the mechanisms mentioned above is used to provide an
> extension, this does not break compatibility at a lower level. At the
> very least, every parser would still be capable of extracting pairs of
> strings, corresponding to level 0.
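To illustrate the I/O symmetry point, a minimal Python sketch of a writer that implements the same (assumed) callbacks as the trace above and emits FMF text. `FmfWriter` is a hypothetical name; writing a unit of 1 without brackets is an assumption, and since `column()` only carries the symbol, a definition like `A(\lambda)` comes back as `A`.

```python
class FmfWriter:
    """Handler that turns the parser callbacks back into FMF text."""

    def __init__(self):
        self.lines = []
        self._row = []  # cells of the row currently being written

    def _blank_line(self):
        if self.lines:
            self.lines.append("")  # separate sections by one empty line

    def start_section(self, name):
        self._blank_line()
        self.lines.append(f"[{name}]")

    def item(self, key, value):
        self.lines.append(f"{key}: {value}")

    def start_data_definition(self, name):
        self.start_section("*data definition")  # name is "" in the trace

    def column(self, name, symbol, unit):
        # Assumption: unit 1 means dimensionless and is written without brackets.
        suffix = "" if unit == 1 else f" [{unit}]"
        self.lines.append(f"{name}: {symbol}{suffix}")

    def start_data(self, name):
        self.start_section("*data")

    def cell(self, value):
        self._row.append(str(value))  # str() keeps full float precision

    def new_row(self):
        self.lines.append(" ".join(self._row))
        self._row = []

    def text(self):
        """Flush a pending row (the trace has no new_row after the last one)."""
        if self._row:
            self.new_row()
        return "\n".join(self.lines) + "\n"


writer = FmfWriter()
writer.start_section("*reference")
writer.item("author", "Max Mustermann")
writer.start_data_definition("")
writer.column("wavelength", r"\lambda", "nm")
writer.column("absorption", "A", 1)
writer.start_data("")
for i, row in enumerate([(500, .2), (510, .4)]):
    if i:
        writer.new_row()
    for value in row:
        writer.cell(value)
print(writer.text())
```

Replaying a call sequence into this writer yields FMF text again; over a socket, each line could be sent as soon as it is produced.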
>
> How does all of this relate to the initial question of DataContainers
> and the like? Such a parser, if available in an FMF library, could still
> do most of the difficult work, like converting the various entries to
> appropriate objects in the respective language. It would, however, have
> a greatly reduced set of dependencies, thus further simplifying its use,
> and it would provide a powerful means for building application-specific
> front ends, be it a VBA macro to feed the information into an
> Excel sheet or an embedded C library to configure a microcontroller on
> the fly without ever storing the transmitted information.
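A minimal Python sketch of that conversion step, assuming the regex-based dispatch suggested in point 4: each table entry pairs a pattern with a converter, applications extend the table, and anything unmatched stays a plain string (level 0). The table, `convert()` and the `(value, unit)` quantity form are hypothetical, not an existing FMF API.

```python
import re

# Hypothetical dispatch table: (pattern, converter) pairs, tried in order.
CONVERTERS = [
    (re.compile(r"[-+]?\d+"), int),
    (re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"), float),
]


def convert(value, converters=CONVERTERS):
    """Return the first matching conversion of value, else the string itself."""
    text = value.strip()
    for pattern, func in converters:
        if pattern.fullmatch(text):
            return func(text)
    return value  # level 0: an unconverted string


def quantity(text):
    """Hypothetical quantity type: '500 nm' -> (500.0, 'nm')."""
    number, unit = text.split(None, 1)
    return float(number), unit


# An application extends the table without touching convert() itself.
EXTENDED = CONVERTERS + [
    (re.compile(r"[-+]?\d*\.?\d+\s+[A-Za-z]\w*"), quantity),
]

print(convert("42"), convert(".2"), convert("Max Mustermann"))  # 42 0.2 Max Mustermann
print(convert("500 nm", EXTENDED))                              # (500.0, 'nm')
print(convert("500 nm"))                                        # 500 nm
```

A parser that only knows the base table still returns "500 nm" as a string, which is the compatibility behaviour described in point 5.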
>
> So what do you think?
>
> Cheers,
> Klaus
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Pyphant-devel mailing list
> Pyp...@li...
> https://lists.sourceforge.net/lists/listinfo/pyphant-devel