[Pyphant-devel] New fmf library interface

Hello all,

On Friday, 30.04.2010, at 21:40 +0200, Rolf Wuerdemann wrote:
> On Thu, 29 Apr 2010 07:34:09 +0200, "Dr. Andreas W. Liehr"
> <ob...@fm...> wrote:
> > There is no doubt that the quantities module should belong to the
> > FMF package. But what about DataContainers? Pyphant as such does
> > not place any restrictions on the data objects to be transferred
> > between the workers, but on the other hand it uses DataContainers
> > heavily for visualisation and the Knowledge Cloud.
> > 
> > Of course an FMF module could return a dictionary of dictionaries
> > containing the metadata and a set of numpy arrays containing the
> > data columns.
[...]
> Klaus also put forward the very good idea of using a SAX-like
> approach where we generate events to help people parse the files.
> I would suggest giving people both possibilities: "In Memory"
> (dict type) and "event driven" (SAX type).

To formalize my proposal a little more:
Say someone wanted to read in the simple fmf file

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
[*reference]
author: Max Mustermann
place: Musterstadt

[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)

[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---

She would give it to the parser. The parser would start forming tokens
in a stream-like model. It would then call (in order):

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
start_section("*reference")
item("author", "Max Mustermann")
item("place", "Musterstadt")
start_data_definition("")
column("wavelength", "\lambda", Quantity("1 nm"))
column("absorption", "A", 1)
start_data("")
cell(500)
cell(.2)
new_row()
cell(510)
cell(.4)
.
.
.
---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
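
To make the call sequence above concrete, here is a minimal Python
sketch of such an event-driven parser. It is only an illustration under
assumptions: the names `parse` and `Recorder` and the three regular
expressions are made up for this example, the input is assumed to be
well formed, and the unit is handed over as a plain string (or 1 for
"no unit") where a real library would build a Quantity object.

```python
import re

# Hypothetical patterns for this sketch, not part of any FMF library.
SECTION = re.compile(r"^\[(.+)\]$")        # [*reference]
ITEM = re.compile(r"^([^:]+):\s*(.*)$")    # author: Max Mustermann
# symbol, optional "(dependencies)", optional "[unit]"
COLUMN = re.compile(r"^([^(\[]+?)\s*(?:\([^)]*\))?\s*(?:\[([^\]]*)\])?$")


def parse(lines, handler):
    """Walk the lines once and call handler methods in file order."""
    mode = None
    first_row = True
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        section = SECTION.match(line)
        if section:
            name = section.group(1)
            if name == "*data definition":
                mode = "definition"
                handler.start_data_definition("")
            elif name == "*data":
                mode = "data"
                handler.start_data("")
            else:
                mode = "items"
                handler.start_section(name)
        elif mode == "data":
            # new_row() separates rows, so it is skipped before the first.
            if not first_row:
                handler.new_row()
            first_row = False
            for token in line.split():
                handler.cell(float(token))
        else:
            key, value = ITEM.match(line).groups()
            if mode == "definition":
                symbol, unit = COLUMN.match(value).groups()
                handler.column(key.strip(), symbol, unit if unit else 1)
            else:
                handler.item(key.strip(), value)


class Recorder:
    """Stand-in handler: remembers every call as a tuple."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        def record(*args):
            self.events.append((name,) + args)
        return record


SAMPLE = r"""[*reference]
author: Max Mustermann
place: Musterstadt

[*data definition]
wavelength: \lambda [nm]
absorption: A(\lambda)

[*data]
500 .2
510 .4
520 .6
530 .4
540 .2
"""

recorder = Recorder()
parse(SAMPLE.splitlines(), recorder)
for event in recorder.events:
    print(event)
```

The parser itself never builds a data structure; whatever object is
passed in as `handler` decides what happens with each call.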

This model, while perhaps a bit complicated at first glance, has some
intriguing features:
  1) No dependence on any data structure.
While there are associative arrays in many languages, like dicts in
Python, in equally many languages these are available only through
non-standard libraries or not at all. Moreover, arbitrary data
structures may be used to store the information retrieved by the
parser, without the need to copy anything around in memory.
  2) Applicability to many languages.
While data structures may vary, many languages have the concept of
procedures and strings.
  3) I/O Symmetry.
Want to write an fmf file? Just call the same sequence on the writer!
This also makes it easy to use as a network streaming protocol.
  4) Extensibility.
The functions do not have to be hard-coded. Even the names can be
allowed to vary. Just provide the parser with a list of function
pointers. If you want to add a new item type, provide a dispatch
function. A set of item types could be comfortably managed with the
use of regular expressions, for example.
  5) Backward and forward compatibility.
If one of the above-mentioned mechanisms is exploited to provide an
extension, this would not break compatibility on a lower level. At the
very least, every parser would still be capable of extracting pairs of
strings, equating to level 0.
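
The I/O symmetry of point 3 could look roughly like the following
Python sketch. The class `FmfWriter` and its `close` method are
hypothetical names chosen for this example; the other method names are
the calls listed earlier. Since the `column` call as listed does not
carry the dependency of A on \lambda, the sketch cannot write it back
out either.

```python
import io


class FmfWriter:
    """Hypothetical writer exposing the same calls the parser would make."""

    def __init__(self, stream):
        self.stream = stream
        self._row = []              # cells of the row currently being built
        self._first_section = True

    def _header(self, name):
        self._flush_row()
        if not self._first_section:
            self.stream.write("\n")  # blank line between sections
        self._first_section = False
        self.stream.write("[%s]\n" % name)

    def _flush_row(self):
        if self._row:
            self.stream.write(" ".join(self._row) + "\n")
            self._row = []

    def start_section(self, name):
        self._header(name)

    def item(self, key, value):
        self.stream.write("%s: %s\n" % (key, value))

    def start_data_definition(self, name):
        self._header("*data definition")

    def column(self, name, symbol, unit):
        # Assumption of this sketch: unit == 1 means "no unit given".
        suffix = "" if unit == 1 else " [%s]" % unit
        self.stream.write("%s: %s%s\n" % (name, symbol, suffix))

    def start_data(self, name):
        self._header("*data")

    def cell(self, value):
        self._row.append("%g" % value)

    def new_row(self):
        self._flush_row()

    def close(self):
        # Write out the last row, which has no trailing new_row() call.
        self._flush_row()


out = io.StringIO()
writer = FmfWriter(out)
writer.start_section("*reference")
writer.item("author", "Max Mustermann")
writer.start_data_definition("")
writer.column("wavelength", r"\lambda", "nm")
writer.column("absorption", "A", 1)
writer.start_data("")
writer.cell(500)
writer.cell(.2)
writer.new_row()
writer.cell(510)
writer.cell(.4)
writer.close()
text = out.getvalue()
print(text)
```

Because the writer only needs something with a `write` method, the same
sequence of calls could in principle feed a socket instead of a file.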

How does all of this relate to the initial question about
DataContainers? Such a parser, if available in an fmf library, could
still do most of the difficult work, like converting the various
entries to appropriate objects in the respective language. It would,
however, have a greatly reduced set of dependencies, which further
simplifies its use, and it would provide a powerful means of
constructing application-specific front ends, be it a VBA macro that
feeds the information into an Excel sheet or an embedded C library
that configures a microcontroller on the fly without ever storing the
transmitted information.
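
As an illustration of such a front end, the "In Memory" variant Rolf
asked for can sit on top of the event calls. The following Python
sketch is only one possible shape: `DictBuilder` and its attribute
names are hypothetical, plain lists stand in for numpy arrays, and the
events are fed from a hand-written list rather than from a real parser.

```python
class DictBuilder:
    """Hypothetical front end: collects the events into dicts and lists."""

    def __init__(self):
        self.metadata = {}   # section name -> {key: value}
        self.columns = []    # (name, symbol, unit) per column
        self.data = []       # one list of values per column
        self._section = None
        self._col = 0

    def start_section(self, name):
        self._section = self.metadata.setdefault(name, {})

    def item(self, key, value):
        self._section[key] = value

    def start_data_definition(self, name):
        pass                 # nothing to prepare in this sketch

    def column(self, name, symbol, unit):
        self.columns.append((name, symbol, unit))
        self.data.append([])

    def start_data(self, name):
        self._col = 0

    def cell(self, value):
        # Cells arrive row by row; store them column by column.
        self.data[self._col].append(value)
        self._col += 1

    def new_row(self):
        self._col = 0


# The beginning of the call sequence from the example file, written out
# by hand as (method name, arguments...) tuples.
events = [
    ("start_section", "*reference"),
    ("item", "author", "Max Mustermann"),
    ("item", "place", "Musterstadt"),
    ("start_data_definition", ""),
    ("column", "wavelength", r"\lambda", "nm"),
    ("column", "absorption", "A", 1),
    ("start_data", ""),
    ("cell", 500), ("cell", .2), ("new_row",),
    ("cell", 510), ("cell", .4),
]

builder = DictBuilder()
for name, *args in events:
    getattr(builder, name)(*args)   # dispatch by method name

print(builder.metadata)
print(builder.data)
```

A front end for a different target would implement the same handful of
methods and ignore the calls it has no use for.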

So what do you think?

Cheers,
Klaus