Re: [Pytables-users] Main differences between PyTables and Relational

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hello Alvaro,

Thanks for writing this up.  I think this would go nicely in our docs if
you are willing to let us add it ;).

My one comment is that in your NOTE you say that "*
SQL tables do not play well with Numpy containers.
*"  I think that this would be better phrased as saying that when
converting SQL Tables to/from NumPy record arrays or PyTables Tables you
are adding or removing the ordering of rows and columns.  Additionally
going from NumPy / PyTables to SQL requires that you perform a unique() or
set() operation on the rows.

So it isn't that they *can't* play nicely together, but rather you have to
understand how they do.   Thanks again.

Be Well
Anthony

On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero <al...@mi...>wrote:

> * Hello list,
> The relational model has a strong foundation and I have spent a few hours
> thinking about what in PyTables is structurally different from it. Here are
> my thoughts. I would be delighted if you could add/comment/correct on these
> ideas. This could eventually help people with a relational/SQL background
> who want to know how to best use the idioms of PyTables for their data
> modeling
>
> ---
> I make a distinction between relational and SQL (see CJ Date’s "SQL and
> relational theory" for more on that). From a purely structural point of
> view, the following differences are apparent:
>
>    1. relations vs. sequences. Relations are sets (i.e. not ordered) of
>    tuples (again, not ordered).
>       1. rows: In PyTables, every container has an implicit row number,
>       meaning there is always a candidate key and order matters. Although
>       strictly an implementation-level concern, row numbers in PyTables are not
>       stored but computed, thanks to the in-disk adjacency of the records. This
>       is important for large datasets, where adding storage of row numbers means
>       roughly a doubling of diskspace.
>       2. columns: In PyTables columns are ordered. That is not the case
>       in a purely relational system but it is the case in SQL.
>    2. Flat tablespace vs. hierarchical tablespace. SQL tables live in a
>    global namespace. PyTables objects can be put inside Groups. Each approach
>    can be mapped onto the other by name mangling. Groups in PyTables are like
>    tables of tables -- for each node in a group there is a full table (or
>    another group...). This introduces a possible ambiguity in data modeling:
>
> Consider a table of car parts, one column is Part ID and the other is
> Model ID, indicating in what car models a particular part is built in. In
> PyTables you can construct the same table /or/ create a /models group and
> create one table per model consisting of a single column of Part IDs e.g.
> /model/sedan, /model/cabrio... etc. The same is possible in a relational
> setting (dividing the tables according to one attribute, and naming them
> according to the attribute value, e.g. model_sedan, model_cabrio...). The
> defining difference is that the interface to manipulate that list is the
> same (it is a table) whereas in PyTables one listing is a Table object and
> the other is a list of Nodes, and the API for both is a bit different.
>
>    1. Attributes of tables and integrity. Any Node (Groups and Tables
>    included) can receive a limited amount of metadata in PyTables, by using
>    the attached attributeset. In SQL, metadata is limited to some keywords, to
>    be used upon table creation, that establish constraints on the columns that
>    have a functional significance. A prime example of this is identifying
>    foreign keys. SQL allows to use this information structurally at the time
>    of joins, whereas in PyTables one is free to implement this or any other
>    navigational scheme in a customized way using the attributes of the table.
>
> When designing such a scheme it has to be remembered that PyTables tables
> have always an implicit column containing the row numbers, and this is
> likely to be used as a key.
>
> NOTE: I intentionally excluded here implementation issues whenever they
> are not related to structural ones, e.g. SQL tables do not play well with
> Numpy containers and are thus ill-suited for big data with Python. Another
> example would be all the features related to transactions/concurrency and
> authorization, which are orthogonal to the data model.
>
> *
>
> -á.
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>