From: Anthony S. <sc...@gm...> - 2012-04-25 22:19:53
|
Hello Alvaro, Thanks for writing this up. I think this would go nicely in our docs if you are willing to let us add it ;). My one comment is that in your NOTE you say that "* SQL tables do not play well with Numpy containers. *" I think that this would be better phrased as saying that when converting SQL Tables to/from NumPy record arrays or PyTables Tables you are adding or removing the ordering of rows and columns. Additionally going from NumPy / PyTables to SQL requires that you perform a unique() or set() operation on the rows. So it isn't that they *can't* play nicely together, but rather you have to understand how they do. Thanks again. Be Well Anthony On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero <al...@mi...>wrote: > * Hello list, > The relational model has a strong foundation and I have spent a few hours > thinking about what in PyTables is structurally different from it. Here are > my thoughts. I would be delighted if you could add/comment/correct on these > ideas. This could eventually help people with a relational/SQL background > who want to know how to best use the idioms of PyTables for their data > modeling > > --- > I make a distinction between relational and SQL (see CJ Date’s "SQL and > relational theory" for more on that). From a purely structural point of > view, the following differences are apparent: > > 1. relations vs. sequences. Relations are sets (i.e. not ordered) of > tuples (again, not ordered). > 1. rows: In PyTables, every container has an implicit row number, > meaning there is always a candidate key and order matters. Although > strictly an implementation-level concern, row numbers in PyTables are not > stored but computed, thanks to the in-disk adjacency of the records. This > is important for large datasets, where adding storage of row numbers means > roughly a doubling of diskspace. > 2. columns: In PyTables columns are ordered. That is not the case > in a purely relational system but it is the case in SQL. > 2. Flat tablespace vs. hierarchical tablespace. SQL tables live in a > global namespace. PyTables objects can be put inside Groups. Each approach > can be mapped onto the other by name mangling. Groups in PyTables are like > tables of tables -- for each node in a group there is a full table (or > another group...). This introduces a possible ambiguity in data modeling: > > Consider a table of car parts, one column is Part ID and the other is > Model ID, indicating in what car models a particular part is built in. In > PyTables you can construct the same table /or/ create a /models group and > create one table per model consisting of a single column of Part IDs e.g. > /model/sedan, /model/cabrio... etc. The same is possible in a relational > setting (dividing the tables according to one attribute, and naming them > according to the attribute value, e.g. model_sedan, model_cabrio...). The > defining difference is that the interface to manipulate that list is the > same (it is a table) whereas in PyTables one listing is a Table object and > the other is a list of Nodes, and the API for both is a bit different. > > 1. Attributes of tables and integrity. Any Node (Groups and Tables > included) can receive a limited amount of metadata in PyTables, by using > the attached attributeset. In SQL, metadata is limited to some keywords, to > be used upon table creation, that establish constraints on the columns that > have a functional significance. A prime example of this is identifying > foreign keys. SQL allows to use this information structurally at the time > of joins, whereas in PyTables one is free to implement this or any other > navigational scheme in a customized way using the attributes of the table. > > When designing such a scheme it has to be remembered that PyTables tables > have always an implicit column containing the row numbers, and this is > likely to be used as a key. > > NOTE: I intentionally excluded here implementation issues whenever they > are not related to structural ones, e.g. SQL tables do not play well with > Numpy containers and are thus ill-suited for big data with Python. Another > example would be all the features related to transactions/concurrency and > authorization, which are orthogonal to the data model. > > * > > -á. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |