From: Francesc A. <fa...@py...> - 2004-03-08 11:56:33
Hi Bernard,

On Monday 08 March 2004 10:01, Bernard Kaplan wrote:
> Dear community,
>
> I have to develop a program that performs numerical analysis on data
> that come from a fab production line. Every month I can count on
> approximately 100 000 new entries. Each entry is composed on the one
> hand of general information (such as date, machine, ...) and on the
> other hand of raw data that we measure (a matrix of size 2000x1000 or
> more). So far I gather the general information in a relational database
> (firebird - kinterbasdb) and the data are just kept in individual files.
> I appreciate the database because I can sort my data on the different
> columns of my table and I can perform fast search to organize my huge
> number of entries. But I also realize that the numerical treatment that
> will follow will become quite cumbersome. This is why I am interested in
> PyTables (to be honest I am also interested in PyTables because I truly
> hate SQL and love Python)
>
> Here are my questions:
> - can I replace my database with PyTables ?

Well, it depends. PyTables is not designed as a replacement for a
relational database, but rather as a companion to one (or to be used
alone if you don't need relational or indexing capabilities). See below
for a fuller explanation.

> - is it possible to sort efficiently (meaning fast) a table in PyTables
> along a specific column ? How ?

It is possible, but it takes some hacking. You can read the column, sort
it with the numarray.argsort function
(http://stsdas.stsci.edu/numarray/numarray-0.8.html/node33.html) to get
the sorted indices, and then rewrite the table in this new order.
However, this will only work for columns that fit in memory. An
out-of-core algorithm for doing the same could be implemented if there is
enough interest.

> - does the concept of primary key in a database exist in PyTables ? I
> use primary key to avoid inserting two times the same row in my table.

No, the only primary key is the row number.

> Is there an equivalent way to do it in PyTables?

What about caching the primary keys in a list and checking whether a key
already exists in it before adding the row to the table?

> - how does PyTables compare with relational databases such as Firebird,
> SQLite,... in terms of performance ?

See http://pytables.sourceforge.net/html/HowFast.html

> - Are my questions relevant or do you instead advise me to keep to
> relational database ?

Completely relevant. I would advise you to combine an RDB and PyTables
and get the best of both worlds.

Regards,

--
Francesc Alted
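
To make the two suggestions above a bit more concrete (reordering rows by
the sorted indices of one column, and caching keys to avoid duplicate
rows), here is a minimal plain-Python sketch. It deliberately uses neither
numarray nor PyTables: the helper names (`argsort`, `reorder_rows`,
`UniqueAppender`), the `entry_id` and `machine` fields and the sample data
are all hypothetical, and a plain list stands in for the table. It also
keeps the keys in a set rather than a list, which is a swapped-in choice
to make the membership test cheaper. Like the approach described above, it
assumes the column and the keys fit in memory.

```python
# Sketch only: plain-Python stand-ins, not the numarray or PyTables API.
# All names below are hypothetical and chosen for illustration.

def argsort(column):
    """Return the indices that would put `column` in ascending order."""
    return sorted(range(len(column)), key=column.__getitem__)


def reorder_rows(rows, key_name):
    """Return a new list of rows ordered by the values in column `key_name`."""
    column = [row[key_name] for row in rows]  # read the whole column into memory
    order = argsort(column)                   # indices in sorted order
    return [rows[i] for i in order]           # rewrite the rows in that order


class UniqueAppender:
    """Remember the keys already stored and refuse rows with a known key."""

    def __init__(self, key_name):
        self.key_name = key_name
        self.seen = set()  # in-memory cache of keys (a set instead of a list)
        self.rows = []     # stands in for the real table

    def append(self, row):
        key = row[self.key_name]
        if key in self.seen:
            return False   # duplicate key: row is not added
        self.seen.add(key)
        self.rows.append(row)
        return True


entries = [
    {"entry_id": 3, "machine": "B"},
    {"entry_id": 1, "machine": "A"},
    {"entry_id": 2, "machine": "C"},
]

# Sort the rows along the (hypothetical) entry_id column.
sorted_entries = reorder_rows(entries, "entry_id")

# Append all rows plus one repeat of entry_id 1; the repeat is rejected.
table = UniqueAppender("entry_id")
results = [table.append(e) for e in entries + [{"entry_id": 1, "machine": "A"}]]
```

With a real table, the key cache would have to be rebuilt from the key
column each time the file is reopened, since it only lives in memory.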