From: Vineet J. <vin...@ya...> - 2003-06-30 23:22:30
Couple of questions about pytables:

I built two samples, one with pysqlite and one with pytables, and I found pytables to be about 20 times faster than the pysqlite version, and it used a lot less space. Let me commend you on a great application.

I have the following requests/questions:

1. Update certain rows in a table and append to a table. The latter you handle, but I am not sure how to do the former. Will updating rows ever be supported?

2. For arrays or rows returned from a table, how can you do the following without having to loop through them?

    row1 = table1.read()
    row2 = table2.read()
    FinalRow = row1+row2

3. Something useful found in pysqlite and the Postgres db driver is the ability to access field names directly:

    row = table.read()
    high = row[10000].high    (where high is a field of the table)

4. Is there any way the rows returned from a table can be treated as numarray objects?

Thanks for your replies,
vinj
From: Francesc A. <fa...@op...> - 2003-07-01 21:35:33
Hi Vineet,

On Tuesday 01 July 2003 01:22, Vineet Jain wrote:
> Couple of questions about pytables:
>
> I built two samples. One with pysqlite and one with pytables and I found
> pytables to be about 20 times faster than the pysqlite version and used
> a lot less space. Let me commend you on a great application.

20 times faster than pysqlite seems too much, and besides, this should depend on what kind of benchmark you are doing. If it is for writing, that seems reasonable, while for reading the difference should be a lot smaller (see my EuroPython presentation at http://pytables.sourceforge.net/doc/EuroPython.pdf for more details). Can you explain a bit what kind of benchmark you have run? Anyway, I'm happy to know that pytables works great for your specific application.

> 1. Update certain rows in a table and append to a table. The latter
> you handle but am not sure how to do the former. Will updating rows ever
> be supported?

Appending rows is not a problem, even between different Python sessions. Updating is not yet supported, and I'm waiting for HDF5 1.6 to appear to see if I can implement that feature. I'll try to release a new version of pytables supporting deleting and updating rows as soon as the NCSA folks release the 1.6 version (which should happen sooner rather than later).

> 2. For arrays or rows returned from a table. How can you do the
> following:
>
> row1 = table1.read()
> row2 = table2.read()
> FinalRow = row1+row2
>
> Without having to loop through them.

First of all, let me point out that the read() method of a Table object, when called without arguments, reads the whole table into memory and returns a recarray object, which is the way the numarray package represents arrays of inhomogeneous data (i.e. tables).

Then, you did not specify whether by row1+row2 you meant appending the rows of one table to those of the other to get a larger table with nrows1+nrows2 rows or, in case nrows1 == nrows2, getting a table with the same number of rows but with ncolumns1 + ncolumns2 columns. For simplicity, I'll assume that you meant the former case, as the latter seems more complicated.

After these clarifications, it seems that you are trying to add two recarray objects, not two tables, and this is not currently supported in numarray. But it would be a nice thing to support an __add__ special method, of course. I'll talk with the numarray crew to see if that can be implemented.

> 3 Something useful found in pysqlite, and the postgress db driver
> is the ability to access field names directly:
>
> row = table.read()
> high = row[10000].high (where high is a field of the table)

Yeah, you can do that using some parameters of the read() method. For example, let's suppose that we have the following Table object:

    >>> file.root.detector.smalltable
    /detector/smalltable (Table(10,)) 'Small table with 3 fields'
      description := {
        'var1': Col('CharType', (6,)),
        'var2': Col('Int32', (1,)),
        'var3': Col('Float64', (1,)) }
      byteorder = little

If you ask for help on its read() method:

    >>> help(file.root.detector.smalltable.read)
    Help on method read in module tables.Table:

    read(self, start=None, stop=None, step=None, field=None, flavor=None) method of tables.Table.Table instance
        Read a range of rows and return an in-memory object.

        If "start", "stop", or "step" parameters are supplied, a row
        range is selected. If "field" is specified, only this "field" is
        returned as a NumArray object. If "field" is not supplied all the
        fields are selected and a RecArray is returned. If both "field"
        and "flavor" are provided, an additional conversion to an object
        of this flavor is made. "flavor" must have any of the next
        values: "Numeric", "Tuple" or "List".

then you can, for example, do:

    >>> file.root.detector.smalltable.read(start=1,stop=5, field="var2")
    array([1, 2, 3, 4])

and it returns the "var2" column for the rows from 1 up to (but excluding) 5. It would be handy to provide a more pythonic way to access this data, and that might come in the future.

> 4 Is there any way the rows returned from table can be treated as
> numarray objects?

As you have seen in the previous example, pytables always tries to return numarray objects. It will be an Array object if the data is homogeneous (all resulting elements have the same data type). If the resulting elements are of different data types, a RecArray object will be returned, as in:

    >>> print file.root.detector.smalltable.read(start=1,stop=5)
    RecArray[
    ('d: 1', 1, 1024.0),
    ('d: 2', 2, 2048.0),
    ('d: 3', 3, 3072.0),
    ('d: 4', 4, 4096.0)
    ]

Hope that helps to clear up some of your questions,

-- Francesc Alted
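The two readings of row1+row2 discussed in this reply, and the access to fields by name asked about in question 3, can be sketched in plain Python. This is only an illustration with standard-library namedtuples, not the pytables or numarray API; the record types `Bar` and `Vol` and all the values are made up for the example.

```python
from collections import namedtuple

# Hypothetical record types standing in for the rows of two tables.
Bar = namedtuple("Bar", ["date", "high", "low"])
Vol = namedtuple("Vol", ["volume"])

rows1 = [Bar("01012003", 935.0, 930.0), Bar("01022003", 940.0, 931.0)]
rows2 = [Bar("01032003", 942.0, 936.0)]
vols = [Vol(1000), Vol(1500)]

# Reading 1: more rows (nrows1 + nrows2), same columns.
more_rows = rows1 + rows2

# Reading 2: same number of rows, more columns (ncolumns1 + ncolumns2).
Joined = namedtuple("Joined", Bar._fields + Vol._fields)
more_cols = [Joined(*(a + b)) for a, b in zip(rows1, vols)]

print(len(more_rows))        # number of rows after appending
print(more_cols[0]._fields)  # column names after joining
print(more_rows[2].high)     # field access by name
```

The first reading only needs the two inputs to share a layout, while the second needs them to have the same number of rows, which is why the reply calls the former the simpler case.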
From: Vineet J. <vin...@ya...> - 2003-07-01 22:16:09
Thanks for your replies. I'm not sure what I did wrong with pysqlite, because my example was very simple, but assigning values to row by fetchall took significantly more time than pytables.

I have decided to go forward with pytables for the time being. Great to hear that HDF5 1.6 is planning to implement the updating feature. I'm looking to store stock minute bars for around 5000 stocks for several years. This will be a lot of data, and pytables is very fast, so I don't have to worry about the IO part of things.

Yes, you assumed correctly that I wanted to create a new recarray with the total number of rows, which would be numrows1+numrows2. Once I read the objects into memory, are they mutable or immutable? Can I change some of the values in place?

So if read gets all the rows of a table in memory, does iterrows only load the rows that you requested? Is there any way to get a recarray back and not load all the data into memory without having to go through the iterator?

What is the main difference between a recarray and an array object, especially since both of them can be passed to numarray?

I've attached my code for benchmarking sqlite vs pytables:

Create tables    sqlite: 67 seconds    pytables: 9 seconds
Select           sqlite: 15 seconds    pytables: 0.22 seconds

I'm running on a Pentium 600. I have also tried this example by repeating 8000 unique rows 20 times and the times from that run were comparable.

Vineet

TO CREATE THE TABLES:
---------------------

# pytables version
import csv
from tables import *
#import psyco
#psyco.full()
from time import clock

_time = clock()

class Price(IsDescription):
    date   = Col("CharType", 8)   # 8-character string
    hhmm   = Col("Int32", 1)      # integer
    open   = Col("Float32", 1)    # float (single-precision)
    high   = Col("Float32", 1)    # float (single-precision)
    low    = Col("Float32", 1)    # float (single-precision)
    close  = Col("Float32", 1)    # float (single-precision)
    volume = Col("Int32", 1)      # integer

# Open a file in "w"rite mode
fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "w")
# Create a new table in the root group
table = fileh.createTable('/', name='table', description=Price,
                          complib='lzo', compress=5)
price = table.row
for i in xrange(150000):
    # First, assign the values to the Price record
    price['date'] = '01012003'
    price['hhmm'] = 101    # not 0101: a leading zero makes Python read it as octal (65)
    price['open'] = 935.00
    price['high'] = 935.00
    price['low'] = 935.00
    price['close'] = 935.00
    price['volume'] = 0
    # This injects the row values.
    price.append()

# We need to flush the buffers in table in order to get an
# accurate number of records on it.
table.flush()
executionTime = clock() - _time
print 'execution time: '+str(executionTime)
# Finally, close the file
fileh.close()

# sqlite version
import csv
import sqlite
import psyco
psyco.full()
from time import clock

_time = clock()
conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=077)
cursor = conn.cursor()
#cursor.execute('drop table minbars')
cursor.execute('create table minbars (date,hourmin,Open,low,high,close,volume)')
for i in xrange(150000):
    cursor.execute('insert into minbars values(%s, %s, %s, %s, %s, %s, %s)',
                   ['01012003', '0101', '935.00', '935.00', '935.00', '935.00', '935.00'])
cursor.execute('select date, hourmin, open, low, high, close, volume from minbars')
conn.commit()
conn.close()
executionTime = clock() - _time
print 'execution time: '+str(executionTime)

TO SELECT THE DATA:
--------------------

# pytables version
import csv
from tables import *
#import psyco
#psyco.full()
from time import clock

_time = clock()
# Open the file in "r"ead mode
fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "r")
table = fileh.getNode('/table')
row = table.read()
executionTime = clock() - _time
print 'Total row count: '+str(len(row))
print 'execution time: '+str(executionTime)

# sqlite version
import csv
import sqlite
#import psyco
#psyco.full()

def main():
    from time import clock
    _time = clock()
    conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=044)
    cursor = conn.cursor()
    cursor.execute('select date, hourmin, open, low, high, close, volume from minbars')
    cursor.fetchall()
    conn.close()
    executionTime = clock() - _time
    #print 'Total row count: '+str(len(row))
    print 'execution time: '+str(executionTime)

if __name__ == "__main__":
    main()
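For anyone wanting to rerun the SQLite half of this benchmark on a current interpreter, a rough sketch follows. It assumes the standard-library `sqlite3` module rather than the pysqlite API used in the scripts above, an in-memory database instead of a file on disk, and a single `executemany` call for the inserts; the function name `run_benchmark` is hypothetical. Timings from it are not comparable with the numbers quoted in this thread.

```python
import sqlite3
import time

def run_benchmark(nrows=150000):
    """Insert nrows identical minute bars, read them back, return (row count, seconds)."""
    start = time.perf_counter()
    # In-memory database: an assumption, the original scripts wrote to disk.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table minbars (date, hourmin, open, low, high, close, volume)")
    bar = ("01012003", "0101", 935.0, 935.0, 935.0, 935.0, 0)
    # Batch all the inserts inside one transaction; "with conn" commits on success.
    with conn:
        conn.executemany(
            "insert into minbars values (?, ?, ?, ?, ?, ?, ?)",
            (bar for _ in range(nrows)),
        )
    rows = conn.execute(
        "select date, hourmin, open, low, high, close, volume from minbars"
    ).fetchall()
    conn.close()
    return len(rows), time.perf_counter() - start

count, seconds = run_benchmark(10000)
print("rows: %d, seconds: %.3f" % (count, seconds))
```

Timing the insert and the select separately, as the original scripts do, would only need a second `perf_counter()` reading between the two steps.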
From: Vineet J. <vin...@ya...> - 2003-07-01 22:20:30
Is there any disadvantage to using the recarray object over the array object?

I've attached my code for benchmarking sqlite vs pytables:

Create tables    sqlite: 67 seconds    pytables: 9 seconds
Select           sqlite: 15 seconds    pytables: 0.22 seconds

I'm running on a Pentium 600. I have also tried this example by repeating 8000 unique rows 20 times and the times from that run were comparable.

Vineet

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
From: Francesc A. <fa...@op...> - 2003-07-02 18:08:11
Vineet,

I've looked at your examples, and I think you are getting such good results with pytables because your data is completely repetitive and you are using compression. In real life, however, data is not so compressible, and you can be sure that this speed-up will decrease substantially.

Having said that, if your data is compressible (even by a small amount), pytables will have a clear advantage over SQLite when reading large amounts of data (typically larger than your available system memory). Besides, for creating tables, you can always expect much better performance from pytables than from a relational database.

As always, your best bet is to run the benchmarks with *real* data.

Cheers,

-- Francesc Alted
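The effect of repetitive data on compression can be checked with a small, self-contained sketch. It assumes zlib from the standard library as a stand-in for the LZO compressor used in the benchmark, a made-up 32-byte record layout loosely modelled on the Price table, and random bytes as a worst case for compressibility; the helper `ratio` is hypothetical.

```python
import os
import struct
import zlib

# One hypothetical record: 8-byte date, int hhmm, four float prices, int volume.
record = struct.pack("<8si4fi", b"01012003", 101, 935.0, 935.0, 935.0, 935.0, 0)

n = 10000
repetitive = record * n                     # the same row repeated, as in the benchmark
random_like = os.urandom(len(record) * n)   # stand-in for incompressible data

def ratio(data, level=5):
    """Return compressed size divided by original size."""
    return len(zlib.compress(data, level)) / float(len(data))

print("repetitive: %.4f" % ratio(repetitive))
print("random:     %.4f" % ratio(random_like))
```

Real market data should land somewhere between these two extremes, which is the reason for rerunning the benchmark with real data before drawing conclusions.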
From: Francesc A. <fa...@op...> - 2003-07-02 18:08:25
On Wednesday 02 July 2003 00:15, Vineet Jain wrote:
> I have decided to go forward with pytables for the time being. Great to
> hear that HDF5 1.6 is planning to implement the updating feature.

Well, it's not exactly a feature of HDF5 1.6, but rather a combination of the HDF5_HL library (which ships with pytables) and HDF5 1.6. But it seems that HDF5 1.6 is the missing piece needed to achieve that.

Hope you will be happy using pytables. If in the future you have some specific need that is not implemented in pytables, or just want professional support, remember that I'm offering commercial support for that kind of thing ;-).

> I'm
> looking to store stock minute bars for around 5000 stocks for several
> years. This will be a lot of data and pytables is very fast so I don't
> have to worry about the IO part of things.

That sounds nice. In addition, if your data is compressible, you will find that you need a fraction of the space of a relational database.

> Yes you assumed correctly that I wanted to create a new recarray with
> the total number of rows which would be numrows1+numrows2. Once I read
> the objects into memory are they mutable or immutable. Can I change some
> of the values in place?

The RecArray object is mutable, so you can change these values in memory. Besides, if you later save this object to the file and delete the original table, you have a rather primitive, yet effective, way of updating rows until a more efficient one is implemented.

> So if read gets all the rows of a table in memory does iterrows only load
> the rows that you requested?

To be exact, read() only reads the rows specified by its start, stop, step and field arguments, in the same way as iterrows(). The difference between them is that read() returns a monolithic object (i.e. a RecArray) with all the info you have requested, while iterrows() is a row iterator, so you get only one row each time it is invoked.

> Is there any way to get a recarray back and
> not load all the data into memory without having to go through the
> iterator?

That's possible, as I said before, by using the start, stop and step parameters of read(). But if you want to read over the whole table without loading all the data into memory, you will need iterrows(), of course.

Cheers,

-- Francesc Alted
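The difference between getting one monolithic object for a row range and iterating row by row can be sketched without pytables at all. The functions `stored_rows`, `read_range` and `iter_range` below are hypothetical stand-ins written with the standard library only; they mimic the start/stop/step semantics described above but are not the pytables methods themselves.

```python
from itertools import islice

def stored_rows(n):
    """Hypothetical stand-in for rows on disk: yields one row at a time."""
    for i in range(n):
        yield ("d: %d" % i, i, i * 1024.0)

def read_range(n, start=0, stop=None, step=1):
    """Materialise the selected rows as one in-memory list."""
    return list(islice(stored_rows(n), start, stop, step))

def iter_range(n, start=0, stop=None, step=1):
    """Hand back the selected rows lazily, one per iteration."""
    return islice(stored_rows(n), start, stop, step)

# A bounded range fits comfortably in memory as a single object.
chunk = read_range(10, start=1, stop=5)
print(chunk)

# A full scan only ever holds one row at a time.
total = sum(row[1] for row in iter_range(100000))
print(total)
```

Reading bounded ranges in a loop gives a middle ground: each call returns a block of rows, and peak memory is set by the chosen range size rather than by the table size.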
From: Vineet J. <vin...@ya...> - 2003-07-10 21:51:15
I'll keep the offer of commercial support in mind. It's great to know that you offer this for people who need it. So far I'm really impressed with pytables. I'm using a combination of sqlite and pytables right now: sqlite when I need to do some complex queries, and pytables for almost everything else. Will keep you posted on how it works out.
From: Francesc A. <fa...@op...> - 2003-07-11 07:49:09
On Thursday 10 July 2003 20:25, Vineet Jain wrote:
> I'll keep the offer of commercial support in mind. It's great to know
> that you offer this for people who need it. So far I'm really impressed
> with pytables. I'm using a combination of sqlite and pytables right now.
> Sqlite when I need to do some complex queries and pytables for
> almost everything else.

Mixing SQLite and pytables is a very good approach for many situations. I think I should further investigate the different ways in which the two packages can work together, and give examples that make it easy for people to see how powerful this combination can be.

Cheers,

-- Francesc Alted