From: Antonio V. <ant...@ti...> - 2013-07-17 19:13:02
Hi Pushkar,

On 17/07/2013 19:28, Pushkar Raj Pande wrote:
> Hi all,
>
> I am trying to figure out the best way to bulk load data into PyTables.
> This question may have been answered already, but I couldn't find what
> I was looking for.
>
> The source data is in CSV form, which may require parsing, type
> checking, and setting default values if a field doesn't conform to the
> type of the column. There are over 100 columns in a record. Doing this
> in a Python loop for each row is very slow compared to just fetching
> the rows from one PyTables file and writing them to another; the
> difference is almost a factor of ~50.
>
> I believe that if I load the data using a C procedure that does the
> parsing and builds the records to write into PyTables, I can get close
> to the speed of just copying rows from one PyTables file to another.
> But maybe there is something simpler and better that already exists.
> Can someone please advise? If a C procedure is what I should write, can
> someone point me to some examples or snippets that I can refer to in
> order to put this together?
>
> Thanks,
> Pushkar

NumPy has some tools for loading data from CSV files, such as loadtxt [1],
genfromtxt [2] and other variants. Are none of them OK for you?

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
[2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

cheers

-- 
Antonio Valentino