From: Julio T. <jul...@gm...> - 2013-04-08 17:43:52
Hey Anthony,

Thanks a lot for this. Your method with map() runs around 30,000 times faster!

BEFORE:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.096931 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.780372 seconds to ZIP

AFTER:
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.073058 seconds to do everything else
(database)DEBUG:BOVESPA.VISTA.PETR4: took 0.000024 seconds to ZIP

On Fri, Mar 22, 2013 at 12:35 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Fri, Mar 22, 2013 at 7:11 AM, Julio Trevisan <jul...@gm...> wrote:
>
>> Hi,
>>
>> I just joined this list. I am using PyTables for my project, and it works
>> great and fast.
>>
>> I am trying to optimize some parts of the program, and I noticed that
>> zipping the tuples to get one tuple per column takes much longer than
>> reading the data itself. The thing is that readWhere() returns one tuple
>> per row, whereas I need one tuple per column, so I have to use the zip()
>> function to achieve this. Is there a way to skip this zip() operation?
>> Please see below:
>>
>>
>> def quote_GetData(self, period, name, dt1, dt2):
>>     """Returns timedata.Quotes object.
>>
>>     Arguments:
>>     period -- value from within infogetter.QuotePeriod
>>     name -- quote symbol
>>     dt1, dt2 -- datetime.datetime or timestamp values
>>
>>     """
>>     t = time.time()
>>     node = self.quote_GetNode(period, name)
>>     ts1 = misc.datetime2timestamp(dt1)
>>     ts2 = misc.datetime2timestamp(dt2)
>>
>>     L = node.readWhere(
>>         "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" %
>>         (ts1/1000, ts2/1000))
>>     rowNum = len(L)
>>     Q = timedata.Quotes()
>>     print "%s: took %f seconds to do everything else" % (name, time.time()-t)
>>
>>     t = time.time()
>>     if rowNum > 0:
>>         (Q.timestamp, Q.open, Q.close, Q.high, Q.low, Q.volume,
>>          Q.numTrades) = zip(*L)
>>     print "%s: took %f seconds to ZIP" % (name, time.time()-t)
>>     return Q
>>
>> *And the printout:*
>> BOVESPA.VISTA.PETR4: took 0.068788 seconds to do everything else
>> BOVESPA.VISTA.PETR4: took 0.379910 seconds to ZIP
>>
>
> Hi Julio,
>
> The problem here isn't zip (packing and unpacking are generally
> fast operations; they happen *all* the time in Python). Nor is the
> problem specifically with PyTables. Rather, this is an issue with how you
> are using NumPy structured arrays (look them up). Basically, this is slow
> because you are creating a list of column tuples where every element is a
> Python object of the corresponding type. For example, upcasting every
> 32-bit integer to a Python int is very expensive!
>
> What you *should* be doing is keeping the columns as NumPy arrays, which
> keeps the memory layout small, contiguous, and fast, and, if done right,
> does not require a copy (which is what you are doing now).
>
> The value of L here is a structured array.
> Say I have some other structured array, i, with 4 fields named a, b, c,
> and d. The right way to do this is to pull out each field individually by
> indexing:
>
>     a, b, c, d = i['a'], i['b'], i['c'], i['d']
>
> or more generally (for all fields):
>
>     a, b, c, d = map(lambda x: i[x], i.dtype.names)
>
> or for some list of fields:
>
>     a, c, b = map(lambda x: i[x], ['a', 'c', 'b'])
>
> Timing both your original method and the new one gives:
>
>     In [47]: timeit a, b, c, d = zip(*i)
>     1000 loops, best of 3: 1.3 ms per loop
>
>     In [48]: timeit a, b, c, d = map(lambda x: i[x], i.dtype.names)
>     100000 loops, best of 3: 2.3 µs per loop
>
> So the method I propose is 500x to 1000x faster. Using NumPy
> idiomatically is very important!
>
> Be Well
> Anthony
>
>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
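The following is a minimal Python 3 sketch of the column-extraction idiom described above, run against an in-memory NumPy structured array rather than a PyTables table. The array `quotes`, its dtypes, and its values are made up for illustration; only the field names are borrowed from Julio's `Quotes` attributes. `SimpleNamespace` is a hypothetical stand-in for his `timedata.Quotes` class, which the thread does not show. The list comprehension is equivalent to the `map(lambda x: i[x], i.dtype.names)` call in Anthony's reply.

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for the structured array returned by readWhere();
# field names follow Julio's Quotes attributes, dtypes and values are made up.
quotes = np.array(
    [(1000, 10.0, 10.5, 10.7, 9.9, 500, 12),
     (2000, 10.5, 10.2, 10.6, 10.1, 300, 8)],
    dtype=[("timestamp", "i8"), ("open", "f8"), ("close", "f8"),
           ("high", "f8"), ("low", "f8"), ("volume", "i8"),
           ("numTrades", "i4")],
)

# Row-oriented approach from the original post: zip() walks every record
# and produces one tuple of scalar objects per column.
ts_zip, open_zip, *_ = zip(*quotes)

# Column-oriented approach: indexing by field name returns a NumPy array
# for each column without copying the underlying data.
columns = [quotes[name] for name in quotes.dtype.names]
timestamp, open_, close, high, low, volume, num_trades = columns

# Hypothetical stand-in for timedata.Quotes: attach each column by name.
q = SimpleNamespace()
for name in quotes.dtype.names:
    setattr(q, name, quotes[name])

print(timestamp)                            # [1000 2000]
print(np.shares_memory(timestamp, quotes))  # True: a view, not a copy
```

Because each extracted column is a view, assigning to `timestamp[0]` also changes `quotes['timestamp'][0]`; call `.copy()` on a column if it needs independent storage.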