Re: [Pytables-users] Pytables-users Digest, Vol 78, Issue 6


On Nov 17, 2012, at 12:46 PM, <pyt...@li...> wrote:
>
> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: pyTable index from c++ (Jim Knoll)
>    2. Store a reference to a dataset (Juan Manuel Vázquez Tovar)
>    3. Histogramming 1000x too slow (Jon Wilson)
>    4. Re: Histogramming 1000x too slow (Anthony Scopatz)
>    5. Re: Histogramming 1000x too slow (Jon Wilson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 9 Nov 2012 15:26:38 -0600
> From: Jim Knoll <jim...@sp...>
> Subject: Re: [Pytables-users] pyTable index from c++
> To: 'Discussion list for PyTables'
>         <pyt...@li...>
> Message-ID:
>         <142...@SP...>
> Content-Type: text/plain; charset="us-ascii"
>
> Thanks for taking the time.
>
> Most of our tables are very wide (lots of columns) and simple conditions
> are common, so that is why in-kernel queries make almost no impact for me.
>
> -----Original Message-----
> From: Francesc Alted [mailto:fa...@gm...]
> Sent: Friday, November 09, 2012 11:27 AM
> To: pyt...@li...
> Subject: Re: [Pytables-users] pyTable index from c++
>
> Well, the expected performance of in-kernel (numexpr-powered) queries
> with respect to regular (Python) queries largely depends on where the
> bottleneck is. If your table has a lot of columns, then the bottleneck
> is going to be more on the I/O side, so you cannot expect a large
> difference in performance. However, if your table has a small number of
> columns, then it is more likely that the bottleneck is the CPU, and your
> chances of seeing a difference are higher.
>
> Of course, having complex queries (i.e. queries that take conditions
> over several columns, or just combinations of conditions in the same
> column) makes the query more CPU intensive, and in-kernel normally wins
> by a comfortable margin.
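A back-of-envelope way to see the point above is Amdahl's law: in-kernel evaluation can only speed up the CPU-side share of a query (evaluating the condition plus the per-row Python overhead), while the time spent reading a wide table from disk stays the same. The sketch below assumes that simple two-part split; the function name query_speedup and the 10x evaluation speedup in the example are hypothetical figures chosen for illustration, not PyTables measurements.

```python
def query_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's-law estimate of the overall query speedup.

    cpu_fraction: share of the original query time spent on CPU-side work
        (condition evaluation, per-row Python overhead) that in-kernel
        evaluation can accelerate. The rest is assumed to be I/O.
    cpu_speedup: how many times faster that CPU-side work becomes.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    io_fraction = 1.0 - cpu_fraction  # untouched by in-kernel evaluation
    return 1.0 / (io_fraction + cpu_fraction / cpu_speedup)


# Hypothetical wide table: 90% of the time goes to reading rows from disk.
print(round(query_speedup(0.10, 10.0), 2))  # 1.1
# Hypothetical narrow table with a complex condition: 80% CPU-bound.
print(round(query_speedup(0.80, 10.0), 2))  # 3.57
```

Even an infinitely fast evaluator cannot beat 1 / io_fraction, which is why a wide table queried with a simple condition shows almost no difference.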
>
> Finally, what indexing does is reduce the number of rows on which the
> conditions have to be evaluated, so depending on the cardinality of the
> query and the associated index, you can get more or less speedup.
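To make the indexing point concrete, here is a toy model in plain Python: sort (value, row number) pairs once, then answer a range condition with two binary searches so that only the matching rows are touched, instead of evaluating the condition on every row. This is a simplified assumption, not how PyTables builds its indexes (those are considerably more elaborate); build_index, indexed_range_query and scan_range_query are hypothetical names.

```python
import bisect
from typing import List, Sequence, Tuple

# Hypothetical index layout: values sorted ascending, plus the row each came from.
Index = Tuple[List[float], List[int]]


def build_index(column: Sequence[float]) -> Index:
    """Sort (value, row number) pairs once, like a much-simplified column index."""
    pairs = sorted((value, row) for row, value in enumerate(column))
    return [value for value, _ in pairs], [row for _, row in pairs]


def indexed_range_query(index: Index, low: float, high: float) -> List[int]:
    """Rows with low <= value < high, located by binary search instead of a scan."""
    values, rows = index
    start = bisect.bisect_left(values, low)
    stop = bisect.bisect_left(values, high)
    # Only rows[start:stop] (the matches) are touched after the two searches.
    return sorted(rows[start:stop])


def scan_range_query(column: Sequence[float], low: float, high: float) -> List[int]:
    """Baseline without an index: evaluate the condition on every row."""
    return [row for row, value in enumerate(column) if low <= value < high]


column = [5.0, 1.0, 7.5, 3.2, 9.9, 3.2, 0.4]
index = build_index(column)
print(indexed_range_query(index, 3.0, 8.0))  # [0, 2, 3, 5]
print(scan_range_query(column, 3.0, 8.0))    # [0, 2, 3, 5]
```

The fewer rows that match (the more selective the query), the larger the gap between the two approaches, which is the cardinality effect mentioned above.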
>
> Francesc
>
> On 11/9/12 5:12 PM, Jim Knoll wrote:
> >
> > Thanks for the reply. I will put some investigation of C++ access on
> > my list of items to look at over the slow holiday season.
> >
> > For the short term we will store a C++-ready index as a separate
> > table object in the same h5 file. It will work... just a bit of a waste
> > of disk space.
> >
> > One follow-up question:
> >
> > Why would the performance of
> >
> > for row in node.where('stringField == "SomeString"'):
> >
> > *not* be noticeably faster than
> >
> > for row in node:
> >     if row.stringField == "SomeString":
> >
> > Specifically when there is no index. I understand and see the speed
> > improvement only when I have an index. I expected to see some benefit
> > from numexpr even with no index; I expected node.where() to be much
> > faster. What I see is identical performance. Is the numexpr benefit only
> > seen for complex math like (floatField ** intField > otherFloatField)?
> > I did not see that to be the case on my first attempt... It seems that I
> > only benefit from an index.
> >
> > *From:*Anthony Scopatz [mailto:sc...@gm...]
> > *Sent:* Friday, November 09, 2012 12:24 AM
> > *To:* Discussion list for PyTables
> > *Subject:* Re: [Pytables-users] pyTable index from c++
> >
> > On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll
> > <jim...@sp... <mailto:jim...@sp...>>
> > wrote:
> >
> > I love the index function and promote the internal use of PyTables at
> > my company. The availability of an indexed method to speed up searches
> > is the main reason why.
> >
> > We are a mixed shop: we use C++ to create the H5 files (just for the raw
> > speed; we need to keep up with streaming data). End users start with
> > Python PyTables to consume the data, often after we have created indexes
> > from Python with table.cols.col1.createIndex().
> >
> > Sometimes the users come up with something we want to do thousands of
> > times and performance is critical, so we fall back to C++. We can use
> > our own index method, but we would like to make double use of the
> > PyTables index.
> >
> > I know the Python table.where() is implemented in C.
> >
> > Hi Jim,
> >
> > This is only kind of true. Querying (i.e. all of the where*() methods)
> > is actually mostly written in Python, in the tables.py and
> > expressions.py files. However, it makes use of numexpr [1].
> >
> >     Is there a way to access that from C or C++? I don't mind if I
> >     need to do some work to get the result; I think in my case the
> >     work may be worth it.
> >
> > *PLAN 1:* One possibility relies on the fact that parts of PyTables
> > are written in Cython. We could maybe try (without making any edits to
> > these files) to convert them to Cython. For Cython files, if you write
> > the appropriate C++ header file and link against the shared library
> > correctly, it is possible to access certain functions from C/C++. BUT,
> > I am not sure how much of a speed boost you would get out of this,
> > since you would still be calling out to the Python interpreter to get
> > these results. You are just calling Python's virtual machine from C++
> > rather than calling it from Python (as normal). The advantage is that
> > you would basically get access to these functions acting on tables
> > from C++.
> >
> > *PLAN 2:* Alternatively, numexpr itself is mostly written in C++
> > already. You should be able to call core numexpr functions directly.
> > However, you would have to feed it data that you read from the tables
> > yourself. These could even be table indexes. On a personal note, if
> > you get code working that does this, I would be interested in seeing
> > your implementation. (I have another project with tables that I want
> > to query from C++.)
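A rough sketch of the loop that Plan 2 implies, written in Python for readability although the real thing would live in C++: read a column yourself in blocks, evaluate the condition once per block, and collect the matching row numbers. read_blocks and blocked_where are hypothetical helpers, the block size is an arbitrary assumption, and the list-comprehension condition merely stands in for a call into numexpr (or your own compiled loop); nothing here is numexpr's actual API.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A block-level condition: takes a block of values, returns one bool per value.
Condition = Callable[[Sequence[str]], List[bool]]


def read_blocks(column: Sequence[str], block_size: int) -> Iterator[Tuple[int, Sequence[str]]]:
    """Hypothetical stand-in for your own reader: yields (first_row, block)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(column), block_size):
        yield start, column[start:start + block_size]


def blocked_where(column: Sequence[str], condition: Condition, block_size: int = 4096) -> List[int]:
    """Return the row numbers where `condition` holds, evaluating one block at a time."""
    matches: List[int] = []
    for start, block in read_blocks(column, block_size):
        mask = condition(block)  # one call per block instead of one per row
        matches.extend(start + i for i, hit in enumerate(mask) if hit)
    return matches


symbols = ["AAPL", "SomeString", "MSFT", "SomeString", "IBM"]
rows = blocked_where(symbols, lambda block: [v == "SomeString" for v in block], block_size=2)
print(rows)  # [1, 3]
```

Feeding in only the candidate rows taken from an index, rather than the whole column, would shrink the number of blocks to evaluate.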
> >
> > Let us know what route you ultimately end up taking or if you have any
> > further questions!
> >
> > Be Well
> >
> > Anthony
> >
> > 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr
> >
> >
> >     ------------------------------------------------------------------------
> >
> >     *Jim Knoll*
> >     *Data Developer*
> >
> >     Spot Trading L.L.C
> >     440 South LaSalle St., Suite 2800
> >     Chicago, IL 60605
> >     Office: 312.362.4550
> >     Direct: 312-362-4798
> >     Fax: 312.362.4551
> >     jim...@sp...
> >     www.spottradingllc.com
> >
> >
> >
> >
> >
------------------------------------------------------------------------------
> >     Everyone hates slow websites. So do we.
> >     Make your web apps faster with AppDynamics
> >     Download AppDynamics Lite for free today:
> >     http://p.sf.net/sfu/appdyn_d2d_nov
> >     _______________________________________________
> >     Pytables-users mailing list
> >     Pyt...@li...
> >     <mailto:Pyt...@li...>
> >     https://lists.sourceforge.net/lists/listinfo/pytables-users
> >
> >
>
>
> --
> Francesc Alted
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sun, 11 Nov 2012 01:39:33 +0100
> From: Juan Manuel Vázquez Tovar <jmv...@gm...>
> Subject: [Pytables-users] Store a reference to a dataset
> To: Discussion list for PyTables
>         <pyt...@li...>
> Message-ID:
>         <CAD...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I have to deal with a very large dataset in PyTables. The file, already
> compressed with blosc5, is a