From: John V. S. <js...@gm...> - 2009-05-13 00:06:33
Nicolae Mihalache wrote:
> Here I am again...
>
> I did a little profiling on luciddb, and it seems when scanning a table
> a lot of time is spent in fennel:TupleAccessor marshal and unmarshal
> (about 20% of the scan time for the first query below (if the data is
> all in memory)).
>
> When I do a "select max(c) from xyz", unmarshal is called twice for each
> row and marshal once. When I do "select c,count(*) from xyz group by c",
> unmarshal is called 5 times and marshal 2 times for each row. So it
> seems some unnecessary marshalling/unmarshalling is taking place which,
> if eliminated, could improve the time by about 13% for the first query
> and 30%-40% for the second query (again, assuming that all data is in

Fennel uses a dataflow architecture in which tuples are batched between "ExecStream" objects. This contrasts with a standard iterator-based architecture, in which a single tuple is processed at a time. There are details here:

http://fennel.sourceforge.net/doxygen/html/structExecStreamDesign.html

The Fennel model has the advantages of better locality of reference (for both data and instruction caches), scheduler versatility (e.g. push vs. pull), easier parallelization, and easier distribution in a message-passing environment. However, it does introduce the overhead you noticed: tuples are marshalled and unmarshalled at the boundaries between ExecStreams as tuple batches move through the graph.

It's possible to do better (using a vectorized approach like MonetDB's).

JVS
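To make the trade-off in the reply concrete, here is a toy Python sketch of the two hand-off styles, for a "select max(c)" over a single integer column. It is not Fennel code: the row format (one little-endian int64 packed with `struct`), the batch size, and every function and class name below are hypothetical. In the row-batch version the producer marshals each row into a byte buffer and the consumer unmarshals each row back out, so the per-row calls are counted. The column-batch version, loosely in the spirit of the vectorized execution the reply mentions, hands over whole column vectors (`array('q')`), so no per-row record encoding is needed. In Python, `max` still iterates, so this illustrates the data layout, not the performance.

```python
import struct
from array import array

# Hypothetical fixed-width row format: one signed 64-bit integer column "c".
ROW = struct.Struct("<q")


class Counters:
    """Counts per-row marshal/unmarshal calls (illustrative only)."""

    def __init__(self):
        self.marshal = 0
        self.unmarshal = 0


def scan_to_row_batches(values, batch_rows, counters):
    """Producer: marshal rows into byte buffers, one buffer per batch."""
    for start in range(0, len(values), batch_rows):
        buf = bytearray()
        for v in values[start:start + batch_rows]:
            buf += ROW.pack(v)  # marshal one row into the batch buffer
            counters.marshal += 1
        yield bytes(buf)


def max_over_row_batches(batches, counters):
    """Consumer: unmarshal every row of each batch to compute MAX(c)."""
    best = None
    for buf in batches:
        for (v,) in ROW.iter_unpack(buf):  # unmarshal one row at a time
            counters.unmarshal += 1
            if best is None or v > best:
                best = v
    return best


def scan_to_column_batches(values, batch_rows):
    """Producer: hand over whole column vectors instead of marshalled rows."""
    for start in range(0, len(values), batch_rows):
        yield array("q", values[start:start + batch_rows])


def max_over_column_batches(batches):
    """Consumer: work on each column vector directly; no per-row decoding."""
    best = None
    for col in batches:
        if len(col):
            m = max(col)
            if best is None or m > best:
                best = m
    return best


if __name__ == "__main__":
    data = [5, -3, 42, 7, 0, 19, 42, 1]
    c = Counters()
    print(max_over_row_batches(scan_to_row_batches(data, 3, c), c))  # 42
    print(c.marshal, c.unmarshal)  # 8 8
    print(max_over_column_batches(scan_to_column_batches(data, 3)))  # 42
```

In this toy, every row crossing the producer/consumer boundary costs one marshal and one unmarshal. That per-row cost is what grows with the number of stream boundaries a query plan has, as in the GROUP BY case above.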