From: sreeaurovindh v. <sre...@gm...> - 2012-03-19 18:40:08
Thanks for your clarification and immense help.

Regards,
Sree aurovindh V

On Mon, Mar 19, 2012 at 11:58 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Mon, Mar 19, 2012 at 12:08 PM, sreeaurovindh viswanathan
> <sre...@gm...> wrote:
>
> [snip]
>
>> 2) Can you please point me to an example of doing block HDF5 file
>> writes using PyTables? (Sorry for this naive question.)
>
> The Table.append() method
> (http://pytables.github.com/usersguide/libref.html#tables.Table.append)
> allows you to write multiple rows at the same time. Arrays have similar
> methods (http://pytables.github.com/usersguide/libref.html#earray-methods)
> if they are extensible. Please note that these methods accept any
> sequence which can be converted to a numpy record array!
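>
> For instance, here is a minimal sketch of a block append (the file,
> table, and column names are purely illustrative, not from your schema):
>
>     import numpy as np
>     import tables
>
>     # Build one structured array holding many rows at once.
>     block = np.empty(10000, dtype=[("tr_id", "i8"), ("q_t", "f8")])
>     block["tr_id"] = np.arange(10000)
>     block["q_t"] = np.random.random(10000)
>
>     h5 = tables.openFile("data.h5", mode="w")
>     # The table description is taken straight from the block's dtype.
>     tbl = h5.createTable("/", "train", block.dtype)
>     tbl.append(block)  # a single call writes all 10000 rows
>     h5.flush()
>     h5.close()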
>
> Be Well
> Anthony
>
>> Thanks
>> Sree aurovindh V
>>
>> On Mon, Mar 19, 2012 at 10:16 PM, Anthony Scopatz <sc...@gm...> wrote:
>>
>>> What Francesc said ;)
>>>
>>> On Mon, Mar 19, 2012 at 11:43 AM, Francesc Alted <fa...@gm...> wrote:
>>>
>>>> My advice regarding parallelization is: do not worry about it *at all*
>>>> unless you have already spent a long time profiling your problem and
>>>> you are sure that parallelizing would help. 99% of the time it is much
>>>> more productive to focus on improving serial speed.
>>>>
>>>> Please try to follow Anthony's suggestion: split your queries into
>>>> blocks and pass those blocks to PyTables. That would represent a huge
>>>> win. For example, use (in PostgreSQL syntax):
>>>>
>>>>     SELECT * FROM your_table LIMIT 10000 OFFSET 0
>>>>
>>>> for the first block, and send the results to Table.append(). Then go
>>>> for the second block with:
>>>>
>>>>     SELECT * FROM your_table LIMIT 10000 OFFSET 10000
>>>>
>>>> and pass that to Table.append() as well. And so on and so forth until
>>>> you exhaust all the data in your tables.
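>>>>
>>>> Put together, the whole loop might look something like the sketch
>>>> below. It assumes psycopg2 on the PostgreSQL side and that a table
>>>> /train with a matching description already exists in the HDF5 file;
>>>> also, instead of re-running the query with a growing OFFSET, it uses
>>>> a named (server-side) cursor to fetch the same blocks from a single
>>>> SELECT:
>>>>
>>>>     import psycopg2
>>>>     import tables
>>>>
>>>>     BLOCK = 10000  # rows fetched and appended per round trip
>>>>
>>>>     conn = psycopg2.connect("dbname=mydb")
>>>>     # A named cursor keeps the result set on the server instead of
>>>>     # materializing all rows in client memory at once.
>>>>     cur = conn.cursor(name="export")
>>>>     cur.execute("SELECT * FROM x.train ORDER BY tr_id")
>>>>
>>>>     h5 = tables.openFile("data.h5", mode="a")
>>>>     tbl = h5.root.train
>>>>
>>>>     while True:
>>>>         rows = cur.fetchmany(BLOCK)
>>>>         if not rows:
>>>>             break
>>>>         # fetchmany() returns a list of tuples, which Table.append()
>>>>         # accepts because it is convertible to a record array.
>>>>         tbl.append(rows)
>>>>
>>>>     h5.flush()
>>>>     h5.close()
>>>>     conn.close()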
>>>>
>>>> Hope this helps,
>>>>
>>>> Francesc
>>>>
>>>> On Mar 19, 2012, at 11:36 AM, sreeaurovindh viswanathan wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Thanks for your reply. In that case, what will my querying
>>>> > efficiency be? Will I be able to query in parallel, i.e. run
>>>> > multiple queries against a single file? Also, if I do it in 6
>>>> > chunks, will I be able to parallelize it?
>>>> >
>>>> > Thanks
>>>> > Sree aurovindh Viswanathan
>>>> >
>>>> > On Mon, Mar 19, 2012 at 10:01 PM, Anthony Scopatz <sc...@gm...> wrote:
>>>> > Is there any way that you can query and write in much larger chunks
>>>> > than 6? I don't know much about PostgreSQL specifically, but in
>>>> > general HDF5 does much better if you can take larger chunks. Perhaps
>>>> > you could at least do the PostgreSQL side in parallel.
>>>> >
>>>> > Be Well
>>>> > Anthony
>>>> >
>>>> > On Mon, Mar 19, 2012 at 11:23 AM, sreeaurovindh viswanathan
>>>> > <sre...@gm...> wrote:
>>>> > The problem is with the writing speed of my computer and with the
>>>> > PostgreSQL query performance. I will explain the scenario in detail.
>>>> >
>>>> > I have about 80 GB of data (with appropriate database indexes in
>>>> > place). I am trying to read it from a PostgreSQL database and write
>>>> > it into HDF5 using PyTables. I have 1 table and 5 variable arrays
>>>> > in each HDF5 file. The HDF5 implementation is not multithreaded or
>>>> > enabled for symmetric multiprocessing.
>>>> >
>>>> > As far as the PostgreSQL side is concerned, the overall record
>>>> > count is 140 million, and I have 5 tables related to it by
>>>> > primary/foreign keys. I am not using joins, as they do not scale.
>>>> >
>>>> > So for a single record I do 6 lookups without joins and write the
>>>> > results into HDF5; each lookup produces 6 inserts, into the table
>>>> > and its corresponding arrays.
>>>> >
>>>> > The queries are really simple:
>>>> >
>>>> >     select * from x.train where tr_id=1 (primary key & indexed)
>>>> >     select q_t from x.qt where q_id=2 (non-primary key but indexed)
>>>> >
>>>> > (and similarly four more queries)
>>>> >
>>>> > Each computer writes two HDF5 files, so the total count comes to
>>>> > around 20 files.
>>>> >
>>>> > Some calculations and statistics:
>>>> >
>>>> >     Total number of records: 143,700,000
>>>> >     Records per file: 143,700,000 / 20 = 7,185,000
>>>> >     Total rows per file (5 lookups each): 7,185,000 * 5 = 35,925,000
>>>> >
>>>> > Current PostgreSQL database config:
>>>> >
>>>> > My current machine has 8 GB RAM and a 2nd-generation i7 processor.
>>>> > I changed the following in the PostgreSQL configuration file:
>>>> > shared_buffers: 2 GB, effective_cache_size: 4 GB.
>>>> >
>>>> > Note on current performance:
>>>> >
>>>> > I have run it for about ten hours, and the performance is as
>>>> > follows: the total number of rows written for a single file is only
>>>> > about 2,500,000 * 5 = 12,500,000. It has written 2 such files;
>>>> > considering the size, it would take me at least 20 hours for 2
>>>> > files. I have about 10 files, and hence the total would be about
>>>> > 200 hours = 9 days. I have to start my experiments as early as
>>>> > possible, and 10 days is too much. Can you please help me enhance
>>>> > the performance?
>>>> >
>>>> > Questions:
>>>> > 1. Should I use symmetric multiprocessing on my computer? In that
>>>> > case, what is suggested or preferable?
>>>> > 2. Should I use multithreading? In that case, any links or pointers
>>>> > would be of great help.
>>>> >
>>>> > Thanks
>>>> > Sree aurovindh V
>>>>
>>>> --
>>>> Francesc Alted
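Applied to the workload described above, the same batching idea would
replace the six one-row-per-id lookups with keyed block fetches. A rough
sketch, assuming psycopg2, dense tr_id values, and illustrative table
names (none of this is from the actual schema):

    import psycopg2
    import tables

    BATCH = 10000        # ids fetched per query instead of one at a time
    TOTAL = 143700000    # total number of tr_id values (from the thread)

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    h5 = tables.openFile("data.h5", mode="a")

    for start in range(1, TOTAL + 1, BATCH):
        chunk = list(range(start, min(start + BATCH, TOTAL + 1)))
        # psycopg2 adapts a Python list to a PostgreSQL array, so one
        # round trip fetches a whole block of keyed rows ...
        cur.execute("SELECT * FROM x.train WHERE tr_id = ANY(%s)", (chunk,))
        # ... and one append writes them all to the HDF5 table.
        h5.root.train.append(cur.fetchall())
        # (the same pattern applies to x.qt and the other four lookups)

    h5.flush()
    h5.close()
    conn.close()

Each round trip then amortizes query overhead over thousands of rows,
which is where both the PostgreSQL side and the HDF5 side gain.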