dispy vs. pp

  • Paco de Kumite

    First of all, HUGE thanks to Giridhar Pemmasani for this wonderful tool. I'm just getting started with parallel computation in Python. I've tried both pp and dispy so far, with virtually identical logic, on identical hardware, and here's what I've found.

    pp is consistently faster (sometimes by almost a factor of 2, but dispy still smokes the sequential version)
    dispy manages memory much better.

    My test case creates a lot of jobs (thousands), and dispy memory use is very low. On the other hand, pp is faster but craps out if the job is big enough to fill up my ram.

    I'm wondering if anyone knows why I might be seeing this. I don't claim to understand how these libraries work at any deep level, but something fundamental that seems apparent in their respective syntax is that dispy creates the server with the function, and each job just submits the arguments to that function. pp, on the other hand, does not create the server with anything; you submit both the function and the arguments with each job. I guess if pp creates a new copy of the function with each submission, then creating jobs faster than they complete would result in huge memory use. Does this make sense?
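    That intuition can be sanity-checked with a toy model. To be clear, this is not how either library is actually implemented; it only illustrates how serialized payloads would grow if a copy of the function traveled with every pending job instead of being sent once:

```python
import pickle

# Stand-in for the source of a job function; whatever pp actually ships
# per job would be at least this large.
func_src = "def work(x):\n    return x * x\n"
args = list(range(10000))

# dispy-style: the function is sent once when the cluster is created,
# so each pending job only carries its arguments.
dispy_payload = len(pickle.dumps(func_src)) + sum(
    len(pickle.dumps((a,))) for a in args)

# pp-style (under the guess above): every submission carries a copy of
# the function along with its arguments.
pp_payload = sum(len(pickle.dumps((func_src, (a,)))) for a in args)

print(pp_payload > dispy_payload)  # → True: per-job copies dominate as jobs pile up
```

    If thousands of jobs sit queued faster than workers drain them, the pp-style queue above grows by a function-sized chunk per job, which is consistent with the memory blow-up described.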

    I'm also wondering if there is any danger in running both dispy and pp within the same app (so they'd both be fighting for the same resources). Or is this just a stupid idea?

    • Hi

      Can you also send me the output of the attached program? It checks whether the number of processors is detected correctly by dispynode.


  • I am not sure why dispy should be slow - it schedules jobs as soon as a processor is available. I can only guess that it could be because dispy doesn't detect number of processors on the nodes correctly. Can you run 'dispynode.py -d' on each of the nodes and check that it finds processors (number of CPUs * number of cores on each CPU) correctly. If possible, please attach (skeleton of) your program.
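    As a quick cross-check of what a node should report, Python itself can tell you the logical CPU count (dispynode does its own detection, so this is only a sanity check against the "serving N cpus" line):

```python
import multiprocessing

# Logical CPUs visible to the OS (cores x hardware threads per core);
# dispynode's "serving N cpus" line should normally match this number.
print(multiprocessing.cpu_count())
```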


  • Paco de Kumite

    Thanks so much for your reply. Do you think that dispy should be up to par with pp on speed? Here is the output of "$ python2 dispynode.py -d":

    2012-11-05 22:02:00,285 - asyncoro - poller: epoll
    2012-11-05 22:02:00,312 - dispynode - auth_code for 51573e2dae7c4a6fa634cb4a35e1fed6f11c792e
    2012-11-05 22:02:00,313 - dispynode - serving 4 cpus at
    2012-11-05 22:02:00,313 - dispynode - tcp server at

    I find that strange, because I ran it on my netbook, which has a dual-core CPU (Atom N570). I usually run this code on my desktop, which has an i7-2600. When I run dispynode.py there, the output suggests it is serving 8 processors, which makes sense since it's a quad-core with 8 threads.

    Here is the code. What I'm trying to do is to read a csv file and write the records to a mysql database using the django db api. Some of the variables are initialized by a separate pyqt app, so just assume that the variables have valid values if their initial values look strange (e.g. are None or "")

    import sys
    from django.conf import settings
    from faodata.models import *
    import dispy

    fname = "" #file name of csv file to convert
    field_names = ""
    delimiter = "\t"
    sel_model = Entry #Entry is a class in django's models.py

    def writeline2MySQL(block, field_names, delimiter, sel_model):
        """Writes the lines in block to mysql table defined by django model.
        - "block" contains csv records to be written
        - "field_names" is a list of mysql table fields
        - "delimiter" is a string containing the delimiter used in the csv
        - "sel_model" refers to a django model in models.py
        """
        records = []
        for line in block:
            record = line.split(delimiter)
            primary_val = None #value of the primary key in the record
            vals = {} #values of non-primary-key fields in the record
            for field, val in zip(field_names, record):
                field = str(field).strip()
                val = val.strip()
                if field[-1] == '+': #trailing "+" in field name denotes primary key
                    primary_val = val
                elif field != '-': #field name "-" indicates a field to skip
                    vals[field] = val
            rec = None
            try: #DoesNotExist is ALWAYS thrown (on purpose)
                rec = sel_model.objects.get(pk=primary_val)
            except sel_model.DoesNotExist:
                rec = sel_model(pk=primary_val) #create the mysql record
                for k, v in vals.iteritems(): #set the values of the fields
                    try:
                        v = float(v)
                    except ValueError:
                        pass
                    setattr(rec, k, v)
                records.append(rec)
        sel_model.objects.bulk_create(records) #django's way to write records to db
        return len(block) #gotta return something, right?

    def convert2MySQL():
        cluster = dispy.JobCluster(writeline2MySQL)
        csvfile = open(fname, 'r')
        with csvfile:
            lines = csvfile.readlines()
            step = 500
            for n in xrange(1, len(lines), step):
                block = lines[n:n+step]
                cluster.submit(block, field_names, delimiter, sel_model)

    Instead of breaking the file up into blocks of 500 lines (see "step") and submitting each block for processing, would it make sense to break the file up into a number of blocks equal to the number of "workers", submit those blocks for processing, then further break them into more manageable-sized blocks inside the writeline2MySQL function? I'm gonna try this anyway at some point tomorrow.
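    For what it's worth, the one-block-per-worker split is easy to sketch (split_blocks is a hypothetical helper, not part of dispy, and the worker count is just an assumed value):

```python
def split_blocks(lines, n_workers):
    """Split lines into n_workers contiguous blocks of near-equal size."""
    size, extra = divmod(len(lines), n_workers)
    blocks, start = [], 0
    for i in range(n_workers):
        # The first `extra` blocks each take one extra line.
        end = start + size + (1 if i < extra else 0)
        if start < end:  # skip empty blocks when n_workers > len(lines)
            blocks.append(lines[start:end])
        start = end
    return blocks

lines = ["row%d" % i for i in range(10)]
print([len(b) for b in split_blocks(lines, 3)])  # → [4, 3, 3]
```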

    Thanks a bunch.

  • Please email me (look up my profile) so I can send you files to test.

  • From what you described, it looks to me like you have a single database server that stores computation results. In that case, the bottleneck could be the I/O on that server. Depending on how the jobs are scheduled, you can see widely different performance: for example, if the scheduled jobs all finish their computation at the same time and write to the database simultaneously, you get the worst performance, although for purely/mostly computational tasks such scheduling gives the best performance. You may want to consider/experiment with getting the computation results back to the client program and having the client program store the results. You may also experiment with sending jobs with different lengths of data (e.g., a random number of lines between 100 and 500 each time) instead of a fixed length, so that not all jobs finish at the same time and compete for I/O.
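    The randomized-length idea can be sketched like this (random_blocks is a hypothetical helper, using the 100-500 range suggested above):

```python
import random

def random_blocks(lines, lo=100, hi=500):
    """Yield contiguous blocks of random length so jobs finish at
    staggered times instead of all hitting the database at once."""
    i = 0
    while i < len(lines):
        step = random.randint(lo, hi)  # inclusive on both ends
        yield lines[i:i + step]
        i += step

lines = ["row%d" % i for i in range(5000)]
blocks = list(random_blocks(lines))
# Every line is covered exactly once, in order.
print(sum(len(b) for b in blocks) == len(lines))  # → True
```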

    Regarding your question on splitting the file into chunks, again it depends: if all jobs try to read from the same disk, that could give the worst read performance.

    Last edit: Giridhar Pemmasani 2012-11-06