Sébastien Boisgérault wrote:
Eli Golovinsky wrote:
  
Hi,

I just tried to run BeautifulSoup (3.1.0.1) with Jython (2.5.1) and I
was amazed to see how much slower it was than CPython (2.6). Parsing a
page (http://www.fixprotocol.org/specifications/fields/5000-5999) with
CPython took just under a second (0.844 seconds to be exact). With
Jython it took 564 seconds - almost 700 times as long.

Can anyone confirm this result? It doesn't seem reasonable for
Jython to run 700 times slower than CPython.
    
CPython is about 380x faster on my box.

ouch ...

SB

  
Attached below are the execution profiles with CPython and Jython.

AFAICT the BeautifulSoup code itself performs OK under Jython (a few seconds at most spent in the handle_* methods), but the HTMLParser code it calls (goahead and the parse_* methods) is painfully slow.



CPYTHON

Tue Nov  3 14:24:46 2009    results

         903568 function calls (903519 primitive calls) in 6.512 CPU seconds

   Ordered by: cumulative time
   List reduced from 137 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.512    6.512 profile:0(BeautifulSoup(data))
        1    0.000    0.000    6.512    6.512 <string>:1(<module>)
        1    0.000    0.000    6.512    6.512 BeautifulSoup.py:1164(__init__)
        1    0.000    0.000    6.512    6.512 BeautifulSoup.py:1236(_feed)
        1    0.000    0.000    6.512    6.512 BeautifulSoup.py:1495(__init__)
        1    0.000    0.000    6.484    6.484 HTMLParser.py:101(feed)
        1    0.600    0.600    6.484    6.484 HTMLParser.py:132(goahead)
    11083    0.556    0.000    3.784    0.000 HTMLParser.py:224(parse_starttag)
    11083    0.060    0.000    2.552    0.000 BeautifulSoup.py:1013(handle_starttag)
    11083    0.264    0.000    2.492    0.000 BeautifulSoup.py:1397(unknown_starttag)
     8351    0.240    0.000    1.404    0.000 HTMLParser.py:305(parse_endtag)
     8351    0.060    0.000    1.044    0.000 BeautifulSoup.py:1019(handle_endtag)
     8351    0.120    0.000    0.984    0.000 BeautifulSoup.py:1427(unknown_endtag)
     8349    0.484    0.000    0.912    0.000 BeautifulSoup.py:1351(_smartPop)
    11084    0.168    0.000    0.676    0.000 BeautifulSoup.py:500(__init__)
    12420    0.276    0.000    0.604    0.000 BeautifulSoup.py:1329(_popToTag)
    19441    0.216    0.000    0.528    0.000 BeautifulSoup.py:1306(endData)
    33250    0.244    0.000    0.368    0.000 BeautifulSoup.py:1269(isSelfClosingTag)
    66134    0.344    0.000    0.344    0.000 :0(match)
   103566    0.280    0.000    0.280    0.000 BeautifulSoup.py:554(__nonzero__)

JYTHON

Tue Nov  3 14:31:34 2009    results

         383982 function calls (383944 primitive calls) in 390.007 CPU seconds

   Ordered by: cumulative time
   List reduced from 97 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003  390.007  390.007 profile:0(BeautifulSoup(data))
        1    0.000    0.000  390.004  390.004 <string>:0(<module>)
        1    0.000    0.000  390.004  390.004 BeautifulSoup.py:1495(__init__)
        1    0.000    0.000  390.004  390.004 BeautifulSoup.py:1164(__init__)
        1    0.372    0.372  390.003  390.003 BeautifulSoup.py:1236(_feed)
        1    0.000    0.000  389.553  389.553 HTMLParser.py:101(feed)
        1  159.714  159.714  389.552  389.552 HTMLParser.py:132(goahead)
    11083  112.086    0.010  159.921    0.014 HTMLParser.py:224(parse_starttag)
     8351   68.361    0.008   69.394    0.008 HTMLParser.py:305(parse_endtag)
    11083   45.443    0.004   45.443    0.004 HTMLParser.py:275(check_for_whole_start_tag)
    11083    0.084    0.000    2.363    0.000 BeautifulSoup.py:1013(handle_starttag)
    11083    0.536    0.000    2.278    0.000 BeautifulSoup.py:1397(unknown_starttag)
     8351    0.051    0.000    1.009    0.000 BeautifulSoup.py:1019(handle_endtag)
     8351    0.077    0.000    0.958    0.000 BeautifulSoup.py:1427(unknown_endtag)
    19441    0.438    0.000    0.761    0.000 BeautifulSoup.py:1306(endData)
    11084    0.374    0.000    0.726    0.000 BeautifulSoup.py:500(__init__)
     8349    0.498    0.000    0.630    0.000 BeautifulSoup.py:1351(_smartPop)
    38924    0.435    0.000    0.435    0.000 markupbase.py:49(updatepos)
    12420    0.251    0.000    0.334    0.000 BeautifulSoup.py:1329(_popToTag)
    17252    0.220    0.000    0.245    0.000 BeautifulSoup.py:118(setup)
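
For anyone who wants to produce a comparable listing: the header line "profile:0(BeautifulSoup(data))" and the "results" file name suggest the stdlib profile module plus pstats, but the exact commands are not given in this message, so the following is only a sketch under that assumption. The function name profile_parse and its parameters are made up for illustration, and it profiles the stdlib HTMLParser directly so that it runs without BeautifulSoup installed.

```python
import profile
import pstats

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 / Jython 2.5


def profile_parse(data, stats_file="results", limit=20):
    """Profile HTMLParser.feed(data) and print the top entries.

    Hypothetical helper: stats_file and limit mirror the "results" file
    and the <20> restriction seen in the listings, nothing more.
    """
    parser = HTMLParser()
    profiler = profile.Profile()
    # runcall profiles a single call without relying on __main__ globals.
    profiler.runcall(parser.feed, data)
    profiler.dump_stats(stats_file)

    stats = pstats.Stats(stats_file)
    # Same ordering as the listings: cumulative time, top `limit` rows.
    stats.sort_stats("cumulative").print_stats(limit)
    return stats
```

To profile the BeautifulSoup constructor instead, as in the listings, the runcall line would presumably become profiler.runcall(BeautifulSoup, data), with BeautifulSoup imported as in the original script.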





  
Perhaps something is wrong with my setup.

Here's the code I used:

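import time
from BeautifulSoup import BeautifulSoup

# Read the saved page before starting the clock, so only parsing is timed.
data = open("fix-5000-5999.html").read()

start = time.time()
soup = BeautifulSoup(data)
print time.time() - start  # elapsed wall-clock seconds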
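
Since the profiles point at HTMLParser rather than BeautifulSoup, one way to check that independently might be to time the stdlib parser on its own with trivial handlers. This is only a sketch: CountingParser and time_htmlparser are hypothetical names, not part of either library, and the try/except import is there so the same file runs on Python 2 (including Jython 2.5) and Python 3.

```python
import time

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 / Jython 2.5


class CountingParser(HTMLParser):
    """Counts tags only, so nearly all time is spent in HTMLParser itself."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.start_tags = 0
        self.end_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_endtag(self, tag):
        self.end_tags += 1


def time_htmlparser(data):
    """Return (elapsed_seconds, start_tag_count, end_tag_count) for one parse."""
    parser = CountingParser()
    start = time.time()
    parser.feed(data)
    parser.close()
    return time.time() - start, parser.start_tags, parser.end_tags
```

Calling time_htmlparser(open("fix-5000-5999.html").read()) under both interpreters should show whether the gap remains once BeautifulSoup is out of the picture.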

---
gooli

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Jython-users mailing list
Jython-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jython-users

  
    

