From: Anton G. <gl...@mi...> - 2000-12-10 04:55:07
Hi,

I'd like to solicit your opinions on how useful JPython/Jython is for analyzing large data sets. I am working on a project for which a data-analysis algorithm was written in Java, and I wrote a JPython 1.5.2 script to test the algorithm. When analyzing large data sets (flat text files of more than 20 or 30 MB) we originally faced two problems: the script was slow, and we ran out of memory even when the JVM was allocated the full 256 MB of RAM. After rewriting the JPython code the out-of-memory problem went away, but presumably the limit was only pushed a little further out.

In short, this is what the script does (a rough sketch of the main loop follows at the end of this message): It reads the flat file and builds Java objects holding data records (really just arrays of doubles). The records are built and handed to the analyzing algorithm one at a time. Java classes analyze the data and create objects describing the results. The script then examines the results and writes out some diagnostics. To do this, the data set is read twice more (reading it once and holding all the data in memory had led to the out-of-memory problem).

So my questions are: Is JPython a useful tool for this kind of testing? Do we need to be concerned about memory leaks when handling large data sets? We also have a Python script that tests the C++ version of the algorithm, and it runs considerably faster on the same machine (in roughly half the time). Would you say that this is solely due to the difference between Java and C++, or are there additional factors influencing the speed of execution? Regarding the out-of-memory problem: Is this a general Python problem, or is it more pronounced in JPython?

I realize that these are difficult questions to answer because of their generality, but we would like to make an educated decision for future projects: whether to stay with JPython/Jython (and Python) or to look for alternatives.

Thank you for any responses.

Anton
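P.S. For reference, the main loop of the test script looks roughly like the sketch below. The names Analyzer and DataRecord are placeholders for our actual Java classes, and the per-line parsing is simplified.

    # Rough sketch of the test script's streaming loop (JPython 1.5.2 syntax).
    # Analyzer and DataRecord are placeholder names, not our real Java API.
    import string
    from analysis import Analyzer, DataRecord   # hypothetical Java package

    def run(filename):
        analyzer = Analyzer()
        f = open(filename)
        line = f.readline()
        while line:
            # one record per line: whitespace-separated doubles
            values = map(float, string.split(line))
            record = DataRecord(values)      # Java object wrapping the doubles
            analyzer.process(record)         # record is handed over, not kept
            line = f.readline()
        f.close()
        return analyzer.getResults()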