From: George H. <ghe...@cf...> - 2000-12-10 17:27:52
Anton Gluck wrote:
>
> I'd like to solicit your response on how useful JPython/Jython is when
> analyzing large data sets.

I don't feel particularly qualified to comment on the usefulness of
Jython for your particular problem; however, the usual concerns about
using a scripting language apply. Generally, both execution time and
memory requirements are somewhat higher for scripting languages.

> I am working on a project for which a data analyzing algorithm was written
> in Java, and wrote a JPython 1.5.2 script to test the algorithm. When
> analyzing large data sets (flat text files) of more than 20, 30 MB we
> originally faced two problems: This was slow, and we ran out of memory
> even if the JVM is allocated the full 256 MB RAM.

It is possible to allocate more than the available RAM to the JVM. This
will probably result in higher paging rates while your program runs, but
the degree of impact depends on the locality of reference within your
program. Some of the slowness you are currently experiencing is probably
due to paging.

> In short, this is what the script does: It reads the flat file and builds
> Java objects holding data records (really just arrays of doubles).

Reducing the number of objects your program works with is always a good
idea, and using native types (e.g. doubles) is useful from a performance
perspective. It sounds as if you are already taking this advice, but it
never hurts to re-examine your design. Also, there may be an opportunity
to reduce the amount of data worked on at one time. That is, it may be
possible to work on the data in chunks instead of all at once.

> We also have a Python script that tests the C++ version of the algorithm,
> and it runs considerably faster on the same machine (by about half). Would
> you say that this is solely because of the difference between Java and
> C++, or are there additional factors influencing the speed of execution?

My comment above about paging may be applicable.
I won't weigh in on the Java/C++ debate.

Good luck,
George
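
To illustrate the suggestion above about working on the data in chunks, here is a minimal sketch. It is not the script discussed in this thread. It assumes a flat file with one record per line, whitespace-separated numeric fields, and the same number of fields in every record. The names `read_chunks` and `column_means` are hypothetical, and the per-column mean merely stands in for whatever the real algorithm computes. It is written for a current Python and uses generators, which JPython 1.5.2 does not have.

```python
def read_chunks(lines, chunk_size=1000):
    """Yield lists of records (each a list of floats), at most chunk_size long.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    chunk = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk.append([float(field) for field in line.split()])
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list so the old chunk can be collected
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def column_means(lines, chunk_size=1000):
    """Compute per-column means while holding only one chunk in memory."""
    totals = None
    count = 0
    for chunk in read_chunks(lines, chunk_size):
        for record in chunk:
            if totals is None:
                totals = [0.0] * len(record)  # assumes a fixed field count
            for i, value in enumerate(record):
                totals[i] += value
            count += 1
    if count == 0:
        return []
    return [total / count for total in totals]
```

Passing an open file object, as in `column_means(open("data.txt"))` with a hypothetical path, reads the file line by line, so peak memory is bounded by `chunk_size` records rather than by the size of the file. Whether this helps in practice depends on whether the algorithm can be restated in terms of partial results.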