I tried to create a spreadsheet from a file containing 108 fields and 23,000 rows. After 30 minutes run time, the tool had only processed 1200 rows, so I killed the job. Is this what one would expect from this tool based on the number of fields or could something else be wrong?
Performance will vary from machine to machine. Have you updated the poi.jar prior to running this? If so, that would definitely impact the first run, since the OS "unpacks" the jar file. Can you let it run to completion and let me know the stats (run time, CPU, etc.)? Maybe I can find some tweaks to boost performance.
I re-executed the command, this time only processing 1000 records. Job started at 14:24 and ended at 14:39, or 15 minutes (571 seconds of processor time). Resulting file looks great, but if I need to process 30,000 records I am going to have a problem. I downloaded the jar file when I first installed your code. It is poi-2.5.1-final-20040804.jar, and I placed it in /QIBM/UserData/Java400/ext/ as recommended.
To my knowledge this is the only process on our system that uses Java. We do use SQL and have never had a performance issue.
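For a rough sense of scale, the 15-minute run above can be extrapolated to larger files. The sketch below assumes run time grows linearly with the record count, which is only an assumption at this point in the thread; the function name `extrapolate_minutes` is made up for illustration.

```shell
#!/bin/sh
# extrapolate_minutes TARGET_RECORDS MEASURED_RECORDS MEASURED_MINUTES
# Scales a measured run time to a larger record count, assuming
# (hypothetically) that run time is proportional to the number of records.
extrapolate_minutes() {
    echo $(( $1 * $3 / $2 ))
}

# Measured in the post above: 1000 records in 15 minutes.
extrapolate_minutes 30000 1000 15   # the 30,000-record case
extrapolate_minutes 23000 1000 15   # the original 23,000-row file
```

Under that assumption, 30,000 records would need about 450 minutes (7.5 hours) and the original 23,000-row file about 345 minutes.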
fdeutsch,
If your data is not critically confidential, send me the file in a SAVF.
I'll then test your problem on my model 400 (a little old, but working). Can't do it on my customer's box.
If you agree, send the file to:
Guldbrand(remove this)@Think400.dk
If you send me the file, don't worry, it will be deleted after the test.
Data file sent.
Bad news. I have tested the performance issue on my box (model 400/V5R1) with F's file (1000 records with 108 fields).
I have an RPG program (no SQL) that writes to and updates XLS files, and I changed it to run with F's file.
1) XLCRT with the file. Took 45 min.
2) MYPGM with the file. Took 24 min. to create the XLS.
3) Created an empty XLS and called MYPGM with the file. Took 22 min. to update the XLS.
4) Took the XLS from step 2 and ran MYPGM with the file. Took 16 min. to update.
It seems that POI HSSF (and Java/JVM) is too slow on an iSeries and should only be used for small files. Bummer :(
Leif
New performance test, after compiling the poi-2.5.1-final-20040804.jar.
Still using F's file (1000 records with 108 fields) and still comparing XLCRT with my own RPG program.
1) XLCRT with the file. Took 41 min.
2) MYPGM with the file. Took 21 min. to create the XLS.
3) Created an empty XLS and called MYPGM with the file. Took 21 min. to update the XLS.
4) Took the XLS from step 2 and ran MYPGM with the file. Took 21 (???) min. to update.
In both the first test and this one, I was the only person on the box, and no other jobs were running except the normal ones.
I know this is comparing apples (iSeries) and oranges (programs), but I have an application with 4 RPG programs.
The XLS has about 110 rows with 224 cells in each row.
Prog-A uploads data from the XLS to a workfile on QSYS.
Prog-B updates this workfile with data from QSYS and adds new customers.
Prog-C creates an XML file from this workfile and writes it to the IFS, ready to be sent outside.
Prog-D then updates/adds to the spreadsheet from the workfile.
This takes 4 minutes at most, and the box is heavily loaded.
This is all in RPG... could it be an SQL problem, assuming 'fdeutsch' is up to date with his POI jars?
I am not doing anything fancy. Just executing the XLCRT command to copy from an AS/400 physical file to the IFS. I just recently downloaded the POI jar (poi-2.5.1-final-20040804.jar). I have been playing around with this tool, but mostly with small files. This is the first "large" file. We are at V5R2 of the OS; I don't know if that makes a difference. We run other processes using SQL and have never had a performance problem. This is the first time, to my knowledge, that we are running software that uses Java.
Hmmm... funny editor we have here at SF :-)
Sorry.
I have the same performance problems with V5R3 on a model 520. Processing 11,000 records with 20 fields took about 45 min.
Unfortunately, if the issue is with POI running on the iSeries, there's not a lot I can do performance-wise. I can (and will), however, see if I can optimize the code further to help with performance.
Have you tried optimizing the *.jar files? If not try this command:
SBMJOB CMD(QSH CMD('for jar in $(find /jakarta-poi-2.5.1/ -name ''*.jar''); do system "CRTJVAPGM CLSF(''"$jar"'') OPTIMIZE(40)";done'))
(Thanks to David Morris, who provided this command as part of his series on installing Tomcat: http://www.itjungle.com/mpo/mpo021402-story02.html)
I have no idea how much this will help, if at all, as I am just getting started with POI.
Thanks Mark,
It's worth a try. When I have the time (or others on the list ??) - I'll look into this and David's article in IT-Jungle.
Leif
I have everything running now and have had a chance to benchmark. I have run the optimization I described previously and am running on a lightly loaded model 810 with 750cpw and 2GB memory.
Using the XLCRT command, a 4550 record table with 10 columns takes about 90 seconds. A 10,161 record table with 10 columns takes just over 3 minutes. 10,161 records with 80 columns takes 54 minutes. From this totally unscientific test, the performance of adding rows seems to be fairly linear. But there is an inordinately large penalty for adding columns.
I'm not sure if this helps ...
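To put the three XLCRT timings above on a common footing, they can be converted to average milliseconds per cell. This is only a sketch: "just over 3 minutes" is approximated as 180 seconds, and the helper name `ms_per_cell` is hypothetical.

```shell
#!/bin/sh
# ms_per_cell RECORDS COLUMNS SECONDS
# Prints the average time per cell in milliseconds, to two decimals.
ms_per_cell() {
    LC_ALL=C awk -v r="$1" -v c="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", s * 1000 / (r * c) }'
}

ms_per_cell 4550 10 90       # 4550 records, 10 columns, about 90 s
ms_per_cell 10161 10 180     # 10,161 records, 10 columns, ~3 min (approximated)
ms_per_cell 10161 80 3240    # 10,161 records, 80 columns, 54 min
```

With these inputs the per-cell cost comes out at roughly 1.98 ms and 1.77 ms for the two 10-column runs, but about 3.99 ms for the 80-column run. Going from 10 to 80 columns multiplies the cell count by 8 and the run time by about 18, which is consistent with the observation that columns are penalized more heavily than rows.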
Couldn't get your
SBMJOB CMD(QSH CMD('for jar in $(find /jakarta-poi-2.5.1/ -name ''*.jar''); do system "CRTJVAPGM CLSF(''"$jar"'') OPTIMIZE(40)";done'))
to work.
Error: find: 001-0023 Error found opening file S. S)
Will try tomorrow with CRTJVAPGM and OPTIMIZE(40). I guess it's the $ that confuses my box here in Europe (as usual :-)
I forgot to mention you will need to modify the "/jakarta-poi-2.5.1/" portion to point to your POI directory. And it may run for a long time depending on your box.
OK, got the .jar compiled (CRTJVAPGM). Only mistake: I had 7 other .jar files in the folder. It took almost 12 hours, but tomorrow I'll start testing the same routines as mentioned previously in the thread.
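Since the one-liner compiles every .jar it finds, a dry run that only prints the commands can show in advance which files would be processed. The following is a minimal sketch in plain POSIX shell, not the command from the thread: the function name `list_jvapgm_cmds` is hypothetical, and whether it behaves identically under Qshell on a given release is an assumption.

```shell
#!/bin/sh
# list_jvapgm_cmds DIR
# Dry run: prints one CRTJVAPGM command per .jar found under DIR,
# without executing anything. Replace the echo-style output with a
# call to "system" only once the list looks right.
list_jvapgm_cmds() {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi
    find "$1" -name '*.jar' | while IFS= read -r jar; do
        printf "CRTJVAPGM CLSF('%s') OPTIMIZE(40)\n" "$jar"
    done
}

# Example (path taken from the thread; adjust to your POI directory):
#   list_jvapgm_cmds /jakarta-poi-2.5.1
```

Reviewing that list before submitting the real job makes it easier to spot unrelated .jar files in the same folder.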
Has there been any improvement by optimizing the JAR files?
Sorry, got sidetracked on a major project and never got back to this item to test further.