#1021 SysFileTree usage Access Violation

Milestone: 4.1.3
Status: closed
Owner: Mark Miesfeld
Labels: none
Priority: 7
Updated: 2013-07-09
Created: 2011-09-07
Creator: Jerry Senowitz
Private: No

Under Windows XP Professional SP1 (32-bit) with ooRexx 4.1.0, when I try to use the results of SysFileTree on my C: drive (64k+ entries, average entry length 102 bytes), I randomly get either an access violation or "date" function argument errors. Sometimes the .rex program even works as expected. When the errors occur is unpredictable, as is when it will run successfully. Frustrating! I tried putting in stops such as Trace ?r or exit after certain points, but have never been able to catch what is causing the problem.

The function is rc = SysFileTree("C:*","f","BLS","*")
After SysFileTree is executed, the RC is 0 or the .rex program will exit displaying a message and the return code.
The .rex program goes on to examine the results of SysFileTree and select entries based on search criteria the user supplies on the command line. At random points in the search loop I get either of the above errors. I have included a zip file containing two Dr Watson logs and a sample of the random "date" function error.

Regressed ooRexx to 4.0.0 and still had the above problems.

Regressed ooRexx to 3.2.0 and no longer had any problems. Finished writing the code under 3.2.0 and it works as expected. After I finished writing the code I restored my Windows XP system to ooRexx 4.1.0, and the problems are recurring.

I need to move to a new computer and operating system (Windows 7 Home Premium 64 bit) and wanted to use the latest ooRexx 64 bit programs. I am writing .rex programs to replace 16 bit programs that no longer function on Windows 7. I've developed replacement programs on Windows XP and hope to port them to Windows 7. But, now I am concerned about ooRexx reliability on Windows 7.

Discussion

<< < 1 2 3 4 > >> (Page 2 of 4)
  • Jerry Senowitz
    2011-10-17

    Disgusted with the reliability of SysFileTree (now approaching a 50% failure rate), I regressed ooRexx back to version 3.2.0 again. Since first encountering this problem I have written a number of testn.rex apps to see if I could predict its reliability, but to no avail. Well, I have now run these little apps on version 3.2.0 and the problem DOES exist at this level as well. This is NOT a problem introduced in 4.x. It has existed for several ooRexx releases.

    I've tried to emulate (using SysStemInsert) the massive amount of variable-length data that SysFileTree with the "S" option dumps into stem variables, but have not found a problem. The only difference between SysFileTree and my emulations is that, after a stem variable is generated under the emulation, control is returned to my app for the next instruction in the DO loop. SysFileTree continues building stem variables until it completes its recursion and then returns.
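
An emulation of the kind described above might look like the following minimal sketch. It is not the reporter's test program: the entry count, the length range, and the stem name `s.` are assumptions chosen only to resemble the figures quoted in this report.

```rexx
/* Hypothetical emulation: fill a stem with variable-length strings,  */
/* one SysStemInsert call per entry, returning to Rexx code between   */
/* calls (unlike SysFileTree, which fills the stem in a single call). */
call RxFuncAdd 'SysLoadFuncs', 'rexxutil', 'SysLoadFuncs'
call SysLoadFuncs

entries = 65000                           /* assumed count, roughly 64k+ */
s.0 = 0                                   /* SysStemInsert needs a valid count in s.0 */
do i = 1 to entries
  entry = copies("x", random(40, 160))    /* assumed length range around 102 bytes */
  rc = SysStemInsert("s.", i, entry)      /* insert at the end, position s.0 + 1 */
  if rc <> 0 then do
    say "SysStemInsert failed at entry" i "rc =" rc
    exit 1
  end
end
say s.0 "entries inserted"
```

Because each insert is a separate call, this exercises the variable pool differently from one long-running SysFileTree call, which may be why such an emulation does not trigger the failure.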

    When I first submitted this problem report I thought I was dealing with an access violation problem and since it only happened on 4.x I thought I was safe at 3.2. While I have not seen the access violation at 3.2, I have now seen several iterations of missing variables, something I didn't initially check when I submitted the problem report. Because of the problem, I have added code to test each stem variable using var().

    The SysFileTree process, in its variable building, manages to step on pointers in the variable pool or doesn't set them up properly, which causes missing variables and, on occasion, storage access violations. Access violations can occur during the stem variable build or somewhere after SysFileTree returns to my app. My app now checks, using var(), whether or not each stem variable has been defined. If one is undefined, it loops through the rest of the stem variables, listing those that var() thinks are undefined. It then dumps the variable pool using SysDumpVariables so that I can inspect the data. When the problem occurs, I have seen anywhere from 1 to 16 stem variables that var() thinks are undefined, and while some are consecutive numbers, they aren't always. Also, I have seen access violations occur in the var() and SysDumpVariables functions.
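
The check described above could be sketched roughly as follows. This is an illustration, not the attached test program: the filespec `C:\*`, the stem name `f.`, and the dump file name `vardump.txt` are assumptions.

```rexx
/* Sketch: confirm that every tail counted in f.0 is really defined. */
call RxFuncAdd 'SysLoadFuncs', 'rexxutil', 'SysLoadFuncs'
call SysLoadFuncs

rc = SysFileTree("C:\*", "f.", "BLS")     /* assumed filespec and stem name */
if rc <> 0 then do
  say "SysFileTree failed, rc =" rc
  exit rc
end

missing = 0
do i = 1 to f.0
  if var("f." || i) = 0 then do           /* var() returns 0 for an unassigned variable */
    missing = missing + 1
    say "f." || i "is not defined"
  end
end

if missing > 0 then do
  say missing "of" f.0 "stem variables are undefined"
  call SysDumpVariables "vardump.txt"     /* assumed file name for the pool dump */
end
else say "all" f.0 "stem variables are defined"
```

Note that the check only works if the stem has not been given a default value (for example `f. = ''`) beforehand, since every tail would then count as assigned.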

    My only problem is that I can't predict when this is going to occur, other than that it happens in a window of 64107 to 64136 stem variables. I've seen missing variables as low as number 1301 and as high as 27000+. With access violations, I usually don't see anything that tells me which variable caused the problem (bad pointer?). If I add a line of code to the app, the dynamics are such that I probably won't get the error. That is frustrating! I did try once to get a feel for the size of the variable pool by summing the lengths of the stem variables up to the point of failure, but everything ran to normal completion. I took the code out and the failure reoccurred. I can run the test app once and it fails, then run it again under the same cmd.exe invocation and it works fine. I have seen it run fine under cmd.exe but fail when invoked using rexx.exe, and vice versa.
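
The size measurement mentioned above could be done along these lines. This is a sketch under assumptions (filespec `C:\*`, stem name `f.`); it counts only the value lengths, not whatever overhead the interpreter adds per variable.

```rexx
/* Sketch: estimate how much data SysFileTree placed in the stem. */
call RxFuncAdd 'SysLoadFuncs', 'rexxutil', 'SysLoadFuncs'
call SysLoadFuncs

rc = SysFileTree("C:\*", "f.", "BLS")     /* assumed filespec and stem name */
if rc <> 0 then do
  say "SysFileTree failed, rc =" rc
  exit rc
end

total = 0
do i = 1 to f.0
  /* For an undefined tail, f.i evaluates to its own name (e.g. "F.17"), */
  /* so a missing variable adds the length of that name instead.         */
  total = total + length(f.i)
end

if f.0 > 0 then
  say f.0 "entries," total "bytes, average" format(total / f.0, , 1) "bytes per entry"
else
  say "no entries returned"
```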

    Without knowing the dynamics of variable pool assignment and how it expands when variables are added, I can offer little info on what to focus on other than it occurs during a recursion of subdirectories.

    I have been a system and application programmer most of my adult life and have used various incarnations of ooRexx's parents going back to the 1980's, mostly at the classic rexx level. I was happy when IBM decided to introduce it on the PC with PC DOS. While I do not consider myself a REXX expert, I do have a lot of experience with the language and what it can do. As I stated in previous comments, I am trying to replace several old 16 bit apps that no longer work in 64 bit Windows and there are no known replacements for these apps. 2 of these apps require the use of SysFileTree with the ability to recurse thru the directory tree structure of hard drives.

    Realizing that this problem can be difficult to reproduce, I would be willing to provide assistance in problem determination, if that would be helpful. I am adding my latest incarnation of a test that produces the problem plus a console log and select variable pool stem variables - 3405740-2.zip.

     
  • Bruce
    2011-10-17

    I'm seeing pretty much the same thing on ooRexx 4.2.0/Mac OS X/ 64bit.

    I get one of three things:

     5 *-* RC =  SysFileTree("/Users/bjskelly/*", "f", "S");
    

    REX0040E: Error 40 running /Users/bjskelly/sft.rex line 5: Incorrect call to routine
    REX0372E: Error 40.1: External routine "SYSFILETREE" failed

    or
    Abort

    or
    Bus Error

    The first case occurs when there is a massive number of files to process, in my case 600k files and subdirectories.

    Abort happens when I give a more qualified filespec, so the data being added is small, but the number of directories to process is large, 90k.

     
  • Mark Miesfeld
    2011-12-28

    I cannot reproduce this problem. I have tried your test programs and run them hundreds of times with more than 100,000 files on my systems. No luck.

    Bruce, if you can reproduce this on your Mac, maybe you could try to debug it.

     
  • Jerry Senowitz
    2012-02-08

    Status Update:

    1) Test4.rex has been the most effective test that demonstrates this problem whether it is using command.com, cmd.exe, rexx.exe or "Try Rexx". It doesn't always occur in each of these environments.

    2) Regressed ooRexx back to 3.1.1 (the newest release that does not show the problem) and have been running successfully on Windows XP since 11-16-11. Unfortunately 3.1.1 has a bug (1630937) that isn't fixed until 3.1.2. The bug prevents 3.1.1 from running on Windows 7 and Vista (x64). 3.1.2 and newer ooRexx releases have all demonstrated the problem of this bug report (3405740).

    3) Walked thru the SysFileTree code in RexxUtil in 3.1.1 and 4.1.0. The only real difference is that there is a structure definition change in the SysFileTree section between the 2 releases.

    4) Upgraded my Windows XP (x86) to SP3. Made no difference.

    5) Installed Visual C redistributables thru 2010. Made no difference.

    6) Thinking that maybe the reason you, Mark, can't reproduce the problem is that you have Visual Studio installed, I installed Visual Studio 2005. The problem no longer appeared. Interesting!

    7) Decided to get rid of the backups from the SP3 update (3087 objects). Running Test4.rex, the problem reappears with an "Access violation" and the Visual Studio Debugger gets control. Because I have not compiled 4.1.0, I don't have the necessary data to let the debugger do its thing. (I don't want to do a build of ooRexx either.) I choose to leave this level of problem determination to the experts.

    8) Based on number 7), I have concluded that it is NOT the number of stem variables that causes the problem, but a variable pool management problem. It is the amount of data that causes the problem (each stem variable is of variable length), not the number of variables.

    Here is my theory:

    Variable Pool Manager (call it VPM) allocates memory for the variable pool.

    As VPM runs out of memory, it allocates (malloc?) more memory and puts the next stem variable in the new block of memory. But VPM keeps track of the unused memory from the previous malloc(s), so that if a variable fits in that unused memory, it will be used. This would explain why stem variables, after a while, are no longer consecutively numbered when dumped using SysDumpVariables. In hindsight I should not have sorted the variable pool variables prior to submitting additional doc on the problem.

    I think that there are situations where VPM thinks data will fit in the unused part of memory but does not take into account the stem variable name, the pointer adjustment, or both. I further believe that this happens when the data length is equal to the amount of unused memory or within 6 bytes of it. This would explain the situation in "CONSOLES.TXT" (in 3405740-2.zip) where another stem variable was overlaid. It would also explain why it occurs so randomly, and why adding code to a .rex for problem determination changes the dynamics of its occurrence. It also explains why, when it does occur, it will reoccur the same way in the same instance of a DOS VDM.

    The question I have is whether the VPM does memory defragmentation (sometimes called "data/garbage collection") along the way and whether that impacts the problem?

    Without me having to dig thru all the source code, which module actually performs the VDM function? I thought I could take a peek at it in my spare time to see if anything obvious jumps out at me.

     
  • Bruce
    2012-02-08

    This line causes the rexx program to abort:

    RC = SysFileTree("/Users/bjskelly/b*", "F", "DS");

    Mac OS 10.6.8 / ooRexx 4.1.1 Intel 64 bit.

    See Crash Reporter File rexx_2012-02-08-111628_BookWormMac attached below.

     
  • Jerry Senowitz
    2012-02-08

    In my 2012-02-07 17:00:00 PST comment, last paragraph, I asked which module performed the "VDM" function. This was a typo. I meant "VPM". Sorry for any confusion.

     
  • Mark Miesfeld
    2012-02-09

    Hi Bruce,

    Looking at the crash log and searching on release_file_streams_for_task, I see a number of pages that seem to hint that this could be an Apple bug rather than an ooRexx one. Here is just one, where the person says that Apple admitted they have a bug in 10.6.8, which is what it looks like you are using.

    http://pleasantsoftware.com/blog/

     
  • Mark Miesfeld
    2012-02-09

    We already had this bug opened:

    3149277 SysFileTree causes Segmentation fault

    which I'm going to close as a duplicate of this one (even though it was opened up first, this one currently has more information in it.)

    The total comment of 3149277 so far is:

    I was running ooRexx 4.0.1 and found a seg fault within SysFileTree. I updated to ooRexx-4.1.0-ubuntu1004.i386.deb and the error still persists.

    Ubuntu Server 10.04 x86

    This code never completes, rather seg faults:
    /* Next scan for FILE's... */
    say 'Next scan for FILEs...'
    rc=SysFileTree('*', f., 'FOS')

    I checked the subtree in question and it appears the SysFileTree should return:
    /srv/shares$ find data -name "*" -print | wc -l
    217331

    that many files. I recall SysFileTree not being able to get through the data share a LONG time ago, before I did a massive cleanup. Perhaps the file count then was around 500,000. Puzzling that years later, with far fewer files, it still cannot make it through.

    I was running the rexx as root via sudo, so that cannot be the problem.

    I guess I will take the data share a subtree at a time to get through the files fixing perms. Please let me know how I may help sort this out. Thank you!

    opened by: mdlueck
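
The "one subtree at a time" approach mentioned in the quoted report could be sketched as below. It is a possible workaround only, not a fix for the underlying fault, and it assumes that smaller result sets are less likely to hit the problem. The stem names `dirs.`, `top.` and `files.` are hypothetical.

```rexx
/* Sketch: scan each top-level directory in its own SysFileTree call */
/* instead of recursing over the whole tree in one call.             */
call RxFuncAdd 'SysLoadFuncs', 'rexxutil', 'SysLoadFuncs'
call SysLoadFuncs

/* Top-level directories only (no "S" option), fully qualified names */
rc = SysFileTree("*", "dirs.", "DO")
if rc <> 0 then do
  say "SysFileTree failed listing directories, rc =" rc
  exit rc
end

/* Files that sit directly in the starting directory */
rc = SysFileTree("*", "top.", "FO")
if rc <> 0 then do
  say "SysFileTree failed listing top-level files, rc =" rc
  exit rc
end
total = top.0

do d = 1 to dirs.0
  drop files.                             /* discard the previous subtree's results */
  rc = SysFileTree(dirs.d || "/*", "files.", "FOS")
  if rc <> 0 then do
    say "SysFileTree failed for" dirs.d "rc =" rc
    iterate
  end
  total = total + files.0
  /* process files.1 through files.n here, e.g. fix permissions */
end

say total "files found"
```

Dropping the `files.` stem before each call keeps only one subtree's results in the variable pool at a time, which is the point of splitting the scan.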

     