From: CVBruce <cv...@gm...> - 2016-09-24 22:26:19
Do you just want to remove lines that are duplicated within a given file, or lines that are duplicates because they occur in more than one file?

Bruce

> On Sep 24, 2016, at 7:04 AM, Les Koehler <vm...@ta...> wrote:
>
> That's an interesting problem to solve. What's your approach?
>
> Les
>
> On 9/24/2016 7:31 AM, Bertram Moshier wrote:
>> Hello,
>>
>> I'm writing a program to remove duplicate lines from multiple files.
>> (About 3600 files (971 or so have duplicate lines) and all files total
>> about 512MB of space.)
>>
>> After reading in all the files, I've discovered ooRexx will no longer
>> write to any hard drive on the system. I suspect this is a memory
>> issue, as usage is above 3GB and can be above 4GB. YES, I AM using the
>> 64 bit version.
>>
>> The version number: REXX-ooRexx_4.2.0(MT)_64-bit 6.04 22 Feb 2014
>>
>> The program runs fine even when memory usage exceeds 8GB, except for I/O
>> (specifically output).
>>
>> Below you'll find some of the code. PLEASE note: the !!Say_Directed
>> subroutine is like the UNIX tee command. It can output to both the
>> console and file. It was my first indication of a problem, as any
>> output would go to the console BUT not the file!
>>
>> The !!EOJ subroutine generates a stop and timing information message.
>> This is the FAILURE as the line does not get written to disk (RC = 1 for
>> lineout). The line does get written to disk if the !!EOJ occurs BEFORE
>> the last do loop (e.g. after the SysFileTree). The memory usage of the
>> last DO LOOP is what takes ooRexx to 3-4GB and higher.
>>
>> NOTE: The system is running Windows 10 Professional AND has 48GB of
>> physical RAM.
>>
>>
>> Here is a piece of the code:
>>
>> files_with_duplicate_lines. = ''
>> files_with_duplicate_lines.0 = 0
>>
>> GOODMARK = 'GOOD: '
>>
>> OFN = 0
>> output_files. = ''
>> input_files. = ''
>>
>>
>> rc = SysFileTree('C:\Program Files (x86)\Windower4\logs\*.log','files','O')
>> if rc <> 0 then do
>>   call !!EOJ 1000 + rc
>> end
>>
>> do filenum = 1 to files.0
>>   do lines = 1 while lines(files.filenum) > 0
>>     files.filenum.lines = linein(files.filenum)
>>   end
>>   files.filenum.0 = lines - 1
>> end
>>
>> call !!EOJ 0
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Oorexx-users mailing list
>> Oor...@li...
>> https://lists.sourceforge.net/lists/listinfo/oorexx-users
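
To make the two readings of "duplicate" in the question at the top of this message concrete, here is a minimal Rexx sketch. It is not the original poster's program: the sample data and the names file., seen. and kept are hypothetical, and the in-memory stem stands in for lines that would really come from the log files. The only difference between the two passes is where the seen. stem is reset.

```rexx
/* Sketch only: the sample data and all names here are hypothetical. */
file.0   = 2
file.1.0 = 3;  file.1.1 = 'alpha';  file.1.2 = 'beta';  file.1.3 = 'alpha'
file.2.0 = 2;  file.2.1 = 'beta';   file.2.2 = 'gamma'

/* (a) Duplicates within a file: forget what was seen at each new file. */
do f = 1 to file.0
  seen. = 0                        /* reset the "seen" set per file     */
  kept = ''
  do i = 1 to file.f.0
    line = file.f.i
    if seen.line then iterate      /* already kept in this file: skip   */
    seen.line = 1                  /* the line text itself is the tail  */
    kept = kept line
  end
  say 'within file' f':' strip(kept)
end

/* (b) Duplicates across files: one "seen" set shared by every file.    */
seen. = 0
do f = 1 to file.0
  kept = ''
  do i = 1 to file.f.0
    line = file.f.i
    if seen.line then iterate      /* already kept in any file: skip    */
    seen.line = 1
    kept = kept line
  end
  say 'across files' f':' strip(kept)
end
```

With this sample data, pass (a) keeps "alpha beta" for file 1 and "beta gamma" for file 2, while pass (b) keeps "alpha beta" for file 1 and only "gamma" for file 2, because "beta" was already seen in file 1. Either way the seen. stem holds one entry per distinct line rather than a copy of every line read, which may matter at the data sizes described in the quoted message.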