
#460 add option to sort/merge with multiple processes/threads

GCSORT
open
None
5 - default
2024-06-23
2024-01-24
No

The idea is to split the file into chunks, sort each in a separate process, and then, back in the main process, do a merge/sort.

Using threads instead of processes may be easier, but likely harder to achieve in a portable way.
The user would specify with new commands (or command line switches) how many processes/threads to use (possibly with a "max" named value that queries the number of processors) and how many records each should handle.
The use of the latter: CPU cycles, and therefore sorting time, are data-dependent, so each chunk will quite likely take a different amount of time. In this case it could be useful to have, for example, 8 processes/threads each sort 10000 records and, when finished, take the next 10000 (either directly or in a new process/thread); when every sub-process/thread is done, do the final sort/merge. For a lot of sorted chunks it could even be useful to do that merge multi-threaded.
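As a sketch of the scheme above (in Python for brevity; nothing here is GCSORT code, and the function name, chunk size, and worker count are illustrative): a pool of workers sorts fixed-size chunks, a worker that finishes one chunk automatically picks up the next, and the sorted runs are merged at the end.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(records, key, chunk_size=10_000, workers=8):
    """Sort fixed-size chunks in a worker pool, then merge the sorted runs.

    A worker that finishes a chunk picks up the next pending one, which
    matches the "take the next 10000 when over" idea: uneven chunk times
    balance out automatically.
    """
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda c: sorted(c, key=key), chunks))
    # Final pass back in the main process: k-way merge of sorted runs.
    return list(heapq.merge(*runs, key=key))
```

Note that with CPython threads the sorting itself is still serialized by the interpreter lock; a real implementation would use processes or native (C) threads for true CPU parallelism, which is exactly the portability trade-off discussed above.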

@smenna What do you think? If you like the idea but won't do it yourself, then we could ask the OCamlPro folks.

Discussion

  • Mickey White

    Mickey White - 2024-01-24

I hate to butt in, but wouldn't the DISK (if the files are on disk) be an issue? Will there be 8 reads and 8 writes to the same disk at the same time, or would all 8 (processes/threads) work in memory?
I remember back in the Mainframe IMS days we would do something similar. Our large database was on about 12 big disk drives that the IMS randomizer would split the DB across. Then, using the randomizer with the key for the sequential flat files, we knew we could work on all 16 disks at the same time. We could do a reload quicker... sorts... whatever.
    Just a question/thought.
    Feel free to delete this comment, but send me your thoughts and corrections please

     

    Last edit: Mickey White 2024-01-25
  • Sauro Menna

    Sauro Menna - 2024-01-31

    Hi,
    I am preparing feedback on current GCSort times and a parallelism hypothesis for sort. I will forward the result of the analysis.
Splitting into blocks at read time is only possible for fixed-length files; for variable-length files there will need to be a single Reader and 'n' Sorters.
    Give me a few more days.
In the experiments I initially performed, I found memory-mapped files to perform very well in both writing and reading.
    It must be considered that the final write will be unique, so there will be:
    - a Reader
    - n Sorter
    - a Writer
This will be evident from the feedback I will send.
    Best Regards.
    Sauro

    • Sauro Menna

      Sauro Menna - 2024-02-23

      Hi,
      attached is a document where there are some considerations on the topic of multi threading.
      I have reported some real execution cases and a projection using threads.
      The greatest benefits are obtained with sorting with a large number of sorting keys.
      Performance with CPU and I/O parallelisms must be considered.
Thread usage for sort is triggered when a read block ends, so the thread sorts while the reader reads the next chunk; the read time must therefore be added to the execution time of the sort phase. The threads work in parallel but end one after the other, since they start at different times. That is why multi-threading is worth considering only for sorts with multiple keys; when the sort time is less than the read time, you run the risk of effectively having a single thread.
      I will do some more analysis on the subject.
      Best regards.
      Sauro
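The reader/sorter overlap described above could be sketched roughly like this (Python for brevity; the queue-based hand-off and the sentinel scheme are an assumption for illustration, not how gcsort is implemented). A single Reader pushes blocks, and sorter threads consume them, so sorting one block overlaps reading the next:

```python
import heapq
import queue
import threading

def overlapped_sort(read_blocks, key, sorters=4):
    """Reader/sorter pipeline: sorting a block overlaps reading the next."""
    blocks = queue.Queue(maxsize=sorters)   # bounded: back-pressure on reader
    runs, lock = [], threading.Lock()

    def sorter():
        while True:
            block = blocks.get()
            if block is None:               # sentinel: reader is finished
                break
            run = sorted(block, key=key)
            with lock:
                runs.append(run)

    threads = [threading.Thread(target=sorter) for _ in range(sorters)]
    for t in threads:
        t.start()
    for block in read_blocks:               # the single Reader
        blocks.put(block)
    for _ in threads:                       # one sentinel per sorter thread
        blocks.put(None)
    for t in threads:
        t.join()
    # Threads finish staggered, as noted above; merge their sorted runs.
    return list(heapq.merge(*runs, key=key))
```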

      • Simon Sobisch

        Simon Sobisch - 2024-02-23

Thank you for your thoughts and tests. I think the "single reader" would not be necessary when applying a trick. Example layout: LSQ files and a "read split" at 100k:

        thread 1 is started to read "around 100k", it starts and handles the first records
        thread 2 is started in the meantime to read "around 100k". It positions itself at 100k, then reads until it finds a line feed, then starts from there, reading at least until 200k.

        thread 1 sometime hits its 100k limit in reading, it then reads until the record ends, which is the same place that thread 2 started to handle the records.

That sounds simple - would it solve the non-fixed-length issue? Note: with indexed files the multi-read can only work if the number of records is known and positioning by record number is possible.
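A rough sketch of this splitting trick (Python for brevity; `split_ranges` and `read_range` are hypothetical names, not gcsort functions). Each boundary is moved forward past the next line feed, so every thread starts on a record boundary and the previous thread finishes the record that crosses its limit:

```python
import os

def split_ranges(path, n):
    """Compute per-thread (start, end) byte ranges for a line-sequential file.

    Each interior boundary is aligned forward past the next line feed,
    implementing the "position at 100k, read until a line feed, start from
    there" trick described above.
    """
    size = os.path.getsize(path)
    bounds = [i * size // n for i in range(1, n)]
    aligned = [0]
    with open(path, "rb") as f:
        for b in bounds:
            f.seek(b)
            f.readline()            # skip the partial record at the boundary
            aligned.append(f.tell())
    aligned.append(size)
    return list(zip(aligned, aligned[1:]))

def read_range(path, start, end):
    """Read whole records starting in [start, end); the last record read is
    the one that crosses `end`, matching where the next thread started."""
    records = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            records.append(f.readline())
    return records
```

Every record is read exactly once: a thread keeps reading until it passes its end offset, which is precisely the aligned position where the next thread began.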

Your conclusion that the sorting itself is the biggest issue (when memory-mapped files are used), and that multi-thread "max sort thread time" vs. single-thread "sum of sort times" is the biggest part of the difference, is definitely true. Operating "multi process" is of course more expensive than "multi thread", especially on Win32; so that approach is only reasonable when threads don't work in the environment (both approaches can use memory-mapped files).

What I'm not sure about is the reading part - if the source file is memory-mapped too, then the threads themselves can read it (split by offset), so the single reader is not necessary, no?
Multiple input files could themselves be split up to be handled by multiple threads.

         
        • Sauro Menna

          Sauro Menna - 2024-03-10

          Hi Simon,
          I have integrated multithreading into gcsort and have run initial tests in a Windows environment with fixed-length sequential files.
          I will extend the handling to variable-length sequential files and then to line sequential files.
          Added a parameter to the command-line 'mt=<n>' where <n> is the number of threads.
          This will give each thread an assigned block of records to read.
Each thread opens the input file(s), loads the data into memory, and then sorts it according to the sort-key specification.
The gcsort process waits for all threads to finish and then merges and writes the result to the output file.
          The merge step can read either from memory or from TMP (MMF) files depending on the size of the input file.
          I will now test the operation in linux environment.
          I used the qsort_s function for Linux and qsort_r for Windows.
          I will have to update the testcases and documentation.
I will send an update on developments.

          Sauro

           
          • Simon Sobisch

            Simon Sobisch - 2024-03-10

            Thank you for this good news.

            It is reasonable to do fixed length first, adding the others later on.

Do you already have results from runs with different values?

            Simon

             
            • Sauro Menna

              Sauro Menna - 2024-03-10

The execution times between single-thread and multi-thread are similar.
The output data are identical between gcsort with no threads, gcsort multithreaded, and the COBOL sort program.
Unfortunately, the part where gcsort loses time is the merge-and-write phase of the output of the parallel sorts.
I will have to improve this phase.
In general, in a Linux environment, the times are:

              Memory allocation
              GCSORT_MEMSIZE="2048000000"

              Single KEY
              Single Thread
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:20:12 2024
              End : Sun Mar 10 13:20:23 2024
              Elapsed Time 00hh 00mm 11ss 000ms

              MultiThread (4 threads)
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:20:58 2024
              End : Sun Mar 10 13:21:09 2024
              Elapsed Time 00hh 00mm 11ss 000ms

              MultiThread (3 threads)
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:24:55 2024
              End : Sun Mar 10 13:25:08 2024
              Elapsed Time 00hh 00mm 13ss 000ms

              MultiThread (6 threads)
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:25:35 2024
              End : Sun Mar 10 13:25:50 2024
              Elapsed Time 00hh 00mm 15ss 000ms

Multiple Keys
              File Size: 270 MByte
              SORT FIELDS : (8,5,CH,A,13,3,BI,A,16,4,FI,A,20,8,FL,A,28,4,PD,A,32,7,ZD,A)
              record length = 90

              No Thread
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:34:33 2024
              End : Sun Mar 10 13:34:48 2024
              Elapsed Time 00hh 00mm 15ss 000ms

              MultiThread (4 threads)
              Total Records Number..............: 3000000
              Total Records Write Sort..........: 0
              Total Records Write Output........: 3000000
              Start : Sun Mar 10 13:33:15 2024
              End : Sun Mar 10 13:33:39 2024
              Elapsed Time 00hh 00mm 24ss 000ms
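For the merge-and-write phase identified above as the bottleneck, a k-way heap merge keeps the cost at O(N log k) comparisons with constant extra memory, streaming records straight to the writer instead of concatenating and re-sorting. A minimal sketch (Python for brevity; the standard library's `heapq.merge` does the same thing and is expanded here only to show the mechanism; `out_file.append` stands in for a buffered write):

```python
import heapq

def merge_runs_to_output(runs, out_file, key):
    """Stream-merge sorted runs into the output without materializing them.

    The heap holds one candidate record per run, so each output record
    costs O(log k) comparisons for k runs, and the runs can just as well
    be iterators over temp (MMF) files as in-memory lists.
    """
    iters = [iter(run) for run in runs]
    heap = []
    for idx, it in enumerate(iters):
        rec = next(it, None)
        if rec is not None:
            # idx breaks ties so records themselves are never compared
            heap.append((key(rec), idx, rec))
    heapq.heapify(heap)
    while heap:
        _, idx, rec = heapq.heappop(heap)
        out_file.append(rec)          # stand-in for a buffered file write
        nxt = next(iters[idx], None)  # refill from the run we popped from
        if nxt is not None:
            heapq.heappush(heap, (key(nxt), idx, nxt))
```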

               
              • Sauro Menna

                Sauro Menna - 2024-04-17

                Hi,
I have revised the handling of threads and the times have improved.
I had a problem in a Linux environment with cobc version 2.2 that cost me a lot of time; the only way to solve it was to insert a mutex around the sort phase (the sort phase uses cob_fields to compare values).
With a few records it works, but on large files there were sort-order mismatches, even though the same number of records was generated.
In a Windows 10 environment with cobc 3.2 I had no problems.
I will continue testing before naming the first release with multithreaded integration.
                Regards.
                Sauro

                 
  • Sauro Menna

    Sauro Menna - 2024-06-23

    Hello,
    I consolidated the multithreaded version of GCSort.
The version is 1.04.00, tested in Windows, Linux (Ubuntu) and MinGW 64 environments.
    The execution times seem satisfactory to me (I think).
    Attached is a test in Linux environment, with:
    1) Sequential file generation of 3,500,000 records.
    2) Sort COBOL (set the size of the sort in memory).
    3) GCSort single process
    4) comparison of result between gcsort and cobol.
    5) GCSort with mt=2 (two threads)
    6) comparison of the result between gcsort and cobol
    7) GCSort with mt=4 (four threads)
    8) comparison of the result between gcsort and cobol

    Elapsed
    1) Generation of :
    2) Cobol Sort : sbig001 - 0 minutes and 48 seconds elapsed.
    3) GCsort single process : Elapsed Time 00hh 00mm 49ss 000ms
    4) GCsort 2 thread : Elapsed Time 00hh 00mm 36ss 000ms
    5) GCSort 4 thread : Elapsed Time 00hh 00mm 29ss 000ms

    Best Regards.
    Sauro

