#928 Panic with Berkeley DB and not with VBISAM

unclassified
open
3
2024-01-18
2023-11-03
No

Panic with BDB. The program gives the following errors. Exactly the same program with exactly the same data with VBISAM works correctly.

PARTE.10: page 1148: reference count overflow
PANIC: Invalid argument
PANIC: fatal region error detected; run recovery
attempt to reference invalid memory address (signal)

Related

Wish List: #455

Discussion

  • Simon Sobisch

    Simon Sobisch - 2023-11-03

    I guess you are on GC 3.2? For issues like that, please post the output of cobcrun --verbose --info (this will also show the BDB version in use).

    Please attach a program to reproduce - otherwise we cannot do much.

     
  • Vincent (Bryan) Coen

    On 07/11/2023 17:15, Mickey White wrote:

    Windows GnuCOBOL does tell the version of BDB. Linux does Not ?

    E.G.,  db_verify -V
    Berkeley DB 18.1.32: (February 19, 2019)

    Same applies to the others

     
    • Juan Carlos Escartí

      On SUSE, db_verify is in the db48-utils package. But the last time I installed it, it didn't seem compatible with the libraries that GnuCOBOL uses for BDB, and I currently don't have it installed.

       
  • Juan Carlos Escartí

    Standard SUSE Linux repositories, 15.4

    Copyright (C) 2023 Free Software Foundation, Inc.
    License LGPLv3+: GNU LGPL version 3 or later <https://gnu.org/licenses/lgpl.html>
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Keisuke Nishida, Roger While, Ron Norman, Simon Sobisch, Edward Hart
    Built     Aug 23 2023 12:00:00
    Packaged  Aug 23 2023 14:17:53 UTC
    build information
    build environment        : x86_64-suse-linux-gnu
    CC                       : gcc
    C version                : "7.5.0"
    CPPFLAGS                 :  -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
    CFLAGS                   : -O2 -ffat-lto-objects -fstack-clash-protection
                               -fexceptions -fstack-protector-strong -pipe
                               -finline-functions -fsigned-char -Wall
                               -Wwrite-strings -Wmissing-prototypes
                               -Wno-format-y2k
    LD                       : /usr/x86_64-suse-linux/bin/ld -m elf_x86_64
    LDFLAGS                  :  -Wl,-z,relro,-z,now,-O1
    
    GnuCOBOL information
    COB_MODULE_EXT           : so
    dynamic loading          : system
    "CBL_" param check       : enabled
    64bit-mode               : yes
    BINARY-C-LONG            : 8 bytes
    endianness               : little-endian
    native EBCDIC            : no
    variable file format     : 0
    sequential file handler  : built-in
    indexed file handler     : BDB, version 4.8.30
    mathematical library     : GMP, version 6.1.2 (compiled with 6.3)
    XML library              : libxml2, version 2.10.3
    JSON library             : json-c, version 0.13.0
    extended screen I/O      : ncursesw, version 6.1.20180317 (CHTYPE=32, WIDE=1) 
    mouse support            : yes
    
     

    Last edit: Simon Sobisch 2023-11-03
  • Simon Sobisch

    Simon Sobisch - 2023-11-03

    You may want to try to get an updated BDB version. The last one that is available under the old license is 6.0.19 (likely only MSYS2 and arch provide those as binaries), the version with that license that is most used is BDB 5.3, which contains several fixes over 4.8.

    In any case: Can you provide a minimal example program that fails?

     
  • Juan Carlos Escartí

    Thanks for your quick response. Yes, on Monday I will have a set of tests prepared.
    On another note:
    I have been looking at the code, mainly fileio.c and screenio.c. A few years ago I had a similar problem: I had to run our programs on HP-UX, AIX, Solaris, ATT, SCO, etc. At first we started with solutions similar to yours, with conditional IFs that select the actions according to the OS. This soon caused multiple side effects. The solution that worked perfectly for us was to keep one file per OS containing that OS's particularities; at compile time a small script put the correct includes and programs in place for the target OS. With this solution we were able to keep operational versions of our system on more than 10 platforms simultaneously.
    I hope these tips are useful. Kind regards,
    Juan Carlos Escartí

     
  • Juan Carlos Escartí

    Following up on your answer: with BDB I prefer to stay with the standard version from the SUSE repository, which I imagine is "your official version for Linux". Frankly, Ron's VBISAM 2.1.1 is much better and faster. In the stress and balance tests VBISAM is almost perfect, while BDB loses information in conversions. Changing to BDB 6.0.19 would force me to install "my libraries", do my own compilation, and so on. That makes it very difficult to track and fix bugs, because different systems can behave differently. This is why it could be interesting to have a VM where everything is "exactly the same" as what you are using, which would make it easier to eliminate errors and problems.

     
  • Juan Carlos Escartí

    Partial test results


    Task: write 1.388.081 records to an indexed file (TEST000 program)


    GNU 3.2.2 with VBISAM 2.2
    real 2m4,048s
    user 1m37,741s
    sys 0m25,193s

    GNU 3.2.2 with BDB
    real 199m35,132s
    user 197m59,859s
    sys 1m20,177s

    MF II v 1.1 with C-ISAM
    real 5m55.896s
    user 1m20.837s
    sys 1m55.523s


    Task: rewrite 1.388.081 records in the indexed file, reading the data from a text file (TEST001 program)


    GNU 3.2.2 with VBISAM 2.2
    real 23m44,634s
    user 23m16,076s
    sys 0m28,450s

    MF II v 1.1 with C-ISAM
    real 0m26.588s
    user 0m16.189s
    sys 0m9.893s

    GNU 3.2.2 with BDB
    real 0m3,625s
    user 0m3,032s
    sys 0m0,394s
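
The figures above mix comma and dot decimal separators, which makes ratios hard to eyeball. A minimal Python sketch, assuming only the `NmS,FFFs` / `NmS.FFFs` layout shown in these results (the helper name `to_seconds` is illustrative), converts them to seconds so the TEST000 write figures can be compared directly:

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time` figure such as '199m35,132s' or '5m55.896s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time figure: {stamp!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


# TEST000 (write) figures copied from the results above.
vbisam = {"real": "2m4,048s", "user": "1m37,741s", "sys": "0m25,193s"}
bdb = {"real": "199m35,132s", "user": "197m59,859s", "sys": "1m20,177s"}

# Ratio BDB / VBISAM for each kind of time.
ratios = {kind: to_seconds(bdb[kind]) / to_seconds(vbisam[kind]) for kind in vbisam}

if __name__ == "__main__":
    for kind, ratio in ratios.items():
        print(f"{kind}: BDB / VBISAM = {ratio:.1f}")
```

On these figures the BDB write run takes roughly 97 times the VBISAM wall-clock time, and nearly all of that time is user CPU time rather than system time.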

    Pending:
    Task: read the indexed file sequentially (read next), 1.388.081 records (TEST002 program)
    Task: read the indexed file by primary key, 1.388.081 records, with random keys taken from a sequential input file (TEST003 program)

    We will soon attach the programs and data needed to reproduce the tests.
    Kind regards

     
  • Vincent (Bryan) Coen

    Please provide the test program sources as I would like to test this on my system that uses BDB db-18.1.40

     
    • Mickey White

      Mickey White - 2023-11-07
       
      • Juan Carlos Escartí

        That's another intensive data rewriting program. These programs are a different set of tests (TEST000/004) on a single file. I'm going to see if I can finish it today and upload the programs and data.

         
  • Juan Carlos Escartí

    Here are the tests, in tgz (gzipped tar) format. Unpack them into any directory and read Reame.1st and Readme.md. They take about 90 MB compressed and around 700 MB decompressed. The results depend on the GNU version and the indexing engine. As a curious note, BDB seems to have a sleep in the write, because the system time is similar to VBISAM and C-ISAM but the total time is almost on the order of 1000 times greater.
    Note: The file is too large to attach to the post, so I have uploaded it to
    https://www.liberatusdeudas.es/test_index_gnucob.tgz
    Regards

     
    • Simon Sobisch

      Simon Sobisch - 2023-11-10

      Thank you for those test routines, this is useful.

      You really should have said that TEST000 not only writes 1.388.081 records but first checks whether they exist (which is fine and takes nearly no time) and, more importantly, that this file has 13 alternate keys that all allow duplicates; and there are a lot of duplicates in there...

      Just checked the places the cpu cycles are spent in (reduced to 99999 additions). The following checks were done without DB_HOME set (that leads to "less feature-complete than VB-ISAM", so to be fair one has to enable that, which leads to everything but READ taking 10-20% longer with BDB).

      With BDB 5.3: 98.9% of the cpu cycles are spent in indexed_write_internal (libcob), and below that 97.4% in the function get_dupno(). This function gets the highest duplicate number for each alternate key, and that check needs to be done for each of them on every write, to create a unique key for the alternate "databases".

      If you know that and don't need a direct read with the alternate keys (in most cases those are only read sequentially) you can adjust the COBOL part to drop the WITH DUPLICATES and add the primary key or a record number to each yourself. This drops the cpu and therefore time needed by nearly 98%.
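
The idea can be illustrated outside COBOL. A minimal Python sketch, assuming fixed-width, space-padded alternate key fields as in a COBOL record (the function name, field width, and sample values are hypothetical, and this is not libcob code): appending the primary key makes every alternate key value unique, while sorting still groups records by the original alternate value, so sequential reads along the alternate key keep their order.

```python
def composite_alt_key(alt_value: str, primary_key: str, alt_width: int) -> str:
    """Pad the alternate value to its fixed width, then append the primary key."""
    return alt_value.ljust(alt_width)[:alt_width] + primary_key


# Hypothetical records: (primary key, alternate value); "MADRID" repeats.
records = [("000003", "MADRID"), ("000001", "MADRID"), ("000002", "BILBAO")]

# Sorted composite keys stand in for the order of a sequential read.
keys = sorted(composite_alt_key(alt, pk, alt_width=10) for pk, alt in records)

if __name__ == "__main__":
    for key in keys:
        print(repr(key))
```

The trade-off is the one named above: a direct read using only the bare alternate value no longer matches a full key.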

      It may be possible to speed get_dupno() up by adjusting the code to do a read-back that finds the highest duplicate number for a given alternate key.
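
To show why a read-back could help, here is a minimal Python sketch. It is not the libcob implementation: a sorted list of `(key, dupno)` pairs stands in for the BDB btree, duplicate numbers are assumed to start at 1, and all function names are hypothetical. The first function visits every stored duplicate of a key; the second seeks just past the last possible entry for that key and steps back once.

```python
import bisect


def next_dupno_scan(index, key):
    """Visit every stored duplicate of `key`; cost grows with the duplicate count."""
    position = bisect.bisect_left(index, (key, 0))
    highest = 0
    while position < len(index) and index[position][0] == key:
        highest = index[position][1]
        position += 1
    return highest + 1


def next_dupno_readback(index, key):
    """Seek just past the last possible entry for `key`, then step back once."""
    position = bisect.bisect_right(index, (key, float("inf")))
    if position > 0 and index[position - 1][0] == key:
        return index[position - 1][1] + 1
    return 1


def write(index, key, next_dupno):
    """Store `key` under a fresh duplicate number so the stored pair is unique."""
    dupno = next_dupno(index, key)
    bisect.insort(index, (key, dupno))
    return dupno


if __name__ == "__main__":
    index = []
    for key in ["", "", "A", "", "A"]:
        write(index, key, next_dupno_readback)
    print(index)
```

With many duplicates per key, the scan repeats work on every write, while the read-back stays a single lookup; both return the same numbers.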

      I'll do a draft tomorrow and post results back, maybe even a patch. If this works out then we get an improved speed for writing to ALTERNATE keys that have a WITH DUPLICATES clause and a huge improvement if there are a lot of duplicates.

       
    • Simon Sobisch

      Simon Sobisch - 2023-11-10

      But none of those tests triggers the PANIC - can you provide a testcase for that, too?
      Do you have DB_HOME set?

       
    • Simon Sobisch

      Simon Sobisch - 2023-11-10

      @juanc [feature-requests:#455] includes not yet well-tested code to improve BDB times for everything but DELETE and READ with the primary key, mostly for WRITE (cut time down to less than 1%!) and REWRITE.

      Can you please:

      • run make checkall with the change applied, to verify that the change does not include any "known up front" regressions
      • provide times for all 4 tests for all your environments (either replace the function for the BDB test or check both variants for comparison)
      • see if you recognize any regressions for reading/writing the duplicates that were not covered by the testsuite

      Just one note: It seems that at least some of the alternate keys in the test are commonly identical (empty). For improved performance you may consider using SUPPRESS KEY, so that those values are not written for the alternate keys.
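
As a rough illustration of that note, the following Python sketch counts how many alternate-key index entries a batch of writes would create with and without skipping blank values. The records, field names, and the `index_entries` helper are hypothetical; the sketch only models the effect and says nothing about how GnuCOBOL implements suppression.

```python
def index_entries(records, alt_fields, suppress_blank):
    """Count the alternate-key index entries a batch of writes would create."""
    count = 0
    for record in records:
        for field in alt_fields:
            value = record[field]
            if suppress_blank and value.strip() == "":
                continue  # suppressed: nothing is written for this key
            count += 1
    return count


# Hypothetical records with two alternate key fields, several of them blank.
records = [
    {"city": "MADRID", "fax": "   "},
    {"city": "      ", "fax": "   "},
    {"city": "BILBAO", "fax": "123"},
]

if __name__ == "__main__":
    print(index_entries(records, ["city", "fax"], suppress_blank=False))
    print(index_entries(records, ["city", "fax"], suppress_blank=True))
```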

       


      Last edit: Simon Sobisch 2023-11-10
  • Juan Carlos Escartí

    Thanks for your work Simon
    I am going to digest your answer since I still don't understand the GNUCOBOL jargon very well and I have to familiarize myself with the C programs and the compiler.

    The results you get are really good and seem to solve this performance problem.

    The tests that I have given you are part of a program that passes all the data, about 80 files and 2,550,000 records, and I think GNU has some problems there. That is generally where I find panics (in complete passes or updates), and also a slowdown of the overall execution.

    Today's error:

    try to reference unallocated memory (signal SIGSEGV)

    *** glibc detected *** /usr1/condor/proggnu/GNUPDAT: free(): invalid pointer: 0x0acf0238 ***
    *** glibc detected *** /usr1/condor/proggnu/GNUPDAT: corrupted double-linked list: 0x0acf0238 ***
    Inconsistency detected by ld.so: dl-open.c: 221: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

    Today this error occurred in the DIARIO file, at the record with key 42963688.

    The behavior is snowball-like with both VBISAM and BDB. As it reads more records it slows down, and more and more often it ends with errors like this one, which do not reproduce in exactly the same way on the next run.

    I am attaching the complete pass program. If you are interested, I will prepare a complete test set of the program with its data.

    I'm going to try the solutions you've made and will report back shortly.
    Thank you again and best regards

     
    • Simon Sobisch

      Simon Sobisch - 2024-01-18

      Hi @juanc,

      coming back to this issue (BDB side). So: did you still have problems with the performance patch for BDB referenced above?

      If so: is the environment variable DB_HOME set? This more or less activates the locking necessary for multi-user/multi-process environments on single files, and it changes BDB's behaviour a lot. If it is set, then please try without it.

      If you still have the issue then it likely would be reasonable to try with a newer BDB version (depending on your plans to distribute binaries to another place, this would mean BDB 6 something, otherwise most current one - @chaat could help with finding the download).

      And if the issue happens with different records, we still could use a tool like rr (if you run on supported hardware) to record the run, so once it fails we can actually debug the issue using its recording.

      I am attaching the complete pass program. If you are interested, I will prepare a complete test set of the program with its data.

      Of course, that will be interesting in general, also for performance work.

       
  • Juan Carlos Escartí

    GNUPDAT Program with libraries displayed

     
  • Juan Carlos Escartí

    Dear Simon

    Reading your answer further it seems that this is a side effect of trying to keep many indexed handlers in the same file, see post https://sourceforge.net/p/gnucobol/bugs/928/#bd64

    Everything that seems to have to be done in the C program that manages indexed files is solved by C-ISAM, VBISAM, and probably BDB natively. iswrite, isrewrite, isrewcurr etc are native functions of the isam libraries.

    I think putting an "extra" layer on top of the indexing manager will always degrade its performance, and we will get side effects and erratic errors.

    I am going to prepare programs written in C, with C-ISAM and VBISAM, that do exactly what the COBOL programs do. I think I will be able to have them within 2 weeks.

    I think that these programs could remain as generic utilities for transferring data from plain text files to VBISAM and/or BDB and if there are version changes in VBISAM.

    With them we will see the loss factor that we have in GNU and in MF when using the compiler's indexing handler instead of the native C handler, and we can more easily separate the errors caused by the indexing libraries from those caused by the GnuCOBOL compiler.

    Kind regards

     
    • Simon Sobisch

      Simon Sobisch - 2023-11-11

      There is no layer with any of the ISAM libraries. Well, nearly: if those are known to not support sparse keys or returning a status 02, then there is a layer to provide the same effects to the COBOL program.

      There must be a handler with BDB as this is just an unstructured key/value db using btree, which, at the time of implementation, was widely available.

      There is an option in BDB to use "secondary" DBs and to have multiple DBs in one file, which moves a bit of the layer into the BDB functions, but that would be a bigger rewrite (patches welcome, especially when adding the COBOL fixed file attributes to one of the secondary DBs - this is totally missing in our layer).

      Summary: you won't get different results "in C", especially not with ISAM libraries. To get different results with BDB you have to change the logic.

       

      Last edit: Simon Sobisch 2023-11-12
  • Juan Carlos Escartí

    Note on the previous ticket.
    At 6:33 today I started a GNUPDAT run on VBISAM.
    It is now 09:25, I have to leave, and it has not finished yet. It has been running for about 180 minutes so far.
    Judging by other applications that I have in C with C-ISAM, exactly the same job in C with VBISAM would not take more than 2 to 3 minutes.

     
    • Simon Sobisch

      Simon Sobisch - 2023-11-11

      That is a VBISAM-specific issue. Please create a separate bug report specifying the exact versions of GnuCOBOL and VBISAM used (in the latter case: where did you get it from) and a test program for this single scenario.

      This should be relatively easy to tackle afterwards.

       

      Last edit: Simon Sobisch 2023-11-12
  • Juan Carlos Escartí

    I'm back from my outing
    It's 4:41 p.m., almost 12 hours
    The process continues
    Performance similar to 8-bit processors and systems from the 80s.
    Well that's what we have
    I hope we solve it soon
    Kind regards

     
  • Juan Carlos Escartí

    Reading your answers, I almost totally agree.
    It seems to be a VBISAM rewrite issue.
    With BDB the rewrite is very fast.
    If it is a VBISAM or GNUCOBOL-VBISAM problem, I think the answer will be given by the C program that does exactly the same thing.
    I think the important thing is to find the origin of the problems and solve them.
    This will make GNUCOBOL the only alternative for the coming decades.
    Kind regards

     
    • Simon Sobisch

      Simon Sobisch - 2023-11-11

      You provide test data and COBOL test program in a new bug issue, I reproduce and fix it. No need for some C program here...

       
