Panic with BDB. The program gives the following errors. Exactly the same program with exactly the same data with VBISAM works correctly.
PARTE.10: page 1148: reference count overflow
PANIC: Argumento inválido (Invalid argument)
PANIC: fatal region error detected; run recovery
attempt to reference invalid memory address (signal)
I guess you are on GC 3.2? For issues like that please post the output of
cobcrun --verbose --info
(this will also show the BDB version in use).
Please attach a program to reproduce; otherwise we cannot do much.
On 07/11/2023 17:15, Mickey White wrote:
E.G., db_verify -V
Berkeley DB 18.1.32: (February 19, 2019)
The same applies to the others.
On SUSE, db_verify is in the db48-utils package. But the last time I installed it, it didn't seem compatible with the libraries that GnuCOBOL uses for BDB, and I currently don't have it installed.
Standard SUSE Linux 15.4 repositories
Last edit: Simon Sobisch 2023-11-03
You may want to try to get an updated BDB version. The last one that is available under the old license is 6.0.19 (likely only MSYS2 and Arch provide it as binaries); the most-used version with that license is BDB 5.3, which contains several fixes over 4.8.
In any case: Can you provide a minimal example program that fails?
Thanks for your quick response. Yes, on Monday I will have a set of tests prepared.
On another note:
I have been looking at the code, mainly fileio.c and screenio.c. A few years ago I had a similar problem: I had to run our programs on HP-UX, AIX, Solaris, AT&T, SCO, etc. At first we started with solutions similar to yours: conditional IFs that determine the actions according to the OS. Soon this caused multiple side effects. The solution that worked perfectly for us was to have one file per OS containing its particularities; at compile time a small script selected the correct includes and programs for the target OS. With this solution we were able to maintain operational versions of our system for more than 10 platforms simultaneously.
I hope these tips are useful. Kind regards,
Juan Carlos Escartí
Continuing with the thread of your answer: with BDB I prefer to follow the standard version from the SUSE repository, which I imagine is "your official version for Linux". Frankly, Ron's VBISAM 2.1.1 is much better and faster. In the stress and balance tests VBISAM is almost perfect, while BDB loses information in conversions. Changing to BDB 6.0.19 forces me to load "my libraries", do my own compilation, and so on. This makes it very difficult to track and fix bugs, because different systems can behave differently. This is why it could be interesting to have a VM where everything is exactly the same as what you are using, to make it easier to eliminate errors and problems.
Partial test results
Task: write 1.388.081 records to an indexed file (TEST000 program)
GNU 3.2.2 with VBISAM 2.2
real 2m4,048s
user 1m37,741s
sys 0m25,193s
GNU 3.2.2 with BDB
real 199m35,132s
user 197m59,859s
sys 1m20,177s
MF II v 1.1 with C-ISAM
real 5m55.896s
user 1m20.837s
sys 1m55.523s
Task: rewrite the indexed file (1.388.081 records), reading the data from a text file (TEST001 program)
GNU 3.2.2 with VBISAM 2.2
real 23m44,634s
user 23m16,076s
sys 0m28,450s
MF II v 1.1 with C-ISAM
real 0m26.588s
user 0m16.189s
sys 0m9.893s
GNU 3.2.2 with BDB
real 0m3,625s
user 0m3,032s
sys 0m0,394s
Pending:
Task: READ NEXT through the indexed file, 1.388.081 records (TEST002 program)
Task: read the indexed file by primary key, 1.388.081 records, with random keys taken from a sequential input file (TEST003 program)
We will soon attach programs and data so that the tests can be reproduced.
Kind regards
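To compare the wall-clock figures above without doing the minute/second arithmetic by hand, a small helper can be sketched as follows. It is only an illustration: the function names parse_time and slowdown are made up for this sketch, and the timings are copied from the "real" lines of the post (comma and dot decimal separators both appear there).

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '199m35,132s' to seconds.

    Both ',' and '.' are accepted as decimal separator, because the
    timings in the post were taken under different locales.
    """
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def slowdown(timings: dict, baseline: str) -> dict:
    """Return each entry's wall-clock time as a multiple of the baseline."""
    base = parse_time(timings[baseline])
    return {name: parse_time(value) / base for name, value in timings.items()}


# "real" timings copied from the post above.
TEST000_WRITE = {
    "VBISAM 2.2": "2m4,048s",
    "BDB": "199m35,132s",
    "MF C-ISAM": "5m55.896s",
}
TEST001_REWRITE = {
    "VBISAM 2.2": "23m44,634s",
    "MF C-ISAM": "0m26.588s",
    "BDB": "0m3,625s",
}

if __name__ == "__main__":
    for name, factor in slowdown(TEST000_WRITE, "VBISAM 2.2").items():
        print(f"TEST000 {name}: {factor:.1f}x VBISAM")
    for name, factor in slowdown(TEST001_REWRITE, "BDB").items():
        print(f"TEST001 {name}: {factor:.1f}x BDB")
```

The baseline is chosen per test (the fastest engine in each), so every factor reads as "times slower than the best result".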
Please provide the test program sources as I would like to test this on my system that uses BDB db-18.1.40
I think he posted the pgm and data here:
https://sourceforge.net/p/gnucobol/discussion/cobol/thread/9ecf27fcfe/?limit=25&page=4#6c6b
That's another intensive data rewriting program. These programs are a different set of tests (TEST000/004) on a single file. I'm going to see if I can finish it today and upload the programs and data.
Here are the tests, in tgz (tar gzip) format. Unpack them into any directory and read Reame.1st and Readme.md. They occupy 90 MB compressed and around 700 MB decompressed. The results depend on the GNU version and the indexing engine. As a curious note, BDB seems to have a sleep in the write: in the TEST000 timings above its system time is in the same range as VBISAM and C-ISAM, but its total time is almost 100 times that of VBISAM.
Note: The file is too large to attach to the post, so I have uploaded it to
https://www.liberatusdeudas.es/test_index_gnucob.tgz
Regards
Thank you for those test routines, this is useful.
You really should have said that TEST000 not only writes 1.388.081 records, but that it first checks if they exist (which is fine and takes nearly no time) and, more important, that this file has 13 alternate keys that all allow duplicates; and there are a lot of duplicates in there...

I just checked where the CPU cycles are spent (with the test reduced to 99999 additions). The following checks were done without DB_HOME set (that leaves BDB "less feature-complete than VB-ISAM", so to be fair one has to enable it, which leads to everything but READ taking 10-20% longer with BDB).

With BDB 5.3: 98.9% of the CPU cycles are spent in indexed_write_internal (libcob), and below that 97.4% in the function get_dupno(). This gets the highest duplicate number of the alternate keys, and that check needs to be done for each of them on every write, to create a unique key for the alternate "databases".

If you know that and don't need a direct read with the alternate keys (in most cases those are only read sequentially), you can adjust the COBOL part to drop the WITH DUPLICATES and add the primary key or a record number to each key yourself. This drops the CPU usage, and therefore the time needed, by nearly 98%.

It may be possible to speed get_dupno() up by adjusting the code to do a read-back to search for the highest duplicate of a given alternate key. I'll do a draft tomorrow and post results back, maybe even a patch. If this works out then we get an improved speed for writing to ALTERNATE keys that have a WITH DUPLICATES clause, and a huge improvement if there are a lot of duplicates.

But none of those tests creates the PANIC - can you provide a testcase for that, too?
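The cost difference described above can be illustrated with a toy model. This is not the libcob code: a sorted Python list stands in for the alternate-key "database", and the function names (next_dupno_scan, next_dupno_readback, write_all) are invented for this sketch. It only shows why visiting every existing duplicate on each write grows quadratically, while a read-back from the end of the key's range looks at a single entry.

```python
import bisect


def next_dupno_scan(index, key):
    """Forward scan: visit every entry already stored for `key`.

    Returns (next duplicate number, number of entries visited).
    """
    pos = bisect.bisect_left(index, (key, 0))
    highest, visited = 0, 0
    while pos < len(index) and index[pos][0] == key:
        highest = index[pos][1]
        visited += 1
        pos += 1
    return highest + 1, visited


def next_dupno_readback(index, key):
    """Read-back: position just past the last possible entry for `key`
    and step back once, so at most one entry is inspected."""
    pos = bisect.bisect_right(index, (key, float("inf")))
    if pos > 0 and index[pos - 1][0] == key:
        return index[pos - 1][1] + 1, 1
    return 1, 0


def write_all(keys, next_dupno):
    """Insert all keys, numbering duplicates with the given strategy.

    Returns the resulting index and the total number of entries visited
    while looking up duplicate numbers.
    """
    index, total_visited = [], 0
    for key in keys:
        dupno, visited = next_dupno(index, key)
        bisect.insort(index, (key, dupno))
        total_visited += visited
    return index, total_visited


if __name__ == "__main__":
    # Many identical (here: empty) alternate keys, as in the test data.
    keys = [""] * 2000
    _, scan_cost = write_all(keys, next_dupno_scan)
    _, readback_cost = write_all(keys, next_dupno_readback)
    print("forward scan visits:", scan_cost)
    print("read-back visits:   ", readback_cost)
```

Both strategies produce the same numbering; only the number of entries touched per write differs. Making the alternate key unique in the COBOL program (by appending the primary key) avoids the lookup altogether.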
Do you have DB_HOME set?

@juanc [feature-requests:#455] includes not-well-tested code to improve BDB times for everything but DELETE and READ with the primary key, mostly for WRITE (cut time down to less than 1%!) and REWRITE.

Can you please run make checkall with the change applied, to verify that the change does not include any "known up front" regressions?

Just one note: It seems that at least part of the alternate keys in the test are commonly identical "empty". For improved performance you may consider using SUPPRESS KEY and therefore drop those from being written for the alternate keys.

Related
Wish List: #455
Last edit: Simon Sobisch 2023-11-10
Thanks for your work Simon
I am going to digest your answer since I still don't understand the GNUCOBOL jargon very well and I have to familiarize myself with the C programs and the compiler.
The results you get are really good and seem to solve this performance problem.
The tests that I have given you are a part of a program that passes all the data, about 80 files and 2,550,000 records, and I think GNU has some problems. It is generally where I find panics (in complete passes or updates), and also slowing down of the general execution.
Today's error:
try to reference unallocated memory (signal SIGSEGV)
*** glibc detected *** /usr1/condor/proggnu/GNUPDAT: free(): invalid pointer: 0x0acf0238 ***
*** glibc detected *** /usr1/condor/proggnu/GNUPDAT: corrupted double-linked list: 0x0acf0238 ***
Inconsistency detected by ld.so: dl-open.c: 221: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
Today this error occurred in the DIARIO file, at the record with key 42963688.
The behavior snowballs with both VBISAM and BDB. As it reads more records, it slows down more and more, and increasingly often it aborts with errors like this one, which do not reproduce in exactly the same way on the next run.
I am attaching the complete pass program. If you are interested, I will prepare a complete test set of the program with its data.
I'm going to try the solutions you've made and I'll tell you something shortly.
Thank you again and best regards
Hi @juanc,
coming back to this issue (BDB side). So: did you still have problems with the performance patch for BDB referenced above?

If so: is the environment variable DB_HOME set? This more or less activates the locking necessary for multi-user/multi-process environments on single files, and it changes BDB's behaviour a lot. If it is set, then please try without it.

If you still have the issue then it would likely be reasonable to try a newer BDB version (depending on your plans to distribute binaries elsewhere, this would mean BDB 6 something, otherwise the most current one - @chaat could help with finding the download).

And if the issue happens with different records, we could still use a tool like rr (if you run on supported hardware) to record the run, so once it fails we can actually debug the issue using its recording. Of course, that will be interesting in general, also for performance work.
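To follow the suggestion of trying the same run with and without DB_HOME, a small wrapper along these lines could be used. It is a sketch only: env_without and timed_run are hypothetical helper names, and the command to run (for example a cobcrun invocation of the test module) is whatever you pass on the command line.

```python
import os
import subprocess
import sys
import time


def env_without(name, base=None):
    """Return a copy of the environment with variable `name` removed."""
    env = dict(os.environ if base is None else base)
    env.pop(name, None)
    return env


def timed_run(command, env):
    """Run `command` with the given environment.

    Returns (exit code, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    result = subprocess.run(command, env=env)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed usage: python compare_db_home.py cobcrun TEST000
    command = sys.argv[1:]
    if command:
        runs = {
            "current environment": dict(os.environ),
            "without DB_HOME": env_without("DB_HOME"),
        }
        for label, env in runs.items():
            code, seconds = timed_run(command, env)
            print(f"{label}: exit code {code}, {seconds:.1f}s")
    else:
        print("usage: compare_db_home.py COMMAND [ARGS...]")
```

Each run should start from a fresh copy of the data files, otherwise the second run measures a different workload than the first.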
GNUPDAT Program with libraries displayed
Dear Simon
Reading your answer further it seems that this is a side effect of trying to keep many indexed handlers in the same file, see post https://sourceforge.net/p/gnucobol/bugs/928/#bd64
Everything that seems to have to be done in the C program that manages indexed files is solved by C-ISAM, VBISAM, and probably BDB natively. iswrite, isrewrite, isrewcurr etc are native functions of the isam libraries.
I think putting an "extra" layer to the indexing manager will always degrade its performance, and we will have lateral and erratic errors.
I am going to prepare the same programs made in C with C-ISAM and VBISAM and that are exactly the same as what the COBOL programs do. I think that within 2 weeks I will be able to have them.
I think that these programs could remain as generic utilities for transferring data from plain text files to VBISAM and/or BDB and if there are version changes in VBISAM.
With them we will see the loss factor that we have in GNU and in MF with the indexing handler of the compiler over the native C handler, and we can easily isolate the errors due to the indexing libraries and those that are due to the GNUCOBOL compiler, more easily.
Kind regards
There is no layer with any of the ISAM libraries... nearly (if those are known to not support sparse keys or returning a status 02, then there's a layer to provide the same effects to the COBOL program).
There must be a handler with BDB as this is just an unstructured key/value db using btree, which, at the time of implementation, was widely available.
There is an option in BDB to use "secondary" DBs and to have multiple DBs in one file, which moves a bit of the layer into the BDB functions, but that would be a bigger rewrite (patches welcome, especially when adding the COBOL fixed file attributes to one of the secondary DBs - this is totally missing in our layer).
Summary: you won't get different results "in C", especially not with ISAM libraries. To get different results with BDB you have to change the logic.
Last edit: Simon Sobisch 2023-11-12
Note on the previous ticket.
At 6:33 today I started a GNUPDAT run on VBISAM.
It's 09:25 now; I have to leave and it's not finished yet. It has been running for almost 180 minutes so far.
In other applications that I have in C with C-ISAM, and with VBISAM in C, exactly the same job does not take more than 2 or 3 minutes.
That is a VBISAM-specific issue. Please create a separate bug report specifying the exact version of GnuCOBOL and VBISAM used (in the latter case: where did you get it from) and a test program for this single scenario.
This should be relatively easy to tackle afterwards.
Last edit: Simon Sobisch 2023-11-12
I'm back from my outing
It's 4:41 p.m., almost 12 hours
The process continues
Performance similar to 8-bit processors and systems from the 80s.
Well that's what we have
I hope we solve it soon
Kind regards
Reading your answers I almost totally agree.
It seems to be a VBISAM rewrite issue.
With BDB the rewrite is very fast.
If it is a VBISAM or GNUCOBOL-VBISAM problem, I think the answer will be given by the C program that does exactly the same thing.
I think the important thing is to find the origin of the problems and solve them.
This will make GNUCOBOL the only alternative for the coming decades.
Kind regards
You provide test data and COBOL test program in a new bug issue, I reproduce and fix it. No need for some C program here...