#27 gtcm (omi) fails with BI journaling

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-12-29
Created: 2002-03-13
Creator: Edwin Clubb
Private: No

The gtcm_server process dies when attempting to access
any GTM database that has before-image journaling
enabled. It also leaves behind corrupted shared
memory, so that the database cannot be run down until
the system has been rebooted. Normal journaling does
not cause any problems.

Running gtcm_server with the "-D -" option produced a
large number of error messages. Here is a sample:

%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-I-FILERENAME, File /usr/local/vmacs/email.mjl is
renamed to /usr/local/vmacs/email.mjl_2002071121207_4
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-I-FILERENAME, File /usr/local/vmacs/email.mjl is
renamed to /usr/local/vmacs/email.mjl_2002071121207_5
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat

We (the VMTH, UC Davis) are in the process of replacing
our DTM server with GTM and need to be able to both
support existing DTM clients using OMI and replicate
our production database on a backup server. Thanks.

Discussion

  • parigi prasad
    2002-03-13

    Hi,

    We couldn't reproduce the reported problem in our
    in-house testing. The error JNLTRANSGTR means that the last
    record in the journal file has a greater transaction number
    than the database. In such instances, GTM automatically
    creates a new journal file and all subsequent updates go to
    this new journal file.
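
    The behavior described above can be sketched roughly as follows. This is an
    illustrative sketch, not GT.M source; the function name and types are
    hypothetical:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical sketch of the JNLTRANSGTR decision: if the last journal
     * record's transaction number exceeds the database's current transaction
     * number, the existing journal file cannot be appended to, so it is
     * renamed aside and a fresh journal file is cut. */
    typedef uint64_t trans_num;

    /* Returns 1 if a new journal file must be created, 0 otherwise. */
    int needs_new_journal(trans_num jnl_tn, trans_num db_tn)
    {
        return jnl_tn > db_tn;   /* the JNLTRANSGTR condition */
    }

    int main(void)
    {
        /* Journal ahead of the database (e.g. after a restore): cut a new file. */
        assert(needs_new_journal(1005, 1000) == 1);
        /* Normal case: journal and database agree, keep appending. */
        assert(needs_new_journal(1000, 1000) == 0);
        printf("JNLTRANSGTR fires only when journal tn > database tn\n");
        return 0;
    }
    ```

    This matches the renaming seen in the log above (email.mjl being renamed
    with a timestamp suffix before a new email.mjl is created).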

    Please let us know the steps used to recreate the
    reported problem, and also the GTM version you tested with.

    Thanks,
    Prasad.

     
  • Steven Estes
    2002-03-13

    Ed, could you clarify whether you are doing database
    replication via your own OMI application, or whether you are
    using GT.M's database replication feature (and hence before-image
    journaling) with an OMI server? Thanks.

    Steve

     
  • Edwin Clubb
    2002-03-16

    Steve,

    The answer to your question is both. We have a production DTM server that we are "shadowing" in near realtime
    (using a DTM utility) to a GTM database via OMI. This has been working very well. We are getting close to the
    point where we want to swap out our DTM server with GTM, but leave the DTM client systems in place. We will
    continue to use them for some time to come. Much of the code running on these systems will be rewritten for
    GTM, but this will allow us to migrate at a manageable pace. Before going live with GTM, however, we also want to
    replicate the GTM server to a failover system, hence the need for BI journaling.

    I think the initial error messages I reported are actually an artifact from the original error. I compiled and ran the
    debug version of gtcm_server and here is what happened when I re-enabled before image journaling:

    Thu Mar 14 12:28:12 2002
    gtcm_server: socket registered at port 6100
    gtcm_server: connection 1 from lemon.vmth.ucdavis.edu (5.37.237.169) by user <> at Thu Mar 14 12:30:07 2002
    Thu Mar 14 12:30:15 2002
    gtcm_server: connection 1 to lemon.vmth.ucdavis.edu (5.37.237.169) closed
    gtcm_server: 8 seconds connect time
    gtcm_server: 2 transactions
    gtcm_server: 0 errors
    gtcm_server: 197 bytes recv'd
    gtcm_server: 95 bytes sent
    gtcm_server: connection 2 from lemon.vmth.ucdavis.edu (5.37.237.169) by user <LEMON> at Thu Mar 14 12:30:37 2002
    %GTM-F-ASSERT, Assert failed /usr/local/gtm/sr_port/jnl_output.c line 145
    %GTM-F-ASSERT, Assert failed line 0

    Line 145 appears to be:
    assert(n_with_align < jb->alignsize);

    I notice that there is an undocumented "mupip set -journal" suboption "alignsize", which I didn't use. Would this be
    related to my problem? Is it related to the unusual block size I have set up (16K)?
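
    The failing assert could plausibly be about exactly this interaction. As a
    rough sketch (not GT.M source; the constants and helper name below are
    assumptions for illustration): a BI journal record carries a full database
    block plus a header, and if that record, padded out to the next alignment
    boundary, is not smaller than alignsize, an assert of the form
    `n_with_align < alignsize` would fail:

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define JREC_HDR 64u   /* assumed per-record header overhead, illustrative */

    /* Round rec_len up to the next multiple of alignsize -- roughly the kind
     * of padded length a computation ahead of the failing assert would see. */
    unsigned n_with_align(unsigned rec_len, unsigned alignsize)
    {
        return ((rec_len + alignsize - 1) / alignsize) * alignsize;
    }

    int main(void)
    {
        unsigned alignsize = 8192;            /* hypothetical alignsize value */
        unsigned bi_4k  = 4096  + JREC_HDR;   /* BI record for a 4K block */
        unsigned bi_16k = 16384 + JREC_HDR;   /* BI record for a 16K block */

        /* A 4K before-image fits within one alignment window... */
        assert(n_with_align(bi_4k, alignsize) <= alignsize);
        /* ...but a 16K before-image cannot, so a strict check like
         * assert(n_with_align < alignsize) would fail. */
        assert(n_with_align(bi_16k, alignsize) > alignsize);
        printf("a 16K BI record cannot fit in an %u-byte alignment window\n",
               alignsize);
        return 0;
    }
    ```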

    Originally, we were shadowing our DTM system to a single GTM database. This had grown pretty large (12+ GB)
    and so I decided to break it up. I extracted all the data to a flat file using mupip. I then created a new global
    directory with 4 regions and loaded everything back in. This worked fine and I was able to continue shadowing our
    DTM system to the new GTM database files. The problem only began when I enabled BI journaling. Normal
    journaling works fine. Furthermore, I am not able to reproduce the problem with BI journaling enabled by making
    simple global sets (i.e. a for loop making sets to a single global). However, the DTM shadow utility, which
    replicates the global sets from our DTM server, causes gtcm_server to die every time I attempt it.

    I am using GTM 4.3 on Red Hat 7.2. I also tried this on Red Hat 7.1, and different hardware, with identical results.

     
  • Steven Estes
    2002-03-19

    Ed,

    While we are not aware of any specific problems with the
    GT.M OMI server and journaling, it is also not a high-use
    component (I can think of one other customer that may be
    using it) and thus has a very low level of exercise,
    testing, and compatibility checks. Being as resource
    constrained as we are right now, we would need a failing
    test case (not involving DTM, which we do not have) in order
    to pursue this much further on our end.

    But you also may be on the right track with the block size
    being 16K. This is not a size any customers I know of are
    currently using, so again, it doesn't get as much exercise
    as the more typical block sizes of 4K and 8K. There are
    issues with journaling when the block size exceeds 16K, but
    it could be that there is a cusp problem with the 16K size
    itself.

    We are also still not clear whether you are just using BI
    journaling so you can manually restore the journals on
    another database, or whether you have full-blown GT.M
    replication source and update servers doing the updates in
    realtime. If it is the latter, then the general consensus
    here is that there are likely issues with the OMI server
    participating in replication, as it is not a configuration
    we have tested. The challenge would likely be getting the
    replication instance file set up correctly.

    One more thought: if you drop from a 16K to an 8K
    block size, the size of your journal file will be cut in
    half for the same number of updates, since the BI journal is
    comprised of full blocks. That's not only a lot less
    disk space but half the I/O time -- to the database,
    journals, and over the wire to your replication partner. I
    think the standard RH 7.2 system supports large files, so
    the journal will get to 4G and switch to a new journal file
    on the fly. Hope this helps.
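
    The arithmetic behind this can be sketched as follows (a back-of-envelope
    illustration that ignores per-record header overhead):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A before-image journal carries a full database block per updated block,
     * so halving the block size roughly halves journal growth, and doubles
     * the number of updates that fit before the ~4 GiB large-file point
     * forces a switch to a new journal file. */
    int main(void)
    {
        const uint64_t limit = 4ull * 1024 * 1024 * 1024;   /* ~4 GiB journal */

        uint64_t updates_16k = limit / 16384;   /* block images at 16K */
        uint64_t updates_8k  = limit / 8192;    /* block images at 8K */

        /* Twice as many updates fit in the same journal with 8K blocks. */
        assert(updates_8k == 2 * updates_16k);

        printf("~%llu updates per journal at 16K, ~%llu at 8K\n",
               (unsigned long long)updates_16k,
               (unsigned long long)updates_8k);
        return 0;
    }
    ```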

    Steve

     
  • Edwin Clubb
    2002-03-20

    Steve,

    My problems with before image journaling appear to be resolved. After reading your recommendations to use
    smaller block sizes, I found that I was in fact able to squeeze our data into 4K blocks (barely!), using a record size
    of 3924 and key size of 172. After extracting all our global data, recreating the database files and loading the
    globals back in, I was able to turn on BI journaling and run GTCM_SERVER without any problems. Thanks for all
    your help.

    Ed