The gtcm_server process dies when attempting to access
any GT.M database that has before-image journaling
enabled. It also leaves behind corrupted shared
memory, so the database cannot be run down until
the system has been rebooted. Normal journaling does
not cause any problems.
Running gtcm_server with the "-D -" option produced a
large number of error messages. Here is a sample:
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-I-FILERENAME, File /usr/local/vmacs/email.mjl is
renamed to /usr/local/vmacs/email.mjl_2002071121207_4
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-I-FILERENAME, File /usr/local/vmacs/email.mjl is
renamed to /usr/local/vmacs/email.mjl_2002071121207_5
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
%GTM-E-JNLTRANSGTR, transaction number in journal is
greater than in database
%GTM-E-JNLOPNERR, Error opening journal file
/usr/local/vmacs/email.mjl
for region /usr/local/vmacs/email.dat
We (the VMTH, UC Davis) are in the process of replacing
our DTM server with GT.M, and we need both to support
existing DTM clients via OMI and to replicate our
production database to a backup server. Thanks.
Logged In: YES
user_id=210966
Hi,
We couldn't reproduce the reported problem in our
in-house testing. The JNLTRANSGTR error means that the
last record in the journal file has a greater transaction
number than the database. In such instances, GT.M
automatically creates a new journal file, and all
subsequent updates go to this new journal file.
Please let us know the steps used to recreate the
reported problem, and also the GT.M version you tested with.
Thanks,
Prasad.
Logged In: YES
user_id=97877
Ed, could you clarify whether you are doing database
replication via your own OMI application, or you are using
GT.M's database replication feature (and hence before-image
journaling) with an OMI server? Thanks.
Steve
Logged In: YES
user_id=89296
Steve,
The answer to your question is both. We have a production DTM server that we are "shadowing" in near real time
(using a DTM utility) to a GT.M database via OMI. This has been working very well. We are getting close to the
point where we want to swap out our DTM server for GT.M, but leave the DTM client systems in place. We will
continue to use them for some time to come. Much of the code running on these systems will be rewritten for
GT.M, but this will allow us to migrate at a manageable pace. Before going live with GT.M, however, we also want to
replicate the GT.M server to a failover system, hence the need for BI journaling.
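For reference, turning on before-image journaling for a region is typically done with MUPIP SET. A sketch (the region name is illustrative; verify the exact option spellings against the GT.M Administration and Operations Guide for your version):

```shell
# Enable and turn on before-image journaling for region DEFAULT
# (region name illustrative, not taken from this report).
mupip set -journal="enable,on,before_images" -region DEFAULT
```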
I think the initial error messages I reported are actually an artifact of the original error. I compiled and ran the
debug version of gtcm_server, and here is what happened when I re-enabled before-image journaling:
Thu Mar 14 12:28:12 2002
gtcm_server: socket registered at port 6100
gtcm_server: connection 1 from lemon.vmth.ucdavis.edu (5.37.237.169) by user <> at Thu Mar 14 12:30:07 2002
Thu Mar 14 12:30:15 2002
gtcm_server: connection 1 to lemon.vmth.ucdavis.edu (5.37.237.169) closed
gtcm_server: 8 seconds connect time
gtcm_server: 2 transactions
gtcm_server: 0 errors
gtcm_server: 197 bytes recv'd
gtcm_server: 95 bytes sent
gtcm_server: connection 2 from lemon.vmth.ucdavis.edu (5.37.237.169) by user <LEMON> at Thu Mar 14 12:30:37
2002
%GTM-F-ASSERT, Assert failed /usr/local/gtm/sr_port/jnl_output.c line 145
%GTM-F-ASSERT, Assert failed line 0
Line 145 appears to be:
assert(n_with_align < jb->alignsize);
I notice that there is an undocumented "mupip set -journal" suboption "alignsize", which I didn't use. Would this be
related to my problem? Is it related to the unusual block size I have set up (16K)?
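For what it's worth, the failing assert can be read as an invariant: a single journal record (for BI journaling, roughly a before-image of one database block plus a record header) must fit strictly inside one alignsize-sized chunk of the journal. A minimal sketch of that invariant — the overhead constant and function name are illustrative assumptions, not GT.M's actual code:

```python
# Illustrative sketch of the invariant behind
#   assert(n_with_align < jb->alignsize);
# in jnl_output.c: a journal record, padded out to its alignment,
# must fit strictly inside one alignsize-sized chunk of the journal.
JNL_RECORD_OVERHEAD = 64  # assumed per-record header bytes (hypothetical)

def record_fits(block_size, alignsize, overhead=JNL_RECORD_OVERHEAD):
    """Would a before-image record for one block satisfy the assert?"""
    n_with_align = block_size + overhead  # record length incl. padding
    return n_with_align < alignsize

# A 4K before-image fits inside a 16K alignsize chunk, but a 16K
# before-image (plus any overhead) cannot.
print(record_fits(4096, 16384))    # True
print(record_fits(16384, 16384))   # False
```

Under this reading, a 16K block size could trip the assert wherever the effective alignsize is not comfortably larger than the block size, which would be consistent with the crash appearing only at unusual block sizes.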
Originally, we were shadowing our DTM system to a single GTM database. This had grown pretty large (12+ GB)
and so I decided to break it up. I extracted all the data to a flat file using mupip. I then created a new global
directory with 4 regions and loaded everything back in. This worked fine and I was able to continue shadowing our
DTM system to the new GTM database files. The problem only began when I enabled BI journaling. Normal
journaling works fine. Furthermore, I am not able to reproduce the problem with BI journaling enabled by making
simple global sets (i.e. a for loop making sets to a single global). However, the DTM shadow utility, which
replicates the global sets from our DTM server, causes gtcm_server to die every time I attempt it.
I am using GTM 4.3 on Red Hat 7.2. I also tried this on Red Hat 7.1, and different hardware, with identical results.
Logged In: YES
user_id=97877
Ed,
While we are not aware of any specific problems with the
GT.M OMI server and journaling, it is also not a high-use
component (I can think of one other customer that may be
using it) and thus gets a very low level of exercise,
testing, and compatibility checks. Being as resource-
constrained as we are right now, we would need a failing
test case (not involving DTM, which we do not have) in order
to pursue this much further on our end.
But you also may be on the right track with the block size
being 16K. This is not a size any customers I know of are
currently using, so again, it doesn't get as much exercise
as the more typical block sizes of 4K and 8K. There are
issues with journaling when the block size exceeds 16K, but
it could be that there is a boundary problem at the 16K size
itself.
We are also still not clear whether you are just using BI
journaling so you can manually restore the journals on
another database, or whether you have full-blown GT.M
replication source and update servers applying the
updates in real time. If it is the latter, the general
consensus here is that there are likely issues with the OMI
server participating in replication, as it is not a
configuration we have tested. The challenge would likely be
getting the replication instance file set up correctly.
One more thought: if you drop from a 16K to an 8K
block size, the size of your journal file will be cut in
half for the same number of updates, since the BI journal is
comprised of full blocks. That's not only a lot less
disk space but half the I/O time -- to the database, to the
journals, and over the wire to your replication partner. I
think the standard RH 7.2 system supports large files, so
the journal will grow to 4GB and then switch to a new
journal file on the fly. Hope this helps.
Steve
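Steve's size arithmetic above can be sketched numerically. This is a rough model assuming each update journals exactly one full before-image block, ignoring record headers and non-BI journal records:

```python
def bi_journal_bytes(updates, block_size):
    """Rough BI journal growth: one full before-image block per update."""
    return updates * block_size

million = 1_000_000
at_16k = bi_journal_bytes(million, 16 * 1024)
at_8k = bi_journal_bytes(million, 8 * 1024)

print(at_16k // at_8k)     # 2: halving the block size halves the journal
print(at_16k / (1 << 32))  # ~3.8: a million 16K updates approaches the
                           # 4GB point where the journal would switch
```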
Logged In: YES
user_id=89296
Steve,
My problems with before-image journaling appear to be resolved. After reading your recommendation to use
smaller block sizes, I found that I was in fact able to squeeze our data into 4K blocks (barely!), using a record size
of 3924 and a key size of 172. After extracting all our global data, recreating the database files, and loading the
globals back in, I was able to turn on BI journaling and run gtcm_server without any problems. Thanks for all
your help.
Ed
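As a closing sanity check, the sizes Ed quotes are internally consistent: with 4K blocks, his chosen maximum record and key sizes sum exactly to the block size. This is a numeric observation on the quoted figures, not a statement of GT.M's actual sizing rule:

```python
BLOCK_SIZE = 4 * 1024   # 4K blocks
MAX_RECORD = 3924       # record size Ed configured
MAX_KEY = 172           # key size Ed configured

# The two limits partition the block exactly: 3924 + 172 = 4096.
print(MAX_RECORD + MAX_KEY == BLOCK_SIZE)  # True
```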