> 1. It runs very stable for 1 week, at which time it stopped responding due
> to a full disk. The journal log had grown to 370G. During the entire run
> the response times were stable.
I experienced a similar issue two weeks ago. The exploding journal log
is presumably caused by a bug concerning binary resource storage: for
unknown reasons, a binary resource stored in my database had length 0,
which caused a circular link in the page headers. Trying to save the
page information, the journal log ran into an endless loop, writing
the same pages over and over again.
I will try to resolve the circular-link problem within the next few days.
However, I'm not sure where that 0-byte binary resource came from.
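
To illustrate the kind of guard that would stop such a loop, here is a
minimal sketch. It is not the actual storage code: the page layout (a
mapping from a page number to the next page number) and the function
name walk_page_chain are made up for this example.

```python
def walk_page_chain(next_page, start):
    """Follow next-page links from `start` and return the pages visited.

    `next_page` is a hypothetical mapping: page number -> next page number,
    with None (or a missing key) marking the end of the chain.
    Raises ValueError if a page is reached twice, i.e. the links are circular.
    """
    seen = set()
    order = []
    page = start
    while page is not None:
        if page in seen:
            # Without this check the loop would revisit the same pages forever.
            raise ValueError(f"circular link detected at page {page}")
        seen.add(page)
        order.append(page)
        page = next_page.get(page)
    return order
```

A chain such as {1: 2, 2: 3, 3: None} is walked once and returned in
order, while {1: 2, 2: 1} raises instead of looping.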
> 2. If disabling the restore element, the system seems to hang every two
> minutes but sometimes for durations up to 3 minutes at a time - the client
> will error out.
The non-transactional mode now needs more testing. The test-suite
shows various issues if you disable recovery altogether. We need to
fix these. So far, I mainly worked on getting recovery stable.
> I am assuming this is related to "sync-period" but would like to understand
> it better and its relation to cache size.
sync-period triggers a database sync, i.e. all dirty pages in the
cache are written to disk at once. A sync can only occur if the
database is idle. All transactions will be stopped during this time.
If you have many dirty pages and a large cache, a sync may thus take
several seconds. In general, a huge cache will speed up queries, but
it also results in longer sync times (hmm, we should think about
writing out more pages in between sync events ... maybe depending on
With recovery enabled, a sync event is the same as a database
checkpoint, which means the database is materialized and the journal
is cleaned up. A checkpoint will also be triggered if the journal file
grows beyond the defined limit. A small journal size setting (in your
case size="10M") results in very frequent checkpoints under high load.
This explains why you don't see the long sync times with recovery enabled.
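
The interplay of the two triggers can be shown with a toy model. This is
only a sketch under simplifying assumptions (constant write rate,
one-second steps); the function and parameter names are invented for the
example and do not correspond to anything in the database code.

```python
MB = 1_000_000


def count_checkpoints(write_rate, journal_limit, sync_period, duration):
    """Count checkpoints in a toy model that advances in one-second steps.

    write_rate    -- bytes appended to the journal per second (assumed constant)
    journal_limit -- journal size in bytes that forces a checkpoint
    sync_period   -- seconds between timer-driven syncs
    duration      -- number of simulated seconds
    """
    journal = 0
    since_sync = 0
    checkpoints = 0
    for _ in range(duration):
        journal += write_rate
        since_sync += 1
        # A checkpoint fires when the journal is full or the timer expires.
        if journal >= journal_limit or since_sync >= sync_period:
            checkpoints += 1
            journal = 0      # the journal is cleaned up
            since_sync = 0   # the sync timer starts over
    return checkpoints
```

With an assumed 1 MB/s of journal writes and a two-minute sync period,
a 10 MB limit gives 12 checkpoints in two minutes, while a 100 MB limit
gives only one. Each of the frequent checkpoints has fewer dirty pages
to write, so it is short.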
> - What would be the best way to tune this server to reduce or eliminate the
> 2-3 minute delay from clients?
With recovery disabled, you could try a lower setting for sync-period
to trigger syncs more frequently. The single sync will then be faster.
Reducing the cacheSize by half would be another option.
On the other hand, with recovery enabled, the journal size could be
slightly bigger. The default of 10M is really small and may result in
high disk activity.
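
For the recovery-disabled case, a back-of-the-envelope estimate shows
why a smaller cache or a shorter sync-period shortens a single sync. The
formula is a rough assumption (pause = dirty bytes / disk throughput),
and all names and numbers are invented for illustration, not measured.

```python
MB = 1_000_000


def estimated_sync_seconds(cache_bytes, dirty_fraction, disk_bytes_per_sec):
    """Rough length of one sync pause.

    Assumes the pause is just the time needed to write every dirty page:
    (cache size * fraction of the cache that is dirty) / disk throughput.
    """
    return cache_bytes * dirty_fraction / disk_bytes_per_sec
```

With a 256 MB cache that is half dirty and a disk writing 32 MB/s, the
estimate is 4 seconds; halving the cache to 128 MB halves it to 2
seconds. Syncing more often has a similar effect because fewer pages
become dirty between two syncs.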
> - can we prevent the journal log from growing to such a huge size? (stop
> and archive, etc..)
I hope we can fix the binary resource bug soon and that the journal
log will remain within bounds afterwards.
> - If we shutdown the server safely, can the journal be removed?