Thank you for this report. FYI, with 2.0 we also put an index on doFields.pid, which made ingest performance scale much more linearly.
2.0 also includes a rework of the internal implementation of ingest, which significantly cuts down the base ingest time. If you really want to squeeze out the best possible ingest performance with 2.0, you can disable the Resource Index (the new searchable RDF triplestore we added), which isn't essential to the core operation of the server.
Looking at the current DB schema I don't see an index on doRegistry.doPID. It sounds like your analysis concluded that an index would help there, too. We'll look into doing that for 2.1.
From: email@example.com on behalf of Tim DiLauro
Sent: Thu 3/10/2005 9:34 AM
To: Fedora Users list
Subject: [Fedora-users] Fedora 1.2.1 and MySQL has poor ingest performance when many existing objects
We've been testing Fedora 1.2.1 (and, separately, DSpace) as part of a
project to explore archiving, format migration, and export issues.
Because we ingested our test archive multiple times without removing
previous content, we had an opportunity to see, qualitatively, how a
Fedora 1.2.1 repository performed during ingest when it contained a
lot of content.
As the amount of content increased, we began to experience very
bad ingest performance. At the same time, the MySQL process was
consuming more CPU. After monitoring the queries, it became clear
that only a few queries were causing the problems: they were
forcing full table scans on doRegistry and doFields.
I was able to solve the problem by adding two new indexes with the
following statements:
create index doPID on doRegistry (doPID);
create index PID on doFields (pid);
The performance improvement was dramatic and the ingest times scale
more linearly with the size of the ingest (as opposed to the amount of
content already in the repository).
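To check whether an index like this is actually used, you can ask the database for its query plan before and after creating it. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server. In MySQL the corresponding check is EXPLAIN, and its output format is different. The two-column doRegistry table and the "demo:N" PIDs are hypothetical simplifications and do not reflect the real Fedora schema. The CREATE INDEX statement is the one from the report.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable step description.
    return " | ".join(str(row[-1]) for row in rows)


def demo() -> tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the doRegistry table
    # (the real Fedora schema has more columns).
    conn.execute("CREATE TABLE doRegistry (doPID TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO doRegistry VALUES (?, ?)",
        [(f"demo:{i}", f"object {i}") for i in range(1000)],
    )
    lookup = "SELECT * FROM doRegistry WHERE doPID = ?"

    # Without an index, the lookup by PID has to scan the whole table.
    before = query_plan(conn, lookup, ("demo:42",))

    # Same statement as in the report.
    conn.execute("create index doPID on doRegistry (doPID)")

    # With the index, the planner can search it instead of scanning.
    after = query_plan(conn, lookup, ("demo:42",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The "before" plan should report a SCAN of doRegistry. The "after" plan should report a SEARCH that uses the doPID index. That difference is why lookup cost stops growing with the number of objects already in the repository.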
I don't know if these issues are applicable to 2.0, since we have not
tested it yet.