
Ticket #431 (closed defect: fixed)

Opened 17 months ago

Last modified 17 months ago

Read-only tx per query on cluster defeats cache

Reported by: thompsonbry Owned by: thompsonbry
Priority: major Milestone: Query
Component: Bigdata Federation Version: TERMS_REFACTOR_BRANCH
Keywords: Cc:

Description

As documented at [1], using a read-only tx per query on the cluster carries a major performance hit. The problem is not the overhead of the interaction with the transaction manager. Rather, the distinct transaction identifiers defeat the cache mechanisms in the data service, all of which are based on the long timestamp associated with the request. The tx ids assigned to queries all read from the same commit point, but they are nevertheless distinct values, and hence the cache entries are not reused.
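The effect can be illustrated with a minimal, self-contained sketch. It is not the data service code: `ViewCache`, the way tx ids are generated, and the `"view@"` values are all hypothetical stand-ins, assumed only for illustration. The one property taken from the description above is that the cache looks entries up by the long timestamp on the request.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampKeyedCacheSketch {

    /** Hypothetical cache that looks up entries by the long timestamp on the request. */
    static final class ViewCache {
        private final Map<Long, String> views = new HashMap<>();
        int misses = 0;

        String get(long timestamp, long commitTime) {
            return views.computeIfAbsent(timestamp, k -> {
                misses++; // a miss means the view has to be materialized again
                return "view@" + commitTime;
            });
        }
    }

    /** Each query gets its own tx id, although every tx reads the same commit point. */
    static int missesWithTxPerQuery(int queries, long commitTime) {
        ViewCache cache = new ViewCache();
        for (int i = 0; i < queries; i++) {
            long txId = -(commitTime + i + 1); // hypothetical scheme; only distinctness matters
            cache.get(txId, commitTime);
        }
        return cache.misses;
    }

    /** Each query carries the commit time itself as the request timestamp. */
    static int missesWithCommitTimeReads(int queries, long commitTime) {
        ViewCache cache = new ViewCache();
        for (int i = 0; i < queries; i++) {
            cache.get(commitTime, commitTime);
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        long commitTime = 1_000L;
        System.out.println("tx per query:      " + missesWithTxPerQuery(5, commitTime) + " misses");
        System.out.println("commit-time reads: " + missesWithCommitTimeReads(5, commitTime) + " misses");
    }
}
```

In this sketch, five queries issued under five distinct tx ids miss the cache five times, while five queries that carry the same commit time miss once and then reuse the entry.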

This issue exists to document the problem as it pertains to query and its resolution via the proposed workaround. See [1] for the long term fix for this and related cache problems. A workaround for this issue is also described at [1].

[1] https://sourceforge.net/apps/trac/bigdata/ticket/266 (Refactor native long tx id to thin object.)

Change History

Changed 17 months ago by thompsonbry

  • status changed from new to accepted

Per [1], modified the BigdataSail to use a read-historical operation rather than a read-only transaction for getReadOnlyConnection(). The commit time against which the read is being carried out MUST be pinned by the application to prevent state associated with that commit point from being released. Typically this is coordinated by the NanoSparqlServer or by the application using a higher level protocol.

Committed revision r5783.

[1] https://sourceforge.net/apps/trac/bigdata/ticket/266#comment:3 (Refactor native long tx id to thin object.)

Changed 17 months ago by thompsonbry

Bug fix to the change set above. It was testing the AbstractTripleStore instance rather than the backing IIndexManager.

Committed revision r5801.

Changed 17 months ago by thompsonbry

Modified the NanoSparqlServer to support -1 for the readLock option. This will pin the last commit time on the database. This is now the default for the nanoSparqlServer.sh script on a cluster. This change was made to improve the efficiency of queries against a fixed commit time, which is the most common use case for a cluster.
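A rough sketch of how the -1 value could be resolved is shown below. This is not the NanoSparqlServer implementation: the class, the method, and the constant name are hypothetical, and treating any other value as an explicit commit time is an assumption made for the example.

```java
public class ReadLockSketch {

    /** Hypothetical constant for the special readLock value described above. */
    static final long LAST_COMMIT_TIME = -1L;

    /**
     * Resolves the commit time to pin.
     *
     * @param readLock       the configured readLock option
     * @param lastCommitTime the last commit time on the database, supplied by the caller
     */
    static long resolveCommitTime(long readLock, long lastCommitTime) {
        if (readLock == LAST_COMMIT_TIME) {
            return lastCommitTime; // -1 means: pin the last commit time
        }
        return readLock; // assumption: any other value is an explicit commit time
    }

    public static void main(String[] args) {
        System.out.println(resolveCommitTime(-1L, 5_000L));
        System.out.println(resolveCommitTime(4_200L, 5_000L));
    }
}
```

Resolving the value once at startup means every query reads against the same commit time, which is what lets the timestamp-based caches be reused.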

Raised the READ_CONSISTENT option onto the IBigdataClient interface. The ClientIndexView pays attention to this when a read-only operation is requested for a global index view which is either READ_COMMITTED or UNISOLATED.

Committed revision r5803.

Changed 17 months ago by thompsonbry

  • status changed from accepted to closed
  • resolution set to fixed

When -readLock is specified to the NanoSparqlServer, the timestamp winds up being a read-only transaction identifier. Various fixes and optimizations related to READ_LOCK, including support for READ_LOCK := -1 (last commit time on the database).

Committed revision r5804.
