Ticket #475 (closed enhancement: fixed)
Optimize serialization for query messages on cluster
| Reported by: | thompsonbry | Owned by: | thompsonbry |
|---|---|---|---|
| Priority: | major | Milestone: | Query |
| Component: | Bigdata Federation | Version: | BIGDATA_RELEASE_1_1_0 |
| Keywords: | | Cc: | |
Description (last modified by thompsonbry)
Quite a bit of the query overhead on the cluster comes from RMI serialization of the query messages. Optimize the serialization of the StartOpMessage, HaltOpMessage, and IChunkMessage. For the messages that are not already pure interfaces (startOp(), haltOp()), change the method signatures to accept interfaces only, so that the API can be versioned forward.
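A minimal sketch of both ideas, assuming a hypothetical `IStartOpMessage` interface and a `StartOpMessage` class with an illustrative subset of fields (`queryId`, `bopId`, `partitionId`); these are not the actual bigdata classes. Callers depend only on the interface, and the implementation uses `Externalizable` with a leading version byte, so the wire format is compact (no per-field descriptors) and a later release can add a new version without breaking older payloads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.UUID;

public class StartOpMessageSketch {

    /** Hypothetical interface-only view of the message; callers never see the concrete class. */
    public interface IStartOpMessage {
        UUID getQueryId();
        int getBopId();
        int getPartitionId();
    }

    /** Hypothetical implementation that writes its own compact, versioned wire format. */
    public static final class StartOpMessage implements IStartOpMessage, Externalizable {
        private static final long serialVersionUID = 1L;
        /** Written first so that readers can recognize (and later evolve) the format. */
        private static final byte VERSION0 = 0;

        private UUID queryId;
        private int bopId;
        private int partitionId;

        /** Public no-arg constructor required by Externalizable. */
        public StartOpMessage() {
        }

        public StartOpMessage(UUID queryId, int bopId, int partitionId) {
            this.queryId = queryId;
            this.bopId = bopId;
            this.partitionId = partitionId;
        }

        @Override public UUID getQueryId() { return queryId; }
        @Override public int getBopId() { return bopId; }
        @Override public int getPartitionId() { return partitionId; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(VERSION0);
            // A UUID is written as two longs instead of as a serialized object.
            out.writeLong(queryId.getMostSignificantBits());
            out.writeLong(queryId.getLeastSignificantBits());
            out.writeInt(bopId);
            out.writeInt(partitionId);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            final byte version = in.readByte();
            if (version != VERSION0) {
                throw new IOException("Unknown version: " + version);
            }
            queryId = new UUID(in.readLong(), in.readLong());
            bopId = in.readInt();
            partitionId = in.readInt();
        }
    }

    /** Serializes and deserializes a message, returning the decoded copy. */
    static IStartOpMessage roundTrip(IStartOpMessage msg) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(msg);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IStartOpMessage) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final IStartOpMessage sent = new StartOpMessage(UUID.randomUUID(), 7, 3);
        final IStartOpMessage received = roundTrip(sent);
        System.out.println(received.getQueryId().equals(sent.getQueryId())); // true
    }
}
```

Because startOp() and haltOp() would then accept only the interface, a future `StartOpMessage` variant (or a `VERSION1` encoding) can be introduced without changing those signatures.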
There is also significant overhead associated with IQueryPeer#getServiceUUID(). That overhead comes from calling getServiceUUID() on the proxy object, which turns each call into an RMI. This could be fixed by (a) using a smart proxy pattern; (b) sending only the UUID of the service rather than its proxy and resolving the proxy against the local cache of known services; or (c) casting to the appropriate interface so we can obtain the ServiceID from the proxy and then converting that directly into the UUID of the service.
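A sketch of options (a) and (b), assuming a hypothetical `RemotePeer` interface as a stand-in for IQueryPeer (its method set here is illustrative, not the real API). The smart proxy captures the UUID once and answers getServiceUUID() from a local field; the cache lets a message carry just the UUID, which the receiver resolves locally. `CountingPeer` is a hypothetical fake that counts how often the "remote" UUID lookup happens.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PeerUuidSketch {

    /** Hypothetical stand-in for a remote peer; every call on a real stub is an RMI. */
    public interface RemotePeer extends Remote {
        UUID getServiceUUID() throws RemoteException;
        void bufferReady(String chunk) throws RemoteException;
    }

    /** Option (a): a smart proxy that ships the UUID with itself, so reading it is local. */
    public static final class SmartPeerProxy implements RemotePeer, Serializable {
        private static final long serialVersionUID = 1L;
        private final UUID serviceUUID;   // captured once, serialized along with the proxy
        private final RemotePeer delegate; // the real RMI stub (RMI stubs are Serializable)

        public SmartPeerProxy(UUID serviceUUID, RemotePeer delegate) {
            this.serviceUUID = serviceUUID;
            this.delegate = delegate;
        }

        @Override
        public UUID getServiceUUID() {
            return serviceUUID; // no RMI
        }

        @Override
        public void bufferReady(String chunk) throws RemoteException {
            delegate.bufferReady(chunk); // real work still goes over RMI
        }
    }

    /** Option (b): messages carry only the UUID; the receiver resolves it locally. */
    public static final class PeerCache {
        private final Map<UUID, RemotePeer> peers = new ConcurrentHashMap<>();

        public void register(UUID id, RemotePeer peer) {
            peers.put(id, peer);
        }

        public RemotePeer resolve(UUID id) {
            final RemotePeer peer = peers.get(id);
            if (peer == null) {
                throw new IllegalStateException("Unknown service: " + id);
            }
            return peer;
        }
    }

    /** Hypothetical fake "remote" peer that counts getServiceUUID() invocations. */
    static final class CountingPeer implements RemotePeer {
        final UUID id = UUID.randomUUID();
        final AtomicInteger uuidCalls = new AtomicInteger();

        @Override
        public UUID getServiceUUID() {
            uuidCalls.incrementAndGet();
            return id;
        }

        @Override
        public void bufferReady(String chunk) {
        }
    }

    public static void main(String[] args) {
        final CountingPeer remote = new CountingPeer();
        // The UUID is fetched from the remote peer exactly once, when the proxy is built.
        final SmartPeerProxy proxy = new SmartPeerProxy(remote.getServiceUUID(), remote);
        for (int i = 0; i < 1000; i++) {
            proxy.getServiceUUID();
        }
        System.out.println("remote UUID calls: " + remote.uuidCalls.get()); // remote UUID calls: 1

        final PeerCache cache = new PeerCache();
        cache.register(remote.id, remote);
        System.out.println(cache.resolve(remote.id) == remote); // true
    }
}
```

Option (c) is not shown because it depends on the Jini `ServiceID` type, whose two 64-bit halves would map directly onto a `java.util.UUID`.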
I have broken the IChunkMessage#getQueryController() aspect out into its own issue [1]. A related issue is vectoring the messages per host in order to reduce the number of messages being sent [2].
[1] https://sourceforge.net/apps/trac/bigdata/ticket/487 (The query controller should be discoverable)
[2] https://sourceforge.net/apps/trac/bigdata/ticket/488 (Vector query engine messages)