
Performance

Shane Saxon

[Home] - [Architecture] - [Development_Strategy] - hosted at openANTz.com

ANTz - minimize latency, while maximizing bandwidth and processing power.

Also see [Performance_Tips] for how to choose hardware and optimize content.

updated on 2012-07-17


Performance

Latency is under 33ms for most user operations at full frame rate (60FPS).

User commands typically affect the displayed scene state within 1-2 cycles. However, some functions, such as loading a large dataset, can take seconds or minutes. Internal calculations mostly complete within 1 cycle, and live external IO (keyboard, mouse, audio, etc.) takes 1-2 cycles. Video latency is typically 3-7 frames or more, depending on the compression codec. Low-latency video of 2-3 frames is possible with either nVidia Quadro SDI Capture or AMD FirePro SDI-Link.
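The cycle and frame counts above can be converted to milliseconds. The sketch below is illustrative arithmetic only, and it assumes that one "cycle" equals one displayed frame at the stated 60FPS; the function name cycles_to_ms is hypothetical and not part of ANTz.

```python
# Frame-time arithmetic for the latency figures quoted above.
# Assumption: one cycle == one frame at the full frame rate of 60 FPS.
FRAME_RATE_HZ = 60


def cycles_to_ms(cycles, frame_rate_hz=FRAME_RATE_HZ):
    """Convert a number of display cycles (frames) to milliseconds."""
    return cycles * 1000.0 / frame_rate_hz


# (low, high) cycle counts taken from the text.
budget = {
    "user command (1-2 cycles)": (1, 2),
    "video, typical codec (3-7 frames)": (3, 7),
    "video, SDI capture (2-3 frames)": (2, 3),
}

for label, (low, high) in budget.items():
    print(f"{label}: {cycles_to_ms(low):.1f}-{cycles_to_ms(high):.1f} ms")
```

Under that assumption, two cycles come to roughly 33ms, which lines up with the latency target for user operations, while 3-7 frames of video latency is about 50-117ms.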

ANTz minimizes latency by running nearly all operations in a pseudo-state-machine fashion. Most operations complete within a single cycle, and multi-cycle logic is avoided at all costs. However, occasional system hiccups (on MS Windows and OSX) consume about 50-100ms at random intervals. These can be minimized through careful system management or by simply using Linux. For details see [State_Machine].
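As a rough illustration of the single-cycle idea, the sketch below advances a small scene state once per frame and fully applies every pending command inside that one step. It is not ANTz code: SceneState, step, run and the command names are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass


@dataclass
class SceneState:
    """Hypothetical scene state; everything a frame needs lives here."""
    camera_x: float = 0.0
    velocity_x: float = 0.0
    frame: int = 0


def step(state, commands, dt):
    """Advance the state by exactly one cycle.

    Every command is fully applied inside this call, so no operation
    is left half-finished for a later cycle.
    """
    velocity = state.velocity_x
    for command in commands:
        if command == "move_left":
            velocity = -1.0
        elif command == "move_right":
            velocity = 1.0
        elif command == "stop":
            velocity = 0.0
    return SceneState(
        camera_x=state.camera_x + velocity * dt,
        velocity_x=velocity,
        frame=state.frame + 1,
    )


def run(frames_of_commands, dt=1.0 / 60.0):
    """Run one step per frame; the command lists stand in for live input."""
    state = SceneState()
    for commands in frames_of_commands:
        state = step(state, commands, dt)
    return state


# Three frames: start moving, keep moving, then stop.
final = run([["move_right"], [], ["stop"]])
print(final)
```

Because each step depends only on the current state and the commands received in that cycle, the worst-case response time stays bounded by the cycle length, which is the property the paragraph above describes.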


Hardware Benchmarks

Laptop - (MacBook Pro - Early 2011 - 2.2Ghz i7, 8GB, Radeon HD 6750M 1GB)

25,000 nodes rendered in realtime (15-30fps) as 3D objects.

Desktop - (ASUS Rampage IV, 3.6GHz i7 3820, 16GB 4X4GB, GeForce GTX 680)

Workstation - (ASUS Z9PE-D8 WS, Dual 3.3GHz Xeon E5-2643, 32GB 8X4GB, GeForce GTX 680)

100,000 nodes rendered in realtime (15-30fps) as 3D objects.

  • approx. 30 visual/spatial parameters per node
  • 100,000 x 30 = 3 million visualized parameters

Desktop and Workstation performance are similar. Future upgrades to the code will likely change this by utilizing more cores and enabling improved IO.
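The Desktop and Workstation figures above imply a per-frame workload that can be worked out directly. The sketch below is plain arithmetic on the quoted numbers; the per-node time budget is a derived average, not a measured ANTz result, and the helper name per_node_budget_us is hypothetical.

```python
NODES = 100_000        # node count from the benchmark above
PARAMS_PER_NODE = 30   # approximate visual/spatial parameters per node
FPS_RANGE = (15, 30)   # reported realtime frame rate range

params_per_frame = NODES * PARAMS_PER_NODE


def per_node_budget_us(fps, nodes=NODES):
    """Average time available per node within one frame, in microseconds."""
    return 1_000_000 / fps / nodes


for fps in FPS_RANGE:
    print(f"{fps} fps: {params_per_frame * fps:,} parameters/s, "
          f"{per_node_budget_us(fps):.2f} us per node")
```

At 15-30fps this works out to 45-90 million parameters visualized per second, with well under a microsecond of frame time available per node on average.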


Things to Consider

Stereoscopic 3D with OpenGL requires an nVidia Quadro or ATI FirePro card.

It is important to have a good power supply and plenty of cooling.

The GPU is the primary performance factor. ANTz currently does NOT support SLI or CrossFireX multi-GPU architectures, so for now the fastest single-chip GPU solution is best.

CPU clock rate is a secondary factor; the more GHz the better.

RAM should be fast enough to match the CPU(s). To run at full speed, multi-core CPUs require multiple sticks of RAM, based on the number of cores.

ANTz has been tested to work on netbooks and older systems running Windows XP, Linux and Mac OSX 10.5.8 on Intel and G4 PowerPC systems.

Systems without a separate GPU (that is, with an integrated / on-board video chip) perform poorly.


Realtime IO Channels

This is an approximate performance estimate based on related tests using Fibre Channel to the Violin Memory 1010 FLASH cache.

4Gbps of IO - node parameters to/from track data.

The maximum size of the external dataset depends on the network infrastructure.
- Multi-terabyte datasets are accessible in realtime when using FLASH cache.
- 100,000 *IOPS per ANTz system requires a minimum of 4TB of SSA storage (Solid State Array based on FLASH).

*IOPS = IO operations per second (4kB records).
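The IOPS figure can be related to the bandwidth figure with simple arithmetic. The sketch below assumes that "4kB" means 4,000 bytes per record (4,096 is also common) and that 1 Gb is 10^9 bits; the function name iops_to_gbps is hypothetical.

```python
IOPS = 100_000        # IO operations per second, from the text
RECORD_BYTES = 4_000  # "4kB" read as 4,000 bytes (assumption)


def iops_to_gbps(iops, record_bytes):
    """Sustained throughput in gigabits per second (1 Gb = 10**9 bits)."""
    return iops * record_bytes * 8 / 1e9


throughput_gbps = iops_to_gbps(IOPS, RECORD_BYTES)
print(f"{throughput_gbps:.1f} Gbps")
```

Under these assumptions, 100,000 IOPS of 4kB records is about 3.2Gbps of sustained transfer, which is in the same range as the 4Gbps IO figure quoted above.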

**In general, the above numbers are conservative estimates based on real-world testing of various core methods that ANTz builds upon, using state-of-the-art hardware such as Enterprise-level FLASH cache by Texas Memory Systems (TMS) and Violin Memory (VM). These manufacturers claim specs 2-5X higher, which are 'not-to-exceed' figures.


Near Future:

Projected future performance is based on code optimizations that improve the efficiency of the OpenGL draw routines and take advantage of multi-GPU configurations.

Estimate based on a 4-way SLI config using a pair of GeForce GTX 690 cards.

500,000 records displayed as 3D objects (pin nodes)
- 30 distinct parameters displayed per record object
- 500,000 x 30 = 15 million visualized parameters

10Gb/s realtime data
- core technology tests done using the Texas Memory RAMSAN 610 and Violin Memory 1010.

Cluster Visual Environments:
We plan to use the Equalizer C++ library to help with synchronization of cluster-based visual environments, such as VR Caves. ANTz is currently based on GLUT, which allows for porting to systems such as the Star Cave at Calit2.

Genlock is a crucial component of cluster-based tiled displays that is often overlooked (or ignored due to the cost / performance of commodity game cards). When operating in a high-speed environment, it is often deemed 'necessary' to synchronize the vertical refresh across all systems. Available solutions include the nVidia Quadro G-Sync card and the AMD ATI FirePro S400 Synchronization Module.

To help with physics interactions and data processing, we plan to use OpenCL by the Khronos Group. It allows writing C-based code that runs across multicore GPU, Cell and CPU processors.


Ultimate Goal:

The ultimate goal is to close the human-computer cognitive loop.

  • Latency should be kept below the threshold at which humans perceive interactions in real-time.
  • IO bandwidth should be on par with the spatial and temporal resolution of the nervous system's external senses.
  • Processing power must be sufficient to create the illusion of a real-world physical environment.

It is a reasonable expectation that capable hardware will exist in the not-too-distant future (if Moore's Law holds up). Do not confuse this requirement with that of a full human-level AI, which is not required. We have plenty of people to form the 'Intelligence' component of the mixed neural network. We simply need to link human beings in a fully immersive mixed reality.


Related

Wiki: Architecture
Wiki: Development_Strategy
Wiki: Home
Wiki: Performance_Tips
Wiki: State_Machine