From: Richard H. <rj...@za...> - 2015-11-19 12:25:33
Hi Stuart,

> I've started looking at the HLA integration for FG, after some very

This is very good news. I've been wondering for a while about HLA.

> A lot of the challenge is likely to be in designing the data to be
> shared across the RTI, which I think will require a lot more thought
> than one might naively assume.

Sharing data across compute nodes is the classic problem of distributed simulation. Firstly, some history, as this has fortunately already happened and been solved.

In the early 1980s the (large mini) computers that we were using for the simulation host had pretty much reached the peak of possible performance with the technology. At that time the main vendors were Gould (SEL, Encore), Perkin-Elmer (Concurrent Computer Corporation) and DEC/VAX. I'm not going to cover DEC/VAX as only CAE used them, and although I believe that DEC hardware was the best I never actually worked on a sim that used it.

At Singer Link-Miles (SLM, Lancing, England), in around 1982, development started on a multi-computer solution using individual boards to provide compute nodes, each with an Intel CPU (80286), using 10BaseT ethernet to provide the interconnect[1]. This was named MST or FDS. It was a famously late and not very successful project. The basic problem was transferring data over ethernet fast enough to meet the latency requirements whilst maintaining a 30Hz real-time executive. The project had great potential because of the low cost of the 80286; I think this approach had been used, and had worked well, in their Image(tm) series of image generators. Although a few simulators were shipped with FDS, SLM went out of business shortly afterwards, so we'll never know if they could have got it working.

At around the same time Rediffusion Simulation Limited (RSL, Crawley, England) were developing SCI-Clone, which was built around the same distributed-compute-nodes principle, except they were using a SEL 32/27 CPU/PPU and their newly developed Reflective Memory technology to provide shared memory across the compute nodes. Of the two approaches the RSL one was probably more expensive in terms of hardware cost[2], but the jewel of this system was the Reflective Memory.

Reflective Memory had an area of RAM[3] for local reading and used a hardware bus (running at 26MHz) to simultaneously write modified values across all of the nodes connected to it. Simple, efficient and a genius design. The instant a value was changed on one node it would be written to all of the other nodes, and a developer would not need to be aware that the local RAM was mapped to a reflected area. A typical simulator configuration would be 3 or 4 32/27 SCI-Clone nodes connected via RM. Reflective Memory allowed what we called datapool to be shared between all of the nodes. Datapool was typically around 64k, but never much more than 128k[4].

So the main simulation computer was running nicely and was what we'd now call scalable; if you ran out of spare time you added a node (and the project manager winced at the budgetary impact). At the time (1984-1986) RSL were also developing their revolutionary touch-screen-based Instructor Station; but this was running on an M68k-based Versabus system running under a unixalike, which was connected to the main simulation computer via RS232 to access the shared memory. Although delivered late and full of bugs the Instructor Station did work, but RS232 datapool communications were not fast or reliable[5].
Development of the Reflective Memory continued with a VME version of the hardware. This is where it starts to get interesting; the VME card had 2MB of onboard RAM and operated exactly as any VME memory card would. So, at last, the Instructor Station could join in the shared memory party. Following on from this, the simulation compute nodes moved over from SCI-Clone[7] to run on VME CPU cards (167/197) - which achieved the goal that SLM had set out to achieve with their FDS. Obviously the move to VME required significant software development.

The way that the RM was used differed between the two instructor stations that were in use at the time (it's around 1989 now). The two approaches were:

1 - Directly access datapool based on the address; the name-to-address translation was provided by a dictionary accessible both at runtime and build time. This was used on the second instructor station (runtime and buildtime) and the main simulation nodes (usually via a /COMMON/ block).

2 - Have a translation (data access) layer that mapped the datapool variables to a local index based on ID. All access through the layer used the LocalID - and this was handled by both the serial comms and the reflective memory versions, depending on which layer was loaded.

I wrote the instructor station module for reflective memory comms, and it is still in use today on a fair number of certified simulators; the extra layer was optimised and the overhead wasn't noticeable. The performance improvement of switching the instructor station to reflective memory was staggering compared to serial comms. Under serial comms the B747-400 engines maintenance page would overload the serial link such that only around half the variables on display would be updated in a second.

In real-time simulators all nodes must start each frame at exactly the same point by the wall clock and the simulator clock. Any node drift causes massive problems that aren't fixable. A node can start late but it must finish on time. In a certified simulator, if any node runs over the end of the frame[8] the whole thing (including the motion base) performs an emergency stop and needs to be restarted. The image generator is an exception to this, as it is usually only receiving position information[9] and tends to do its own thing in terms of frame rate, independently of the host simulation nodes' frame rate. Image generators have been known to drop frames, and part of the certification process requires banking the aircraft over to fill the screens with terrain and checking for any evidence of frame drop. There is no locking in a real-time simulator; you can't do this, as modules will overrun the frame and the simulation will halt.

To (finally) return to the point of all of this:

> A lot of the challenge is likely to be in designing the data to be
> shared across the RTI

So the first thing in the HLA is to figure out how to synchronize the start time of each node. If you don't do this then sharing the data becomes much harder. Latency is one of the main problems. It is not sufficient to merely ensure that nodes start off with the same data (i.e. a bulk update at the end of each frame) - as this introduces a minimum of one frame of latency and often more.
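To make the frame-start point concrete, something like the following - a wall-clock locked 30Hz loop with an overrun check. This is only an illustrative sketch (the function names and the 27ms budget from [8] are placeholders), not a proposal for the actual FG main loop:

    // Minimal sketch of a wall-clock locked 30Hz frame executive (illustrative only).
    #include <chrono>
    #include <iostream>
    #include <thread>

    void runFrame(unsigned /*frame*/)
    {
        // placeholder: flight, engines, systems, ...
    }

    int main()
    {
        using clock = std::chrono::steady_clock;
        const auto frame_period = std::chrono::microseconds(33333); // 30Hz
        const auto frame_budget = std::chrono::microseconds(27000); // leave spare time

        auto frame_start = clock::now();
        for (unsigned frame = 0;; ++frame) {
            runFrame(frame);

            if (clock::now() - frame_start > frame_budget) {
                // A node may start late but it must finish on time; on a
                // certified sim an overrun here triggers the emergency stop.
                std::cerr << "frame " << frame << " overran\n";
            }

            // The next frame starts at an absolute point in time, so drift
            // cannot accumulate; nodes sharing the same epoch stay in step.
            frame_start += frame_period;
            std::this_thread::sleep_until(frame_start);
        }
    }
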
The second thing is to realise that the property tree as it currently stands is in need of some rework because of the ownship (single desktop aircraft) approach. This is easier than it sounds - basically most of the property tree becomes part of the aircraft and only a few items are shared. This will also allow the switching of aircraft. The reason to consider this now, and maybe not implement it, is to ensure that the design will support it when it is time to implement it.

Now, at last, onto the sharing of data. I recommend that we follow the reflective memory approach of transparent write-through of node writes - but in software. There are two ways to implement this: for localhost we use a shared memory region (boost has good ones that I've used), and for distributed nodes a network interface - or, for the very fortunate[10], the boost shared memory region can be mapped across multiple nodes using hardware. There's a rough sketch of the localhost case further below.

For this to work, all writes really need to be reflected across all nodes at the exact point the write is made in any given node. The avoidance of locks requires strict coding discipline; usually a given module in a system is considered by the developers to own the write access to its output variables. There is also the demand/monitor convention: if you want to request the change of a property that your module doesn't own (e.g. the user changing altitude via Phi), it would be written to a different location in the property tree (e.g. /position/demanded-altitude) - and the owning module (usually flight) would pick this up, eventually set /position/altitude to the demanded altitude, and then reset /position/demanded-altitude to a value that indicates it can again be modified (-10e100 is a suitable value for this).
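Purely as an illustration of the demand/monitor convention, using a std::map as a stand-in for the property tree (the paths and the -10e100 sentinel are from the example above; the helper names are made up, and the real thing would of course go through the property tree proper):

    // Sketch of the demand/monitor convention against a std::map stand-in
    // for the property tree.
    #include <iostream>
    #include <map>
    #include <string>

    static const double kNoDemand = -10e100;  // "no demand pending" marker

    using PropertyTree = std::map<std::string, double>;

    // Any module (e.g. the UI via Phi) may *demand* a new altitude...
    void demandAltitude(PropertyTree& props, double feet)
    {
        if (props["/position/demanded-altitude"] == kNoDemand)
            props["/position/demanded-altitude"] = feet;
    }

    // ...but only the owning module (usually flight) writes the real value.
    void flightModuleFrame(PropertyTree& props)
    {
        const double demanded = props["/position/demanded-altitude"];
        if (demanded != kNoDemand) {
            props["/position/altitude"] = demanded;            // honour the demand
            props["/position/demanded-altitude"] = kNoDemand;  // release it again
        }
    }

    int main()
    {
        PropertyTree props{{"/position/altitude", 3000.0},
                           {"/position/demanded-altitude", kNoDemand}};
        demandAltitude(props, 10000.0);
        flightModuleFrame(props);
        std::cout << props["/position/altitude"] << "\n";      // prints 10000
    }
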
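And, for the localhost shared memory region mentioned above, a minimal boost::interprocess sketch just to show the shape of it - the segment name, size and Datapool layout here are illustrative only; deciding the real shared layout is exactly the design work that needs the thought:

    // Minimal sketch of a localhost "datapool" in a boost shared memory region.
    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <iostream>

    namespace bip = boost::interprocess;

    struct Datapool {
        double altitude_ft = 0.0;
        double heading_deg = 0.0;
        // ... one slot per shared property; the layout must match on every node.
    };

    int main()
    {
        // Create (or attach to) a 64k region shared by every process on this host.
        bip::managed_shared_memory segment(bip::open_or_create, "fg-datapool", 65536);

        // Every node gets the same single Datapool instance by name.
        Datapool* pool = segment.find_or_construct<Datapool>("datapool")();

        // A write here is immediately visible to every other process that has
        // mapped the segment - write-through, much like the RM card.
        pool->altitude_ft = 10000.0;
        std::cout << pool->altitude_ft << "\n";
    }
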
Having worked with the two methods I outlined above for sharing data over reflective memory, I would recommend adopting the second approach: having a layer in each node that handles the shared data and maps it to the property tree. It's slightly less efficient but it will allow network-based reflective memory and listeners to work. It is also important to separate out anything that might take longer than a frame into a background process (in terms of the simulation). Other points to consider with reflective memory systems are endianness and different formats for floating point.

Despite appearances, moving the rendering to a separate node/process is probably the easiest to implement, as the inter-module dependencies aren't that strong - or at least those that are strong can be shipped as a block update at the end of the simulation frame.

We should use boost for shared memory. When I wrote SwiftIOS - which is basically an operating system (and CPU) emulation at the unix system call level - I used boost to implement the shared memory and a fair few other things, and it worked really well.

I'm volunteering to help design and code this.

--Richard

-----------------------------------------------------
[1] Initially the design was to transfer 'large' blocks (32k) of shared data over ethernet at the start or end of each frame. This should theoretically have worked but took too long and became slower the more nodes were added. A different method of sharing data may have been used on the (few) simulators that were shipped with FDS.
[2] ISTR the 32/27 was tens of thousands of dollars - but it worked out at about the same as a single larger computer.
[3] The designers of the RM interface made allowance for up to 2MByte of shared memory; given that the amount required to be shared was 32k for a 737-sized aircraft, up to around 64k for a 747, this was a suitably over-engineered system - remember that at the same time someone else was convinced that 640k would be sufficient. The RM designers went for something way over-specified.
[4] Aircraft with more engines and systems, such as the B52, were rather large, but I don't know the exact numbers.
[5] Serial comms were weak in terms of performance and reliability, although some of this was down to poor coding and understanding of the way to do I/O on the unixalike - and I fixed most of these issues 20 years later.
[6] Or MultiSel, or whatever oversized room-filling kit was in use.
[7] The Swiss Hark simulator was MultiSel and had 4 nodes, all of which were rehosted onto a single M88K-based system and ran at twice the rate (60Hz) with ample spare time.
[8] On a 30Hz simulator we had 33ms, but the spare-time requirements reduced this to 27ms.
[9] Starting with SPX, image generators used a dedicated 10BaseT ethernet running at the protocol level.
[10] I'm fortunate as I've a system (on loan) with a set of reflective memory cards.