From: Richard H. <rj...@za...> - 2015-11-19 12:25:33
Hi Stuart,

> I've started looking at the HLA integration for FG, after some very

This is very good news. I've been wondering for a while about HLA.

> A lot of the challenge is likely to be in designing the data to be
> shared across the RTI, which I think will require a lot more thought
> than one might naively assume.

Sharing data across compute nodes is the classic problem of distributed simulation. Firstly, some history, as this has fortunately already happened and been solved.

In the early 1980s the (large mini) computers that we were using for the simulation host had pretty much reached the peak of possible performance with the technology. At that time the main vendors were Gould (SEL, Encore), Perkin-Elmer (Concurrent Computer Corporation) and DEC/VAX. I'm not going to cover DEC/VAX as only CAE used them, and although I believe that DEC hardware was the best I never actually worked on a sim that used it.

At Singer Link-Miles (SLM, Lancing, England), in around 1982, development started on a multi-computer solution using individual boards to provide compute nodes, each with an Intel CPU (80286), using 10BaseT ethernet to provide the interconnect[1]. This was named MST or FDS. It was a famously late and not very successful project. The basic problem was transferring data over ethernet fast enough to meet the latency requirements whilst maintaining a 30Hz real-time executive. The project had great potential because of the low cost of the 80286; I think this approach had been used, and had worked well, in their Image(tm) series of image generators. Although a few simulators were shipped with FDS, SLM went out of business shortly afterwards, so we'll never know if they could have got it working.

At around the same time Rediffusion Simulation Limited (RSL, Crawley, England) were developing SCI-Clone, which was built around the same distributed-compute-nodes principle, except they were using a SEL 32/27 CPU/PPU and their newly developed Reflective Memory technology to provide shared memory across the compute nodes. Of the two approaches the RSL one was probably more expensive in terms of hardware cost[2], but the jewel of this system was the Reflective Memory.

Reflective Memory had an area of RAM[3] for local reading and used a hardware bus (running at 26MHz) to simultaneously write modified values across all of the nodes connected to it. Simple, efficient and a genius design. The instant a value was changed on one node it would be written to all of the other nodes, and a developer would not need to be aware that the local RAM was mapped to a reflected area. A typical simulator configuration would be 3 or 4 32/27 SCI-Clone nodes connected via RM. Reflective Memory allowed what we called datapool to be shared between all of the nodes. Datapool was typically around 64k, but never much more than 128k[4].

So the main simulation computer was running nicely and was what we'd now call scalable; if you ran out of spare time you added a node (and the project manager winced at the budgetary impact). At the time (1984-1986) RSL were also developing their revolutionary touch-screen-based Instructor Station; but this was running on an M68k-based Versabus system running under a unixalike, which was connected to the main simulation computer via RS232 to access the shared memory. Although delivered late and full of bugs the Instructor Station did work, but RS232 datapool communications were not fast or reliable[5].
Development of the Reflective Memory continued with a VME version of the hardware. This is where it starts to get interesting; the VME card had 2MB of onboard RAM and operated exactly as any VME memory card would. So, at last, the Instructor Station could join in the shared memory party. Following on from this, the simulation compute nodes moved over from SCI-Clone[7] to run on VME CPU cards (167/197) - which achieved the goal that SLM had set out to achieve with their FDS. Obviously the move to VME required significant software development.

The way that the RM was used differed between the two instructor stations that were in use at the time (it's around 1989 now). The two approaches were:

1 - Directly access datapool based on the address; the name-to-address translation was provided by a dictionary accessible both at runtime and build time. This was used on the second instructor station (runtime and buildtime) and the main simulation nodes (usually via a /COMMON/ block).

2 - Have a translation (data access) layer that mapped the datapool variables to a local index based on ID. All access through the layer used the LocalID - and this was handled by both the serial comms and the reflective memory versions, depending on which layer was loaded.

I wrote the instructor station module for reflective memory comms, and it is still in use today on a fair number of certified simulators; the extra layer was optimised and the overhead wasn't noticeable. The performance improvement of switching the instructor station to reflective memory was staggering compared to serial comms. Under serial comms the B747-400 engines maintenance page would overload the serial link such that only around half the variables on display would be updated in a second.

In real-time simulators all nodes must start each frame at exactly the same point by the wall clock and the simulator clock. Any node drift causes massive problems that aren't fixable. A node can start late but it must finish on time. In a certified simulator, if any node runs over the end of the frame[8] the whole thing (including the motion base) performs an emergency stop and needs to be restarted. The image generator is an exception to this, as it is usually only receiving position information[9] and tends to do its own thing in terms of frame rate, independently of the host simulation nodes' frame rate. Image generators have been known to drop frames, and part of the certification process requires banking the aircraft over to fill the screens with terrain and checking for any evidence of frame drop. There is no locking in a real-time simulator; you can't do this, as modules will overrun the frame and the simulation will halt.

To (finally) return to the point of all of this:

> A lot of the challenge is likely to be in designing the data to be
> shared across the RTI

So the first thing in the HLA is to figure out how to synchronize the start time of each node. If you don't do this then sharing the data becomes much harder. Latency is one of the main problems. It is not sufficient to merely ensure that nodes start off with the same data (i.e. a bulk update at the end of each frame) - as this introduces a minimum of one frame of latency and often more.
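To make the frame-start point concrete, something like the following - a wall-clock locked 30Hz loop with an overrun check. This is only an illustrative sketch (the function names and the 27ms budget from [8] are placeholders), not a proposal for the actual FG main loop:

    // Minimal sketch of a wall-clock locked 30Hz frame executive (illustrative only).
    #include <chrono>
    #include <iostream>
    #include <thread>

    void runFrame(unsigned /*frame*/)
    {
        // placeholder: flight, engines, systems, ...
    }

    int main()
    {
        using clock = std::chrono::steady_clock;
        const auto frame_period = std::chrono::microseconds(33333); // 30Hz
        const auto frame_budget = std::chrono::microseconds(27000); // leave spare time

        auto frame_start = clock::now();
        for (unsigned frame = 0;; ++frame) {
            runFrame(frame);

            if (clock::now() - frame_start > frame_budget) {
                // A node may start late but it must finish on time; on a
                // certified sim an overrun here triggers the emergency stop.
                std::cerr << "frame " << frame << " overran\n";
            }

            // The next frame starts at an absolute point in time, so drift
            // cannot accumulate; nodes sharing the same epoch stay in step.
            frame_start += frame_period;
            std::this_thread::sleep_until(frame_start);
        }
    }
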
The second thing is to realise that the property tree as it currently stands is in need of some rework because of the ownship (single desktop aircraft) approach. This is easier than it sounds - basically most of the property tree becomes part of the aircraft and only a few items are shared. This will also allow the switching of aircraft. The reason to consider this now, and maybe not implement it, is to ensure that the design will support it when it is time to implement it.

Now, at last, onto the sharing of data. I recommend that we follow the reflective memory approach of transparent write-through of node writes - but in software. There are two ways to implement this: for localhost we use a shared memory region (boost has good ones that I've used), and for distributed nodes a network interface - or, for the very fortunate[10], the boost shared memory region can be mapped across multiple nodes using hardware. There's a rough sketch of the localhost case further below.

For this to work, all writes really need to be reflected across all nodes at the exact point the write is made in any given node. The avoidance of locks requires strict coding discipline; usually a given module in a system is considered by the developers to own the write access to its output variables. There is also the demand/monitor convention: if you want to request the change of a property that your module doesn't own (e.g. the user changing altitude via Phi), it would be written to a different location in the property tree (e.g. /position/demanded-altitude) - and the owning module (usually flight) would pick this up, eventually set /position/altitude to the demanded altitude, and then reset /position/demanded-altitude to a value that indicates it can again be modified (-10e100 is a suitable value for this).
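Purely as an illustration of the demand/monitor convention, using a std::map as a stand-in for the property tree (the paths and the -10e100 sentinel are from the example above; the helper names are made up, and the real thing would of course go through the property tree proper):

    // Sketch of the demand/monitor convention against a std::map stand-in
    // for the property tree.
    #include <iostream>
    #include <map>
    #include <string>

    static const double kNoDemand = -10e100;  // "no demand pending" marker

    using PropertyTree = std::map<std::string, double>;

    // Any module (e.g. the UI via Phi) may *demand* a new altitude...
    void demandAltitude(PropertyTree& props, double feet)
    {
        if (props["/position/demanded-altitude"] == kNoDemand)
            props["/position/demanded-altitude"] = feet;
    }

    // ...but only the owning module (usually flight) writes the real value.
    void flightModuleFrame(PropertyTree& props)
    {
        const double demanded = props["/position/demanded-altitude"];
        if (demanded != kNoDemand) {
            props["/position/altitude"] = demanded;            // honour the demand
            props["/position/demanded-altitude"] = kNoDemand;  // release it again
        }
    }

    int main()
    {
        PropertyTree props{{"/position/altitude", 3000.0},
                           {"/position/demanded-altitude", kNoDemand}};
        demandAltitude(props, 10000.0);
        flightModuleFrame(props);
        std::cout << props["/position/altitude"] << "\n";      // prints 10000
    }
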
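And, for the localhost shared memory region mentioned above, a minimal boost::interprocess sketch just to show the shape of it - the segment name, size and Datapool layout here are illustrative only; deciding the real shared layout is exactly the design work that needs the thought:

    // Minimal sketch of a localhost "datapool" in a boost shared memory region.
    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <iostream>

    namespace bip = boost::interprocess;

    struct Datapool {
        double altitude_ft = 0.0;
        double heading_deg = 0.0;
        // ... one slot per shared property; the layout must match on every node.
    };

    int main()
    {
        // Create (or attach to) a 64k region shared by every process on this host.
        bip::managed_shared_memory segment(bip::open_or_create, "fg-datapool", 65536);

        // Every node gets the same single Datapool instance by name.
        Datapool* pool = segment.find_or_construct<Datapool>("datapool")();

        // A write here is immediately visible to every other process that has
        // mapped the segment - write-through, much like the RM card.
        pool->altitude_ft = 10000.0;
        std::cout << pool->altitude_ft << "\n";
    }
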
Having worked with the two methods I outlined above for sharing data over reflective memory, I would recommend adopting the second approach: having a layer in each node that handles the shared data and maps it to the property tree. It's slightly less efficient but it will allow network-based reflective memory and listeners to work. It is also important to separate out anything that might take longer than a frame into a background process (in terms of the simulation). Other points to consider with reflective memory systems are endianness and different formats for floating point.

Despite appearances, moving the rendering to a separate node/process is probably the easiest to implement, as the inter-module dependencies aren't that strong - or at least those that are strong can be shipped as a block update at the end of the simulation frame.

We should use boost for shared memory. When I wrote SwiftIOS - which is basically an operating system (and CPU) emulation at the unix system call level - I used boost to implement the shared memory and a fair few other things, and it worked really well.

I'm volunteering to help design and code this.

--Richard

-----------------------------------------------------
[1] Initially the design was to transfer 'large' blocks (32k) of shared data over ethernet at the start or end of each frame. This should theoretically have worked but took too long and became slower the more nodes were added. A different method of sharing data may have been used on the (few) simulators that were shipped with FDS.
[2] ISTR the 32/27 was tens of thousands of dollars - but it worked out at about the same as a single larger computer.
[3] The designers of the RM interface made allowance for up to 2MByte of shared memory; given that the amount required to be shared was 32k for a 737-sized aircraft, up to around 64k for a 747, this was a suitably over-engineered system - remember that at the same time someone else was convinced that 640k would be sufficient. The RM designers went for something way over-specified.
[4] Aircraft with more engines and systems, such as the B52, were rather large, but I don't know the exact numbers.
[5] Serial comms were weak in terms of performance and reliability, although some of this was down to poor coding and understanding of the way to do I/O on the unixalike - and I fixed most of these issues 20 years later.
[6] Or MultiSel, or whatever oversized room-filling kit was in use.
[7] The Swiss Hark simulator was MultiSel and had 4 nodes, all of which were rehosted onto a single M88K-based system and ran at twice the rate (60Hz) with ample spare time.
[8] On a 30Hz simulator we had 33ms, but the spare-time requirements reduced this to 27ms.
[9] Starting with SPX, image generators used a dedicated 10BaseT ethernet running at the protocol level.
[10] I'm fortunate as I've a system (on loan) with a set of reflective memory cards.