#1050 Link Plugin v1.2.0 concurrency issues

unspecified
unread
nobody
1.2.4
5
2014-08-26
2013-10-22
Zack Sawyer
No

While making the LinkAPI I stumbled upon a possible concurrency issue. After reviewing the code of the link-plugin to see how this is handled there, I found that it is not handled at all.

Since the shared memory is freely available for reading and writing (as far as the OS permits), it is possible that, due to the frequent (per-frame) updates, the data changes while the members of the struct are being read one after another.

I am imagining a call chain of lm->xxx followed by lm->yyy, where the first call returns a result for dwcount 1 and the second returns a result for dwcount 2, because an update happened in between.

While this may actually be insignificant and possibly negligible, it constitutes unexpected behavior: one would expect all data collected "at the same time" (i.e. within the fetch function) to belong to the same frame/dwcount, which is currently not guaranteed.
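
For illustration, here is a minimal sketch of such an unguarded read. The struct is a heavily simplified, hypothetical stand-in for the shared block, not the actual Link struct:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified stand-in for the shared block; the real
// struct contains more fields (name, identity, context, description, ...).
struct LinkedMem {
    uint32_t dwcount;            // assumed per-frame update counter
    float    fAvatarPosition[3];
    float    fAvatarFront[3];
};

// Unguarded read: each member is copied independently, so the game may write
// a new frame between the two copies and the vectors then belong to
// different dwcount values.
void fetch_unguarded(const LinkedMem *shared, float pos[3], float front[3]) {
    std::memcpy(pos,   shared->fAvatarPosition, sizeof(shared->fAvatarPosition));
    // <-- a per-frame update by the game can land here
    std::memcpy(front, shared->fAvatarFront,    sizeof(shared->fAvatarFront));
}
```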

The way to fix this, imho, is to use a semaphore. It might be best to use whatever the OS provides in terms of semaphores, or alternatively to use dwcount to match the data: simply put, if dwcount differs between before the first read and after the last read, the whole data group is invalid.
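
Continuing the hypothetical sketch above, the dwcount-matching variant could look roughly like this; it is only an assumed layout, not existing plugin code:

```cpp
// Sample dwcount before and after the copy; if it changed, the game updated
// the block mid-read and the whole data group is rejected (retried a few
// times, then reported as invalid). The volatile qualifier keeps the compiler
// from caching or reordering these reads; weakly ordered CPUs would
// additionally need read barriers.
bool fetch_consistent(const volatile LinkedMem *shared,
                      float pos[3], float front[3]) {
    for (int attempt = 0; attempt < 4; ++attempt) {
        const uint32_t before = shared->dwcount;
        for (int i = 0; i < 3; ++i) {
            pos[i]   = shared->fAvatarPosition[i];
            front[i] = shared->fAvatarFront[i];
        }
        const uint32_t after = shared->dwcount;
        if (before == after)
            return true;   // no update observed: data group is consistent
    }
    return false;          // kept racing: treat the whole data group as invalid
}
```

Note that this only detects updates that complete during the copy; to also catch a write that is still in progress, the writer would have to bump the counter both before and after writing.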

Discussion

  • Stefan H.
    2013-10-25

    For reference a C&P from https://github.com/zsawyer/mumble-LinkAPI/issues/1#issuecomment-26849323:

    "The possibility of races is a known issue for the link protocol. While every link consuming application must be prepared to handle invalid data the effect of the races are rather harmless. The most likely fields to race are the float vectors and if those aren't from the same frame this (usually) won't cause any audible errors. The other fields don't change so much and if you actually happen to race on them you'll have garbage data until the next read (which likely isn't noticeable either). We though about locking schemes but that's not that straight-forward cross-platform."

    As to how this might be resolved in the future: I'm not sure which way is best. It's tricky. Reliable behavior based on fields in the structure isn't that easy to manage, as you would need memory barriers to ensure your assumptions about when a field is updated are actually correct (see the writer-side sketch below). System-specific synchronization primitives would have to be used in a completely non-blocking way (can't block the game, can't block fetching), which should certainly be possible.

    What's most important is to keep it simple. If users can't understand what we do at first glance they won't bother taking the risk of integrating this code into their game. The current code gives us that, so we are reluctant to touch it for what - from experience - is still a theoretical issue (not the races - those will happen - but visible negative effects on UX).

     
    Last edit: Stefan H. 2013-10-25
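
    For what a barrier-aware, non-blocking scheme would roughly involve, here is a hedged writer-side sketch matching the reader-side dwcount check in the original report. The double-increment protocol and the separate std::atomic counter are assumptions about a possible future variant; in the existing protocol the counter is a plain field inside the struct, written once per frame:

    ```cpp
    #include <atomic>
    #include <cstdint>

    // Hypothetical writer side (the game). LinkedMem refers to the simplified
    // stand-in struct sketched in the report above. The counter is bumped to
    // an odd value before writing and back to an even value afterwards, so a
    // reader that sees the same even value before and after its copy knows no
    // write overlapped it. The default (sequentially consistent) fetch_add
    // acts as a full barrier on common implementations; choosing weaker
    // orderings correctly is exactly the subtlety mentioned above.
    void update_frame(LinkedMem *shared, std::atomic<uint32_t> &counter,
                      const float pos[3], const float front[3]) {
        counter.fetch_add(1);                 // odd: write in progress
        for (int i = 0; i < 3; ++i) {
            shared->fAvatarPosition[i] = pos[i];
            shared->fAvatarFront[i]    = front[i];
        }
        counter.fetch_add(1);                 // even again: write complete
    }
    ```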
  • Zack Sawyer
    2013-10-26

    Thanks for your reply.

    Also quoting from that issue thread:

    "The API could probably provide the cross-platform capabilities where possible (pthreads and pthreads-win32 come to mind) but also support legacy non-synced structures. This way the API user doesn't have to worry about race conditions but simply uses the API.

    While in most games the race conditions might not have an audible effect, I imagine that this is going to be a problem in games with high velocities (flight or space simulators), where movement means a big change in those position vectors. I also remember having had weird audio positioning effects while in a vehicle in PR:BF2 - that might be related.

    What adds to my fear of race conditions is that I imagine they always happen when the write cycle is faster than the read cycle. I am not sure how likely that is, but it definitely depends on the implementation of the code using the Link plugin."

    It is true that the concept of synchronization barriers is trickier than simply reading/writing whenever. However, I imagine that the clientele we are talking about (game developers) would have knowledge of the parallel-processing domain. And even if not, the proposed legacy support would allow them to ignore the above (a rough sketch of such a cross-platform, non-blocking lock follows at the end of this comment).

    Aside from that, the memory structure could probably use an overhaul anyway. The huge context and the tiny identity no longer seem to fit the current use cases.
    As for the context: it need not hold more than the game and server info; everything else is managed by the identity's contents. It seems to me that in the cases where extended support is used, the first thing that happens is that the player is separated/moved into a different channel, assigned talk rights, and all those other complex things, based on a very size-limited identity.
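
    To make the cross-platform idea from the quote above concrete, here is a hedged sketch of a non-blocking lock wrapper such an API could offer: a named mutex on Windows and a process-shared pthread mutex placed in the shared memory on POSIX. The function names, the mutex name and the overall shape are assumptions, not existing LinkAPI code:

    ```cpp
    #ifdef _WIN32
    #include <windows.h>

    // Named mutex: the game process and Mumble open the same kernel object.
    static HANDLE link_mutex = nullptr;

    bool link_lock_init(void * /*shared_region, unused on Windows*/) {
        link_mutex = CreateMutexW(nullptr, FALSE, L"MumbleLinkMutex"); // name is an assumption
        return link_mutex != nullptr;
    }
    bool link_try_lock() { return WaitForSingleObject(link_mutex, 0) == WAIT_OBJECT_0; }
    void link_unlock()   { ReleaseMutex(link_mutex); }

    #else
    #include <pthread.h>

    // The mutex has to live inside the shared memory region so that the game
    // and Mumble lock the same object across the process boundary.
    static pthread_mutex_t *link_mutex = nullptr;

    bool link_lock_init(void *shared_region) {
        link_mutex = static_cast<pthread_mutex_t *>(shared_region);
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        return pthread_mutex_init(link_mutex, &attr) == 0;
    }
    bool link_try_lock() { return pthread_mutex_trylock(link_mutex) == 0; }
    void link_unlock()   { pthread_mutex_unlock(link_mutex); }
    #endif
    ```

    Both sides would use try-lock only, so neither the game's frame loop nor Mumble's fetch ever blocks; when the lock cannot be taken, the caller simply skips that cycle. Legacy (unsynchronized) games keep working because they never touch the mutex at all.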