Menu

#19 crash when sending multiple commands in rapid succession

1.0
open
nobody
None
2020-07-15
2020-07-15
Lee Harr
No

Originally reported on Google Code with ID 44 ``` What steps will reproduce the problem? telnet to server paste 2 lines: login wiznname password @set name on void to newname

What is the expected output? What do you see instead? normal login, plus message about name of room being changed.

Instead get: I am having trouble with that command. Try "@help set" Debug 'NoneType' object has no attribute 'name' You have been teleported. void You look at the void.

And in the log: player wiznname logged in Traceback (most recent call last): File ".../src/tzprotocol.py", line 412, in dispatch func(self, rest) File ".../src/wizard.py", line 112, in cmd_set obj = find(r, s.room, s.player, s.room) File ".../src/share.py", line 1120, in find if objname == room.name: AttributeError: 'NoneType' object has no attribute 'name'

```

Reported by missive@hotmail.com on 2010-12-30 16:03:00
Comments (13)

Former user Account Deleted

``` r446 prevents the specific race condition mentioned here.

When a new line comes in before the login process has completed, the new line is held for a bit, then sent back through again.

I'm not going to close this issue yet, as I suspect other race conditions may be lurking. I want to see if I can find an example.

I'm not sure it will be possible to trigger the problem by simply pasting long strings of commands. I think this fix will stop that from being a problem. However, I think that with certain delays between commands there may be problems.

I may need to create a test client that can send strings of commands at specified intervals to trigger the problem.

(A test client that can send scripts of commands and check for expected output would be good anyhow, so maybe testing for race conditions could be part of a more general test setup).

```

Reported by `missive@hotmail.com` on 2010-12-31 05:39:06 - Status changed: `Started`
    2010-12-31
Former user Account Deleted

``` Thanks, the revision seems to stop all the issues I was having.

Here's an example where you see something that should be atomic but isn't: even if everything works quite right, the output doesn't appear in order...

------ paste this, quickly: ------ @dig a to templo .a @destroy door ..................................

Expected output: ------ You dig a to templo. templo You speak a word of power and ... ------

What we get: ------ You dig a to templo. templo You speak a word of power and ... A new exit door appears. ------

The message "A new exit door appears" is being sent to those who are in room "templo" which is correct, but it should appear when the @dig is done. Since I've already used that path to go to templo (.a), when I arrive there the door should be there already, so I shouldn't arrive in that room in time to see the message... ```

Reported by `marcos.marado@optimus.pt` on 2010-12-31 10:52:18
    2010-12-31
Former user Account Deleted

``` OK, I looked at the code and found out who's the culprit. This is here "by design", but I don't understand why, and since this code exists since r2, I don't have a way to get a clue about your intentions... Hoping you might help me here ;-)

At rooms.py, the action function waits (asyncronously) 0.1 before actually doing its work. This is on purpose, and documented:

" Uses a callLater to make sure that reactions to the action occur after the action has been reported. "

What I don't understand is how can you achieve this by having a callLater, on the contrary: if you want to have reactions to the action occur after the action has been reported, then the operation you're doing should react first, report after, right? In this case, the action is "report an action" (say to everyone near this room that a new exit door appeared), but if you only say it sometime *after* the door creation, instead of right away, then you aren't making sure that reactions only occur after the the action was reported, in fact you're opening window to let reactions actually occuring before the action being done...

Anyway, my temp fix is taking that async timer out of the way, but I'm sure that you have a good reason to have that timer there, so maybe the correct fix is to have the possibility of flagging room.action to tell if we want it sync or async... ```

Reported by `mindboosternoori` on 2011-11-08 19:53:53
    2011-11-08
Former user Account Deleted

``` Another action that isn't as atomic as it should: purge.

1) Create a user "a" with password "a"; 2) login a a 3) in another terminal, connect to the MUD and paste this in the terminal:

purge a a login a a

When "purge a a" ends executing the user "a" should be completely purged, but it isn't, so when "login a a" happens, we still have an "a"... the result:

" Connection purged. Log in with "login <name> <password>" You feel yourself being ripped away from where you are... You have been teleported. void You do not see that here. "

After this, you won't be able to type anything (since in the meantime the purge really ends its effects).

From the log: 2011-11-09 19:53:36+0000 [TZ,1,127.0.0.1] player a logged in 2011-11-09 19:53:36+0000 [TZ,0,127.0.0.1] Lost a client!

As you can see, the "Lost a client!" only happens *after* the "player a logged in"... ```

Reported by `mindboosternoori` on 2011-11-09 19:57:02
    2011-11-09
Former user Account Deleted

``` Here's a patch fixing this: https://github.com/marado/tzmud/commit/4825c6620675ffecdc3dba599384ae3d4152de23

Twisted won't let us have a sync _purge, so I'm not sure if there any other things that depend on the purge actually being complete other than login, but if there are...

A safer solution would be to actually have a isLogged() function that doesn't only check the value of the is_logged boolean... But, well, this works.

```

Reported by `mindboosternoori` on 2011-11-09 21:17:06
    2011-11-09
Former user Account Deleted

``` Probably I was just coding defensively. Other than follow, I can't find any actions that require a delay, and follow has its own separate delay built in.

It looks like an action can specify a delay other than the default 0.1 seconds if it wants to. It's possible that dig should specify no delay.

Maybe what we should do is have the default be no delay, but still keep the code there that will allow an action to specify a delay if needed.

```

Reported by `missive@hotmail.com` on 2011-11-11 18:17:13
    2011-11-11
Former user Account Deleted

``` Well... If we don't have any delay there (see my patch here: https://github.com/marado/tzmud/commit/5bfe1644da16303fd053826a6c32ad01be77d7cc ), we can just delay the action call itself in the cases we need to, right? ```

Reported by `mindboosternoori` on 2011-11-11 20:05:42
    2011-11-11
Former user Account Deleted

``` Hmm...

Looking closer, I see that teleport is using the delay values:

room.action(dict(act='teleport_character_away', delay=0.2, actor=None, character=self)) reactor.callLater(0.4, self.move, destination) destination.action(dict(act='teleport_character_in', delay=0.6, actor=None, character=self))

So, it is trying to order those events in to the proper sequence.

How is that holding up under the no delay patch?

```

Reported by `missive@hotmail.com` on 2011-11-11 20:53:30
    2011-11-11
Former user Account Deleted

``` You're right, yes. But we can just reactor.callLater(0.4, room.action, dict) and reactor.callLater(0.6, destination.action, dict) instead of calling those actions now and telling them to only execute in 0.x ... ```

Reported by `mindboosternoori` on 2011-11-12 02:42:06
    2011-11-11
Former user Account Deleted

``` Right. I think the code path works out exactly the same.

I guess it is not as clear that that is an asynchronous call, but to my eye, it looks cleaner with the delay= value in the dict.

Either way, I don't think the behavior in the face of a bunch of rapid calls involving teleport is what would (or should) be expected ...

@teleport to house @teleport @teleport to house @teleport

You feel yourself being ripped away from where you are... You feel yourself being ripped away from where you are... You feel yourself being ripped away from where you are... You feel yourself being ripped away from where you are... You have been teleported. void You have been teleported. void

While someone watching sees: lee waves his hands around mysteriously. lee waves his hands around mysteriously. lee waves his hands around mysteriously. lee waves his hands around mysteriously. lee disappears. lee disappears. lee disappears. lee disappears. lee appears. lee appears.

Or even just: lee appears. lee appears.

In your work with other MUDS and talkers, have you seen any kind of testing framework for this kind of thing?

I think before any major changes begin, it would be good to set up tests that specify exactly what should happen given a particular sequence (and timing) of commands.

Maybe like:
w1 # logs in character wizard 1
p1 # logs in character player 1
p2

+p2 house # input from player 2 -w1 p2 leaves to the light. # output seen by w1 -p1 p2 leaves to the light. -p2 house 1.5 # wait here for 1.5 seconds before sending next line +w1 @teleport house -p1 waves his hands around mysteriously. -w1 You feel yourself being ripped away from where you are... -p1 disappears. -w1 house -p2 w1 appears.

```

Reported by `missive@hotmail.com` on 2011-11-12 14:19:42
    2011-11-12
Former user Account Deleted

```

I guess it is not as clear that that is an asynchronous call, but to my eye, it looks cleaner with the delay= value in the dict.

Either way works for me.

Either way, I don't think the behavior in the face of a bunch of rapid calls involving teleport is what would (or should) be expected ...

You're right, it isn't. The problem is that in lots of stuff, including @teleport, we shouldn't have anything asyncronous, at all. @teleport should:

1) on origin place, say that player "waved his hands" 2) remove character from origin place 3) tell player that he feels being ripped away 4) tell people on origin place that player has disappeared 5) put player on new place (I assume that this function triggers the player seeing the place he now is) 6) say in new place "player x appears"

Each of this items must be done in this order, and must be syncronous - meaning that each function will only return when all it is meant to do is done.

In your work with other MUDS and talkers, have you seen any kind of testing framework for this kind of thing?

Nothing, sorry, most of the MUDs and talkers I've seen only work 100% syncronously, some being only one non-threaded process... ```

Reported by `mindboosternoori` on 2011-12-07 19:11:22
    2011-12-07
Former user Account Deleted

``` OK, I restarted looking at this, and decided that I should investigate the teleport issue in particular. Threading through the code, I realized this:

@teleport house You feel yourself being ripped away from where you are... @teleport You feel yourself being ripped away from where you are... @teleport You feel yourself being ripped away from where you are... You have been teleported. void

Which means that: 1) When you're teleported to somewhere else, you have say "You feel yourself being ripped away from where you are..."; 2) When you're teleported to the same place you already were, you also see (afterwards), "You have been teleported." and the place where you were teleported to (which is always where you were already).

I have a couple of considerations here already: a) do we really want that it is possible to teleport something to where it already is? I'm OK with either choice, but it makes more sense to me if you simply can't teleport to where you already are...; b) We obviously shouldn't have a different behaviour, so the question is: should we display "You have been teleported." and where to... always or never? Once again, I'm OK with either choice, but it makes more sense to me if we always display that.

Now, I went to investigate *why* we were showing that info sometimes, and some other times we weren't. Seems that the difference is in the function "near_teleport_character_in", which is meant to be a function to warn those who are in some place that they suddenly started seeing something because it was teleported next to the player. IMHO someone who is teleported to where it was an instant ago doesn't see anything, even they feel something. So, I think we should say "You feel yourself being ripped away from where you are..." but it doesn't make sense saying "You have been teleported" (at least not here in the code, where the sentence actually means "you see yourself appearing here because you have been teleported", which makes no sense) - and since you're in the same place it also doesn't make sense that you decide to look around.

Here's a patch fixing this, please push it to your own code if you agree with my reasoning: https://github.com/marado/tzmud/commit/a952f0da1318fd22750b4fd3f5cdab067909daba ```

Reported by `mindboosternoori` on 2012-03-09 15:00:56
    2012-03-09
Former user Account Deleted

``` Back - sorry for the delay in giving you a complete reply. To answer your question:

"How is that holding up under the no delay patch?"

The answer is: applying the no delay patch *and* the one in c13, the no delay patch still wasn't enough. A quick look was enough to find out why: between those two actions with delay (that the "no delay" patch makes synchronous) there's this call:

reactor.callLater(0.4, self.move, destination)

which is obviously an async call.

Replacing it with "self.move(destination)" fixes the @teleport issues.

So - as far as I am concerned - the issues on *this* bug report are fixed with the two aforementioned patches *and* this replacement.

Now, if we do that, we should create another issue named "validate all callLater calls to see if we really want them async". A quick grep shows things that surely are meant to be async (database doing pack_regularly, !shutdown with delay), but others are probably like teleport, so if we go this way there's that extra work needed.

If you're OK with this solution, I am offering to do the remaining work (create that issue, validate each callLater, produce needed patches to fix this once for all). ```

Reported by `mindboosternoori` on 2012-04-11 19:02:35

Discussion


Log in to post a comment.