
#448 Comet-type stuff

Project: workingwiki
Status: open
Milestone: None
Priority: 5
Updated: 2014-09-16
Created: 2013-11-12
Creator: Lee Worden
Private: No

See comment at https://sourceforge.net/p/workingwiki/bugs/374/?limit=10&page=2#c9a8.

I don't want to use straight Ajax for requests to do WW operations, because with Ajax you only know when you initiate a request and when you get the result back. I want to provide ongoing feedback while the operation is happening, so I probably want to use Comet. The main alternative is WebSocket, which requires a special server, whereas it looks like I can produce a Comet response within the MediaWiki extension PHP code.

Related

Bugs: #330
Bugs: #374

Discussion

1 2 3 > >> (Page 1 of 3)
  • Lee Worden

    Lee Worden - 2013-11-12

    Specifically the SSE technique. This will not work on IE, but like, I bet a lot of WW stuff doesn't work on IE anyway. More seriously, I can make IE fall back to the current CGI way of doing actions.

    I think I'm going to write a test API action in the extension, to see how well SSE works or doesn't, in context.
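For reference, the SSE wire format is simple: the server sends a `Content-Type: text/event-stream` response and writes `data:` lines, each event terminated by a blank line. Here's a minimal Python sketch of that framing (the real code here is PHP on the server and `EventSource` in the browser; this just illustrates the format, and ignores SSE's optional `event:` field and `data:`-without-space variant):

```python
def sse_event(data, event_id=None):
    """Frame one Server-Sent Event: optional id line, data lines, blank line."""
    lines = []
    if event_id is not None:
        lines.append("id: %s" % event_id)
    for chunk in data.split("\n"):
        lines.append("data: " + chunk)
    return "\n".join(lines) + "\n\n"

def parse_sse(stream_text):
    """Parse a stream of framed events back into a list of data payloads,
    roughly what EventSource does on the client side."""
    events = []
    for block in stream_text.split("\n\n"):
        data_lines = [l[len("data: "):] for l in block.split("\n")
                      if l.startswith("data: ")]
        if data_lines:
            events.append("\n".join(data_lines))
    return events
```

The `id:` line matters later: it's what lets a reconnecting client tell the server the last event it received (via the `Last-Event-ID` header).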

     
  • Lee Worden

    Lee Worden - 2013-11-12

    Simple Comet test works, at least in Firefox. It sends an update each second, and quits on the server side when the client aborts the Comet connection. Great news!

    This seems to mean I can do this within the standard WW api framework (well, by jettisoning the framework in midstream to do my own low-level output, but anyway I can do it while keeping my code within the standard api class hierarchy, which is great).

     

    Last edit: Lee Worden 2013-11-12
  • Lee Worden

    Lee Worden - 2013-11-12

    One concern is what will happen if the connection gets lost midway - will the client be able to reconnect? Will reconnecting start a second instance of the operation? That would be bad. Will the first instance of the operation think it has been aborted?

    If so, maybe it would be worth having a level of indirection - do an ajax call to initiate the operation, have it return quickly while leaving the server cranking on the operation and dumping output to a file, and give the client a key that it can use to make a comet connection to follow the progress of the operation, i.e. spool the contents of the output file. In that case, reopening the comet connection would be harmless. Even then, though, I'd want to distinguish a lost connection from an intentional abort.

    In some cases I might be able to spawn a Unix command and return, like when creating a background job, but in other cases I might need to do long-running operations in php. Can I have it return the Ajax response quickly and keep running on the server side? I don't think so, but maybe.

     
    • Lee Worden

      Lee Worden - 2013-11-13

      Tried a quick test where the Comet call starts loading updates once per second into the browser, and then I disconnect and reconnect the wireless... The data stops coming of course, but it keeps running on the server, and the client never tries to reconnect or anything, it just seems to get stuck waiting forever - the little text still says "Connected to lalashan" and there's a spinner in the tab...

      There are some events in the EventSource object that I haven't provided handlers for - maybe one of them will help handle this...

       
      • Lee Worden

        Lee Worden - 2013-11-14

        A more precise test. Start up the one-per-second updates; switch to a different wireless network while it's receiving and see what happens; switch back and see what happens.

        1. When I switch away to a different network, I get nothing: no updates, no error event, no reconnect.

        2. When I switch back, after a pause of 10 or 15 seconds I get all the backed-up updates at once, and then they continue.

         
        • Lee Worden

          Lee Worden - 2013-11-14

          So I'm not thrilled about this lack of robustness. Also, when it does reconnect, it does it by calling the same URL over again. So I would have to take measures to make sure it doesn't initiate a second instance of the create-background-job or whatever operation.

          I'm thinking of using Ajax polling rather than Comet after all. It's more noise on the wire due to the 2-way interaction, but gives me more control over reconnecting, and so I can do it more robustly.

          Either way I'm going to have to separate reporting on jobs' progress from doing the jobs, on the server side, to make it possible to drop and reconnect without restarting the job.

          I'm thinking of an architecture where the job says "Started", gives the client a unique key, and closes the connection, then does the job. As it does the job it will dump progress reports into memcache, using the unique key, and the client will use the key to request updates to those reports.

          But I'll have to work out whether it's possible to return a response and then finish the job. MW doesn't do that as is. But I think I can do it if I seize control of the HTTP output, as I've been doing for the Comet response.
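That key-plus-memcache architecture can be sketched like this (a Python stand-in with hypothetical names; the real thing would be PHP writing to memcache, with the key generation and the job running in separate requests):

```python
import uuid

class ProgressStore:
    """In-memory stand-in for memcache: progress reports keyed by job key."""
    def __init__(self):
        self.reports = {}

    def start_job(self):
        key = uuid.uuid4().hex        # unique key handed back to the client
        self.reports[key] = []
        return key

    def report(self, key, message):
        """Called by the job as it works."""
        self.reports[key].append(message)

    def updates_since(self, key, last_seen):
        """Client polls with the index of the last report it received."""
        return self.reports.get(key, [])[last_seen:]
```

The client would call `updates_since(key, n)` repeatedly; because it passes `last_seen`, dropping and reconnecting never loses or duplicates updates.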

           
          • Lee Worden

            Lee Worden - 2013-11-14

            Or I might be able to fix up Comet reconnection myself (within this indirection framework with memcache keys) - which would give me the advantages of the above scenario plus less noise on the wire, and probably fewer delays because the updates keep coming without the lag for me to request them over and over.

             
            • Lee Worden

              Lee Worden - 2013-11-14

              I should also find out what a WebWorker is and whether I should use one to consolidate multiple Comet streams into one. I already have one Comet thing happening concurrently with the background-job updates every 60 seconds, and those could be folded into a single stream of updates.

              But I should postpone this until I have some faint idea whether multiple things happening at once is a real issue or not.

               
              • Lee Worden

                Lee Worden - 2013-11-17

                A WebWorker by the way is basically a JS background thread that you can create to do asynchronous work, such as listen for events from an SSE connection. I see no advantage in using this on a single page, since the event framework in the main thread is fine for that.

                However, if the number of open SSE connections to the server becomes a problem - there may be a limit to how many it can support - there's the option of using a single shared WebWorker with a single connection for all the WW pages open in one's browser, rather than one or more connections per tab.

                 
          • Lee Worden

            Lee Worden - 2013-11-14

            OK, setting aside whether I'll use Ajax or Comet or Bon Ami or whatever to convey the shit to the client, how am I going to have one process do the work and another one report on its progress?

            Time to survey the list of things that will be done this way.

            • Updating the list of background jobs can be done either by an Ajax call every 60 seconds, or by a Comet stream of minutely updates.
            • Create background job.
            • Merge background job.
            • Destroy background job?
            • Create preview session.
            • Merge preview session.
            • Make a project file.
            • Load a project file, I think, even when not making, because the directory might be locked.
            • Clear working directory.
            • List working directory.
            • Anything that might encounter a lock should be in the comet(-like) framework, so that the user can get the feedback "Waiting for lock from XXX" and have the option to abort or wait until the operation completes.
            • That means that even quick, one-step operations like "delete project file" can require progress updates.
            • Are there operations that happen without locking? I think things like "add source file" on the MP page, since it only alters the project description, not the directory. So it might be worth using simple Ajax for that, though once the above framework is in use, it might be parsimonious just to use it across the board.

            So anyway, some of the output from these operations is generated directly from the PHP ("Waiting for lock from XXX"), and other output comes from Unix commands (the output of make, for instance). It might make more sense to collect the output in a Unix file rather than in memcache, since I can redirect the output of cp and make to there with less hassle.

            So a sequence something like:

            • When operation is requested, generate a unique filename, convey it to the client and close the HTTP response, then do the operation.
            • Collect all relevant output in that file.
            • Client makes separate call(s) to retrieve the contents of the file as it's being filled.
            • At some point the file should get deleted.
            • Hopefully lag in retrieving the file contents on NFS won't get in the way of getting timely updates.
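The retrieve-updates step in that sequence amounts to reading whatever has been appended to the status file since the client's last position. A sketch (hypothetical function name; the real version is PHP, and the offset would travel back and forth with the client):

```python
def read_updates(path, offset):
    """Return (new_data, new_offset): everything appended since `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8"), offset + len(data)
```

Each poll passes back the offset from the previous call, so re-requesting after a dropped connection is harmless: the client just gets everything from where it left off.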
             
            • Lee Worden

              Lee Worden - 2013-11-14

              So why don't I mock up a version of this and see what it takes.

              • A fake-job API call that generates a tempfile and takes a while to fill it

              • A status-update API call that the client can call to follow additions to the tempfile

              • Test it by modifying the Comet-testing JS that I already have.

               
              • Lee Worden

                Lee Worden - 2013-11-14

                So how to get an API call to send its response early? What do ApiBase, ApiMain, etc. do after calling my class's execute()?

                • it's called from ApiMain::executeAction()
                • which then calls $this->printResult(false).
                  • that outputs the result in the appropriate format, so I think I'd like to call that early and then do something fancy.
                • my class's $mModule member points to the ApiMain object that will do the printing.
                  • but of course printResult() is protected, so I can't call it :(
                  • unless my class descends from ApiMain (a subclass of ApiBase) instead of from ApiBase directly? Is that too weird?
                • or what if I use DeferredUpdates, which get called after ApiMain does its work?
                  • It looks like my ApiBase-subclass object will still be around and not deallocated
                  • You do this by creating a DeferrableUpdate-type object and putting it on the queue; then later it calls that object's doUpdate() function. Since DeferrableUpdate is an interface, not a class, that object can in fact be my Api object. So its execute() can produce the initial Ajax output, and its doUpdate() can flush and close the output stream, and then do the long job, writing its subsequent output to the updates file. Worth a try.
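The DeferrableUpdate idea above, as a control-flow sketch: one object plays both roles, so `execute()` emits the quick response and `doUpdate()` runs the long job after the response is flushed. This Python mock-up only illustrates the shape of it; the real interface is MediaWiki's `DeferrableUpdate` in PHP, and the names `FakeJobApi` and `deferred_queue` are invented:

```python
class FakeJobApi:
    """One object serves as both the API module and the deferred update."""
    def __init__(self, out):
        self.out = out
        self.key = None

    def execute(self, deferred_queue):
        # Quick part: hand the client its key, then defer the long job.
        self.key = "job-123"               # would really be a random unique key
        self.out.append("started: " + self.key)
        deferred_queue.append(self)        # self is also the deferred update

    def do_update(self):
        # Long part: runs after the response has been flushed and closed,
        # writing its progress to the updates file instead of the response.
        self.out.append("working...")
        self.out.append("done")

# The framework's main loop: run the action, flush output, drain the queue.
out, queue = [], []
FakeJobApi(out).execute(queue)
for update in queue:
    update.do_update()
```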
                 
  • Lee Worden

    Lee Worden - 2013-11-15

    Well I'm having a hell of a time getting the initial call to send out its response before it does the work - the mw.Api object waits until the server has done all the work and terminated before it registers that it's gotten a response worth processing. I'm controlling the HTTP headers and stuff, but for some reason when I set "Connection: close" I get "Connection: keep-alive" regardless. Maybe the proxy on yushan is overriding my "Connection" header?

    Should I try to work with RHPCS to figure out if it's the proxy? I want to be robust and not put special requirements on sysadmins like details of how you need to configure your proxy.

    What if I do a hybrid approach where the initial call produces a key, spools output to a status file, and also streams its output to the client? The client can happily receive it up till it loses the connection, then can reconnect using the key to listen in on the ongoing operation without restarting it.

    Weird, but maybe doable. Maybe too complicated though. Might be worth taking it slower and figuring out what's "the right way" to do it.

     
    • Lee Worden

      Lee Worden - 2013-11-15

      Ha! Or! I have the long job use Comet-style output to send its response, because that gets processed immediately. But as soon as the client gets the first piece of data from it, it drops the connection and makes a request to listen in on the status file. That way the long job can do its work without having to also produce constant updates for the client. This will be especially important when the long job is making system calls to make, cp, etc., and the output is going to the file only.

       
      • Lee Worden

        Lee Worden - 2013-11-15

        This is generally working and seems workable. I am having trouble with NFS though. Even though my code calls fwrite() and fflush() every second while writing to the status file, the other process that's watching it, on a different cluster node, doesn't see anything but a 0-byte file for a long time. It updates about every 30 seconds, dumping out 30 seconds worth of data. If I were writing in C, I think I could call ioctl( FIOSYNC ) to make NFS flush out the file to/from the server, but I don't see how to do that from PHP. More to come about that...

        I feel like this is fixable though, and this approach will work.

         
        • Lee Worden

          Lee Worden - 2013-11-15

          It looks like I may be able to get NFS to sync the file by closing and reopening it, or by locking it. If I can do it by locking that would be great, because I don't think I can close and reopen it while a long make process is appending to it.

           
          • Lee Worden

            Lee Worden - 2013-11-15

            Whoa dude, that locking trick actually does work! This is great because I think I can get it to sync by having my php process lock it over and over while some other make process is spooling data to it...

            [aside: someday I'm going to understand nfs better... we really want it to sync better on its own, because when one node updates a project file and another node retrieves it, it really would be nice if we could trust it was getting the updated file. I have problems all the time where I edit and save part of the WW code, and load a page to test it, but it's still seeing the old version of the code... I just have to be patient and let it have about 15 or 30 seconds to catch up after I save...]
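The locking trick, sketched in Python (the real code is PHP's flock(); the point is that acquiring and releasing an advisory lock forces the NFS client to revalidate its cached view of the file before the next read):

```python
import fcntl

def nfs_refresh(path):
    """Lock and immediately unlock the file; on NFS this invalidates the
    client-side cache so a subsequent read sees fresh data."""
    with open(path, "rb") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

def read_fresh(path):
    nfs_refresh(path)
    with open(path, "rb") as f:
        return f.read()
```

Since the lock is advisory, the make or cp process spooling output to the file never notices; the watcher can lock/unlock on every poll without interfering with the writer.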

             
            • Lee Worden

              Lee Worden - 2013-11-15

              PS. if you're following along at home, you can watch the test data update all quick style in the browser by clicking on a 1.21 page's main header, the one that gives the title of the page. I'll probably remove that testing easter egg pretty soon, when I get to using the comet code for something real.

               
  • Lee Worden

    Lee Worden - 2013-11-15

    Something to think about btw: if the client is connected to a secondary process, not the one that's doing the operation, how is it going to abort the operation?

     
  • Lee Worden

    Lee Worden - 2013-11-17

    I started a nice discussion about this on wikitech-l by the way: http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073116.html

     
  • Lee Worden

    Lee Worden - 2013-11-23

    I have developed a Comet demo that does a lot of what I want for this: https://github.com/worden-lee/cometdemo. That's a way of putting the code out into the world and making it available for other uses, and of course I'll use it in WW.

    Next is to build this into an implementation of "merge background job", I think, because it's awkward to have "destroy" and "merge" side by side and try to remember that the links behave so differently. The merge action requires:

    • Shift the existing WWAction merge code into an ApiBase subclass and write glue code that lets it get used in the WWAction framework
    • Get ProjectEngine's merge operation to put its output stream into a place where I can stream it to the client. Merge is basically a cp operation, so this includes asking cp to provide verbose output, so the user will have a handle on how the process is going. It'll also include outputting the text of the cp command itself, and anything else that's worth telling the user.
    • Adapt the merge ApiBase code to provide a Comet stream when appropriate
    • Adapt the Comet JS code to make the merge API call and receive its output stream

    When that's done, I'll have

    • An upgraded merge operation
    • Integration of the CometDemo code into WW
    • A framework for integrating ApiBase and WWAction classes
    • A framework for passing streaming output from PE through WW to the client, with WW possibly adding some messages to the stream along the way.
    • And hopefully, a reusable CometOperation class on the JS side, which I don't have now; the current code is ad hoc.

    This should put me in a good position to upgrade other operations more quickly afterward.

     
    • Lee Worden

      Lee Worden - 2013-11-23

      It seems like PE will need to have some kind of dance where it

      • forks the unix process
      • continues to run in a polling cycle, waking up fairly frequently and doing a lock/unlock to get the output file to sync across NFS
      • checks for a user abort in some way. The watcher API process will have to convey some kind of signal telling this process to stop.
        • I have been using the output file's permissions as a signal, by creating the file with read-write permissions and having the output-producing process change it to read-only when it's done writing. This tells the output-spooling process that it can stop. I don't know what would happen if I had the output-spooling process turn off the write permission to tell the output-producing process to stop - would there be confusion in using the same bit for both directions? Would it interrupt the writing in a disorderly way? Maybe I should turn off the read permission instead, to signal that it won't be reading any more? The problem with that is that I probably will want to keep reading, to provide feedback on whether it actually stops running or not. I guess I could also use the execute bit or sticky bit or whatever, even without actually using those bits for their intended purpose. And of course I could also create a second file for this communication channel, but I'd rather not add more parts to the machine if I can help it...
       

      Last edit: Lee Worden 2013-11-26
      • Lee Worden

        Lee Worden - 2013-11-26

        Using the read bit to signal the writer to stop won't work, because I want to read the file both before and after; using the write bit as bidirectional signal will cause problems, for instance if there's a second watcher process - it'll mistakenly think it's time to stop watching and delete the file. My sense of humor wants me to kill the writer process by setting the execute bit, so maybe I'll do it that way.
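That permission-bit protocol, sketched with hypothetical helper names: the writer drops the write bit to say "finished", and the watcher sets the execute bit to say "please stop" - two independent bits, so there's no confusion about direction:

```python
import os, stat

def writer_finished(path):
    """Writer signals completion by making the file read-only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

def is_finished(path):
    """Watcher checks whether the writer is done (write bit cleared)."""
    return not os.stat(path).st_mode & stat.S_IWUSR

def request_stop(path):
    """Watcher signals the writer to stop by setting the execute bit."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)

def stop_requested(path):
    """Writer polls this between chunks of work."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)
```

The file itself is the communication channel, so no extra parts are added to the machine - at the cost of the writer having to poll the mode bits periodically.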

         
  • Lee Worden

    Lee Worden - 2014-06-12

    I've got Ajax dynamic loading of project files running, though it's surely not perfect. But I won't be deploying it for testing right away because I don't want to disrupt the MMED meeting with buggy code deployments (since I just did disrupt it twice :().

    So I'm going to start work on this Comet version of it, which is the next step. I'll put the Ajax dynamic loading interface out there for testing, while I'm working on this Comet stuff behind the scenes.

    As I'm thinking of it now, I think the Comet protocol works something like this.

    • Client initiates an operation, by sending to the server the relevant request data, plus a randomly generated key.
    • Server starts the operation and sends back confirmation that it's started. On server side, the operation's output is stored in a log file identified by the key.
      • If the initial confirmation isn't received, client can resend the initial request with the same key, and server will re-confirm without starting a second instance of the operation.
    • Client initiates a Comet connection, identified by the operation's key, to follow the operation's progress.
      • This also can be done more than once if network is failing, without harm. Whenever it succeeds, the client will get all the data.
      • If connection fails partway through, client can reconnect and say what it's received, and server will give it all the updates from that point forward.
    • When operation finishes, client confirms that it's finished and server deletes the file.
      • Note: .make.log files and the like will not be deleted; this operation log file is a separate thing.
    • If the file deletion step never happens, server will delete the file after it's been stale for 24 hours.

    That seems pretty solid as long as I can figure out how to dump all the relevant messages into a single file - which seems pretty achievable, using a combination of >, | tee, and direct writing from PHP code.
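The idempotent-initiation step is the heart of that protocol: because the client supplies the key, a resent request matches the log already created and never starts a second instance. A sketch of the server side (hypothetical class and method names; the log list here stands in for the key-named log file, and the operation runs synchronously for simplicity where the real one forks):

```python
class OperationServer:
    """Server side: a client-chosen key makes 'start operation' idempotent."""
    def __init__(self):
        self.logs = {}          # key -> progress lines (stand-in for log file)

    def initiate(self, key, run_operation):
        if key in self.logs:            # resent request: don't start again
            return "already started"
        self.logs[key] = []
        run_operation(self.logs[key])   # would really fork and spool to a file
        return "started"

    def updates(self, key, last_received):
        """Comet connection: everything after the client's last-received point."""
        return self.logs.get(key, [])[last_received:]

    def confirm_finished(self, key):
        self.logs.pop(key, None)        # delete the log file
```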

     
    • Lee Worden

      Lee Worden - 2014-06-26

      "As I'm thinking of it now, I think the Comet protocol works something like this."

      So actually I think I can simplify that - we don't need to send a request to initiate the operation, then get confirmation back, then start following the operation's progress. Instead,

      • Client initiates an operation by making a request to api.php with format=comet or whatever, and with a randomly generated key included.
      • Server starts doing the operation, and either starts sending updates or disconnects.
      • If client doesn't hear back or gets disconnected or anything, it resends the same request, with the same key.
        • If it's already gotten some updates, it says what was the last one it got.
      • If server gets such a request and key matches an operation that's underway, it sends updates from that operation, from whatever appropriate starting point.
      • When operation finishes, server says it's done, client confirms, and then server erases the log file.
        • confirms by repeating the same original request, more or less, with a special confirm-finished thing.

      There may also be a separate comet-updates api action that can be called, as an alternative to resending a bunch of parameters from the original request that don't need to be included once it's underway. So then client will maybe

      • Send original request.
      • Resend original request if needed, until some updates are received.
      • If further reconnection is needed, use a simpler comet-updates request with 'last-received=XXX'.
      • When done, confirm by sending comet-updates request with 'last-received=done'.

      This is not really much simpler, so it might turn out to be better to use the earlier proposal. Let's see how it shapes up in the code.
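The simplified client flow above, as a sketch of what each (re)connection request would carry. The `last-received` and `comet-updates` names are from the discussion; the function and the other parameter names are invented for illustration:

```python
def next_request(original_params, key, updates_received, done=False):
    """Decide what the client should send on its next (re)connection."""
    if done:
        # Confirm completion so the server can erase the log file.
        return {"action": "comet-updates", "key": key, "last-received": "done"}
    if updates_received == 0:
        # No updates yet: resend the full original request with the same key.
        return dict(original_params, key=key)
    # Some updates received: the lighter comet-updates call suffices.
    return {"action": "comet-updates", "key": key,
            "last-received": updates_received}
```

Since every request carries the key and the last-received position, any of them can be resent after a dropped connection without restarting the job or duplicating output.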

       
