From: Adam R. <ad...@ex...> - 2010-01-26 11:59:41
2010/1/25 Thomas White <tho...@gm...>:
> Adam,
>
> From my point of view, there are some important differences. Scheduling
> functions can be used to execute a function asynchronously if the
> execution time is set to, say, 1 second after the current time, but this
> is where the similarity ends.
>
> 1. So far all eXist functions are executed synchronously (except
> scheduled jobs and triggers). If I need to get data from, say, 25 or 250
> remote sources, at the moment we have to do it one data chunk at a time,
> one after another. What if we need to fetch 5000 RSS feeds?
> We do need asynchronous commands.

Ah okay, I think I understand now. Yes, it seems like an interesting
feature to add to eXist. I guess we could add this onto the end of the
roadmap if the other developers agree; however, there is some work
involved in this, and I am sure we are all pretty busy. So how soon do you
need this functionality?

> 2. execute-before function - ability to conditionally execute a job with
> a variable delay.
> 3. callback function - ability to call specific code after the job is
> done.
> 4. ability to group jobs in batches; batch functions and a batch callback
> function can provide a very powerful way to perform an asynchronous final
> operation on a group of asynchronous functions.
>
> So far we have been thinking synchronously only - we need all the data on
> the server and then we can produce the whole page at once. But it gets
> much more interesting and much more powerful when we think asynchronously
> on the client - we can create and/or partially update many areas of the
> screen simultaneously. Take, for example, use case 3 (federated search,
> on a web client). Imagine an application where the web client has to
> display results from 25 servers, each with a different latency. Whenever
> any of the data becomes available, it is displayed without a page
> refresh.
>
> Some will say this is just AJAX in action, but I will say it goes to a
> very different level. Why?
> Because we have progress updates during the request, we have batch
> operations and, especially, the ability to cancel a whole batch of
> requests when needed.
>
> Let us take an application where the user will see, on the same screen,
> results from 50 servers asynchronously. It is an application that gets
> quotes for car insurance, and it takes between 5 and 45 seconds for the
> different quotes to be completed.
>
> Case 1. Traditional AJAX approach - every quote area on the screen has
> its own AJAX request for a specific quote provider. When the user clicks
> on the button, then:
>
> The browser opens 50 TCP connections to the server until all quotes are
> received (many browsers limit the number of simultaneous connections per
> host, in older versions to as few as 2, which can introduce additional
> delay).
> On the server, 50 synchronous TCP connections are opened to the quote
> providers.
> Each time a data chunk is received, two TCP ports are released on the
> server.
> On the server this user will so far occupy 100 TCP ports =
> 50 (browser-server) + 50 (server-quote providers).
>
> After 10 seconds we have received data for, say, 10 of the quotes, and
> the user decides to amend something and presses the "Get Quotes" button
> again. Then:
>
> The browser opens 50 new synchronous requests.
> On the server this user will now occupy 180 TCP ports = 50 (new browser
> requests) + 40 (incomplete old browser requests) + 50 (new server-quote
> provider connections) + 40 (incomplete old server-quote provider
> connections).
> There is nothing to cancel the 40 old connections to the browser and the
> 40 incomplete server-quote provider connections except the connection
> timeout.
> As a result the server can run out of TCP ports very quickly and the data
> will be delivered slowly, especially if more users click the quote button
> early. It gets worse very quickly when users press the quote button
> prematurely or refresh the page. Scalability is severely affected by user
> behaviour and by the number of users.
>
> Case 2.
> We have a batch of asynchronous quote requests on the server.
> There will be no long-waiting requests on TCP connections on the client.
> The "Get Quotes" button will call an XQuery and quickly receive the
> batchID and an initial estimated delay. For simplicity, let us say the
> client will call getBatchStatus every second, closing the TCP connection
> after receiving the data. When any of the jobs is complete, a quick call
> to fetch the data is made. Then:
>
> On the browser there are no long-waiting open TCP connections. We have a
> quick fetch of the status and one call to fetch the received data every
> second, and then all TCP ports are closed.
> On the server, we have 50 (server-quote providers) TCP connections + 1
> every second from the browser, closed immediately, + 1 connection for the
> received data, with all quotes delivered in one call, closed immediately.
>
> Now when the user clicks the "Get Quotes" button early, we first cancel
> all incomplete calls to quote providers on the server by calling
> closeAll, releasing all TCP ports, and then we make the 50 new requests.
> The result:
>
> On the browser we still have a call or two every second. No hanging TCP
> ports, no timeouts.
> On the server we have exactly the same 50 (server-quote providers) TCP
> connections + 1 or 2 every second from the browser.
> The scalability of the server is not affected by user behaviour at all
> and it can serve many more users.
>
>
> 5. Use case 4 (federated search, on the server) is, I believe, very
> important if we want to query more than one server in real time.
> This scenario can do a pretty good job for a while, before real eXist
> clustering is ready.
>
> I hope this answers your question.
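
A minimal sketch of the Case 2 flow above, written in Python rather than
XQuery and only as an illustration. Batch, add_job, status, results,
close_all and fake_quote are hypothetical names, not eXist functions, and
threads with a shared cancel flag stand in for real connections to the
quote providers.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class Batch:
    """Hypothetical in-memory stand-in for a server-side batch of jobs."""

    def __init__(self, max_workers=50):
        self.batch_id = str(uuid.uuid4())   # returned to the client at once
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._cancelled = threading.Event()
        self._futures = {}                  # job name -> Future

    def add_job(self, name, fetch, *args):
        # Every job receives the shared cancel flag so it can stop early.
        self._futures[name] = self._pool.submit(fetch, self._cancelled, *args)

    def status(self):
        """Roughly what a getBatchStatus call would report per job."""
        out = {}
        for name, fut in self._futures.items():
            if fut.cancelled():
                out[name] = "cancelled"     # never started
            elif not fut.done():
                out[name] = "running"       # queued or in progress
            elif fut.result() is None:
                out[name] = "cancelled"     # stopped early via the flag
            else:
                out[name] = "completed"
        return out

    def results(self):
        """All results available so far, delivered in one call."""
        return {name: self._futures[name].result()
                for name, state in self.status().items()
                if state == "completed"}

    def close_all(self):
        """Cancel queued jobs and signal running ones to stop."""
        self._cancelled.set()
        for fut in self._futures.values():
            fut.cancel()
        self._pool.shutdown(wait=True)


def fake_quote(cancelled, provider, delay):
    # Stand-in for a remote quote request; wait() returns True once the
    # cancel flag is set, so a cancelled job gives up without a result.
    if cancelled.wait(timeout=delay):
        return None
    return f"quote from {provider}"


if __name__ == "__main__":
    batch = Batch()
    batch.add_job("fast", fake_quote, "provider-a", 0.05)
    batch.add_job("slow", fake_quote, "provider-b", 30)
    time.sleep(0.3)       # the client would poll about once per second
    print(batch.status())
    print(batch.results())
    batch.close_all()     # the user pressed "Get Quotes" again
```

The point of the sketch is that each poll is a short call against state
held on the server, and that one close_all call releases every outstanding
provider request instead of waiting for timeouts.
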
>
> Regards,
> Thomas
>
> ------
>
> Thomas White
>
> Mobile:+44 7711 922 966
> Skype: thomaswhite
> gTalk: thomas.0007
> Linked-In:http://www.linkedin.com/in/thomaswhite0007
> facebook: http://www.facebook.com/thomas.0007
>
>
> 2010/1/25 Adam Retter <ad...@ex...>:
>> Can you not already do most (if not all) of this by scheduling XQuery
>> jobs with eXist's Scheduler?
>>
>>
>> 2010/1/25 Thomas White <tho...@gm...>:
>>> I would like to propose a new piece of functionality that I believe
>>> could be very beneficial for eXist users:
>>>
>>> Asynchronous Execution Pipeline
>>>
>>> This is a mechanism for executing a number of asynchronous jobs
>>> simultaneously. It is very useful for executing long-running jobs, or
>>> in cases where it is impossible to predict how long the operation will
>>> take. Every job will run as a separate thread, and the jobID and the
>>> estimated delay will be returned immediately to the caller.
>>>
>>> Use cases:
>>>
>>> 1. Executing long-running queries
>>>
>>> A callback function will be used to store the result at a location
>>> given by the function-parameters.
>>> A client periodically checking the status of this job will take the
>>> next action.
>>>
>>> 2. Fetching data from a (large) number of remote URLs
>>>
>>> An XQuery or a scheduled job creates XX execution pipeline entries, one
>>> for each remote server.
>>> Callback functions are used to store the results at a location given by
>>> the function-parameters.
>>> The batch callback function will combine the results and trigger the
>>> next action.
>>>
>>> 3. Federated search, on a web client
>>>
>>> A web client sends a search request to a local XQuery, which creates XX
>>> execution pipeline entries, one for each remote server, and returns a
>>> batch-id to the web client.
>>> The web client periodically queries the status of the jobs with this
>>> batch-id, and when any of the jobs has the status 'completed', the web
>>> client gets the result for that job and displays it on the screen
>>> asynchronously.
>>>
>>> 4. Federated search, on the server
>>>
>>> A web client sends a search request to a local XQuery, which creates XX
>>> execution pipeline entries, one for each remote server, and returns a
>>> batch-id to the web client.
>>> Every job callback function will save its result at a location given by
>>> the function-parameters. The batch callback function will combine the
>>> results.
>>> The web client periodically queries the status of this batch, and when
>>> the batch is completed, the web client gets the result and displays the
>>> combined result set on the screen asynchronously.
>>>
>>> 5. Data Replication
>>>
>>> An XQuery or a scheduled job creates XX execution pipeline entries, one
>>> for each remote server.
>>> The execute-before function will identify what needs to be replicated.
>>> The main function does the replication.
>>> The batch callback function moves the replication marker.
>>>
>>> A call to the Execution Pipeline:
>>> execution-pipeline:addJob( function, function-parameters,
>>> pipeline-parameters )
>>> returning:
>>> handlerID, estimated-delay, function-parameters
>>>
>>>
>>> To get the result we need to call another function:
>>> execution-pipeline:getJobResults( handlerID, autoClose )
>>> returning either:
>>> the result data set (if autoClose is true, the job is then closed and
>>> all used resources are released)
>>> or
>>> the same handlerID, new-estimated-delay, function-parameters
>>> or
>>> an unknown-handlerID error
>>>
>>> execution-pipeline:getJobStatus( handlerID )
>>> returns
>>> the status of the job and the function-parameters for this job
>>>
>>> execution-pipeline:getBatchStatus( batch-ID )
>>> returns
>>> the status of all jobs from a particular batch ID.
>>>
>>>
>>> execution-pipeline:getStatus( )
>>> returns
>>> the status of all jobs.
>>>
>>>
>>> execution-pipeline:closeJob( handlerID )
>>> execution-pipeline:closeBatch( batchID )
>>> execution-pipeline:closeAll( )
>>>
>>>
>>> function-parameters:
>>>
>>> job-statistic-id: used to keep the average execution time of this
>>> function.
>>> average time = (previous-average-time + last-execution-time)/2. A URL
>>> with specific parameters could be used as an ID.
>>> execute-before function: when provided, it will be called before
>>> calling the main function for this job. If the result is 0, then
>>> proceed with the main function; otherwise use the result as the number
>>> of milliseconds to put this job to sleep before trying again.
>>> callback function: when provided, the callback function will be called
>>> as callback-function( handlerID, result, function-parameters ). If it
>>> returns true() the job will be closed.
>>> any other parameters that may be used by the callback function.
>>>
>>> pipeline-parameters:
>>>
>>> batch-ID: used to group jobs into a batch.
>>> batch-callback-function: called when all jobs from the batch are
>>> completed.
>>> any other parameters that may be used by the callback function.
>>>
>>> Any comments?
>>>
>>> Thomas
>>>
>>> _______________________________________________
>>> Exist-development mailing list
>>> Exi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/exist-development
>>>
>>
>> --
>> Adam Retter
>>
>> eXist Developer
>> { United Kingdom }
>> ad...@ex...
>> irc://irc.freenode.net/existdb
>

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
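
The execute-before, callback and average-time rules from the quoted
proposal can be sketched as follows. This is an illustration in Python
under stated assumptions, not an eXist API: run_job and update_average are
hypothetical names, max_attempts is an added safety bound (the proposal
does not say how often a job may be retried), and seeding the average with
the first measurement is also an assumption.

```python
import time


def update_average(previous_average, last_execution_time):
    """Average as proposed: (previous-average-time + last-execution-time)/2."""
    if previous_average is None:   # assumption: the first run seeds the value
        return last_execution_time
    return (previous_average + last_execution_time) / 2


def run_job(main, execute_before=None, callback=None, handler_id="job-1",
            params=None, max_attempts=10, sleep=time.sleep):
    """Run one job under the execute-before / callback rules.

    Returns (result, closed, elapsed_seconds).
    """
    params = params or {}
    for _ in range(max_attempts):
        # execute-before returns 0 to proceed, otherwise a delay in ms.
        delay_ms = execute_before(params) if execute_before else 0
        if delay_ms == 0:
            break
        sleep(delay_ms / 1000.0)   # put the job to sleep, then try again
    else:
        # Hypothetical bound: give up if execute-before never returns 0.
        raise TimeoutError("execute-before never returned 0")

    start = time.monotonic()
    result = main(params)
    elapsed = time.monotonic() - start

    # The callback gets (handlerID, result, function-parameters); a true
    # return value means the job should be closed.
    closed = bool(callback(handler_id, result, params)) if callback else False
    return result, closed, elapsed


if __name__ == "__main__":
    result, closed, elapsed = run_job(
        main=lambda p: p["x"] * 2,
        callback=lambda handler, res, p: True,
        params={"x": 21},
    )
    print(result, closed)
    print(update_average(4.0, 2.0))
```

Note that halving the sum of the previous average and the latest time gives
the latest run a weight of one half, so the value behaves like an
exponential moving average rather than an arithmetic mean over all runs.
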