
#39 Corpus pipeline startup/shutdown processing

closed
None
5
2007-07-04
2007-06-15
No

It would be good to have a way to run some processing once at startup and once at shutdown of a corpus pipeline. In addition, there should be a way to make data structures created at corpus startup available to PRs.

For example, this would make it easy to write PRs that use a database connection or file handle created at startup to read or write information for each document processed. Or one could create an empty statistics data structure at startup, collect statistical information in the PR, and display or write out the final statistics at or after shutdown.

Since the data structure to be created at startup is specific to each PR, this should really be a feature of the PR itself.

One possible way to implement this would be a convention: if a PR implements the methods startupPipeline(corpus) and shutdownPipeline(), these methods get registered to be run at startup and shutdown.
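
The convention proposed above can be sketched without any GATE classes. Everything in this example is hypothetical: DocStage stands in for a PR, PipelineHooks carries the two suggested method names, and MiniCorpusPipeline stands in for a corpus controller. It only illustrates the idea of collecting statistics between a startup and a shutdown call; it is not GATE's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineHooksDemo {
  public static void main(String[] args) {
    WordCountStage stats = new WordCountStage();
    MiniCorpusPipeline pipeline = new MiniCorpusPipeline();
    pipeline.add(stats);
    pipeline.run(Arrays.asList("a b c", "d e"));
    System.out.println(stats.getReport());
  }
}

// Hypothetical stand-in for a PR: handles one document at a time.
interface DocStage {
  void process(String document);
}

// Hypothetical optional hooks, named after the methods suggested above.
interface PipelineHooks {
  void startupPipeline(List<String> corpus);
  void shutdownPipeline();
}

// Example stage: resets its statistics at startup, reports at shutdown.
class WordCountStage implements DocStage, PipelineHooks {
  private int documents;
  private int words;
  private String report = "";

  public void startupPipeline(List<String> corpus) {
    documents = 0;
    words = 0;
  }

  public void process(String document) {
    documents++;
    String trimmed = document.trim();
    if (!trimmed.isEmpty()) {
      words += trimmed.split("\\s+").length;
    }
  }

  public void shutdownPipeline() {
    report = documents + " documents, " + words + " words";
  }

  String getReport() {
    return report;
  }
}

// Minimal corpus runner: calls the hooks only on stages that declare them.
class MiniCorpusPipeline {
  private final List<DocStage> stages = new ArrayList<DocStage>();

  void add(DocStage stage) {
    stages.add(stage);
  }

  void run(List<String> corpus) {
    for (DocStage s : stages) {
      if (s instanceof PipelineHooks) ((PipelineHooks) s).startupPipeline(corpus);
    }
    try {
      for (String doc : corpus) {
        for (DocStage s : stages) s.process(doc);
      }
    } finally {
      // Shutdown runs even if a stage fails part-way through the corpus.
      for (DocStage s : stages) {
        if (s instanceof PipelineHooks) ((PipelineHooks) s).shutdownPipeline();
      }
    }
  }
}
```

Stages that do not implement the hook interface are simply processed as before, which is the point of making the hooks optional.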

Discussion

  • Johann Petrak

    Johann Petrak - 2007-06-15

    Logged In: YES
    user_id=1472154
    Originator: YES

    This is an attempt to implement the feature along the lines discussed with Ian on the developers list. There is now a new AnalyserControllerEvent and an interface AnalyserControllerListener with the methods corpusExecutionStarted, corpusExecutionEnded, and corpusExecutionInterrupted.
    The classes SerialAnalyserController and ConditionalSerialAnalyserController fire the corpusExecutionStarted event before a corpus is processed and the corpusExecutionEnded event after the whole corpus has been processed.

    To use this, a PR must implement CreoleListener, ControllerListener and AnalyserControllerListener.
    In its init() method it must register itself with the CREOLE register and with every controller that is already loaded:
    Gate.getCreoleRegister().addCreoleListener(this);
    // Start with an empty list (java.util.ArrayList) so that the variable is
    // definitely assigned and the loop below is safe if the lookup fails.
    List<Resource> controllers = new ArrayList<Resource>();
    try {
      controllers = (List<Resource>)
          Gate.getCreoleRegister().getAllInstances("gate.Controller");
    } catch (gate.util.GateException e) {
      // deal with it!
    }
    for(Resource c : controllers) {
      System.out.println("Registering to controller "+c.getName());
      ((AbstractController)c).addControllerListener(this);
    }

    In addition it must implement the following methods:

    public void resourceLoaded(CreoleEvent e) {
      if(e.getResource() instanceof Controller) {
        System.out.println("Watching newly loaded controller for add PR events");
        ((AbstractController)e.getResource()).addControllerListener(this);
      }
    }

    public void resourceUnloaded(CreoleEvent e) {
      if(e.getResource() instanceof Controller) {
        ((AbstractController)e.getResource()).removeControllerListener(this);
      }
    }

    public void resourceAdded(ControllerEvent e) {
      if(e.getPr() == this) {
        Controller c = (Controller)e.getSource();
        if(c instanceof gate.creole.SerialAnalyserController)
          ((SerialAnalyserController)c).addAnalyserControllerListener(this);
        else if(c instanceof gate.creole.ConditionalSerialAnalyserController)
          ((ConditionalSerialAnalyserController)c).addAnalyserControllerListener(this);
      }
    }

    public void resourceRemoved(ControllerEvent e) {
      if(e.getPr() == this) {
        Controller c = (Controller)e.getSource();
        if(c instanceof gate.creole.SerialAnalyserController)
          ((SerialAnalyserController)c).removeAnalyserControllerListener(this);
        else if(c instanceof gate.creole.ConditionalSerialAnalyserController)
          ((ConditionalSerialAnalyserController)c).removeAnalyserControllerListener(this);
      }
    }

    public void resourceRenamed(Resource r, String from, String to) {
    }
    public void datastoreClosed(CreoleEvent e) {
    }
    public void datastoreCreated(CreoleEvent e) {
    }
    public void datastoreOpened(CreoleEvent e) {
    }
    public void corpusExecutionStarted(AnalyserControllerEvent e) {
      // Do some stuff before the corpus is getting processed
    }
    public void corpusExecutionInterrupted(AnalyserControllerEvent e) {

    }
    public void corpusExecutionEnded(AnalyserControllerEvent e) {
      // Do whatever is needed after the corpus has been processed
    }

    File Added: FR1737743.zip

     
  • Johann Petrak

    Johann Petrak - 2007-06-15
    • assigned_to: nobody --> johann_p
     
  • Valentin Tablan

    Valentin Tablan - 2007-06-19

    Logged In: YES
    user_id=1280870
    Originator: NO

    I think that a more robust way of implementing this is for the Controller implementation to check the PRs added to it and automatically register them as listeners on itself if they implement the AnalyserControllerListener interface. This way we can make sure that all "interested" PRs get to listen to events from all controllers that contain them. We would also need to make sure that the listeners are removed when the PR is disposed or removed from the controller (but that should be fairly easy to do).

    This solution would be easier to implement than having all PRs register as Creole listeners and check for creation of new controllers. PRs would also have no reliable way of knowing when they were added to or removed from a controller.
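
The controller-side registration described in this comment might look roughly like the following sketch. It uses no GATE classes: CorpusRunListener is a hypothetical stand-in for AnalyserControllerListener, and AutoRegisteringController and RecordingPr are invented for illustration. The handling of a PR that was added more than once is an assumption of the sketch, not something the comment specifies.

```java
import java.util.ArrayList;
import java.util.List;

class AutoRegisterDemo {
  public static void main(String[] args) {
    AutoRegisteringController controller = new AutoRegisteringController();
    RecordingPr pr = new RecordingPr();
    controller.add(pr);
    controller.runCorpus();
    controller.remove(pr);
    controller.runCorpus(); // pr is no longer notified
    System.out.println(pr.started + " start event(s), " + pr.ended + " end event(s)");
  }
}

// Hypothetical stand-in for AnalyserControllerListener.
interface CorpusRunListener {
  void corpusExecutionStarted();
  void corpusExecutionEnded();
}

// An "interested" PR: it implements the listener interface.
class RecordingPr implements CorpusRunListener {
  int started;
  int ended;

  public void corpusExecutionStarted() { started++; }
  public void corpusExecutionEnded() { ended++; }
}

class AutoRegisteringController {
  private final List<Object> prs = new ArrayList<Object>();
  private final List<CorpusRunListener> listeners = new ArrayList<CorpusRunListener>();

  // Adding a PR registers it as a listener if it implements the interface.
  void add(Object pr) {
    prs.add(pr);
    if (pr instanceof CorpusRunListener && !listeners.contains(pr)) {
      listeners.add((CorpusRunListener) pr);
    }
  }

  // Removing a PR unregisters it once no other occurrence of it remains.
  void remove(Object pr) {
    prs.remove(pr);
    if (!prs.contains(pr)) {
      listeners.remove(pr);
    }
  }

  void runCorpus() {
    for (CorpusRunListener l : listeners) l.corpusExecutionStarted();
    // ... per-document processing would happen here ...
    for (CorpusRunListener l : listeners) l.corpusExecutionEnded();
  }
}
```

With this arrangement the PR needs no registration code of its own; implementing the interface is enough.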

     
  • Johann Petrak

    Johann Petrak - 2007-06-20

    Logged In: YES
    user_id=1472154
    Originator: YES

    I agree with that. My original proposal (on the developer list) was to let the analyser controllers check whether each PR implements a certain method (an interface, as you suggest, might be better) and, if it does, simply call the corresponding startupCorpus method before actually iterating through the corpus. I do not know what the advantages or disadvantages of using listeners instead would be, but in both cases the overhead of registering for the add-to-pipeline event etc. would be avoided.

    Maybe we should discuss the details on the developer list?

     
  • Ian Roberts

    Ian Roberts - 2007-07-03

    Patch adding listener support to Executable interface

     
  • Ian Roberts

    Ian Roberts - 2007-07-03

    Logged In: YES
    user_id=1157323
    Originator: NO

    This patch shows my idea for implementing a more general form of the thing Johann is after:

    - I've added ExecutionEvent and ExecutionListener in gate.event, and added methods addExecutionListener and removeExecutionListener to the Executable interface (extended by Controller and ProcessingResource). The idea is that anything executable should be able to take listeners and notify them at the start and end of execution.
    - I've modified AbstractController and AbstractProcessingResource to handle the listener registration and to fire off the necessary events in their execute methods, delegating the actual execution to a new protected executeImpl method.

    For this to work properly, subclasses of AbstractController and AbstractProcessingResource need to override executeImpl instead of execute. My patch shows how this works for the controller implementations; we would have to modify our PRs as well if this idea goes into GATE proper. PRs that override execute() will still work, but they won't fire the new events until they are changed to use executeImpl() instead.

    With this approach it's probably not such a good idea to have controllers automatically add their contained PRs as listeners - just because a PR implements ExecutionListener doesn't necessarily mean that it is the containing controllers that the PR is interested in listening to...

    What do you think of this idea? I guess a major release is a good time to put in something like this execute->executeImpl change (though I strongly suspect that 99.9% of the PRs out there extend AbstractProcessingResource anyway).
    File Added: executionlistener.diff
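
The execute/executeImpl split described in this comment is a template-method arrangement, and a minimal sketch of it is given here. The sketch is not taken from the attached patch: SketchExecutable, SketchExecutionListener and the two example PR classes are hypothetical names, and exceptions are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class ExecuteImplDemo {
  public static void main(String[] args) {
    LogListener listener = new LogListener();
    NewStylePr newPr = new NewStylePr();
    OldStylePr oldPr = new OldStylePr();
    newPr.addExecutionListener(listener);
    oldPr.addExecutionListener(listener);
    newPr.execute(); // fires started/finished around executeImpl()
    oldPr.execute(); // runs, but fires nothing
    System.out.println(listener.log);
  }
}

// Hypothetical stand-in for the proposed ExecutionListener.
interface SketchExecutionListener {
  void executionStarted(Object source);
  void executionFinished(Object source);
}

class LogListener implements SketchExecutionListener {
  final List<String> log = new ArrayList<String>();

  public void executionStarted(Object source) { log.add("started"); }
  public void executionFinished(Object source) { log.add("finished"); }
}

// Base class that owns listener registration and event firing.
abstract class SketchExecutable {
  private final List<SketchExecutionListener> listeners =
      new ArrayList<SketchExecutionListener>();

  public void addExecutionListener(SketchExecutionListener l) { listeners.add(l); }
  public void removeExecutionListener(SketchExecutionListener l) { listeners.remove(l); }

  // Template method: fires the events around the real work.
  public void execute() {
    for (SketchExecutionListener l : listeners) l.executionStarted(this);
    try {
      executeImpl();
    } finally {
      for (SketchExecutionListener l : listeners) l.executionFinished(this);
    }
  }

  // Subclasses put their processing here instead of in execute().
  protected void executeImpl() { }
}

class NewStylePr extends SketchExecutable {
  int runs;

  @Override
  protected void executeImpl() { runs++; }
}

// Overrides execute() directly: still runs, but bypasses the events.
class OldStylePr extends SketchExecutable {
  int runs;

  @Override
  public void execute() { runs++; }
}
```

The OldStylePr case shows the compatibility trade-off mentioned above: existing subclasses keep working unchanged, but stay silent until they move their logic into executeImpl().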

     
  • Ian Roberts

    Ian Roberts - 2007-07-03

    Logged In: YES
    user_id=1157323
    Originator: NO

    After further discussions with Valy we've come to the conclusion that changing the Executable interface has too much potential to break third-party PRs. "Magically" adding and removing listeners is also asking for trouble: for example, the same PR can technically appear more than once in the same controller (when programming to the API; the GUI doesn't allow this).

    So it looks like something closer to the original suggestion would be best:

    - add an interface ControllerAware, with methods controllerExecutionStarted(Controller) throws ExecutionException, controllerExecutionFinished(Controller, Throwable).
    - If a PR implements ControllerAware, the controller will call the started and finished methods at the start and end of its execution.

    If a PR needs access to the corpus it is running over it can get it via the controller (assuming the controller is a CorpusController rather than a simple pipeline):

    Corpus c = ((CorpusController)controller).getCorpus();
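
A rough, GATE-free sketch of the callback scheme proposed in this comment follows. All type names are hypothetical stand-ins (SketchController for Controller, SketchExecutionException for ExecutionException, SketchControllerAware for the proposed interface). Notifying a step only once even when it was added twice is an assumption of this sketch, prompted by the duplicate-PR concern above; it is not a statement about how GATE behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ControllerAwareDemo {
  public static void main(String[] args) throws SketchExecutionException {
    SketchController controller = new SketchController();
    CountingStep step = new CountingStep();
    controller.add(step);
    controller.add(step); // same instance twice, as the API permits
    controller.execute();
    System.out.println(step.started + " start, " + step.executed
        + " executions, " + step.finished + " finish");
  }
}

// Hypothetical stand-in for GATE's ExecutionException.
class SketchExecutionException extends Exception {
  SketchExecutionException(String message) { super(message); }
}

// Hypothetical stand-in for a PR.
interface SketchStep {
  void execute() throws SketchExecutionException;
}

// Callback interface shaped like the proposal above.
interface SketchControllerAware {
  void controllerExecutionStarted(SketchController c) throws SketchExecutionException;
  // failure is null when the run completed normally.
  void controllerExecutionFinished(SketchController c, Throwable failure);
}

class CountingStep implements SketchStep, SketchControllerAware {
  int started, executed, finished;
  Throwable lastFailure;

  public void controllerExecutionStarted(SketchController c) { started++; }
  public void execute() { executed++; }
  public void controllerExecutionFinished(SketchController c, Throwable failure) {
    finished++;
    lastFailure = failure;
  }
}

class SketchController {
  private final List<SketchStep> steps = new ArrayList<SketchStep>();

  void add(SketchStep step) { steps.add(step); }

  void execute() throws SketchExecutionException {
    // Identity-based set: a step added twice is still notified once.
    Set<SketchControllerAware> aware = Collections.newSetFromMap(
        new IdentityHashMap<SketchControllerAware, Boolean>());
    for (SketchStep s : steps) {
      if (s instanceof SketchControllerAware) aware.add((SketchControllerAware) s);
    }
    for (SketchControllerAware a : aware) a.controllerExecutionStarted(this);
    Throwable failure = null;
    try {
      for (SketchStep s : steps) s.execute();
    } catch (SketchExecutionException | RuntimeException e) {
      failure = e; // remember the cause so the steps can react to it
      throw e;
    } finally {
      for (SketchControllerAware a : aware) a.controllerExecutionFinished(this, failure);
    }
  }
}
```

Because the controller drives the callbacks directly, no listener has to be added or removed when a step joins or leaves the pipeline.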

     
  • Ian Roberts

    Ian Roberts - 2007-07-04

    Logged In: YES
    user_id=1157323
    Originator: NO

    This has now been implemented (svn rev. 8859). I've added the interface gate.creole.ControllerAwarePR with the relevant notification methods, and updated the standard controllers to call these methods at the appropriate times.

     
  • Ian Roberts

    Ian Roberts - 2007-07-04
    • assigned_to: johann_p --> ian_roberts
    • status: open --> closed
     
