In many respects, I'm still recovering from the great meeting last weekend and thinking through everything that was demonstrated and talked about.  One thing that I would like to contribute back to the group is a description of what a "workflow" might look like as part of an ingestion front end to Fedora.  One can't help but be impressed with the work that Rutgers did with their front end to the NJDH, and it seems like each one of us is considering what such a workflow would look like and what it would do.

This has been talked about a great deal in Ohio as well, given the widely divergent uses for our repository, and it has led to the notion of an "Ingestion Workflow Framework" that could guide an object through the steps from upload to registration in the repository.

Below is a snapshot of what we're calling 'workflow':

Ingestion Workflow Framework

Part of the CommunityConfiguration for each DRC Community is the specification of how new content is processed as it enters the Community. The processing happens in Queues that make up the IngestionWorkflowFramework, ultimately leading to the publication of the content in the DRC Community. Workflow Queues are set up to provide for -- among other functions -- a hold on publication pending EditorModerator approval, a web-assisted peer review process, enrichment of metadata, and setting of digital rights and access management.
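To make "the specification of how new content is processed" concrete, here is a minimal sketch that stores a Community's queue string and validates it. The `CommunityConfiguration` name comes from the post, but its fields, the `queues()` method and the `KNOWN_QUEUES` set are hypothetical. Queue strings are assumed to be space-separated, as in the examples further down, and only the queue names those examples use are included.

```python
# Hypothetical sketch: a CommunityConfiguration entry that stores a queue string.
from dataclasses import dataclass
from typing import List

# Assumption: only the queue names used in this post's examples; the DRC's real
# list may contain more (e.g. whichever queue handles setting digital rights).
KNOWN_QUEUES = {
    "PublishQueue",
    "ReviewQueue",
    "PeerReviewQueue",
    "ResubmitQueue",
    "MetadataEnhancementQueue",
}


@dataclass(frozen=True)
class CommunityConfiguration:
    community: str
    queue_string: str  # space-separated, e.g. "ReviewQueue PublishQueue"

    def queues(self) -> List[str]:
        """Split the queue string, rejecting empty strings and unknown names."""
        names = self.queue_string.split()
        if not names:
            raise ValueError(f"{self.community}: empty queue string")
        unknown = [n for n in names if n not in KNOWN_QUEUES]
        if unknown:
            raise ValueError(f"{self.community}: unknown queues {unknown}")
        return names


config = CommunityConfiguration("Example Community", "ReviewQueue PublishQueue")
print(config.queues())  # ['ReviewQueue', 'PublishQueue']
```

Catching a misspelled queue name when the EditorModerator saves the configuration seems preferable to discovering it when the first object gets stuck.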

Under the guidance of the IngestionWorkflowFramework, processing of an object is stewarded among a string of Queues as defined by the EditorModerator for a particular Community. Once an item finishes in one Queue -- for instance, the setting of digital rights by the AuthorCreator -- it moves to the next Queue in the string -- say, to be held pending EditorModerator approval. There is no limit to the length or complexity of the string of Queues.
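One way to picture an object being stewarded along a string of Queues is as a single position within that string, advanced each time a Queue finishes its work. The sketch below assumes this model. `WorkflowItem`, `finish_current_queue` and the object ID are hypothetical names and do not describe an existing DRC or Fedora interface.

```python
# Hypothetical sketch: one object stewarded along a serial string of Queues.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowItem:
    object_id: str
    queues: List[str]  # the Community's queue string, split on whitespace
    position: int = 0  # index of the Queue currently holding the object

    @property
    def current_queue(self) -> Optional[str]:
        """The Queue holding the object, or None after the last Queue finishes."""
        if self.position < len(self.queues):
            return self.queues[self.position]
        return None

    def finish_current_queue(self) -> Optional[str]:
        """Record that the current Queue is done and move to the next one."""
        if self.current_queue is None:
            raise RuntimeError(f"{self.object_id} has already left the last Queue")
        self.position += 1
        return self.current_queue


item = WorkflowItem("obj-1", "ReviewQueue MetadataEnhancementQueue PublishQueue".split())
visited = []
while item.current_queue is not None:
    visited.append(item.current_queue)
    item.finish_current_queue()
print(visited)  # ['ReviewQueue', 'MetadataEnhancementQueue', 'PublishQueue']
```

Because the string is an ordinary list, nothing in this model limits its length.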

In the initial stages of the implementation of the IngestionWorkflowFramework, we anticipate a serial list of Queues (in other words, an object cannot be in more than one Queue at a time). Practically speaking, this means that metadata enhancement cannot take place while the object is undergoing peer review; the object can enter the metadata enhancement process only once it has left the peer review process. We hope that a later implementation will allow parallel processing in multiple Queues where this makes sense, while still ensuring the integrity of the object and its metadata.
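The "one Queue at a time" rule can be enforced by recording at most one location per object and refusing entry to a second Queue until the object leaves the first. This is a minimal sketch under that assumption. `QueueRegistry` and its methods are hypothetical.

```python
# Hypothetical sketch: enforce "an object is in at most one Queue at a time".
from typing import Dict, Optional


class QueueRegistry:
    def __init__(self) -> None:
        self._location: Dict[str, str] = {}  # object id -> Queue holding it

    def enter(self, object_id: str, queue: str) -> None:
        held_by = self._location.get(object_id)
        if held_by is not None:
            # Serial model: the object must leave its current Queue first.
            raise RuntimeError(f"{object_id} is still in {held_by}; cannot enter {queue}")
        self._location[object_id] = queue

    def leave(self, object_id: str) -> str:
        return self._location.pop(object_id)  # KeyError if the object is in no Queue

    def where(self, object_id: str) -> Optional[str]:
        return self._location.get(object_id)


reg = QueueRegistry()
reg.enter("obj-1", "PeerReviewQueue")
try:
    reg.enter("obj-1", "MetadataEnhancementQueue")  # refused during peer review
except RuntimeError as err:
    print(err)
reg.leave("obj-1")
reg.enter("obj-1", "MetadataEnhancementQueue")  # allowed once peer review is done
print(reg.where("obj-1"))  # MetadataEnhancementQueue
```

A later parallel implementation could replace the single value per object with a set of Queues. The integrity checks the post mentions would then have to govern which Queues may share an object.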

List of Queues

These processing Queues are currently defined for the DRC:

Examples

The simplest string of queues is just "PublishQueue": as soon as the AuthorCreator submits the object, it is made available to the CommunityAudience. Another simple string is "ReviewQueue". When the object is submitted, the EditorModerator is notified and chooses the queue(s) in which to place it: into PublishQueue, for instance, to make it available, or into MetadataEnhancementQueue followed by PublishQueue.
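The "ReviewQueue" case hands the rest of the object's route to the EditorModerator. This sketch assumes a particular model that the post does not spell out. If the EditorModerator makes no choice, the object continues along whatever the configured string lists after ReviewQueue. If the EditorModerator does choose, that choice replaces the rest of the string for this object only. The function name `route_after_review` is hypothetical.

```python
# Hypothetical sketch: what happens to an object once it leaves ReviewQueue.
from typing import List, Optional


def route_after_review(queue_string: str, editor_choice: Optional[str] = None) -> List[str]:
    """Queues an object visits after the EditorModerator acts on it in ReviewQueue.

    Assumption: with no choice, the object continues along the configured
    string; a choice replaces the rest of the string for this object only.
    """
    queues = queue_string.split()
    rest = queues[queues.index("ReviewQueue") + 1:]  # ValueError if no ReviewQueue
    if editor_choice is None:
        return rest
    return editor_choice.split()


print(route_after_review("ReviewQueue", "PublishQueue"))
# ['PublishQueue']
print(route_after_review("ReviewQueue", "MetadataEnhancementQueue PublishQueue"))
# ['MetadataEnhancementQueue', 'PublishQueue']
```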

The string of queues can be complex as well: "PeerReviewQueue ResubmitQueue ReviewQueue MetadataEnhancementQueue PublishQueue". In this example, a submitted item is immediately sent to commentators as part of the peer review process. The item is then sent back to the submitter for revision, after which it is held in the EditorModerator's queue for a decision. When the EditorModerator gives approval, the item is put in a queue of objects to be reviewed by a library cataloger or subject specialist, who enhances the object's metadata. At the conclusion of the enhancement process, the item is made available to the Community's audience.

The IngestionWorkflowFramework concept allows a great deal of flexibility in defining how content will be handled in a Community. For instance, if the last two queues in the preceding example were switched ("PeerReviewQueue ResubmitQueue ReviewQueue PublishQueue MetadataEnhancementQueue"), objects approved by the EditorModerator would be published immediately and would then enter a queue for metadata enhancement by a cataloger.
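Whether an object is already public while later Queues run depends only on where PublishQueue sits in the string. A small helper makes the effect of reordering visible. The function name is hypothetical, and it assumes each string contains PublishQueue exactly once.

```python
# Hypothetical sketch: which Queues still run after an object becomes public.
from typing import List


def queues_after_publication(queue_string: str) -> List[str]:
    """Queues that still run once the object is visible to the CommunityAudience."""
    queues = queue_string.split()
    # Raises ValueError if the string never publishes the object.
    return queues[queues.index("PublishQueue") + 1:]


original = "PeerReviewQueue ResubmitQueue ReviewQueue MetadataEnhancementQueue PublishQueue"
swapped = "PeerReviewQueue ResubmitQueue ReviewQueue PublishQueue MetadataEnhancementQueue"
print(queues_after_publication(original))  # []
print(queues_after_publication(swapped))   # ['MetadataEnhancementQueue']
```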

Differing Workflows for Various Users

A refinement to the IngestionWorkflowFramework will be the ability for the EditorModerator to specify different strings of queues for different groups of users. Using the attributes received for a user through ShibAccessMgt, an EditorModerator can grant different levels of publication privilege within the Community. A Community could, for example, use the queue string "PublishQueue" for all faculty members of a department (their content is published immediately) and "ReviewQueue PublishQueue" for all graduate students of the department (their content is held for EditorModerator approval before publication in the Community).
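Selecting a queue string from user attributes can be as simple as an ordered list of rules, with the first match winning. This sketch assumes ShibAccessMgt hands over attributes as a plain name-to-value mapping. The attribute names `department` and `role`, their values, and the fallback string for unmatched users are all hypothetical. A real deployment would use whatever attributes the identity provider releases, such as eduPersonAffiliation.

```python
# Hypothetical sketch: choose a queue string from a user's attributes.
from typing import Dict, List, Tuple

Rule = Tuple[Dict[str, str], str]  # (required attribute values, queue string)

RULES: List[Rule] = [
    ({"department": "history", "role": "faculty"}, "PublishQueue"),
    ({"department": "history", "role": "graduate-student"}, "ReviewQueue PublishQueue"),
]

DEFAULT_QUEUE_STRING = "ReviewQueue PublishQueue"  # assumed fallback for anyone else


def queue_string_for(attributes: Dict[str, str], rules: List[Rule] = RULES,
                     default: str = DEFAULT_QUEUE_STRING) -> str:
    """Return the queue string of the first rule whose attributes all match."""
    for required, queue_string in rules:
        if all(attributes.get(name) == value for name, value in required.items()):
            return queue_string
    return default


print(queue_string_for({"department": "history", "role": "faculty"}))  # PublishQueue
```

Falling back to a reviewed string means that a user whose attributes match no rule still cannot publish without approval.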

What is still missing here is the Records Management aspect of the OASIS framework as described by Kevin of Yale and Robert and Eliot of Tufts.  Much of it could probably be embedded into the wizards, or it may require a separate workflow queue of its own.

Would a workflow engine like this be useful to others?  If so, are there components you'd like to see?  (Has anyone seen anything similar that could give us a jump-start?)


Peter
-- 
Peter Murray                       http://www.pandc.org/peter/work/
Assistant Director, Multimedia Systems  tel:+1-614-728-3600;ext=338
OhioLINK: the Ohio Library and Information Network   Columbus, Ohio