From: Martin L. <mar...@us...> - 2004-03-30 19:27:23
Update of /cvsroot/babeldoc/babeldoc/readme/userguide
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4646/readme/userguide

Modified Files:
	chapter4.xml chapter5.xml
Log Message:
Minor documentation updates

Index: chapter4.xml
===================================================================
RCS file: /cvsroot/babeldoc/babeldoc/readme/userguide/chapter4.xml,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** chapter4.xml 7 Apr 2003 03:53:42 -0000 1.2
--- chapter4.xml 30 Mar 2004 19:15:41 -0000 1.3
***************
*** 1,82 ****
  <?xml version="1.0" encoding="ISO-8859-1"?>
! <chapter>
! <title>Journal</title>
! <section>
! <title>Introduction</title>
! <para>The journal keeps track of documents as they move through the system as well as the status of each operation performed on the document. The primary purpose of the journal is to provide a safe environment for the processing of documents. There are a number of mission critical situations where losing data is not acceptable. It is possible to recreate document processing if an error condition should arise. Errors can be both external and internal. Internal problems could be temporary database errors, disk space, etc. External causes could be erroneous documents, network outages, etc.</para>
! <para>Each document is associated with a <userinput>JournalTicket</userinput> which is assigned uniquely just as the document enters the pipeline. Each operation upon a document for a JournalTicket (hereafter also refered to as a ticket) is performed at a step. Steps start at zero and increase until the document is finished processing. Each operation (or pipelinestage) on a document can be uniquely identified by a combination of a ticket and a step.</para>
! </section>
! <section>
! <title>Journal Operations</title>
! <para>A journal operation indicates what happened in the journal for the document at that pipelinestage. This is essential for determining problems with document processing. There are a number of journal operations available:</para>
! <orderedlist>
! <listitem><userinput>newTicket</userinput>. This operation is the first operation (step 0) when a document is introduced into a pipeline. This returns a new ticket.</listitem>
! <listitem><userinput>forkTicket</userinput>. This operation occurs when a document is split intoor similar operation. The forked ticket is a new ticket but is associated with its parent ticket in theTicket lineage may thus be traced.</listitem>
! <listitem><userinput>updateStatus</userinput>. This operation will cause the status of this ticket to be updated and the step updated. The ticket is unchanged, the step is incremented.</listitem>
! <listitem><userinput>updateDocument</userinput>. This operation writes the document to the journal data store (implementation dependant). The ticket is unchanged and step is incremented.</listitem>
! <listitem><userinput>replay</userinput>. The operation causes the document associated with the ticket to be replayed from the step specified. This operation can only succeed if the document was updated (see update document operation).</listitem>
! </orderedlist>
! </section>
!
! <section>
! <title>Journal Implementations</title>
! <para>The implementation of the journal depends on your specific circumstances. There are currently three implementations that are available. Which specific journal to use is defined in the configuration file: <userinput>config/journal/config.properties</userinput>. The journal to be used is set in the single name/value pair: <userinput>journal</userinput>. The options are: </para>
! <orderedlist>
! <listitem>simple</listitem>
! <listitem>mysql</listitem>
! <listitem>oracle</listitem>
! <listitem>ejb</listitem>
! </orderedlist>
! <section>
! <title>Simple Journal</title>
! <para>The simple journal implements its operations as disk files and directories. It is not intended as a robust, enterprise level implementation. It also lacks structured query functions for querying, etc. Its configuration file is <userinput>config/journal/simple/config.properties</userinput>. This file has a number of configuration options</para>
! <orderedlist>
! <listitem><userinput>simpleJournalDir</userinput>: The directory to create the log-detail files.</listitem>
! <listitem><userinput>SimpleJournalLog</userinput>: The path to the journal file. See later.</listitem>
! <listitem><userinput>logMaxSize</userinput>: This will roll-over the log file once the journal log reaches this size.</listitem>
! </orderedlist>
! <para>For each operation logged to the journal, it is logged line by line to the journal log file. The lines are comma-separated values <firstterm>CSV</firstterm> separated and can be parsed by third party applications. The columns are:</para>
! <orderedlist>
! <listitem><userinput>ticket number</userinput>: the ticket number is currently the time in milliseconds at time of creation of the ticket.</listitem>
! <listitem><userinput>step</userinput>: the step number - starting from 0</listitem>
! <listitem><userinput>operation</userinput>: The particular operation being executed</listitem>
! <listitem><userinput>timestamp</userinput>: The time in milliseconds when the operation was logged</listitem>
! <listitem><userinput>status information</userinput>: The fail / success for updateStatus.</listitem>
! <listitem><userinput>pipeline stage name</userinput> The stage within the pipeline when this step was logged.</listitem>
! <listitem><userinput>additional status information</userinput>: The additional status information that indicates further information about this journal log.</listitem>
! </orderedlist>
! <para>For each ticket, there is a directory created with the value of the ticket (this is long string of numbers - its actually the time in milliseconds of when the ticket was created. Inside this directory there are step delta files which represents each step in the log for that ticket. The contents of the delta file may be the status string or the document itself (if the operation is a updateDocument). The document is persisted as a object serialization.</para>
! </section>
!
! <section>
! <title>Jdbc Journal</title>
! <para>It is possible to use a database to store the journal log and the document data. Currently oracle and mysql are supported. The schema creation scripts are in the directory <userinput>readme/sql</userinput>. The document data is stored as binary data (<firstterm>BLOBs</firstterm>). Each vendor supports BLOBS slightly differently hence the specific database support. There are three fundamental tables to store the journal data (the table <userinput>table_key</userinput> is for unique key generation), being:</para>
! <orderedlist>
! <listitem><userinput>log</userinput>: Stores tickets and steps for the tickets as well as the operation details for that ticket step. The other_data columns can either store the status message for update status operations or the parent ticket id for fork ticket operations.</listitem>
! <listitem><userinput>journal</userinput>: Stores the document as a blob for the ticket step. This is associated with update document operations.</listitem>
! <listitem><userinput>journal_data</userinput>: Storage for the enriched variables associated with the document. The primary reason that these variables are stored separately is that they can be used as query parameters for console operations. Note that long and binary variables are not stored to the database and that strings can get truncated.</listitem>
! </orderedlist>
! <para>The configuration for both the Mysql and Oracle journals are stored in the configuration file: <userinput>config/journal/sql/config.properties</userinput>. The only configuration option in this file is <userinput>resourceName</userinput> indicates that name of the resource that will manage the database connection. Current the journal is implmented in a separate schema (instance, whatever) than the other database storage areas (user, and console).</para>
! </section>
! <section>
! <title>Ejb Journal Implementation</title>
! <para>The intent of this journal implementation is to store the operation journal implementation in a J2EE container. Currently <productname>Jboss</productname> is explicitly supported but not to the exclusion of other containers. This implementation is really a shell around either the <userinput>simple</userinput> or <userinput>sql</userinput> journal implementations but running in a remote server. By this means, it is possible to move the journal operation to a central location. The main issue that can arise with the EJB The configuration for the ejb implementation is stored in the configuration file: </para>
! </section>
! </section>
!
! <section>
! <title>Journal Tool</title>
! <para>The <userinput>journal tool</userinput> allows access to the journal from the command line. This enables complex queries to be applied against the journal. There are four separate types of queries:</para>
! <orderedlist>
! <listitem><userinput>LIST</userinput>: List all the tickets and the steps in the journal. This can produce lots of output. This can be limited by the flag -n (no more than this many lines of output). It is also possible to start from another index other than zero using the -i flag</listitem>
! <listitem><userinput>TICKETSTEPS</userinput>: List all the ticketsteps for the supplied ticket.</listitem>
! <listitem><userinput>DISPLAY</userinput>: Displays the contents of the document stored at the ticket/step to the screen</listitem>
! <listitem><userinput>REPROCESS</userinput>: This will reintroduce the document at the the point it was stored or later.</listitem>
! </orderedlist>
! <para>There are a number of options which can change the display of the data from the tool - use the -h command line to get all the options for this tool</para>
! </section>
! </chapter>
--- 1,104 ----
  <?xml version="1.0" encoding="ISO-8859-1"?>
! <chapter>
! <title>Journal</title>
! <section>
! <title>Introduction</title>
! <para>The journal keeps track of documents as they move through the system as well as the status of each operation performed on the document. The primary purpose of the journal is to provide a safe environment for the processing of documents. There are a number of mission critical situations where losing data is not acceptable. It is possible to recreate document processing if an error condition should arise. Errors can be both external and internal. Internal problems could be temporary database errors, disk space, etc. External causes could be erroneous documents, network outages, etc.</para>
! <para>Each document is associated with a <userinput>JournalTicket</userinput> which is assigned uniquely just as the document enters the pipeline. Each operation upon a document for a JournalTicket (hereafter also refered to as a ticket) is performed at a step. Steps start at zero and increase until the document is finished processing. Each operation (or pipelinestage) on a document can be uniquely identified by a combination of a ticket and a step.</para>
! </section>
! <section>
! <title>Journal Operations</title>
! <para>A journal operation indicates what happened in the journal for the document at that pipelinestage. This is essential for determining problems with document processing. There are a number of journal operations available:</para>
! <orderedlist>
! <listitem>
! <userinput>newTicket</userinput>. This operation is the first operation (step 0) when a document is introduced into a pipeline. This returns a new ticket.</listitem>
! <listitem>
! <userinput>forkTicket</userinput>. This operation occurs when a document is split into many documents or similar operations. The forked ticket is a new ticket but is associated with it's parent ticket in the Ticket lineage and may thus be traced.</listitem>
! <listitem>
! <userinput>updateStatus</userinput>. This operation will cause the status of this ticket to be updated and the step updated. The ticket is unchanged, the step is incremented.</listitem>
! <listitem>
! <userinput>updateDocument</userinput>. This operation writes the document to the journal data store (implementation dependant). The ticket is unchanged and step is incremented.</listitem>
! <listitem>
! <userinput>replay</userinput>. This operation causes the document associated with the ticket to be replayed from the step specified. This operation can only succeed if the document was updated (see update document operation).</listitem>
! </orderedlist>
! </section>
! <section>
! <title>Journal Implementations</title>
! <para>The implementation of the journal depends on your specific circumstances. There are currently three implementations that are available. Which specific journal to use is defined in the configuration file: <userinput>config/journal/config.properties</userinput>. The journal to be used is set in the single name/value pair: <userinput>journalType</userinput>. The options are: </para>
! <orderedlist>
! <listitem>simple</listitem>
! <listitem>mysql</listitem>
! <listitem>oracle</listitem>
! <listitem>ejb</listitem>
! </orderedlist>
!
! <section>
! <title>Simple Journal</title>
! <para>The simple journal implements it's operations as disk files and directories. It is not intended as a robust, enterprise level implementation. It also lacks structured query functions for querying, etc. Its configuration file is <userinput>config/journal/config.properties</userinput>. This file has a number of configuration options</para>
! <orderedlist>
! <listitem>
! <userinput>simpleJournalDir</userinput>: The directory to create the log-detail files.</listitem>
! <listitem>
! <userinput>simpleJournalLog</userinput>: The path to the journal file. See later.</listitem>
! <listitem>
! <userinput>logMaxSize</userinput>: This will roll-over the log file once the journal log reaches this size.</listitem>
! </orderedlist>
! <para>For each operation logged to the journal, it is logged line by line to the journal log file. The lines are comma-separated values <firstterm>(CSV)</firstterm> and can be parsed by third party applications. The columns are:</para>
! <orderedlist>
! <listitem>
! <userinput>ticket number</userinput>: the ticket number is currently the time in milliseconds at time of creation of the ticket.</listitem>
! <listitem>
! <userinput>step</userinput>: the step number - starting from 0</listitem>
! <listitem>
! <userinput>operation</userinput>: The particular operation being executed</listitem>
! <listitem>
! <userinput>timestamp</userinput>: The time in milliseconds when the operation was logged</listitem>
! <listitem>
! <userinput>status information</userinput>: The fail / success for updateStatus.</listitem>
! <listitem>
! <userinput>pipeline stage name</userinput> The stage within the pipeline when this step was logged.</listitem>
! <listitem>
! <userinput>additional status information</userinput>: The additional status information that indicates further information about this journal log.</listitem>
! </orderedlist>
! <para>For each ticket, there is a directory created with the value of the ticket (this is long string of numbers - its actually the time in milliseconds of when the ticket was created. Inside this directory there are step delta files which represents each step in the log for that ticket. The contents of the delta file may be the status string or the document itself (if the operation is updateDocument). The document is persisted as an object serialization.</para>
! </section>
!
! <section>
! <title>Jdbc Journal</title>
! <para>It is possible to use a database to store the journal log and the document data. Currently oracle and mysql are supported. The schema creation scripts are in the directory <userinput>readme/sql</userinput>. The document data is stored as binary data (<firstterm>BLOBs</firstterm>). Each vendor supports BLOBS slightly differently, hence the specific database support. There are three main tables involved in storing the journal data (the table <userinput>table_key</userinput> is for unique key generation), being:</para>
! <orderedlist>
! <listitem>
! <userinput>log</userinput>: Stores tickets and steps for the tickets as well as the operation details for each ticket step. The log_other_data column can either store the status message for updateStatus operations or the parent ticket id for forkTicket operations.</listitem>
! <listitem>
! <userinput>journal</userinput>: Stores the document as a blob for the ticket step. This is associated with updateDocument operations.</listitem>
! <listitem>
! <userinput>journal_data</userinput>: Storage for the enriched variables associated with the document. The primary reason that these variables are stored separately is that they can be used as query parameters for console operations. Note that long and binary variables are not stored to the database and that strings can get truncated.</listitem>
! </orderedlist>
! <para>The configuration for both the Mysql and Oracle journals are stored in the configuration file: <userinput>config/journal/sql/config.properties</userinput>. The only configuration option in this file is <userinput>resourceName</userinput> indicates that name of the resource that will manage the database connection. Current the journal is implmented in a separate schema (instance, whatever) than the other database storage areas (user, and console).</para>
! </section>
!
! <section>
! <title>Ejb Journal Implementation</title>
! <para>The intent of this journal implementation is to store the operation journal implementation in a J2EE container. Currently <productname>Jboss</productname> is explicitly supported but not to the exclusion of other containers. This implementation is really a shell around either the <userinput>simple</userinput> or <userinput>sql</userinput> journal implementations but running in a remote server. By this means, it is possible to move the journal operation to a central location. The configuration for the ejb implementation is stored in the configuration file: </para>
! </section>
! </section>
! <section>
! <title>Journal Tool</title>
! <para>The <userinput>journal tool</userinput> allows access to the journal from the command line. This enables complex queries to be applied against the journal. There are four separate types of queries:</para>
! <orderedlist>
! <listitem>
! <userinput>-L or --list</userinput>: List all the tickets and the steps in the journal. This can produce lots of output. This can be limited by the flag -n (no more than this many lines of output). It is also possible to start from another index other than zero using the -i flag</listitem>
! <listitem>
! <userinput>-T ticket-number or --tickets ticket-number</userinput>: List all the ticketsteps for the supplied ticket.</listitem>
! <listitem>
! <userinput>-D ticket-number.step or --document ticket-number.step</userinput>: Displays the contents of the document stored at the ticket/step to the screen</listitem>
! <listitem>
! <userinput>-R ticket-number.step or --replay ticket-number.step</userinput>: This will reintroduce the document at the the point it was stored or later.</listitem>
! </orderedlist>
! <para>There are a number of options which can change the display of the data from the tool - use the -h command line to get all the options for this tool</para>
! </section>
! </chapter>

Index: chapter5.xml
===================================================================
RCS file: /cvsroot/babeldoc/babeldoc/readme/userguide/chapter5.xml,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** chapter5.xml 13 Aug 2003 11:48:24 -0000 1.2
--- chapter5.xml 30 Mar 2004 19:15:42 -0000 1.3
***************
*** 1,23 ****
  <?xml version="1.0" encoding="ISO-8859-1"?>
! <chapter>
! <title>Scanner</title>
! <section>
! <title>Introduction</title>
! <para>The scanner is a tool that scans for messages from a variety of sources and when a message is found, it is fed into the pipeline. The scanner is an automation tool in that a system can be built up using scanners and pipelines. This is an alternative to the <userinput>process</userinput> script which feeds a simple document into the pipeline when run. The scanner is currently capable of scanning a directory in a filesystem, a mailbox on a mail-server and a JMS queue. The period of scan and the pipeline to feed as well as other specific configuration options are all set in the <userinput>config/scanner/config/</userinput> file. There may be one or many scanning threads active, each configured differently. For example, one scanner thread could be polling a mailbox once every 60secs while another scanning a directory every 10seconds. The scanner is also capable of scanning based on a schedule specified in the same way that CRON is on UNIX systems.</para>
! </section>
! <section>
! <title>Starting scanner</title>
! <para>Scanner tool is started by <userinput>babeldoc scanner</userinput> command. This command will use configuration from <userinput>scanner/config</userinput>. If you want to use configuration from some different you can use <userinput>-s another_configuration</userinput> switch to specify configuration that should be used instead of default one. </para>
! </section>
! <section>
! <title>Configuration</title>
! <para>There are two kinds of configuration options available:</para>
! <orderedlist>
! <listitem><userinput>general</userinput>: these options are global and apply to all types of scanners.</listitem>
! <listitem><userinput>specific</userinput>: Options for a certain kind of scanner. For example the configuration: 'host' is only pertinent to the email scanner.</listitem>
! </orderedlist>
! <para>Each of the options are laid out as: Global Options and Scanner types.</para>
! </section>
  &scanners;
--- 1,26 ----
  <?xml version="1.0" encoding="ISO-8859-1"?>
! <chapter>
! <title>Scanner</title>
! <section>
! <title>Introduction</title>
! <para>The scanner is a tool that scans for messages from a variety of sources and when a message is found, it is fed into the pipeline. The scanner is an automation tool, in that a system can be built up using scanners and pipelines. This is an alternative to the <userinput>process</userinput> script which feeds a single document into the pipeline when run. The scanner is currently capable of scanning a directory in a filesystem, a mailbox on a mail-server, an FTP servcer, a web server, a database via a SQL query, external application output and a JMS queue. The period of scan and the pipeline to feed, as well as other specific configuration options are all set in the <userinput>config/scanner/config.properties</userinput>userinput> file. There may be one or many scanning threads active, each configured differently. For example, one scanner thread could be polling a mailbox once every 60 secs while another is scanning a directory every 10 seconds. The scanner is also capable of scanning based on a schedule specified in the same way that CRON is on UNIX systems.</para>
! <para>General attributes available are <userinput>file_name</userinput>, <userinput>scan_path</userinput> and <userinput>scan_date</userinput>.</para>
! </section>
! <section>
! <title>Starting scanner</title>
! <para>The scanner tool is started by running the command <userinput>babeldoc scanner</userinput>. This command will use configuration from <userinput>config/scanner/config.properties</userinput>. If you want to use configuration from a different file you can use <userinput>-s another_configuration</userinput> switch to specify the configuration that should be used instead of default one. </para>
! </section>
! <section>
! <title>Configuration</title>
! <para>There are two kinds of configuration options available:</para>
! <orderedlist>
! <listitem>
! <userinput>general</userinput>: these options are global and apply to all types of scanners.</listitem>
! <listitem>
! <userinput>specific</userinput>: Options for a certain kind of scanner. For example the configuration: 'host' is only pertinent to the email scanner.</listitem>
! </orderedlist>
! <para>The options for each scanner type are laid out below.</para>
! </section>
  &scanners;
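Editorial note on the simple journal log described in chapter4.xml above: the text lists seven comma-separated columns (ticket number, step, operation, timestamp, status information, pipeline stage name, additional status information). The following is a minimal Python sketch of how a third-party script might read such a log, assuming the columns appear in exactly that order and that ticket, step and timestamp are plain integers. The class name, field names and the sample lines are hypothetical illustrations, not taken from Babeldoc itself.

```python
import csv
from dataclasses import dataclass
from io import StringIO


@dataclass
class JournalEntry:
    # Field names are illustrative; the order follows the column list in chapter4.xml.
    ticket: int          # ticket number (creation time in milliseconds)
    step: int            # step number, starting from 0
    operation: str       # e.g. newTicket, updateStatus, updateDocument
    timestamp_ms: int    # time in milliseconds when the operation was logged
    status: str          # fail / success for updateStatus
    stage: str           # pipeline stage name
    extra: str           # additional status information


def parse_journal_log(text):
    """Parse CSV journal log text into a list of JournalEntry objects."""
    entries = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Pad short rows so that missing trailing columns become empty strings.
        ticket, step, operation, timestamp, status, stage, extra = (row + [""] * 7)[:7]
        entries.append(JournalEntry(
            int(ticket), int(step), operation.strip(), int(timestamp),
            status.strip(), stage.strip(), extra.strip(),
        ))
    return entries


# Hypothetical sample data, invented for this sketch.
SAMPLE = (
    "1080674143000,0,newTicket,1080674143005,,,\n"
    "1080674143000,1,updateStatus,1080674143250,success,entry,stage completed\n"
)

if __name__ == "__main__":
    for entry in parse_journal_log(SAMPLE):
        print(entry.ticket, entry.step, entry.operation, entry.status)
```

Since the documentation says each operation is identified by a ticket and a step, a script like this could key its results on the (ticket, step) pair. The real log may quote or order fields differently, so check an actual journal file before relying on this layout.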