Babeldoc: Universal Document Processor / Discussion / Open Discussion: help

Bill Harrelson - 2003-05-19

I've just downloaded Babeldoc and am trying to work my way through the examples. Either the configuration information is pretty light, or I'm reading it wrong. In any case, the scanner example doesn't do anything at all; it just sits there after I copy the file stats.xml into c:\tmp\in. I'm on WinXP Pro and have set up my path, the BABELDOC_HOME environment variable, etc. The doc creation example works just fine, but the scanner example does nothing.

I then tried the Higgsbros example (setting classpath to c:\Higgsbros;%classpath%) and got:
<2003-05-18 20:49:09,320> ERROR [main] : Error starting scanner: com.babeldoc.scanner.ScannerConfigurationException: No scanner threads provided

The funny thing is that when I close the console window and open another to get a clean environment, I get the same error message when running the basic scanner example, until I blow away the directory and re-extract it.

Clearly I'm doing something wrong. Can you point me to documentation on the scanner parameters and configuration information beyond what's in the User Guide?

Thanks,

BillH

- Dejan Krsmanovic - 2003-05-19
  
  There are two ways to start the scanner with a given configuration.
  1.
  - Set the BABELDOC_HOME environment variable to point to the build folder of the Babeldoc installation. So if you have unpacked Babeldoc*.zip into c:\babeldoc, set the environment variable to c:\babeldoc\build. Babeldoc needs this in order to work correctly.
  - Add %BABELDOC_HOME%\bin to your Path environment variable.
  - Change your working directory to the example you want to start. Examples can be found under the build\examples folder. Your working folder should contain other folders such as pipeline and scanner.
  - Now start the scanner with the command 'babeldoc scanner'
  
  2.
  - Set BABELDOC_HOME
  - Set the BABELDOC_USER variable to the folder of the example you want to start
  - Start %BABELDOC_HOME%\bin\babeldoc scanner
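
  As a rough pre-flight check for the two setups above, here is a minimal Java sketch. It is not part of Babeldoc: the class BabeldocEnvCheck and its problems method are hypothetical names, and the only rules it encodes are the ones stated in this reply (BABELDOC_HOME must be set, and either BABELDOC_USER or the working directory should contain the pipeline and scanner folders).

  ```java
  import java.io.File;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;

  // Hypothetical helper, not shipped with Babeldoc.
  class BabeldocEnvCheck {

      // Returns a list of human-readable problems; empty means the basics look fine.
      static List<String> problems(Map<String, String> env, File workingDir) {
          List<String> found = new ArrayList<String>();

          String home = env.get("BABELDOC_HOME");
          if (home == null || home.isEmpty()) {
              found.add("BABELDOC_HOME is not set");
          }

          // Option 2 uses BABELDOC_USER; option 1 relies on the working directory.
          String user = env.get("BABELDOC_USER");
          File configRoot = (user == null || user.isEmpty()) ? workingDir : new File(user);

          for (String folder : new String[] {"pipeline", "scanner"}) {
              File expected = new File(configRoot, folder);
              if (!expected.isDirectory()) {
                  found.add("missing folder: " + expected.getPath());
              }
          }
          return found;
      }

      public static void main(String[] args) {
          List<String> found = problems(System.getenv(), new File("."));
          if (found.isEmpty()) {
              System.out.println("Environment looks usable");
          }
          for (String problem : found) {
              System.out.println(problem);
          }
      }
  }
  ```

  Running it from the folder where you would type 'babeldoc scanner' prints any missing piece. An empty result only means these basic conditions hold, not that the configuration files themselves are valid.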
  
  Please let us know if you still have problems.
  Dejan
  
- Bill Harrelson - 2003-05-19
  
  Thanks for the quick reply. I had done most of that in (1) with the exception of changing to the directory of the example. The BABELDOC_USER variable (2) is new to me - I didn't see it in the doc, thanks.
  
  Anyway that at least gets the examples to start, but the scanner just sits there. In the example in the white paper, I copy stats.xml into the c:\tmp\in directory and nothing happens. In the Higgsbros example I created an order.csv file and dropped it into c:\tmp\orders\in. The config files seem to correspond to the doc.
  
  Any ideas?
  
  Thanks,
  
  BillH
  
  - Dejan Krsmanovic - 2003-05-19
    
    Can you post your scanner configuration (the scanner/config.properties file) for both examples you are trying to run?
    
    - Dejan Krsmanovic - 2003-05-19
      
      Also post the output you got after starting the scanner, if any.
      
      Dejan
      
    - Bill Harrelson - 2003-05-19
      
      Hi Dejan,
      
      Thanks for the quick response. Here is the configuration file from the \examples\scanner directory:
      
      directory.type=directory
      directory.period=10000
      directory.inDirectory=/tmp/in
      directory.doneDirectory=/tmp/done
      directory.pipeline=test
      
      =====
      Here are the environment variables:
      
      C:\Babeldoc\build\examples>echo %path%
      C:\WINNT\system32;C:\WINNT;C:\WINNT\system32\WBEM;C:\PROGRAM FILES\THINKPAD\UTIL
      ITIES;C:\PROGRA~1\MICROS~2\Office;C:\PROGRA~1\COMMON~1\XCPCSYNC\TRANSL~1\LTNTS4\
      ;C:\Program Files\Symantec\pcAnywhere\;C:\jakarta-ant-1.5.1\bin;C:\j2sdk1.4.1_01
      \bin;c:\babeldoc\build\bin
      
      C:\Babeldoc\build\examples>Echo %BABELDOC_HOME%
      c:\babeldoc\build
      
      C:\Babeldoc\build\examples>echo %BABELDOC_USER%
      %BABELDOC_USER%
      
      ====
      Here is the output:
      
      C:\Babeldoc\build\examples>babeldoc scanner
      Scanner directory config = directory
      <2003-05-19 12:19:35,127> INFO [main] : Starting thread: directory...
      <2003-05-19 12:19:35,167> INFO [main] : Thread directory scanning
      
      ====
      ....then it just waits and never does anything
      
      stats.xml is in C:\tmp\in
      
      Here is the configuration file from C:\Higgsbros\config\scanner directory:
      
      Higgsbros.type=directory
      Higgsbros.period=10000
      Higgsbros.inDirectory=c:/tmp/orders/in
      Higgsbros.doneDirectory=c:/tmp/orders/done
      Higgsbros.pipeline=Higgsbros
      
      Here is the output (same environment as above):
      
      C:\Higgsbros\config>babeldoc scanner
      Scanner Higgsbros config = Higgsbros
      <2003-05-19 12:22:32,813> INFO [main] : Starting thread: Higgsbros...
      <2003-05-19 12:22:32,823> INFO [main] : Thread Higgsbros scanning
      
      ====
      ....then it just waits and never does anything
      
      order.csv is in C:\tmp\orders\in
      
      I'm sure this is probably just a configuration problem, but any help is appreciated.
      
      Thanks again,
      
      Bill
      
- Klaus Koenig - 2003-05-19
  
  Hi,
  
  I downloaded Babeldoc 2 days ago and today I am experiencing the same problem as Bill: the scanner starts but doesn't do anything, even if the file is in the "in" directory.
  
  Any suggestions?
  
  Also, is there a special pipeline stage that allows passing a document to a Java class supplied by the user?
  
  thanks
  
  Klaus Koenig
  
  - Dejan Krsmanovic - 2003-05-20
    
    Yes, of course. You can extend almost every Babeldoc component. Writing your own pipeline stage is quite straightforward: extend the com.babeldoc.core.pipeline.PipelineStage class and implement its process method. You should also add an entry in service/query.properties.
    
    I think there should be info about this in the developer's guide.
    
    An answer to the scanner problem is coming soon!
    Dejan
    
  - Dejan Krsmanovic - 2003-05-20
    
    I found where the problem is. Your configuration is OK; there is a bug in Babeldoc 1.0.0 that will be fixed in version 1.0.1.
    
    The problem is in the filter property. This property specifies a regular expression filter for file names. It is optional, and by default all files should be accepted. But because of the bug, this is not true in the current version. To work around it, set the filter to accept all files (or to some other regular expression that matches the files you want to process) by putting something like this in your scanner config file:
    
    directory.filter=.*
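    
    To see which file names a given filter value would accept, here is a small Java sketch using java.util.regex. It is only an illustration: FilterDemo is a hypothetical class, and it assumes the filter must match the whole file name, which this thread does not confirm for Babeldoc.
    
    ```java
    import java.util.regex.Pattern;

    // Hypothetical helper, not Babeldoc code: shows which file names a
    // directory.filter-style regular expression would accept.
    class FilterDemo {

        // Returns true when the whole file name matches the filter expression.
        static boolean accepts(String filter, String fileName) {
            return Pattern.matches(filter, fileName);
        }

        public static void main(String[] args) {
            String[] names = {"stats.xml", "order.csv"};
            for (String name : names) {
                // ".*" accepts every name; ".*\\.csv" accepts only names ending in .csv
                System.out.println(name + " accepted by .* : " + accepts(".*", name));
                System.out.println(name + " accepted by .*\\.csv : " + accepts(".*\\.csv", name));
            }
        }
    }
    ```
    
    In a properties file the narrower filter would be written with a doubled backslash as well (directory.filter=.*\\.csv), because a single backslash is an escape character in the properties format.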
    
    Please let me know if this helps.
    I have also submitted a new bug report for this in the tracker.
    
    Dejan
    
    - Bill Harrelson - 2003-05-20
      
      Yes, thanks, this gets the scanner example to work and causes the document in Higgsbros to be picked up and inserted into the pipeline.
      
      (Now I just have to figure out what I did wrong in the pipeline)
      
      Thanks again.
      
      Bill
      
      - Dejan Krsmanovic - 2003-05-20
        
        What problems do you have with the pipeline?
        
        Bill Harrelson - 2003-05-20
        
        I was getting pipeline not found. I think that I don't clearly understand the relative referencing in configuration files. And I don't understand when something is a name (eg. for a pipeline like: Higgsbros) and when it's a directory or file reference.
        
        example:
        
        I have in
        c:\Higgsbros\scanner\config.properties:
        directory.type=directory
        directory.period=10000
        directory.inDirectory=c:/tmp/orders/in
        directory.doneDirectory=c:/tmp/orders/done
        directory.pipeline=higgsbros
        directory.filter=.*
        (which is finding the file in /in just fine and attempting to start the pipeline now) I assume from this specification that there should be a c:\Higgsbros\pipeline directory and in it a pipeline prefixed by "higgsbros". So I have in
        c:\Higgsbros\pipeline:
        higgsbros.type=simple
        higgsbros.configFile=c:/Higgsbros/pipeline/simple/order-outbound
        
        and in
        c:\Higgsbros\pipeline\simple\order-outbound.properties:
        
        entryStage=convert
        convert.stageType=FlatToXML
        convert.nextStage=transform
        convert.flatToXmlFile=test/order-convert.xml
        transform.stageType=XslTransform
        transform.nextStage=emailer
        transform.transformationFile=test/transform.xsl
        emailer.stageType=SmtpWriter
        emailer.nextStage=null
        emailer.smtpHost=appropriatesmtphost
        emailer.smtpFrom=orders@higgsbros.com
        emailer.smtpTo=orders-in@somewhereappropriate.com
        emailer.smtpSubject=order
        emailer.smtpMessage=${document.toString()}
        
        Now, should the "test" directory be a sub-directory of c:\Higgsbros\pipeline\simple, or of c:\Higgsbros? Anyway, I'm not sure it's finding the pipeline.
        The exception I get is:
        
        <2003-05-20 09:47:34,117> INFO [directory] : Processing 1 of total 1 messages
        <2003-05-20 09:47:34,347> INFO [directory] : Allocate ticket 1053438454327 for message order.csv
        <2003-05-20 09:47:34,577> ERROR [directory] : Error processing document
        com.babeldoc.core.pipeline.PipelineException: PipelineStage: entryStage not found
        
        So, I'm unclear on the directory referencing assumptions in the config files. Again, help is appreciated.
        
        Thanks,
        
        Bill
        
        Dejan Krsmanovic - 2003-05-20
        
        In the c:\Higgsbros\pipeline folder you should have a config.properties file with this configuration:
        
        higgsbros.type=simple
        higgsbros.configFile=pipeline/simple/order-outbound
        
- bruce mcdonald - 2003-05-20
  
  Hello Bill,
  
  The pipeline configuration starts from the config/pipeline/config.properties file. This file lists each of the pipelines, with the pipeline type and the configuration file for each one. So for purposes of illustration:
  
  # "simple" means the pipeline is configured by a properties file
  mypipeline.type=simple
  mypipeline.configFile=pipeline/mypipeline
  
  This tells Babeldoc that there is a pipeline called "mypipeline", that it is a "simple" pipeline (configured from a properties file as opposed to an XML file), and that its configuration file is located at config/pipeline/mypipeline.properties. Note that all the configuration options start from a "config" directory. You can call your configuration directory anything you want, but it must be either on the CLASSPATH (not suggested) or named in the BABELDOC_USER environment variable.
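  
  The lookup described above can be sketched in Java as follows. This is only an illustration of the relative-path rule, not Babeldoc's actual loader: PipelineConfigDemo and resolveConfigFile are hypothetical names, and the rule "configuration root + configFile value + .properties" is inferred from this thread.
  
  ```java
  import java.io.IOException;
  import java.io.StringReader;
  import java.util.Properties;

  // Hypothetical illustration, not Babeldoc code.
  class PipelineConfigDemo {

      // Builds the path of a pipeline's own configuration file from the
      // registry entry "<name>.configFile", relative to the configuration root.
      static String resolveConfigFile(String configRoot, Properties registry, String pipelineName) {
          String relative = registry.getProperty(pipelineName + ".configFile");
          if (relative == null) {
              throw new IllegalArgumentException("No configFile entry for pipeline: " + pipelineName);
          }
          return configRoot + "/" + relative + ".properties";
      }

      public static void main(String[] args) throws IOException {
          // Contents of pipeline/config.properties, inlined so the sketch is self-contained.
          Properties registry = new Properties();
          registry.load(new StringReader(
                  "higgsbros.type=simple\n"
                  + "higgsbros.configFile=pipeline/simple/order-outbound\n"));

          System.out.println(resolveConfigFile("c:/Higgsbros", registry, "higgsbros"));
      }
  }
  ```
  
  With c:/Higgsbros as the configuration root, this prints c:/Higgsbros/pipeline/simple/order-outbound.properties. Under such a rule, a configFile value that is already an absolute path would end up appended to the root, which would be consistent with the absolute path not working earlier in the thread.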
  
- Bill Harrelson - 2003-05-20
  
  Well, that worked (changing to a relative reference) - I'm surprised that the
  c:\Higgsbros\pipeline\config.properties file can't use a full path reference, e.g.:
  higgsbros.type=simple
  higgsbros.configFile=c:/Higgsbros/pipeline/simple/order-outbound
  
  Now the error that I get is:
  
  <2003-05-20 10:24:28,350> INFO [directory] : PipelineStage name: convert
  <2003-05-20 10:24:28,370> ERROR [directory] : Error processing document
  com.babeldoc.core.pipeline.PipelineException: Invalid pipeline stage type: FlatToXML
  
  I can't find the documentation for FlatToXML, but it's in the usage white paper example. Has this changed?
  
  Thanks,
  
  Bill
  
- Dejan Krsmanovic - 2003-05-20
  
  Not intentionally!
  This is a bug. Try adding directory service, and in it create query.properties.
  Put the following line there:
  PipelineStage.FlatToXML=com.babeldoc.conversion.pipeline.stage.FlatToXmlPipelineStage
  
  - Bill Harrelson - 2003-05-20
    
    I think that I don't know what this means:
    
    Try adding directory service, and in it create query.properties.
    Put the following line there:
    PipelineStage.FlatToXML=com.babeldoc.conversion.pipeline.stage.FlatToXmlPipelineStage
    
    Can you point me to somewhere in the documentation? I can't find "directory service" or query.properties in the User Guide.
    
    Thanks,
    Bill
    
    - Dejan Krsmanovic - 2003-05-20
      
      Sorry, English is not my native language. Also, I was in a hurry when I wrote my last reply. ;) I'll try again:
      In the c:\Higgsbros folder (the folder that contains the pipeline and scanner folders), create a new folder called service. Then create a query.properties file in this folder with the line:
      PipelineStage.FlatToXML=com.babeldoc.conversion.pipeline.stage.FlatToXmlPipelineStage
      Now try running the scanner. Note that this is just a workaround for this bug. Version 1.0.1 will fix it.
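      
      To tell a missing mapping apart from a class that is mapped but cannot be loaded, a small diagnostic like the following Java sketch may help. It is not Babeldoc code: StageLookupDemo and describe are hypothetical names, the key format PipelineStage.<type> is taken from the line above, and the idea that Babeldoc resolves stage types through this kind of lookup is an assumption.
      
      ```java
      import java.util.Properties;

      // Hypothetical diagnostic, not Babeldoc code.
      class StageLookupDemo {

          // Reports how far the lookup for a stage type gets.
          static String describe(Properties query, String stageType) {
              String className = query.getProperty("PipelineStage." + stageType);
              if (className == null) {
                  return "no mapping for stage type " + stageType;
              }
              try {
                  // Load without initializing, so only classpath visibility is tested.
                  Class.forName(className, false, StageLookupDemo.class.getClassLoader());
                  return className + " is loadable";
              } catch (ClassNotFoundException | LinkageError e) {
                  return className + " is mapped but could not be loaded";
              }
          }

          public static void main(String[] args) {
              // Same entry as in service/query.properties above.
              Properties query = new Properties();
              query.setProperty("PipelineStage.FlatToXML",
                      "com.babeldoc.conversion.pipeline.stage.FlatToXmlPipelineStage");

              System.out.println(describe(query, "FlatToXML"));
              System.out.println(describe(query, "NoSuchStage"));
          }
      }
      ```
      
      The first result depends on whether the Babeldoc jars are on the classpath when the sketch is run, so it is a quick way to check classpath visibility separately from the mapping itself.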
      
      Dejan
      
- Klaus Koenig - 2003-05-20
  
  Hi,
  
  Following the instructions, the scanner works for me too, but I always get the "Invalid pipeline stage type: FlatToXML" error, even with query.properties set in the service directory. Any suggestions?
  
  Thanks
  
  Klaus
  
- Bill Harrelson - 2003-05-20
  
  I am also trying to use the mail scanner with the following lines as the second scanning thread in my scanner\config.properties file:
  
  mailbox.type=mailbox
  mailbox.host=pop.correctmaileraddress
  mailbox.username=babeldocq1
  mailbox.password=babeldocq1
  mailbox.pipeline=higgsbros
  mailbox.period=30000
  mailbox.protocol=pop3
  
  Then I send the .csv file as text in the body of the message. The mailbox thread detects the presence of mail, but does nothing with it (it doesn't start the higgsbros thread the way the directory scanner does). Also, there doesn't seem to be an option to delete the message from the mailbox, as it just keeps on detecting the mail again and again and doing nothing.
  
  As always, any help will be appreciated.
  
  Thanks,
  
  Bill
  
  - Dejan Krsmanovic - 2003-05-20
    
    Are you using the same config files as in the previous example? Is there any output (some exception)?
    By default, messages are deleted from the mail server (in fact they must be deleted, since the pop3 protocol does not support folders other than INBOX on the mail server). I will check this tomorrow morning at the office. This is quite strange to me, since we use the mailbox scanner every day without problems.
    
    Dejan
    
    - Bill Harrelson - 2003-05-21
      
      I now have most of Higgsbros working - I had a few typos in the convert and the xsl files and I inserted some new pipeline stages to FileWrite the documents in intermediate stages so that I could see what was happening, but it seems to all work (I removed the [1] from orders[1]/...)
      
      I'm still having problems with the mail reader; the writer works fine. The mail reader worked at one point earlier, when I was on the same network as the server, and now it doesn't again. (I'm on a laptop and move between networks.) When I watch my mail server, the connection opens, the scanner detects the document, and then it just hangs: the connection between the mail reader and the mail server stays open, then 30 seconds later the scanner opens another connection. From that point on there never seem to be more than 3 connections open, but there is always one. The scanner just keeps reprinting the message that 1 document was detected.
      
      Hope this helps.
      
      Thanks,
      
      Bill
      
      - bruce mcdonald - 2003-05-21
        
        Excellent news Bill!
        
        Can you provide some details about your email server? I have tested this with MS Exchange. Another idea is that we try and duplicate your issues. This should isolate some problems. If this works, join the babeldoc-user list and we can set this up.
        
- Klaus Koenig - 2003-05-20
  
  Hi,
  
  Now, with the service subdirectory containing the query.properties file, the name FlatToXmlPipelineStage is correctly translated, but I have a classpath problem: the classloader fails to instantiate the com.babeldoc.conversion.pipeline.stage.FlatToXmlPipelineStage class. BABELDOC_HOME and BABELDOC_USER are correctly set. Could someone explain whether a particular CLASSPATH needs to be set so that Babeldoc can load all the jar files in the build\lib directory?
  
  Thanks
  
  Klaus
  