Menu

Loading database from flat files

Help
2003-12-19
2013-04-16
  • Mitch Christensen

    Hi,

    I need a pipeline that will scan a set of sub-directories, read a tab-delimited text file (*.ODI)and update a database based on the text file contents.

    This seems right up BabelDoc's alley.  I am new to BableDoc and was wondering if I could get some pointers.  I was thinking of the following pipeline,

    DirectoryScanner->FlatToXml->XpathSplitter->XslTransform->SqlWriter

    Am I even in the ballpark?

     
    • Sherman Wood

      Sherman Wood - 2003-12-19

      I have not played with the FlatToXml stage, but the general pattern you outline is fine.

      Sherman

       
      • maxdaros

        maxdaros - 2004-11-02

        Hi all, pheraps my question seems trivial (sorry for this)

        I have a pipeline like FlatToXml --> XslkTransform --> SqlWriter

        here is the pipeline configuration file:
        entryStage                 = convert

        convert.stageType       = FlatToXml
        convert.nextStage       = transform
        convert.flatToXmlFile    = convertions/file-convert.XML
        convert.tracked           = true

        transform.stageType     = XslTransform
        transform.nextStage       = dbwriter
        transform.transformationFile = dbloader/convertions/transform.xsl
        transform.tracked          = true

        dbwriter.stageType        = SqlWriter
        dbwriter.nextStage         = null
        dbwriter.messageTag      = messageTagErrorSql
        dbwriter.resourceName    = jdbcConnection
        dbwriter.sql                   = ${document}
        dbwriter.failOnFirst         = false
        dbwriter.useBatch          = true
        dbwriter.tracked            = true

        after the transform phase the output file is:
        <?xml version="1.0" encoding="UTF-8"?>

            INSERT INTO tb_Max ( Text, Description ) VALUES ("Marco","Descrizione di Marco");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Mauro","Descrizione di Mauro");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Giovanni","Descrizione di Giovanni");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Francesco","Descrizione di Francesco");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Aldo","Descrizione di Aldo");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Luigi","Descrizione di Luigi");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Anna","Descrizione di Anna");
         
            INSERT INTO tb_Max ( Text, Description ) VALUES ("Eleonora","Descrizione di Eleonora");

        Unfortunately I have a problem submitting this document to the SqlWriter stage. Can anyone help me? How can I submit a sql query file obtained by a previous step?

        Thanks in advance ;)

         
        • Mitch Christensen

          We do this very thing (successfully).

          First off, get rid of the XML preamble (i.e. <?xml...?>) by setting the document output type to 'text' in your XSLT script.  This is done by adding the following:

          <xsl:output method="text"/>

          to the top of your XSLT script.

          Once you have pure SQL in your output document the SqlWriter should handle it just fine.

           
    • Dejan Krsmanovic

      I have used Babeldoc for similiar tasks (altough I had XML documents instead of 'flat' files). Besides other tasks, I have used XslTransform stage to transform XML file to sql script that was executed inside SqlWriter stage. So, maybe you won't need XpathSplitter at all.

      On the other side I needed many other pipeline stages for performing other tasks (validations etc). Anyway, this was one year ago when Babeldoc was in 0.5 version and when many other features and useful pipeline stages haven't existed so I believe now it wouldn't be that complex...

      Dejan

       
    • Mitch Christensen

      Excellent solution.

      I now have a pipeline that reads a CSV flat-file, converts to XML using FlatToXml, then to an SQL script (multiple insert statements) using XslTransform.  Now I want to execute that SQL script using SqlWriter.  However, the 'sql' config parameter to SqlWriter takes an SQL script rather than processing the pipeline document contents.

      I tried doing the following:

      dbload.sql=$document

      however I always get the following exception:

      "Error executing script: Non supported SQL92 token at position: 71: mime-type"

      Is there a trick to getting SqlWriter to process the document contents as a multi-statment SQL script?

      TIA,
      -Mitch

       
    • Dejan Krsmanovic

      Try using braces. That is:
      dbload.sql=${document}
      or
      dbload.sql=${document.toString()}

       
      • Dejan Krsmanovic

        I would also suggest you to use FileWriter stage after XslTrasnform, so you can check if your document is valid.

         
        • Mitch Christensen

          I sincerely appreciate the response.

          Unfortunately, I've done all that.  My SQL statement is simply "SELECT SYSDATE FROM DUAL", which works fine if I paste it directly in the loaddb.sql parameter in the pipeline.config file.

          When I step into SqlWriterPipelineStage.jave line 177, the sqlScript value is as follows (via debugger inspection):

          sqlScript= "com.babeldoc.core.pipeline.PipelineDocument@1938039[attributes={binary=false, file_name=\\dev\\rapid\\babeldoc\\io\\done\\results.txt},buffer={83,69,76,69,67,84,32,115,121,115,100,97,116,101,32,70,82,79,77,32,68,85,65,76}]"

          it is as if the toString() method is never getting called.  The data in the 'buffer=' array is my SQL query.  This is beginning to feel like a bug.

          Any other suggestions?

          Thanks again for your help.
          -Mitch

           
    • Mitch Christensen

      Actually, looking at the code, $document.toString() results in the creation of a jakarta commons ToStringBuilder() which is documented to do exactly what I'm seeing (i.e. string-ify complex objects).  However, this is not the same as accessing the document contents as a string.

      It appears that there is no way (currently) to access the PipelineDocument contents as a string.

      Should I open a bug?

      -Mitch

       
      • Dejan Krsmanovic

        Yes, you are right. This is a bug. If you take a look at CVS repository you could notice that toString() method was changed in latest revision. Before that, toString would return document content.
        As I currently cannot see how the document content could be accessed I guess this should be reverted to the earlier revision, or new method should be introduced.
        So please submit this issues as new bug. I would like to discuss about this with Bruce, since he had changed toString() method. This is very serious issue since few examples from documentation uses this method too.

         
        • Mitch Christensen

          I did open bug #866167 last night.  I forgot to update this thread with this fact.

          FWIW, since PipelineDocument is a complex object, I'm not sure that using toString() to represent the contents is such a good idea anyways.  It may be better to go with the getContents() method I proposed in the bug report.

          Of course this doesn't address the broken examples (or outstanding code others may have written that relies on toString()).

          -Mitch

           
    • bruce mcdonald

      bruce mcdonald - 2003-12-27

      You are right - the toString on the PipelineDocument is the currently correct.  What is needed is a way to correcly convert the array of bytes that to a string object.

      You can get the pipelinedocument payload as an array of bytes - can this be used directly?

       
      • Mitch Christensen

        My description of bug #866167 describes the solution that I have implemented locally.

        This involves adding a getContents() method to PipelineDocument so I can do the following:

        dbload.sql=$document.contents

        This makes sense to me, and works fine.

        I did try to use the byte array to no avail.  It needs to be a String, and I couldn't figure a way to do the conversion within a velocity reference.

        -Mitch

         
    • Santiago Begué

      Santiago Begué - 2004-03-04

      Scanning from a directory and sending whith a smtpwriter I've founded the same problem for write the document contents in the smtpMessage.
      I've implemented your solution as suggested in the bug track and works ok.

       
    • maxdaros

      maxdaros - 2004-11-02

      Many thanks, mitchellch

      I'm trying with your suggestion.
      Now the output file after the transform phase is pure sql, so, I'm a little bit near the solution.

      But the SqlWriter stage throws me an Exception:

      the command line with it's output is:
      babeldoc process -p dbloader -f data/inputData.txt
      <2004-11-02 17:58:33,900> INFO  [main] :  PipelineStage name: convert
      <2004-11-02 17:58:34,181> INFO  [main] :  PipelineStage name: fWriter
      <2004-11-02 17:58:34,197> INFO  [main] :  PipelineStage name: transform
      <2004-11-02 17:58:34,384> INFO  [main] :  PipelineStage name: fWriter2
      <2004-11-02 17:58:34,400> INFO  [main] :  PipelineStage name: dbwriter
      <2004-11-02 17:58:34,634> WARN  [main] :  Error executing script: Syntax error or access violation m
      essage from server: "You have an error in your SQL syntax.  Check the manual that corresponds to you
      r MySQL server version for the right syntax to use near 'com.babeldoc.core.pipeline.PipelineDocument
      @e06940[attributes=,"
      <2004-11-02 17:58:34,634> ERROR [main] :  [DefaultPipelineStageErrorHandler.handlePipelineStageError
      ] PipelineStage name: dbwriter Error: com.babeldoc.core.pipeline.PipelineException: Error executing
      sql: .... (continues) ...

      I'm not sure that I can forward the output document of the transorm phase to SqlWriter stage by the attribute dbwriter.sql=$[document}

      Is there any example (apart examples in the babeldoc user guide) that I can examinate?

      Thank you very much.

      Massimo

       
      • johnMac

        johnMac - 2004-11-02

        Try dbwriter.sql=$document.contents
        ...johnM

         
    • maxdaros

      maxdaros - 2004-11-02

      mmmhh,

      $document.contents doesn't work.

      and dbwriter.sql={$document.contents}

      causes the following Exception:
      <2004-11-02 21:17:23,091> WARN  [main] :  Error executing script: Query can not be null or empty
      <2004-11-02 21:17:23,111> ERROR [main] :  [DefaultPipelineStageErrorHandler.handlePipelineStageError] PipelineStage name
      dbwriter Error: com.babeldoc.core.pipeline.PipelineException: Error executing sql:
      <2004-11-02 21:17:23,111> ERROR [main] :  [PipelineFeeder.process]
      com.babeldoc.core.pipeline.PipelineException: Error executing sql:
              at com.babeldoc.sql.pipeline.stage.SqlWriterPipelineStage.process(Unknown Source)
              at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
              at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
              at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
              at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
              at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
              at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
              at com.babeldoc.core.pipeline.feeder.FeederFactory.process(Unknown Source)
              at com.babeldoc.core.pipeline.command.PipelineFeeder.process(Unknown Source)
              at com.babeldoc.core.pipeline.command.PipelineFeeder.execute(Unknown Source)
              at com.babeldoc.core.BabeldocCommand.executeCommand(Unknown Source)
              at com.babeldoc.core.BabeldocCommand.<init>(Unknown Source)
              at com.babeldoc.core.pipeline.command.PipelineFeeder.<init>(Unknown Source)
              at com.babeldoc.core.pipeline.command.PipelineFeeder.main(Unknown Source)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:585)
              at com.babeldoc.core.Main.handleCommand(Unknown Source)
              at com.babeldoc.core.Main.main(Unknown Source)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:585)
              at com.babeldoc.init.Main.runBabeldocMain(Unknown Source)
              at com.babeldoc.init.Main.main(Unknown Source)
      Caused by: com.babeldoc.core.pipeline.PipelineException: Query can not be null or empty
       
      • Mitch Christensen

        Hey,

        I think you might be dealing with an old issue.  I fixed this problem locally, and posted the solution to the bug tracker (see https://sourceforge.net/tracker/?group_id=56976&atid=482496&func=detail&aid=866167\).  The fix requires building a new Babeldoc (very easy actually).

        I assume that becuase the bug still has a status of 'open' that the fix had not yet been incorporated back into the codebase.

        The problem is (or may be) that there was a bug in Babeldoc that results in $document (which really results in a call to document.toString()) returning an array of bytes, which isn't what your SqlWriter stage expects.  My proposed fix simply fixes document.getString() to return the truly stringified document contents.

        In a nutshell, if all other things are correct, implement the fix described in bug #866167 and you should be good to go.

         
    • maxdaros

      maxdaros - 2004-11-02

      Fine!!!

      I've modified com.babeldoc.core.pipeline.PipelineDocument as you said in the bug tracker, and now the pipeline works fine.

      Thank you very much ;)

       
      • Mitch Christensen

        Glad to hear it.

        It would be nice if I could appeal to someone with committer priveliges to apply this fix to the codebase.

        Is there anyone still supporting/working on Babeldoc?

        -Mitch

         

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.