Conversion almost perfect

Help
kenara
2005-07-07
2013-06-04
    kenara - 2005-07-07

    Hi,

    I am using wvWare to convert uploaded doc files into text. My script calls wvWare -x and all goes uneventfully.
    I can then get rid of non-displayable characters with a Python regex matching '\W', but this still leaves some ASCII garbage at the top of each file. The example I'm looking at has lots of 'P', a 'bjbj', a few 'S' ...

    I've searched a bit, but would like to know how to remove all this 'header' stuff.

    Thanks in advance

    Ken  

     
      kenara - 2005-07-07

      For now (maybe forever), I replace \W with 'cutoffhere' and split on the last 'cutoffhere'...

      Better ideas welcome!
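
A minimal sketch of the cutoff idea above, under some loudly stated assumptions. In Python, '\W' also matches spaces and punctuation, so this version instead treats anything outside printable ASCII, tab, and newline as "non-displayable". It then keeps only the text after the last long run of such characters. The function name `strip_header` and the `min_run` threshold are hypothetical, chosen for illustration, and whether the header garbage really ends with such a run depends on the converted file.

```python
import re

# Assumption: "non-displayable" means anything outside printable ASCII,
# tab, and newline. Adjust the class if the documents contain other text.
NON_PRINTABLE = r"[^\x20-\x7E\t\n]"


def strip_header(raw, min_run=4):
    """Return the text after the last run of at least `min_run`
    non-printable characters, with stray non-printable characters removed.

    `min_run` is a hypothetical tuning knob, not something wvWare defines.
    """
    # A run of min_run or more non-printable characters plays the role
    # of the 'cutoffhere' marker.
    long_run = re.compile(NON_PRINTABLE + "{%d,}" % min_run)
    # Splitting on every long run and taking the last piece is the same
    # as cutting at the last marker. With no long run, the whole input
    # is returned.
    tail = long_run.split(raw)[-1]
    # Shorter runs (for example a lone '\r') are simply dropped.
    return re.sub(NON_PRINTABLE, "", tail).strip()


if __name__ == "__main__":
    sample = "PPP\x00\x01\x02\x03bjbj\x00\x00\x00\x00\x00S\x01\x02\x03\x04Hello, world.\r\nSecond line."
    print(strip_header(sample))
```

If the converter output is read as bytes, it would need decoding to `str` first (for example with `latin-1`, which never fails) before calling `strip_header`. The threshold trades off two risks: too low and a stray control character inside the body cuts off real text, too high and the header is never detected.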

       
