Conversion almost perfect

Help
kenara
2005-07-07
2013-06-04
  • kenara
    2005-07-07

    Hi,

    I am using wvWare to convert uploaded .doc files into text. My script calls wvWare -x and everything runs without errors.
    I can then get rid of nondisplayable characters with a Python regex matching '\W', but this still leaves some ASCII garbage at the top of each file. The example I'm looking at has lots of 'P', a 'bjbj', a few 'S' ...

    I've searched a bit, but would like to know how to remove all this 'header' stuff.

    Thanks in advance

    Ken  

     
    • kenara
      2005-07-07

      For now (maybe forever), I replace \W with 'cutoffhere' and split on the last 'cutoffhere'...

      Better ideas welcome!
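
The workaround above could be sketched roughly as follows. This is only one reading of it, not the poster's actual script: it assumes "nondisplayable" means anything outside printable ASCII, tab, and newline, because a bare '\W' would also match spaces and punctuation and so cut the text after the last ordinary word. The names `NONPRINTABLE` and `strip_header` are made up for illustration; only the 'cutoffhere' marker comes from the post.

```python
import re

MARKER = "cutoffhere"  # marker string taken from the post

# Assumption: treat everything except printable ASCII, tab, and newline
# as garbage. Each run of such characters becomes a single marker.
NONPRINTABLE = re.compile(r"[^\x20-\x7e\t\n]+")


def strip_header(raw: str) -> str:
    """Keep only the text that follows the last run of garbage characters."""
    marked = NONPRINTABLE.sub(MARKER, raw)
    # rpartition splits on the LAST marker; if there is no marker at all,
    # the whole string ends up in the third element.
    _, _, body = marked.rpartition(MARKER)
    return body.strip()


# Made-up sample imitating the 'P' / 'bjbj' / 'S' debris described above.
sample = "\x00P\x01P\x02bjbj\x03S\x04Hello, world.\nSecond line."
print(strip_header(sample))
```

One caveat with this approach: any stray non-ASCII or control character inside the real text also counts as a cutoff point, so everything before it would be lost. It only works when the garbage is confined to the top of the file.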