Conversion almost perfect

Forum: Help

Creator: kenara

Created: 2005-07-07

Updated: 2013-06-04

kenara - 2005-07-07

Hi,

I am using wvWare to convert uploaded doc files into text. My script calls wvWare -x and everything runs without problems.
I can then strip the nondisplayable characters with a Python regex matching '\W', but this still leaves some ASCII garbage at the top of each file. The example I'm looking at has lots of 'P', a 'bjbj', a few 'S' ...

I've searched a bit, but would like to know how to remove all this 'header' stuff.

Thanks in advance

Ken
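
One note on the regex step described above: in Python's re module, '\W' matches every character that is not a letter, digit, or underscore, so it also matches spaces and punctuation. A narrower character class may be closer to "nondisplayable". The following is only a minimal sketch under that assumption; the function name strip_control_chars is made up for illustration, and it does nothing about the printable 'P' / 'bjbj' debris at the top of the file.

```python
import re

# Control characters other than tab (\x09), newline (\x0a), and
# carriage return (\x0d). This class is an assumption about what
# "nondisplayable" means here; adjust it to the actual input.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove control characters but keep spaces, punctuation, and line breaks."""
    return _CONTROL_CHARS.sub("", text)
```

Unlike a plain '\W' substitution, this leaves the spacing and punctuation of the converted text alone.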

- kenara - 2005-07-07
  
  For now (maybe forever), I replace \W with 'cutoffhere' and split on the last 'cutoffhere'...
  
  Better ideas welcome!
  
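
The sentinel workaround described above could look roughly like the sketch below. It is not the poster's actual script. It assumes that all of the header debris sits before the last run of non-printable characters and that the real text is plain ASCII, so any non-ASCII text would be treated as garbage too. It also swaps '\W' for a narrower "not printable ASCII" class, so that ordinary spaces and punctuation do not become cut points. The names SENTINEL and text_after_last_garbage are made up for illustration.

```python
import re

# Marker string taken from the post; it must not occur in the real text.
SENTINEL = "cutoffhere"

# Runs of anything that is not printable ASCII, tab, CR, or LF.
# Assumption: the document body is plain ASCII.
_NON_PRINTABLE = re.compile(r"[^\x20-\x7e\t\r\n]+")


def text_after_last_garbage(raw: str) -> str:
    """Return the text that follows the last run of non-printable characters."""
    # Replace each run of non-printable characters with the sentinel.
    marked = _NON_PRINTABLE.sub(SENTINEL, raw)
    # Split once from the right and keep the part after the last sentinel.
    # If no sentinel was inserted, the input comes back unchanged.
    return marked.rsplit(SENTINEL, 1)[-1]
```

For example, text_after_last_garbage("PPP\x00bjbj\x01SS\x00Hello, world.") keeps only "Hello, world.". If the body itself contains stray non-printable characters, everything before the last of them is lost, so this is only as good as the assumption above.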
