I'm new to Aperture, so please bear with me if my questions are too simplistic.
I've managed to parse Word documents and extract the data. When I access the plain text representation, it's encoded as UTF, i.e.:
"TITLE \\* Upper \\* MERGEFORMAT \u0014SCOS-2000 ADD\u0015\r\n\r\n\u0013 DOCPROPERTY Subject \\* MERGEFORMAT \u0014File Transfer\u0015\u0007\u0007\u0001\u0007\r\nReference: \u0013 DOCPROPERTY EgosFullReference \\* MERGEFORMAT \u0014EGOS-MCS-S2K-SDD-1212\u0015\r\nVersion: \u0013 DOCPROPERTY EgosIssue \\* MERGEFORMAT \u00141\u0015.\u0013 DOCPROPERTY EgosRevision \\* MERGEFORMAT \u00140\u0015\r\nDate: \u0013 DOCPROPERTY EgosDate \\* MERGEFORMAT \u00142009-01-03\u0015\u0007\u0007\u000CDocument Title:\u0007\u0013 DOCPROPERTY Title \\* MERGEFORMAT \u0014SCOS-2000 ADD\u0015\u0007\u0007Document Reference:\u0007\u0013 DOCPROPERTY EgosFullReference \\* MERGEFORMAT \u0014EGOS-MCS-S2K-SDD-1212\u0015\u0007\u0007Document Version:\u0007\u0013 DOCPROPERTY EgosIssue \\* MERGEFORMAT \u00141\u0015.\u0013 DOCPROPERTY EgosRevision \\* MERGEFORMAT \u00140\u0015\u0007Date:\u0007\u0013 DOCPROPERTY EgosDate \\* MERGEFORMAT \u00142009-01-03\u0015\u0007\u0007Abstract\u0007\u0007\u000..."
I would like to have the 'raw' text, i.e. with all special characters such as '\u0013' and '\r' removed. I guess I need to configure either my Word extractor or the RDFContainer, but I can't find any relevant settings for either (charset doesn't seem to do what I need).
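To make the goal concrete, here is a rough sketch of the kind of post-processing I have in mind, applied to the extracted string after the fact. It is not an Aperture setting: the class WordTextCleaner and its clean method are hypothetical names of my own. It assumes that '\u0013', '\u0014' and '\u0015' are Word's field begin, field separator and field end marks, and that '\u0007' marks table cells, which matches the sample above. Nested fields are not handled.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Aperture. */
final class WordTextCleaner {

    // Field begin mark (U+0013) up to the field separator (U+0014):
    // this span is the field instruction, e.g. " DOCPROPERTY Subject ... ".
    private static final Pattern FIELD_INSTRUCTION =
            Pattern.compile("\\u0013[^\\u0013\\u0014\\u0015]*\\u0014");

    // Field that has no separator and therefore no visible result.
    private static final Pattern FIELD_WITHOUT_RESULT =
            Pattern.compile("\\u0013[^\\u0013\\u0014\\u0015]*\\u0015");

    // Any remaining ASCII control character (cell marks, CR, LF, form feed).
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String clean(String text) {
        // Drop the field instructions but keep the field results.
        String s = FIELD_INSTRUCTION.matcher(text).replaceAll("");
        s = FIELD_WITHOUT_RESULT.matcher(s).replaceAll("");
        // Drop field end marks without inserting a space, so "1" + "." + "0" stays "1.0".
        s = s.replace("\u0015", "");
        // Turn every other control character into a space, then collapse runs.
        s = CONTROL.matcher(s).replaceAll(" ");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }
}
```

Calling WordTextCleaner.clean on the extracted text would flatten everything onto one line; if line breaks should survive, the CONTROL pattern would need to exclude '\n'.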