-
Yes, that fixed it!
2009-11-05 21:48:47 UTC in Aperture
-
cfmfluit committed revision 2103 to the Aperture SVN repository, changing 3 files.
2009-10-20 10:17:01 UTC in Aperture
-
cfmfluit committed revision 2101 to the Aperture SVN repository, changing 1 file.
2009-10-13 09:26:05 UTC in Aperture
-
cfmfluit committed revision 2100 to the Aperture SVN repository, changing 1 file.
2009-10-13 08:50:41 UTC in Aperture
-
cfmfluit committed revision 2067 to the Aperture SVN repository, changing 1 file.
2009-09-09 10:36:42 UTC in Aperture
-
See this page for info on a notorious case concerning the metadata found in a Word document: http://www.computerbytesman.com/privacy/blair.htm.
This document talks about the revision log inside a Word document. I checked with the Aperture trunk: our metadata only contains the last contributor from this revision log. Is it possible to extract the other contributors as well, through POI or...
2009-09-03 15:56:22 UTC in Aperture
-
DataObjectFactory is responsible for translating a MimeMessage data structure into a tree of DataObjects. It is used in a number of places. Right now, the use of this class is hardcoded in all those places. However, the desired mapping of a MimeMessage to a DataObject tree may be application-dependent. A possible improvement would be to turn DataObjectFactory into an API with an associated registry,
2009-09-03 15:35:31 UTC in Aperture
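To illustrate the "API with an associated registry" idea from the entry above, here is a minimal sketch. Every name in it (MessageMapper, MessageMapperRegistry, the string keys) is hypothetical and does not exist in Aperture. The types are generic placeholders standing in for MimeMessage and DataObject, so the sketch compiles without JavaMail or Aperture on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical API: maps a message of type M to a result of type D.
// In Aperture terms, M would be MimeMessage and D a DataObject tree (assumption).
interface MessageMapper<M, D> {
    D map(M message);
}

// Hypothetical registry: callers look up a mapper by key instead of
// hardcoding one concrete factory class.
final class MessageMapperRegistry<M, D> {
    private final Map<String, MessageMapper<M, D>> mappers = new ConcurrentHashMap<>();
    private final MessageMapper<M, D> defaultMapper;

    MessageMapperRegistry(MessageMapper<M, D> defaultMapper) {
        this.defaultMapper = defaultMapper;
    }

    // Registers (or replaces) the mapper used for the given key.
    void register(String key, MessageMapper<M, D> mapper) {
        mappers.put(key, mapper);
    }

    // Returns the mapper registered for the key, or the default if none is registered.
    MessageMapper<M, D> get(String key) {
        return mappers.getOrDefault(key, defaultMapper);
    }
}
```

With something along these lines, an application could register its own mapping at startup while existing callers keep getting the default behaviour.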
-
The current OpenXmlExtractor uses quite low-level heuristics to extract the full text from the MS Office 2007 family of document formats. The implementation stems from when Office 2007 was still in beta and there were no OSS Java libraries available to do the text extraction. Apache POI has recently made significant advances with OpenXML processing. Its accuracy may by now be a lot better than...
2009-09-03 15:26:47 UTC in Aperture
-
The current PlainTextExtractor has difficulty processing plain text files that do not contain a Unicode Byte Order Mark (BOM). If the BOM is missing, it defaults to the platform charset when transforming bytes into characters. As a result, e.g. Asian text files turn into complete garbage when read on an English platform.
I have attached a new implementation that tries to guess the...
2009-09-03 15:18:28 UTC in Aperture
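The attached implementation is not shown in this entry, so the following is only a sketch of the general idea, not the author's code. It checks for the UTF-8 and UTF-16 BOMs, then tries a strict UTF-8 decode, and only then falls back to a caller-supplied charset instead of silently using the platform default. The class and method names (CharsetSketch, detect) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Aperture.
final class CharsetSketch {

    // Guesses a charset for the given bytes; returns 'fallback' if nothing matches.
    static Charset detect(byte[] data, Charset fallback) {
        // 1. BOM checks. (UTF-32 BOMs are deliberately ignored in this sketch.)
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. No BOM: accept UTF-8 only if the whole input decodes without errors.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8: let the caller decide, rather than the platform.
            return fallback;
        }
    }

    // True if 'data' begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A strict UTF-8 check does not help with legacy multi-byte encodings such as Shift_JIS or GB2312. Telling those apart requires statistical detection, which this sketch does not attempt.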
-
The current RtfExtractor uses javax.swing.text.rtf.RTFEditorKit to create a data structure out of an RTF stream and extract the text from it. This RTF parser seems to be very buggy: in practice we get lots of exceptions, e.g.
java.lang.NullPointerException
at java.util.Hashtable.put(Unknown Source)
at javax.swing.text.rtf.RTFReader$AttributeTrackingDestination.handleKeyword(Unknown Source)
2009-09-03 15:02:13 UTC in Aperture