From: <ant...@gm...> - 2007-12-11 23:21:38
Keith R. Bennett wrote:
> Hello, all.
>
> Currently, out of the box, Aperture seems to parse Excel files as a
> single stream of text; that is, I get a single plainTextContent RDF
> triplet for the text in all worksheets in the file. That makes sense.
> However, I would like to have finer control -- for example, I would
> like to be able to get each sheet's data separately. Is there any way
> to do that with Aperture? If not, does Poi support that and will I
> need to write my own code and not use Aperture for this?
>
> Thanks,
> Keith

Aperture doesn't support this kind of fine-grained extraction at the moment, though I guess it might be possible. POI definitely does support working with an Excel sheet at the cell level.

We discussed similar scenarios during the work on NIE, but we dropped the idea because devising a generic ontology that allows documents to be divided into parts is tricky. The simple solution would be to hack something for Excel, something else for Word, something else again for PowerPoint, etc., but then you need relations like 'before', 'after', 'above', 'to the left of' and so on. When you start thinking about it, you quickly arrive at a generic ontology of space (and time; PowerPoint presentations happen in time...).

Such an extractor would also have to be configurable. Most people need plain text as it is now; others may need each cell. That doesn't mean something like this won't appear in the future :)

At the moment, you'd need to write your own code with POI to do this.

Antoni Mylka
ant...@gm...
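For the "write your own code with POI" route, here is a minimal sketch of per-sheet text extraction. It assumes a recent Apache POI release (4.x or later, using the `org.apache.poi.ss.usermodel` API, which is newer than what was available in 2007) on the classpath; the poi-ooxml module is additionally needed to read .xlsx files. The class name `PerSheetTextExtractor` and the tab/newline layout are hypothetical illustrative choices, not part of Aperture or POI. It returns one text string per worksheet, keyed by sheet name, which you could then map to one plainTextContent value per sheet yourself.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

/** Hypothetical helper (not an Aperture or POI class): extracts text separately for each worksheet. */
public class PerSheetTextExtractor {

    /**
     * Returns a map from sheet name to that sheet's text, in workbook order.
     * Cells are joined with tabs and rows end with a newline (an arbitrary layout choice).
     */
    public static Map<String, String> extractSheets(InputStream in) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        // DataFormatter renders cell values roughly as Excel displays them.
        // Without a FormulaEvaluator, formula cells come out as their formula text.
        DataFormatter formatter = new DataFormatter();
        // WorkbookFactory detects .xls (HSSF) vs .xlsx (XSSF, needs poi-ooxml).
        try (Workbook workbook = WorkbookFactory.create(in)) {
            for (int i = 0; i < workbook.getNumberOfSheets(); i++) {
                Sheet sheet = workbook.getSheetAt(i);
                StringBuilder text = new StringBuilder();
                // Iterating a Sheet/Row visits only rows and cells that exist,
                // so blank cells are skipped rather than padded with tabs.
                for (Row row : sheet) {
                    StringBuilder line = new StringBuilder();
                    for (Cell cell : row) {
                        if (line.length() > 0) {
                            line.append('\t');
                        }
                        line.append(formatter.formatCellValue(cell));
                    }
                    text.append(line).append('\n');
                }
                result.put(sheet.getSheetName(), text.toString());
            }
        }
        return result;
    }

    /** Usage: java PerSheetTextExtractor some-file.xls */
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, String> entry : extractSheets(in).entrySet()) {
                System.out.println("== " + entry.getKey() + " ==");
                System.out.print(entry.getValue());
            }
        }
    }
}
```

A LinkedHashMap keeps the sheets in workbook order. How to represent each sheet as a part of the original document in RDF is left open, since that is exactly the ontology question discussed above.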