From: DVD <dv...@ne...> - 2009-03-15 02:36:39
Hello,

I just downloaded the source, browsed it, and ran a few of the tests. I wonder whether a short-circuited primitive could be added to the current implementation for lightweight text extraction. I have a file read entirely into a byte array, and I'd like to extract its text with a simple helper method, as in the pseudocode below:

    aeh = ApertureExtractorHandler.initialize();    (run only once)
    String s = aeh.extractText(byteArray, fileName, encoding);

I looked at the extractors and they all contain RDF-related code, and the examples suggest that getting the text involves quite a bit of RDF-related initialization. Does this package offer a non-RDF alternative?

Thanks very much.
From: Darren G. <da...@on...> - 2009-03-17 13:00:51
Take a look at the fileinspector example app. I pulled the text extraction code out of it into its own project and it seems to work, but the subcrawlers don't seem to work at all on Linux.

_______________________________________________
Aperture-devel mailing list
Ape...@li...
https://lists.sourceforge.net/lists/listinfo/aperture-devel
From: Antoni M. <ant...@gm...> - 2009-03-18 14:20:46
Darren Govoni wrote:
> Take a look at the fileinspector example app. I pulled the text
> extraction code out of that into its own project and it seems to work,
> but the subcrawlers don't seem to work at all on linux.

Don't they? I use Linux all the time for development and testing and they seem OK. Could you please file a bug report, describe your setup, and say exactly what doesn't work?

Antoni Mylka
ant...@gm...
From: Antoni M. <ant...@gm...> - 2009-03-18 14:22:24
DVD wrote:
> I have a file entirely read into a byte array and I'd like to extract
> its text with a static method as below pseudo code
> aeh = ApertureExtractorHandler.initialize(); (run only once)
>
> String s = aeh.extractText(byteArray, fileName, encoding);
>
> I just wonder if this package would offer a non-RDF alternatives.

This would be doable. A module that says:

1. Make sure the appropriate jars are on the classpath.
2. Initialize once.
3. Use many times: hand in the stream, get back a String.
4. Accept the simplifications:
   - no metadata (author, title, etc.)
   - when a single stream holds many objects (e.g. an email with a PDF
     attachment), the full text is concatenated, with no way to tell
     where one ends and the other begins

We're working on the mavenization at the moment. This could be a separate Maven module. It's difficult to say when we'll be able to get to this, so if you come up with some code, feel free to send it. It doesn't have to be state-of-the-art; we can polish the rough edges together. You could also file a feature request, which will help track progress.

Antoni Mylka
ant...@gm...
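The "initialize once, use many times, stream in, String out" shape described above can be sketched roughly as follows. This is only an illustration of the calling pattern: `SimpleTextExtractor`, `Handler`, and the choice to dispatch on the file extension are hypothetical names and assumptions, not part of Aperture, and the single plain-text handler stands in for whatever Aperture-backed extractors a real module would register. It assumes Java 9 or later for `InputStream.readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical facade; none of these names exist in Aperture. */
public class SimpleTextExtractor {

    /** Hypothetical per-format strategy: stream in, plain text out. */
    public interface Handler {
        String extract(InputStream in, Charset charset) throws IOException;
    }

    // Populated once in initialize() and only read afterwards.
    private final Map<String, Handler> handlersByExtension = new HashMap<>();

    private SimpleTextExtractor() {
    }

    /** "Initialize once": a real module would register extractor-backed handlers here. */
    public static SimpleTextExtractor initialize() {
        SimpleTextExtractor extractor = new SimpleTextExtractor();
        // Placeholder handler: decode the bytes as text, no metadata.
        extractor.handlersByExtension.put("txt",
                (in, charset) -> new String(in.readAllBytes(), charset));
        return extractor;
    }

    /** "Use many times": hand in the stream, get back a String. */
    public String extractText(InputStream in, String fileName, Charset charset)
            throws IOException {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0
                ? ""
                : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Handler handler = handlersByExtension.get(extension);
        if (handler == null) {
            throw new IOException("No handler registered for: " + fileName);
        }
        return handler.extract(in, charset);
    }

    /** Convenience overload for a file that is already in a byte array. */
    public String extractText(byte[] data, String fileName, Charset charset)
            throws IOException {
        return extractText(new ByteArrayInputStream(data), fileName, charset);
    }
}
```

A caller would hold on to the object returned by `initialize()` and call `extractText(bytes, "notes.txt", StandardCharsets.UTF_8)` for each file. Dispatching on the extension is just the simplest thing to show here; a real module would more likely identify the MIME type from the content.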
From: Darren G. <da...@on...> - 2009-03-19 16:35:52
Yes, I would love to. Where should I do this?

I downloaded 1.2.0 clean, ran the file inspector, and pointed it at a zip or tar file, and it says "No extractor available for this mime type." My efforts to get the subcrawler code example to work produce the same result. I stepped through the code and it correctly invokes the Zip handler, but an error occurs when the handler returns no entries for the file, and the process aborts. I'm not sure why (an endian issue, maybe?), but I figured it was some kind of anomaly.

I'm using Ubuntu 64-bit desktop.
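One way to narrow down a "no entries" result like the one described above is to check whether the JDK on the same machine can enumerate the archive on its own, outside Aperture. The sketch below uses only the standard `java.util.zip` classes; the class name `ZipEntryLister` is made up for this example, and whether Aperture's zip subcrawler reads archives the same way is an assumption that would need checking against its source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Stand-alone diagnostic: lists the entry names the JDK sees in a zip stream. */
public class ZipEntryLister {

    /** Reads the stream as a zip archive and returns entry names in archive order. */
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Usage: java ZipEntryLister path/to/archive.zip */
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> names = listEntries(in);
            System.out.println(names.size() + " entries");
            for (String name : names) {
                System.out.println(name);
            }
        }
    }
}
```

If this lists the entries as expected on the affected machine, the archive and the JDK are probably fine and the problem is more likely in how the handler is set up or invoked. If it also reports zero entries, the file itself is the first thing to look at.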
From: Antoni M. <ant...@gm...> - 2009-03-21 22:55:21
Darren Govoni wrote:
> Yes. I would love to. Where should I do this?

In our issue tracker:

http://sourceforge.net/tracker/?atid=779500&group_id=150969&func=browse

Also, please try to produce a zip test file that exhibits the problem (no files extracted) and that doesn't contain any sensitive files. I would like to include it in our test suite.

Antoni Mylka
ant...@gm...
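A test archive with no sensitive content can be generated rather than assembled by hand. The following is a minimal sketch using the standard `java.util.zip.ZipOutputStream`; the class name `TestZipBuilder`, the entry names, and the output file name are all invented for this example. An archive built this way may well not reproduce the failure, in which case that is itself useful to mention in the report, alongside the tool that created the archive that does fail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Builds a small zip archive from in-memory text, for attaching to a bug report. */
public class TestZipBuilder {

    /** Returns the bytes of a zip with one UTF-8 text entry per map item. */
    public static byte[] build(Map<String, String> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        // The stream is closed at this point, so the central directory is written.
        return buffer.toByteArray();
    }

    /** Writes a two-entry archive to the current directory. */
    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("readme.txt", "Plain text entry with no sensitive content.\n");
        files.put("docs/notes.txt", "Second entry, inside a directory.\n");
        Files.write(Paths.get("aperture-zip-test.zip"), build(files));
    }
}
```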
From: Darren G. <da...@on...> - 2009-03-22 23:25:09
Will do, thanks. I will do a clean code checkout, build, and test, and include the exact steps, the platform I am using, etc.

You guys make an awesome product and I know you're very busy with the Maven conversion, so thanks for the time.