Hello again! One more question:)
Is it possible to parse a PDF file from an InputStream?
I'm trying to do the following:
PDDocument.createFromLocator(new StreamLocator(inputStream, "", ""));
However, I get an UnsupportedOperationException from the StreamLocator.getRandomAccess() method.
I've looked through the code and it looks like StreamLocator cannot be used to parse documents in any way.
So what is the purpose of this class then?
And how can I extract text from an InputStream, or at least from a FileInputStream?
Thanks for your answer!
It is NOT possible (nor desirable) to parse from an InputStream, as PDF parsing is not a "streamed" (reading byte by byte) operation. For efficient PDF parsing you need random access to the data, so a plain InputStream is not suitable.
StreamLocator exists because this package is low-level code that is used in many other environments where random access is not needed. It gives transparent access to the streams it is defined upon.
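To illustrate why random access matters: a PDF file records the position of its cross-reference table in a "startxref" entry near the end of the file, so a parser typically begins by jumping to the end rather than reading from the start. The following is a minimal sketch using only the JDK, not the library's own parser; the class and method names (TailReader, findStartXref) are hypothetical and the 1024-byte tail size is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of the library).
class TailReader {

    // Returns the byte offset stored after the last "startxref" keyword,
    // or -1 if no such entry is found in the last 1024 bytes of the file.
    static long findStartXref(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            int tailSize = (int) Math.min(1024, length);
            byte[] tail = new byte[tailSize];

            // This seek is exactly what a plain InputStream cannot do.
            raf.seek(length - tailSize);
            raf.readFully(tail);

            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int idx = text.lastIndexOf("startxref");
            if (idx < 0) {
                return -1;
            }
            // Skip the keyword and surrounding whitespace, then read the digits.
            String rest = text.substring(idx + "startxref".length()).trim();
            int end = 0;
            while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
                end++;
            }
            return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
        }
    }
}
```

With a plain InputStream the same lookup would require reading and buffering the entire file first, which is what the adaptations described next amount to.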
To adapt a plain InputStream you could either create a temporary file (as in ClassResourceLocator) or read the whole file and create a ByteArrayLocator. If you need to, you can easily wrap this "caching" behavior in an ILocator type of your own.
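The two adaptations above can be sketched with plain JDK calls. This is only a sketch under assumptions: the class and method names (StreamBuffering, readFully, copyToTempFile) are hypothetical, and the locator constructor calls shown in the comments are assumed signatures that you should check against your version of the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only (not part of the library).
class StreamBuffering {

    // Option 1: read the whole stream into memory.
    // The result could then be handed to a ByteArrayLocator, e.g.
    //   new ByteArrayLocator(bytes, "document", "pdf")   // assumed signature
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Option 2: spool the stream to a temporary file.
    // The file could then be handed to a FileLocator, e.g.
    //   new FileLocator(tmp.toFile())                    // assumed signature
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("stream-", ".pdf");
        tmp.toFile().deleteOnExit();
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The in-memory variant is simpler but holds the whole document in the heap; the temporary-file variant is probably the safer choice for large documents.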
If you have a FileInputStream, you may be better off using a FileLocator on the underlying file.