From: Alex M. H. <amh...@on...> - 2011-01-25 09:20:08
Hi,

I am looking for a way to upload Word and PDF files and place a copy of their plain text in the body of the associated File: page, both to allow search engine indexing and to provide a plain-text preview. (Why would anyone want to look at the plain text instead of the actual file? Bandwidth limitations, working from a public computer without a Word/OpenOffice installation or a proper PDF viewer, etc.)

In the past, I have used a modified version of the old SMWHalo SMWUploadConverter sub-extension (I believe from the 1.4.x era or even earlier; certainly before the RichMedia extension) to upload Word or PDF files and then render them, character for character, in the main section of the File page (leaving room at the top and bottom for some semantic annotations, of course). It worked quite well on Halo and non-Halo SMW installations (1.4.0-1.5.2) on MW 1.15.x on a Linux box.

Unfortunately, it does not seem to work on MW 1.16+ because of this error:

"Fatal error: Cannot access protected property UploadFromFile::$mLocalFile in /path/to/extensions/SMWUploadConverter_copy/SMW_UploadConverter-amh-monolithic.php on line 63"

For my applications, I do not need to process any files other than Word or PDF. The code is pastied at http://pastie.org/private/tsnl8lulxnxw3hjw08ehg

I have tried the FileIndexer extension, which does allow the file content to be indexed, but by default it does not extract the plain text onto the file page. If it is configured to do that, it does not place the text on the page in exactly the same format as it appears in the document: it drops all words of fewer than 3 characters, replaces all newlines with |, drops all other non-word characters, and eliminates page breaks, so skimming the plain-text preview once you arrive on the page is essentially impossible.
The RichMedia extension is too big and complex (and Windows-server dependent) for my purposes (I'm on Linux). It also requires patches to core MW code as well as a full Halo installation, which is not appropriate or feasible for several applications.

Any suggestions about how the code in the pastie above could be modified to work with MW 1.16+? My PHP capabilities (not to mention time) are somewhat limited, but I think the solution would be relatively quick and straightforward for someone who is a bit more conversant in PHP than a simple lawyer (me).

Any input would be very much appreciated.

Thanks,
Alex

--
ontolawgy™ LLC: connect . . . the . . . dots™
http://ontolawgy.com