From: Budd, S. <s....@ic...> - 2002-06-10 09:04:40
StarOffice can save directly to good HTML format or to MS Office 2000/XP format. The choice is made within the "Save As" option. You should be able to index directly from the saved document if you choose the "Web page" format.

-----Original Message-----
From: Gilles Detillieux [mailto:gr...@sc...]
Sent: Saturday, June 08, 2002 4:31 AM
To: cba...@eu...
Cc: htd...@li...
Subject: Re: [htdig-dev] openoffice parser

According to EuropeanServers - Christophe BAEGERT:
> we use htdig on word documents, but now we've switched to OpenOffice.org, and
> we haven't any parser. Does it exist or is it planned ?

I haven't heard of or found leads to an OpenOffice.org to HTML document converter. However, the OpenOffice.org web site states that these documents are XML, so it should be pretty easy for someone familiar with basic HTML, and with Perl, awk or sed scripting, to whip up a rudimentary XML to HTML converter specific to these documents. It could pick out the elements you want and surround them with appropriate HTML tags, so htdig indexes them using the word types you want (i.e. titles, meta keywords & description, hyperlinks and their descriptions, plain text).

Even simpler, you could probably feed the XML straight into htdig's HTML parser and it would at least index most/all of the text as plain text. Not having actually seen any OpenOffice.org documents, though, it's hard for me to speculate on exactly how easy or difficult the task might be.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
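For anyone who wants to try the "rudimentary XML to HTML converter" idea, here is a rough sketch in Python rather than Perl, awk or sed. It is not a tested htdig parser, and it makes several assumptions that should be checked against real documents: that headings and paragraphs are elements whose local names are "h" and "p" (whatever their namespace), that the XML has already been obtained from the saved file as a string, and that the title is supplied by the caller. The names local_name and xml_to_html are made up for this example.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def xml_to_html(xml_text, title="Untitled"):
    """Turn document XML into bare-bones HTML for an indexer to read.

    Assumption: elements with local name 'h' are headings and 'p' are
    paragraphs. Everything else is ignored, and nested paragraphs (if the
    format allows them) would have their text emitted more than once.
    """
    root = ET.fromstring(xml_text)
    body = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name not in ("h", "p"):
            continue
        # Gather all text below the element and collapse runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if not text:
            continue
        tag = "h2" if name == "h" else "p"
        body.append("<%s>%s</%s>" % (tag, html.escape(text), tag))
    return (
        "<html><head><title>%s</title></head>\n<body>\n%s\n</body></html>\n"
        % (html.escape(title), "\n".join(body))
    )
```

Calling xml_to_html(xml_string, title="My document") returns an HTML string in which the title lands in the title element, headings in h2 elements and everything else in p elements, so an HTML indexer can weight them differently. Meta keywords, descriptions and hyperlinks are not handled here and would need the same kind of element-by-element mapping.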