From: David A. <D.J...@so...> - 2002-06-18 14:18:19
Normally, a .txt file is "text/plain" and htdig will not need a converter
script, though you could provide one if you wished. I think a parser that
could derive meta tags and a title from an arbitrary text/plain page would
be too difficult to write, but if you have a set of documents in a common
style then you might be able to do something.

.asp URLs are usually scripts which return "text/html", and their authors
should have provided sensible META tags and titles.

Htdig does need a converter for Acrobat files (.pdf), and the xpdf package
is suitable. Xpdf includes code for converting .pdf files into HTML, but
(IMHO) the pdf2html.pl script included with doc2html will make a better job
of META tags and title. pdf2html.pl uses pdfinfo and pdftotext from the
xpdf package.

--
David Adams
Computing Services
Southampton University

----- Original Message -----
From: "Amélie Frenette" <amelie.frenette@UMontreal.CA>
To: <htd...@li...>
Sent: Tuesday, June 18, 2002 2:31 PM
Subject: [htdig] pdf, txt, asp, etc.

Hi,

I know that a parser can convert pdf, txt and asp files to HTML. For pdf and
txt files, is all the information transferred into the body of the HTML
file? If so, and HT://Dig is set to consider meta tags and title, will that
setting not be useful?

Thanks for your support,

Amélie Frenette
Master's student
Information Science
Specialization: Electronic Information Management
Université de Montréal
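
As an illustration of the PDF point in the reply above, here is a rough Python sketch of the general idea: take the metadata that pdfinfo prints, take the text that pdftotext extracts, and put the former into <title> and META tags rather than leaving everything in the body. This is not the code of pdf2html.pl. The function names, the fallback title "Untitled" and the mapping of pdfinfo fields to META names are all assumptions made for the example.

```python
import html
import subprocess


def parse_pdfinfo(output):
    """Turn 'Key:   value' lines, as printed by pdfinfo, into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def build_html(info, body_text):
    """Wrap extracted text in HTML, with metadata in the head section."""
    # "Untitled" is a hypothetical fallback for PDFs with no Title field.
    title = html.escape(info.get("Title", "Untitled"))
    head = ["<title>%s</title>" % title]
    # Mapping of pdfinfo fields to META names: an illustrative assumption.
    for field, meta in (("Author", "author"),
                        ("Subject", "description"),
                        ("Keywords", "keywords")):
        if field in info:
            head.append('<meta name="%s" content="%s">'
                        % (meta, html.escape(info[field], quote=True)))
    return ("<html><head>\n%s\n</head><body>\n<pre>%s</pre>\n</body></html>\n"
            % ("\n".join(head), html.escape(body_text)))


def pdf_to_html(path):
    """Run the xpdf tools on path; needs pdfinfo and pdftotext on PATH."""
    info = subprocess.run(["pdfinfo", path], capture_output=True,
                          text=True, check=True).stdout
    # "-" as the output file makes pdftotext write to standard output.
    text = subprocess.run(["pdftotext", path, "-"], capture_output=True,
                          text=True, check=True).stdout
    return build_html(parse_pdfinfo(info), text)
```

If the PDF carries no Title, Author or Keywords fields, a converter along these lines has nothing to put in the head, and everything ends up in the body, which is the situation the original question asks about.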