From: David A. <D.J...@so...> - 2004-05-18 13:43:36
I don't know the answer to this, but someone in the htdig mailing list probably does.

David Adams

----- Original Message -----
From: Mohamed Ahmed Khonji
To: d.j...@so...
Sent: Tuesday, May 18, 2004 12:17 PM
Subject: PDF indexing problem.

Dear Mr. Adams,

I am a newbie in Linux world, but I managed to familiarize my self as if working on Windows as much I can.

The problem is after indexing the website, and while searching I get the PDF header as output like:

%PDF-1.3 %â?I? 134 0 obj << /Linearized 1 /O 136 /H [ 1629 900 ] /L 1773217 /E 744863 /N 8 /T 1770418 >> endobj xref 134 60 0000000016 00000 n 0000001551 00000 n 0000002529 00000 n 0000002703 00000 n 0000003149 00000 n 0000003205 00000 n 0000003551 00000 n 0000004093 00000 n 0000004145 00000 n 0000004197

And I got these errors after running rundig (13 lines):

!! Malformed UTF-8 character (overflow at 0x860e94e4, byte 0x9d, after start byte 0xbf) in substitution (s///) at /opt/htdig/scripts/doc2html.pl line 503, <FILE> line 4.

I am using:
1) HT://DIG 3.2.0-16 with the web 'plug-in'
2) RH 8.0
3) Doc2html which calls pdf2html

I searched the Internet for a solution, but I couldn't find any! Could you please assist me to solve this problem?

Thanking you,
Regards,
Mohammed Ahmed.