From: Eric A. <and...@ce...> - 2004-02-06 04:59:43
Neal Chant wrote:
> Hi List,
>
> Just testing sprawler for one of our requirements.
> Setup is very simple - 1 x .htm file to start with, have included output for
> info.
> Seems to have a problem going through the htm file "body Unknown char % in
> body: 0"
> The .htm file is simple "line 1, line 2" etc.
>
> Any pointers to solving this?

Thanks for the debug info - does it do this with more than one file?
Also, can you send me the .htm file you are using as a sample?

> Thanks in advance
>
> Neal Chant
> Systems Administration
> Mercury International
>
>
>
>
> index path: /data2/IT/CONTRACTS/
> document paths: /data2/IT/CONTRACTS/
> url locations:
> reindex interval (mins): 1440
> indexable extensions: html htm
> known languages: czech danish dutch english french german hungarian italian
> norwegian polish portugese spanish turkish
> Building index list..
> /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype
> local -type f
> /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type
> f
> Successfully added 1 documents to queue
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt
> Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt
> Begin indexing documents
> One # = 0 documents
> 0% 50% 100%
> [Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682
> Title: test htm
> Filesize: 264
> Has 8 words - 8 total document words
> checking words in document and removing stopwords
> Unknown char % in body: 0
> Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3
> Attempt to free unreferenced scalar at ./indexer.pl line 253
> Segmentation fault

--
------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Today is the tomorrow you worried about yesterday.
------------------------------------------------------------------