From: Neal C. <nc...@me...> - 2004-02-03 12:17:04
Hi List,

Just testing sprawler for one of our requirements. The setup is very simple: one .htm file to start with. I have included the output for info.

It seems to have a problem going through the .htm file:

"body Unknown char % in body: 0"

The .htm file is simple: "line 1, line 2", etc.

Any pointers to solving this?

Thanks in advance,

Neal Chant
Systems Administration
Mercury International

index path: /data2/IT/CONTRACTS/
document paths: /data2/IT/CONTRACTS/
url locations:
reindex interval (mins): 1440
indexable extensions: html htm
known languages: czech danish dutch english french german hungarian italian norwegian polish portugese spanish turkish
Building index list..
/usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype local -type f
/usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type f
Successfully added 1 documents to queue
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt
Begin indexing documents
One # = 0 documents
0% 50% 100%
[Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682
Title: test htm
Filesize: 264
Has 8 words - 8 total document words
checking words in document and removing stopwords
Unknown char % in body: 0
Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3
Attempt to free unreferenced scalar at ./indexer.pl line 253
Segmentation fault