|
From: <ben...@id...> - 2004-05-22 13:02:41
|
Dear Open Source developer, I am doing a research project on "Fun and Software Development" in which I kindly invite you to participate. You will find the online survey at http://fasd.ethz.ch/qsf/. The questionnaire consists of 53 questions and you will need about 15 minutes to complete it. With the FASD project (Fun and Software Development) we want to define the motivational significance of fun when software developers decide to engage in Open Source projects. What is special about our research project is that a similar survey is planned with software developers in commercial firms. This procedure allows the immediate comparison between the involved individuals and the conditions of production of these two development models. Thus we hope to obtain substantial new insights into the phenomenon of Open Source Development. With many thanks for your participation, Benno Luthiger PS: The results of the survey will be published at http://www.isu.unizh.ch/fuehrung/blprojects/FASD/. We have set up the mailing list fa...@we... for this study. Please see http://fasd.ethz.ch/qsf/mailinglist_en.html for registration to this mailing list. _______________________________________________________________________ Benno Luthiger Swiss Federal Institute of Technology Zurich 8092 Zurich Mail: benno.luthiger(at)id.ethz.ch _______________________________________________________________________ |
|
From: Neal C. <nc...@me...> - 2004-02-10 15:01:07
|
No change.. Shall I keep these one liners off the list? Can always publish later... Neal -----Original Message----- From: Eric Anderson [mailto:and...@ce...] Sent: 10 February 2004 2:09 PM To: Neal Chant Cc: spr...@li... Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Googling suggests that this version of Perl is broken! > The work arounds haven't worked - set LANG as eng_GB as opposed to > eng_GB.UTF-8. > > Will be happy to hold off upgrading Perl so you have a broken (!#?) test > bed - will that help? > Let me know.... Hmm.. Try: export LANG=C (for the bash shell. ymmv) Does that help it? Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Eric A. <and...@ce...> - 2004-02-10 14:09:58
|
Neal Chant wrote: > Googling suggests that this version of Perl is broken! > The work arounds haven't worked - set LANG as eng_GB as opposed to > eng_GB.UTF-8. > > Will be happy to hold off upgrading Perl so you have a broken (!#?) test > bed - will that help? > Let me know.... Hmm.. Try: export LANG=C (for the bash shell. ymmv) Does that help it? Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
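Eric's suggestion can be tried for a single run without changing the login environment. The sketch below is a minimal example, assuming a POSIX shell with `env` available. `run_in_c_locale` is a hypothetical helper name, not part of sprawler, and setting `LC_ALL` as well is an addition to what Eric wrote (if `LC_ALL` is already set it overrides `LANG`).

```shell
#!/bin/sh
# Hypothetical helper (not part of sprawler): run one command with the
# plain "C" locale, leaving the calling shell's LANG/LC_ALL untouched.
run_in_c_locale() {
    env LANG=C LC_ALL=C "$@"
}

# Example (not executed here): run_in_c_locale ./indexer.pl
# Show the locale settings the child process actually sees:
run_in_c_locale sh -c 'echo "LANG=$LANG LC_ALL=$LC_ALL"'
```

If the indexer behaves differently under this wrapper, that points at the locale rather than at the script itself.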
|
From: Neal C. <nc...@me...> - 2004-02-10 14:04:14
|
Googling suggests that this version of Perl is broken! The work arounds haven't worked - set LANG as eng_GB as opposed to eng_GB.UTF-8. Will be happy to hold off upgrading Perl so you have a broken (!#?) test bed - will that help? Let me know.... Neal -----Original Message----- From: spr...@li... [mailto:spr...@li...]On Behalf Of Eric Anderson Sent: 10 February 2004 1:36 PM To: Neal Chant Cc: spr...@li...; MSupport Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Found it! > > It runs fine on RedHat8 with Perl 5.8.0-55 > But not on RH9 with Perl 5.8.0-88 or 88.3 > > Humm, RH do get bits wrong, I'll have a Google and see what's up. > Will also compare the output of perl -V. > > In answer to previous, straight forward ./ to execute. > Pick language - Good Perl OK, Bad Perl same error. Yea! Finally! Thanks for spotting it. I was starting to think maybe I had made changes somewhere that weren't in the tarball. Now that you seem to know the error, hopefully we can find a workaround or a patch to make sprawler work on all versions of perl. I'd be interested to hear your feedback on it also. Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ ------------------------------------------------------- The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn _______________________________________________ Sprawler-general mailing list Spr...@li... https://lists.sourceforge.net/lists/listinfo/sprawler-general |
|
From: Eric A. <and...@ce...> - 2004-02-10 13:36:11
|
Neal Chant wrote: > Found it! > > It runs fine on RedHat8 with Perl 5.8.0-55 > But not on RH9 with Perl 5.8.0-88 or 88.3 > > Humm, RH do get bits wrong, I'll have a Google and see what's up. > Will also compare the output of perl -V. > > In answer to previous, straight forward ./ to execute. > Pick language - Good Perl OK, Bad Perl same error. Yea! Finally! Thanks for spotting it. I was starting to think maybe I had made changes somewhere that weren't in the tarball. Now that you seem to know the error, hopefully we can find a workaround or a patch to make sprawler work on all versions of perl. I'd be interested to hear your feedback on it also. Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Neal C. <nc...@me...> - 2004-02-10 10:06:57
|
Found it! It runs fine on RedHat8 with Perl 5.8.0-55 But not on RH9 with Perl 5.8.0-88 or 88.3 Humm, RH do get bits wrong, I'll have a Google and see what's up. Will also compare the output of perl -V. In answer to previous, straight forward ./ to execute. Pick language - Good Perl OK, Bad Perl same error. Neal -----Original Message----- From: spr...@li... [mailto:spr...@li...]On Behalf Of Eric Anderson Sent: 09 February 2004 5:04 PM To: Neal Chant Cc: spr...@li...; MSupport Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Here you go, output is $debug_verbose = "1"; > html tarred and attached. Well, the difference is the newlines (dos newlines versus unix newlines), but that doesn't hurt anything. My indexer still works fine. I have a few questions - What version of perl are you using? Are you using any command line switches, or environment variables to affect how perl is being executed? What OS is it running on? Can you comment out the "pick_language" routine, line 242, and tell me what happens? Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
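Neal mentions comparing the output of `perl -V` on the working and failing machines. One way to do that is to save each dump to a file and look only at the lines that differ. This is a sketch under assumptions: `compare_perl_configs` and the file names in the comments are hypothetical, and nothing here is taken from the thread beyond the `perl -V` command itself.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that differ between two saved
# `perl -V` dumps. diff exits non-zero when the files differ and grep exits
# non-zero when nothing matches, so the status is masked with `|| true`.
compare_perl_configs() {
    diff "$1" "$2" | grep '^[<>]' || true
}

# Collect the dumps first, one per host (file names are placeholders):
#   perl -V > perl-V.rh8.txt    # on the working Red Hat 8 machine
#   perl -V > perl-V.rh9.txt    # on the failing Red Hat 9 machine
# Then compare them:
#   compare_perl_configs perl-V.rh8.txt perl-V.rh9.txt
```

Lines starting with `<` come from the first file and lines starting with `>` from the second.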
|
From: Eric A. <and...@ce...> - 2004-02-09 17:04:16
|
Neal Chant wrote: > Here you go, output is $debug_verbose = "1"; > html tarred and attached. Well, the difference is the newlines (dos newlines versus unix newlines), but that doesn't hurt anything. My indexer still works fine. I have a few questions - What version of perl are you using? Are you using any command line switches, or environment variables to affect how perl is being executed? What OS is it running on? Can you comment out the "pick_language" routine, line 242, and tell me what happens? Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
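Eric attributes the difference between the two copies of the file to DOS versus Unix newlines. Earlier in the thread the sizes were reported as 264 and 251 bytes, and a 13-byte gap would be consistent with 13 line endings each carrying an extra carriage return, though that is an inference and not something the thread confirms. A minimal sketch for checking it, with `count_cr` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: count carriage-return bytes in a file by deleting
# every byte that is not a CR and counting what is left.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Demo on a throwaway file with two DOS-style line endings.
demo=$(mktemp)
printf 'line one\r\nline two\r\n' > "$demo"
echo "bytes: $(wc -c < "$demo" | tr -d ' ')  CRs: $(count_cr "$demo")"
rm -f "$demo"
```

Running `count_cr` on both copies of `Untitled-2.htm` and comparing the result with the size difference would show whether the newlines account for all of it.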
|
From: Neal C. <nc...@me...> - 2004-02-09 16:07:12
|
Here you go, output is $debug_verbose = "1"; html tarred and attached. Cheers Neal -- begin output -- index path: /data2/IT/CONTRACTS/indexes/ document paths: /data2/IT/CONTRACTS/ url locations: reindex interval (mins): 1440 indexable extensions: html htm known languages: czech danish dutch english french german hungarian italian norwegian polish portugese spanish turkish Building index list.. /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype local -type f /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type f Adding /data2/IT/CONTRACTS/Untitled-2.htm to queue Adding /data2/IT/CONTRACTS/test.htm to queue Successfully added 2 documents to queue Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.czech.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.danish.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.dutch.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.english.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.french.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.german.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.hungarian.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.italian.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.norwegian.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.polish.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.portugese.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.spanish.txt Loading stopwords list from /data2/IT/CONTRACTS/indexes/stopwords.turkish.txt Begin indexing documents One # = 0 documents 0% 50% 100% [Indexing (1/2) /data2/IT/CONTRACTS/Untitled-2.htm at 1076340721 Title: test htm Filesize: 264 test htm line one line two line three etc Has 8 words - 8 total document words checking words in document and removing stopwords --> test : 
test --> htm : htm --> line : line --> one : one --> line : line --> two : two --> line : line --> three : three --> etc : etc Unknown char % in body: 0 Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 Attempt to free unreferenced scalar at ./indexer.pl line 253 Segmentation fault -- end -- -----Original Message----- From: spr...@li... [mailto:spr...@li...]On Behalf Of Eric Anderson Sent: 09 February 2004 3:21 PM To: Neal Chant Cc: spr...@li...; MSupport Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Just run through your comments, thanks.. > I originally commented out the version number and have separated the indexes > out. > The error is still occurring :( > >>>Attempt to free unreferenced scalar at ./indexer.pl line 253 >>>Segmentation fault > > > ls -la gives the file as 264 - d'ya reckon this is the problem or a > red-herring? > > can I get further debug info for you? Very strange. Can you tar/gzip the file and resend? Also, in the indexer.pl program, set: $debug_verbose = "1"; Then send me the output.. Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Eric A. <and...@ce...> - 2004-02-09 15:21:27
|
Neal Chant wrote: > Just run through your comments, thanks.. > I originally commented out the version number and have separated the indexes > out. > The error is still occurring :( > >>>Attempt to free unreferenced scalar at ./indexer.pl line 253 >>>Segmentation fault > > > ls -la gives the file as 264 - d'ya reckon this is the problem or a > red-herring? > > can I get further debug info for you? Very strange. Can you tar/gzip the file and resend? Also, in the indexer.pl program, set: $debug_verbose = "1"; Then send me the output.. Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Neal C. <nc...@me...> - 2004-02-06 19:00:28
|
Just run through your comments, thanks.. I originally commented out the version number and have separated the indexes out. The error is still occurring :( >>Attempt to free unreferenced scalar at ./indexer.pl line 253 >>Segmentation fault ls -la gives the file as 264 - d'ya reckon this is the problem or a red-herring? Can I get further debug info for you? cheers Neal -----Original Message----- From: spr...@li... [mailto:spr...@li...]On Behalf Of Eric Anderson Sent: 06 February 2004 2:06 PM To: Neal Chant Cc: spr...@li...; MSupport Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Hi Eric, > > Thanks for the reply. > Yes, it gives the same error for more than one file, originally tried with > 2400 files. > My test .htm is attached - very simple created in dreamweaver. Ok - I've run it through my test indexer, and it works ok, so it must be something simple :) One thing I had to do is edit the $version line in the indexer, since it looks like cvs mangled it. Here's what it looks like now (line 42 in indexer.pl): $version = qq~$Revision: 1.2\$ $Name\$ ~; Most likely I'll have to remove that line, but you can replace it for now, until rev 1.1 of sprawler-lite comes out. >>Seems to have a problem going through the htm file "body Unknown char % in >>body: 0" I forgot to mention that this isn't an error - it's status info. It means "I found x percentage of unknown (non roman) characters in the body" - it helps with language detection. The zero means it found all roman/english chars (0% of chars were unknown). >> >>index path: /data2/IT/CONTRACTS/ >>document paths: /data2/IT/CONTRACTS/ [..snip..] Ok, it's a good idea to separate the index store area from the indexable data area, so I would do something like: index path: /data2/IT/CONTRACTS/INDEXES/ document paths: /data2/IT/CONTRACTS/ and just make sure you have made the directory for the INDEXES and that it is of course writable by you. 
>>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt >>Begin indexing documents >>One # = 0 documents >>0% 50% 100% >>[Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682 >>Title: test htm >>Filesize: 264 Here's something strange - my indexer reports the Filesize as 251. Where'd those other 13 bytes go? If I do an ls -al on the file: -rw-r--r-- 1 anderson wheel 251 Feb 6 07:48 Untitled-2.htm I see that it truly is 251 bytes. What does yours say? >>Has 8 words - 8 total document words >>checking words in document and removing stopwords >>Unknown char % in body: 0 >>Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 >>Attempt to free unreferenced scalar at ./indexer.pl line 253 >>Segmentation fault Here's what it should look like: index path: /indexes/ document paths: /tmp/sprawler/ url locations: doc1/ reindex interval (mins): 1440 indexable extensions: html htm known languages: czech danish dutch english french german hungarian italian norwegian polish portugese spanish turkish Building index list.. 
/usr/bin/find /tmp/sprawler/ -iname '*.html' -print -fstype local -type f /usr/bin/find /tmp/sprawler/ -iname '*.htm' -print -fstype local -type f Successfully added 1 documents to queue Loading stopwords list from /indexes/stopwords.czech.txt Loading stopwords list from /indexes/stopwords.danish.txt Loading stopwords list from /indexes/stopwords.dutch.txt Loading stopwords list from /indexes/stopwords.english.txt Loading stopwords list from /indexes/stopwords.french.txt Loading stopwords list from /indexes/stopwords.german.txt Loading stopwords list from /indexes/stopwords.hungarian.txt Loading stopwords list from /indexes/stopwords.italian.txt Loading stopwords list from /indexes/stopwords.norwegian.txt Loading stopwords list from /indexes/stopwords.polish.txt Loading stopwords list from /indexes/stopwords.portugese.txt Loading stopwords list from /indexes/stopwords.spanish.txt Loading stopwords list from /indexes/stopwords.turkish.txt Begin indexing documents One # = 0 documents 0% 50% 100% [Indexing (1/1) /tmp/sprawler/Untitled-2.htm at 1076076065 Title: test htm Filesize: 251 Has 8 words - 8 total document words checking words in document and removing stopwords Unknown char % in body: 0 Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 #(9 total words to be saved) Flushing word data to disk... Done with /indexes/doc.db-new file Making directory: /indexes/words-new/ .. DONE! -------------------------------------------- ] DONE! Total documents indexed: 1 Total bytes indexed: 251 Total words indexed: 8 Total words in index: -1 Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. 
------------------------------------------------------------------ |
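The quoted explanation says the "Unknown char % in body" line is status information: the percentage of characters in the body that are not roman. How indexer.pl computes that figure is not shown in the thread, so the sketch below is only a stand-in that assumes "unknown" means a byte outside 7-bit ASCII. `non_ascii_percent` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stand-in for the indexer's check (its real logic is not
# shown in the thread): percentage of bytes in a file that fall outside
# 7-bit ASCII, using integer division (rounds down).
non_ascii_percent() {
    total=$(wc -c < "$1" | tr -d ' ')
    # Avoid dividing by zero on an empty file.
    [ "$total" -gt 0 ] || { echo 0; return; }
    # Delete every ASCII byte and count what remains.
    other=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    echo $(( other * 100 / total ))
}
```

Under this assumption, a plain English test page like the one in the thread yields 0, which matches the reading that the zero in the log is a normal result and not the cause of the crash.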
|
From: Eric A. <and...@ce...> - 2004-02-06 14:06:37
|
Neal Chant wrote: > Hi Eric, > > Thanks for the reply. > Yes, it gives the same error for more than one file, originally tried with > 2400 files. > My test .htm is attached - very simple created in dreamweaver. Ok - I've run it through my test indexer, and it works ok, so it must be something simple :) One thing I had to do is edit the $version line in the indexer, since it looks like cvs mangled it. Here's what it looks like now (line 42 in indexer.pl): $version = qq~$Revision: 1.2\$ $Name\$ ~; Most likely I'll have to remove that line, but you can replace it for now, until rev 1.1 of sprawler-lite comes out. >>Seems to have a problem going through the htm file "body Unknown char % in >>body: 0" I forgot to mention that this isn't an error - it's status info. It means "I found x percentage of unknown (non roman) characters in the body" - it helps with language detection. The zero means it found all roman/english chars (0% of chars were unknown). >> >>index path: /data2/IT/CONTRACTS/ >>document paths: /data2/IT/CONTRACTS/ [..snip..] Ok, it's a good idea to separate the index store area from the indexable data area, so I would do something like: index path: /data2/IT/CONTRACTS/INDEXES/ document paths: /data2/IT/CONTRACTS/ and just make sure you have made the directory for the INDEXES and that it is of course writable by you. 
>>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt >>Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt >>Begin indexing documents >>One # = 0 documents >>0% 50% 100% >>[Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682 >>Title: test htm >>Filesize: 264 Here's something strange - my indexer reports the Filesize as 251. Where'd those other 13 bytes go? If I do an ls -al on the file: -rw-r--r-- 1 anderson wheel 251 Feb 6 07:48 Untitled-2.htm I see that it truly is 251 bytes. What does yours say? >>Has 8 words - 8 total document words >>checking words in document and removing stopwords >>Unknown char % in body: 0 >>Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 >>Attempt to free unreferenced scalar at ./indexer.pl line 253 >>Segmentation fault Here's what it should look like: index path: /indexes/ document paths: /tmp/sprawler/ url locations: doc1/ reindex interval (mins): 1440 indexable extensions: html htm known languages: czech danish dutch english french german hungarian italian norwegian polish portugese spanish turkish Building index list.. 
/usr/bin/find /tmp/sprawler/ -iname '*.html' -print -fstype local -type f /usr/bin/find /tmp/sprawler/ -iname '*.htm' -print -fstype local -type f Successfully added 1 documents to queue Loading stopwords list from /indexes/stopwords.czech.txt Loading stopwords list from /indexes/stopwords.danish.txt Loading stopwords list from /indexes/stopwords.dutch.txt Loading stopwords list from /indexes/stopwords.english.txt Loading stopwords list from /indexes/stopwords.french.txt Loading stopwords list from /indexes/stopwords.german.txt Loading stopwords list from /indexes/stopwords.hungarian.txt Loading stopwords list from /indexes/stopwords.italian.txt Loading stopwords list from /indexes/stopwords.norwegian.txt Loading stopwords list from /indexes/stopwords.polish.txt Loading stopwords list from /indexes/stopwords.portugese.txt Loading stopwords list from /indexes/stopwords.spanish.txt Loading stopwords list from /indexes/stopwords.turkish.txt Begin indexing documents One # = 0 documents 0% 50% 100% [Indexing (1/1) /tmp/sprawler/Untitled-2.htm at 1076076065 Title: test htm Filesize: 251 Has 8 words - 8 total document words checking words in document and removing stopwords Unknown char % in body: 0 Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 #(9 total words to be saved) Flushing word data to disk... Done with /indexes/doc.db-new file Making directory: /indexes/words-new/ .. DONE! -------------------------------------------- ] DONE! Total documents indexed: 1 Total bytes indexed: 251 Total words indexed: 8 Total words in index: -1 Eric -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
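Eric's advice is to keep the index store in its own directory and to make sure that directory exists and is writable. A minimal sketch of that check, assuming the index store is a subdirectory such as the `/data2/IT/CONTRACTS/indexes/` path that appears in a later run in this thread. `prepare_index_dir` is a hypothetical helper, not part of sprawler.

```shell
#!/bin/sh
# Hypothetical helper: create the index directory if needed and confirm
# the current user can write to it. Returns non-zero on failure.
prepare_index_dir() {
    mkdir -p "$1" && [ -d "$1" ] && [ -w "$1" ]
}

# Example with a path modelled on the thread (not executed here):
#   prepare_index_dir /data2/IT/CONTRACTS/indexes \
#       || echo "cannot write to index dir" >&2
```

Running the check before the indexer makes a permissions problem show up as a clear message instead of a failure partway through the run.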
|
From: Neal C. <nc...@me...> - 2004-02-06 13:09:01
|
Hi Eric, Thanks for the reply. Yes, it gives the same error for more than one file, originally tried with 2400 files. My test .htm is attached - very simple created in dreamweaver. Neal -----Original Message----- From: Eric Anderson [mailto:and...@ce...] Sent: 06 February 2004 4:59 AM To: Neal Chant Cc: spr...@li...; MSupport Subject: Re: [Sprawler-general] Error - "Unknown char % in body" Neal Chant wrote: > Hi List, > > Just testing sprawler for one of our requirements. > Setup is very simple - 1 x .htm file to start with, have included output for > info. > Seems to have a problem going through the htm file "body Unknown char % in > body: 0" > The .htm file is simple "line 1, line 2" etc. > > Any pointers to solving this? Thanks for the debug info - does it do this with more than one file? Also, can you send me the .htm file you are using as a sample? > Thanks in advance > > Neal Chant > Systems Administration > Mercury International > > > > > index path: /data2/IT/CONTRACTS/ > document paths: /data2/IT/CONTRACTS/ > url locations: > reindex interval (mins): 1440 > indexable extensions: html htm > known languages: czech danish dutch english french german hungarian italian > norwegian polish portugese spanish turkish > Building index list.. 
> /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype > local -type f > /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type > f > Successfully added 1 documents to queue > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt > Begin indexing documents > One # = 0 documents > 0% 50% 100% > [Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682 > Title: test htm > Filesize: 264 > Has 8 words - 8 total document words > checking words in document and removing stopwords > Unknown char % in body: 0 > Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 > Attempt to free unreferenced scalar at ./indexer.pl line 253 > Segmentation fault -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Eric A. <and...@ce...> - 2004-02-06 04:59:43
|
Neal Chant wrote: > Hi List, > > Just testing sprawler for one of our requirements. > Setup is very simple - 1 x .htm file to start with, have included output for > info. > Seems to have a problem going through the htm file "body Unknown char % in > body: 0" > The .htm file is simple "line 1, line 2" etc. > > Any pointers to solving this? Thanks for the debug info - does it do this with more than one file? Also, can you send me the .htm file you are using as a sample? > Thanks in advance > > Neal Chant > Systems Administration > Mercury International > > > > > index path: /data2/IT/CONTRACTS/ > document paths: /data2/IT/CONTRACTS/ > url locations: > reindex interval (mins): 1440 > indexable extensions: html htm > known languages: czech danish dutch english french german hungarian italian > norwegian polish portugese spanish turkish > Building index list.. > /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype > local -type f > /usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type > f > Successfully added 1 documents to queue > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt > Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt > Loading stopwords list from 
/data2/IT/CONTRACTS/stopwords.turkish.txt > Begin indexing documents > One # = 0 documents > 0% 50% 100% > [Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682 > Title: test htm > Filesize: 264 > Has 8 words - 8 total document words > checking words in document and removing stopwords > Unknown char % in body: 0 > Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3 > Attempt to free unreferenced scalar at ./indexer.pl line 253 > Segmentation fault -- ------------------------------------------------------------------ Eric Anderson Sr. Systems Administrator Centaur Technology Today is the tomorrow you worried about yesterday. ------------------------------------------------------------------ |
|
From: Neal C. <nc...@me...> - 2004-02-03 12:17:04
|
Hi List,

Just testing sprawler for one of our requirements. Setup is very simple - 1 x .htm file to start with, have included output for info.

Seems to have a problem going through the htm file "body Unknown char % in body: 0"

The .htm file is simple "line 1, line 2" etc.

Any pointers to solving this?

Thanks in advance

Neal Chant
Systems Administration
Mercury International

index path: /data2/IT/CONTRACTS/
document paths: /data2/IT/CONTRACTS/
url locations:
reindex interval (mins): 1440
indexable extensions: html htm
known languages: czech danish dutch english french german hungarian italian norwegian polish portugese spanish turkish
Building index list..
/usr/bin/find /data2/IT/CONTRACTS/ -iname '*.html' -print -fstype local -type f
/usr/bin/find /data2/IT/CONTRACTS/ -iname '*.htm' -print -fstype local -type f
Successfully added 1 documents to queue
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.czech.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.danish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.dutch.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.english.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.french.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.german.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.hungarian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.italian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.norwegian.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.polish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.portugese.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.spanish.txt
Loading stopwords list from /data2/IT/CONTRACTS/stopwords.turkish.txt
Begin indexing documents
One # = 0 documents
0% 50% 100%
[Indexing (1/1) /data2/IT/CONTRACTS/Untitled-2.htm at 1075807682
Title: test htm
Filesize: 264
Has 8 words - 8 total document words
checking words in document and removing stopwords
Unknown char % in body: 0
Language Selection: unknown ->> 0 / Charratio: 0 - Reason: () stage 3
Attempt to free unreferenced scalar at ./indexer.pl line 253
Segmentation fault
|
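The "Building index list.." step in the output above runs one `find` per indexable extension, matching file names case-insensitively and keeping only regular files. A rough Python sketch of that discovery step follows. It is an illustration only, not Sprawler's code (the indexer is a Perl script); the function name `build_index_list` is made up here, and the `-fstype local` restriction from the log is not reproduced.

```python
import os


def build_index_list(root, extensions=("html", "htm")):
    """Return sorted paths of regular files under `root` whose names end in
    one of `extensions`, ignoring case (similar to `find -iname '*.htm'`).

    Hypothetical helper for illustration; it does not filter by filesystem
    type the way `-fstype local` does in the logged commands.
    """
    suffixes = tuple("." + ext.lower() for ext in extensions)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Case-insensitive suffix match, regular files only (-type f).
            if name.lower().endswith(suffixes) and os.path.isfile(path):
                found.append(path)
    return sorted(found)
```

Called on a directory holding a single `Untitled-2.htm`, this would return a one-element list, which matches the "Successfully added 1 documents to queue" line in the log.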
|
From: Ross D. <sto...@ya...> - 2003-09-23 05:10:20
|
--- Dru Dru <dr...@ya...> wrote:
> > Yes, we could - that's not a bad idea, and I can't think of anything
> > that would deter us from that. The source is open, and free to use
> > under the GPL. My only concern is that I (and others) would post their
> > "great new ideas" and [insert evil company name here] copies our idea
> > and patents it, and then possibly we get sued. Does "prior art" protect
> > us against this? Do we not worry about it, and just move on and hope
> > for the best?
> >
> > Eric
>
> If everyone thought like an Open Source individual,
> then we would not have anything to worry about, but we
> have the SCO's of the world that really don't care
> about helping out the Open Source community and just
> care about pleasing their investors and will stop at
> nothing to make their stock go up one more $1 (which
> is how I feel many corporations work). Don't you think
> if we come up with some revolutionary ideas and start
> posting them to the forum, Google or any of the other
> popular search groups will more than likely discover
> them and patent it themselves then tell us we can't
> use it? They already have the money for lawyers so it
> probably wouldn't take them long to patent it. We
> might be able to prove that we thought of it first,
> but why risk it? I feel we need to keep this stuff
> private and should seek a patent on it if it's as good
> as Eric believes it is.
>
> Disclaimer: I'm an internet security professional, so
> it's my job to be paranoid and think worst case
> scenario.
>
I must say I agree. All patentable ideas should be
sent to that "secret" address, to protect us from
getting sued (worst case). I'm working on getting us
a patent lawyer; hopefully one of the 2 will help us out.
=====
-Ross
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
|
|
From: Dru D. <dr...@ya...> - 2003-09-19 19:32:51
|
http://www.cnn.com/2003/TECH/internet/09/19/microsoft.google.ap/index.html |
|
From: Dru D. <dr...@ya...> - 2003-09-18 16:11:53
|
> Yes, we could - that's not a bad idea, and I can't think of anything
> that would deter us from that. The source is open, and free to use
> under the GPL. My only concern is that I (and others) would post their
> "great new ideas" and [insert evil company name here] copies our idea
> and patents it, and then possibly we get sued. Does "prior art" protect
> us against this? Do we not worry about it, and just move on and hope
> for the best?
>
> Eric

If everyone thought like an Open Source individual, then we would not have anything to worry about, but we have the SCO's of the world that really don't care about helping out the Open Source community and just care about pleasing their investors and will stop at nothing to make their stock go up one more $1 (which is how I feel many corporations work). Don't you think if we come up with some revolutionary ideas and start posting them to the forum, Google or any of the other popular search groups will more than likely discover them and patent it themselves then tell us we can't use it? They already have the money for lawyers so it probably wouldn't take them long to patent it. We might be able to prove that we thought of it first, but why risk it? I feel we need to keep this stuff private and should seek a patent on it if it's as good as Eric believes it is.

Disclaimer: I'm an internet security professional, so it's my job to be paranoid and think worst case scenario.
|
|
From: Dru D. <dr...@ya...> - 2003-08-23 15:54:09
|
Guys,

I came across this site that I plan on signing up for: http://safari.informit.com/ - they have some really good publishers (O'Reilly, Cisco Press, etc.) participating. Maybe Eric can use some of his persuading skills to get us a discount ;-)

-Dru

=====
http://www.drusshop.com
|
|
From: Eric A. <and...@ce...> - 2003-08-22 20:42:41
|
Dru Dru wrote:
>> Just to get everyone started, I'll start proposing some code we need to
>> work on. I do have some code I have written previously, that we could
>> use as a base, but I'm afraid of stunting creativity, so I'd like to
>> brainstorm a little here first, then gather our thoughts and start
>> coding.
>>
>> Here's some thoughts:
>>
>> There are two main divisions of code: indexing, and searching.
>>
>> The indexer, I believe, should have the following qualities:
>>
>> - configurable by a simple INI style conf file
>> - resilient to reboots (in other words, needs to be able to continue
>>   where it left off)
>> - distributable (so we can have 10, 20, 30, etc, indexers running
>>   simultaneously on different machines, all indexing different URL's)
>> - somewhat portable, possibly linux, freebsd, and windows versions
>> - optimized for speed (which means it adjusts for a slow system on a
>>   fast link, or fast system on a slow link)
>>
>> I'm envisioning a "master" indexer, which delegates certain batches of
>> URL's to be indexed to each indexing "client". The client requests a
>> batch of URLs, indexes them, then sends the indexed data back to the
>> master, which then incorporates that data into the full index. This way
>> we can spread out many indexers on high speed internet connections, and
>> only send the master the already indexed data for inclusion in the main
>> system. If we can create a windows client (like the SETI project did),
>> and a "buzz" for the coolness of helping the only open source, free,
>> non-profit search engine on the planet, we could potentially get
>> hundreds or thousands of machines indexing the internet for us, free.
>
> I like this idea, but we need to make sure we write a
> really tight secure client. I remember there was a
> security vulnerability with SETI@Home and I had to
> block it at our corp. firewall.

Good point - we're in luck though, we have someone here who is into internet security, and I'm sure wouldn't mind making this project nice and secure like that.. :)

>> We'll have to decide how to store data, or index it, the most efficient
>> way. I have thought about this, and also tried many many options
>> already, some failing, and some even working fairly well.
>>
>> I believe that plain old unix filesystems are fast, very fast.
>
> I agree. Hopefully no one will want to run our db's on a
> windows box ;-)

oh.. that's just not funny.. ;P

>> Remember, databases use the filesystem to store their data, so the db is
>> only as fast as its filesystem. It's all about HOW you organize that
>> data. If we know how we search for that data, then we can custom make a
>> db structure, file structure or layout, etc, that would allow us to find
>> data at amazingly fast speeds.
>>
>> More thoughts to come over the next few days..
>
> I needed a little primer on search engines, and I
> found a decently written one here:
> http://computer.howstuffworks.com/search-engine.htm as
> well as another good resource:
> http://www.searchenginewatch.com/

Great pointers! We should put these up on our website.. which doesn't exist yet. Currently, rdickey (ross) is working on this, but he just went back into college, so he'll be kind of busy.. If you are interested, please take the bull by the horns and run with it..

Eric

--
------------------------------------------------------------------
Eric Anderson Systems Administrator Centaur Technology
All generalizations are false, including this one.
------------------------------------------------------------------
|
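The "master" and "client" arrangement quoted in this message (a client requests a batch of URLs, indexes them, and sends the indexed data back for the master to merge) can be sketched in a few lines. What follows is a single-process Python illustration under stated assumptions, not project code: the names `Master`, `client_index`, and `run_clients` are invented here, the `fetch` callable stands in for real HTTP fetching, and a real deployment would put a network protocol between master and clients.

```python
from collections import deque


class Master:
    """Hands out batches of URLs and merges the partial indexes sent back."""

    def __init__(self, urls, batch_size=2):
        self.pending = deque(urls)
        self.batch_size = batch_size
        self.index = {}  # word -> set of URLs containing that word

    def request_batch(self):
        """Return up to `batch_size` URLs; an empty list means no work left."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch

    def submit(self, partial_index):
        """Merge a client's already-indexed data into the full index."""
        for word, urls in partial_index.items():
            self.index.setdefault(word, set()).update(urls)


def client_index(batch, fetch):
    """Index one batch locally; only this small mapping goes back to the master."""
    partial = {}
    for url in batch:
        for word in fetch(url).lower().split():
            partial.setdefault(word, set()).add(url)
    return partial


def run_clients(master, fetch):
    """Drive the request/index/submit loop until the master has no work left."""
    rounds = 0
    while True:
        batch = master.request_batch()
        if not batch:
            return rounds
        master.submit(client_index(batch, fetch))
        rounds += 1
```

The point of the split is the one made in the quoted proposal: page text is fetched and tokenized on the client side, and only the compact word-to-URL mapping travels to the master. Given the security concern raised in the thread, a real master would also need to validate whatever clients submit.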
|
From: Eric A. <and...@ce...> - 2003-08-22 20:39:13
|
Welcome aboard Dru!

Dru Dru wrote:
> Hello everyone. I've just joined the team, I guess I
> made a good impression on Eric ;-)
>
> Here's a little about myself, I'll try to keep it
> short.
>
> I started in LAN/WAN (actually LAN/MAN) and
> graduated to Internet Security (which I've been doing
> for the past 5 years).
>
> I'm strongest with Perl and feel competent with all
> aspects except for Perl/Tk and OO (ok and mod_perl),
> which I want to learn. I'm actually reading Mastering
> Perl/Tk now to learn it and write a little program to
> organize my huge collection of woodworking magazines
> (woodworking is my other hobby. More about that here:
> http://www.drusshop.com). I taught myself Perl and
> have written at least 2 dozen scripts/programs, with
> the largest being 350 lines.
>
> I'm currently pursuing a bachelor's degree in Computer
> Science and I'm just about finished with my first C++
> course (I take the final this Friday). I breezed along
> (thanks to knowing Perl), until we got to Classes. I
> still haven't quite grasped them.
>
> I decided to join this project because I would like to
> improve my programming skills, and get the experience
> of working on a large programming project. I'll be
> honest with you that I'm not the greatest programmer,
> but I like challenging work and I work away on
> something until I complete it. Out of all the "Help
> Wanted's" posted for a Perl Developer, Sprawler seemed
> the most interesting. After Eric told me a little more
> about the project, I'm quite intrigued and can't wait
> to get started.
>
> If you need anything, feel free to email me. I check
> it a few times a day. If you would be kind
> enough to drop me a note kind of telling me where you
> are in the world and just a bit about yourself, I
> would appreciate it.
>
> Thanks,
> Dru

--
------------------------------------------------------------------
Eric Anderson Systems Administrator Centaur Technology
All generalizations are false, including this one.
------------------------------------------------------------------
|
|
From: Dru D. <dr...@ya...> - 2003-08-22 03:29:35
|
> Just to get everyone started, I'll start proposing some code we need to
> work on. I do have some code I have written previously, that we could
> use as a base, but I'm afraid of stunting creativity, so I'd like to
> brainstorm a little here first, then gather our thoughts and start
> coding.
>
> Here's some thoughts:
>
> There are two main divisions of code: indexing, and searching.
>
> The indexer, I believe, should have the following qualities:
>
> - configurable by a simple INI style conf file
> - resilient to reboots (in other words, needs to be able to continue
>   where it left off)
> - distributable (so we can have 10, 20, 30, etc, indexers running
>   simultaneously on different machines, all indexing different URL's)
> - somewhat portable, possibly linux, freebsd, and windows versions
> - optimized for speed (which means it adjusts for a slow system on a
>   fast link, or fast system on a slow link)
>
> I'm envisioning a "master" indexer, which delegates certain batches of
> URL's to be indexed to each indexing "client". The client requests a
> batch of URLs, indexes them, then sends the indexed data back to the
> master, which then incorporates that data into the full index. This way
> we can spread out many indexers on high speed internet connections, and
> only send the master the already indexed data for inclusion in the main
> system. If we can create a windows client (like the SETI project did),
> and a "buzz" for the coolness of helping the only open source, free,
> non-profit search engine on the planet, we could potentially get
> hundreds or thousands of machines indexing the internet for us, free.

I like this idea, but we need to make sure we write a really tight secure client. I remember there was a security vulnerability with SETI@Home and I had to block it at our corp. firewall.

> We'll have to decide how to store data, or index it, the most efficient
> way. I have thought about this, and also tried many many options
> already, some failing, and some even working fairly well.
>
> I believe that plain old unix filesystems are fast, very fast.

I agree. Hopefully no one will want to run our db's on a windows box ;-)

> Remember, databases use the filesystem to store their data, so the db is
> only as fast as its filesystem. It's all about HOW you organize that
> data. If we know how we search for that data, then we can custom make a
> db structure, file structure or layout, etc, that would allow us to find
> data at amazingly fast speeds.
>
> More thoughts to come over the next few days..

I needed a little primer on search engines, and I found a decently written one here: http://computer.howstuffworks.com/search-engine.htm as well as another good resource: http://www.searchenginewatch.com/

-Dru

=====
http://www.drusshop.com
|
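Two of the indexer qualities listed in the quoted proposal, "configurable by a simple INI style conf file" and "resilient to reboots", lend themselves to a short illustration. The Python sketch below is only one possible reading of those two bullets, not the project's design: the section name `[indexer]`, the keys `checkpoint_file` and `batch_size`, and the functions `load_config` and `index_with_resume` are all hypothetical, and the `stop_after` parameter exists purely to simulate an interruption.

```python
import configparser
import json
import os

# Hypothetical INI layout; the thread does not specify section or key names.
SAMPLE_CONF = """
[indexer]
checkpoint_file = indexer.checkpoint.json
batch_size = 2
"""


def load_config(text):
    """Parse INI-style text and return the (assumed) [indexer] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["indexer"]


def index_with_resume(urls, checkpoint_path, process, stop_after=None):
    """Process each URL once, recording progress so a restart can continue.

    `stop_after` simulates a reboot after that many URLs; None means run
    to completion. Returns the set of URLs recorded as done.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = set(json.load(fh))
    handled = 0
    for url in urls:
        if url in done:
            continue  # finished before the last interruption
        if stop_after is not None and handled >= stop_after:
            break
        process(url)
        done.add(url)
        # Write to a temp file and rename it over the checkpoint, so an
        # interruption mid-write does not leave a truncated checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(sorted(done), fh)
        os.replace(tmp_path, checkpoint_path)
        handled += 1
    return done
```

Checkpointing after every URL keeps the sketch simple; an indexer tuned for speed, as the proposal asks, would more likely record progress per batch.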
|
From: Dru D. <dr...@ya...> - 2003-08-22 03:09:59
|
Hello everyone. I've just joined the team, I guess I made a good impression on Eric ;-)

Here's a little about myself, I'll try to keep it short.

I started in LAN/WAN (actually LAN/MAN) and graduated to Internet Security (which I've been doing for the past 5 years).

I'm strongest with Perl and feel competent with all aspects except for Perl/Tk and OO (ok and mod_perl), which I want to learn. I'm actually reading Mastering Perl/Tk now to learn it and write a little program to organize my huge collection of woodworking magazines (woodworking is my other hobby. More about that here: http://www.drusshop.com). I taught myself Perl and have written at least 2 dozen scripts/programs, with the largest being 350 lines.

I'm currently pursuing a bachelor's degree in Computer Science and I'm just about finished with my first C++ course (I take the final this Friday). I breezed along (thanks to knowing Perl), until we got to Classes. I still haven't quite grasped them.

I decided to join this project because I would like to improve my programming skills, and get the experience of working on a large programming project. I'll be honest with you that I'm not the greatest programmer, but I like challenging work and I work away on something until I complete it. Out of all the "Help Wanted's" posted for a Perl Developer, Sprawler seemed the most interesting. After Eric told me a little more about the project, I'm quite intrigued and can't wait to get started.

If you need anything, feel free to email me. I check it a few times a day. If you would be kind enough to drop me a note kind of telling me where you are in the world and just a bit about yourself, I would appreciate it.

Thanks,
Dru

=====
http://www.drusshop.com
|