| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 47 | 74 | 66 |
| 2002 | 95 | 102 | 83 | 64 | 55 | 39 | 23 | 77 | 88 | 84 | 66 | 46 |
| 2003 | 56 | 129 | 37 | 63 | 59 | 104 | 48 | 37 | 49 | 157 | 119 | 54 |
| 2004 | 51 | 66 | 39 | 113 | 34 | 136 | 67 | 20 | 7 | 10 | 14 | 3 |
| 2005 | 40 | 21 | 26 | 13 | 6 | 4 | 23 | 3 | 1 | 13 | 1 | 6 |
| 2006 | 2 | 4 | 4 | 1 | 11 | 1 | 4 | 4 | | 4 | | 1 |
| 2007 | 2 | 8 | 1 | 1 | 1 | | 2 | | 1 | | | |
| 2008 | 1 | | 1 | 2 | | | 1 | | 1 | | | |
| 2009 | | | 2 | | 1 | | | | | | | |
| 2010 | | | | | | | | | | | | 1 |
| 2011 | | | 1 | | 1 | | | | | 1 | | |
| 2012 | | | | | | | 1 | | | | | |
| 2013 | | | | 1 | | | | | | | | |
| 2016 | 1 | | | | | | | | | | | |
| 2017 | | | | | | | | | | | 1 | |
|
From: Gabriele B. <an...@ti...> - 2002-02-12 19:29:23
|
Ciao guys,
I am working on a PHP wrapper project for ht://Dig. I read an
interesting guide among the contributed work, but I think it is rather old now,
especially considering the newer versions of PHP.
Basically, I would like to have the htsearch program produce an XML file as its
output, then use an XML parser from the PHP script: the PHP script opens a pipe
to the htsearch program and the XML parser reads from that pointer.
I have run into some problems as far as the excerpt is concerned. I was just
wondering whether any of you are interested in this, and of course whether you
have some ideas and opinions!
Thank you and Ciao, yours
-Gabriele
--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italy
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
-
Important:
--------------
I've experienced problems when receiving e-mail sent to the
address: an...@us.... I think I lost much of it.
So if you sent me a message, and I never replied to you,
that's probably the reason. Please update your address book to
this one: an...@ti.... Sorry and thank you!
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-12 15:39:14
|
On Tue, 12 Feb 2002, Dan Cutting wrote:
> But to the point of my post: the phrase searching seems a little flakey. I
> can search for some phrases with no problem, but if I extend the phrase by a
> word here or there it tends to return no results. It looks like it might

Yes, it's a known bug. I would use the 3.2.0b4 snapshots, which are certainly more stable than 3.2.0b3 and fix the security hole in 3.2.0b3. As to whether it's ready for a production environment, I can't say. Certainly if you find bugs, we'll try to fix them as fast as possible.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: <seb...@la...> - 2002-02-12 13:26:21
|
Hi,

I work on htdig and I need to make it index Java pages, but apparently this hasn't been done before. I have not found a Perl script to do that (like the "doc2html" scripts that translate documents into HTML pages). Does anyone know of a Perl script, or anything else, that does this?

I have a Java webcrawler class to interpret the pages. It apparently works correctly for the dynamic links and others, and it outputs the interpreted page with the correct links, but when I try to call it from htdig I am not sure what output to use from the webcrawler or how to call it correctly.

If it matters, I work on OpenLinux 2.4, Apache 1.3.20, JDK 1.2.2_008 and htdig 3.1.5.

Thanks
|
|
From: Dan C. <dan...@so...> - 2002-02-12 07:12:53
|
Hi all,

I've just installed v3.2.0b3 with the hope it will be stable enough to use in a relatively high volume production environment. I notice it was released about a year ago; has anybody else run it in production successfully?

But to the point of my post: the phrase searching seems a little flakey. I can search for some phrases with no problem, but if I extend the phrase by a word here or there it tends to return no results. It looks like it might have something to do with short or bad words being omitted and thus not being recognised in a phrase. Does that sound plausible? If I turn off bad words and minimum word length, won't it make searches slower and less relevant in general?

Dan
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-12 06:25:03
|
Begin forwarded message:

> From: Jessica Biola <jes...@ya...>
> Date: Mon Feb 11, 2002 03:48:45 AM US/Central
> To: ghu...@us...
> Subject: Modify htsearch to run under fastcgi
>
> Geoff, I was wondering if you would be interested in
> taking on a project to modify htsearch (3.2.0 beta) so
> that it properly runs in fastcgi? I'd be willing to
> pay a reasonable amount for such development. Please
> let me know if this is something you'd be interested
> in or that you have the time to develop.
>
> Sincerely,
> Jes
|
|
From: LIGHT88 <LI...@nj...> - 2002-02-12 04:49:59
|
Hello, Can Dig be put onto and used with data on a CD? |
|
From: Neal R. <ne...@ri...> - 2002-02-12 04:00:07
|
Guys,

Here's a well-tested piece of code for replacing the calls to 'system(mv)' in htfuzzy/EndingsDB.cc & htfuzzy/Synonym.cc.

Example usage:

    file_copy(root2word.get(), config["endings_root2word_db"].get(), FILECOPY_OVERWRITE_ON);
    unlink(root2word.get());

It returns TRUE or FALSE. By well tested I mean shipping and executing tens, if not hundreds, of thousands of times a day in our code. It works across file systems on Linux, Solaris, FreeBSD, Windows NT 4.0 and Windows 2000, and probably many others. Note that each system defines BUFSIZ to be optimal in its libc header files.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
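The file_copy routine itself was an attachment to the original message and is not reproduced in this archive. As a rough idea of the kind of BUFSIZ-based copy loop being described, a minimal sketch might look like the following (the function name, flag constant and TRUE/FALSE return convention are taken from the example call above; everything else is an assumption, not Neal's actual code):

    #include <cstdio>
    #include <sys/stat.h>

    #define FILECOPY_OVERWRITE_ON  1
    #define FILECOPY_OVERWRITE_OFF 0

    // Sketch only: copy src to dst in BUFSIZ-sized chunks, returning 1 (TRUE)
    // on success and 0 (FALSE) on failure.
    int file_copy(const char *src, const char *dst, int overwrite)
    {
        struct stat sb;
        if (overwrite == FILECOPY_OVERWRITE_OFF && stat(dst, &sb) == 0)
            return 0;                         // destination exists; refuse to clobber
        FILE *in = fopen(src, "rb");
        if (!in)
            return 0;
        FILE *out = fopen(dst, "wb");
        if (!out) { fclose(in); return 0; }
        char buf[BUFSIZ];                     // BUFSIZ: per-system optimal size from <cstdio>
        size_t n;
        int ok = 1;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            if (fwrite(buf, 1, n, out) != n) { ok = 0; break; }
        if (ferror(in))
            ok = 0;
        fclose(in);
        if (fclose(out) != 0)
            ok = 0;
        return ok;
    }

Unlike 'system("mv ...")', a copy followed by unlink as in the example call also works across file system boundaries, which is what the message highlights.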
|
From: Jamie A. <Jam...@sl...> - 2002-02-12 03:35:54
|
Here's a possible replacement for the hash function in the
Dictionary class - this one is much more forgiving on strings
where the only difference is at the end (for example URLs from
db-driven sites where the only difference in the URL is a
parameter at the end). It's also very slightly better for
/usr/dict/words too.
// Requires <stdlib.h> and <string.h> (or <cstdlib>/<cstring>) for strtol,
// malloc, strcpy, strlen and free.
unsigned int hashCode2(const char *key)
{
    if (!key || !*key)                    // guard before touching the string
        return 0;

    // Purely numeric keys hash to their numeric value.
    char *test;
    long conv_key = strtol(key, &test, 10);
    if (!*test)                           // conversion consumed the whole key
        return (unsigned int)conv_key;

    // Otherwise hash at most the last 15 characters, so keys that differ
    // only near the end (e.g. db-driven URLs with trailing parameters)
    // still spread across buckets.
    char *base = (char *)malloc(strlen(key) + 2);
    char *tmp_key = base;
    strcpy(tmp_key, key);
    unsigned int h = 0;
    int length = strlen(tmp_key);
    if (length >= 16)
    {
        tmp_key += length - 15;
        length = 15;
    }
    for (int i = length; i > 0; i--)
    {
        h = (h * 37) + *tmp_key++;
    }
    free(base);
    return h;
}
Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: Jamie A. <Jam...@sl...> - 2002-02-12 01:57:30
|
There is a buglet with the external transport stuff which stops redirects
from an external transport from working.
htdig/Document.cc(ln 515) - Document::Retrieve
Currently:
    if (transportConnect == HTTPConnect)
        redirected_to = ((HtHTTP_Response *)response)->GetLocation();
Should be:
    if (transportConnect == HTTPConnect || transportConnect == externalConnect)
        redirected_to = ((HtHTTP_Response *)response)->GetLocation();
cheers
Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: J. op d. B. <MSQ...@st...> - 2002-02-11 09:01:37
|
You got a point there... Maybe....

On Fri, 8 Feb 2002, Geoff Hutchison wrote:
> At 11:17 AM +0100 2/8/02, J. op den Brouw wrote:
> >Does anyone know the email addresses of the maintainers?
>
> I guess this would be a reason for that low-volume htdig-mirrors
> mailing list I suggested to you. ;-)
>
> -Geoff

--jesse
--------------------------------------------------------------------
J. op den Brouw          Johanna Westerdijkplein 75
Haagse Hogeschool        2521 EN DEN HAAG
Faculty of Engineering   Netherlands
Electrical Engineering   +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
|
From: Jamie A. <Jam...@sl...> - 2002-02-10 22:17:54
|
>At 2:53 PM -0500 2/8/02, William R. Knox wrote:
>>In parsedcdate, assume that unqualified dates are at noon instead of
>>midnight. If no one is ever more than 12 hours plus or minus UTC, this is
>>actually a very easy hack, er, solution.
>
>I seemed to think that this was probably true but figured I'd check
>an official timezone map to be sure.
><http://aa.usno.navy.mil/faq/docs/world_tzones.html>
>
>Interestingly at least the US Navy has two zones at +13 and +14 hours
>respectively. Admittedly these are some small islands out in the
>Pacific and I don't know how many servers are running out there.

New Zealand & Fiji are at GMT+12, and NZ at least goes to GMT+13 during the summer. Tonga standard is at GMT+13, so I think that it's a hack that's simple, easy and wrong.

Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: Geoff H. <ghu...@us...> - 2002-02-10 08:13:48
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:57:33
|
At 12:20 AM -0700 2/8/02, Neal Richter wrote:
>One real long term advantage to using httrack or something similar would
>be to offload the forward code maintenance of some of the htdig transport
>code to another project. Leaving htdig developers more time to work on
>other features. Of course this is an over simplification, choosing a
>quickly changing and complicated site-copier could prove to be painful.

We looked at this possibility at the beginning of 3.2 development. At the time, the major possibilities were libwww from W3C, libghttp from GNOME, and swiping code from curl, which had just appeared. I was quite happy with the idea of using libwww since that's obviously well-maintained. The conclusion was basically that this was a bit of overkill and none of them (at least at that point) had particularly clean or necessarily fixed APIs.

But keep in mind that Gabriele does use htnet/ for ht://Check and certainly some features and bug-testing occur on both sides. I'm not aware of other projects using the code, but it's possible.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:27:35
|
At 11:08 PM -0700 2/7/02, Neal Richter wrote:
>What is the future of this code? I've read ill-understood (by
>me) references to new searching/parsing code in previous posts.
>...
>I'm wondering if it wouldn't be easier to start with qtest.cc and
>implement the 'hit' extraction if this code is going to get rewritten
>massively.

Bingo. Unfortunately, neither Quim nor I have had the time to devote to this project. Somewhere around here I have the additions to DocMatch which added the scoring functionality for the new htsearch. I'll find those and commit them soon.

The basic idea is that qtest.cc will form the framework for a new htsearch. Much of the CGI input and command-line parsing in htsearch.cc is fine and can be moved as-is. Additionally, qtest.cc needs to parse the search_algorithms attribute and set the OrFuzzyExpander appropriately. I should have time tomorrow to sit down and do that much since people seem to be interested in picking this up. For now, I'm going to ignore the collections and plug the ResultList directly into the current Display code.

What else has been discussed as far as necessary cleanups and coding on the htsearch side?

* Re-think collections (i.e. how should they fit into the framework cleanly) -- We're all willing to break collections for some period while we redo the rest of htsearch.
* Improve the sorting algorithm for results -- Right now the entire list of results is sorted, regardless of how many are displayed. Much better to make a heap and pull them off as needed (see the sketch after this message).
* Clean up the Display class (not many specific details on this yet).
* Add query and results caching -- Quim implemented some simple caching already, but an on-disk cache of recent queries would improve performance tremendously.

-Geoff
|
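A minimal sketch of the heap-based selection suggested above for result sorting, assuming a simplified stand-in for the result records (ScoredMatch and its fields are illustrative, not ht://Dig's actual DocMatch/ResultList types): keep only the top N matches in a bounded heap instead of sorting the entire list.

    #include <algorithm>
    #include <queue>
    #include <vector>

    struct ScoredMatch {                      // hypothetical stand-in for a result record
        int   doc_id;
        float score;
    };

    struct WorseScore {                       // comparator: lowest score ends up on top
        bool operator()(const ScoredMatch &a, const ScoredMatch &b) const {
            return a.score > b.score;
        }
    };

    // Return the best `limit` matches, best first, without sorting `all`.
    std::vector<ScoredMatch> topMatches(const std::vector<ScoredMatch> &all, size_t limit)
    {
        std::priority_queue<ScoredMatch, std::vector<ScoredMatch>, WorseScore> heap;
        for (const ScoredMatch &m : all) {
            if (heap.size() < limit) {
                heap.push(m);
            } else if (m.score > heap.top().score) {
                heap.pop();                   // evict the weakest of the current best
                heap.push(m);
            }
        }
        std::vector<ScoredMatch> out;
        while (!heap.empty()) {
            out.push_back(heap.top());        // pops weakest first
            heap.pop();
        }
        std::reverse(out.begin(), out.end()); // best-first for display
        return out;
    }

This keeps the per-query cost proportional to the number of matches times log(limit), rather than sorting every match when only a page of results is displayed.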
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:27:29
|
At 9:45 AM -0600 2/8/02, Gilles Detillieux wrote:
> > So, this suggests to me that Geoff did properly upload
>> the files to their download server. However, when I FTP to
>> ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest
>> files aren't there. ...
>
>Sorry, that should be ftp://download.sourceforge.net/pub/sourceforge/htdig/

<sigh> They were definitely uploaded and I checked both through the HTTP link and the FTP link. In the Site Status page, there's this comment:

(2002-01-10 08:26:53) As documented in Support Request 501887, some users have encountered issues in making use of SourceForge.net download services, due to a routing problem involving one of our new mirrors.

I'll submit another Support Request. It seems to be striking a few other projects that made releases about the same time as us. In *theory* it should work for both HTTP and FTP once it's uploaded through the project page.

If you want a laugh, consider that the downloads page for ht://Dig claims that no one has currently downloaded 3.1.6 through SourceForge. Yesterday, I believe it had several hundred.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 03:57:42
|
At 2:53 PM -0500 2/8/02, William R. Knox wrote:
>In parsedcdate, assume that unqualified dates are at noon instead of
>midnight. If no one is ever more than 12 hours plus or minus UTC, this is
>actually a very easy hack, er, solution.

I seemed to think that this was probably true but figured I'd check an official timezone map to be sure. <http://aa.usno.navy.mil/faq/docs/world_tzones.html>

Interestingly, at least the US Navy has two zones at +13 and +14 hours respectively. Admittedly these are some small islands out in the Pacific and I don't know how many servers are running out there.

>This seems like the best course of action, though it has the
>disadvantage of increasing the size of the database for all files,
>even though only a limited number use the additional date
>information. I suppose the meta date info could only be populated if
>the file has it, though.

Certainly a META "date" field would only be populated if the document has it. OTOH, there's a limited amount of overhead for merely adding a record to each DocumentRef as it needs to distinguish between each bit of information stored for a given document.

>The additional advantage of this is that, currently, if the meta tag
>on a file stays the same, I don't think it ever gets reindexed - however,
>the meta tag could stay the same and the contents could change.

No, I really doubt this. It sends the date it has in the database in an If-Modified-Since request to the server as well as comparing it to the date returned by the server. If the server doesn't return a date, it uses the current time to the indexer.

The If-Modified-Since test is only conceivably a problem if someone adds a META date tag with a time in the *future*. The test by htdig is even more restrictive:

    if (doc->ModTime() == ref->DocTime()) // retrieved but not changed

So if the server ignored the If-Modified-Since and sends the document anyway, it'll only be ignored by ht://Dig if the time is *exactly* the same, which is pretty unlikely if the document itself was changed. Setting the document date to the META date only happens after parsing occurs and no additional checking is performed.

-Geoff
|
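As a condensed illustration of the decision described above (the helper itself is hypothetical; only the ModTime/DocTime comparison appears in htdig):

    #include <ctime>

    enum FetchResult { NOT_MODIFIED, RETRIEVED };

    // Sketch of the re-parse decision: skip the document only when the server
    // says it is unchanged, or when the returned date matches the stored one
    // exactly.
    bool shouldReparse(FetchResult result,
                       time_t serverModTime,  // doc->ModTime(): date from the server, or "now" if none was sent
                       time_t storedDocTime)  // ref->DocTime(): date already in the database
    {
        if (result == NOT_MODIFIED)           // server honoured If-Modified-Since
            return false;
        // Server sent the document anyway: ignore it only on an *exact* match,
        // which is unlikely if the content really changed.
        return serverModTime != storedDocTime;
    }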
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 03:57:41
|
At 11:17 AM +0100 2/8/02, J. op den Brouw wrote:
>Does anyone know the email addresses of the maintainers?

I guess this would be a reason for that low-volume htdig-mirrors mailing list I suggested to you. ;-)

-Geoff
|
|
From: William R. K. <wk...@mi...> - 2002-02-08 21:03:31
|
OK, I have applied the patch supplied by Gilles, and it fixed the lack of use of the date meta tag in v3.1.6. However, I have now come across a more subtle problem - it boils down to the use, or lack thereof, of the timezone information.

In htsearch/Display.cc, the DocTime variable is assumed to be in UTC, as modification dates returned by a server are in UTC. However, the meta Date tags are likely already in the local time zone, so when the comparison happens between DocTime and timet_startdate and timet_enddate on lines 1417 and 1418 of Display.cc, it is between a local time in the case of DocTime and a UTC time in the case of timet_(start|end)date (due to the use of mktime on lines 1329 and 1330 instead of localtime). This, along with converting the date string YYYY-MM-DD to YYYY-MM-DD 00:00:00 when meta tags are read, means that searching for a start day of files that have meta tags will result in missing a day's worth of files.

The solution to this could be one of a few things:

* Tell people they should use UTC in their meta tags. All in all, not really a bad solution, but less than ideal for sites that already have it in place.
* In parsedcdate, assume that unqualified dates are at noon instead of midnight. If no one is ever more than 12 hours plus or minus UTC, this is actually a very easy hack, er, solution.
* In parsedcdate, reset the time obtained to UTC by pulling the current time zone offset (difference between localtime and mktime) - this assumes, however, that the server and the search engine are in the same time zone (see the sketch after this message).
* Create a separate value for each document indicating whether or not the date was obtained from the modification time or from the meta tag.
* Create a separate value that is only for document modification date information, and keep populating the current DocTime value like you do now. The additional advantage of this is that, currently, if the meta tag on a file stays the same, I don't think it ever gets reindexed - however, the meta tag could stay the same and the contents could change. Separating out the two would prevent this error from occurring. This seems like the best course of action, though it has the disadvantage of increasing the size of the database for all files, even though only a limited number use the additional date information. I suppose the meta date info could only be populated if the file has it, though.

Opinions, anyone?

Bill Knox
Senior Operating Systems Programmer/Analyst
The MITRE Corporation
|
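For the third option, here is a rough sketch of pulling the local-to-UTC offset from the localtime/mktime difference, as suggested above (illustrative only, with names not taken from the htdig sources; it also assumes the server and the search engine share a time zone):

    #include <ctime>

    // Current offset, in seconds, of local time ahead of UTC.
    static time_t localUtcOffset()
    {
        time_t now = time(0);
        struct tm gm = *gmtime(&now);         // UTC broken-down fields
        gm.tm_isdst = -1;                     // let mktime work out DST
        time_t gm_as_local = mktime(&gm);     // re-interpret those fields as local time
        return now - gm_as_local;
    }

    // A parsedcdate-style correction: given a time_t produced by treating a
    // local-time meta date as if it were UTC, shift it back to real UTC.
    static time_t localDateToUtc(time_t parsed_as_utc)
    {
        return parsed_as_utc - localUtcOffset();
    }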
|
From: Gilles D. <gr...@sc...> - 2002-02-08 15:45:25
|
According to me:
> So, this suggests to me that Geoff did properly upload
> the files to their download server. However, when I FTP to
> ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest
> files aren't there. ...

Sorry, that should be ftp://download.sourceforge.net/pub/sourceforge/htdig/

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-02-08 15:36:40
|
According to J. op den Brouw:
> I am NOT able to download 3.1.6 via:
>
> download.sourceforge.net

I checked the download link for ht://Dig from the SourceForge web site, and it lists the 3.1.6 diffs and tarball just fine, but they seem to be links to http://prdownloads.sourceforge.net/htdig/... (see http://sourceforge.net/project/showfiles.php?group_id=4593&release_id=73048). So, this suggests to me that Geoff did properly upload the files to their download server. However, when I FTP to ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest files aren't there. So, I would guess that SourceForge's own internal mirroring to its download server farm isn't working right, but then of course I'm so far beyond ever being surprised about such problems on SnaFu.net that it's almost not worth the energy to shrug it off. Sorry, am I being cynical again?

> htdig.europeanservers.net

Their mirror doesn't seem to have been updated in over a year and a half. They don't even have 3.2.0b3. If we can't get them to update, we should definitely take them off the list.

> www.it.htdig.org
>
> Does anyone know the email addresses of the maintainers?

The www.europeanservers.net site lists in...@eu... as the contact. I can drop them a note to see what comes of it.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
|
|
From: Gabriele B. <g.b...@co...> - 2002-02-08 13:33:58
|
I just sent this e-mail to my friend Marco Nenciarini, who's the maintainer of the Italian mirror.

Thanks Jesse and bye
-Gabriele

At 11.17 08/02/2002 +0100, J. op den Brouw wrote:
>I am NOT able to download 3.1.6 via:
>
>download.sourceforge.net
>htdig.europeanservers.net
>www.it.htdig.org
>
>Does anyone know the email addresses of the maintainers?
>
>--jesse

Ciao, Ciao
-Gabriele
--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/
The nice thing about Windows is - It does not just crash, it displays a dialog box and lets you press 'OK' first.
|
|
From: J. op d. B. <MSQ...@st...> - 2002-02-08 10:17:53
|
I am NOT able to download 3.1.6 via:

download.sourceforge.net
htdig.europeanservers.net
www.it.htdig.org

Does anyone know the email addresses of the maintainers?

--jesse
--------------------------------------------------------------------
J. op den Brouw          Johanna Westerdijkplein 75
Haagse Hogeschool        2521 EN DEN HAAG
Faculty of Engineering   Netherlands
Electrical Engineering   +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
|
From: Neal R. <ne...@ri...> - 2002-02-08 07:24:31
|
Hey all,

I mentioned this briefly in a reply to Geoff... here it is fleshed out for pondering.

http://www.httrack.com/

I did a little more investigation into httrack. httrack is a neat and small web-site copier/spider. It builds out of the box as a library (libhttrack.so). There is a supplied example app that uses libhttrack for basic web-site copying, about 150 lines long. The main httrack exe has a few more lines for various features.

So a short path to using httrack to get a feature like this would be:

Httrack path:
In htdig:
1. Substitute the call to the htdig internal retriever/transport calls.
2. Stream original URLs, local http-URL & local-disk filename to a logfile.
3. Add 'URL CacheURL' to the Document class and mifluz/DB 'schema'.
4. Fire up the retriever to process this log-file and parse & index the files from the local disk, adding the original and cached URLs to the Document object.
In htsearch:
1. Display the cached URL on screen, given a config option.

Alternate path:
1. Swipe httrack code for creating the directory structures on the fly for storing the spidered web-sites.
2. Write retrieved documents to local files in Retriever (swipe httrack code for localizing necessary page components?).
3. Same changes to the Document object.

One real long-term advantage to using httrack or something similar would be to offload the forward code maintenance of some of the htdig transport code to another project, leaving htdig developers more time to work on other features. Of course this is an over-simplification; choosing a quickly changing and complicated site-copier could prove to be painful.

As it is now, I think that whipping up a version of htdig that processes a log like the one described above (with the additions to the Document class & DB schema) would be pretty easy. Users can run htdig with a command line switch after running httrack.

I am not 100% clear that httrack does a great job localizing web-page components... i.e. silly web-authors using full http links to everything. It does web-http only. There may be other spiders better than httrack... I used it a while back to suck down a bunch of AI FAQs, and it worked flawlessly. It is well reviewed by users, for whatever that is worth.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Neal R. <ne...@ri...> - 2002-02-08 06:12:37
|
Hey,

I've got a question on htsearch code (latest 3.2b4 snapshot):

htsearch/htsearch.cc lines 145-371 are the 'iterate over collections' loop. Lines 161-283 (inside the loop) are setting htconfig variables based on CGI-form parameters. Lines 285 & 370 (inside the loop) create and delete a Parser object. Then, proceeding to approx line 322, the search is performed on the current database in the collection. Looks like a lot of redundant processing.

What is the future of this code? I've read ill-understood (by me) references to new searching/parsing code in previous posts.

FYI: I'm trying to understand this to write an API for the searching code by reorganizing it:

int htsearch_open(htsearch_parameters_struct *);
int htsearch_query(htsearch_query_struct *);
int htsearch_get_next_result(htsearch_query_result_struct *);
int htsearch_close();
char * htsearch_get_error();

htsearch_get_next_result(..) is called in a loop to retrieve each hit. Some of the objects become global, or at least global-static to the file. There will be PHP wrappers for these functions to call them in PHP pages.

I'm wondering if it wouldn't be easier to start with qtest.cc and implement the 'hit' extraction if this code is going to get rewritten massively.

Thanks again!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
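As an illustration of the open/query/iterate/close flow outlined above, a caller might look roughly like this (the struct members are hypothetical, since the message only names the struct and function types):

    #include <cstdio>

    // Hypothetical field layouts -- the original message only names the types.
    struct htsearch_parameters_struct   { const char *config_file; };
    struct htsearch_query_struct        { const char *words; int max_results; };
    struct htsearch_query_result_struct { const char *url; const char *title; double score; };

    // Prototypes as proposed in the message above.
    int htsearch_open(htsearch_parameters_struct *);
    int htsearch_query(htsearch_query_struct *);
    int htsearch_get_next_result(htsearch_query_result_struct *);
    int htsearch_close();
    char *htsearch_get_error();

    int run_query(const char *config, const char *words)
    {
        htsearch_parameters_struct params = { config };
        if (!htsearch_open(&params)) {
            fprintf(stderr, "open failed: %s\n", htsearch_get_error());
            return 1;
        }
        htsearch_query_struct query = { words, 10 };
        if (htsearch_query(&query)) {
            htsearch_query_result_struct hit;
            while (htsearch_get_next_result(&hit))   // one hit per call
                printf("%s (%g) %s\n", hit.title, hit.score, hit.url);
        }
        htsearch_close();
        return 0;
    }

This is also the shape the PHP wrappers mentioned in the message would presumably call into.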
|
From: Neal R. <ne...@ri...> - 2002-02-07 21:39:26
|
Succeeded: Mandrake 8.1
Failed: Windows 2000 Professional with Cygwin 1.1.0 (iconv problems)

libiconv will not build under Cygwin 1.1.0. Following the directions in the README.win32 file builds libraries (via nmake & Visual Studio) that do not seem compatible with building mifluz under Cygwin. I'll try the latest version of Cygwin and post another update.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|