| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 47 | 74 | 66 |
| 2002 | 95 | 102 | 83 | 64 | 55 | 39 | 23 | 77 | 88 | 84 | 66 | 46 |
| 2003 | 56 | 129 | 37 | 63 | 59 | 104 | 48 | 37 | 49 | 157 | 119 | 54 |
| 2004 | 51 | 66 | 39 | 113 | 34 | 136 | 67 | 20 | 7 | 10 | 14 | 3 |
| 2005 | 40 | 21 | 26 | 13 | 6 | 4 | 23 | 3 | 1 | 13 | 1 | 6 |
| 2006 | 2 | 4 | 4 | 1 | 11 | 1 | 4 | 4 | | 4 | | 1 |
| 2007 | 2 | 8 | 1 | 1 | 1 | | 2 | | 1 | | | |
| 2008 | 1 | | 1 | 2 | | | 1 | | 1 | | | |
| 2009 | | | 2 | | 1 | | | | | | | |
| 2010 | | | | | | | | | | | | 1 |
| 2011 | | | 1 | | 1 | | | | | 1 | | |
| 2012 | | | | | | | 1 | | | | | |
| 2013 | | | | 1 | | | | | | | | |
| 2016 | 1 | | | | | | | | | | | |
| 2017 | | | | | | | | | | | 1 | |
|
From: Gabriele B. <an...@ti...> - 2002-02-12 19:29:23
|
Ciao guys,
I am working on a PHP wrapper project for ht://Dig. I read an
interesting guide among the contributed work, but I think it is rather old now,
especially considering the newer versions of PHP.
Basically, I would like to have the htsearch program produce an XML file as its
output, then use an XML parser from the PHP script: the PHP script opens a pipe
to the htsearch program and the XML parser reads from that pointer.
I have run into some problems as far as the excerpt is concerned. I was just
wondering whether any of you are interested in this, and of course whether you
have some ideas and opinions!
Thank you and Ciao, yours
-Gabriele
--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italy
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
-
Important:
--------------
I've experienced problems when receiving e-mail sent to the
address: an...@us.... I think I lost much of it.
So if you sent me a message, and I never replied to you,
that's probably the reason. Please update your address book to
this one: an...@ti.... Sorry and thank you!
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-12 15:39:14
|
On Tue, 12 Feb 2002, Dan Cutting wrote:
> But to the point of my post: the phrase searching seems a little flakey. I
> can search for some phrases with no problem, but if I extend the phrase by a
> word here or there it tends to return no results. It looks like it might

Yes, it's a known bug. I would use the 3.2.0b4 snapshots, which are certainly more stable than 3.2.0b3 and fix the security hole in 3.2.0b3. As to whether it's ready for a production environment, I can't say. Certainly if you find bugs, we'll try to fix them as fast as possible.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: <seb...@la...> - 2002-02-12 13:26:21
|
Hi,

I work on htdig and I need to make it index Java pages, but apparently this hasn't been done before. I have not found a Perl script to do that (like the "doc2html" scripts that translate documents into HTML pages). Does anyone know of a Perl script, or anything else, that does this?

I have a Java webcrawler class to interpret the pages. It apparently works correctly for the dynamic links and others, and it outputs the interpreted page with the correct links, but when I try to call it from htdig I am not sure what output to use from the webcrawler or how to call it correctly.

If it matters, I work on OpenLinux 2.4, Apache 1.3.20, JDK 1.2.2_008 and htdig 3.1.5.

Thanks
|
|
From: Dan C. <dan...@so...> - 2002-02-12 07:12:53
|
Hi all,

I've just installed v3.2.0b3 with the hope it will be stable enough to use in a relatively high volume production environment. I notice it was released about a year ago; has anybody else run it in production successfully?

But to the point of my post: the phrase searching seems a little flakey. I can search for some phrases with no problem, but if I extend the phrase by a word here or there it tends to return no results. It looks like it might have something to do with short or bad words being omitted and thus not being recognised in a phrase. Does that sound plausible? If I turn off bad words and minimum word length, won't it make searches slower and less relevant in general?

Dan
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-12 06:25:03
|
Begin forwarded message:

> From: Jessica Biola <jes...@ya...>
> Date: Mon Feb 11, 2002 03:48:45 AM US/Central
> To: ghu...@us...
> Subject: Modify htsearch to run under fastcgi
>
> Geoff, I was wondering if you would be interested in
> taking on a project to modify htsearch (3.2.0 beta) so
> that it properly runs in fastcgi? I'd be willing to
> pay a reasonable amount for such development. Please
> let me know if this is something you'd be interested
> in or that you have the time to develop.
>
> Sincerely,
> Jes
|
|
From: LIGHT88 <LI...@nj...> - 2002-02-12 04:49:59
|
Hello, Can Dig be put onto and used with data on a CD? |
|
From: Neal R. <ne...@ri...> - 2002-02-12 04:00:07
|
Guys,

Here's a well-tested piece of code for replacing the calls to 'system(mv)' in htfuzzy/EndingsDB.cc & htfuzzy/Synonym.cc.

Example usage:

    file_copy(root2word.get(), config["endings_root2word_db"].get(), FILECOPY_OVERWRITE_ON);
    unlink(root2word.get());

It returns TRUE or FALSE. By well tested I mean shipping and executing tens, if not hundreds, of thousands of times a day in our code. It works across file systems on Linux, Solaris, FreeBSD, Windows NT 4.0 and Windows 2000, and probably many others. Note that each system defines BUFSIZ to be optimal in its libc header files.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
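The file_copy routine itself was an attachment to the original message and is not reproduced in this archive. As a rough idea of the kind of BUFSIZ-based copy loop being described, a minimal sketch might look like the following (the function name, flag constant and TRUE/FALSE return convention are taken from the example call above; everything else is an assumption, not Neal's actual code):

    #include <cstdio>
    #include <sys/stat.h>

    #define FILECOPY_OVERWRITE_ON  1
    #define FILECOPY_OVERWRITE_OFF 0

    // Sketch only: copy src to dst in BUFSIZ-sized chunks, returning 1 (TRUE)
    // on success and 0 (FALSE) on failure.
    int file_copy(const char *src, const char *dst, int overwrite)
    {
        struct stat sb;
        if (overwrite == FILECOPY_OVERWRITE_OFF && stat(dst, &sb) == 0)
            return 0;                         // destination exists; refuse to clobber
        FILE *in = fopen(src, "rb");
        if (!in)
            return 0;
        FILE *out = fopen(dst, "wb");
        if (!out) { fclose(in); return 0; }
        char buf[BUFSIZ];                     // BUFSIZ: per-system optimal size from <cstdio>
        size_t n;
        int ok = 1;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            if (fwrite(buf, 1, n, out) != n) { ok = 0; break; }
        if (ferror(in))
            ok = 0;
        fclose(in);
        if (fclose(out) != 0)
            ok = 0;
        return ok;
    }

Unlike 'system("mv ...")', a copy followed by unlink as in the example call also works across file system boundaries, which is what the message highlights.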
|
From: Jamie A. <Jam...@sl...> - 2002-02-12 03:35:54
|
Here's a possible replacement for the hash function in the
Dictionary class - this one is much more forgiving on strings
where the only difference is at the end (for example URLs from
db-driven sites where the only difference in the URL is a
parameter at the end). It's also very slightly better for
/usr/dict/words too.
// Requires <stdlib.h> and <string.h> (or <cstdlib>/<cstring>) for strtol,
// malloc, strcpy, strlen and free.
unsigned int hashCode2(const char *key)
{
    if (!key || !*key)                    // guard before touching the string
        return 0;

    // Purely numeric keys hash to their numeric value.
    char *test;
    long conv_key = strtol(key, &test, 10);
    if (!*test)                           // conversion consumed the whole key
        return (unsigned int)conv_key;

    // Otherwise hash at most the last 15 characters, so keys that differ
    // only near the end (e.g. db-driven URLs with trailing parameters)
    // still spread across buckets.
    char *base = (char *)malloc(strlen(key) + 2);
    char *tmp_key = base;
    strcpy(tmp_key, key);
    unsigned int h = 0;
    int length = strlen(tmp_key);
    if (length >= 16)
    {
        tmp_key += length - 15;
        length = 15;
    }
    for (int i = length; i > 0; i--)
    {
        h = (h * 37) + *tmp_key++;
    }
    free(base);
    return h;
}
Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: Jamie A. <Jam...@sl...> - 2002-02-12 01:57:30
|
There is a buglet with the external transport stuff which stops redirects
from an external transport from working.
htdig/Document.cc(ln 515) - Document::Retrieve
Currently:
    if (transportConnect == HTTPConnect)
        redirected_to = ((HtHTTP_Response *)response)->GetLocation();
Should be:
    if (transportConnect == HTTPConnect || transportConnect == externalConnect)
        redirected_to = ((HtHTTP_Response *)response)->GetLocation();
cheers
Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: J. op d. B. <MSQ...@st...> - 2002-02-11 09:01:37
|
You got a point there... Maybe....

On Fri, 8 Feb 2002, Geoff Hutchison wrote:
> At 11:17 AM +0100 2/8/02, J. op den Brouw wrote:
> >Does anyone know the email addresses of the maintainers?
>
> I guess this would be a reason for that low-volume htdig-mirrors
> mailing list I suggested to you. ;-)
>
> -Geoff

--jesse
--------------------------------------------------------------------
J. op den Brouw          Johanna Westerdijkplein 75
Haagse Hogeschool        2521 EN DEN HAAG
Faculty of Engineering   Netherlands
Electrical Engineering   +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
|
From: Jamie A. <Jam...@sl...> - 2002-02-10 22:17:54
|
>At 2:53 PM -0500 2/8/02, William R. Knox wrote:
>>In parsedcdate, assume that unqualified dates are at noon instead of
>>midnight. If no one is ever more than 12 hours plus or minus UTC, this is
>>actually a very easy hack, er, solution.
>
>I seemed to think that this was probably true but figured I'd check
>an official timezone map to be sure.
><http://aa.usno.navy.mil/faq/docs/world_tzones.html>
>
>Interestingly at least the US Navy has two zones at +13 and +14 hours
>respectively. Admittedly these are some small islands out in the
>Pacific and I don't know how many servers are running out there.

New Zealand & Fiji are at GMT+12, and NZ at least goes to GMT+13 during the summer. Tonga standard is at GMT+13, so I think that it's a hack that's simple, easy and wrong.

Jamie Anstice
Search Scientist, S.L.I. Systems, Inc
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347
|
|
From: Geoff H. <ghu...@us...> - 2002-02-10 08:13:48
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:57:33
|
At 12:20 AM -0700 2/8/02, Neal Richter wrote:
>One real long term advantage to using httrack or something similar would
>be to offload the forward code maintenance of some of the htdig transport
>code to another project. Leaving htdig developers more time to work on
>other features. Of course this is an over simplification, choosing a
>quickly changing and complicated site-copier could prove to be painful.

We looked at this possibility at the beginning of 3.2 development. At the time, the major possibilities were libwww from W3C, libghttp from GNOME, and swiping code from curl, which had just appeared. I was quite happy with the idea of using libwww since that's obviously well-maintained. The conclusion was basically that this was a bit of overkill and none of them (at least at that point) had particularly clean or necessarily fixed APIs.

But keep in mind that Gabriele does use htnet/ for ht://Check and certainly some features and bug-testing occur on both sides. I'm not aware of other projects using the code, but it's possible.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:27:35
|
At 11:08 PM -0700 2/7/02, Neal Richter wrote:
>What is the future of this code? I've read ill-understood (by
>me) references to new searching/parsing code in previous posts.
>...
>I'm wondering if it wouldn't be easier to start with qtest.cc and
>implement the 'hit' extraction if this code is going to get rewritten
>massively.

Bingo. Unfortunately, neither Quim nor I have had the time to devote to this project. Somewhere around here I have the additions to DocMatch which added the scoring functionality for the new htsearch. I'll find those and commit them soon.

The basic idea is that qtest.cc will form the framework for a new htsearch. Much of the CGI input and command-line parsing in htsearch.cc is fine and can be moved as-is. Additionally, qtest.cc needs to parse the search_algorithms attribute and set the OrFuzzyExpander appropriately. I should have time tomorrow to sit down and do that much since people seem to be interested in picking this up. For now, I'm going to ignore the collections and plug the ResultList directly into the current Display code.

What else has been discussed as far as necessary cleanups and coding on the htsearch side?

* Re-think collections (i.e. how should they fit into the framework cleanly) -- We're all willing to break collections for some period while we redo the rest of htsearch.
* Improve the sorting algorithm for results -- Right now the entire list of results is sorted, regardless of how many are displayed. Much better to make a heap and pull them off as needed (see the sketch after this message).
* Clean up the Display class (not many specific details on this yet).
* Add query and results caching -- Quim implemented some simple caching already, but an on-disk cache of recent queries would improve performance tremendously.

-Geoff
|
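A minimal sketch of the heap-based selection suggested above for result sorting, assuming a simplified stand-in for the result records (ScoredMatch and its fields are illustrative, not ht://Dig's actual DocMatch/ResultList types): keep only the top N matches in a bounded heap instead of sorting the entire list.

    #include <algorithm>
    #include <queue>
    #include <vector>

    struct ScoredMatch {                      // hypothetical stand-in for a result record
        int   doc_id;
        float score;
    };

    struct WorseScore {                       // comparator: lowest score ends up on top
        bool operator()(const ScoredMatch &a, const ScoredMatch &b) const {
            return a.score > b.score;
        }
    };

    // Return the best `limit` matches, best first, without sorting `all`.
    std::vector<ScoredMatch> topMatches(const std::vector<ScoredMatch> &all, size_t limit)
    {
        std::priority_queue<ScoredMatch, std::vector<ScoredMatch>, WorseScore> heap;
        for (const ScoredMatch &m : all) {
            if (heap.size() < limit) {
                heap.push(m);
            } else if (m.score > heap.top().score) {
                heap.pop();                   // evict the weakest of the current best
                heap.push(m);
            }
        }
        std::vector<ScoredMatch> out;
        while (!heap.empty()) {
            out.push_back(heap.top());        // pops weakest first
            heap.pop();
        }
        std::reverse(out.begin(), out.end()); // best-first for display
        return out;
    }

This keeps the per-query cost proportional to the number of matches times log(limit), rather than sorting every match when only a page of results is displayed.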
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 04:27:29
|
At 9:45 AM -0600 2/8/02, Gilles Detillieux wrote:
> > So, this suggests to me that Geoff did properly upload
>> the files to their download server. However, when I FTP to
>> ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest
>> files aren't there. ...
>
>Sorry, that should be ftp://download.sourceforge.net/pub/sourceforge/htdig/

<sigh> They were definitely uploaded and I checked both through the HTTP link and the FTP link. In the Site Status page, there's this comment:

(2002-01-10 08:26:53) As documented in Support Request 501887, some users have encountered issues in making use of SourceForge.net download services, due to a routing problem involving one of our new mirrors.

I'll submit another Support Request. It seems to be striking a few other projects that made releases about the same time as us. In *theory* it should work for both HTTP and FTP once it's uploaded through the project page.

If you want a laugh, consider that the downloads page for ht://Dig claims that no one has currently downloaded 3.1.6 through SourceForge. Yesterday, I believe it had several hundred.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 03:57:42
|
At 2:53 PM -0500 2/8/02, William R. Knox wrote:
>In parsedcdate, assume that unqualified dates are at noon instead of
>midnight. If no one is ever more than 12 hours plus or minus UTC, this is
>actually a very easy hack, er, solution.

I seemed to think that this was probably true but figured I'd check an official timezone map to be sure. <http://aa.usno.navy.mil/faq/docs/world_tzones.html>

Interestingly, at least the US Navy has two zones at +13 and +14 hours respectively. Admittedly these are some small islands out in the Pacific and I don't know how many servers are running out there.

>This seems like the best course of action, though it has the
>disadvantage of increasing the size of the database for all files,
>even though only a limited number use the additional date
>information. I suppose the meta date info could only be populated if
>the file has it, though.

Certainly a META "date" field would only be populated if the document has it. OTOH, there's a limited amount of overhead for merely adding a record to each DocumentRef as it needs to distinguish between each bit of information stored for a given document.

>The additional advantage of this is that, currently, if the meta tag
>on a file stays the same, I don't think it ever gets reindexed - however,
>the meta tag could stay the same and the contents could change.

No, I really doubt this. It sends the date it has in the database in an If-Modified-Since request to the server as well as comparing it to the date returned by the server. If the server doesn't return a date, it uses the current time to the indexer.

The If-Modified-Since test is only conceivably a problem if someone adds a META date tag with a time in the *future*. The test by htdig is even more restrictive:

    if (doc->ModTime() == ref->DocTime()) // retrieved but not changed

So if the server ignored the If-Modified-Since and sends the document anyway, it'll only be ignored by ht://Dig if the time is *exactly* the same, which is pretty unlikely if the document itself was changed. Setting the document date to the META date only happens after parsing occurs and no additional checking is performed.

-Geoff
|
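As a condensed illustration of the decision described above (the helper itself is hypothetical; only the ModTime/DocTime comparison appears in htdig):

    #include <ctime>

    enum FetchResult { NOT_MODIFIED, RETRIEVED };

    // Sketch of the re-parse decision: skip the document only when the server
    // says it is unchanged, or when the returned date matches the stored one
    // exactly.
    bool shouldReparse(FetchResult result,
                       time_t serverModTime,  // doc->ModTime(): date from the server, or "now" if none was sent
                       time_t storedDocTime)  // ref->DocTime(): date already in the database
    {
        if (result == NOT_MODIFIED)           // server honoured If-Modified-Since
            return false;
        // Server sent the document anyway: ignore it only on an *exact* match,
        // which is unlikely if the content really changed.
        return serverModTime != storedDocTime;
    }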
|
From: Geoff H. <ghu...@ws...> - 2002-02-09 03:57:41
|
At 11:17 AM +0100 2/8/02, J. op den Brouw wrote:
>Does anyone know the email addresses of the maintainers?

I guess this would be a reason for that low-volume htdig-mirrors mailing list I suggested to you. ;-)

-Geoff
|
|
From: William R. K. <wk...@mi...> - 2002-02-08 21:03:31
|
OK, I have applied the patch supplied by Gilles, and it fixed the lack of use of the date meta tag in v3.1.6. However, I have now come across a more subtle problem - it boils down to the use, or lack thereof, of the timezone information.

In htsearch/Display.cc, the DocTime variable is assumed to be in UTC, as modification dates returned by a server are in UTC. However, the meta Date tags are likely already in the local time zone, so when the comparison happens between DocTime and timet_startdate and timet_enddate on lines 1417 and 1418 of Display.cc, it is between a local time in the case of DocTime and a UTC time in the case of timet_(start|end)date (due to the use of mktime on lines 1329 and 1330 instead of localtime). This, along with converting the date string YYYY-MM-DD to YYYY-MM-DD 00:00:00 when meta tags are read, means that searching for a start day of files that have meta tags will result in missing a day's worth of files.

The solution to this could be one of a few things:

* Tell people they should use UTC in their meta tags. All in all, not really a bad solution, but less than ideal for sites that already have it in place.
* In parsedcdate, assume that unqualified dates are at noon instead of midnight. If no one is ever more than 12 hours plus or minus UTC, this is actually a very easy hack, er, solution.
* In parsedcdate, reset the time obtained to UTC by pulling the current time zone offset (difference between localtime and mktime) - this assumes, however, that the server and the search engine are in the same time zone (see the sketch after this message).
* Create a separate value for each document indicating whether or not the date was obtained from the modification time or from the meta tag.
* Create a separate value that is only for document modification date information, and keep populating the current DocTime value like you do now. The additional advantage of this is that, currently, if the meta tag on a file stays the same, I don't think it ever gets reindexed - however, the meta tag could stay the same and the contents could change. Separating out the two would prevent this error from occurring. This seems like the best course of action, though it has the disadvantage of increasing the size of the database for all files, even though only a limited number use the additional date information. I suppose the meta date info could only be populated if the file has it, though.

Opinions, anyone?

Bill Knox
Senior Operating Systems Programmer/Analyst
The MITRE Corporation
|
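For the third option, here is a rough sketch of pulling the local-to-UTC offset from the localtime/mktime difference, as suggested above (illustrative only, with names not taken from the htdig sources; it also assumes the server and the search engine share a time zone):

    #include <ctime>

    // Current offset, in seconds, of local time ahead of UTC.
    static time_t localUtcOffset()
    {
        time_t now = time(0);
        struct tm gm = *gmtime(&now);         // UTC broken-down fields
        gm.tm_isdst = -1;                     // let mktime work out DST
        time_t gm_as_local = mktime(&gm);     // re-interpret those fields as local time
        return now - gm_as_local;
    }

    // A parsedcdate-style correction: given a time_t produced by treating a
    // local-time meta date as if it were UTC, shift it back to real UTC.
    static time_t localDateToUtc(time_t parsed_as_utc)
    {
        return parsed_as_utc - localUtcOffset();
    }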
|
From: Gilles D. <gr...@sc...> - 2002-02-08 15:45:25
|
According to me:
> So, this suggests to me that Geoff did properly upload
> the files to their download server. However, when I FTP to
> ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest
> files aren't there. ...

Sorry, that should be ftp://download.sourceforge.net/pub/sourceforge/htdig/

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-02-08 15:36:40
|
According to J. op den Brouw:
> I am NOT able to download 3.1.6 via:
>
> download.sourceforge.net

I checked the download link for ht://Dig from the SourceForge web site, and it lists the 3.1.6 diffs and tarball just fine, but they seem to be links to http://prdownloads.sourceforge.net/htdig/... (see http://sourceforge.net/project/showfiles.php?group_id=4593&release_id=73048). So, this suggests to me that Geoff did properly upload the files to their download server. However, when I FTP to ftp://download.sourceforge.net/sourceforge/htdig/ I see that the latest files aren't there. So, I would guess that SourceForge's own internal mirroring to its download server farm isn't working right, but then of course I'm so far beyond ever being surprised about such problems on SnaFu.net that it's almost not worth the energy to shrug it off. Sorry, am I being cynical again?

> htdig.europeanservers.net

Their mirror doesn't seem to have been updated in over a year and a half. They don't even have 3.2.0b3. If we can't get them to update, we should definitely take them off the list.

> www.it.htdig.org
>
> Does anyone know the email addresses of the maintainers?

The www.europeanservers.net site lists in...@eu... as the contact. I can drop them a note to see what comes of it.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
|
|
From: Gabriele B. <g.b...@co...> - 2002-02-08 13:33:58
|
I just sent this e-mail to my friend Marco Nenciarini, who's the maintainer of the Italian mirror.

Thanks Jesse and bye
-Gabriele

At 11.17 08/02/2002 +0100, J. op den Brouw wrote:
>I am NOT able to download 3.1.6 via:
>
>download.sourceforge.net
>htdig.europeanservers.net
>www.it.htdig.org
>
>Does anyone know the email addresses of the maintainers?
>
>--jesse

Ciao, Ciao
-Gabriele
--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/
The nice thing about Windows is - It does not just crash, it displays a dialog box and lets you press 'OK' first.
|
|
From: J. op d. B. <MSQ...@st...> - 2002-02-08 10:17:53
|
I am NOT able to download 3.1.6 via:

download.sourceforge.net
htdig.europeanservers.net
www.it.htdig.org

Does anyone know the email addresses of the maintainers?

--jesse
--------------------------------------------------------------------
J. op den Brouw          Johanna Westerdijkplein 75
Haagse Hogeschool        2521 EN DEN HAAG
Faculty of Engineering   Netherlands
Electrical Engineering   +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
|
From: Neal R. <ne...@ri...> - 2002-02-08 07:24:31
|
Hey all,

I mentioned this briefly in a reply to Geoff... here it is fleshed out for pondering.

http://www.httrack.com/

I did a little more investigation into httrack. httrack is a neat and small web-site copier/spider. It builds out of the box as a library (libhttrack.so). There is a supplied example app that uses libhttrack for basic web-site copying, about 150 lines long. The main httrack exe has a few more lines for various features.

So a short path to using httrack to get a feature like this would be:

Httrack path:
In htdig:
1. Substitute the call to the htdig internal retriever/transport calls.
2. Stream original URLs, local http-URL & local-disk filename to a logfile.
3. Add 'URL CacheURL' to the Document class and mifluz/DB 'schema'.
4. Fire up the retriever to process this log-file and parse & index the files from the local disk, adding the original and cached URLs to the Document object.
In htsearch:
1. Display the cached URL on screen, given a config option.

Alternate path:
1. Swipe httrack code for creating the directory structures on the fly for storing the spidered web-sites.
2. Write retrieved documents to local files in Retriever (swipe httrack code for localizing necessary page components?).
3. Same changes to the Document object.

One real long-term advantage to using httrack or something similar would be to offload the forward code maintenance of some of the htdig transport code to another project, leaving htdig developers more time to work on other features. Of course this is an over-simplification; choosing a quickly changing and complicated site-copier could prove to be painful.

As it is now, I think that whipping up a version of htdig that processes a log like the one described above (with the additions to the Document class & DB schema) would be pretty easy. Users can run htdig with a command line switch after running httrack.

I am not 100% clear that httrack does a great job localizing web-page components... i.e. silly web-authors using full http links to everything. It does web-http only. There may be other spiders better than httrack... I used it a while back to suck down a bunch of AI FAQs, and it worked flawlessly. It is well reviewed by users, for whatever that is worth.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Neal R. <ne...@ri...> - 2002-02-08 06:12:37
|
Hey,

I've got a question on htsearch code (latest 3.2b4 snapshot):

htsearch/htsearch.cc lines 145-371 are the 'iterate over collections' loop. Lines 161-283 (inside the loop) are setting htconfig variables based on CGI-form parameters. Lines 285 & 370 (inside the loop) create and delete a Parser object. Then, proceeding to approx line 322, the search is performed on the current database in the collection. Looks like a lot of redundant processing.

What is the future of this code? I've read ill-understood (by me) references to new searching/parsing code in previous posts.

FYI: I'm trying to understand this to write an API for the searching code by reorganizing it:

int htsearch_open(htsearch_parameters_struct *);
int htsearch_query(htsearch_query_struct *);
int htsearch_get_next_result(htsearch_query_result_struct *);
int htsearch_close();
char * htsearch_get_error();

htsearch_get_next_result(..) is called in a loop to retrieve each hit. Some of the objects become global, or at least global-static to the file. There will be PHP wrappers for these functions to call them in PHP pages.

I'm wondering if it wouldn't be easier to start with qtest.cc and implement the 'hit' extraction if this code is going to get rewritten massively.

Thanks again!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
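As an illustration of the open/query/iterate/close flow outlined above, a caller might look roughly like this (the struct members are hypothetical, since the message only names the struct and function types):

    #include <cstdio>

    // Hypothetical field layouts -- the original message only names the types.
    struct htsearch_parameters_struct   { const char *config_file; };
    struct htsearch_query_struct        { const char *words; int max_results; };
    struct htsearch_query_result_struct { const char *url; const char *title; double score; };

    // Prototypes as proposed in the message above.
    int htsearch_open(htsearch_parameters_struct *);
    int htsearch_query(htsearch_query_struct *);
    int htsearch_get_next_result(htsearch_query_result_struct *);
    int htsearch_close();
    char *htsearch_get_error();

    int run_query(const char *config, const char *words)
    {
        htsearch_parameters_struct params = { config };
        if (!htsearch_open(&params)) {
            fprintf(stderr, "open failed: %s\n", htsearch_get_error());
            return 1;
        }
        htsearch_query_struct query = { words, 10 };
        if (htsearch_query(&query)) {
            htsearch_query_result_struct hit;
            while (htsearch_get_next_result(&hit))   // one hit per call
                printf("%s (%g) %s\n", hit.title, hit.score, hit.url);
        }
        htsearch_close();
        return 0;
    }

This is also the shape the PHP wrappers mentioned in the message would presumably call into.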
|
From: Neal R. <ne...@ri...> - 2002-02-07 21:39:26
|
Succeeded: Mandrake 8.1
Failed: Windows 2000 Professional with Cygwin 1.1.0 (iconv problems)

libiconv will not build under Cygwin 1.1.0. Following the directions in the README.win32 file builds libraries (via nmake & Visual Studio) that do not seem compatible with building mifluz under Cygwin. I'll try the latest version of Cygwin and post another update.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|