From: Brian W. <bw...@st...> - 2002-09-12 02:10:21
|
Hi,
I have been playing with my Adjustable Logging patch.
I now have it working for 3.2.0b3, but I am stuck on
the configure step. I have added the following to
configure.in:
AC_MSG_CHECKING(if fcntl can lock files?)
AC_TRY_COMPILE([#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
struct flock flk;],
[flk.l_type = F_RDLCK | F_WRLCK;
flk.l_start = 0;
flk.l_whence = SEEK_SET | SEEK_END;
flk.l_len = 0;
flk.l_pid = getpid();
fcntl( 7, F_SETLKW | F_SETLK, &flk );],
[AC_MSG_RESULT(yes);AC_DEFINE(__FILE_LOCKING__)],
[AC_MSG_RESULT(no)])
This is producing a correct-looking configure script
and it looks like it is running OK (I am getting
the "Yes" and "No" responses as expected), but I have
no idea how to propagate my AC_DEFINE down to the
C++ code.
Also, I decided to put all the file locking
stuff in a separate C++ class with an interface which
looks like this:
class LockedFile
{
public:
LockedFile();
// Destructor - will close the file if it hasn't
// been closed already
~LockedFile();
// This method opens a file, The path and mode parameters
// are the same as those accepted by the standard fopen
// function. It will return false if it fails to open
// the file, in which case it will set the global
// value errno as per the fopen function
//
// By default it will wait for the lock, but it can
// be told to not wait
int Open( const char* path, const char* mode, int waitForLock );
int Open( const char* path, const char* mode );
// This method will close the file if it hasn't been
// closed already
void Close();
// This allows an object of this type to be used
// at any point where a FILE* is expected
operator FILE*();
};
which I have put in htlib/FileLocking.[h|cc], and
I have modified htlib/Makefile.am accordingly.
This means that all the evil mucking about
with #defines is limited to a single place, and the
functionality is available to anyone else who
needs it. If the file locking is not available, it
just calls "fopen".
Sample usage:
#include "FileLocking.h"
LockedFile lkdf;
if ( lkdf.Open("/path/to/file", "w" ) )
{
fprintf( lkdf, "Hello World\n" );
lkdf.Close();
}
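The interface above could be backed by something like the following. This is only a minimal sketch, assuming POSIX fcntl() record locks and the __FILE_LOCKING__ macro from the configure test; the member fp_, the choice of a read or write lock from the mode string, and the error handling are assumptions, not the code in the patch. The macro is defined by hand here so that the sketch compiles on its own.

```cpp
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Normally configure would define this; it is defined by hand here (an
// assumption) so that the sketch compiles on its own.
#ifndef __FILE_LOCKING__
#define __FILE_LOCKING__ 1
#endif

class LockedFile
{
public:
    LockedFile() : fp_(NULL) {}
    ~LockedFile() { Close(); }

    // Returns non-zero on success and 0 on failure, leaving errno as set
    // by fopen() or fcntl().
    int Open(const char* path, const char* mode, int waitForLock)
    {
        Close();
        fp_ = fopen(path, mode);
        if (fp_ == NULL)
            return 0;
#ifdef __FILE_LOCKING__
        struct flock flk;
        memset(&flk, 0, sizeof(flk));
        // A plain "r" open only needs a shared lock; any mode that can
        // write takes an exclusive one.
        bool readOnly = (mode[0] == 'r' && strchr(mode, '+') == NULL);
        flk.l_type = readOnly ? F_RDLCK : F_WRLCK;
        flk.l_whence = SEEK_SET;
        flk.l_start = 0;
        flk.l_len = 0;   // 0 locks through to the end of the file
        if (fcntl(fileno(fp_), waitForLock ? F_SETLKW : F_SETLK, &flk) == -1)
        {
            int saved = errno;   // fclose() may overwrite errno
            fclose(fp_);
            fp_ = NULL;
            errno = saved;
            return 0;
        }
#endif
        return 1;
    }

    // By default, wait for the lock.
    int Open(const char* path, const char* mode) { return Open(path, mode, 1); }

    void Close()
    {
        if (fp_ != NULL)
        {
            fclose(fp_);   // closing the descriptor also drops the lock
            fp_ = NULL;
        }
    }

    // Lets the object be passed wherever a FILE* is expected.
    operator FILE*() { return fp_; }

private:
    FILE* fp_;   // hypothetical member; NULL when no file is open
    LockedFile(const LockedFile&);              // copying is not supported
    LockedFile& operator=(const LockedFile&);
};
```

One caveat with this arrangement: fopen() with mode "w" truncates the file before the lock is requested, so a writer that must not disturb a file someone else has locked would need to open without truncation first.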
Are there any problems with doing this? Are there
any conventions I should be following that I am not?
Regs
Brian
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Neal R. <ne...@ri...> - 2002-09-11 22:00:37
|
Geoff,
I'm working on adapting your mifluz-merge to build libhtdig. I'm seeing a
memory freeing error and this is playing a role:
In HtWordList.cc the constructor creates a new HtConfiguration object with
'new' and deletes the object in the destructor.
Is there any reason we can't move to using the global
_config = HtConfiguration::config();
as is done in many other places?
I tried it and it seems to work fine. Any objections?
Thanks!
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
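The difference between the two approaches can be sketched as follows. This is a minimal illustration with hypothetical stand-in classes (Config, OwningList, SharedList), not the actual HtConfiguration or HtWordList code; it assumes only what the message says, namely that config() hands out one shared instance.

```cpp
// Hypothetical stand-in for a configuration class with a global accessor.
class Config
{
public:
    Config() { ++instances; }
    ~Config() { --instances; }

    // Returns the single shared object for the whole program.
    static Config* config()
    {
        static Config shared;
        return &shared;
    }

    static int instances;   // how many Config objects currently exist
};
int Config::instances = 0;

// Pattern described in the message: allocate a private object with new
// and delete it in the destructor.
class OwningList
{
public:
    OwningList() : config_(new Config) {}
    ~OwningList() { delete config_; }
private:
    Config* config_;                          // owned
    OwningList(const OwningList&);            // copying would double-delete
    OwningList& operator=(const OwningList&);
};

// Proposed alternative: borrow the global instance and never delete it.
class SharedList
{
public:
    SharedList() : config_(Config::config()) {}
    Config* config() const { return config_; }
private:
    Config* config_;                          // not owned
};
```

With the borrowed pointer there is nothing for the destructor to free, which removes one place where a double delete or a mismatched free could occur.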
|
From: Gilles D. <gr...@sc...> - 2002-09-11 16:50:54
|
According to Lachlan Andrew:
> On Wed, 11 Sep 2002 05:06, you wrote:
> > > The other problem is the format of "kurl"s, which use a
> > > single '/' after the ':'.
> >
> > The proper fix would be in htcommon/URL.cc, where it
> > handles parsing of URLs. I think it has to deal with
> > exceptions like this on a case by case basis.
>
> Thanks for that. I was thinking of having a list of known
> services, specifying the number of leading slashes, with
> default entries:
> mailto, news : 0
> http, ftp, most others : 2-or-more
>
> When there are two-or-more slashes, the user and port will
> be parsed as currently. Otherwise, it will all be treated
> as "path".
>
> When an external transport mechanism is specified as, say,
> 'https' it will be added with "two slashes". However, if
> it is specified as, say, "man:" or "help:/" or "https://"
> (with a colon), then the number of slashes can be specified
> explicitly. That will avoid hard-coding the KDE stuff into
> htDig://
>
> Does that sound feasible?
That sounds like a great idea to me. Would you be willing to implement it?
If you can provide patches, I can make sure they make it into the CVS tree.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
From: Daniel N. <dan...@t-...> - 2002-09-11 12:39:19
|
On Wednesday 11 September 2002 04:12, you wrote:
> How are you indexing? Are you sure that you're running htpurge after
> running htdig?
Ooops, that's it. After running htpurge it worked. Sorry, I switched so
often between 3.1 and 3.2 for testing things that I was a bit confused...
Thanks
Daniel
--
http://www.danielnaber.de
|
From: Geoff H. <ghu...@ws...> - 2002-09-11 02:12:06
|
> I'm using 3.2 (snapshot 20020825). For some terms, I get output like
> "Matches 1 to 3 of 3" but only 2 matches are displayed. I even found one
> case where there should be 4 matches according to $MATCHES, but there
> were
How are you indexing? Are you sure that you're running htpurge after
running htdig? There may be documents marked to be purged (which will not
show up in results) that still have words attached.
It sounds like at least a minor bug though, so I'll put this on the list
to investigate.
-Geoff
|
From: Lachlan A. <lh...@ee...> - 2002-09-11 00:57:23
|
On Wed, 11 Sep 2002 05:06, you wrote:
> Where did you get your code in June? I fixed this
> problem in the CVS tree back on January 11/02. You must
> have had an older snapshot.
Oops! Yes, I was getting my directories confused. I
apologise for casting any aspersions...
> > The other problem is the format of "kurl"s, which use a
> > single '/' after the ':'.
>
> The proper fix would be in htcommon/URL.cc, where it
> handles parsing of URLs. I think it has to deal with
> exceptions like this on a case by case basis.
Thanks for that. I was thinking of having a list of known
services, specifying the number of leading slashes, with
default entries:
mailto, news : 0
http, ftp, most others : 2-or-more
When there are two-or-more slashes, the user and port will
be parsed as currently. Otherwise, it will all be treated
as "path".
When an external transport mechanism is specified as, say,
'https' it will be added with "two slashes". However, if
it is specified as, say, "man:" or "help:/" or "https://"
(with a colon), then the number of slashes can be specified
explicitly. That will avoid hard-coding the KDE stuff into
htDig://
Does that sound feasible?
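One way the table of services could look is sketched below. This is only an illustration of the idea, not code from htcommon/URL.cc; the names slashesFor, UrlParts and splitUrl are hypothetical, and the table contents follow the defaults listed above plus the KDE-style help:/ and man:/ examples.

```cpp
#include <map>
#include <string>

// Number of slashes expected after "scheme:"; 2 stands for "two or more".
// The table contents are illustrative only.
static int slashesFor(const std::string& scheme)
{
    static std::map<std::string, int> table;
    if (table.empty())
    {
        table["mailto"] = 0;
        table["news"]   = 0;
        table["help"]   = 1;   // explicitly registered external services
        table["man"]    = 1;
    }
    std::map<std::string, int>::const_iterator it = table.find(scheme);
    return it == table.end() ? 2 : it->second;   // default: http, ftp, most others
}

struct UrlParts
{
    std::string scheme;
    std::string authority;   // user, host and port, left unparsed in this sketch
    std::string path;
};

// Splits "scheme:rest". The authority is extracted only for schemes that
// use two or more slashes; otherwise everything after the colon is path.
static UrlParts splitUrl(const std::string& url)
{
    UrlParts p;
    const std::string::size_type npos = std::string::npos;
    std::string::size_type colon = url.find(':');
    if (colon == npos)
    {
        p.path = url;
        return p;
    }
    p.scheme = url.substr(0, colon);
    std::string rest = url.substr(colon + 1);

    if (slashesFor(p.scheme) >= 2)
    {
        std::string::size_type start = rest.find_first_not_of('/');
        if (start == npos)
            start = rest.size();
        std::string::size_type end = rest.find('/', start);
        p.authority = rest.substr(start, end == npos ? npos : end - start);
        p.path = (end == npos) ? std::string("/") : rest.substr(end);
    }
    else
    {
        p.path = rest;   // e.g. "/kdelibs/index.html" for a help:/ URL
    }
    return p;
}
```

Keeping the slash count in a table means a new external service only needs a new entry rather than another special case in the parser.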
Thanks again for your feedback :)
--
Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678
Dept Electrical and Electronic Engg CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA 00116K
|
|
From: Gilles D. <gr...@sc...> - 2002-09-10 17:37:08
|
According to Martin Vorlaender:
> sv...@kb... wrote:
> > But the rest of the variables or most of them are mentioned
> > on page http://www.htdig.org/confindex.html
> > under the config attributes
> > search_results_header,
> > search_results_footer,
> > search_results_wrapper
> >
> > So why not STARSLEFT and STARSRIGHT?
>
> I see what you mean. And I have to agree. Perhaps these
> attributes' entries should only contain a link to the
> hts_templates.html page where *all* template variables are
> described (except HTSEARCH_RESULTS which only makes sense
> in the context of search_results_wrapper).
Well, for starters, not all template variables can be used in the header
and footer. For example, STARSLEFT and STARSRIGHT can only be used in
the result templates (see http://www.htdig.org/attrs.html#template_map),
as these are among several template variables that relate to a specific
matched document, so mentioning them in the descriptions for
search_results_footer or search_results_header would be inappropriate.
These latter descriptions include only a subset of the template variables
that you'd most likely need to work with in the header and footer.
A link to hts_templates.html would be a good idea, though.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
From: Daniel N. <dan...@t-...> - 2002-09-10 15:10:47
|
Hi,
I'm using 3.2 (snapshot 20020825). For some terms, I get output like
"Matches 1 to 3 of 3" but only 2 matches are displayed. I even found one
case where there should be 4 matches according to $MATCHES, but there were
only 2. For other terms, everything is correct. This is on a big index
with >20,000 documents; unfortunately I cannot reproduce it with a small
index. Does anyone have an idea anyway?
Regards
Daniel
--
http://www.danielnaber.de
|
From: Gilles D. <gr...@sc...> - 2002-09-09 22:59:25
|
According to Lachlan Andrew:
> I've been trying to use htDig://'s ExternalTransport to
> index KDE's "kurl"s, such as help:/ man:/ info:/ etc.
>
> As the code stands (or stood, circa June) it seg faults
> when it dereferences _access_time immediately after
> deleting it in Reset(). This can be fixed by the trivial
> patch below, but I'm surprised it hasn't caused havoc
> earlier. Has anyone else used (tested?) ExternalTransport?
> (I haven't yet checked if this patch introduces a
> memory leak.)
Where did you get your code in June? I fixed this problem in the CVS tree
back on January 11/02. You must have had an older snapshot.
> The other problem is the format of "kurl"s, which use a
> single '/' after the ':'. This complies with RFC 1738,
> since they don't "involve the direct use of an IP-based
> protocol to a specified host on the Internet". However
> htDig:// parses the url and returns "host not known" before
> calling ExternalTransport. Moreover, it automatically
> translates the help:/ into help:// . Technically, the
> simplest hack is to make KDE accept arbitrarily many '/'s
> (but getting the KDE team to accept them may be harder :)
> but it may be more elegant to fix htDig://. Comments or
> pointers would be welcome.
The proper fix would be in htcommon/URL.cc, where it handles parsing
of URLs. I think it has to deal with exceptions like this on a case by
case basis.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
From: Martin V. <mv...@PD...> - 2002-09-09 10:13:14
|
sv...@kb... wrote:
> But the rest of the variables or most of them are mentioned
> on page http://www.htdig.org/confindex.html
> under the config attributes
> search_results_header,
> search_results_footer,
> search_results_wrapper
>
> So why not STARSLEFT and STARSRIGHT?
I see what you mean. And I have to agree. Perhaps these
attributes' entries should only contain a link to the
hts_templates.html page where *all* template variables are
described (except HTSEARCH_RESULTS which only makes sense
in the context of search_results_wrapper).
cu,
Martin
--
OpenVMS: | Martin Vorlaender VMS & WNT programmer
When you KNOW | work: mv...@pd...
where you want | http://www.pdv-systeme.de/users/martinv/
to go today. | home: ma...@ra...
|
From: <sv...@kb...> - 2002-09-09 10:02:45
|
But the rest of the variables or most of them are mentioned on page
http://www.htdig.org/confindex.html
under the config attributes
search_results_header,
search_results_footer,
search_results_wrapper
So why not STARSLEFT and STARSRIGHT?
regards
------------------------------------------------------------
Søren Vejrup Carlsen, DWA, Det Kongelige Bibliotek
tlf: (+45) 33 47 48 41
email: sv...@kb...
email: sv...@us...
-------------------------------------------------------------
Non omnia possumus omnes
--- Macrobius, Saturnalia, VI, 1, 35 -------
"Martin Vorlaender" <mv...@PD...> wrote on 09-09-02 11:48
(Subject: RE: [htdig-dev] List of variables which are available in Htdig):
sv...@kb... wrote:
> Jim Cole wrote:
>> sv...@kb...'s bits of Sat, 7 Sep 2002 translated to:
>>> Is there a list of variables, which can be used in the search templates.
>>
>> See http://www.htdig.org/hts_templates.html
>
> However, on the page http://www.htdig.org/confindex.html,
> there is no mention at all to the variables STARSLEFT and
> STARSRIGHT, or
> indeed their connection with the 'image_url_prefix' attribute.
Why should there be? These are not configuration variables, but template
variables. So they need to only be mentioned on the page about templates.
cu,
Martin
--
One OS to rule them all | Martin Vorlaender | VMS & WNT programmer
One OS to find them | work: mv...@pd...
One OS to bring them all | http://www.pdv-systeme.de/users/martinv/
And in the Darkness bind them.| home: ma...@ra...
|
From: Martin V. <mv...@PD...> - 2002-09-09 09:48:43
|
sv...@kb... wrote:
> Jim Cole wrote:
>> sv...@kb...'s bits of Sat, 7 Sep 2002 translated to:
>>> Is there a list of variables, which can be used in the search templates.
>>
>> See http://www.htdig.org/hts_templates.html
>
> However, on the page http://www.htdig.org/confindex.html,
> there is no mention at all to the variables STARSLEFT and
> STARSRIGHT, or
> indeed their connection with the 'image_url_prefix' attribute.
Why should there be? These are not configuration variables, but template
variables. So they need to only be mentioned on the page about templates.
cu,
Martin
--
One OS to rule them all | Martin Vorlaender | VMS & WNT programmer
One OS to find them | work: mv...@pd...
One OS to bring them all | http://www.pdv-systeme.de/users/martinv/
And in the Darkness bind them.| home: ma...@ra...
|
From: <sv...@kb...> - 2002-09-09 09:40:34
|
Dear Jim.
Thank you for your answer.
However, on the page http://www.htdig.org/confindex.html,
there is no mention at all to the variables STARSLEFT and STARSRIGHT, or
indeed their connection with the 'image_url_prefix' attribute.
regards
------------------------------------------------------------
Søren Vejrup Carlsen, DWA, Det Kongelige Bibliotek
tlf: (+45) 33 47 48 41
email: sv...@kb...
email: sv...@us...
-------------------------------------------------------------
Non omnia possumus omnes
--- Macrobius, Saturnalia, VI, 1, 35 -------
Jim Cole <gre...@yg...> wrote on 08-09-02 02:57
(Subject: Re: [htdig-dev] List of variables which are available in Htdig):
sv...@kb...'s bits of Sat, 7 Sep 2002 translated to:
>Is there a list of variables, which can be used in the search templates.
>I don't seem to be able to find one.
>It would be nice to have one.
See http://www.htdig.org/hts_templates.html
Jim
|
From: Martin V. <mv...@PD...> - 2002-09-09 07:30:07
|
Gabriele Bartolini wrote:
> thanks for pointing to the cookies patch. Yes, it is a
> backporting patch from 3.2.x and works pretty well on my
> Linux system. I expect other people on different platforms
> to test it and give me some feedback regarding portability.
I gave it a try just yesterday night. The VMS C++ compiler was unhappy
about these things (from memory, as I develop at home, but read the
list at work - sorry):
- It didn't like the std::cout default assignments. I removed "std"
  and it was okay. Obviously it's not using the std namespace.
- There's an inline declaration without a definition (HtSetDate?)
  I moved the procedure body into the header file.
- Httimegm is not defined / timegm.c is missing. As it hadn't occurred
  to me that this is a backport, I didn't yet look into the 3.2.x
  distribution.
- It complains about a missing constructor String(char *). This is just
  a missing #include "htString.h".
- I still have to look into the #ifdef _LIBC sections. Compaq C is
  probably sufficiently POSIX but uses a different macro.
Besides that, everything looks good. Thanks a lot.
cu,
Martin
--
So long, and thanks | Martin Vorlaender | VMS & WNT programmer
for all the books... | work: mv...@pd...
In Memoriam Douglas Adams | http://www.pdv-systeme.de/users/martinv/
1952-2001 | home: ma...@ra...
|
From: Lachlan A. <lh...@ee...> - 2002-09-09 01:50:06
|
Greetings,
I've been trying to use htDig://'s ExternalTransport to
index KDE's "kurl"s, such as help:/ man:/ info:/ etc.
As the code stands (or stood, circa June) it seg faults
when it dereferences _access_time immediately after
deleting it in Reset(). This can be fixed by the trivial
patch below, but I'm surprised it hasn't caused havoc
earlier. Has anyone else used (tested?) ExternalTransport?
(I haven't yet checked if this patch introduces a
memory leak.)
The other problem is the format of "kurl"s, which use a
single '/' after the ':'. This complies with RFC 1738,
since they don't "involve the direct use of an IP-based
protocol to a specified host on the Internet". However
htDig:// parses the url and returns "host not known" before
calling ExternalTransport. Moreover, it automatically
translates the help:/ into help:// . Technically, the
simplest hack is to make KDE accept arbitrarily many '/'s
(but getting the KDE team to accept them may be harder :)
but it may be more elegant to fix htDig://. Comments or
pointers would be welcome.
Cheers,
Lachlan
*** ExternalTransport.cc Fri Sep 6 22:44:49 2002
--- ExternalTransport.cc.lha Fri Sep 6 22:44:53 2002
***************
*** 188,193 ****
--- 188,194 ----
// Set up a response for this request
_Response->Reset();
// We just accessed the document
+ _Response->_access_time = new HtDateTime;
_Response->_access_time->SettoNow();
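The failure mode described in this message, and why the one-line patch avoids the crash, can be illustrated with a small stand-alone sketch. The classes below are hypothetical stand-ins, not the real ht://Dig response or HtDateTime classes; they assume only what the message states, namely that Reset() deletes _access_time.

```cpp
#include <ctime>

// Hypothetical stand-in for a date/time holder.
struct DateTime
{
    std::time_t value;
    DateTime() : value(0) {}
    void SettoNow() { value = std::time(0); }
};

// Hypothetical stand-in for a response object whose Reset() frees the
// access time, as the message describes.
class Response
{
public:
    DateTime* _access_time;

    Response() : _access_time(new DateTime) {}
    ~Response() { delete _access_time; }

    void Reset()
    {
        delete _access_time;
        _access_time = 0;   // assumption: the pointer is cleared after delete
    }

private:
    Response(const Response&);              // copying would double-delete
    Response& operator=(const Response&);
};

// Mirrors the patched call sequence: after Reset() the pointer must be
// given a fresh object before it is dereferenced.
inline void markAccessed(Response& r)
{
    r.Reset();
    r._access_time = new DateTime;   // the line the patch adds
    r._access_time->SettoNow();
}
```

In this sketch there is no leak because Reset() always frees the previous object first. Whether the same holds in the real code depends on every path that allocates _access_time, which is the open question raised in the message.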
--
Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678
Dept. Electrical and Electronic Eng. CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA 00116K
|
|
From: Brian W. <bw...@st...> - 2002-09-09 01:15:43
|
Ok. I will volunteer to do this. I'll sketch up a
mini-design and post it here.
If I can take you up on your offer of creating an initial
version of the xml file, that would be great.
Regs
Brian
At 00:46 7/09/2002, Geoff Hutchison wrote:
>On Fri, 6 Sep 2002, Brian White wrote:
>
> > I was looking at defaults.cc and I was wondering if
> > it might be better managing the info as an XML file
> > and then using that as a basis for generating
> > defaults.cc and the HTML docs.
>
>Yes, this would actually be quite wonderful. Currently, it's hard to
>"validate" changes you make to defaults.cc. It's also a minor pain to
>insert and format HTML, since it has to be properly escaped. (No, it's not
>a big deal, but XML would obviously be easier.)
>
> > Disadvantages
> > * Part of the build process for the executable would require
> > perl to exist
>
>No, not really. We have lots of "autogenerated" files in 3.2. You'd only
>need Perl if you modified defaults.xml and needed to generate the new
>defaults.cc.
>
> > * After rabbiting on like this I now have to decide
> > if I am willing to put my money where my mouth is.....
>
>Yes, now that's the question. :-)
>
>Formatting defaults.cc into defaults.xml isn't hard and I'd be glad to do
>that with some emacs macros. But I'd be glad to accept this change if you
>(or someone) will write the defaults_generate.pl script. Ideally the
>script would have some nice error-checking to tell you if you've left out
>a field, etc.
>
>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
From: Adam B. <ad...@fr...> - 2002-09-08 11:34:06
|
Hi All,
Thanks for the feedback. I've installed the 3.2 Beta4 snapshot that was
available. The cookie support is working fine using:
disable_cookies: false
I was able to log in to our WebGUI (highly recommended:
http://www.plainblack.com/webgui) development site by using:
start_url:http://www.mysite.org.au/?op=login&username=username&identifier=password
(this approach also works for the Phprojekt component of the site). The
cookies must have worked because Htdig was subsequently able to index
protected pages.
As a first timer with Htdig my first impressions have been very positive, with
snappy performance and 90% of the functionality we require out of the box.
Our site has multiple users and groups who can see different pages. They also
need to be able to choose the type of information source: organisations,
publications and/or discussions. I am currently intending to implement this
by creating numerous different htdig databases created by searching select
parts of the site with appropriate permissions. A wrapper script will combine
several searches and order the results by category.
The site is for an NGO in Melbourne, Australia that provides women's
information services. Our approach is to make the search engine the primary
source of public and internal information. Results with Htdig are
encouraging.
regards,
Adam
On Saturday 07 September 2002 07:09, Gabriele Bartolini wrote:
> Ciao Gilles,
>
> thanks for pointing to the cookies patch. Yes, it is a backporting
> patch from 3.2.x and works pretty well on my Linux system. I expect other
> people on different platforms to test it and give me some feedback
> regarding portability.
>
> As far as patch posting is concerned, I don't know whether Joe received
> my e-mail 2 days ago. If you need more stuff please tell me, ok?
>
> >How you'd use that for authorisation may be another matter, though.
> >I don't know if Gabriele's cookie support has a way of pre-loading cookies
> >from another browser into htdig's cookie jar. Care to comment, Gabriele?
> >(Or anyone else for that matter?)
>
> Well, to be sincere it is not a difficult thing. The code was designed
> to support any kind of Cookies storage, it is pretty flexible indeed. For
> now only the memory jar is supported. This regards though the persistency
> of cookies among different crawls, which is different to the pre-loading of
> cookies. This should not be a big work to do as I just said.
>
> We only need a format for storing them. We could decide this, if you
> want. My vote would be to store them in a text file using the HTTP syntax.
> By doing this, we would avoid any kind of parsing and the insertion into
> the memory Jar would be straightforward.
>
> I don't know though if this is a flexible solution and the right one,
> because it seems to go against the usual way of configuration and leads in
> some cases to confusion (a user should use the specific and exact syntax in
> order to issue one or more cookies). Any ideas?
>
> Ciao ciao and thanks for rising up the topic
> -Gabriele
|
From: Geoff H. <ghu...@us...> - 2002-09-08 07:13:55
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Jim C. <gre...@yg...> - 2002-09-08 00:57:48
|
sv...@kb...'s bits of Sat, 7 Sep 2002 translated to:
>Is there a list of variables, which can be used in the search templates.
>I don't seem to be able to find one.
>It would be nice to have one.
See http://www.htdig.org/hts_templates.html
Jim
|
From: <sv...@kb...> - 2002-09-07 16:09:56
|
Dear All.
Is there a list of variables, which can be used in the search templates.
I don't seem to be able to find one.
It would be nice to have one.
It is not totally obvious, that the variables STARSLEFT and STARSRIGHT are
related to the config variable 'star_image'.
These variables are not mentioned in the article about 'star_image'
regards
------------------------------------------------------------
Søren Vejrup Carlsen, DWA, Det Kongelige Bibliotek
tlf: (+45) 33 47 48 41
email: sv...@kb...
email: sv...@us...
-------------------------------------------------------------
Non omnia possumus omnes
--- Macrobius, Saturnalia, VI, 1, 35 -------
|
From: Jim C. <gre...@yg...> - 2002-09-07 06:39:09
|
Gilles Detillieux's bits of Fri, 6 Sep 2002 translated to:
>This is what I really hate about C++! So many of the so-called standard
>classes aren't really that standard, and vary from one implementation to
>another, and one release to another. How intuitive is this? bad() is
>not to be taken as an antonym of good()? If what you're saying is true,
No argument. The naming leaves a lot to be desired in this case. Also
the inconsistency in what the accessors do. eof() and bad() check only
eofbit and badbit, respectively. fail() checks failbit *and* badbit (for
historical purposes, according to a footnote). good() essentially checks
all three, ensuring that none are set.
>then 3.1.x's htlib/Configuration.cc code will also bomb if it can't open
Yes, it will ;) I just verified this with htnotify. However for htnotify
it is only an issue if -c is used, since configFile is set to
DEFAULT_CONFIG_FILE by default. From what I can tell, all of the other
programs are protected from this problem by a call to access() (and then
reportError() if access() fails), which precedes the calls to
Configuration::Read().
>the config file. So, just how standard is the behaviour you describe?
Based on my experience, it is standard behavior; however I of course
won't claim that that experience covers the majority of possible
compiler/OS combinations. Inconsistency and naming issues aside, the
standard is crystal clear in terms of specified behavior, so there is
really no good excuse for a non-compliant implementation.
>If we start using in.good() in htnotify.cc and Configuration.cc, instead
>of !in.bad(), will the code work correctly on all supported platforms,
>or will it start to bomb on some of the systems where !in.bad() used to
>work fine? Do we need configure tests for all this nonsense?
I would argue that any compiler that is unable to handle such a change
is very much broken in this regard.
For whatever it is worth, my suggestion would be to add an access() call
to htnotify in the 3.1.x branch, as implemented in the other programs,
and then migrate from !bad() to good() in the 3.2.x branch. I think the
access() call will be enough to protect htnotify from an invalid config
file argument and at the same time minimize the amount of code that
needs to be touched. If there are any platforms for which configure
tests are needed, hopefully that will get flushed out in the 3.2.x betas.
>but I can't find any information on what the distinction is between
>badbit and failbit. Is this consistent across all platforms? If so,
Again based on my own experience, it is consistent. As for the
distinction between the two bits, my understanding is that badbit is
used for cases where the stream encounters a problem so severe that all
bets are off; there is no reasonable recourse other than abandoning the
stream object altogether. On the other hand, the failbit indicates that
some sort of stream related failure occurred, but that recovery is not
out of the question; in most cases it should be possible to clear the
bit, put things back to some known state, and then continue (e.g. clear
the bits, call ifstream's open using a different file name, and continue).
Jim
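The behaviour discussed in this message can be checked with a few lines. This sketch relies only on the standard iostream accessors; the helper names StreamState and probe are hypothetical, and the path passed in is assumed not to exist.

```cpp
#include <fstream>

// Snapshot of the four state accessors of a stream.
struct StreamState
{
    bool good;
    bool eof;
    bool fail;
    bool bad;
};

// Tries to open the given path for reading and reports the stream state
// immediately afterwards.
inline StreamState probe(const char* path)
{
    std::ifstream in(path);
    StreamState s;
    s.good = in.good();
    s.eof  = in.eof();
    s.fail = in.fail();
    s.bad  = in.bad();
    return s;
}
```

For a file that cannot be opened, the standard requires failbit to be set, so fail() is true and good() is false while bad() stays false. A check written as !in.bad() therefore passes on a stream that never opened, which is why good() (or !fail()) is the safer test here.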
|
From: Gabriele B. <an...@ti...> - 2002-09-06 21:07:02
|
Ciao Gilles,
thanks for pointing to the cookies patch. Yes, it is a backport
from 3.2.x and works pretty well on my Linux system. I hope that
people on other platforms will test it and give me some feedback
regarding portability.
As far as patch posting is concerned, I don't know whether Joe received
my e-mail 2 days ago. If you need more stuff please tell me, ok?
>How you'd use that for authorisation may be another matter, though.
>I don't know if Gabriele's cookie support has a way of pre-loading cookies
>from another browser into htdig's cookie jar. Care to comment, Gabriele?
>(Or anyone else for that matter?)
Well, to be honest it is not a difficult thing. The code was designed
to support any kind of cookie storage; it is quite flexible. For
now only the in-memory jar is supported. That, though, concerns the
persistence of cookies across different crawls, which is a different
matter from pre-loading cookies. As I said, pre-loading should not be much work.
We only need a format for storing them. We could decide this, if you
want. My vote would be to store them in a text file using the HTTP syntax.
By doing this, we would avoid any kind of parsing and the insertion into
the memory Jar would be straightforward.
I don't know, though, whether this is a flexible solution and the right one,
because it seems to go against the usual way of configuring htdig and could
lead to confusion in some cases (a user would have to use the exact syntax in
order to supply one or more cookies). Any ideas?
Ciao ciao and thanks for raising the topic
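As a rough illustration of the text-file idea, here is a minimal C++ sketch. The file format is purely an assumption made for this example (one "Set-Cookie:" header per line, with blank lines and lines starting with '#' ignored), and read_cookie_headers() and parse_cookie_text() are hypothetical names, not part of htdig. The header values are returned untouched, on the assumption that whatever code already handles Set-Cookie headers from a server could take them as they are.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect the value part of every "Set-Cookie:" line in the stream.
// Simplification: the header name is matched case-sensitively.
std::vector<std::string> read_cookie_headers(std::istream& in) {
    const std::string prefix = "Set-Cookie:";
    std::vector<std::string> headers;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        if (line.empty() || line[0] == '#') continue;    // blank or comment
        if (line.compare(0, prefix.size(), prefix) != 0)
            continue;                                    // not a cookie header
        std::string::size_type start =
            line.find_first_not_of(" \t", prefix.size());
        if (start == std::string::npos) continue;        // header without value
        headers.push_back(line.substr(start));
    }
    return headers;
}

// Convenience wrapper for text that is already in memory.
std::vector<std::string> parse_cookie_text(const std::string& text) {
    std::istringstream in(text);
    return read_cookie_headers(in);
}
```

Each returned string would then be handed to the jar one at a time. How strictly a user has to follow the syntax is exactly the open question above; this sketch silently skips anything it does not recognise.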
-Gabriele
--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italia
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
|
|
From: Joe R. J. <jj...@cl...> - 2002-09-06 19:13:56
|
On Fri, 6 Sep 2002, Gilles Detillieux wrote:
> Date: Fri, 6 Sep 2002 13:50:49 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: ad...@fr...
> Cc: htd...@li..., htd...@li...
> Subject: [htdig-dev] Re: [htdig] Using cookies
>
> According to Adam Brown:
> > The site I am indexing uses cookies for authorisation. I am guessing I will
> > need to write a wrapper to htdig to log in to the site with the appropriate
> > user name and password and then store the cookie data in Htdig somewhere so
> > that it is authorised to browse the site.
> >
> > How do I do this or could someone please point me to the appropriate
> > documentation.
>
> I don't know if you noticed, but Gabriele posted a patch to 3.1.6
> yesterday that adds cookie support. It hasn't made it onto the patch
> archives yet, but no doubt it will soon (right, Joe?), so you can look
> for it there if you can't find it on the htdig-dev mailing list archives.
> It seems to be a backport of the cookie support in 3.2.0b4. So, whether
> you use a recent snapshot of 3.2.0b4, or use 3.1.6 with Gabriele's patch,
> you should be able to get cookie support going with htdig.
>
> How you'd use that for authorisation may be another matter, though.
> I don't know if Gabriele's cookie support has a way of pre-loading cookies
> from another browser into htdig's cookie jar. Care to comment, Gabriele?
> (Or anyone else for that matter?)
Thank you Gilles for the reminder. I saved it yesterday to cookies.gz.0
file, but then I got caught up with a flood of work, and forgot to move
it to the patch site;(
It's now in:
ftp://ftp.ccsf.org/htdig-patches/3.1.6/cookies.gz.0
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
|
From: Gilles D. <gr...@sc...> - 2002-09-06 18:51:15
|
According to Adam Brown:
> The site I am indexing uses cookies for authorisation. I am guessing I will
> need to write a wrapper to htdig to log in to the site with the appropriate
> user name and password and then store the cookie data in Htdig somewhere so
> that it is authorised to browse the site.
>
> How do I do this or could someone please point me to the appropriate
> documentation.
I don't know if you noticed, but Gabriele posted a patch to 3.1.6
yesterday that adds cookie support. It hasn't made it onto the patch
archives yet, but no doubt it will soon (right, Joe?), so you can look
for it there if you can't find it on the htdig-dev mailing list archives.
It seems to be a backport of the cookie support in 3.2.0b4. So, whether
you use a recent snapshot of 3.2.0b4, or use 3.1.6 with Gabriele's patch,
you should be able to get cookie support going with htdig.
How you'd use that for authorisation may be another matter, though.
I don't know if Gabriele's cookie support has a way of pre-loading cookies
from another browser into htdig's cookie jar. Care to comment, Gabriele?
(Or anyone else for that matter?)
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
|
From: Gilles D. <gr...@sc...> - 2002-09-06 15:56:18
|
According to Jim Cole:
> Gilles Detillieux's bits of Thu, 5 Sep 2002 translated to:
> >Yes, this was reported just a few weeks ago, and a patch was provided.
> >See ftp://ftp.ccsf.org/htdig-patches/3.1.6/
>
> Sorry. I somehow missed that post. However, even with the patch I
> believe the code in both 3.1.6 and 3.2.x is incorrect. If the
> name of a non-existent file was provided, the same problem with
> infinite looping could occur. As I understand the standard, the
> check of in.bad() is of no use with regard to whether the file
> was opened successfully. If on the other hand in.good() is
> checked, then it would ensure that neither badbit nor failbit is
> set.
This is what I really hate about C++! So many of the so-called standard
classes aren't really that standard, and vary from one implementation to
another, and one release to another. How intuitive is this? bad() is
not to be taken as an antonym of good()? If what you're saying is true,
then 3.1.x's htlib/Configuration.cc code will also bomb if it can't open
the config file. So, just how standard is the behaviour you describe?
If we start using in.good() in htnotify.cc and Configuration.cc, instead
of !in.bad(), will the code work correctly on all supported platforms,
or will it start to bomb on some of the systems where !in.bad() used to
work fine? Do we need configure tests for all this nonsense?
On the g++ implementations I have on two different Red Hat Linux systems,
good() seems to represent the absence of eofbit, badbit and failbit,
but I can't find any information on what the distinction is between
badbit and failbit. Is this consistent across all platforms? If so,
why haven't we had a flurry of bug reports about htdig 3.1.x bombing
when the config file can't be opened?
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |