From: Neal R. <ne...@ri...> - 2002-04-04 01:07:50
If any of you have RedHat PowerTools CDs, they may include a demo version of Insure++-lite. I know the RedHat 6.1 PowerTools CDs have it. We paid for the full version.

I updated the web-site UI, so it may have changed since the first time you looked at it.

Thanks!

On Wed, 3 Apr 2002, Bill Broadley wrote:

> Woah, excellent, many thanks, now I'm tempted to try to find
> some myself. Thanks for the contribution.
>
> On Wed, Apr 03, 2002 at 03:40:21PM -0700, Neal Richter wrote:
> > HtDiggers,
> >
> > I ran htdig & htsearch through gprof (profiling) & Insure++ (memory
> > leak/corruption).
> >
> > http://ai.rightnow.com/htdig/index.html
> >
> > No memory corruption, but plenty of leaks!
> >
> > I'm all set up to do this, so please e-mail me if you want a specific
> > utility tested.
> >
> > Next up:
> > Rational Quantify (profiling) & Purify (memory leak/corruption)
> >
> > Thanks
> >
> > --
> > Neal Richter
> > Knowledgebase Developer
> > RightNow Technologies, Inc.
> > Customer Service for Every Web Site
> >
> > _______________________________________________
> > htdig-dev mailing list
> > htd...@li...
> > https://lists.sourceforge.net/lists/listinfo/htdig-dev

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
From: Neal R. <ne...@ri...> - 2002-04-03 22:40:32
HtDiggers,

I ran htdig & htsearch through gprof (profiling) & Insure++ (memory leak/corruption).

http://ai.rightnow.com/htdig/index.html

No memory corruption, but plenty of leaks!

I'm all set up to do this, so please e-mail me if you want a specific utility tested.

Next up: Rational Quantify (profiling) & Purify (memory leak/corruption)

Thanks

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
From: Gilles D. <gr...@sc...> - 2002-04-03 17:21:25
According to Gabriele Bartolini:
> I was attempting to use the url_rewrite_urls attribute, because I need
> it in a special case.
>
> While trying it, I noticed this thing, and if it is possible I would
> like to have an explanation from you (particularly by Gilles and Geoff, I
> guess).
>
> Is there a reason why URLs belonging to the start list are neither
> normalized nor rewritten? Just wondering ... Otherwise we should add these
> two lines to the Initial method of the Retriever class:
>
> u.normalize();
> u.rewrite();
>
> after the 'URL u(tokens[i]);' row.

I'm guessing it was just an oversight, or an assumption that the URLs you feed it via start_url would already be in the form you want. I don't see a problem with the modification you suggest, with one very important condition: the rewriting should not be done more than once on a given URL. So, if I'm not mistaken, the URLs from db.docdb and those from db.log have already gone through the process of being normalized and rewritten, and only the URLs from start_url should be processed. I think if you only do the rewrite when from == 1, you should be safe.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
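Gabriele's two lines plus Gilles's condition can be sketched as follows. Everything here is a stand-in rather than ht://Dig code: the URL struct models only normalize() and rewrite(), the regex rules imitate what a rewrite attribute might hold, and, following Gilles, from == 1 is assumed to mark URLs that came from start_url (URLs from db.docdb and db.log are left alone because they were already processed once).

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ht://Dig's URL class: only the two calls
// discussed in this thread are modelled.
struct URL {
    std::string text;
    explicit URL(std::string s) : text(std::move(s)) {}

    // Lower-case the scheme and host, leaving the case-sensitive path alone.
    void normalize() {
        std::size_t sep = text.find("://");
        std::size_t hostStart = (sep == std::string::npos) ? 0 : sep + 3;
        std::size_t pathStart = text.find('/', hostStart);
        if (pathStart == std::string::npos) pathStart = text.size();
        std::transform(text.begin(), text.begin() + pathStart, text.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }

    // Apply every (pattern, replacement) rule, in order.
    void rewrite(const std::vector<std::pair<std::regex, std::string>>& rules) {
        for (const auto& rule : rules)
            text = std::regex_replace(text, rule.first, rule.second);
    }
};

// Sketch of the loop in Retriever::Initial(). Assumption (per Gilles):
// from == 1 marks start_url entries; other sources were already rewritten.
std::vector<std::string> initial(const std::vector<std::string>& tokens, int from,
                                 const std::vector<std::pair<std::regex, std::string>>& rules) {
    std::vector<std::string> queued;
    for (const auto& token : tokens) {
        URL u(token);
        if (from == 1) {  // only start_url entries, so no URL is rewritten twice
            u.normalize();
            u.rewrite(rules);
        }
        queued.push_back(u.text);
    }
    return queued;
}
```

The guard is what keeps rewriting idempotent across runs: a rule that prepends or substitutes text could otherwise be applied again to URLs already stored in db.docdb or db.log.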
From: William R. K. <wk...@mi...> - 2002-04-02 17:12:58
You need to install a C++ compiler, such as gcc (available at ftp://ftp.sunfreeware.com/pub/freeware/sparc/8/gcc-2.95.3-sol8-sparc-local.gz), and you will very possibly need to use the GNU make package (you can try adding /usr/ccs/bin to your path and using the Solaris make, but I don't know if it will work). A package for GNU make is available at ftp://ftp.sunfreeware.com/pub/freeware/sparc/8/make-3.79.1-sol8-sparc-local.gz.

Make sure to put /usr/local/bin into your path after installing these and before running configure again. (If you are going to try the Solaris make, put /usr/local/bin BEFORE /usr/ccs/bin in your path, so that you will pick up GNU make without changing your path if you end up having to install it.)

Good luck.

Bill Knox
Senior Operating Systems Programmer/Analyst
The MITRE Corporation

On Tue, 2 Apr 2002, Willy Calderon wrote:

> Date: Tue, 02 Apr 2002 11:28:55
> From: Willy Calderon <wil...@ho...>
> To: gre...@yg...
> Cc: htd...@li...
> Subject: [htdig-dev] Solaris8
>
> Hello again
>
> OK, so I've ditched SGI/Irix as it has been problematic and support is
> appalling. I'm working with htdig-3.1.6 and have tried to compile the
> program using your instructions (e.g. ./configure, make, then make install).
> I've come up with these errors using Sun/Solaris 8
>
> =====================
> hostname {22} ./configure
> loading cache ./config.cache
> checking for a BSD compatible install... ./install-sh -c
> checking whether build environment is sane... yes
> checking whether make sets ${MAKE}... ./configure: make: not found
> no
> checking for working aclocal... missing
> checking for working autoconf... missing
> checking for working automake... missing
> checking for working autoheader... missing
> checking for working makeinfo... missing
> configuring ht://Dig version 3.1.6
> checking for gcc... no
> checking for cc... no
> configure: error: no acceptable cc found in $PATH
> =====================
>
> Any help?
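Both the configure failure ("no acceptable cc found in $PATH") and Bill's advice about putting /usr/local/bin before /usr/ccs/bin come down to how a bare command name is resolved: the directories in PATH are tried left to right and the first executable match wins. A minimal C++17 sketch of that lookup on a POSIX system follows; findInPath is a hypothetical helper, not part of configure or ht://Dig, and it ignores shell details such as command hashing.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <unistd.h>  // POSIX access() and X_OK

namespace fs = std::filesystem;

// Scan a colon-separated PATH string from left to right and return the first
// regular file named `program` that the caller may execute, or std::nullopt.
std::optional<fs::path> findInPath(const std::string& program, const std::string& path) {
    std::istringstream dirs(path);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) dir = ".";  // an empty PATH entry means the current directory
        fs::path candidate = fs::path(dir) / program;
        std::error_code ec;          // missing or unreadable directories are skipped
        if (fs::is_regular_file(candidate, ec) && access(candidate.c_str(), X_OK) == 0)
            return candidate;
    }
    return std::nullopt;
}
```

Calling findInPath("make", path) with path taken from std::getenv("PATH") (after checking it is not null) reports which make a build would pick up, so swapping the order of two directories in PATH changes the answer when both contain a make.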
From: Willy C. <wil...@ho...> - 2002-04-02 11:29:13
Hello again
OK, so I've ditched SGI/Irix as it has been problematic and support is
appalling. I'm working with htdig-3.1.6 and have tried to compile the
program using your instructions (e.g. ./configure, make, then make install).
I've come up with these errors using Sun/Solaris 8
=====================
hostname {22} ./configure
loading cache ./config.cache
checking for a BSD compatible install... ./install-sh -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... ./configure: make: not found
no
checking for working aclocal... missing
checking for working autoconf... missing
checking for working automake... missing
checking for working autoheader... missing
checking for working makeinfo... missing
configuring ht://Dig version 3.1.6
checking for gcc... no
checking for cc... no
configure: error: no acceptable cc found in $PATH
=====================
Any help?
>From: Jim Cole <gre...@yg...>
>To: Willy Calderon <wil...@ho...>
>CC: <htd...@li...>
>Subject: Re: [htdig-dev] SGI C++ Compiler vs. libstdc++
>Date: Thu, 28 Feb 2002 18:58:28 -0700 (MST)
>
>Willy Calderon's bits of Thu, 28 Feb 2002 translated to:
>
> >checking for ostream.h... no
> >checking for iostream.h... no
> >checking for fstream.h... no
> >configure: error: To compile ht://Dig, you will need a C++ library. Try
> >installing libstdc++.
> >=================
> >
> >This C++ library appears not to exist at present, although under SGI IRIX
> >6.5.3 it shows that the libraries are already there. Do I really need
> >libstdc++? Or is your website accurate when it states that you need
> >only the SGI C++ compiler?
>
>The error is due to a failure to find an fstream.h header file.
>It either doesn't exist on the system or it is not visible to
>the configure script (i.e. it is in a non-standard location
>or hidden by strange file permissions).
>
>Do you know whether the file does in fact exist? And if so where
>it is located? If it is not there, is there an fstream header
>file (no .h extension)?
>
>If you are working with an SGI compiler, I don't see that adding
>libstdc++ will do you any good.
>
>Jim
>
From: Gabriele B. <g.b...@co...> - 2002-04-02 10:34:12
Ciao guys,
I was attempting to use the url_rewrite_urls attribute, because I need
it in a special case.
While trying it, I noticed this thing, and if it is possible I would
like to have an explanation from you (particularly by Gilles and Geoff, I
guess).
Is there a reason why URLs belonging to the start list are neither
normalized nor rewritten? Just wondering ... Otherwise we should add these
two lines to the Initial method of the Retriever class:
u.normalize();
u.rewrite();
after the 'URL u(tokens[i]);' row.
Ciao and thanks
-Gabriele
--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/
The nice thing about Windows is - It does not just crash,
it displays a dialog box and lets you press 'OK' first.
From: Geoff H. <ghu...@us...> - 2002-03-31 08:13:56
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
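The last OTHER ISSUES item proposes that URL.cc give up cleanly on malformed URLs instead of parsing them partially. A minimal sketch of that all-or-nothing policy follows; ParsedURL is a hypothetical class, not ht://Dig's URL, and its validity checks (a scheme, a "://" separator, a non-empty host without whitespace) are deliberately simplistic assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified URL holder: on malformed input every field is
// cleared, instead of keeping a partially parsed result.
struct ParsedURL {
    std::string scheme, host, path;
    bool valid = false;

    explicit ParsedURL(const std::string& s) { parse(s); }

    void clear() { scheme.clear(); host.clear(); path.clear(); valid = false; }

    void parse(const std::string& s) {
        clear();
        std::size_t sep = s.find("://");
        if (sep == std::string::npos || sep == 0) return;  // no scheme: leave empty
        std::size_t hostStart = sep + 3;
        std::size_t slash = s.find('/', hostStart);
        std::string h = s.substr(hostStart, slash == std::string::npos
                                                ? std::string::npos
                                                : slash - hostStart);
        if (h.empty() || h.find_first_of(" \t") != std::string::npos) return;  // bad host
        scheme = s.substr(0, sep);
        host = h;
        path = (slash == std::string::npos) ? "/" : s.substr(slash);
        valid = true;  // only now is any field populated
    }
};
```

Because the fields are assigned only after every check passes, callers never see a half-filled object, which is the property the status note asks for.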
From: Scott G. <sgi...@su...> - 2002-03-30 10:39:34
Here's a brief patch for 3.1.6 to support multiple noindex_start and
noindex_end directives in the config file. It's a bit of a kludge,
but it solved the problem I was trying to solve.
It adds 10 new noindex_start directives, "noindex_start1" through
"noindex_start10". It also adds 10 corresponding noindex_end
directives, "noindex_end1" through "noindex_end10". The standard
noindex_start and noindex_end directives are still supported, and are
considered to be "noindex_start0" and "noindex_end0". The
noindex_start* tags are scanned sequentially, so whichever one matches
first will be the one that is used. Only the end tag for the start
tag that was found will be recognized.
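The description leaves one detail open: whether "matches first" means first in configuration order or earliest in the document. The minimal C++ sketch below is an illustration, not the patch itself, and assumes the latter: at each step the start marker occurring earliest in the remaining text wins, ties go to the lower-numbered pair, and only that pair's end marker closes the region. NoindexPair and stripNoindex are hypothetical names, and dropping the rest of the text when a region is never closed is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One configured marker pair: noindex_start0/noindex_end0, noindex_start1/noindex_end1, ...
struct NoindexPair {
    std::string start;
    std::string end;
};

// Return `html` with every noindex region removed.
std::string stripNoindex(const std::string& html, const std::vector<NoindexPair>& pairs) {
    std::string out;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t best = std::string::npos;
        const NoindexPair* chosen = nullptr;
        for (const auto& p : pairs) {
            if (p.start.empty()) continue;             // unset directive
            std::size_t at = html.find(p.start, pos);
            if (at < best) { best = at; chosen = &p; }  // strict '<' keeps the lower index on ties
        }
        if (!chosen) {                                  // no more start markers: keep the rest
            out.append(html, pos, std::string::npos);
            break;
        }
        out.append(html, pos, best - pos);              // keep text before the start marker
        // Only the chosen pair's end marker can close this region.
        std::size_t close = html.find(chosen->end, best + chosen->start.size());
        if (close == std::string::npos) break;          // unterminated: drop the rest
        pos = close + chosen->end.size();
    }
    return out;
}
```

In the first test case, a stray "</nav>" inside a region opened by the comment marker does not end that region, which is the "only the end tag for the start tag that was found" rule.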
I'm new to this list and somewhat new to htdig, and I hacked this
patch together in a little over an hour, so if there's something
really stupid about it, cut me some slack, tell me what it is, and
I'll fix it. :-)
Patch is at:
http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-multiple-noindex.patch
I look forward to your comments,
----ScottG.
From: Jim C. <gre...@yg...> - 2002-03-29 03:56:48
Donglin Lu's bits of Thu, 28 Mar 2002 translated to:

>For performance reasons (the maximum number of requests to
>our search function can reach 100 requests / second
>over a 3G database), we plan to load all index db
>files and except files into memory.
>
>How can we do that? Do we need to modify the source
>code?

If the system you are running on supports some sort of RAM disk, you could copy everything to such a disk and specify it as the database directory.

Jim
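Jim's suggestion needs no source changes, only a copy step and a configuration change. A minimal C++17 sketch of the copy step follows, assuming a RAM-backed filesystem is already mounted (for example a tmpfs mount on Linux) at a directory of your choosing; copyDatabaseToRamDisk is a hypothetical helper, not part of ht://Dig.

```cpp
#include <cstddef>
#include <filesystem>

namespace fs = std::filesystem;

// Copy every regular file from the on-disk database directory into `ramDir`,
// which is assumed to live on a RAM-backed filesystem. Existing copies are
// overwritten; subdirectories are skipped. Returns the number of files copied.
std::size_t copyDatabaseToRamDisk(const fs::path& databaseDir, const fs::path& ramDir) {
    fs::create_directories(ramDir);
    std::size_t copied = 0;
    for (const auto& entry : fs::directory_iterator(databaseDir)) {
        if (!entry.is_regular_file()) continue;
        fs::copy_file(entry.path(), ramDir / entry.path().filename(),
                      fs::copy_options::overwrite_existing);
        ++copied;
    }
    return copied;
}
```

After copying, database_dir in the configuration used by htsearch would point at the RAM directory. The copy is only a snapshot and a RAM disk is emptied on reboot, so it would need repeating after every reindex and at boot.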
From: Donglin Lu <ql...@ya...> - 2002-03-29 00:16:36
Hi,

I am new to htdig. For performance reasons (the maximum number of requests to our search function can reach 100 requests / second over a 3G database), we plan to load all index db files and except files into memory. How can we do that? Do we need to modify the source code?

Thanks.

Alex
From: Neal R. <ne...@ri...> - 2002-03-28 18:30:14
Geoff:
> Gilles:
> > The only other thing to consider is the Berkeley DB code, which seems to
> > be licensed not only under the University of California, but also
> > Harvard
>
> It's all BSD-style copyright, which is compatible with LGPL. Otherwise
> the glibc folks would be sunk, as they include the Berkeley DB too. :-)

Yes. The GPL and LGPL are kind of like a one-way door. You can bring in code from other sources that are licensed in a BSD-type fashion (BSD, New BSD, MIT, X11). These licenses are basically equivalent other than various advertising clauses. I've even seen a license that permits any use, but specifically prohibits the use of the author's name in any advertising materials (GAlib).

Note that the University of California recently amended the license of all previous code under the copyright "Regents of the University of California" to strike the advertising clause. This does not apply to other parties using a BSD license with their own copyright.

Also note that any Free Software Foundation opinions about the compatibility of other licenses with the GPL & LGPL are their _opinion_. And their opinion about this only matters for software whose copyright is explicitly assigned to the FSF. Using the GPL as the license for your software does not give the FSF any legal power to interpret the requirements of the GPL/LGPL, unless the FSF is the copyright holder.

Side note: The FSF feels that BSD licenses with advertising clauses are incompatible with the GPL/LGPL. Other groups, the Linux kernel developers among others, make no such statement (device driver code is exchanged between the BSD and Linux kernels).

A copyright holder of software is free to change licenses, reassign copyright, withdraw a license, and interpret/enforce the requirements of the chosen license as the holder sees fit. If R. Stallman hit his head one day and convinced the rest of the FSF board, the FSF could withdraw the GPL/LGPL from all GNU software, close the license and charge whatever they wanted for the GNU software.

Note that other open source organizations have instituted a policy that code submissions be assigned to the organization. Accepting substantial submissions from third parties can be risky. The assignment of copyright should be ironed out, or else the copyright holder can put their rights in jeopardy.

Hypothetical: Person A submits GPL code to Group B, and Group B accepts the code. Due to the 'viral' quality of the GPL, it is possible that, if Person A were so inclined, she could take Group B to court and force Group B to assign co-copyright of all Group B software to Person A under the terms of Person A's license of her software under the GPL. This could result in Person A basically hijacking the copyright of the software. A mess... which is why the FSF, Mozilla, and other groups have a copyright assignment policy... in part to protect the group.

OpenBSD just went through something like this. Darren Reed, the author (and copyright holder) of 'IP Filter', asserted his rights and amended the license interpretation of his software to require that any modified or derived works be authorized by him. OpenBSD didn't like this and proceeded to create a replacement. Because Darren's license was a BSD-type license, with no viral quality, OpenBSD was free to do as they wished. If 'IP Filter' had been under the GPL, then hypothetically Mr. Reed could have taken the OpenBSD group to court and caused problems.

Granted, none of this has been well tested in court, and I'm not a lawyer, but this is a common interpretation of the issues.

Again, I want to reiterate that RightNow Technologies will be assigning copyright of contributed software to The HtDig Group. We also are not looking for special privileges. The only thing we are hoping for is that appropriate portions of HtDig be relicensed under the LGPL... or, more simply, that the entire collection be dual-licensed. This is to protect the company from some third party contributing code to HtDig without assigning copyright to the group and then coming after us for license infringement.

As Geoff pointed out, other groups have asked for 'libhtdig' repackagings (GnuCash, KDE, others). This is what we want to provide, and it will benefit any future users of HtDig... even our direct competitors ;-)

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
From: Geoff H. <ghu...@ws...> - 2002-03-28 04:47:14
On Wednesday, March 27, 2002, at 02:10 PM, Gilles Detillieux wrote:

> I couldn't say for sure that there are no other such "singletons" in the
> code, as I haven't studied all of it, but those are the only ones I can
> recall encountering myself, other than of course the config() singleton
> which came later (circa last March-May).
>
> Anyone else know of any?

No, I think we got most everything due to the shared-library compilation problems. There may be some monsters lurking in htsearch, but I would guess they can be squashed as I finish the first draft of "ntsearch." (I'm tempted to give it that name temporarily. Over-promised, late and breaks everything?)

-Geoff
From: Geoff H. <ghu...@ws...> - 2002-03-28 03:33:35
>> 1) Election of Steering Committee (now effectively those people with CVS
>> commit access)
>> 2) Rules and Procedures for the election and other decision making tasks
>> 3) Rules and Procedures for working with Corporations
>> 4) Governance of Copyright & Licensing
>> 5) Code contribution procedures
>>
>> The charter can be as relaxed and democratic as you wish.

I think relaxed is generally the preferred style in the group. ;-) But certainly I think we all would appreciate hearing some genuine legal guidance. I've occasionally wondered whether we're *too* relaxed, esp. in terms of accepting contributed code. (Granted the old saying "beggars can't be choosers," which is why we've generally accepted any code offered that would be useful to the project as a whole.)

> shouldn't have to be a whole lot of decisions we need to deliberate on,
> once the initial relicensing is done.

No, though there might be some thought put into moving away from SourceForge. Not that I'm not grateful for what they've provided, but there have certainly been some bumps along the road.

>> For RNT point #4 is the important one. We are trying to establish if the
>> group is willing to license the HtDig software under both the GPL and
>> LGPL. GPL for standard usage, LGPL for usage with 'libhtdig'. To
>> accomplish this goal, a steering committee would need to approve such a
>> move within the bounds of a group charter.

As I mentioned to Neal privately, I'm also all for this concept -- there have been a fair number of "indexing libraries" that I've tried to link with that have come and gone. Many developers in other projects have asked about the concept of a libhtdig (GnuCash, KDE and a few others come to mind) with a simple API. And IMHO, the LGPL is a nice "fit" for library use. Of course I'm not the only copyright holder.

> The only other thing to consider is the Berkeley DB code, which seems to
> be licensed not only under the University of California, but also
> Harvard

It's all BSD-style copyright, which is compatible with LGPL. Otherwise the glibc folks would be sunk, as they include the Berkeley DB too. :-)

>> A first step could be to call for a vote to ratify the current developers
>> with CVS commit access as the steering committee and go forward with
>> drafting a charter patterned after the Apache/Debian/FreeBSD etc.

I'd obviously defer to legal opinion as well as the list, but I'd prefer to see us mention this to a raft of people who may or may not still subscribe to any ht://Dig mailing lists. In particular, I think everyone on the THANKS list needs to know about this -- even if we don't hear back from them.

-Geoff
From: Gilles D. <gr...@sc...> - 2002-03-27 21:08:12
Hi, Neal.

Although we spoke privately, I'm posting my thoughts here in the hopes of stimulating further discussion on this topic. I think it would be very valuable to ht://Dig's future progress if we took some of the steps you're proposing.

According to Neal Richter:
> [From Geoff Hutchison]
> >I'd be interested in any legal opinion about
> >the "ht://Dig Group" issue since there's currently no obvious "owner" of
> >all of the ht://Dig code.
>
> We may be able to help there!
...
> To use and contribute effectively to HtDig and the HtDig group we need:
> a) To use libhtdig under the LGPL terms
> b) To get HtDig to formalize its structure and decision making processes
> so that any LGPL relicense (for libhtdig) granted is beyond legal
> challenge.

I guess the "beyond legal challenge" part is what makes this task sound a bit ominous. Certainly, any major contributors, not the least of which is Andrew Scherpbier (the original author), would have to give their OK to this, either directly or indirectly (i.e. via the proposed steering committee). However, I'm very encouraged by the interest that Right Now is taking in helping out the ht://Dig project, and if all that's standing in the way is relicensing under LGPL, I'm all for clearing away that obstacle.

> Issues to consider for a charter:
> 1) Election of Steering Committee (now effectively those people with CVS
> commit access)
> 2) Rules and Procedures for the election and other decision making tasks
> 3) Rules and Procedures for working with Corporations
> 4) Governance of Copyright & Licensing
> 5) Code contribution procedures
>
> The charter can be as relaxed and democratic as you wish.

Relaxed sounds good to me. I for one tend to see red when I hear terms like Steering Committee, but I realise that this doesn't have to require a whole lot of tedium for those involved. It's probably just a more formalised version of the "developer votes" we already do. And, there shouldn't have to be a whole lot of decisions we need to deliberate on, once the initial relicensing is done.

> For RNT point #4 is the important one. We are trying to establish if the
> group is willing to license the HtDig software under both the GPL and
> LGPL. GPL for standard usage, LGPL for usage with 'libhtdig'. To
> accomplish this goal, a steering committee would need to approve such a
> move within the bounds of a group charter.

Further on this point, as Neal pointed out, the mifluz code integrated in ht://Dig was based on earlier ht://Dig code, so any relicensing of ht://Dig should be able to apply to mifluz as well. What's more, since mifluz is a library, licensing it under LGPL seems like a "good fit". The only other thing to consider is the Berkeley DB code, which seems to be licensed not only under the University of California, but also Harvard University and Sleepycat Software. However, they all seem to have pretty simple and similar terms to their licenses, none of which would appear to conflict with LGPL. (As Neal pointed out to me, U of C's license doesn't conflict, as other Berkeley software apparently has been licensed under LGPL.)

> A first step could be to call for a vote to ratify the current developers
> with CVS commit access as the steering committee and go forward with
> drafting a charter patterned after the Apache/Debian/FreeBSD etc.

It would probably make sense to invite Andrew to take part in this committee, if he's interested, even though he's not currently one of the people with CVS access (at least I don't think so). He may be more than happy to leave this to others, but that should be his call.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
From: Gilles D. <gr...@sc...> - 2002-03-27 20:10:57
According to Neal Richter:
> As it stands can you think of any other objects besides config using the
> 'singleton design'? This is definitely a gotcha for libhtdig.

The only other ones that come to mind for me are the Codec classes in htcommon/*Codec.cc, which all define an instance() method that allocates a single instance of the class for use by various parts of the code. If you "grep ::instance */*.cc" you should find all uses of these.

I couldn't say for sure that there are no other such "singletons" in the code, as I haven't studied all of it, but those are the only ones I can recall encountering myself, other than of course the config() singleton which came later (circa last March-May).

Anyone else know of any?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
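The Codec-style pattern described here, an instance() method that allocates one shared object on first use, looks roughly like the sketch below. It also folds in two ideas from Neal's side of this thread: a tester reporting whether the instance exists, and a cleanup hook for library use. Config, isInitialized() and reset() are hypothetical names, not ht://Dig's HtConfiguration API; reset() deletes the object and clears the pointer, so a later instance() call builds a fresh object instead of touching a destroyed one.

```cpp
#include <string>

// Hypothetical configuration class following the lazy-singleton pattern:
// a static pointer that instance() fills on first use.
class Config {
public:
    static Config* instance() {
        if (_config == nullptr) _config = new Config();  // create the only instance lazily
        return _config;
    }

    // Tester of the kind Neal suggests: has the singleton been created yet?
    static bool isInitialized() { return _config != nullptr; }

    // Hypothetical cleanup hook for library use: delete the instance *and*
    // clear the pointer, so no caller can reach a destroyed object.
    static void reset() {
        delete _config;
        _config = nullptr;
    }

    std::string databaseDir = "/var/htdig/db";  // illustrative setting only

private:
    Config() = default;
    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;
    static Config* _config;
};

Config* Config::_config = nullptr;
```

The catch for a library is visible in the design: there is exactly one Config per process, so two independent indexes cannot each have their own.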
From: Neal R. <ne...@ri...> - 2002-03-27 18:29:07
> I believe this class is intended as a singleton. Both pointers
> end up pointing to the same instance of an HtConfiguration object
> that is in fact a static member of the HtConfiguration class. The
> intent of such a design is that there only ever be one instance
> of the class.

Sure, and it's referenced all over. I understand its utility for the CGIs and utility executables. Multiple config objects are useful in a libhtdig setting... see below.

> As a singleton, the class is not intended to be used in this way.
> If the static _config member is NULL when config() is called, an
> instance is created. Otherwise the existing instance is returned.

It may be nice to have a tester function that returns TRUE or FALSE depending on whether _config == NULL.

This is in the context of the code rework as 'libhtdig': the problem was that consecutive calls to htdig_xxxx [for indexing] and then htfuzzy_xxx caused problems, since both functions try to reinitialize the config object with the defaults. This was causing a program hang! Before the calls to htfuzzy_xxx, a call to htdig_close was issued that attempted to destroy the config object. Even though the destructor is empty, C++ is doing something by default to the object. The next call (inside htfuzzy_xxx) to config->Defaults(&defaults[0]); hung the program.

> Generally you don't want to destroy the instance since no one
> really owns it.

True, when HtDig is used as a stand-alone executable... the system takes care of it when the process exits. In the 'libhtdig' setting, a destructor may be useful as part of a cleanup method.

If libhtdig were ever used in a server, choices would need to be made about the procedure for opening and closing, and about the existence of multiple configurations. What if libhtdig is being used to operate on two separate and independent indexes? Two configurations exist there, similar to one MySQL server operating on two separate databases with different schemas.

Ideally this would be useful:

1) libhtdig contains an array of all necessary global objects/vars.
2) Calls to htdig_open() return an indicator value, like a SQL indicator.
3) Subsequent calls to any libhtdig functionality pass in the indicator. This is used as the index into the global object arrays.
4) Calls to htdig_close(int) destroy those global objects in the array... like sql_free in MySQL.

As it stands, can you think of any other objects besides config using the 'singleton design'? This is definitely a gotcha for libhtdig.

Thanks!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
From: Geoff H. <ghu...@ws...> - 2002-03-27 15:55:27
|
(BTW Henry, you're subscribed as rzepa, but this came from h.rzepa, thus the posting problems.) > The server we index has all the types set via the Apache mime.types, > and -vvv correctly shows these as mapping to those in the htdig.conf > file. An example of what (should) happen is at > http://www.ch.ic.ac.uk/chemime/test6.html > Having compiled 3.1.16 (using IRIX, we missed 3.1.15 out) I think you mean 3.1.6 and 3.1.5. > at all. The MIME types are all of the type chemical/foo. Might it be > that somewhere hardcoded into htdig are the primary types, and > chemical is not one of them? Nope. We certainly don't hardcode such things, and even if we did, I'm pretty sure I'd make sure chemical/* was one of the possibilities. > Can I ask if anyone has tested the external parsing calls, and if anyone > has any suggestions as to what else we might try? Certainly the external parsing and external converter calls have been pretty thoroughly tested. One thing to keep in mind is that if an extension is allowed by the htdig configuration, and an external parser isn't set for that MIME type, htdig will assume text/plain and index accordingly. It would help to see some of the htdig -vvv output to see what's happening in more detail: is the external parser being called, is it somehow expecting that it's an external converter, etc. -- -Geoff Hutchison Williams Students Online http://wso.williams.edu/ |
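The fallback Geoff describes can be pictured as a lookup keyed on the document's MIME type. The sketch below is a hypothetical illustration, not htdig's code: `choose_handler()` returns the configured external parser command when the type matches an `external_parsers` entry and otherwise falls back to internal text/plain handling. It also strips `; charset=...` parameters and lowercases the type before matching; whether htdig 3.1.6 normalizes Content-Type headers that way is an assumption, and the -vvv output is the place to check what type string htdig actually sees.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Reduce a Content-Type value such as "Chemical/X-PDB; charset=us-ascii"
// to its bare, lowercased type ("chemical/x-pdb").
std::string base_mime_type(const std::string& header) {
    std::string t = header.substr(0, header.find(';'));  // npos keeps the whole string
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    t.erase(t.begin(), std::find_if(t.begin(), t.end(), not_space));           // trim left
    t.erase(std::find_if(t.rbegin(), t.rend(), not_space).base(), t.end());    // trim right
    std::transform(t.begin(), t.end(), t.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return t;
}

// Hypothetical dispatcher: the configured external parser if one matches,
// otherwise the internal text/plain fallback.
std::string choose_handler(const std::string& content_type,
                           const std::map<std::string, std::string>& external_parsers) {
    auto it = external_parsers.find(base_mime_type(content_type));
    return it != external_parsers.end() ? it->second : std::string("internal:text/plain");
}
```

The point of the sketch is that the match is an exact string comparison: any difference between the header's type and the configured key silently routes the document to the text/plain path.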
|
From: Rzepa, H. <h....@ic...> - 2002-03-27 12:57:20
|
We have the following apparent failure, on which I would value any hints. Way back at the time of ht://dig 3.1.14 we developed a set of external parsers, invoked as usual via external_parsers: chemical/x-pdb "/usr/java/bin/java chemical.Htdigfront" in the conf file (these parsers are used to extract only important tokens from the files, and to derive metadata and heuristic perception about the content; for chemists, the molecular formula etc. from a molecule coordinate file, etc.). The server we index has all the types set via the Apache mime.types, and -vvv correctly shows these as mapping to those in the htdig.conf file. An example of what (should) happen is at http://www.ch.ic.ac.uk/chemime/test6.html Having compiled 3.1.16 (using IRIX, we missed 3.1.15 out) with only very minor changes, and given it the same server, same conf file and same parsers, we find it refuses all the external types (see again http://www.ch.ic.ac.uk/chemime/test6.html ). The parsers themselves do appear to be working if given the four htdig arguments manually from the relevant directory; it just appears that htdig is not calling them at all. The MIME types are all of the type chemical/foo. Might it be that somewhere hardcoded into htdig are the primary types, and chemical is not one of them? It's clearly a MIME header issue, since if all the external types are REMOVED from the Apache mime.types file, the headers all come over as text/plain, htdig 3.1.16 correctly includes them all as pure text types, and the external parsers are not invoked at all, htdig doing all the parsing internally. Can I ask if anyone has tested the external parsing calls, and if anyone has any suggestions as to what else we might try? -- Henry Rzepa. +44 (0870) 132 3747 (eFax) +44 0778 6268 220 (Mobile) http://www.ch.ic.ac.uk/rzepa/ Dept. Chemistry, Imperial College, London, SW7 2AY, UK. |
|
From: Jim C. <gre...@yg...> - 2002-03-27 06:03:26
|
Neal Richter's bits of Tue, 26 Mar 2002 translated to:
>Hey all,
>
> So here's a C++ question for you:
>
>/* file1.c */
>//global to file 1
>static HtConfiguration *config = NULL;
>
>/* function1 */
>config = HtConfiguration::config();
>-------------------------
>/* file 2 */
>/* function2 */
>HtConfiguration *config = HtConfiguration::config();
>
>
>These declarations are in two separate files. When the functions are
>called, both 'config' variables point to the SAME spot in global memory
>where a HtConfiguration object lives.
I believe this class is intended as a singleton. Both pointers
end up pointing to the same instance of an HtConfiguration object
that is in fact a static member of the HtConfiguration class. The
intent of such a design is that there only ever be one instance
of the class.
>If I change both declarations to
>
>config = new HtConfiguration();
>
>then the variables point to two different objects... (as is expected).
As a singleton, the class is not intended to be used in this way.
If the static _config member is NULL when config() is called, an
instance is created. Otherwise the existing instance is returned.
>I've got Stroustrup's C++ book and I've looked up the scoping rules that
>govern the '::' instantiation usage... kinda dense. Anyone have a better
>reference to what exactly is happening here? (other than the obvious --
>the object appears to be locally allocated but is really global in scope)
It is allocated on the heap. For the most part, the config()
method is just serving as an accessor.
>Any other interesting notes on this kind of usage?
A singleton is a fairly common design "pattern". A google search
on 'singleton pattern' should provide a lot of useful links.
>I also noticed that a call to config->~HtConfiguration() doesn't do much
>to delete the object's contents.. there is no defined destructor in
>Configuration.cc, just '{}' in the header file.
Generally you don't want to destroy the instance since no one
really owns it.
Jim
|
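To make Jim's description concrete, here is a minimal lazily created singleton in the same shape: a private static pointer, a static accessor that allocates on first use, and private constructors so callers cannot `new` a second copy (unlike HtConfiguration, which Neal could construct directly with `new`). The class `Config` and its `verbose` member are hypothetical; this is a sketch of the pattern, not HtConfiguration's source.

```cpp
// Hypothetical class showing the lazily created singleton pattern.
class Config {
public:
    // Static accessor: allocate the single instance on first use,
    // then hand back the same pointer on every later call.
    static Config* config() {
        if (_config == nullptr)
            _config = new Config();
        return _config;
    }

    int verbose = 0;  // example setting shared by all callers

private:
    Config() = default;                          // only config() may construct
    Config(const Config&) = delete;              // no copies
    Config& operator=(const Config&) = delete;

    static Config* _config;  // the one shared, heap-allocated instance
};

// Definition of the static member; starts out null.
Config* Config::_config = nullptr;
```

Every caller of `Config::config()`, in any file, gets the same heap object, which is why the two pointers in Neal's example compare equal. One practical advantage over a plain file-scope global is that creation happens on first use, which sidesteps the unspecified initialization order of globals defined in different translation units.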
|
From: Neal R. <ne...@ri...> - 2002-03-27 04:56:10
|
Hey all,
So here's a C++ question for you:
/* file1.c */
//global to file 1
static HtConfiguration *config = NULL;
/* function1 */
config = HtConfiguration::config();
-------------------------
/* file 2 */
/* function2 */
HtConfiguration *config = HtConfiguration::config();
These declarations are in two separate files. When the functions are
called, both 'config' variables point to the SAME spot in global memory
where a HtConfiguration object lives.
If I change both declarations to
config = new HtConfiguration();
then the variables point to two different objects... (as is expected).
I've got Stroustrup's C++ book and I've looked up the scoping rules that
govern the '::' instantiation usage... kinda dense. Anyone have a better
reference to what exactly is happening here? (other than the obvious --
the object appears to be locally allocated but is really global in scope)
Any other interesting notes on this kind of usage?
Any advantage to doing it this way over the old-school C way of declaring
at the top of a file somewhere?
I also noticed that a call to config->~HtConfiguration() doesn't do much
to delete the object's contents.. there is no defined destructor in
Configuration.cc, just '{}' in the header file.
Thanks.
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Neal R. <ne...@ri...> - 2002-03-26 19:04:12
|
[From Geoff Hutchison] >I'd be interested in any legal opinion about >the "ht://Dig Group" issue since there's currently no obvious "owner" of >all of the ht://Dig code. We may be able to help there! At RightNow Technologies, Inc, we extensively use Open Source software and operating systems because they give us quality platforms and development environments. We use Linux, FreeBSD, Apache, MySQL, PHP, and the full suite of GNU development tools. Our 1200+ customers and their data make us one of the top two or three corporate users of MySQL and one of the heaviest users of PHP. In February our Linux servers handled approximately 65 Million page-turns for our hosted customers. We believe that HtDig is a solid piece of software, and we as a company would like to enhance and improve HtDig. RNT is currently a couple weeks away from internal testing of a document archiving project based on HtDig. We've written software that exports documents from a SQL database to XML files and indexes them via HtDig. In developing this software, a significant amount of work has gone into restructuring the HtDig code as a self-contained separate shared library (libhtdig.so) with a functional API. RNT is currently faced with a licensing issue: the current GPL license of HtDig does not lend itself to calling the API functions from closed source code. To use and contribute effectively to HtDig and the HtDig group, we need: a) To use libhtdig under the LGPL terms b) To get HtDig to formalize its structure and decision making processes so that any LGPL relicense (for libhtdig) granted is beyond legal challenge. 
Looking into the future, RightNow Technologies believes that we will be able to make sizable contributions to HtDig software: 1) Internationalization via Unicode 2) Field Based Searching/Indexing 3) Commercial Software QA of HtDig 4) Use of commercial software validation tools to correct memory leaks and other coding errors 5) Additional contributions to the linguistic/NLP aspects of HtDig searching/indexing 6) Possible hardware donations to developers who assist us in implementing desired features. Currently, the HtDig developers are divided into two groups, contributors and managers (those with CVS commit access). I've spoken with the big dogs at RightNow Technologies, Inc. about helping the HtDig Group in getting a more formal structure. Alan Rassaby, our General Counsel, has offered to assist the HtDig Group in: a) Understanding your desires for an official charter b) Drawing up a draft charter based on your desires, possibly derived from an existing Open Source charter c) Answering any questions the group may have on various copyright and licensing issues. In addition, at your request, RightNow Tech would provide reasonable funds to pay for an independent third party (lawyer) to review the relevant materials. We wish to do this to eliminate any questions about conflict of interest and to give the Group the assurance that the process is fair and ethical. Issues to consider for a charter: 1) Election of Steering Committee (now effectively those people with CVS commit access) 2) Rules and Procedures for the election and other decision making tasks 3) Rules and Procedures for working with Corporations 4) Governance of Copyright & Licensing 5) Code contribution procedures The charter can be as relaxed and democratic as you wish. For RNT, point #4 is the important one. We are trying to establish if the group is willing to license the HtDig software under both the GPL and LGPL: GPL for standard usage, LGPL for usage with 'libhtdig'. 
To accomplish this goal, a steering committee would need to approve such a move within the bounds of a group charter. The Apache Foundation is a great example of an Open Source software project that has a formal structure and is able to effectively work with Corporate Partners. Debian GNU/Linux, FreeBSD, and NetBSD are other notable examples. Debian did not create a Debian Foundation; they created Software in the Public Interest, Inc. [http://www.spi-inc.org/] as a non-profit vehicle to support Debian development. The SPI also supports the GNOME project. Debian's charter is also separate from SPI. It is probably not necessary to incorporate as a formal non-profit organization. Joining the SPI may be a good way to facilitate donations and formal support to the group. Should that be desired, RightNow Tech would be willing to cover reasonable associated costs. Why are we proposing this? RightNow Tech believes that HtDig is great software that deserves support and enhancement. Helping to establish HtDig as a more formal entity allows us to devote more developer time and effort to enhancing HtDig while keeping the bean-counters happy with our participation in the group. Other companies that wish to contribute time/money/effort to furthering HtDig will probably have many of the same concerns. We are not proposing anything earth-shattering, just a slightly more formal structure with a decision making body to take up issues with. A first step could be to call for a vote to ratify the current developers with CVS commit access as the steering committee and go forward with drafting a charter patterned after those of Apache, Debian, FreeBSD, etc. If you have any questions, please reply! The next step would be for a couple of interested people to contact me and set up a correspondence with Alan Rassaby to answer your questions. Feel free to visit http://www.rightnow.com for more information about RNT. Thanks! And check out libhtdig if you haven't already! 
-- Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site |
|
From: Gilles D. <gr...@sc...> - 2002-03-26 16:35:04
|
According to Geoff Hutchison: > On Tue, 26 Mar 2002, Jessica Biola wrote: > > What is db.log? I have many document names popping up > > in there. Is it what was left in the todo/to-crawl > > list? > > Yes, this is exactly right. When it restarts, htdig will put anything in > this file (if it exists) back in the queue to be indexed. > > Do you think the signal handler should add some output to assure people > that it's quitting but needs to finish things to ensure integrity? Yes, this had been suggested by someone else before, and I think it's a good idea! -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
|
From: Geoff H. <ghu...@ws...> - 2002-03-26 16:30:59
|
On Tue, 26 Mar 2002, Jessica Biola wrote: > What is db.log? I have many document names popping up > in there. Is it what was left in the todo/to-crawl > list? Yes, this is exactly right. When it restarts, htdig will put anything in this file (if it exists) back in the queue to be indexed. Do you think the signal handler should add some output to assure people that it's quitting but needs to finish things to ensure integrity? -- -Geoff Hutchison Williams Students Online http://wso.williams.edu/ |
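The save-and-requeue behavior Geoff describes can be sketched as two functions. The one-URL-per-line format and the choice to delete the file after reading it are assumptions for illustration; htdig's actual db.log layout and handling may differ, and `save_queue()` / `restore_queue()` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>
#include <fstream>
#include <string>

// Write the URLs still waiting to be crawled, one per line (assumed format).
bool save_queue(const std::deque<std::string>& todo, const std::string& path) {
    std::ofstream out(path);
    if (!out) return false;
    for (const std::string& url : todo) out << url << '\n';
    out.close();
    return !out.fail();
}

// On restart: push any saved URLs back onto the queue, then delete the log
// so the same URLs are not re-queued on a later run.
std::size_t restore_queue(std::deque<std::string>& todo, const std::string& path) {
    std::ifstream in(path);
    if (!in) return 0;  // no log file: nothing to restore
    std::size_t restored = 0;
    for (std::string url; std::getline(in, url);) {
        if (!url.empty()) {
            todo.push_back(url);
            ++restored;
        }
    }
    in.close();
    std::remove(path.c_str());
    return restored;
}
```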
|
From: Jessica B. <jes...@ya...> - 2002-03-26 10:56:52
|
Thank you. I did a kill -TERM and I am watching it write out to db.log. The problem may have been that I believed it was hanging when actually it was writing out to db.log and cleaning up, and I continued to send kill signals, messing up the natural course of things. What is db.log? I have many document names popping up in there. Is it what was left in the todo/to-crawl list? --- Geoff Hutchison <ghu...@ws...> wrote: > On Mon, 25 Mar 2002, Jessica Biola wrote: > > What is the best way to stop a current dig in progress without corrupting the integrity of the db.* files? I find that when I send a kill level 9 (KILL) or 15 (TERM), it ruins the integrity of the data that has already been crawled, or if I just send a level 1 (HUP), it doesn't interrupt it at all and it keeps on crawling. > > ... > > I'm using one of the 3.2.0b4 versions on Linux. > With 3.1.6 and 3.2.0b2 and later, htdig installs a signal handler before beginning indexing. Before it handles a KILL or TERM, it should finish up the current URL, write the current progress to the db.log file and quit cleanly. This may take a second or two. > If you're seeing it quit directly, the db.log isn't written and there's data corruption, this is a bug and more information would help to track down the problem (i.e. what compiler did you use, what version of Linux, how big was the database, etc.) > -Geoff |
|
From: Geoff H. <ghu...@ws...> - 2002-03-25 22:25:26
|
On Mon, 25 Mar 2002, Jessica Biola wrote: > What is the best way to stop a current dig in progress > without corrupting the integrity of the db.* files? I > find that when I send a kill level 9 (KILL) or 15 > (TERM), it ruins the integrity of the data that has > already been crawled, or if I just send a level 1 > (HUP), it doesn't interrupt it at all and it keeps on > crawling. > ... > I'm using one of the 3.2.0b4 versions on Linux. With 3.1.6 and 3.2.0b2 and later, htdig installs a signal handler before beginning indexing. Before it handles a KILL or TERM, it should finish up the current URL, write the current progress to the db.log file and quit cleanly. This may take a second or two. If you're seeing it quit directly, the db.log isn't written and there's data corruption, this is a bug and more information would help to track down the problem (i.e. what compiler did you use, what version of Linux, how big was the database, etc.) -Geoff |
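A common way to get the "finish the current URL, then quit" behavior is to have the handler only set a flag and let the main loop check it between documents. One POSIX detail matters here: SIGKILL (kill -9) can never be caught or handled by any process, so a kill -9 stops the process immediately with no chance to write db.log; only catchable signals such as SIGTERM and SIGINT can trigger a clean shutdown. The sketch below uses hypothetical `install_handlers()` and `crawl()` functions, not htdig's code, and also prints the kind of "stopping, please wait" message suggested in the thread.

```cpp
#include <csignal>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Set only from the handler; sig_atomic_t is the type the standard allows here.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void request_stop(int) { g_stop_requested = 1; }

// Install handlers for catchable termination signals (SIGKILL cannot be caught).
void install_handlers() {
    std::signal(SIGTERM, request_stop);
    std::signal(SIGINT, request_stop);
}

// Hypothetical crawl loop: the flag is checked only between URLs, so the
// current document is always finished before stopping.
std::size_t crawl(std::deque<std::string>& todo) {
    std::size_t done = 0;
    while (!todo.empty() && !g_stop_requested) {
        todo.pop_front();  // stand-in for fetching and indexing one URL
        ++done;
    }
    if (g_stop_requested)
        std::fprintf(stderr, "Stopping: saving %zu queued URLs before exit\n", todo.size());
    return done;
}
```

After `crawl()` returns, the remaining `todo` entries are what would be written out for the next run. Printing from the main loop rather than from the handler keeps the handler limited to async-signal-safe work.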