From: Lachlan A. <lh...@ee...> - 2002-10-18 12:00:58
Greetings,

On Mon, 14 Oct 2002 05:13, you wrote:
> KNOWN BUGS:
[deleted]
> * If exact isn't specified in the
> search_algorithms, $(WORDS) is not set correctly:
> PR#405294. (The documentation for 3.2.0b1 was updated,
> but can we fix this?)
> (More importantly, do we ever want exact to /not/ be
> specified?)

I just had a look at this, and it seemed to me that $(WORDS) is set correctly, but $(LOGICAL_WORDS) can be missing some operands if there are no fuzzy matches. Was that the original problem?

If so, would an acceptable fix be just to insert the exact word whenever there are no fuzzy matches? It could be given a very low weight so as not to disrupt the search. That would be a two-line patch.

Cheers,
Lachlan
--
Lachlan Andrew   Phone: +613 8344-3816   Fax: +613 8344-6678
Dept of Electrical and Electronic Engg   CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA   00116K
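Lachlan's proposed fix can be sketched as follows. This is a minimal illustration, not ht://Dig's htsearch code. The `WeightedWord` type, the `expand_with_exact_fallback` function and the 0.01 weight are all hypothetical. If the fuzzy algorithms return nothing for a term, the exact word is kept as a very-low-weight operand, so the term is not silently dropped from the query.

```python
from dataclasses import dataclass


@dataclass
class WeightedWord:
    """Hypothetical stand-in for one expanded search term and its weight."""
    word: str
    weight: float


def expand_with_exact_fallback(word, fuzzy_matches, fallback_weight=0.01):
    """Return the fuzzy expansions of `word`; if there are none, fall back
    to the exact word with a very low weight so the operand is kept.
    The 0.01 default is an arbitrary illustrative value."""
    if fuzzy_matches:
        return list(fuzzy_matches)
    return [WeightedWord(word, fallback_weight)]


# No fuzzy matches: the exact word survives as a low-weight operand.
print(expand_with_exact_fallback("htdig", []))
# Fuzzy matches exist: they are returned unchanged.
print(expand_with_exact_fallback("dig", [WeightedWord("digs", 0.5)]))
```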
From: Gabriele B. <g.b...@co...> - 2002-10-18 07:00:45
> No offense to Gabriele, but I'd rather consider translations to the
> documentation _after_ we switch to an XML documentation setup.
>
> Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can
> see a patch in the near future. I'm willing to handle the documentation
> fixes by hand if I need to do it.

No worries, Geoff! :-) I agree with you. I always have in mind the famous words of Bill Murray: 'baby steps'. :-P Let's start with the English one, as it is now. We'll worry about translations afterwards if 'the game is worth the candle' (sorry, it is an Italian proverb - it means if it is worthwhile).

My question regards the net library: I know of 2 'bugs', especially regarding compressed documents (#594790 and #460819), but I'd rather wait before addressing them and just leave the HTTP library as is. As far as you know, are there important bugs I should fix?

Ciao and thanks
-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
From: Jim C. <gre...@yg...> - 2002-10-18 03:55:38
Gilles Detillieux's bits of Thu, 17 Oct 2002 translated to:
>3) My own lack of time in being able to get the 3.1.6 fixes/updates
>forward ported to 3.2. I'd be thrilled if someone else picked up the ball
>on this one, but since pretty much everyone sees me as the 3.1 guy (my
>own fault for that), I feel the expectation is that I should be the one
>to do this.

Gilles -

Please let me know what I can do to help you with all the 3.1.x and 3.1.x->3.2 issues that seem to fall into your court. I am a proficient C/C++ programmer. My knowledge of the auto* tools is minimal, but I have a book :) I can find my way around CVS.

I can't promise a lot, as my free time is nearly non-existent at the moment (not to imply that the same is not true for others around here). However, if you have a couple of tasks to toss my way, I will see what I can do. I am certainly not going to make any significant contributions by forever sticking to the "not enough time" excuse ;)

Jim
From: Brian W. <bw...@st...> - 2002-10-17 23:53:51
At 08:29 18/10/2002, Geoff Hutchison wrote:
> > Other side projects like defaults.xml are great, but this seems to be
> > shaping up to be a much bigger task than originally envisioned, what
>
>No offense to Gabriele, but I'd rather consider translations to the
>documentation _after_ we switch to an XML documentation setup.

I've been thinking about this one - and the more I do, the more I agree. The problem isn't creating translated versions of the attributes - the problem is creating translated versions of everything else and managing how that all fits together. I think at some point the documentation needs to be reviewed, and part of that should be internationalisation (i18n) - but it doesn't sound like that time is now.

Also - bolting a translation system onto the current (:-)!) defaults.xml will take very little retrofitting and reworking, so it doesn't need to be specially taken into account. I vote we do the first step and get a basic defaults.xml up - small steps!

>Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can
>see a patch in the near future. I'm willing to handle the documentation
>fixes by hand if I need to do it.
>
>-Geoff

-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
From: Geoff H. <ghu...@ws...> - 2002-10-17 22:30:00
I talked to Neal off-list, so I'd like to clarify as well. I think the three of us are thinking basically the same thing, but it doesn't help when we talk about "3.3" or "4.0." So let's talk about "how to get 3.2.0b4 out soon."

On Thu, 17 Oct 2002, Gilles Detillieux wrote:
> > I guess it comes down to that I think the code is good enough now to
> > consider a release in the near-term without a raft of changes/improvements.
...
> was the last actual release of 3.2. We need to get 3.2.0b4 out soon, if
> only to give b3 a proper burial.

I think we all agree here, with the caveat below. I *hate* apologizing for the "known database bug."

Neal: I read your statement as "let's release 3.2 with what we have." I'm not sure I agree with that.

> compression retro-fit. This has to be the default behaviour - we can't
> put another beta out with the current buggy word db compression code.

Agreed. If Neal can get me his zlib patch soon, then we can put that in, test it, and try a 3.2.0b4 with it sooner rather than later.

> 3) My own lack of time in being able to get the 3.1.6 fixes/updates
> forward ported to 3.2.

If you have a list of particular things, it would help significantly. I'll check through the mailing list, but if you have a list somewhere it'd save some time.

> library (iconv). I think Neal's idea of the zlib-WordDB-compression
> retrofit has merit, if only to get an interim beta 4 out the door soon.
> I see it as a quicker solution to the reliability issue.

I think we're all on the same page here, though I'd like to see the patch first, obviously. I've been working on the mifluz merge because I think it needs to be done and because I can't see how we can ship a 3.2.0b4 with these database bugs. If there's a smaller bug-fix, that's great. :-)

> The only other thing I see as essential for 3.2.0b4 is getting the
> 3.1.6 changes in there. Otherwise, there'll be too much confusion

I think there are a few remaining minor bugs which we should probably stomp along the way.

> Other side projects like defaults.xml are great, but this seems to be
> shaping up to be a much bigger task than originally envisioned, what

No offense to Gabriele, but I'd rather consider translations to the documentation _after_ we switch to an XML documentation setup.

Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can see a patch in the near future. I'm willing to handle the documentation fixes by hand if I need to do it.

-Geoff
From: Gilles D. <gr...@sc...> - 2002-10-17 16:43:34
According to Neal Richter:
> My previous post with the proposed schedule could be restated with the
> releases as 3.2beta4, 3.2beta5, 3.2beta6 etc.
>
> I guess it comes down to that I think the code is good enough now to
> consider a release in the near-term without a raft of changes/improvements.
>
> The need for a release PDQ is a result of Gilles's level of frustration
> with 3.1.x.

Just to clarify my own position, these are the things I'm finding frustrating:

1) Having to repeatedly tell people not to use the 1 1/2 year old 3.2.0b3 release because it's too buggy. You can't blame them for doing this - it was the last actual release of 3.2. We need to get 3.2.0b4 out soon, if only to give b3 a proper burial.

2) Too many questions/complaints about database errors in 3.2 betas. We need something more solid, whether based on the newer mifluz or on a zlib compression retro-fit. This has to be the default behaviour - we can't put another beta out with the current buggy word db compression code.

3) My own lack of time in being able to get the 3.1.6 fixes/updates forward ported to 3.2. I'd be thrilled if someone else picked up the ball on this one, but since pretty much everyone sees me as the 3.1 guy (my own fault for that), I feel the expectation is that I should be the one to do this.

Having said that, I also don't want to rush a new release out the door if it's going to mean a whole bunch of new bugs to deal with. But we have to get something happening. I don't want us to stop putting out solid releases, either for the sake of ideology (as some members seem to be willing to do) or for the sake of trying too many new things all at once.

> I also think a case could be made for a release with some of the things
> on your list along with the zlib-WordDB-compression and an improved
> inverted index representation in the WordDB to cut out the excessive number of
> rows in the WordDB.
>
> If we accomplish that, then it gets some of the pressure off to merge
> with Mifluz 0.23 to fix bugs. The combination of the two would offset any
> WordDB size increase penalty from using zlib page-compression.
>
> If a short-term need for a release isn't warranted, then as long as we
> stagger some of these features into a schedule by priority... it sounds good.
>
> Let's just get a schedule of deliverables for either a sequence of
> 3.2betaX or a sequence of releases.
>
> For task organization and morale this could be useful.
>
> So Gilles, is there a short-term need for a release without some of the
> larger things on the TODO list?

Well, I would dearly love to see 3.2.0b4 out the door in 2-3 months, but frankly I don't see that happening with the latest mifluz code merged in. I have concerns about its portability and dependence on yet another library (iconv). I think Neal's idea of the zlib-WordDB-compression retrofit has merit, if only to get an interim beta 4 out the door soon. I see it as a quicker solution to the reliability issue.

The only other thing I see as essential for 3.2.0b4 is getting the 3.1.6 changes in there. Otherwise, there'll be too much confusion about features that have been in 3.1 for almost a year, but not in 3.2. Oh, and documentation updates, of course.

Ideally, if we could get 3.1.7 and 3.2.0b4 released in close proximity to each other, with all 3.1.7 fixes also in 3.2.0b4, then we could feel reasonably confident in saying 3.1.7 is the end of the line for 3.1, and 3.2 is getting solid enough for production use. With that, I think we'd probably cut a quarter to a third of the repeat questions on the lists, which I attribute to a lag in getting new releases out.

Other side projects like defaults.xml are great, but this seems to be shaping up to be a much bigger task than originally envisioned, what with the idea of maintaining multiple translations. It's great, but it shouldn't hold up 3.2.0b4, nor the much-needed corrections/additions to defaults.cc's documentation fields.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
From: Gabriele B. <g.b...@co...> - 2002-10-17 07:04:02
Ciao Brian,

> Analysis:
> * description is the one that will always need it

Right.

> * I think the values for "block" and "category" should be considered
> as 'keys' rather than the actual values - they should be translated
> by lookup table.

I agree. Keys are better and, whenever there's another key, we just need to modify the DTD.

> * examples will *sometimes* require translation

Are you planning to use just the value of the attribute in the example? Usually an example is made up of just a pair (name of attribute: value). I guess we could use the value, as long as the attribute comes from the element itself. Sometimes we may need a description for the example too; for instance:

accept_language: fr
description: set the HTTP language to be sent to the server to 'French'

What do you think?

> This would then put the capability into the XML file. I would need to
> figure out how to do characters like é - possibly as &eacute; . As to

Well ... I dunno. However, these characters are not permitted in XML, if I am not wrong. XML supports Unicode and doesn't know anything about SGML entities like '&eacute;'; we should use the exact hex representation code (for latin1 code goes from 80 to D7FF). But this is an almost unknown world for me.

> * how this might then be used to generate documentation
> * how the translated versions will be maintained

We could set up translation groups. And ... if worse comes to worst, we could just use the English one. I could maintain the Italian one. No trouble.

> Note that it isn't a big change, but I think we should leave
> it for version 2 defaults.xml.

Is it a problem for you to enable translations already? Would it change much for you?

Thanks a lot Brian,
-Gabriele
From: Geoff H. <ghu...@ws...> - 2002-10-17 03:51:39
On Wednesday, October 16, 2002, at 07:41 PM, Brian White wrote:
> I can use that tool to take a merged version of defaults.cc to produce
> a version of defaults.xml. The problem is that a few of the descriptions
> will need to be reworked quite heavily by hand to produce valid XML.

OK, that makes sense of course. I had forgotten that you wrote a tool to generate the defaults.xml file. I would guess with some care, we can separate the "new" entries and only rework them if needed.

-Geoff
From: Brian W. <bw...@st...> - 2002-10-17 02:36:44
At 23:25 16/10/2002, Gabriele Bartolini wrote:
> >Well, it is close to ready - I now have it successfully generating
>
>Well, first and foremost, it is the first time I express my opinion regarding
>this solution and I think it is really efficient and intelligent. Good on
>ya, mate Brian! :-)
>
>Having said this, and also acknowledging that I don't know how the
>XML file is structured, I want to raise the problem of 'translation' of
>the attributes' descriptions, uses, etc., in different languages.
>
>Any ideas?
Yes.
Let's start with the DTD as it stands:
<!ELEMENT HtdigAttributes ( attribute+ ) >
<!-- attribute:
name : Variable Name
type : Type of Variable
programs : Whitespace separated list of programs/modules
using this attribute
block : Configuration block this can be used in ( optional )
version : Version that introduced the attribute
category : Attribute category (to split documentation)
-->
<!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) ) >
<!ATTLIST attribute name CDATA #REQUIRED
type (string|integer|boolean) "string"
programs CDATA #REQUIRED
block CDATA #IMPLIED
version CDATA #REQUIRED
category CDATA #REQUIRED
>
<!-- Default value of attribute - configmacro="true" would indicate the
value is actually a macro ( eg BIN_DIR )
-->
<!ELEMENT default (#PCDATA) >
<!ATTLIST default configmacro (true|false) "false" >
<!-- Basically a flag that suppresses documentation -->
<!ELEMENT nodocs EMPTY>
<!-- An example value that goes into the documentation -->
<!ELEMENT example (#PCDATA) >
<!ENTITY % paratext "#PCDATA|em|strong|a|ref" >
<!ENTITY % text "%paratext;|table|p|br|ol|ul|dl|codeblock" >
<!ELEMENT description (%paratext;)* >
... plus all the elements for formatting the description
The first thing to do is then look at the items that might need
translation:
* description
* block
* category
* example
Analysis:
* description is the one that will always need it
* I think the values for "block" and "category" should be considered
as 'keys' rather than the actual values - they should be translated
by lookup table.
* examples will *sometimes* require translation
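If `block` and `category` values are treated as keys, the documentation generator would translate them through a lookup table as it writes each page. A minimal sketch, assuming a hypothetical in-memory table keyed on `(key, lang)`. The `LABELS` contents and `translate_key` are illustrative only and are not part of any existing ht://Dig tool. A missing translation falls back to the `default` label, then to the raw key.

```python
# Hypothetical table mapping (key, language) to a display label.
LABELS = {
    ("Presentation:Text", "default"): "Presentation: Text",
    ("Presentation:Text", "fr"): "Présentation : texte",
}


def translate_key(key, lang):
    """Return the label for `key` in `lang`, falling back to the
    default-language label and finally to the key itself."""
    return LABELS.get((key, lang)) or LABELS.get((key, "default")) or key


print(translate_key("Presentation:Text", "fr"))
```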
To this end, I would suggest changing the following
<!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) ) >
to
<!ELEMENT attribute ( default, ( nodocs | ( example*, docset+ ) ) ) >
<!-- lang would be the id of the language using a standard identifier, or
set to "default" for the default language -->
<!ELEMENT docset ( example*, description ) >
<!ATTLIST docset lang CDATA #REQUIRED >
As an example:
<attribute name="no_title_text"
type="string"
programs="htsearch"
version="3.1.0"
category="Presentation:Text" >
<default>filename</default>
<example>!!!!!?</example>
<docset lang="default" >
<example>No Title Found</example>
<description>This specifies the text to use in search results when no
title is found in the document itself. If it is set to
filename, htsearch will use the name of the file itself,
enclosed in square brackets (e.g. [index.html]).
</description>
</docset>
<docset lang="fr" >
<example>Aucun titre retrouvé</example>
<description>Ceci spécifie le texte à utiliser dans les résultats
d'une recherche lorsque aucun titre se trouve dedans le document.
Si on le règle à filename, htsearch se servira du nom du fichier
lui-même, inclus entre crochets (p.ex. [index.html]).
</description>
</docset>
</attribute>
( And no - I don't speak French. I got a friend to do the translation
for me )
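Gabriele's suggestion in his reply was to fall back to the default text when a translation is missing. That could be applied when generating documentation from the proposed `docset` elements. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the `<attribute>`/`<docset lang="...">` layout from the example. `description_for` is a hypothetical helper, not part of Brian's tool.

```python
import xml.etree.ElementTree as ET


def description_for(attribute, lang):
    """Return the normalised description text of `attribute` in `lang`,
    falling back to the docset marked lang="default" if no translation
    exists. Returns None if neither is present."""
    chosen = None
    for docset in attribute.findall("docset"):
        if docset.get("lang") == lang:
            chosen = docset
            break
        if docset.get("lang") == "default" and chosen is None:
            chosen = docset
    if chosen is None:
        return None
    desc = chosen.find("description")
    if desc is None:
        return None
    # itertext() also collects text nested inside inline markup like <em>.
    return " ".join("".join(desc.itertext()).split())
```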
This would then put the capability into the XML file. I would need to
figure out how to do characters like é - possibly as &eacute; . As to
* how this might then be used to generate documentation
* how the translated versions will be maintained
are different issues altogether!
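On the character question: XML predefines only five named entities (`&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`). So `&eacute;` would work only if the DTD declared it. A numeric character reference such as `&#233;` (or the hex form `&#xE9;`) is valid in any XML document, as is the literal character when the file's encoding is declared correctly. A minimal sketch of the numeric-reference route uses Python's built-in `xmlcharrefreplace` error handler; `to_ascii_xml` is a hypothetical helper name.

```python
def to_ascii_xml(text):
    """Replace every non-ASCII character with a decimal XML numeric
    character reference (e.g. 'é' -> '&#233;'); ASCII passes through.
    Markup characters such as '<' and '&' are NOT escaped here."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_xml("Aucun titre retrouvé"))
```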
Note that it isn't a big change, but I think we should leave
it for version 2 defaults.xml.
>Ciao and thanks,
>-Gabriele
From: Brian W. <bw...@st...> - 2002-10-17 02:08:20
At 22:49 16/10/2002, Geoff Hutchison wrote:
>On Wednesday, October 16, 2002, at 02:27 AM, Brian White wrote:
>> * 95% of htdocs/attrs.html
>
>I guess I'm not clear on what "95%" means. Does this refer to the markup
>that you mentioned before?

It basically means that at the time of writing that email, I hadn't quite finished writing it.

>>I have some code that would help here ( bits of hacked together C and
>>Perl code ) - I just want to know whether I need to include it
>>in the bundle!
>
>I'm assuming you mean code to help with the merging?

I basically have a tool that, given a copy of the current form of defaults.cc, will produce a usable version of defaults.xml. It was safer to do it programmatically than by hand.

I can use that tool to take a merged version of defaults.cc to produce a version of defaults.xml. The problem is that a few of the descriptions will need to be reworked quite heavily by hand to produce valid XML.

I will produce my patch, including some covering documentation, and post that to this list. It should be sometime by the end of next week.

Brian
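The descriptions that need hand editing could be found mechanically. A minimal sketch, assuming each description has already been extracted from `defaults.cc` as a plain string (extraction is not shown). `needs_hand_rework` is a hypothetical helper, not part of Brian's tool. It wraps the text in a `<description>` element and tries to parse it, so HTML-isms such as an unclosed `<p>`, a bare `<br>`, an `&nbsp;`, or a lone `&` are flagged.

```python
import xml.etree.ElementTree as ET


def needs_hand_rework(description):
    """Return True if `description` is not a well-formed XML fragment
    and would therefore need fixing by hand before going into
    defaults.xml."""
    try:
        ET.fromstring("<description>" + description + "</description>")
        return False
    except ET.ParseError:
        return True
```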
From: Neal R. <ne...@ri...> - 2002-10-16 23:49:17
> Please don't take any of my comments as overly critical or flaming.
> You're new to the project and attempting to take on some heavy
> lifting--so I'm trying to transfer some experience.

Of course not. Let me clarify a bit:

My previous post with the proposed schedule could be restated with the releases as 3.2beta4, 3.2beta5, 3.2beta6 etc.

I guess it comes down to that I think the code is good enough now to consider a release in the near-term without a raft of changes/improvements.

The need for a release PDQ is a result of Gilles's level of frustration with 3.1.x.

I also think a case could be made for a release with some of the things on your list along with the zlib-WordDB-compression and an improved inverted index representation in the WordDB to cut out the excessive number of rows in the WordDB.

If we accomplish that, then it gets some of the pressure off to merge with Mifluz 0.23 to fix bugs. The combination of the two would offset any WordDB size increase penalty from using zlib page-compression.

If a short-term need for a release isn't warranted, then as long as we stagger some of these features into a schedule by priority... it sounds good.

Let's just get a schedule of deliverables for either a sequence of 3.2betaX or a sequence of releases.

For task organization and morale this could be useful.

So Gilles, is there a short-term need for a release without some of the larger things on the TODO list?

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
From: Paolo S. <iw...@ir...> - 2002-10-16 21:47:25
Hi all.

I see there is a problem spidering forums like phpBB and electrifiedpenguin. Because these forums return a session ID, htdig ends up with more than one SID while spidering the forum pages. The result is that:

1. the same forum page is indexed more than once;
2. the amount of CPU and time used for indexing is very large.

Take a look at the log below...

Thank you.
Paolo

217.168.237.106 - - [14/Oct/2002:01:06:54 +0200] "GET /forum/index.php HTTP/1.0" 200 35398 "http://www.ir3ip.net/forum/" "htdig/3.1.5
217.168.237.106 - - [14/Oct/2002:01:06:55 +0200] "GET /forum/search.php HTTP/1.0" 200 19754 "http://www.ir3ip.net/forum/" "htdig/3.1.5
217.168.237.106 - - [14/Oct/2002:01:06:57 +0200] "GET /forum/faq.php HTTP/1.0" 200 51949 "http://www.ir3ip.net/forum/" "htdig/3.1.5 (ro
217.168.237.106 - - [14/Oct/2002:01:07:00 +0200] "GET /forum/memberlist.php HTTP/1.0" 200 22715 "http://www.ir3ip.net/forum/" "htdig/3.
217.168.237.106 - - [14/Oct/2002:01:07:02 +0200] "GET /forum/index.php?sid=6923db608dd988b9167c2464278dcffb HTTP/1.0" 200 35398 "http:/
217.168.237.106 - - [14/Oct/2002:01:07:04 +0200] "GET /forum/faq.php?sid=6923db608dd988b9167c2464278dcffb HTTP/1.0" 200 51949 "http://w
..... another dozen lines with the same sid were removed .....

After a few minutes, several dozen accesses with another sid:

217.168.237.106 - - [14/Oct/2002:01:09:19 +0200] "GET /forum/index.php?sid=27619eca3a821c36bbfe3222b99f62aa HTTP/1.0" 200 35398 "http://www.ir3ip.net/forum/viewforum.php?f=12" "htdig/3.1.5
217.168.237.106 - - [14/Oct/2002:01:09:21 +0200] "GET /forum/faq.php?sid=27619eca3a821c36bbfe3222b99f62aa HTTP/1.0" 200 51949 "http://www.ir3ip.net/forum/viewforum.php?f=12" "htdig/3.1.5 (r
217.168.237.106 - - [14/Oct/2002:01:09:23 +0200] "GET /forum/search.php?sid=27619eca3a821c36bbfe3222b99f62aa HTTP/1.0" 200 19754 "http://www.ir3ip.net/forum/viewforum.php?f=12" "htdig/3.1.5
217.168.237.106 - - [14/Oct/2002:01:09:26 +0200] "GET /forum/memberlist.php?sid=27619eca3a821c36bbfe3222b99f62aa HTTP/1.0" 200 22715 "http://www.ir3ip.net/forum/viewforum.php?f=12" "htdig/3
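One way to avoid these duplicates is to normalise each URL before it is queued. Dropping the session-ID parameter makes `index.php?sid=...` and `index.php` map to the same document. A minimal sketch with Python's standard `urllib.parse`: the parameter name `sid` comes from the log above, while `strip_session_id` is a hypothetical helper and not htdig's own code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="sid"):
    """Remove the session-ID query parameter from `url`, keeping any
    other parameters in their original order."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    # An empty query makes urlunsplit omit the '?' entirely.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because `urlencode` re-serialises the remaining parameters, their percent-encoding may differ slightly from the original URL. That is harmless for de-duplication as long as every URL goes through the same function.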
From: Neal R. <ne...@ri...> - 2002-10-16 19:04:40
On Wed, 16 Oct 2002, Geoff Hutchison wrote:
>
> On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote:
>
> > 2. The mifluz devel list is near death, and it doesn't look like anyone
> > is actually using mifluz, or furthering development.
>
> Fine, but that simply does not mean that prior releases were not made
> with active users, developers or testing. There has been much more
> significant testing (on my part included) on the mifluz framework than
> the remainder of the ht://Dig codebase.

I agree in theory. In practice, until the new code has been verified to be acceptable after a successful merge, it is suspect. We hope that it will fix all our problems... it will be a while before we confirm this.

> > 5. The current mifluz code merge has problems with constructors and
> > destructors in a library (libhtdig) setting. I would rather help
>
> No offense, but your argument applies here. Why should libhtdig be a
> feature criterion for 3.2.0b4?

I agree, it's not a criterion. I will maintain a separate branch for that.

> > My experience with the current snapshots is very positive. I've had few
> > problems and the indexing itself is pretty solid, especially with the
> > new zlib WordDB compression.
>
> Sorry to sound dubious, but speaking of large code merges, you haven't
> submitted patches for me to merge into 3.2.0b4 either. As of yet, I
> haven't tested your zlib WordDB compression or seen if it has
> performance problems relative to 3.2.0b4. Can I claim that your code has
> seen as much user-level testing as 3.2.0b4 snapshots?

Heh. ;-) I'll get you those ASAP. Zlib is extremely well tested and the changes are a few lines of code. Giving this as a workaround to people who encounter the WordDB compression bug is a good alternative to hoping that it's fixed in a merged-mifluz codebase.

> I'm somewhat trying to play devil's advocate here. My gut feeling is
> that the mifluz merge should be aimed towards a 3.2.0b5 release and we
> *should* get 3.2.0b4 out the door as stable as possible in the
> near-term. But I'm pretty sure that merging in the new mifluz code is an
> overall win.

I agree in theory. In practice, I am motivated to suggest we scale back to what is absolutely necessary in order to get users a new release faster. Gilles in particular has voiced frustration over the delay in the 3.2 release, and over the waste of his time maintaining 3.1.x. I'd hate to continue adding to the pile and further frustrate him.

If we were a company and were risking the speedy completion of a release by wanting to incorporate a huge chunk of third-party code that needs more work... we'd be in real danger of getting fired.

I guess I see these things:

1. The 3.2 dev process is too open-ended at present.
2. The 3.1.x users need a new release.
3. The current 3.2beta4 code offers a significant release to users.
4. We are in danger of being waist deep in feature-creep quicksand.

If we delay the integration of mifluz and the larger items on your list, we'll have a tractable set of things to do to get a decent release out there.

Basically I'm suggesting that for morale purposes alone we do this and set a goal of pushing a 3.2 release out the door by December. Next, we make a list and divide it between smaller changes and larger ones. Smaller ones go into 3.3 (release in March?) and the rest into 4.0. The development could be semi-parallel at this point.

You may disagree with the "numbers game" here, but I think it would be good for morale to establish a set of well-reasoned conservative milestones and meet them in the short-term.

If we implement a strategy like this and six months later we look back and see that we've had 1-2 releases and are moving forward with integration of large new features/code, we'll feel much better than if we're still stuck in feature-creep quicksand.

Here's a proposal: http://ai.rightnow.com/htdig/proposed_schedule.html

Basically I included only things in the 3.2 schedule that are necessary to fix or work around known bugs. Things like Quim's new search framework and the excellent XML-config file feature are in 3.3. More open-ended things like the mifluz merge, STL and Unicode are in 4.0 & 4.1.

Also, the Zlib-WordDB in 3.2 and the more efficient WordDB inverted index are straightforward and buy us time with the mifluz merge.

Anyway... I'm sure you won't agree with my thoughts on the mifluz merge, and this is certainly a conservative viewpoint on it. If we make good progress on the mifluz merge by the end of the year, I'll withdraw any further objections. Eh?

Neal Richter
From: Geoff H. <ghu...@ws...> - 2002-10-16 14:31:47
On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote:
> 2. The mifluz devel list is near death, and it doesn't look like anyone
> is actually using mifluz, or furthering development.

Fine, but that simply does not mean that prior releases were not made with active users, developers or testing. There has been much more significant testing (on my part included) on the mifluz framework than the remainder of the ht://Dig codebase.

> Can you say that it has had as much as the average HtDig release?
> HtDig is MUCH more active than mifluz has ever been.

In terms of testing by the developers, component-level testing suites and testing before releases--the answer is pretty much yes. Granted, the mifluz releases between 0.14 (currently in 3.2.0b4) and 0.23 have not necessarily received the same pounding from thousands of ht://Dig users. But the users who were active with mifluz poured gigabytes of data through it too.

Remember also that we *are* mifluz. Take a look at the copyright designations.

> 4. How certain are we that these changes are going to make 3.2beta5
> MORE stable than the current beta?

I'm certain. I put a lot of testing into the mifluz code and it's definitely more stable now than it was.

> 5. The current mifluz code merge has problems with constructors and
> destructors in a library (libhtdig) setting. I would rather help

No offense, but your argument applies here. Why should libhtdig be a feature criterion for 3.2.0b4?

> 6. It has performance problems.

These seem like they're locking issues--it seems like the database is being locked and unlocked way too much. When we're indexing, it seems like the database should be locked in place as much as possible and then unlocked at the end.

> My experience with the current snapshots is very positive. I've had few
> problems and the indexing itself is pretty solid, especially with the
> new zlib WordDB compression.

Sorry to sound dubious, but speaking of large code merges, you haven't submitted patches for me to merge into 3.2.0b4 either. As of yet, I haven't tested your zlib WordDB compression or seen if it has performance problems relative to 3.2.0b4. Can I claim that your code has seen as much user-level testing as 3.2.0b4 snapshots?

I'm somewhat trying to play devil's advocate here. My gut feeling is that the mifluz merge should be aimed towards a 3.2.0b5 release and we *should* get 3.2.0b4 out the door as stable as possible in the near-term. But I'm pretty sure that merging in the new mifluz code is an overall win.

-Geoff
From: Geoff H. <ghu...@ws...> - 2002-10-16 14:16:47
|
I'm going to split this into two separate issues for the moment: 1) What changes are needed for a solid 3.2.0 release. 2) The mifluz merge (in a separate e-mail). Please don't take any of my comments as overly critical or flaming. You're new to the project and attempting to take on some heavy lifting--so I'm trying to transfer some experience. > experience the idea of beta versions is to fix bugs, new features > and major code rework is avoided if possible. This is certainly the traditional definition. In practice with ht://Dig development, this hasn't worked very well. Typically this happens because there simply hasn't been the manpower to tackle several large cleanups at the same time. In the 3.1 "betas," people also came out of the woodwork to contribute their local changes. We do not currently have anything resembling a traditional software development and engineering process. Largely this happens because there has never been a significant number of core developers who can concentrate significant amounts of time on ht://Dig. (I'm an excellent case in point.) At some time in the future, it would probably be good to move to a more "traditional" release scheme. It would also be good to have more component-level test suites. In the meantime (i.e. for getting 3.2.0 out the door with an appropriate level of stability), I suggest you temporarily accept a more flexible definition of "beta release." The reality starts with the list I mentioned--we absolutely must do some code reworks or we'll be layering more duct tape over our problems. In particular, IMHO, we'll continue to have weird htsearch bugs until we toss the current parser system. > My past experience in importing alot of new code like this is that it's > always harder then it seems that there are lots of bugs. I'm curious how much open-source development you've done. Remember that merging patches is quite typical for maintainers--Gilles and I do this quite often. 
In the case of ht://Dig, while development resources are at a premium, we have often ported and merged patches. The typical "beta" process with ht://Dig has been quite flexible towards the beginning and as a release like 3.1.0 firms up, fewer patches would be accepted. In answer to the question about 3.2.0 "firming up," remember the maxim about "development resources at a premium." For example, I'd much rather switch to the new htsearch framework because it'll be easier to find bugs. > a case can be made that not only would the code differ significantly > with the previous 3.2betas, it also has a load of new features. Take a look at the release notes for 3.1.0 betas and for previous 3.2.0 betas. As I said, we've had to take a rather flexible interpretation of a "beta" release. We currently don't have "development" or "alpha" releases. They would be nice, but I also have to be realistic about the pace of development and the number of active developers. Spinning a release, no matter what it's called, is a fair amount of work. > Part of it is a moral thing. Sometimes when a release is floundering > and > taking too long, it's better to draw a line and say we're going to fix > these bugs and get it out the door. True. But pretty much every one of the points I mentioned in the previous e-mail goes directly to a bug-fix question. (So does the mifluz merge, but that's a separate e-mail.) > substantial that the release needs to be called "4.0" just to give it > enough credit ;-). Avi Rappaport has said much the same thing. But: a) it's really an issue worthy of a vote on htdig-dev. b) it's not something to worry about until the final release is close to finished. -Geoff |
|
From: Gabriele B. <an...@ti...> - 2002-10-16 13:26:00
|
>Well, it is close to ready - I now have it successfully generating Well, first and foremost, this is the first time I've expressed my opinion regarding this solution, and I think it is really efficient and intelligent. Good on ya, mate Brian! :-) Having said this, and also acknowledging that I don't know how the XML file is structured, I want to raise the problem of 'translating' the attributes' descriptions, uses, etc., into different languages. Any ideas? Ciao and thanks, -Gabriele |
|
From: Geoff H. <ghu...@ws...> - 2002-10-16 12:49:58
|
On Wednesday, October 16, 2002, at 02:27 AM, Brian White wrote: > * 95% of htdocs/attrs.html I guess I'm not clear on what "95%" means. Does this refer to the markup that you mentioned before? > I still need to bundle up the changes - I was thinking of creating > a patch based on 3.2.0b4 and just posting that here. Yes, that's probably a good idea. > the status of defaults.cc? How much has to merged in? Will > there need to exist in parallel in the CVS for a peiod? They can't really "exist in parallel" in the CVS--your code, after all, generates defaults.cc. Certainly some amount of merging will be needed for a while, but I don't think that barrier will be too high. But certainly I think your patch will need to be checked fairly carefully for possible "gotchas" and then we'll probably need to merge in Lachlan's proposed fixes. > I have some code that would help here ( bits of hacked together C and > Perl code ) - I just want to know whether I need to include it > in the bundle! I'm assuming you mean code to help with the merging? -Geoff |
|
From: Brian W. <bw...@st...> - 2002-10-16 07:35:34
|
Well, it is close to ready - I now have it successfully generating * htcommon/defaults.cc * htdocs/cf_byprog.html * htdocs/cf_byname.html * 95% of htdocs/attrs.html I still need to bundle up the changes - I was thinking of creating a patch based on 3.2.0b4 and just posting that here. At this stage, however, I have a particular question - what is the status of defaults.cc? How much has to be merged in? Will they need to exist in parallel in the CVS for a period? I have some code that would help here (bits of hacked-together C and Perl code) - I just want to know whether I need to include it in the bundle! Regs Brian ------------------------- Brian White Step Two Designs Pty Ltd Knowledge Management Consultancy, SGML & XML Phone: +612-93197901 Web: http://www.steptwo.com.au/ Email: bw...@st... Content Management Requirements Toolkit 112 CMS requirements, ready to cut-and-paste |
|
From: Conrad S. <co...@sc...> - 2002-10-16 00:03:33
|
On Tuesday, October 15, 2002, at 05:50 PM, Jim Cole wrote: > Conrad Schilbe's bits of Tue, 15 Oct 2002 translated to: > >> My first post to this mailing list... if it should go somewhere else, >> let me know. > > For this type of question, the htdig-general lists would be > more appropriate than the developer list. Thanks; any posts of this kind will go there in the future. > >> Currently this is how I see it happening. I create multiple instances >> of rundig that index the individual sites and create database files >> for >> each. I then create a copy of these and merge them into one common >> index. > > Sounds like a reasonable approach. ht://Dig is certainly capable > of handling this type of setup. > >> Is there components of the database that should not be duplicated for >> each site? Is there an easier way to achieve my goals? Maybe a way to >> have one db but specify the site within the args? > > Have you read http://www.htdig.org/FAQ.html#q4.20 ? In particular > the parts about restrict and exclude? Also see Excellent! Looks like this is the best way to do it. And it's in the FAQ... don't I feel like a fool... > > http://www.htdig.org/attrs.html#exclude > http://www.htdig.org/attrs.html#restrict > > These attributes provide a means for filtering search results in > a manner that might be useful for what you are trying to do. > > Jim |
|
From: Jim C. <gre...@yg...> - 2002-10-15 23:51:01
|
Conrad Schilbe's bits of Tue, 15 Oct 2002 translated to: >My first post to this mailing list... if it should go somewhere else, >let me know. For this type of question, the htdig-general lists would be more appropriate than the developer list. >Currently this is how I see it happening. I create multiple instances >of rundig that index the individual sites and create database files for >each. I then create a copy of these and merge them into one common >index. Sounds like a reasonable approach. ht://Dig is certainly capable of handling this type of setup. >Is there components of the database that should not be duplicated for >each site? Is there an easier way to achieve my goals? Maybe a way to >have one db but specify the site within the args? Have you read http://www.htdig.org/FAQ.html#q4.20 ? In particular the parts about restrict and exclude? Also see http://www.htdig.org/attrs.html#exclude http://www.htdig.org/attrs.html#restrict These attributes provide a means for filtering search results in a manner that might be useful for what you are trying to do. Jim |
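As we read the attribute documentation Jim links to, `restrict` keeps only URLs that match at least one of its space-separated patterns, and `exclude` rejects any URL that matches one of its patterns. The sketch below models that filtering with plain substring matching; treating patterns as substrings is an assumption, this is not htsearch's code, and `passesFilters` and the host names in the usage are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a space-separated pattern list such as "main.example.com /news/".
std::vector<std::string> splitPatterns(const std::string& list) {
    std::istringstream in(list);
    std::vector<std::string> out;
    std::string pattern;
    while (in >> pattern) out.push_back(pattern);
    return out;
}

// True if the URL contains any of the patterns as a substring.
bool containsAny(const std::string& url, const std::vector<std::string>& patterns) {
    for (const auto& p : patterns)
        if (url.find(p) != std::string::npos) return true;
    return false;
}

// Keep a result if it matches some restrict pattern (or restrict is empty)
// and matches no exclude pattern.
bool passesFilters(const std::string& url, const std::string& restrictList,
                   const std::string& excludeList) {
    const auto restrictPats = splitPatterns(restrictList);
    if (!restrictPats.empty() && !containsAny(url, restrictPats)) return false;
    return !containsAny(url, splitPatterns(excludeList));
}
```

With one combined database, a per-site search form would then pass a different restrict value (for example a hypothetical `main.example.com`) instead of pointing at a separate database.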
|
From: Conrad S. <co...@sc...> - 2002-10-15 23:33:51
|
My first post to this mailing list... if it should go somewhere else, let me know. I work for a large content provider with other affiliated providers. I would like to be able to index the main site daily and the affiliates weekly, with separate databases for each entity and a combined database as well, allowing users to search any individual site or all sites. Currently, this is how I see it working: I create multiple instances of rundig that index the individual sites and create database files for each. I then create a copy of these and merge them into one common index. Are there components of the database that should not be duplicated for each site? Is there an easier way to achieve my goals? Maybe a way to have one db but specify the site within the args? Thanks for any assistance. |
|
From: Neal R. <ne...@ri...> - 2002-10-15 18:38:09
|
> As for the mifluz merge, I don't quite understand your apparent bias
> against it.
1. I'm a fairly conservative software engineer. I believe in tractable
sets of reasonably sized changes, and this is pretty large. In my
experience, beta versions are for fixing bugs; new features
and major code rework are avoided if possible.
2. The mifluz devel list is near death, and it doesn't look like anyone
is actually using mifluz, or furthering development.
3. Loic is AWOL, and we have no certainty, other than what he told
this group, that the current mifluz works as advertised. It sounds
great, but there is no proof or support for the assertions of
feature improvement.
4. How certain are we that these changes are going to make 3.2beta5
MORE stable than the current beta?
5. The current mifluz code merge has problems with constructors and
destructors in a library (libhtdig) setting. I would rather help
the group fix bugs and cleanup code in the current 3.2 than
burn time fixing those problems in the near-term.
6. It has performance problems.
I'm suspicious of starting down a road of swallowing the complete mifluz
in the near-term. There are a lot more unknowns in merging in mifluz than
in fixing other issues first.
If Loic were around and the development list not dead I would be less
suspicious.
The list of feature improvements looks great, and it will be good to get
the merges in. In my opinion, the process of doing that should be to
get a working merge (which you are making great progress on) and then do
some feature verification and reasonable unit testing.
This process has many unknowns and I'd hate to hold up the release for it.
My past experience in importing a lot of new code like this is that it's
always harder than it seems and that there are lots of bugs.
> Let's pretend I was considering upgrading the Sleepycat DB
> code to version 4.2.x from the current 3.1.x. There are a ton of changes
> there too, but I wouldn't be particularly concerned since I know there's
> external code review and plenty of testing.
Apples vs. oranges to me. Berkeley DB is very well used and well tested...
by several orders of magnitude more than the current mifluz and a couple
of orders more than HtDig.
The idea of moving from 3.2beta4 to 3.2beta5 with the list of
changes above seems like a lot! With the changes above,
a case can be made that not only would the code differ significantly
with the previous 3.2betas, it also has a load of new features.
New features late in the release aren't always a good idea.
---------
You're the development leader, and I'll help accomplish the list you
posted.
My input is to ask if we might be better off making a short list of
absolutely necessary bug-fixes for 3.2beta4 and release it soon.
Part of it is a moral thing. Sometimes when a release is floundering and
taking too long, it's better to draw a line and say we're going to fix
these bugs and get it out the door.
The other part is this important question: Does the current 3.2beta4 +
bug-fixes + 3.1.x improvements offer significant improvement to the 3.1.x users?
If it does then we are harming them in the short-term by delaying the release to
implement lots of new features and import code with many unknowns.
My experience with the current snapshots is very positive. I've had few
problems and the indexing itself is pretty solid, especially with the new
zlib WordDB compression.
I've sent gigabytes of text through this code and the memory leaks are not
in the critical class.
> The mifluz code that I've
> merged in has a fair amount of external code review and testing from
> other users.
Can you say that it has had as much as the average HtDig release? HtDig
is MUCH more active than mifluz has ever been.
> Most of the "ht://Dig" modifications in terms of number of lines of
> patch are simply upgrades in the build environment--moving to
> autoconf-2.5x and newer versions of automake, libtool, etc. These need
> to be done before any 3.2 release.
True. And they are good changes to the build env.
I don't have a good feeling for what 3.1.x users want in 3.2 and if they
are willing to wait for lots of changes to the current 3.2beta or would
rather have a reasonable release soon.
The other question is if you compare 3.1.x with 3.2beta4 + your list
above I personally believe that the changes are so pervasive and
substantial that the release needs to be called "4.0" just to give it
enough credit ;-).
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Geoff H. <ghu...@ws...> - 2002-10-15 12:58:30
|
On Monday, October 14, 2002, at 10:15 PM, Brian White wrote: > However - I noticed that defaults.cc only > has an entry for "heading_factor" and not > "heading_factor_1", "heading_factor_2" etc No, this is not an error. Remember that in 3.2, each particular tag would need a separate bit in the "flags" section. So it was decided that the heading_factor attributes should be consolidated into one flag. Yes, it's a loss of search precision, but it's a savings of space. -Geoff |
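Geoff's explanation can be made concrete with a small sketch. Assume, hypothetically, that each word occurrence carries a byte of flag bits recording where it appeared; the flag names and bit values below are invented for illustration and do not match htcommon. With one shared heading bit, `<h1>` through `<h6>` are indistinguishable at search time (the precision loss), whereas giving each level its own bit would cost five more bits per occurrence (the space cost). The parameters mirror the `text_factor`, `title_factor`, and `heading_factor` attributes, but the additive weighting rule is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-occurrence flag bits (not the real htcommon layout).
constexpr std::uint8_t FLAG_TEXT    = 0x00;
constexpr std::uint8_t FLAG_TITLE   = 0x01;
constexpr std::uint8_t FLAG_HEADING = 0x02;  // shared by <h1> .. <h6>

// Map a lower-case tag name to the flag stored with each word inside it.
std::uint8_t flagForTag(const std::string& tag) {
    if (tag == "title") return FLAG_TITLE;
    if (tag.size() == 2 && tag[0] == 'h' && tag[1] >= '1' && tag[1] <= '6')
        return FLAG_HEADING;  // all six heading levels collapse into one bit
    return FLAG_TEXT;
}

// Assumed additive weighting: each set bit adds its configured factor.
double weightFor(std::uint8_t flags, double textFactor, double titleFactor,
                 double headingFactor) {
    double w = textFactor;
    if (flags & FLAG_TITLE) w += titleFactor;
    if (flags & FLAG_HEADING) w += headingFactor;  // same factor for every level
    return w;
}
```

Because the flag no longer records the heading level, a single `heading_factor` is the only setting that can meaningfully apply.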
|
From: Brian W. <bw...@st...> - 2002-10-15 03:18:21
|
I have been working on defaults.xml in the background - I should have something soon. However, I noticed that defaults.cc only has an entry for "heading_factor" and not "heading_factor_1", "heading_factor_2", etc. I assume this is an error... Brian |
|
From: Geoff H. <ghu...@ws...> - 2002-10-15 03:15:31
|
On Friday, October 11, 2002, at 02:38 PM, Neal Richter wrote: > Giles wrote: >> stated before, I'm willing to maintain the 3.1.x branch as far as >> 3.1.7, >> which will be a bug-fix release only. But if 3.2 doesn't get solid >> soon >> (and it's going to take more than my input and Geoff's to do that), I'm > > What is your summary of the things that need to be done? It's been > pretty > solid in my view. > > I would propose that we make a SHORT list of things that need to be > added/fixed ASAP and get it released. What's on your list? > > Lets get a short list together, do the work and move into a kind of > QA-process where we test for memory leaks/bugs, profile it, and fix > bugs. > > Then lets break up the new feature ideas into a wish list with balance > between efficiency improvements and new feature for a 3.3 release. > > The mifluz merge is so large in my mind that it ought to be part of a > 4.0 > release. You're welcome to your opinion. But let's start out with your "short list." * Switch to Quim's qtest framework: absolutely crucial. The bugs that Sinclair sees with punctuation, etc. are due to a pretty creaky htsearch system. Moreover, the current code isn't very amenable to expansion, new query syntaxes, or wrapping (in a library or another CGI-like system) * Migrate defaults.cc to new XML system: in retrospect, I think this is high-priority for 3.2 so the binaries don't have to carry around all those extra strings. * Memory improvements to htmerge and other httools: Current implementations load the entire wordlist or document list into memory, rather than "walking" record-by-record. * Forward-porting 3.1.6 improvements to htsearch, etc. * Documentation improvements as mentioned in STATUS file. * Current htsearch "collections" code is really convoluted and should likely be rewritten. IMHO, this is legitimately 3.2-priority since it's been a feature in previous 3.2 betas and there are users reliant on this. 
* "basic regex" or "wildcard" fuzzy type: Current regex fuzzy isn't particularly user-friendly. I haven't even consulted the sf.net bug tracker or features tracker--these are just off the top of my head. No, it's not too bad if there was some serious development effort like there was shortly before 3.1.0. But as Gilles pointed out, it simply cannot happen in a reasonable time frame without additional development manpower. As for the mifluz merge, I don't quite understand your apparent bias against it. Let's pretend I was considering upgrading the Sleepycat DB code to version 4.2.x from the current 3.1.x. There are a ton of changes there too, but I wouldn't be particularly concerned since I know there's external code review and plenty of testing. The mifluz code that I've merged in has a fair amount of external code review and testing from other users. Why has it taken so long to merge--basically I just haven't had the ability to block out time to do it in one fell swoop so it's dragging on forever. Most of the "ht://Dig" modifications in terms of number of lines of patch are simply upgrades in the build environment--moving to autoconf-2.5x and newer versions of automake, libtool, etc. These need to be done before any 3.2 release. -Geoff |
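The memory item in the list above (htmerge and the other tools loading the entire word or document list instead of walking it record by record) can be illustrated with a minimal sketch. `WordRecord`, `Cursor`, and `vectorCursor` are hypothetical stand-ins, not the ht://Dig or mifluz database classes, and counting entries that do not belong to deleted documents is used only as an example of a per-record pass.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical word-list entry: one (word, document id) pair.
struct WordRecord {
    std::string word;
    int docId;
};

// Hypothetical cursor: fills `out` with the next record, returns false at the end.
using Cursor = std::function<bool(WordRecord&)>;

// Load-everything approach: memory grows with the size of the whole word list.
std::size_t countLiveLoadAll(const Cursor& next, const std::set<int>& deletedDocs) {
    std::vector<WordRecord> all;
    WordRecord rec;
    while (next(rec)) all.push_back(rec);
    std::size_t live = 0;
    for (const auto& r : all)
        if (deletedDocs.count(r.docId) == 0) ++live;
    return live;
}

// Streaming approach: each record is examined as it is read, then discarded.
std::size_t countLiveStreaming(const Cursor& next, const std::set<int>& deletedDocs) {
    std::size_t live = 0;
    WordRecord rec;
    while (next(rec))
        if (deletedDocs.count(rec.docId) == 0) ++live;
    return live;
}

// Demo helper: a cursor over an in-memory vector (which must outlive the cursor).
Cursor vectorCursor(const std::vector<WordRecord>& recs) {
    std::size_t i = 0;
    return [&recs, i](WordRecord& out) mutable {
        if (i >= recs.size()) return false;
        out = recs[i++];
        return true;
    };
}
```

Both functions give the same answer; only the streaming one keeps memory use constant. In a real merge pass the streaming loop would write each surviving record to the output database instead of counting it.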
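The list also asks for a "basic regex" or "wildcard" fuzzy type. One common way to offer this, sketched here as an assumption rather than ht://Dig's planned design, is to translate shell-style `*` and `?` into a regular expression, escaping every other regex metacharacter, and reuse a regex matcher. The function names are hypothetical; only the standard `<regex>` library is used.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Translate a shell-style wildcard (* and ?) into an ECMAScript regex string.
std::string wildcardToRegex(const std::string& pattern) {
    std::string out;
    for (char c : pattern) {
        switch (c) {
            case '*': out += ".*"; break;  // any run of characters
            case '?': out += '.'; break;   // exactly one character
            case '.': case '+': case '(': case ')': case '[': case ']':
            case '{': case '}': case '^': case '$': case '|': case '\\':
                out += '\\';               // escape so it matches literally
                out += c;
                break;
            default: out += c;
        }
    }
    return out;
}

// std::regex_match requires the whole word to match, so no anchors are needed.
bool wildcardMatch(const std::string& pattern, const std::string& word) {
    return std::regex_match(word, std::regex(wildcardToRegex(pattern)));
}
```

Users would then type `dig*` rather than a full regular expression, which addresses the user-friendliness complaint without a separate matching engine.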