|
From: Geoff H. <ghu...@ws...> - 2002-10-15 03:15:31
|
On Friday, October 11, 2002, at 02:38 PM, Neal Richter wrote:
> Gilles wrote:
>> stated before, I'm willing to maintain the 3.1.x branch as far as
>> 3.1.7, which will be a bug-fix release only. But if 3.2 doesn't get
>> solid soon (and it's going to take more than my input and Geoff's to
>> do that), I'm
>
> What is your summary of the things that need to be done? It's been
> pretty solid in my view.
>
> I would propose that we make a SHORT list of things that need to be
> added/fixed ASAP and get it released. What's on your list?
>
> Let's get a short list together, do the work and move into a kind of
> QA process where we test for memory leaks/bugs, profile it, and fix
> bugs.
>
> Then let's break up the new feature ideas into a wish list with a balance
> between efficiency improvements and new features for a 3.3 release.
>
> The mifluz merge is so large in my mind that it ought to be part of a
> 4.0 release.

You're welcome to your opinion. But let's start out with your "short list."

* Switch to Quim's qtest framework: absolutely crucial. The bugs that Sinclair sees with punctuation, etc. are due to a pretty creaky htsearch system. Moreover, the current code isn't very amenable to expansion, new query syntaxes, or wrapping (in a library or another CGI-like system).
* Migrate defaults.cc to the new XML system: in retrospect, I think this is high-priority for 3.2 so the binaries don't have to carry around all those extra strings.
* Memory improvements to htmerge and other httools: current implementations load the entire wordlist or document list into memory, rather than "walking" record-by-record.
* Forward-porting 3.1.6 improvements to htsearch, etc.
* Documentation improvements as mentioned in the STATUS file.
* The current htsearch "collections" code is really convoluted and should likely be rewritten. IMHO, this is legitimately a 3.2 priority since it's been a feature in previous 3.2 betas and there are users reliant on it.
* "basic regex" or "wildcard" fuzzy type: the current regex fuzzy isn't particularly user-friendly.

I haven't even consulted the sf.net bug tracker or features tracker--these are just off the top of my head.

No, it wouldn't be too bad if there were some serious development effort like there was shortly before 3.1.0. But as Gilles pointed out, it simply cannot happen in a reasonable time frame without additional development manpower.

As for the mifluz merge, I don't quite understand your apparent bias against it. Let's pretend I was considering upgrading the Sleepycat DB code to version 4.2.x from the current 3.1.x. There are a ton of changes there too, but I wouldn't be particularly concerned since I know there's external code review and plenty of testing. The mifluz code that I've merged in has a fair amount of external code review and testing from other users.

Why has it taken so long to merge? Basically, I just haven't had the ability to block out time to do it in one fell swoop, so it's dragging on forever. Most of the "ht://Dig" modifications in terms of number of lines of patch are simply upgrades in the build environment--moving to autoconf-2.5x and newer versions of automake, libtool, etc. These need to be done before any 3.2 release.

-Geoff |
|
From: Neal R. <ne...@ri...> - 2002-10-15 18:38:09
|
> As for the mifluz merge, I don't quite understand your apparent bias
> against it.
1. I'm a fairly conservative software engineer. I believe in tractable
sets of reasonably sized changes. This is pretty large. In my
experience, the idea of beta versions is to fix bugs; new features
and major code rework are avoided if possible.
2. The mifluz devel list is near death, and it doesn't look like anyone
is actually using mifluz, or furthering development.
3. Loic is AWOL, and we have no certainty, other than what he told
this group, that the current mifluz works as advertised. It sounds
great, but there is no proof or support for the assertions of
feature improvement.
4. How certain are we that these changes are going to make 3.2beta5
MORE stable than the current beta?
5. The current mifluz code merge has problems with constructors and
destructors in a library (libhtdig) setting. I would rather help
the group fix bugs and clean up code in the current 3.2 than
burn time fixing those problems in the near-term.
6. It has performance problems.
I'm suspicious of starting down a road of swallowing the complete Mifluz
in the near term. There are a lot more unknowns in merging in mifluz than
in fixing other issues first.
If Loic were around and the development list not dead I would be less
suspicious.
The list of feature improvements looks great, and it will be good to get
the merges in. In my opinion, the process should be that we get a
working merge (which you are making great progress on) and then do
some feature verification and some reasonable unit testing.
This process has many unknowns and I'd hate to hold up the release for it.
My past experience in importing a lot of new code like this is that it's
always harder than it seems and that there are lots of bugs.
> Let's pretend I was considering upgrading the Sleepycat DB
> code to version 4.2.x from the current 3.1.x. There are a ton of changes
> there too, but I wouldn't be particularly concerned since I know there's
> external code review and plenty of testing.
Apples and oranges to me. Berkeley DB is very well used and well tested...
by several orders of magnitude more than the current Mifluz and a couple
orders more than HtDig.
The idea of moving from 3.2beta4 to 3.2beta5 with the list of
changes above seems like a lot! With the changes above,
a case can be made that not only would the code differ significantly
from the previous 3.2betas, it also has a load of new features.
New features late in the release aren't always a good idea.
---------
You're the development leader, and I'll help accomplish the list you
posted.
My input is to ask if we might be better off making a short list of
absolutely necessary bug-fixes for 3.2beta4 and release it soon.
Part of it is a moral thing. Sometimes when a release is floundering and
taking too long, it's better to draw a line and say we're going to fix
these bugs and get it out the door.
The other part is this important question: Does the current 3.2beta4 +
bug-fixes + 3.1.x improvements offer significant improvement to the 3.1.x users?
If it does then we are harming them in the short-term by delaying the release to
implement lots of new features and import code with many unknowns.
My experience with the current snapshots is very positive. I've had few
problems and the indexing itself is pretty solid, especially with the new
zlib WordDB compression.
I've sent gigabytes of text through this code and the memory leaks are not
in the critical class.
> The mifluz code that I've
> merged in has a fair amount of external code review and testing from
> other users.
Can you say that it has had as much as the average HtDig release? HtDig
is MUCH more active than mifluz has ever been.
> Most of the "ht://Dig" modifications in terms of number of lines of
> patch are simply upgrades in the build environment--moving to
> autoconf-2.5x and newer versions of automake, libtool, etc. These need
> to be done before any 3.2 release.
True. And they are good changes to the build env.
I don't have a good feeling for what 3.1.x users want in 3.2, and whether they
are willing to wait for lots of changes to the current 3.2beta or would
rather have a reasonable release soon.
The other question is if you compare 3.1.x with 3.2beta4 + your list
above I personally believe that the changes are so pervasive and
substantial that the release needs to be called "4.0" just to give it
enough credit ;-).
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Geoff H. <ghu...@ws...> - 2002-10-16 14:16:47
|
I'm going to separate two issues for the moment:

1) What changes are needed for a solid 3.2.0 release.
2) The mifluz merge (in a separate e-mail).

Please don't take any of my comments as overly critical or flaming. You're new to the project and attempting to take on some heavy lifting--so I'm trying to transfer some experience.

> experience, the idea of beta versions is to fix bugs; new features
> and major code rework are avoided if possible.

This is certainly the traditional definition. In practice with ht://Dig development, this hasn't worked very well. Typically this happens because there simply hasn't been the manpower to tackle several large cleanups at the same time. In the 3.1 "betas," people also came out of the woodwork to contribute their local changes.

We do not currently have anything resembling a traditional software development and engineering process. Largely this happens because there has never been a significant number of core developers who can concentrate significant amounts of time on ht://Dig. (I'm an excellent case in point.) At some time in the future, it would probably be good to move to a more "traditional" release scheme. It would also be good to have more component-level test suites.

In the meantime (i.e. for getting 3.2.0 out the door with an appropriate level of stability), I suggest you temporarily accept a more flexible definition of "beta release." The reality starts with the list I mentioned--we absolutely must do some code reworks or we'll be layering more duct tape over our problems. In particular, IMHO, we'll continue to have weird htsearch bugs until we toss the current parser system.

> My past experience in importing a lot of new code like this is that it's
> always harder than it seems and that there are lots of bugs.

I'm curious how much open-source development you've done. Remember that merging patches is quite typical for maintainers--Gilles and I do this quite often. In the case of ht://Dig, while development resources are at a premium, we have often ported and merged patches. The typical "beta" process with ht://Dig has been quite flexible towards the beginning, and as a release like 3.1.0 firms up, fewer patches are accepted.

In answer to the question about 3.2.0 "firming up," remember the maxim about "development resources at a premium." For example, I'd much rather switch to the new htsearch framework because it'll be easier to find bugs.

> a case can be made that not only would the code differ significantly
> from the previous 3.2betas, it also has a load of new features.

Take a look at the release notes for the 3.1.0 betas and for previous 3.2.0 betas. As I said, we've had to take a rather flexible interpretation of a "beta" release. We currently don't have "development" or "alpha" releases. They would be nice, but I also have to be realistic about the pace of development and the number of active developers. Spinning a release, no matter what it's called, is a fair amount of work.

> Part of it is a moral thing. Sometimes when a release is floundering
> and taking too long, it's better to draw a line and say we're going to
> fix these bugs and get it out the door.

True. But pretty much every one of the points I mentioned in the previous e-mail goes directly to a bug-fix question. (So does the mifluz merge, but that's a separate e-mail.)

> substantial that the release needs to be called "4.0" just to give it
> enough credit ;-).

Avi Rappaport has said much the same thing. But:
a) it's really an issue worthy of a vote on htdig-dev.
b) it's not something to worry about until the final release is close to finished.

-Geoff |
|
From: Neal R. <ne...@ri...> - 2002-10-16 23:49:17
|
> Please don't take any of my comments as overly critical or flaming.
> You're new to the project and attempting to take on some heavy
> lifting--so I'm trying to transfer some experience.

Of course not. Let me clarify a bit:

My previous post with the proposed schedule could be restated with the releases as 3.2beta4, 3.2beta5, 3.2beta6, etc.

I guess it comes down to this: I think the code is good enough now to consider a release in the near term without a raft of changes/improvements.

The need for a release PDQ is a result of Gilles' level of frustration with 3.1.x.

I also think a case could be made for a release with some of the things on your list along with the zlib-WordDB-compression and an improved inverted index representation in the WordDB to cut out the excessive number of rows in the WordDB.

If we accomplish that, then it takes some of the pressure off to merge with Mifluz 0.23 to fix bugs. The combination of the two would offset any WordDB size increase penalty from using zlib page-compression.

If a short-term need for a release isn't warranted, then as long as we stagger some of these features into a schedule by priority... it sounds good.

Let's just get a schedule of deliverables for either a sequence of 3.2betaX or a sequence of releases. For task organization and morale this could be useful.

So Gilles, is there a short-term need for a release without some of the larger things on the TODO list?

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
|
From: Gilles D. <gr...@sc...> - 2002-10-17 16:43:34
|
According to Neal Richter:
> My previous post with the proposed schedule could be restated with the
> releases as 3.2beta4, 3.2beta5, 3.2beta6 etc.
>
> I guess it comes down to this: I think the code is good enough now to
> consider a release in the near-term without a raft of changes/improvements.
>
> The need for a release PDQ is a result of Gilles' level of frustration
> with 3.1.x.

Just to clarify my own position, these are the things I'm finding frustrating:

1) Having to repeatedly tell people not to use the 1 1/2 year old 3.2.0b3 release because it's too buggy. You can't blame them for doing this - it was the last actual release of 3.2. We need to get 3.2.0b4 out soon, if only to give b3 a proper burial.

2) Too many questions/complaints about database errors in 3.2 betas. We need something more solid, whether based on the newer mifluz or on a zlib compression retro-fit. This has to be the default behaviour - we can't put another beta out with the current buggy word db compression code.

3) My own lack of time in being able to get the 3.1.6 fixes/updates forward ported to 3.2. I'd be thrilled if someone else picked up the ball on this one, but since pretty much everyone sees me as the 3.1 guy (my own fault for that), I feel the expectation is that I should be the one to do this.

Having said that, I also don't want to rush a new release out the door if it's going to mean a whole bunch of new bugs to deal with. But we have to get something happening. I don't want us to stop putting out solid releases, either for the sake of ideology (as some members seem to be willing to do) or for the sake of trying too many new things all at once.

> I also think a case could be made for a release with some of the things
> on your list along with the zlib-WordDB-compression and an improved
> inverted index representation in the WordDB to cut out the excessive number of
> rows in the WordDB.
>
> If we accomplish that, then it takes some of the pressure off to merge
> with Mifluz 0.23 to fix bugs. The combination of the two would offset any
> WordDB size increase penalty from using zlib page-compression.
>
> If a short-term need for a release isn't warranted, then as long as we
> stagger some of these features into a schedule by priority... it sounds good.
>
> Let's just get a schedule of deliverables for either a sequence of
> 3.2betaX or a sequence of releases.
>
> For task organization and morale this could be useful.
>
> So Gilles, is there a short-term need for a release without some of the
> larger things on the TODO list?

Well, I would dearly love to see 3.2.0b4 out the door in 2-3 months, but frankly I don't see that happening with the latest mifluz code merged in. I have concerns about its portability and dependence on yet another library (iconv). I think Neal's idea of the zlib-WordDB-compression retrofit has merit, if only to get an interim beta 4 out the door soon. I see it as a quicker solution to the reliability issue.

The only other thing I see as essential for 3.2.0b4 is getting the 3.1.6 changes in there. Otherwise, there'll be too much confusion about features that have been in 3.1 for almost a year, but not in 3.2. Oh, and documentation updates, of course.

Ideally, if we could get 3.1.7 and 3.2.0b4 released in close proximity to each other, and with all 3.1.7 fixes also in 3.2.0b4, then we could feel reasonably confident in saying 3.1.7 is the end of the line for 3.1, and 3.2 is getting solid enough for production use. With that, I think we'd probably cut a quarter to a third of the repeat questions on the lists, which I attribute to a lag in getting new releases out.

Other side projects like defaults.xml are great, but this seems to be shaping up to be a much bigger task than originally envisioned, what with the idea of maintaining multiple translations. It's great, but it shouldn't hold up 3.2.0b4, nor the much needed corrections/additions to defaults.cc's documentation fields.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
|
From: Geoff H. <ghu...@ws...> - 2002-10-17 22:30:00
|
I talked to Neal off-list, so I'd like to clarify as well. I think the three of us are thinking basically the same thing, but it doesn't help when we talk about "3.3" or "4.0." So let's talk about "how to get 3.2.0b4 out soon."

On Thu, 17 Oct 2002, Gilles Detillieux wrote:
> > I guess it comes down to this: I think the code is good enough now to
> > consider a release in the near-term without a raft of changes/improvements.
...
> was the last actual release of 3.2. We need to get 3.2.0b4 out soon, if
> only to give b3 a proper burial.

I think we all agree here, with the caveat below. I *hate* apologizing for the "known database bug." Neal: I read your statement as "let's release 3.2 with what we have." I'm not sure I agree with that.

> compression retro-fit. This has to be the default behaviour - we can't
> put another beta out with the current buggy word db compression code.

Agreed. If Neal can get me his zlib patch soon, then we can put that in, test it, and try a 3.2.0b4 with it sooner rather than later.

> 3) My own lack of time in being able to get the 3.1.6 fixes/updates
> forward ported to 3.2.

If you have a list of particular things, it would help significantly. I'll check through the mailing list, but if you have a list somewhere it'd save some time.

> library (iconv). I think Neal's idea of the zlib-WordDB-compression
> retrofit has merit, if only to get an interim beta 4 out the door soon.
> I see it as a quicker solution to the reliability issue.

I think we're all on the same page here, though I'd like to see the patch first, obviously. I've been working on the mifluz merge because I think it needs to be done and b/c I can't see how we can ship a 3.2.0b4 with these database bugs. If there's a smaller bug-fix, that's great. :-)

> The only other thing I see as essential for 3.2.0b4 is getting the
> 3.1.6 changes in there. Otherwise, there'll be too much confusion

I think there are a few remaining minor bugs which we should probably stomp along the way.

> Other side projects like defaults.xml are great, but this seems to be
> shaping up to be a much bigger task than originally envisioned, what

No offense to Gabriele, but I'd rather consider translations to the documentation _after_ we switch to an XML documentation setup.

Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can see a patch in the near future. I'm willing to handle the documentation fixes by hand if I need to do it.

-Geoff |
|
From: Brian W. <bw...@st...> - 2002-10-17 23:53:51
|
At 08:29 18/10/2002, Geoff Hutchison wrote:
> > Other side projects like defaults.xml are great, but this seems to be
> > shaping up to be a much bigger task than originally envisioned, what
>
>No offense to Gabriele, but I'd rather consider translations to the
>documentation _after_ we switch to an XML documentation setup.

I've been thinking about this one - and the more I do, the more I agree. The problem isn't creating translated versions of the attributes - the problem is creating translated versions of everything else and managing how that all fits together. I think at some point the documentation needs to be reviewed, and part of that should be internationalisation (i18n) - but it doesn't sound like that time is now. Also - bolting a translation system onto the current (:-)!) defaults.xml will take very little retrofitting and reworking, so it doesn't need to be specially taken into account.

I vote we do the first step and get a basic defaults.xml up - small steps!

>Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can
>see a patch in the near future. I'm willing to handle the documentation
>fixes by hand if I need to do it.
>
>-Geoff

-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...

Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste |
|
From: Gabriele B. <g.b...@co...> - 2002-10-18 07:00:45
|
> No offense to Gabriele, but I'd rather consider translations to the
> documentation _after_ we switch to an XML documentation setup.
>
> Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can
> see a patch in the near future. I'm willing to handle the documentation
> fixes by hand if I need to do it.

No worries, Geoff! :-) I agree with you. I always keep in mind the famous words of Bill Murray: 'Baby steps'. :-P

Let's start with the English one, as it is now. We'll worry about translations afterwards if 'the game is worth the candle' (sorry, it is an Italian proverb - it means if it is worthwhile).

My question regards the net library: I know of 2 'bugs' regarding compressed documents in particular (#594790 and #460819), but I'd rather wait on those and just leave the HTTP library as is. As far as you know, are there important bugs I should fix?

Ciao and thanks
-Gabriele

--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ; |
|
From: Gilles D. <gr...@sc...> - 2002-10-18 17:39:13
|
According to Geoff Hutchison:
> I talked to Neal off-list, so I'd like to clarify as well. I think the
> three of us are thinking basically the same thing, but it doesn't help
> when we talk about "3.3" or "4.0." So let's talk about "how to get 3.2.0b4
> out soon."

Agreed. We can hammer out the details of later versions later. For now, though, we need a reliable 3.2.0b4 out there.

My only immediate concern, which just occurred to me this morning, is the confusion caused by 18 months' worth of 3.2.0b4 snapshots out there, incorporated into RPMs and such. When a bug report mentions 3.2.0b4, will we be able to trust that it's actually the official 3.2.0b4 release? Would it be helpful to skip b4 and jump to b5?

> On Thu, 17 Oct 2002, Gilles Detillieux wrote:
> > 3) My own lack of time in being able to get the 3.1.6 fixes/updates
> > forward ported to 3.2.
>
> If you have a list of particular things, it would help significantly. I'll
> check through the mailing list, but if you have a list somewhere it'd save
> some time.

I haven't updated the list since I sent it to Jessica Biola back in August. Here it is again...

--- ---
From: Gilles Detillieux <gr...@sc...>
Subject: Re: [htdig-dev] Features in 3.1.6 and not in 3.2.0b4?
To: jes...@ya... (Jessica Biola)
Cc: htd...@li...
Date: Fri, 16 Aug 2002 17:19:43 -0500 (CDT)

According to Jessica Biola:
> Are there any features that are in 3.1.6 and not in
> 3.2.0b4? If so, could someone kindly provide a list
> of the features? (i.e. ignore_dead_servers)

I haven't yet compiled an exhaustive list of these. The sketchy list I have so far is...

- multi-excerpt patch (max_excerpts attribute) for htsearch/Display.cc
- better handling of htdig -m option
- add startyear et al. to defaults.cc
- make startyear et al. handle relative date ranges in Display.cc
- fuzzy endings patch and updated english.0 file
- get updated external parser scripts into contrib directory
  (fix eof handling bug in .pl scripts)
- list-all feature in htsearch for a query of * or prefix_match_character
- ignore_dead_servers attribute
- description_meta_tag_names attribute
- ignore_alt_text attribute
- translate_latin1 attribute, with hooks into SGMLCodec class
- search_rewrite_rules attribute
- anchor_target attribute
- search_results_contenttype attribute
- boolean_keywords attribute
- boolean_syntax_errors attribute
- multimatch_method attribute (still VERY buggy in 3.1.6 though)

The only way to get a really complete list is to go through the release notes and ChangeLog for 3.1.6, and make sure that each of these things (or something equivalent) is in the 3.2 CVS tree already.
--- ---

Note that some items in the list may already be in the 3.2 cvs. I just haven't checked yet. Also, a close look at the 3.1.6 ChangeLog may reveal bug fixes I've missed in both the list above and the 3.2 cvs.

> > library (iconv). I think Neal's idea of the zlib-WordDB-compression
> > retrofit has merit, if only to get an interim beta 4 out the door soon.
> > I see it as a quicker solution to the reliability issue.
>
> I think we're all on the same page here, though I'd like to see the patch
> first, obviously. I've been working on the mifluz merge because I think it
> needs to be done and b/c I can't see how we can ship a 3.2.0b4 with these
> database bugs. If there's a smaller bug-fix, that's great. :-)

Sounds like a plan!

> > The only other thing I see as essential for 3.2.0b4 is getting the
> > 3.1.6 changes in there. Otherwise, there'll be too much confusion
>
> I think there are a few remaining minor bugs which we should probably
> stomp along the way.

Yes, we should comb through the bug database for anything that's tackleable and/or urgent enough to warrant working on for b4.
As for Gabriele's question about the Content-Encoding header handling in HTTP/1.1, I'd say that depends. Is Content-Encoding header handling optional in an HTTP/1.1 client, or is it fully up to the server's discretion whether it is used? If HTTP/1.1 clients are required, by the standard, to recognize Content-Encoding, then I'd call it a bug that htdig doesn't. If the standard makes it optional, then I'd think there should be a way htdig can tell the server "I don't grok this."

> > Other side projects like defaults.xml are great, but this seems to be
> > shaping up to be a much bigger task than originally envisioned, what
>
> No offense to Gabriele, but I'd rather consider translations to the
> documentation _after_ we switch to an XML documentation setup.
>
> Personally, I'd consider switching to defaults.xml for 3.2.0b4 if I can
> see a patch in the near future. I'm willing to handle the documentation
> fixes by hand if I need to do it.

Again, sounds like a plan. My concern was that the whole translation issue was going to affect the design of the XML DTD and coding for defaults.xml, and it would take a while to nail that down too. If the basic framework and the English version of the file can be readied in time for b4, let's go with that, and fill in other languages later.

Another question to consider for this is, do we want all languages in the same file, or do we want separate defaults.xml files for each one? What about encodings? We'd need to handle different encodings for different languages, and pass this encoding specification into the generated HTML files. (Or is it all Unicode?) I know we don't need to nail all this down now, but I thought I'd ask just in case these issues affect the basic design we need to get in place for 3.2.0b4.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
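For reference on the Content-Encoding question: HTTP/1.1 (RFC 2616, section 14.3) gives a client the Accept-Encoding request header for listing the content codings it can handle, and listing only "identity" asks the server for an unencoded body. The Python sketch below is illustrative only and is not ht://Dig's HTTP library; build_request and can_decode are hypothetical names, and it assumes a client that understands no compression at all.

```python
def build_request(host, path, accepted=("identity",)):
    """Build a minimal HTTP/1.1 GET request as a string (hypothetical helper).

    The Accept-Encoding header lists the content codings this client can
    decode; advertising only "identity" asks for an unencoded body.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept-Encoding: " + ", ".join(accepted),
        "Connection: close",
        "",  # these two empty entries make the joined string end in
        "",  # CRLF CRLF, which terminates the header block
    ]
    return "\r\n".join(lines)


def can_decode(content_encoding, supported=("identity",)):
    """Return True if every coding named in a response's Content-Encoding
    value is one this client can undo (hypothetical helper).

    A missing header (None) means no content coding was applied.
    """
    if content_encoding is None:
        return True
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    return all(c in supported for c in codings)
```

A client that checks the response with something like can_decode can skip or report a document it cannot decode instead of indexing compressed bytes; adding gzip or deflate support would be a separate decision.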
|
From: Gilles D. <gr...@sc...> - 2002-10-18 20:12:19
|
Earlier today, I wrote...
> According to Geoff Hutchison:
> > On Thu, 17 Oct 2002, Gilles Detillieux wrote:
> > > 3) My own lack of time in being able to get the 3.1.6 fixes/updates
> > > forward ported to 3.2.
> >
> > If you have a list of particular things, it would help significantly. I'll
> > check through the mailing list, but if you have a list somewhere it'd save
> > some time.
>
> I haven't updated the list since I sent it to Jessica Biola back in August.
> Here it is again...
...
> - multi-excerpt patch (max_excerpts attribute) for htsearch/Display.cc
> - better handling of htdig -m option
> - add startyear et al. to defaults.cc
> - make startyear et al. handle relative date ranges in Display.cc
> - fuzzy endings patch and updated english.0 file
> - get updated external parser scripts into contrib directory
>   (fix eof handling bug in .pl scripts)
> - list-all feature in htsearch for a query of * or prefix_match_character
> - ignore_dead_servers attribute
> - description_meta_tag_names attribute
> - ignore_alt_text attribute
> - translate_latin1 attribute, with hooks into SGMLCodec class
> - search_rewrite_rules attribute
> - anchor_target attribute
> - search_results_contenttype attribute
> - boolean_keywords attribute
> - boolean_syntax_errors attribute
> - multimatch_method attribute (still VERY buggy in 3.1.6 though)
...
> Note that some items in the list may already be in the 3.2 cvs. I just
> haven't checked yet. Also, a close look at the 3.1.6 ChangeLog may reveal
> bug fixes I've missed in both the list above and the 3.2 cvs.

OK, scratch the first sentence in the paragraph above. I just looked over the list, and I'm fairly certain that none of these are in 3.2 cvs yet. Lachlan's patch to defaults.cc will add startyear et al. to defaults.cc, but it doesn't include the full description that 3.1.6's attrs.html file has for these attributes.

And in a separate message, Jim Cole wrote...
> Gilles - Please let me know what I can do to help you with all
> the 3.1.x and 3.1.x->3.2 issues that seem to fall into your
> court. I am a proficient C/C++ programmer. My knowledge of the
> auto* tools is minimal, but I have a book :) I can find my way
> around CVS.
>
> I can't promise a lot as my free time is nearly non-existent at
> the moment, not to imply that the same is not true for others
> around here. However if you have a couple tasks to toss my way, I
> will see what I can do. I am certainly not going to make any
> significant contributions by forever sticking to the "not enough
> time" excuse ;)

Well, since you're offering, I wouldn't mind a hand in going through the 3.1.6 ChangeLog to see what other changes still need to go into 3.2. I'd like to add to the list above to make it a complete list of 3.1.x things still needed for 3.2. If that's not too unappealing a task to start with, that would be a good starting point.

Apart from that, the htsearch changes and htsearch-related attributes in the list above would probably be the easiest ones to get into 3.2, and probably the most pressing, because during 3.1.6 development I deliberately held back parallel changes to 3.2's htsearch because of other changes that were to take place there but never fully materialized. The htfuzzy, endings and contrib changes should all be pretty straightforward too.

How's that for starters? If you want help or clarification on any of these, please let me know. And regardless of what you do end up working on, thank you for offering!

I'll look after these two...
> - better handling of htdig -m option
> - translate_latin1 attribute, with hooks into SGMLCodec class
because I have a pretty clear idea in mind of what I want to do with those.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
|
From: Jim C. <gre...@yg...> - 2002-10-27 11:39:54
|
On Friday, October 18, 2002, at 02:12 PM, Gilles Detillieux wrote: > Well, since you're offering, I wouldn't mind a hand in going through > the 3.1.6 ChangeLog to see what other changes need to go in 3.2 still. > I'd like to add to the list above to make it a complete list of 3.1.x > things still needed for 3.2. > > If that's not too unappealing a task to start with, that would be a > good I am not too picky :) I will help where needed. Did you have a particular approach in mind? Is it safe to compare ChangeLog files, or should entries in the 3.1.6 ChangeLog be checked against the 3.2 source? Have you already checked part of the ChangeLog? How far back do we need to go? Other suggestions/advice? Jim |
|
From: Gilles D. <gr...@sc...> - 2002-10-30 20:57:07
|
According to Jim Cole: > On Friday, October 18, 2002, at 02:12 PM, Gilles Detillieux wrote: > > Well, since you're offering, I wouldn't mind a hand in going through > > the 3.1.6 ChangeLog to see what other changes need to go in 3.2 still. > > I'd like to add to the list above to make it a complete list of 3.1.x > > things still needed for 3.2. > > > > If that's not too unappealing a task to start with, that would be a > > good > > I am not too picky :) I will help where needed. Did you have a > particular approach in mind? Is it safe to compare ChangeLog files, or > should entries in the 3.1.6 ChangeLog be checked against the 3.2 > source? Have you already checked part of the ChangeLog? How far back do > we need to go? Other suggestions/advice? Sorry for taking so long to reply. I lost track of your message in the midst of the deluge. Comparing ChangeLog entries between 3.1.6 and the 3.2 cvs would be the first step, and would find most of the missing stuff. Note that there may be differences in wording between the two, especially if someone other than me made the entry in the 3.2 ChangeLog. If you can't find anything close in the 3.2 ChangeLog to a given 3.1.6 entry, then comparing the specific changes in the source against the 3.2 source would be the next step. I'd be glad to answer any questions you have at that stage, including punting a few ChangeLog entries my way for my verification. Potentially all entries since 3.1.5 was released would need to be checked. That may seem like a lot, given that 3.1.5 was released almost 2 full years before 3.1.6, but the CVS tree for 3.1.x was dormant for a long time after 3.1.5. Thanks. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
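The first step Gilles describes, matching 3.1.6 ChangeLog entries against the 3.2 ChangeLog while tolerating differences in wording, could be partly automated. The following is a hypothetical C++ sketch, not a tool from the ht://Dig tree: it assumes the entries have already been split into strings, treats each entry as a set of lower-cased words, and flags any 3.1.6 entry whose best word-overlap (Jaccard) score against the 3.2 entries falls below an assumed threshold of 0.5. Flagged entries are only candidates for the second step, checking the source by hand.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a ChangeLog entry into a set of lower-cased words.
// Anything other than letters, digits and '_' is treated as a separator.
std::set<std::string> word_set(const std::string& entry) {
    std::string cleaned;
    for (char c : entry) {
        unsigned char uc = static_cast<unsigned char>(c);
        cleaned += (std::isalnum(uc) || c == '_') ? static_cast<char>(std::tolower(uc)) : ' ';
    }
    std::istringstream in(cleaned);
    std::set<std::string> words;
    std::string w;
    while (in >> w) words.insert(w);
    return words;
}

// Jaccard similarity: size of intersection / size of union (0 if both empty).
double similarity(const std::set<std::string>& a, const std::set<std::string>& b) {
    if (a.empty() && b.empty()) return 0.0;
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    double inter = static_cast<double>(common.size());
    return inter / (a.size() + b.size() - inter);
}

// Return the entries of old_log that have no entry in new_log scoring at or
// above `threshold` (a hypothetical cutoff): candidates for a manual check.
std::vector<std::string> unported(const std::vector<std::string>& old_log,
                                  const std::vector<std::string>& new_log,
                                  double threshold = 0.5) {
    std::vector<std::set<std::string>> new_sets;
    for (const auto& e : new_log) new_sets.push_back(word_set(e));

    std::vector<std::string> missing;
    for (const auto& e : old_log) {
        auto ws = word_set(e);
        bool found = std::any_of(new_sets.begin(), new_sets.end(),
            [&](const std::set<std::string>& s) { return similarity(ws, s) >= threshold; });
        if (!found) missing.push_back(e);
    }
    return missing;
}
```

The threshold is arbitrary; setting it too low would hide genuinely unported changes behind loosely similar 3.2 entries, so it is safer to err toward flagging too many entries and let a person weed them out.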
|
From: Jim C. <gre...@yg...> - 2002-11-29 07:02:04
|
On Wednesday, October 30, 2002, at 01:56 PM, Gilles Detillieux wrote: > Comparing ChangeLog entries between 3.1.6 and the 3.2 cvs would be the > first step, and would find most of the missing stuff. Note that there The results of my comparison are at http://www.yggdrasill.net/htdig/cl.txt The file is essentially the 3.1.6 ChangeLog from Feb 25 10:11:50 2000 forward, with all forward ported items removed. I used the htdig-3.2.0b4-20021117 snapshot for the final comparison; for the most part, all comparisons were made against the related files, rather than the ChangeLog. There are a few of my own notes inserted along the way. I am 99.9% certain that a number of entries correspond to places where functionality has been intentionally removed or reimplemented in a different location/manner; however I figured it might be best to play dumb and let others with more knowledge make the final decision on those items. Jim |
|
From: Geoff H. <ghu...@ws...> - 2002-12-03 05:36:24
|
On Friday, November 29, 2002, at 01:02 AM, Jim Cole wrote: > The results of my comparison are at > http://www.yggdrasill.net/htdig/cl.txt The file is essentially the > 3.1.6 ChangeLog from Feb 25 10:11:50 2000 forward, with all forward I wouldn't focus on the contrib/ or htdoc/ changes. Obviously these need to be made, but especially in the case of htdoc/ some changes are sync'ed at the very end. (FAQ.html comes to mind.) The changes to Makefile.in files don't really apply since everything is generated by automake now. I'll take a look at the various configure.in changes. -Geoff |
|
From: Neal R. <ne...@ri...> - 2002-10-18 20:35:48
|
Hey,
Go to http://ai.rightnow.com/htdig/ for a link to a patch to
htdig-3.2.0b4-20021013 that adds zlib-based WordDB compression.
This patch is a workaround for the WordDB compression errors we are
seeing in current snapshots.
It adds a new config option 'wordlist_compress_zlib' that is true by
default. Note also that this feature uses 'compression_level' as the
zlib compression parameter, the same attribute already used to compress
the excerpts.
Thanks!
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Jim C. <gre...@yg...> - 2002-10-18 03:55:38
|
Gilles Detillieux's bits of Thu, 17 Oct 2002 translated to: >3) My own lack of time in being able to get the 3.1.6 fixes/updates >forward ported to 3.2. I'd be thrilled if someone else picked up the ball >on this one, but since pretty much everyone sees me as the 3.1 guy (my >own fault for that), I feel the expectation is that I should be the one >to do this. Gilles - Please let me know what I can do to help you with all the 3.1.x and 3.1.x->3.2 issues that seem to fall into your court. I am a proficient C/C++ programmer. My knowledge of the auto* tools is minimal, but I have a book :) I can find my way around CVS. I can't promise a lot as my free time is nearly non-existent at the moment, not to imply that the same is not true for others around here. However if you have a couple tasks to toss my way, I will see what I can do. I am certainly not going to make any significant contributions by forever sticking to the "not enough time" excuse ;) Jim |
|
From: Geoff H. <ghu...@ws...> - 2002-10-16 14:31:47
|
On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote: > 2. The mifluz devel list is near death, and it doesn't look like > anyone > is actually using mifluz, or furthering development. Fine, but that simply does not mean that prior releases were not made with active users, developers or testing. There has been much more significant testing (on my part included) on the mifluz framework than the remainder of the ht://Dig codebase. > Can you say that it has had as much as the average HtDig release? > HtDig > is MUCH more active then mifluz has ever been. In terms of testing by the developers, component-level testing suites and testing before releases--the answer is pretty much yes. Granted, the mifluz releases between 0.14 (currently in 3.2.0b4) and 0.23 have not necessarily received the same pounding as thousands of ht://Dig users. But the users who were active with mifluz poured gigabytes of data through it too. Remember also that we *are* mifluz. Take a look at the copyright designations. > 4. How certain are we that these changes are going to make 3.2beta5 > MORE stable than the current beta? I'm certain. I put a lot of testing into the mifluz code and it's definitely more stable now than it was. > 5. The current mifluz code merge has problems with constructors and > destructors in a library (libhtdig) setting. I would rather help No offense, but your argument applies here. Why should libhtdig be a feature criteria for 3.2.0b4? > 6. It has performance problems. These seem like they're locking issues--it seems like the database is being locked and unlocked way too much. When we're indexing, it seems like the database should be locked in place as much as possible and then unlocked at the end. > My experience with the current snapshots is very positive. I've had few > problems and the indexing it self is pretty solid, especially with the > new > zlib WordDB compression. 
Sorry to sound dubious, but speaking of large code merges, you haven't submitted patches for me to merge into 3.2.0b4 either. As of yet, I haven't tested your zlib WordDB compression or seen if it has performance problems relative to 3.2.0b4. Can I claim that your code has seen as much user-level testing as 3.2.0b4 snapshots? I'm somewhat trying to play devil's advocate here. My gut feeling is that the mifluz merge should be aimed towards a 3.2.0b5 release and we *should* get 3.2.0b4 out the door as stable as possible in the near-term. But I'm pretty sure that merging in the new mifluz code is an overall win. -Geoff |
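Geoff's guess about the performance problem is that the database is locked and unlocked once per operation instead of being held for the whole indexing run. The difference can be illustrated with a hypothetical C++ sketch: `WordStore` and its methods are invented for illustration and are not the ht://Dig, mifluz or Berkeley DB API, and `std::mutex` stands in for the database lock. The sketch only counts lock acquisitions.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a word database guarded by a single lock.
// Not the ht://Dig, mifluz or Berkeley DB API.
class WordStore {
public:
    std::size_t lock_count = 0;  // number of times the lock was acquired

    // Per-word locking: one lock/unlock cycle for every insert.
    void insert_locked(const std::string& word) {
        std::lock_guard<std::mutex> guard(mu_);
        ++lock_count;
        words_.push_back(word);
    }

    // Batch locking: lock once, insert everything, unlock at the end.
    void insert_batch(const std::vector<std::string>& words) {
        std::lock_guard<std::mutex> guard(mu_);
        ++lock_count;
        words_.insert(words_.end(), words.begin(), words.end());
    }

    std::size_t size() const { return words_.size(); }

private:
    std::mutex mu_;
    std::vector<std::string> words_;
};
```

Inserting N words one at a time costs N lock/unlock cycles, while the batched version costs one. The trade-off is that other readers are shut out for the whole batch, which is usually acceptable while htdig is building the index.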
|
From: Neal R. <ne...@ri...> - 2002-10-16 19:04:40
|
On Wed, 16 Oct 2002, Geoff Hutchison wrote: > > On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote: > > > 2. The mifluz devel list is near death, and it doesn't look like > > anyone > > is actually using mifluz, or furthering development. > > Fine, but that simply does not mean that prior releases were not made > with active users, developers or testing. There has been much more > significant testing (on my part included) on the mifluz framework than > the remainder of the ht://Dig codebase. I agree in theory. In practice, until the new code has been verified to be acceptable after a successful merge, it is suspect. We hope that it will fix all our problems, but it will be a while before we confirm this. > > 5. The current mifluz code merge has problems with constructors and > > destructors in a library (libhtdig) setting. I would rather help > > No offense, but your argument applies here. Why should libhtdig be a > feature criteria for 3.2.0b4? I agree, it's not a criteria. I will maintain a separate branch for that. > > My experience with the current snapshots is very positive. I've had few > > problems and the indexing it self is pretty solid, especially with the > > new > > zlib WordDB compression. > > Sorry to sound dubious, but speaking of large code merges, you haven't > submitted patches for me to merge into 3.2.0b4 either. As of yet, I > haven't tested your zlib WordDB compression or seen if it has > performance problems relative to 3.2.0b4. Can I claim that your code has > seen as much user-level testing as 3.2.0b4 snapshots? Heh. ;-) I'll get you those ASAP. Zlib is extremely well tested and the changes are a few lines of code. Giving this as a workaround to people who encounter the WordDB compression bug is a good alternative to hoping that it's fixed in a merged-mifluz codebase. > I'm somewhat trying to play devil's advocate here.
My gut feeling is > that the mifluz merge should be aimed towards a 3.2.0b5 release and we > *should* get 3.2.0b4 out the door as stable as possible in the > near-term. But I'm pretty sure that merging in the new mifluz code is an > overall win. I agree in theory. In practice I am motivated to suggest we scale back to what is absolutely necessary in order to get users a new release faster. Gilles in particular has voiced frustration over the delay in the 3.2 release and the waste of his time maintaining 3.1.x. I'd hate to continue adding to the pile and further frustrate him. If we were a company and were risking the speedy completion of a release by wanting to incorporate a huge chunk of third-party code that needs more work... we'd be in real danger of getting fired. I guess I see these things: 1. The 3.2 dev process is too open-ended at present. 2. The 3.1.x users need a new release. 3. The current 3.2beta4 code offers a significant release to users. 4. We are in danger of being waist deep in feature-creep quicksand. If we delay the integration of mifluz and the larger items on your list, we'll have a tractable set of things to do to get a decent release out there. Basically I'm suggesting that for morale purposes alone we do this and set a goal of pushing a 3.2 release out the door by December. Next, we make a list and divide it between smaller changes and larger ones. Smaller ones go into 3.3 (release in March?) and the rest into 4.0. The development could be semi-parallel at this point. You may disagree with the "numbers game" here, but I think it would be good for morale to establish a set of well-reasoned conservative milestones and meet them in the short-term. If we implement a strategy like this and six months later we look back and see that we've had 1-2 releases and are moving forward with integration of large new features/code, we'll feel much better than if we were still stuck in feature-creep quicksand.
Here's a proposal: http://ai.rightnow.com/htdig/proposed_schedule.html Basically I included in the 3.2 schedule only the things that are necessary to fix or work around known bugs. Things like Quim's new search framework and the excellent XML-config file feature are in 3.3. More open-ended things like the mifluz merge, STL and Unicode are in 4.0 & 4.1. Also, the Zlib-WordDB in 3.2 and the more efficient WordDB inverted index are straightforward and buy us time with the mifluz merge. Anyway, I'm sure you won't agree with my thoughts on the mifluz merge, and this is certainly a conservative viewpoint on it. If we make good progress on the mifluz merge by the end of the year, I'll withdraw any further objections. Eh? Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |