[documancer-devel] Re: new Documancer caching code checked in


Hi Vaclav,

On Jan 23, 2005, at 1:56 PM, Vaclav Slavik wrote:

> Hi,
>
> I checked in the long-promised generic caching code, any testing would
> be most welcome. Aside from the big internal changes in how
> updates are carried out, there are some UI improvements as well:
>
> * Updating book A's index no longer disables searching in book B.
> * You can no longer attempt to search in a book without an index.
>   Previously, this would result in the UI blocking and waiting
>   until the index is regenerated. Now, Documancer will warn you
>   that the index is missing (see attached noindex.png) and will
>   disable the search box -- it will be enabled again when
>   indexing is done (remember: indexing happens in background,
>   it's no longer triggered by user action!).
> * If a book has an index, you can search it even if the index is
>   out of date. The index is generated in a temporary directory
>   and replaces the old index when it's fully created, so as long as
>   there is at least _some_ index, you can do (potentially inexact)
>   searches (see attached oldindex.png).
> * Indexing of all books (i.e. even those not currently opened) is
>   done in background when you're reading docs, meaning that
>   there's a good chance that Documancer will already be done
>   generating the indexes when you'll need them.

+4 :-)

> Still missing:
>
> * Prioritization of the updater -- when you select a book in
>   Documancer, the updater should stop doing whatever it is it's
>   working on and update items owned by currently selected book
>   as soon as possible.

+1
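
To make sure we mean the same thing by prioritization, here is a rough
sketch of the ordering I have in mind. UpdateQueue and its methods are
names I made up for illustration, not anything in the checked-in code,
and it only reorders pending items; it does not interrupt an update
that is already running.

```python
from collections import deque


class UpdateQueue:
    """Hypothetical queue of pending updates, keyed by owning book."""

    def __init__(self):
        self._pending = deque()      # (book, item) pairs in arrival order
        self.selected_book = None    # book currently selected in the UI

    def add(self, book, item):
        self._pending.append((book, item))

    def select_book(self, book):
        self.selected_book = book

    def next_item(self):
        """Return the next (book, item), preferring the selected book.

        Returns None when nothing is pending.
        """
        for i, (book, item) in enumerate(self._pending):
            if book == self.selected_book:
                del self._pending[i]
                return (book, item)
        if self._pending:
            return self._pending.popleft()
        return None
```

The updater would call next_item() each time it finishes a unit of
work, so a newly selected book jumps the line at the next opportunity.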

> * Autodetection of outdated books -- Documancer should check the
>   HTML/info/man/whatever files and regenerate the index if they
>   changed; currently, you have to do it manually.

Can we make this feature an option for the moment, at least until 
Documancer's parser can handle more special cases? For example, 
sometimes EClasses have redirect pages, and those would stop the 
indexer. (As to why they do, it's a bit of a long story. ;-) Anyway,
it's better for EClass books to use an outdated index than to rebuild
the index from the redirect page alone.
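
Roughly what I'm picturing, as a sketch only: the autodetect flag, the
function names, and the guess that a redirect page is a meta-refresh
page are all my assumptions, not existing Documancer code. Staleness is
checked here by comparing file modification times, which is just one
way to do it.

```python
import os
import re

# Assumption: a "redirect page" is an HTML page with a meta refresh tag.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh', re.IGNORECASE)


def is_redirect_page(html):
    """Return True if the page looks like a meta-refresh redirect."""
    return META_REFRESH.search(html) is not None


def index_is_outdated(source_paths, index_path):
    """True if the index is missing or older than any source file."""
    if not os.path.exists(index_path):
        return True
    index_mtime = os.path.getmtime(index_path)
    return any(os.path.getmtime(p) > index_mtime for p in source_paths)


def should_reindex(source_paths, index_path, root_html, autodetect=False):
    """Decide whether to regenerate a book's index automatically."""
    if not autodetect:
        return False          # option is off: manual updates only
    if is_redirect_page(root_html):
        return False          # keep the old index rather than index a redirect
    return index_is_outdated(source_paths, index_path)
```

With autodetect off by default, nothing changes for existing users
until they opt in.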

Also, I should mention a couple of features I'd like to add. One is
support for searching metadata, and probably also generating metadata 
indexes. I'd like people to be able to look for documents produced by 
the United Nations, or publications from 1986, for example. Or browse 
an alphabetical list of all organizations that have documents in the 
book. EClass lets users specify this sort of metadata, and I think it 
would be good to allow Documancer users to perform more focused 
searches. For Documancer-generated indexes, we can probably read Dublin
Core tags to get this metadata from HTML documents. I'm not sure if
info or man pages specify this info in a standard way.
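
As a quick illustration of the HTML case, here is a minimal sketch that
collects Dublin Core values from meta tags, assuming the documents use
the usual name="DC.xxx" convention. The class and function names are
made up for this example, and it uses only the standard library's
html.parser.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> values from a page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}   # e.g. {"creator": ["..."], "date": ["..."]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.lower().startswith("dc.") and content is not None:
            key = name[3:].lower()   # strip the "DC." prefix
            self.metadata.setdefault(key, []).append(content)


def extract_dublin_core(html):
    """Return a dict mapping Dublin Core element names to value lists."""
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

The resulting dict could feed a per-book metadata index, so that a
search for a given creator or date only has to look at these keys.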

Another feature - what I'd call "bookshelves". Right now, Documancer 
presents you simply with a list of books, but what I'd like to see are 
books organized by categories. Like, a "Programming" bookshelf, with 
books related to programming only. Of course, there will be an "All" 
bookshelf, the default, that will show every book. The purpose of this 
is of course to keep the books list from getting too crowded as time 
goes on. (And we intend to use it for distributing content in fields 
like Economics, Agriculture, etc. so you can see how users could get 
bogged down with books!) How I foresee things is that you select your 
"Bookshelf" from the drop-down list on the main page, then if you click 
on the "Browse" tab, you get a list of the books in that bookshelf 
inside the tree view. Eventually, with books like EClass books, you can 
also expand their contents in the browse tree view. If Documancer can't 
determine the contents, it will just show the root page of the book.
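
Data-wise, the bookshelf idea could be as simple as the sketch below.
The function name and the (title, categories) input shape are
placeholders I picked for illustration; the real thing would read this
from the book files.

```python
ALL_SHELF = "All"   # default shelf that always lists every book


def build_bookshelves(books):
    """Group book titles by category.

    books is an iterable of (title, categories) pairs, where categories
    is a list of shelf names. Every book also lands on the "All" shelf.
    """
    shelves = {ALL_SHELF: []}
    for title, categories in books:
        shelves[ALL_SHELF].append(title)
        for category in categories:
            shelves.setdefault(category, []).append(title)
    return shelves
```

The drop-down would list the keys, and the "Browse" tree would show the
titles for whichever key is selected.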

In fact, we can use the "content package" XML file format to manage 
bookshelves too, so we get bookshelf support and EClass TOC support in 
one shot. Basically, I just need to clean up my conman module and make 
it into a portable Python module. I should do the same for
wxbrowser.py, so that we can start using it with Documancer as
well. I think I've pretty much patched up wxWebKitCtrl, so I'll be
using that for the Mac version of Documancer.

But all of this doesn't have to be in the next release. I'd like to see 
a 0.2.4 soon, probably after the caching code is tweaked and I've 
committed the indexer code. Anyways, I think that's all for now. ;-)

Thanks,

Kevin

> Regards,
> Vaclav
>
> -- 
> PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/
> <noindex.png><oldindex.png>