documancer-devel Mailing List for Documancer
Status: Beta
Brought to you by:
vaclavslavik
Messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005 | 4 | 2 |  |  |  |  |  |  |  |  |  |  |
| 2006 |  | 1 |  |  | 1 |  |  |  |  |  |  |  |
From: Vaclav S. <vs...@fa...> - 2006-05-16 18:50:22
Hi,

I switched the Documancer mailing lists into subscriber-only mode in order to cut down the volume of spam getting through (SF.net's spam filter leaves much to be desired). Contrary to the subject, this does _not_ mean that non-subscribers cannot post -- it only means that subscribers' posts are delivered straight away, while non-members' mails are held for the moderator's approval.

I'll change it back if there's strong opposition to it, but I hope it's OK, as the worst it may cause is that some legitimate posts will be delayed by ~ one day.

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://pgp.mit.edu/
From: Vaclav S. <vs...@fa...> - 2006-02-27 00:04:17
Hi,

now that SF.net offers Subversion, I moved the Documancer repository to SF.net. Updated instructions are at http://documancer.sf.net/download.php. The new checkout command is

    svn co https://USE...@sv.../svnroot/documancer/documancer/trunk

This works for both developer and anonymous access; you're asked for credentials when you try to write something to the repository.

Change notifications will now go to doc...@li... instead of documancer-svn@berlios. I didn't resubscribe anybody (that is, Kevin ;) -- I didn't know if your tulane.edu address is still working), so please resubscribe. Alternatively, I'm considering using CIA RSS feeds instead...

Regards,
Vaclav
From: Vaclav S. <vs...@fa...> - 2005-02-08 21:11:41
Hi,

I wasn't able to make SVN notifications work with doc...@li..., so I "deleted" that list and created doc...@li... -- if you want to see what's going on in CVS, feel free to subscribe to it at https://developer.berlios.de/mail/?group_id=3057

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/
From: Vaclav S. <vs...@fa...> - 2005-02-07 21:01:35
Hi,

I finished moving the Documancer repository to BerliOS's Subversion hosting. You need to register at http://developer.berlios.de and mail me your login to be added to the project (http://developer.berlios.de/projects/documancer). After that, you can check out the trunk (aka CVS HEAD) using this command:

    svn checkout svn+ssh://YOU...@sv.../svnroot/repos/documancer/documancer/trunk

Regards,
Vaclav
From: Kevin & M. O. <kev...@th...> - 2005-01-26 00:57:44
Hi Vaclav,

On Jan 25, 2005, at 3:32 PM, Vaclav Slavik wrote:

[snip]

>> Can we make this feature an option for the moment, at least until Documancer's parser can handle more special cases? For example, sometimes EClasses have redirect pages, and those would stop the indexer. (As to why they do, it's a bit of a long story. ;-) Anyways, it's better for EClass books to use an outdated index than recreate the index by only indexing the redirect page and nothing else.
>
> I don't understand what the problem is: as long as the index is generated from the files (i.e. is newer than them), it's considered up-to-date and not regenerated, so it shouldn't be a problem when running from a CD, should it?

Actually, I guess it depends on how sophisticated the detection of an outdated index is. As long as it doesn't try to reindex the documentation when it shouldn't, I should be fine. (BTW, I did eventually want to give people the option to copy the CD to their hard drive for faster/easier access.)

> As for the "for the moment" part, that's already done because it's not implemented yet ;-)
>
>> Also, I should mention a couple features I'd like to add. One is support for searching metadata, and probably also generating metadata indexes. I'd like people to be able to look for documents produced by the United Nations, or publications from 1986, for example. Or browse an alphabetical list of all organizations that have documents in the book. EClass lets users specify this sort of metadata, and I think it would be good to allow Documancer users to perform more focused searches. For Documancer-generated indexes, we can probably use DublinCore metadata to get this metadata from HTML documents. I'm not sure if info or man pages specify this info in a standard way.
>
> I don't understand how this is useful when searching in a single document (which is what Documancer does; as opposed to searching in multiple documents). Documancer's search returns pages rather than documents and it IMHO doesn't make sense to attach any metadata to individual pages, let alone search for them. Am I missing something?

I guess in this context I'm not sure what exactly a document is? I know pages are individual 'files' of a book, and a book is a page or collection of pages, but I'm not sure where a 'document' falls in between those two. A book can have pages written by different authors, for example, can't it? (wxWidgets docs are, for example, though that information isn't stored in the pages themselves.)

As for its usefulness, this mostly falls under the 'my boss and others are asking for this' reasoning, although I do see how it is useful for them. ;-) We're an academic institution, so it wouldn't be uncommon for our people to want to query Documancer saying "I want all UN publications on the subject of refugees", for example, if they are looking for the UN's position on said topic. And since we work in many developing countries, one thing we try to do is assemble lots of public domain publications onto CD-ROM and distribute them at universities or public labs in those countries. So it is not uncommon for our 'books' to have 'pages' written by many, many different people/organizations. (And for these people, Googling and pulling up the results is very unreliable and/or costly.)

>> Another feature - what I'd call "bookshelves". Right now,
>
> I believe it's already in TODO ;)

Hmmm... I don't see it. ;-/ Am I missing something, or is it in the TODO list that's sitting on top of your shoulders? :-)

Thanks,

Kevin

>> with books!) How I foresee things is that you select your "Bookshelf" from the drop-down list on the main page, then if you click on the "Browse" tab, you get a list of the books in that bookshelf inside the tree view. Eventually, with books like EClass books, you can also expand their contents in the browse tree view. If Documancer can't determine the contents, it will just show the root page of the book.
>
> I like this UI to the bookshelves...
>
> Regards,
> Vaclav
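Kevin's example query ("all UN publications on the subject of refugees") presumes metadata attached to individual pages. A minimal Python sketch of that idea, assuming HTML pages carry Dublin Core elements as `<meta name="DC.creator" content="...">` tags (a common way of embedding DC in HTML). `extract_dc` and `find_pages` are hypothetical helpers, not Documancer code, and the matching rule (case-insensitive substring) is an arbitrary choice:

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}  # element name -> list of values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        if name.startswith("dc.") and content is not None:
            # Repeated elements (e.g. several DC.subject tags) are all kept.
            self.metadata.setdefault(name[3:], []).append(content)


def extract_dc(html_text):
    """Return the Dublin Core metadata found in one page's HTML."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


def find_pages(pages, **criteria):
    """Return the pages whose metadata matches every criterion.

    `pages` maps a page name to its extracted metadata. A criterion such as
    creator="united nations" matches if any value of that element contains
    the text, ignoring case.
    """
    return [
        page
        for page, meta in pages.items()
        if all(
            any(wanted.lower() in value.lower() for value in meta.get(element, []))
            for element, wanted in criteria.items()
        )
    ]


# Usage with two made-up pages:
pages = {
    "refugees.html": extract_dc(
        '<meta name="DC.creator" content="United Nations">'
        '<meta name="DC.subject" content="Refugees">'
    ),
    "crops.html": extract_dc('<meta name="DC.creator" content="FAO">'),
}
print(find_pages(pages, creator="united nations", subject="refugee"))
# ['refugees.html']
```

Keeping metadata per page (rather than per book) is what allows a single book to mix publications from many organizations, which is the situation Kevin describes.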
From: Vaclav S. <vs...@fa...> - 2005-01-25 23:32:33
Kevin & Masako Ollivier wrote:
> Hi Vaclav,
>
> On Jan 23, 2005, at 1:56 PM, Vaclav Slavik wrote:
> > Hi,
> >
> > I checked in the long-promised generic caching code, any testing would be most welcome. Aside from the big internal changes in how the updates are carried out, there are some UI improvements as well:
> >
> > * Updating book A's index no longer disables searching in book B.
> > * You can no longer attempt to search in a book without index. Previously, this would result in the UI blocking and waiting until the index is regenerated. Now, Documancer will warn you that the index is missing (see attached noindex.png) and will disable the search box -- it will be enabled again when indexing is done (remember: indexing happens in background, it's no longer triggered by user action!).
> > * If a book has index, you can search it even if the index is out of date. The index is generated in temporary directory and replaces old index when it's fully created, so as long as there is at least _some_ index, you can do (potentially inexact) searches (see attached oldindex.png).
> > * Indexing of all books (i.e. even those not currently opened) is done in background when you're reading docs, meaning that there's a good chance that Documancer will already be done generating the indexes when you'll need them.
>
> +4 :-)
>
> > Still missing:
> >
> > * Prioritization of the updater -- when you select a book in Documancer, the updater should stop doing whatever it is it's working on and update items owned by currently selected book as soon as possible.
>
> +1
>
> > * Autodetection of outdated books -- Documancer should check the HTML/info/man/whatever files and regenerate the index if they changed; currently, you have to do it manually.
>
> Can we make this feature an option for the moment, at least until Documancer's parser can handle more special cases? For example, sometimes EClasses have redirect pages, and those would stop the indexer. (As to why they do, it's a bit of a long story. ;-) Anyways, it's better for EClass books to use an outdated index than recreate the index by only indexing the redirect page and nothing else.

I don't understand what the problem is: as long as the index is generated from the files (i.e. is newer than them), it's considered up-to-date and not regenerated, so it shouldn't be a problem when running from a CD, should it?

As for the "for the moment" part, that's already done because it's not implemented yet ;-)

> Also, I should mention a couple features I'd like to add. One is support for searching metadata, and probably also generating metadata indexes. I'd like people to be able to look for documents produced by the United Nations, or publications from 1986, for example. Or browse an alphabetical list of all organizations that have documents in the book. EClass lets users specify this sort of metadata, and I think it would be good to allow Documancer users to perform more focused searches. For Documancer-generated indexes, we can probably use DublinCore metadata to get this metadata from HTML documents. I'm not sure if info or man pages specify this info in a standard way.

I don't understand how this is useful when searching in a single document (which is what Documancer does; as opposed to searching in multiple documents). Documancer's search returns pages rather than documents and it IMHO doesn't make sense to attach any metadata to individual pages, let alone search for them. Am I missing something?

> Another feature - what I'd call "bookshelves". Right now,

I believe it's already in TODO ;)

> with books!) How I foresee things is that you select your "Bookshelf" from the drop-down list on the main page, then if you click on the "Browse" tab, you get a list of the books in that bookshelf inside the tree view. Eventually, with books like EClass books, you can also expand their contents in the browse tree view. If Documancer can't determine the contents, it will just show the root page of the book.

I like this UI to the bookshelves...

Regards,
Vaclav
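Vaclav's rule for an up-to-date index ("newer than the files it was generated from") boils down to a modification-time comparison. A rough Python sketch, assuming the index is a single file stored outside the documentation directory; `index_is_current` is a hypothetical name, not Documancer's actual check. Because it only reads timestamps, the check itself works on read-only media; whether copying a CD to a hard drive preserves those timestamps depends on the copy tool.

```python
import os
import tempfile


def index_is_current(index_path, source_dir):
    """Return True if the index file is newer than every file under source_dir.

    Hypothetical helper: a missing index counts as outdated, and any source
    file modified after the index was written makes it outdated too.
    """
    if not os.path.exists(index_path):
        return False
    index_mtime = os.path.getmtime(index_path)
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if os.path.getmtime(os.path.join(root, name)) > index_mtime:
                return False  # this file changed after the index was built
    return True


# Usage: fake a book with one page and set the timestamps explicitly.
with tempfile.TemporaryDirectory() as tmp:
    docs = os.path.join(tmp, "docs")
    os.mkdir(docs)
    page = os.path.join(docs, "index.html")
    index = os.path.join(tmp, "book.idx")  # kept outside the docs directory
    for path in (page, index):
        with open(path, "w") as f:
            f.write("x")
    os.utime(page, (1000, 1000))
    os.utime(index, (2000, 2000))
    print(index_is_current(index, docs))  # True
    os.utime(page, (3000, 3000))  # the page was "edited" after indexing
    print(index_is_current(index, docs))  # False
```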
From: Kevin & M. O. <kev...@th...> - 2005-01-24 22:11:39
Hi Vaclav,

On Jan 23, 2005, at 1:56 PM, Vaclav Slavik wrote:

> Hi,
>
> I checked in the long-promised generic caching code, any testing would be most welcome. Aside from the big internal changes in how the updates are carried out, there are some UI improvements as well:
>
> * Updating book A's index no longer disables searching in book B.
> * You can no longer attempt to search in a book without index. Previously, this would result in the UI blocking and waiting until the index is regenerated. Now, Documancer will warn you that the index is missing (see attached noindex.png) and will disable the search box -- it will be enabled again when indexing is done (remember: indexing happens in background, it's no longer triggered by user action!).
> * If a book has index, you can search it even if the index is out of date. The index is generated in temporary directory and replaces old index when it's fully created, so as long as there is at least _some_ index, you can do (potentially inexact) searches (see attached oldindex.png).
> * Indexing of all books (i.e. even those not currently opened) is done in background when you're reading docs, meaning that there's a good chance that Documancer will already be done generating the indexes when you'll need them.

+4 :-)

> Still missing:
>
> * Prioritization of the updater -- when you select a book in Documancer, the updater should stop doing whatever it is it's working on and update items owned by currently selected book as soon as possible.

+1

> * Autodetection of outdated books -- Documancer should check the HTML/info/man/whatever files and regenerate the index if they changed; currently, you have to do it manually.

Can we make this feature an option for the moment, at least until Documancer's parser can handle more special cases? For example, sometimes EClasses have redirect pages, and those would stop the indexer. (As to why they do, it's a bit of a long story. ;-) Anyways, it's better for EClass books to use an outdated index than recreate the index by only indexing the redirect page and nothing else.

Also, I should mention a couple features I'd like to add. One is support for searching metadata, and probably also generating metadata indexes. I'd like people to be able to look for documents produced by the United Nations, or publications from 1986, for example. Or browse an alphabetical list of all organizations that have documents in the book. EClass lets users specify this sort of metadata, and I think it would be good to allow Documancer users to perform more focused searches. For Documancer-generated indexes, we can probably use DublinCore metadata to get this metadata from HTML documents. I'm not sure if info or man pages specify this info in a standard way.

Another feature - what I'd call "bookshelves". Right now, Documancer presents you simply with a list of books, but what I'd like to see are books organized by categories. Like, a "Programming" bookshelf, with books related to programming only. Of course, there will be an "All" bookshelf, the default, that will show every book. The purpose of this is of course to keep the books list from getting too crowded as time goes on. (And we intend to use it for distributing content in fields like Economics, Agriculture, etc. so you can see how users could get bogged down with books!) How I foresee things is that you select your "Bookshelf" from the drop-down list on the main page, then if you click on the "Browse" tab, you get a list of the books in that bookshelf inside the tree view. Eventually, with books like EClass books, you can also expand their contents in the browse tree view. If Documancer can't determine the contents, it will just show the root page of the book.

In fact, we can use the "content package" XML file format to manage bookshelves too, so we get bookshelf support and EClass TOC support in one shot. Basically, I just need to clean up my conman module and make it into a portable Python module. I should also do that for wxbrowser.py, so that we can start using it with Documancer as well. I think I've pretty much patched up wxWebKitCtrl, so I'll be using that for the Mac version of Documancer.

But all of this doesn't have to be in the next release. I'd like to see a 0.2.4 soon, probably after the caching code is tweaked and I've committed the indexer code.

Anyways, I think that's all for now. ;-)

Thanks,

Kevin

> Regards,
> Vaclav
>
> <noindex.png><oldindex.png>
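The bookshelf proposal amounts to a many-to-many mapping between shelves and books, plus an implicit "All" shelf. A minimal Python sketch under that assumption; `Library`, `add_book` and `books_on` are hypothetical names (the thread suggests the real format may be the "content package" XML), and the sample titles are made up. A book may sit on several shelves, and every shelf lists its books in the order they were added:

```python
from dataclasses import dataclass, field

ALL_SHELF = "All"  # implicit default shelf that shows every book


@dataclass
class Library:
    """Hypothetical model: books grouped into named bookshelves."""

    books: list = field(default_factory=list)    # every book, in insertion order
    shelves: dict = field(default_factory=dict)  # shelf name -> set of titles

    def add_book(self, title, *shelf_names):
        """Register a book and put it on zero or more shelves."""
        if title not in self.books:
            self.books.append(title)
        for name in shelf_names:
            if name != ALL_SHELF:  # "All" is implicit and never stored
                self.shelves.setdefault(name, set()).add(title)

    def shelf_names(self):
        """Names for a drop-down list: "All" first, then the rest sorted."""
        return [ALL_SHELF] + sorted(self.shelves)

    def books_on(self, shelf=ALL_SHELF):
        """Books to show in the Browse tree for the selected shelf."""
        if shelf == ALL_SHELF:
            return list(self.books)
        members = self.shelves.get(shelf, set())
        return [title for title in self.books if title in members]


# Usage with sample titles:
lib = Library()
lib.add_book("Python Tutorial", "Programming")
lib.add_book("wxWidgets Manual", "Programming")
lib.add_book("Agriculture Handbook", "Agriculture")
print(lib.shelf_names())            # ['All', 'Agriculture', 'Programming']
print(lib.books_on("Programming"))  # ['Python Tutorial', 'wxWidgets Manual']
```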
From: Vaclav S. <vs...@fa...> - 2005-01-23 21:58:41
Hi,

I checked in the long-promised generic caching code, any testing would be most welcome. Aside from the big internal changes in how the updates are carried out, there are some UI improvements as well:

* Updating book A's index no longer disables searching in book B.
* You can no longer attempt to search in a book without index. Previously, this would result in the UI blocking and waiting until the index is regenerated. Now, Documancer will warn you that the index is missing (see attached noindex.png) and will disable the search box -- it will be enabled again when indexing is done (remember: indexing happens in background, it's no longer triggered by user action!).
* If a book has index, you can search it even if the index is out of date. The index is generated in temporary directory and replaces old index when it's fully created, so as long as there is at least _some_ index, you can do (potentially inexact) searches (see attached oldindex.png).
* Indexing of all books (i.e. even those not currently opened) is done in background when you're reading docs, meaning that there's a good chance that Documancer will already be done generating the indexes when you'll need them.

Still missing:

* Prioritization of the updater -- when you select a book in Documancer, the updater should stop doing whatever it is it's working on and update items owned by currently selected book as soon as possible.
* Autodetection of outdated books -- Documancer should check the HTML/info/man/whatever files and regenerate the index if they changed; currently, you have to do it manually.

Regards,
Vaclav
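The "generate in a temporary location, then replace the old index" scheme is what keeps an outdated index searchable while a new one is built. A minimal Python sketch, assuming for simplicity that the index is a single file rather than a directory; `rebuild_index` and its `write_index` callback are hypothetical. `os.replace()` performs the rename atomically on POSIX when source and target are on the same filesystem, which is why the temporary file is created next to the index; on Windows the replace can fail if another process still has the old index open.

```python
import os
import tempfile


def rebuild_index(index_path, write_index):
    """Write a fresh index to a temporary file, then swap it into place.

    `write_index(f)` is a hypothetical callback that writes the complete
    index to the open binary file `f`. Readers keep using the old file at
    `index_path` until os.replace() swaps the new one in.
    """
    directory = os.path.dirname(os.path.abspath(index_path))
    # Same directory => same filesystem, which the atomic rename requires.
    fd, tmp_path = tempfile.mkstemp(prefix=".index-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            write_index(f)
        os.replace(tmp_path, index_path)  # swap in the finished index
    except BaseException:
        # Indexing failed or was interrupted: keep the old index untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: a failed rebuild leaves the previous index in place.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book.idx")
    rebuild_index(path, lambda f: f.write(b"index v1"))

    def crashing_indexer(f):
        f.write(b"half an index")
        raise RuntimeError("indexer crashed")

    try:
        rebuild_index(path, crashing_indexer)
    except RuntimeError:
        pass
    with open(path, "rb") as f:
        print(f.read())     # b'index v1'
    print(os.listdir(tmp))  # ['book.idx']
```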
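The missing prioritization item could be approached by grouping pending work per book and serving the selected book first. A sketch in Python, assuming the updater's work is split into small items (e.g. one page each); `UpdateQueue` and its methods are hypothetical. It does not interrupt an item already in progress. It only changes what the updater picks next, so "as soon as possible" here means "after the current item finishes":

```python
from collections import OrderedDict, deque


class UpdateQueue:
    """Pending update items grouped per book.

    Items of the currently selected book are handed out first; otherwise
    books are served in the order they were first queued.
    """

    def __init__(self):
        self._pending = OrderedDict()  # book -> deque of items
        self.current_book = None

    def add(self, book, item):
        self._pending.setdefault(book, deque()).append(item)

    def select_book(self, book):
        # Would be called when the user selects a book in the UI.
        self.current_book = book

    def next_item(self):
        """Return (book, item) to work on next, or None if nothing is queued."""
        if self.current_book in self._pending:
            book = self.current_book
        elif self._pending:
            book = next(iter(self._pending))  # oldest book still waiting
        else:
            return None
        items = self._pending[book]
        item = items.popleft()
        if not items:
            del self._pending[book]
        return book, item


# Usage:
q = UpdateQueue()
q.add("wxWidgets", "page1.html")
q.add("wxWidgets", "page2.html")
q.add("Python", "tut.html")
print(q.next_item())  # ('wxWidgets', 'page1.html')
q.select_book("Python")
print(q.next_item())  # ('Python', 'tut.html')
print(q.next_item())  # ('wxWidgets', 'page2.html')
print(q.next_item())  # None
```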