documancer-devel Mailing List for Documancer

Status: Beta

Brought to you by: vaclavslavik

documancer-devel — Developers mailing list

2005	Jan (4)	Feb (2)	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2006	Jan	Feb (1)	Mar	Apr	May (1)	Jun	Jul	Aug	Sep	Oct	Nov	Dec

Flat | Threaded

[documancer-devel] Documancer mailing lists subscribers only (somewhat) now

From: Vaclav S. <vs...@fa...> - 2006-05-16 18:50:22

Hi,

I switched Documancer mailing lists into subscriber-only mode in order
to cut down the volume of spam getting through (SF.net's spam filter
leaves much to be desired). Contrary to the subject, this does _not_
mean that non-subscribers cannot post -- it only means that
subscribers' posts are delivered straight away while non-members'
mails are held for moderator's approval. I'll change it back if
there's strong opposition to it, but I hope it's OK, as the worst it
may cause is that some legitimate posts will be delayed by ~ one day.

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://pgp.mit.edu/

[documancer-devel] Important: Subversion repository moved to SF.net

From: Vaclav S. <vs...@fa...> - 2006-02-27 00:04:17

Hi,

now that SF.net offers Subversion, I moved the Documancer repository to
SF.net. Updated instructions are on
http://documancer.sf.net/download.php. The new checkout command is

svn co https://USERNAME@svn.sourceforge.net/svnroot/documancer/documancer/trunk

This works for both devel and anon access; you're asked for
credentials when you try to write something to the repository.

Change notifications will now go to
doc...@li... instead of
documancer-svn@berlios. I didn't resubscribe anybody (that is,
Kevin ;) -- I didn't know if your tulane.edu address is still
working), so please resubscribe. Alternatively, I'm considering using
CIA RSS feeds instead...

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://pgp.mit.edu/

Re: [documancer-devel] Attention: repository moved to Subversion

From: Vaclav S. <vs...@fa...> - 2005-02-08 21:11:41

Hi,

I wasn't able to make SVN notifications work with
doc...@li..., so I "deleted" that list and
created doc...@li... -- if you want to see what's
going on in CVS, feel free to subscribe to it at

   https://developer.berlios.de/mail/?group_id=3057

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/

[documancer-devel] Attention: repository moved to Subversion

From: Vaclav S. <vs...@fa...> - 2005-02-07 21:01:35

Hi,

I finished the move of Documancer repository to BerliOS's Subversion
hosting. You need to register at http://developer.berlios.de and mail
me your login to be added to the project
(http://developer.berlios.de/projects/documancer). After that, you
can checkout the trunk (aka CVS HEAD) by using this command:

svn checkout svn+ssh://YOUR_LOGIN@svn.berlios.de/svnroot/repos/documancer/documancer/trunk

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/

[documancer-devel] Re: new Documancer caching code checked in

From: Kevin & M. O. <kev...@th...> - 2005-01-26 00:57:44

Hi Vaclav,

On Jan 25, 2005, at 3:32 PM, Vaclav Slavik wrote:

[snip]

>> Can we make this feature an option for the moment, at least until
>> Documancer's parser can handle more special cases? For example,
>> sometimes EClasses have redirect pages, and those would stop the
>> indexer. (As to why they do, it's a bit of a long story. ;-)
>> Anyways, it's better for EClass books to use an outdated index than
>> recreate the index by only indexing the redirect page and nothing
>> else.
>
> I don't understand what's the problem: as long as the index is
> generated from the files (i.e. is newer than them), it's considered
> up-to-date and not regenerated, so it shouldn't be a problem when
> running from a CD, should it?

Actually, I guess it depends on how sophisticated detection of outdated 
index is. As long as it doesn't try to reindex the documentation when 
it shouldn't, I should be fine. (BTW, I did eventually want to allow 
people the option to copy CD to their hard drive for faster/easier 
access.)

> As for the "for the moment" part, that's already done because it's not
> implemented yet ;-)
>
>> Also, I should mention a couple features I'd like to add. One is
>> support for searching metadata, and probably also generating
>> metadata indexes. I'd like people to be able to look for documents
>> produced by the United Nations, or publications from 1986, for
>> example. Or browse an alphabetical list of all organizations that
>> have documents in the book. EClass lets users specify this sort of
>> metadata, and I think it would be good to allow Documancer users to
>> perform more focused searches. For Documancer-generated indexes, we
>> can probably use DublinCore metadata to get this metadata from HTML
>> documents. I'm not sure if info or man pages specify this info in a
>> standard way.
>
> I don't understand how is this useful when searching in single
> document (which is what Documancer does; as opposed to searching in
> multiple documents). Documancer's search returns pages rather than
> documents and it IMHO doesn't make sense to attach any metadata to
> individual pages, let alone search for them. Am I missing something?

I guess in this context I'm not sure what exactly a document is? I know 
pages are individual 'files' of a book, and a book is a page or 
collection of pages, but I'm not sure where a 'document' falls in 
between those two. A book can have pages written by different authors, 
for example, can't it? (wxWidgets docs are, for example, though that 
information isn't stored in the pages themselves.)

As for its usefulness, this mostly falls under the 'my boss and others 
are asking for this' reasoning, although I do see how it is useful for 
them. ;-) We're an academic institution, so it wouldn't be uncommon for 
our people to want to query Documancer saying "I want all UN 
publications on the subject of refugees", for example, if they are 
looking for the UN's position on said topic. And since we work in many 
developing countries, one thing we try to do is assemble lots of public 
domain publications onto CD-ROM and distribute them at universities or 
public labs in those countries. So it is not uncommon for our 'books' 
to have 'pages' written by many, many different people/organizations. 
(And for these people, Googling and pulling up the results is very 
unreliable and/or costly.)

>> Another feature - what I'd call "bookshelves". Right now,
>
> I believe it's already in TODO ;)

Hmmm... I don't see it. ;-/ Am I missing something, or is it in the TODO 
list that's sitting on top of your shoulders? :-)

Thanks,

Kevin

>> with books!) How I foresee things is that you select your
>> "Bookshelf" from the drop-down list on the main page, then if you
>> click on the "Browse" tab, you get a list of the books in that
>> bookshelf inside the tree view. Eventually, with books like EClass
>> books, you can also expand their contents in the browse tree view.
>> If Documancer can't determine the contents, it will just show the
>> root page of the book.
>
> I like this UI to the bookshelfs...
>
> Regards,
> Vaclav
>
> -- 
> PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/

[documancer-devel] Re: new Documancer caching code checked in

From: Vaclav S. <vs...@fa...> - 2005-01-25 23:32:33

Kevin & Masako Ollivier wrote:
> Hi Vaclav,
>
> On Jan 23, 2005, at 1:56 PM, Vaclav Slavik wrote:
> > Hi,
> >
> > I checked in the long-promised generic caching code, any testing
> > would be most welcome. Aside from the big internal changes in how
> > the updates are carried, there are some UI improvements as well:
> >
> > * Updating book A's index no longer disables searching in book B.
> > * You can no longer attempt to search in a book without index.
> >   Previously, this would result in the UI blocking and waiting
> >   until the index is regenerated. Now, Documancer will warn you
> >   that the index is missing (see attached noindex.png) and will
> >   disable the search box -- it will be enabled again when
> >   indexing is done (remember: indexing happens in background,
> >   it's no longer triggered by user action!).
> > * If a book has index, you can search it even if the index is
> >   out of date. The index is generated in temporary directory
> >   and replaces old index when it's fully created, so as long as
> >   there is at least _some_ index, you can do (potentially
> >   inexact) searches (see attached oldindex.png).
> > * Indexing of all books (i.e. even those not currently opened) is
> >   done in background when you're reading docs, meaning that
> >   there's a good chance that Documancer will already be done
> >   generating the indexes when you'll need them.
>
> +4 :-)
>
> > Still missing:
> >
> > * Prioritization of the updater -- when you select a book in
> >   Documancer, the updater should stop doing whatever it is it's
> >   working on and update items owned by currently selected book
> >   as soon as possible.
>
> +1
>
> > * Autodetection of outdated books -- Documancer should check the
> >   HTML/info/man/whatever files and regenerate the index if they
> >   changed; currently, you have to do it manually.
>
> Can we make this feature an option for the moment, at least until
> Documancer's parser can handle more special cases? For example,
> sometimes EClasses have redirect pages, and those would stop the
> indexer. (As to why they do, it's a bit of a long story. ;-)
> Anyways, it's better for EClass books to use an outdated index than
> recreate the index by only indexing the redirect page and nothing
> else.

I don't understand what's the problem: as long as the index is
generated from the files (i.e. is newer than them), it's considered
up-to-date and not regenerated, so it shouldn't be a problem when
running from a CD, should it?
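
A minimal sketch of that rule, assuming "newer" is judged by file
modification time. The thread does not say how Documancer implements
the check, and the function and argument names below are made up for
illustration:

```python
import os


def index_is_up_to_date(index_path, source_paths):
    """Return True if the index exists and is at least as new as every source.

    Hypothetical helper, not Documancer's actual code. "Newer" is assumed
    to mean a later (or equal) modification time.
    """
    if not os.path.exists(index_path):
        return False
    index_mtime = os.path.getmtime(index_path)
    # One source file modified after the index makes the index stale.
    return all(os.path.getmtime(path) <= index_mtime for path in source_paths)
```

Under this assumption, files on a read-only CD never get a later
timestamp, so an index built after them would keep passing the check.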

As for the "for the moment" part, that's already done because it's not
implemented yet ;-)

> Also, I should mention a couple features I'd like to add. One is
> support for searching metadata, and probably also generating
> metadata indexes. I'd like people to be able to look for documents
> produced by the United Nations, or publications from 1986, for
> example. Or browse an alphabetical list of all organizations that
> have documents in the book. EClass lets users specify this sort of
> metadata, and I think it would be good to allow Documancer users to
> perform more focused searches. For Documancer-generated indexes, we
> can probably use DublinCore metadata to get this metadata from HTML
> documents. I'm not sure if info or man pages specify this info in a
> standard way.

I don't understand how is this useful when searching in single
document (which is what Documancer does; as opposed to searching in
multiple documents). Documancer's search returns pages rather than
documents and it IMHO doesn't make sense to attach any metadata to
individual pages, let alone search for them. Am I missing something?

> Another feature - what I'd call "bookshelves". Right now,

I believe it's already in TODO ;)

> with books!) How I foresee things is that you select your
> "Bookshelf" from the drop-down list on the main page, then if you
> click on the "Browse" tab, you get a list of the books in that
> bookshelf inside the tree view. Eventually, with books like EClass
> books, you can also expand their contents in the browse tree view.
> If Documancer can't determine the contents, it will just show the
> root page of the book.

I like this UI to the bookshelfs...

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/

[documancer-devel] Re: new Documancer caching code checked in

From: Kevin & M. O. <kev...@th...> - 2005-01-24 22:11:39

Hi Vaclav,

On Jan 23, 2005, at 1:56 PM, Vaclav Slavik wrote:

> Hi,
>
> I checked in the long-promised generic caching code, any testing would
> be most welcome. Aside from the big internal changes in how the
> updates are carried, there are some UI improvements as well:
>
> * Updating book A's index no longer disables searching in book B.
> * You can no longer attempt to search in a book without index.
>   Previously, this would result in the UI blocking and waiting
>   until the index is regenerated. Now, Documancer will warn you
>   that the index is missing (see attached noindex.png) and will
>   disable the search box -- it will be enabled again when
>   indexing is done (remember: indexing happens in background,
>   it's no longer triggered by user action!).
> * If a book has index, you can search it even if the index is
>   out of date. The index is generated in temporary directory
>   and replaces old index when it's fully created, so as long as
>   there is at least _some_ index, you can do (potentially inexact)
>   searches (see attached oldindex.png).
> * Indexing of all books (i.e. even those not currently opened) is
>   done in background when you're reading docs, meaning that
>   there's a good chance that Documancer will already be done
>   generating the indexes when you'll need them.

+4 :-)

> Still missing:
>
> * Prioritization of the updater -- when you select a book in
>   Documancer, the updater should stop doing whatever it is it's
>   working on and update items owned by currently selected book
>   as soon as possible.

+1

> * Autodetection of outdated books -- Documancer should check the
>   HTML/info/man/whatever files and regenerate the index if they
>   changed; currently, you have to do it manually.

Can we make this feature an option for the moment, at least until 
Documancer's parser can handle more special cases? For example, 
sometimes EClasses have redirect pages, and those would stop the 
indexer. (As to why they do, it's a bit of a long story. ;-) Anyways, 
it's better for EClass books to use an outdated index than recreate the 
index by only indexing the redirect page and nothing else.

Also, I should mention a couple features I'd like to add. One is 
support for searching metadata, and probably also generating metadata 
indexes. I'd like people to be able to look for documents produced by 
the United Nations, or publications from 1986, for example. Or browse 
an alphabetical list of all organizations that have documents in the 
book. EClass lets users specify this sort of metadata, and I think it 
would be good to allow Documancer users to perform more focused 
searches. For Documancer-generated indexes, we can probably use 
DublinCore metadata to get this metadata from HTML documents. I'm not 
sure if info or man pages specify this info in a standard way.
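
A rough sketch of how Dublin Core metadata could be pulled out of an
HTML page, assuming the page uses the common <meta name="DC.xxx"
content="..."> convention. The class and function names are invented
for this example and do not come from Documancer or EClass:

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name.startswith("dc.") and content is not None:
            # A field such as "creator" may legitimately appear several times.
            self.metadata.setdefault(name[3:], []).append(content)


def extract_dublin_core(html_text):
    """Return a dict mapping Dublin Core field names to lists of values."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

A metadata index could then map values such as "United Nations" or
"1986" back to the pages that declare them.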

Another feature - what I'd call "bookshelves". Right now, Documancer 
presents you simply with a list of books, but what I'd like to see are 
books organized by categories. Like, a "Programming" bookshelf, with 
books related to programming only. Of course, there will be an "All" 
bookshelf, the default, that will show every book. The purpose of this 
is of course to keep the books list from getting too crowded as time 
goes on. (And we intend to use it for distributing content in fields 
like Economics, Agriculture, etc. so you can see how users could get 
bogged down with books!) How I foresee things is that you select your 
"Bookshelf" from the drop-down list on the main page, then if you click 
on the "Browse" tab, you get a list of the books in that bookshelf 
inside the tree view. Eventually, with books like EClass books, you can 
also expand their contents in the browse tree view. If Documancer can't 
determine the contents, it will just show the root page of the book.

In fact, we can use the "content package" XML file format to manage 
bookshelves too, so we get bookshelf support and EClass TOC support in 
one shot. Basically, I just need to clean up my conman module and make 
it into a portable Python module. I should also do that for 
wxbrowser.py too, so that we can start using it with Documancer as 
well. I think I've pretty much patched up wxWebKitCtrl, so I'll be 
using that for the Mac version of Documancer.

But all of this doesn't have to be in the next release. I'd like to see 
a 0.2.4 soon, probably after the caching code is tweaked and I've 
committed the indexer code. Anyways, I think that's all for now. ;-)

Thanks,

Kevin

> Regards,
> Vaclav
>
> -- 
> PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/
> <noindex.png><oldindex.png>

[documancer-devel] new Documancer caching code checked in

From: Vaclav S. <vs...@fa...> - 2005-01-23 21:58:41

Attachments: noindex.png oldindex.png

Hi,

I checked in the long-promised generic caching code, any testing would 
be most welcome. Aside from the big internal changes in how the 
updates are carried, there are some UI improvements as well:

* Updating book A's index no longer disables searching in book B.
* You can no longer attempt to search in a book without index.
  Previously, this would result in the UI blocking and waiting
  until the index is regenerated. Now, Documancer will warn you
  that the index is missing (see attached noindex.png) and will
  disable the search box -- it will be enabled again when
  indexing is done (remember: indexing happens in background,
  it's no longer triggered by user action!).
* If a book has index, you can search it even if the index is
  out of date. The index is generated in temporary directory
  and replaces old index when it's fully created, so as long as
  there is at least _some_ index, you can do (potentially inexact)
  searches (see attached oldindex.png).
* Indexing of all books (i.e. even those not currently opened) is
  done in background when you're reading docs, meaning that
  there's a good chance that Documancer will already be done
  generating the indexes when you'll need them.
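
The "generate in a temporary directory, then replace" idea from the
list above can be sketched roughly as follows. This is an illustration
only, not the checked-in code: rebuild_index and the build_index
callback are hypothetical names, and the sketch assumes the index lives
in a directory of its own.

```python
import os
import shutil
import tempfile


def rebuild_index(index_dir, build_index):
    """Build a fresh index next to the old one, then swap it into place.

    build_index is a caller-supplied function (hypothetical) that writes
    the new index files into the directory it is given.
    """
    parent = os.path.dirname(os.path.abspath(index_dir))
    # Same parent directory, so the renames below stay on one filesystem.
    tmp_dir = tempfile.mkdtemp(prefix="index-", dir=parent)
    try:
        build_index(tmp_dir)
    except BaseException:
        # A failed build leaves the old index untouched and searchable.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    old_dir = None
    if os.path.exists(index_dir):
        old_dir = tmp_dir + ".old"
        os.rename(index_dir, old_dir)
    # Between the two renames there is a brief moment with no index at all.
    os.rename(tmp_dir, index_dir)
    if old_dir is not None:
        shutil.rmtree(old_dir, ignore_errors=True)
```

While build_index runs, the old index stays where searches expect it,
which is what allows the (potentially inexact) searches described above.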

Still missing:

* Prioritization of the updater -- when you select a book in
  Documancer, the updater should stop doing whatever it is it's
  working on and update items owned by currently selected book
  as soon as possible.
* Autodetection of outdated books -- Documancer should check the
  HTML/info/man/whatever files and regenerate the index if they
  changed; currently, you have to do it manually.
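
One possible shape for the missing prioritization, as a sketch under
stated assumptions: the Updater class and its methods are hypothetical,
and it only reorders pending work between items. It does not interrupt
an item that is already being processed, which the description above
would also require.

```python
class Updater:
    """Queue of pending update items, each owned by a book (hypothetical)."""

    def __init__(self):
        self._pending = []          # (book, item) pairs in arrival order
        self.selected_book = None

    def add(self, book, item):
        self._pending.append((book, item))

    def select_book(self, book):
        self.selected_book = book

    def next_item(self):
        """Return the next (book, item), preferring the selected book."""
        for position, (book, _item) in enumerate(self._pending):
            if book == self.selected_book:
                return self._pending.pop(position)
        # Nothing pending for the selected book: fall back to arrival order.
        return self._pending.pop(0) if self._pending else None
```

For example, with items queued for books A and B and book B selected,
next_item() hands out B's items first and only then returns to A.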

Regards,
Vaclav

-- 
PGP key: 0x465264C9, available from http://wwwkeys.pgp.net/
