|
From: Derek H. <lor...@ms...> - 2002-10-09 02:33:50
|
First, hello everyone. I've joined the group to help develop the Microsoft
Office integration components, or at least get them kick-started. :-)
"Ashwini Kumar" wrote on 2002-10-02,
> I think we need to decide how complicated we would like the search
> for eDocs to get. Maybe for the time being we should simply stick to
> the RDF framework and make sure that the document creator
> associates a metadata with the document.
I agree with your proposal of RDF for the time being, Ashwini.
WHY RDF?
RDF is a mature standard and better-understood than ontologies. It's
easier (from a software development standpoint) to employ a simpler
metadata approach.
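To make the "simpler metadata approach" concrete, here is a minimal sketch of an RDF/XML description for one document. It assumes Dublin Core element names (title, creator, subject) as the vocabulary; the thread has not settled on a vocabulary. The function name and the `urn:example:` identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core: an assumed vocabulary


def describe_document(uri, properties):
    """Return an RDF/XML string with one rdf:Description for the document."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    # One child element per metadata property supplied by the document creator.
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe_document(
    "urn:example:edocs:doc-1",  # hypothetical document identifier
    {"title": "Design Notes", "creator": "A. Author", "subject": "LexicalHandler"},
)
print(rdf_xml)
```

A search facility would then only need to index these few properties rather than the document body.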
I'd like to see a first working release of eDocs as soon as possible,
versus a full-featured release that bogs down in complexity and which
might not "get out the proverbial door." That's why I'd support a minimal
feature set for Version 1, so that Version 2, and 3, etc., can evolve into
the eDocs Document Management System Sergio has envisioned with
help from a supportive user community.
THE MICROSOFT OFFICE INTEGRATION VIEW:
A METADATA SEARCH IS BETTER
A metadata search approach is easier to implement, less demanding
of network bandwidth in the size of serialized SOAP requests, and
would be much more reliable across the breadth of "document" types
business users will use eDocs with from the Microsoft Office suite
of products.
Speaking from my investigation of a Microsoft Word Add-in, to store
and retrieve a Word document, it would be easiest to provide meta-
data (the fields a user optionally fills in as the Document's Properties
in Word: title, keywords, author, manager, version, etc.) and then
a BASE64 octet-stream that could be stored on the server as an
opaque package ("black box"). This wouldn't facilitate full-text
search though, and serializing the Word Document's Object Model
would be extraordinarily heavyweight (also brittle to different releases
of Word, and labor-intensive to code and maintain pre dot NET.)
I can foresee sending metadata, the bare text of the Word
document as a (very long) xsd:string element so eDocs can
search the bare, unadorned text, and an octet-stream (which
would still be stored as a black box). This may not be as
applicable to other Microsoft Office applications, like Visio
diagrams [eg, I want to search for all UML Class Diagrams
in eDocs concerning class name "*LexicalHandler".]
Actually, Visio 2000 doesn't expose that in its Object Model
(spent a weekend trying to get at it to write a code-gen Addin)
though it may be possible to get at it after the internal COM
object it uses to handle UML in Visio serializes itself into
the .VSD file format. In that case, I wouldn't want the
octet-stream to be opaque to the repository. :-) Or, I
might use the VBA FileSystemObject in the Add-in to
save the diagram to disk in a TMP folder, scan it for text,
and forward that text separately to be searchable (then
the octet-stream could be opaque, I suppose we want
to shoot for consistency there... )
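The three-part submission discussed above (metadata, bare searchable text, and an opaque BASE64 octet-stream) might look roughly like the sketch below. The element names `document`, `metadata`, `text` and `content` are assumptions for illustration, not an agreed eDocs schema, and a real SOAP envelope is left out.

```python
import base64
import xml.etree.ElementTree as ET


def build_submission(metadata, plain_text, raw_bytes):
    """Package metadata, searchable text, and the opaque file bytes as XML."""
    root = ET.Element("document")
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    # Bare text extracted on the client, sent so the server can search it.
    ET.SubElement(root, "text").text = plain_text
    # The original file travels as BASE64 and is stored as a black box.
    content = ET.SubElement(root, "content", {"encoding": "base64"})
    content.text = base64.b64encode(raw_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def read_content(xml_string):
    """Recover the original file bytes from a submission."""
    root = ET.fromstring(xml_string)
    return base64.b64decode(root.findtext("content"))


raw = b"\x00\x01 stand-in for the binary file"
submission = build_submission(
    {"title": "Parser design", "keywords": "SAX LexicalHandler", "author": "D. H."},
    "The LexicalHandler receives comments and CDATA boundaries.",
    raw,
)
print(read_content(submission) == raw)
```

The same shape works for a Visio diagram: whatever text the Add-in manages to scan out goes in the text element, and the file itself stays opaque.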
In any event, I definitely see issues with searching
Visio diagrams for text. Support would be incomplete,
at best. (My recommended practice, if a user has a UML
Class Diagram in Visio 2000, he or she must list all
applicable Class Names as keywords for metadata to
search for them reliably. Visio XP, I think, can export
UML to XMI, it might need a Microsoft patch to do it,
but that could work better.)
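If class names are listed as keywords, a wildcard query such as "*LexicalHandler" reduces to a pattern match over each document's keyword list. A minimal sketch, assuming shell-style wildcards and a hypothetical in-memory catalog in place of the repository:

```python
from fnmatch import fnmatchcase


def search_keywords(catalog, pattern):
    """Return ids of documents with at least one keyword matching the pattern."""
    return [
        doc_id
        for doc_id, keywords in catalog.items()
        if any(fnmatchcase(keyword, pattern) for keyword in keywords)
    ]


# Hypothetical catalog: document id -> keywords entered by the user.
catalog = {
    "parser-classes.vsd": ["DefaultLexicalHandler", "SAXParser"],
    "billing.vsd": ["Invoice", "Customer"],
}
matches = search_keywords(catalog, "*LexicalHandler")
print(matches)
```

This only finds what the user remembered to list, which is why the support would be incomplete at best.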
PROBLEM WITH ADDING FULL-TEXT SEARCH LATER
On the other hand (isn't it awful, having two hands? ;-) ).
The one problem with adding the sophisticated, full-text search
capability later is Migration for upgrading users. We would need
either:
1. A Migration Plan for going from a keyword-oriented search
facility to a full text-oriented search facility in the future. This
would probably involve something tantamount to checking-out
and checking-in all revisions (or having smart differencing) of
all (the latest) documents. :-(
2. Not support migration formally. Perhaps allow existing
documents in a company's repository catalogued with keyword
search to remain available for keyword searches, but be excluded
from newer comprehensive text searches. But for a pre-existing
document revision to be subject to comprehensive text search
in an upgrading organization, the user would have to check-out
and check-in under the new version.
So, there could be a problem going from a simpler metadata to
a more comprehensive metadata in the future for early-adopters.
Administrators will be unhappy if it's not easy to migrate.
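Option 2 can be pictured with a small model: revisions stored before full-text support carry no text, stay visible to keyword search, and are skipped by full-text search until someone checks them in again. The class and function names here are hypothetical, and the sketch ignores storage entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    doc_id: str
    keywords: list
    text: Optional[str] = None  # None: checked in before full-text support


def keyword_search(revisions, keyword):
    """Keyword search covers every revision, old or new."""
    return [r.doc_id for r in revisions if keyword in r.keywords]


def full_text_search(revisions, phrase):
    """Full-text search silently excludes revisions that have no stored text."""
    return [r.doc_id for r in revisions if r.text is not None and phrase in r.text]


def recheck_in(revision, extracted_text):
    """Check-out and check-in under the new version: the text is now captured."""
    return Revision(revision.doc_id, revision.keywords, extracted_text)


revisions = [
    Revision("spec-v1", ["RDF", "metadata"]),
    Revision("spec-v2", ["RDF"], "RDF keeps the metadata simple"),
]
print(full_text_search(revisions, "simple"))

migrated = [recheck_in(revisions[0], "Notes on simple metadata"), revisions[1]]
print(full_text_search(migrated, "simple"))
```

Option 1 amounts to running the re-check-in step over the whole repository in one pass, which is where the cost lies.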
CONCLUSION
I'd still choose the first hand: RDF and searching on metadata
about repository documents to begin with. It increases the
likelihood eDocs 1.0 happens (if eDocs 1.0 doesn't happen,
there'll be no first release to migrate from and so nobody has
that problem).
Derek Harmon
sto...@us...
|
|
From: Piotr K. <pkr...@wp...> - 2002-10-09 09:43:18
|
Hi all,

First of all, welcome to Derek.
Now back to the subject. Sergio, could you please add some news to our
project? For example, about the website being ready. Then I can check
whether the shell script is working properly.
BTW, I have modified the site and it recognizes the client's resolution
now. I made 2 versions: 800x600 and 1024x768, as they are still the most
popular on the web.

Regards
Piotr
|
|
From: Sergio R. <sra...@ti...> - 2002-10-09 10:04:37
|
> Hi all,
>
> First of all welcome to Derek.
> Now back to the subject. Sergio could you please add some news to our
> project? For example about the website being ready. I can check then if
> the shell script is working properly.
> BTW, I have modified the site and it recognizes the client's resolution
> now. I made 2 versions: 800x600 and 1024x768 as they are still most
> popular on the web.
>
> Regards
> Piotr

Great as usual, Piotr.
I've tried the site's features and they work great. I'll add news to the
project immediately; I was waiting for your OK before announcing the site.
Have you solved the problem with publishing the site news? At the moment
we don't have much news, but I suppose we'll have plenty soon. Is it
possible to show an excerpt of the latest news on the home page of the
web site?

I've also tried the two different resolutions and they are OK.

Is the script you mention in your email related to news publishing or to
resolution adjustment?

Derek, can you write some text for the web site to illustrate our
integration capabilities? Can you coordinate that with Piotr? Integration
is a good selling point for a system like this. Piotr, I think we need to
find a little space for that on the site. I was thinking of something
more or less like what we have done for Vision, Benefit, and What is.

Let me know, guys.

Regards
Sergio
|
|
From: Piotr K. <pkr...@wp...> - 2002-10-09 17:56:17
|
Sergio Ramazzina wrote:

> Great as usual Piotr,

Thank you.

> I've tried the site functionalities and works great. I'll add news to the
> project immediately. I was waiting for your ok to let the site be
> announced. Have you solved the problem with publishing the site news?
> At the moment we've not many news but I suppose we'll get many soon.
> Is it possible to have an excerpt of the latest news on the home page
> of the web site?
>
> I've tried also the two different resolutions and are ok.
>
> The script you're talking in your email related to news publish or to
> resolution adjustment?

It's for the news. It works like this: a Perl script runs on a cron
schedule. It connects to the project news database and checks whether
there is anything new, then creates an HTML file with the news. I wanted
you to post some news so I could see it in action, see what the generated
file looks like, and so on. Then I'll integrate it with the website.

> Derek can you write some text to put in the web site to illustrate our
> integration capabilities? Can you coordinate that with Piotr?
> Integration is a good point for a system like that. Piotr I think that
> we need to find a little space for that on the site.
> I was thinking the same more or less we have done for Vision, Benefit,
> and What is.

Yes, of course, when I get the text I'll put it online. But I hope it
won't be too big. I think it would be good to have a brief overview of
the integration capabilities (just the main features) on the front page
and a bigger, more detailed document for download.

Regards
Piotr
|
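The news flow described in this message (a scheduled job reads the project news and writes an HTML file for the site) could be sketched as below. It is written in Python rather than Perl, and the `date` and `title` fields, the function names, and the in-memory list standing in for the news database are all assumptions.

```python
from html import escape


def render_news(items, limit=3):
    """Render the newest `limit` items as an HTML list fragment."""
    rows = [
        f"<li><strong>{escape(item['date'])}</strong> {escape(item['title'])}</li>"
        for item in items[:limit]
    ]
    return '<ul class="news">\n' + "\n".join(rows) + "\n</ul>\n"


def write_news_file(items, path):
    """What the cron job would call: regenerate the fragment on disk."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_news(items))


# Stand-in for rows fetched from the project news database, newest first.
items = [
    {"date": "2002-10-10", "title": "Design & analysis update"},
    {"date": "2002-10-09", "title": "Web site is online"},
]
page = render_news(items)
print(page)
```

With a small `limit`, the same fragment can serve as the excerpt of the latest news on the home page.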
|
From: Sergio R. <sra...@ti...> - 2002-10-09 18:05:31
|
Derek, can you coordinate with Piotr?

Piotr, I've posted a news item stating that our web site is online.
Try it.

Sergio

----- Original Message -----
From: "Piotr Kreglicki" <pkr...@wp...>
To: "edocs-development mailing" <edo...@li...>
Sent: Wednesday, October 09, 2002 7:54 PM
Subject: Re: [Edocs-development] Integration of the news with website

> Sergio Ramazzina wrote:
>
> > Great as usual Piotr,
>
> Thank you.
>
> > I've tried the site functionalities and works great. I'll add news to
> > the project immediately. I was waiting for your ok to let the site be
> > announced. Have you solved the problem with publishing the site news?
> > At the moment we've not many news but I suppose we'll get many soon.
> > Is it possible to have an excerpt of the latest news on the home page
> > of the web site?
> >
> > I've tried also the two different resolutions and are ok.
> >
> > The script you're talking in your email related to news publish or to
> > resolution adjustment?
>
> It's for the news. It works that way: a perl script is being run
> according to cron schedule. It's connecting to project news database and
> checks if there's something new. Then it creates a html file with the
> news. I wanted you to post some news to see it in action, what this
> created file looks like and so on. And then I'll integrate it with the
> website.
>
> > Derek can you write some text to put in the web site to illustrate our
> > integration capabilities? Can you coordinate that with Piotr?
> > Integration is a good point for a system like that. Piotr I think that
> > we need to find a little space for that on the site.
> > I was thinking the same more or less we have done for Vision, Benefit,
> > and What is.
>
> Yes, of course, when I get the text I'll put it online. But I hope it
> won't be too big. I think it would be good to make a brief overview of
> the integration capabilities (just main features) for the front page and
> bigger, more detailed document for download.
>
> Regards
> Piotr
|
|
From: Sergio R. <sra...@ti...> - 2002-10-09 15:25:52
|
> First, hello everyone. I've joined the group to help develop the Microsoft
> Office integration components, or at least get them kick-started. :-)
>
> "Ashwini Kumar" wrote on 2002-10-02,
> > I think we need to decide how complicated we would like the search
> > for eDocs to get. Maybe for the time being we should simply stick to
> > the RDF framework and make sure that the document creator
> > associates a metadata with the document.
>
> I agree with your proposal of RDF for the time being, Ashwini.
>
>
> WHY RDF?
>
> RDF is a mature standard and better-understood than ontologies. It's
> easier (from a software development standpoint) to employ a simpler
> metadata approach.
I agree with following this approach for our document search strategy.
> I'd like to see a first working release of eDocs as soon as possible,
> versus a full-featured release that bogs down in complexity and which
> might not "get out the proverbial door." That's why I'd support a minimal
> feature set for Version 1, so that Version 2, and 3, etc., can evolve into
> the eDocs Document Management System Sergio has envisioned with
> help from a supportive user community.
We hope to have it soon. But now I would like to concentrate on analysis
and design and I absolutely want to finish that phase by the end of this
month.
After that we can think about implementing a prototype.
> A metadata search approach is easier to implement, less demanding
> of network bandwidth in the size of serialized SOAP requests, and
> would be much more reliable across the breadth of "document" types
> business users will use eDocs with from the Microsoft Office suite
> of products.
Right
>
> Speaking from my investigation of a Microsoft Word Add-in, to store
> and retrieve a Word document, it would be easiest to provide meta-
> data (the fields a user optionally fills in as the Document's Properties
> in Word: title, keywords, author, manager, version, etc.) and then
> a BASE64 octet-stream that could be stored on the server as an
> opaque package ("black box"). This wouldn't facilitate full-text
> search though, and serializing the Word Document's Object Model
> would be extraordinarily heavyweight (also brittle to different releases
> of Word, and labor-intensive to code and maintain pre dot NET.)
>
> I can foresee sending metadata, the bare text of the Word
> document as a (very long) xsd:string element so eDocs can
> search the bare, unadorned text, and an octet-stream (which
> would still be stored as a black box). This may not be as
> applicable to other Microsoft Office applications, like Visio
> diagrams [eg, I want to search for all UML Class Diagrams
> in eDocs concerning class name "*LexicalHandler".]
Right, we need to think about the ability to store every type of
document.
> Actually, Visio 2000 doesn't expose that in its Object Model
> (spent a weekend trying to get at it to write a code-gen Addin)
> though it may be possible to get at it after the internal COM
> object it uses to handle UML in Visio serializes itself into
> the .VSD file format. In that case, I wouldn't want the
> octet-stream to be opaque to the repository. :-) Or, I
> might use the VBA FileSystemObject in the Add-in to
> save the diagram to disk in a TMP folder, scan it for text,
> and forward that text separately to be searchable (then
> the octet-stream could be opaque, I suppose we want
> to shoot for consistency there... )
>
> In any event, I definitely see issues with searching
> Visio diagrams for text. Support would be incomplete,
> at best. (My recommended practice, if a user has a UML
> Class Diagram in Visio 2000, he or she must list all
> applicable Class Names as keywords for metadata to
> search for them reliably. Visio XP, I think, can export
> UML to XMI, it might need a Microsoft patch to do it,
> but that could work better.)
>
>
> PROBLEM WITH ADDING FULL-TEXT SEARCH LATER
>
> On the other hand (isn't it awful, having two hands? ;-) ).
>
> The one problem with adding the sophisticated, full-text search
> capability later is Migration for upgrading users. We would need
> either:
>
> 1. A Migration Plan for going from a keyword-oriented search
> facility to a full text-oriented search facility in the future. This
> would probably involve something tantamount to checking-out
> and checking-in all revisions (or having smart differencing) of
> all (the latest) documents. :-(
>
> 2. Not support migration formally. Perhaps allow existing
> documents in a company's repository catalogued with keyword
> search to remain available for keyword searches, but be excluded
> from newer comprehensive text searches. But for a pre-existing
> document revision to be subject to comprehensive text search
> in an upgrading organization, the user would have to check-out
> and check-in under the new version.
>
> So, there could be a problem going from a simpler metadata to
> a more comprehensive metadata in the future for early-adopters.
> Administrators will be unhappy if it's not easy to migrate.
>
>
> CONCLUSION
>
> I'd still choose the first hand: RDF and searching on metadata
> about repository documents to begin with. It increases the
> likelihood eDocs 1.0 happens (if eDocs 1.0 doesn't happen,
> there'll be no first release to migrate from and so nobody has
> that problem).
I think that for now we need to use metadata to enable document search,
so that people can use the system with any type of document and so that
things stay easier for us to maintain.
Maybe in a later release we can think about full-text search for the
particular sets of documents that need this functionality.
Ashwini is working on defining a strategy for implementing the
RDF framework in our product. We hope to see his proposal soon.
Sergio
|