From: Derek H. <lor...@ms...> - 2002-10-09 02:33:50
First, hello everyone. I've joined the group to help develop the Microsoft Office integration components, or at least get them kick-started. :-)

"Ashwini Kumar" wrote on 2002-10-02,
> I think we need to decide how complicated we would like the search
> for eDocs to get. Maybe for the time being we should simply stick to
> the RDF framework and make sure that the document creator
> associates a metadata with the document.

I agree with your proposal of RDF for the time being, Ashwini.

WHY RDF?

RDF is a mature standard and better understood than ontologies. It's easier (from a software development standpoint) to employ a simpler metadata approach.

I'd like to see a first working release of eDocs as soon as possible, rather than a full-featured release that bogs down in complexity and might not "get out the proverbial door." That's why I'd support a minimal feature set for Version 1, so that Versions 2, 3, etc. can evolve, with help from a supportive user community, into the eDocs Document Management System Sergio has envisioned.

THE MICROSOFT OFFICE INTEGRATION VIEW: A METADATA SEARCH IS BETTER

A metadata search approach is easier to implement, less demanding of network bandwidth (serialized SOAP requests stay small), and would be much more reliable across the breadth of "document" types that business users of the Microsoft Office suite will store in eDocs.

Speaking from my investigation of a Microsoft Word Add-in: to store and retrieve a Word document, it would be easiest to send metadata (the fields a user optionally fills in as the Document's Properties in Word: title, keywords, author, manager, version, etc.) and then a BASE64 octet-stream that could be stored on the server as an opaque package ("black box"). This wouldn't facilitate full-text search, though, and serializing the Word Document's Object Model would be extraordinarily heavyweight (also brittle across different releases of Word, and labor-intensive to code and maintain pre-.NET).
I can foresee sending three things: metadata; the bare text of the Word document as a (very long) xsd:string element, so eDocs can search the unadorned text; and an octet-stream (which would still be stored as a black box). This may not be as applicable to other Microsoft Office applications, like Visio diagrams [e.g., I want to search for all UML Class Diagrams in eDocs concerning class name "*LexicalHandler"].

Actually, Visio 2000 doesn't expose that in its Object Model (I spent a weekend trying to get at it to write a code-gen Add-in), though it may be possible to get at it after the internal COM object that Visio uses to handle UML serializes itself into the .VSD file format. In that case, I wouldn't want the octet-stream to be opaque to the repository. :-) Or I might use the VBA FileSystemObject in the Add-in to save the diagram to disk in a TMP folder, scan it for text, and forward that text separately to be searchable (then the octet-stream could stay opaque; I suppose we want to shoot for consistency there...).

In any event, I definitely see issues with searching Visio diagrams for text. Support would be incomplete at best. (My recommended practice: if a user has a UML Class Diagram in Visio 2000, he or she must list all applicable Class Names as metadata keywords to be able to search for them reliably. Visio XP, I think, can export UML to XMI; it might need a Microsoft patch to do it, but that could work better.)

PROBLEM WITH ADDING FULL-TEXT SEARCH LATER

On the other hand (isn't it awful, having two hands? ;-) ), the one problem with adding a sophisticated full-text search capability later is migration for upgrading users. We would need either:

1. A Migration Plan for going from a keyword-oriented search facility to a full-text-oriented search facility in the future. This would probably involve something tantamount to checking out and checking in all revisions (or having smart differencing) of all (the latest) documents. :-(

2. Not support migration formally.
Perhaps allow existing documents in a company's repository, catalogued with keyword search, to remain available for keyword searches but be excluded from newer comprehensive text searches. For a pre-existing document revision to be subject to comprehensive text search in an upgrading organization, the user would have to check it out and check it in again under the new version.

So, there could be a problem for early adopters in going from simpler metadata to more comprehensive metadata in the future. Administrators will be unhappy if it's not easy to migrate.

CONCLUSION

I'd still choose the first hand: RDF and searching on metadata about repository documents to begin with. It increases the likelihood that eDocs 1.0 happens (if eDocs 1.0 doesn't happen, there'll be no first release to migrate from, and so nobody has that problem).

Derek Harmon
sto...@us...
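As a rough illustration of the upload shape described in this message (document properties as metadata, an optional bare-text field, and a BASE64 payload that the server never opens), here is a minimal Python sketch. The names `DocumentPackage`, `package_document`, `search_by_keyword`, and `unpack_payload` are hypothetical and do not come from eDocs, and the real transport would be a SOAP request rather than an in-memory object.

```python
import base64
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentPackage:
    """Hypothetical upload unit: searchable metadata plus an opaque payload."""
    title: str
    author: str
    keywords: List[str] = field(default_factory=list)
    body_text: str = ""    # optional bare text, reserved for a later full-text search
    payload_b64: str = ""  # original file, BASE64-encoded and never parsed by the server


def package_document(raw_bytes, title, author, keywords, body_text=""):
    """Wrap a file's raw bytes and its user-supplied properties into one package."""
    return DocumentPackage(
        title=title,
        author=author,
        keywords=[k.lower() for k in keywords],
        body_text=body_text,
        payload_b64=base64.b64encode(raw_bytes).decode("ascii"),
    )


def search_by_keyword(packages, keyword):
    """Metadata-only search: match on keywords, never look inside the payload."""
    wanted = keyword.lower()
    return [p for p in packages if wanted in p.keywords]


def unpack_payload(package):
    """Recover the original bytes exactly as they were uploaded."""
    return base64.b64decode(package.payload_b64)
```

With this shape, a Visio class diagram is findable only if its author listed the class names as keywords, which matches the recommended practice given for Visio 2000 above.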
From: Piotr K. <pkr...@wp...> - 2002-10-09 09:43:18
Hi all,

First of all, welcome to Derek.
Now back to the subject. Sergio, could you please add some news to our project? For example, about the website being ready. Then I can check whether the shell script is working properly.
BTW, I have modified the site and it now recognizes the client's resolution. I made 2 versions: 800x600 and 1024x768, as they are still the most popular on the web.

Regards
Piotr
From: Sergio R. <sra...@ti...> - 2002-10-09 10:04:37
> Hi all,
>
> First of all, welcome to Derek.
> Now back to the subject. Sergio, could you please add some news to our
> project? For example, about the website being ready. Then I can check
> whether the shell script is working properly.
> BTW, I have modified the site and it now recognizes the client's
> resolution. I made 2 versions: 800x600 and 1024x768, as they are still
> the most popular on the web.
>
> Regards
> Piotr

Great as usual, Piotr.

I've tried the site's functions and they work great. I'll add news to the project immediately; I was waiting for your OK before announcing the site. Have you solved the problem with publishing the site news? At the moment we don't have much news, but I suppose we'll have plenty soon. Is it possible to show an excerpt of the latest news on the home page of the web site?

I've also tried the two different resolutions and they are OK.

Is the script you mention in your email related to news publishing or to the resolution adjustment?

Derek, can you write some text for the web site to illustrate our integration capabilities? Can you coordinate that with Piotr? Integration is a strong point for a system like this. Piotr, I think we need to find a little space for that on the site. I was thinking of something more or less like what we have done for Vision, Benefit, and What is.

Let me know, guys.

Regards
Sergio
From: Piotr K. <pkr...@wp...> - 2002-10-09 17:56:17
Sergio Ramazzina wrote:

> Great as usual, Piotr.

Thank you.

> I've tried the site's functions and they work great. I'll add news to the
> project immediately; I was waiting for your OK before announcing the site.
> Have you solved the problem with publishing the site news? At the moment
> we don't have much news, but I suppose we'll have plenty soon. Is it
> possible to show an excerpt of the latest news on the home page of the
> web site?
>
> I've also tried the two different resolutions and they are OK.
>
> Is the script you mention in your email related to news publishing or to
> the resolution adjustment?

It's for the news. It works this way: a Perl script is run on a cron schedule. It connects to the project news database and checks whether there's something new. Then it creates an HTML file with the news. I wanted you to post some news so I could see it in action, what the generated file looks like, and so on. Then I'll integrate it with the website.

> Derek, can you write some text for the web site to illustrate our
> integration capabilities? Can you coordinate that with Piotr? Integration
> is a strong point for a system like this. Piotr, I think we need to find
> a little space for that on the site. I was thinking of something more or
> less like what we have done for Vision, Benefit, and What is.

Yes, of course; when I get the text I'll put it online. But I hope it won't be too big. I think it would be good to have a brief overview of the integration capabilities (just the main features) on the front page and a bigger, more detailed document for download.

Regards
Piotr
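The news flow Piotr describes (a scheduled script checks the project news database and writes an HTML file when something is new) could look roughly like the following. This is a Python sketch, not the actual Perl script: the `render_news` function, the item fields (`id`, `date`, `title`), the `last_seen_id` bookkeeping, and the `limit` parameter for a home-page excerpt are all assumptions made for illustration, and the database query is replaced by a plain list.

```python
import html


def render_news(items, last_seen_id, limit=3):
    """Return (html_fragment, newest_id), or (None, last_seen_id) if nothing is new.

    `items` is a list of dicts with "id", "date" and "title" keys; in a real
    setup they would be read from the project news database.
    """
    newest_id = max((item["id"] for item in items), default=last_seen_id)
    if newest_id <= last_seen_id:
        return None, last_seen_id  # nothing new, so leave the existing file alone

    # Keep only the most recent `limit` items, newest first, for the excerpt.
    latest = sorted(items, key=lambda item: item["id"], reverse=True)[:limit]
    rows = [
        "<li>{} - {}</li>".format(html.escape(item["date"]), html.escape(item["title"]))
        for item in latest
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n", newest_id
```

A cron job would call this on each run, write the fragment to a file only when it is not `None`, and remember the returned id for the next run. Escaping the titles keeps a news headline containing `<` or `&` from breaking the page.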
From: Sergio R. <sra...@ti...> - 2002-10-09 18:05:31
Derek, can you coordinate with Piotr?

Piotr, I've posted a news item stating that our web site is online. Try it.

Sergio

----- Original Message -----
From: "Piotr Kreglicki" <pkr...@wp...>
To: "edocs-development mailing" <edo...@li...>
Sent: Wednesday, October 09, 2002 7:54 PM
Subject: Re: [Edocs-development] Integration of the news with website

> Sergio Ramazzina wrote:
>
> > Great as usual, Piotr.
>
> Thank you.
>
> > I've tried the site's functions and they work great. I'll add news to
> > the project immediately; I was waiting for your OK before announcing
> > the site. Have you solved the problem with publishing the site news?
> > At the moment we don't have much news, but I suppose we'll have plenty
> > soon. Is it possible to show an excerpt of the latest news on the home
> > page of the web site?
> >
> > I've also tried the two different resolutions and they are OK.
> >
> > Is the script you mention in your email related to news publishing or
> > to the resolution adjustment?
>
> It's for the news. It works this way: a Perl script is run on a cron
> schedule. It connects to the project news database and checks whether
> there's something new. Then it creates an HTML file with the news. I
> wanted you to post some news so I could see it in action, what the
> generated file looks like, and so on. Then I'll integrate it with the
> website.
>
> > Derek, can you write some text for the web site to illustrate our
> > integration capabilities? Can you coordinate that with Piotr?
> > Integration is a strong point for a system like this. Piotr, I think
> > we need to find a little space for that on the site. I was thinking of
> > something more or less like what we have done for Vision, Benefit, and
> > What is.
>
> Yes, of course; when I get the text I'll put it online. But I hope it
> won't be too big. I think it would be good to have a brief overview of
> the integration capabilities (just the main features) on the front page
> and a bigger, more detailed document for download.
>
> Regards
> Piotr
From: Sergio R. <sra...@ti...> - 2002-10-09 15:25:52
> First, hello everyone. I've joined the group to help develop the Microsoft
> Office integration components, or at least get them kick-started. :-)
>
> "Ashwini Kumar" wrote on 2002-10-02,
> > I think we need to decide how complicated we would like the search
> > for eDocs to get. Maybe for the time being we should simply stick to
> > the RDF framework and make sure that the document creator
> > associates a metadata with the document.
>
> I agree with your proposal of RDF for the time being, Ashwini.
>
> WHY RDF?
>
> RDF is a mature standard and better understood than ontologies. It's
> easier (from a software development standpoint) to employ a simpler
> metadata approach.

I agree with following this approach for our document search strategy.

> I'd like to see a first working release of eDocs as soon as possible,
> rather than a full-featured release that bogs down in complexity and
> might not "get out the proverbial door." That's why I'd support a minimal
> feature set for Version 1, so that Versions 2, 3, etc. can evolve, with
> help from a supportive user community, into the eDocs Document
> Management System Sergio has envisioned.

We hope to have it soon. For now, though, I would like to concentrate on analysis and design, and I absolutely want to finish that phase by the end of this month. After that we can think about implementing a prototype.

> A metadata search approach is easier to implement, less demanding
> of network bandwidth (serialized SOAP requests stay small), and
> would be much more reliable across the breadth of "document" types
> that business users of the Microsoft Office suite will store in eDocs.

Right.

> Speaking from my investigation of a Microsoft Word Add-in: to store
> and retrieve a Word document, it would be easiest to send metadata
> (the fields a user optionally fills in as the Document's Properties
> in Word: title, keywords, author, manager, version, etc.) and then
> a BASE64 octet-stream that could be stored on the server as an
> opaque package ("black box"). This wouldn't facilitate full-text
> search, though, and serializing the Word Document's Object Model
> would be extraordinarily heavyweight (also brittle across different
> releases of Word, and labor-intensive to code and maintain pre-.NET).
>
> I can foresee sending three things: metadata; the bare text of the
> Word document as a (very long) xsd:string element, so eDocs can
> search the unadorned text; and an octet-stream (which would still
> be stored as a black box). This may not be as applicable to other
> Microsoft Office applications, like Visio diagrams [e.g., I want to
> search for all UML Class Diagrams in eDocs concerning class name
> "*LexicalHandler"].

Right, we need to think about the ability to store every type of document.

> Actually, Visio 2000 doesn't expose that in its Object Model
> (I spent a weekend trying to get at it to write a code-gen Add-in),
> though it may be possible to get at it after the internal COM
> object that Visio uses to handle UML serializes itself into
> the .VSD file format. In that case, I wouldn't want the
> octet-stream to be opaque to the repository. :-) Or I
> might use the VBA FileSystemObject in the Add-in to
> save the diagram to disk in a TMP folder, scan it for text,
> and forward that text separately to be searchable (then
> the octet-stream could stay opaque; I suppose we want
> to shoot for consistency there...).
>
> In any event, I definitely see issues with searching
> Visio diagrams for text. Support would be incomplete
> at best. (My recommended practice: if a user has a UML
> Class Diagram in Visio 2000, he or she must list all
> applicable Class Names as metadata keywords to be able
> to search for them reliably. Visio XP, I think, can export
> UML to XMI; it might need a Microsoft patch to do it,
> but that could work better.)
>
> PROBLEM WITH ADDING FULL-TEXT SEARCH LATER
>
> On the other hand (isn't it awful, having two hands? ;-) ), the
> one problem with adding a sophisticated full-text search
> capability later is migration for upgrading users. We would need
> either:
>
> 1. A Migration Plan for going from a keyword-oriented search
> facility to a full-text-oriented search facility in the future. This
> would probably involve something tantamount to checking out
> and checking in all revisions (or having smart differencing) of
> all (the latest) documents. :-(
>
> 2. Not support migration formally. Perhaps allow existing
> documents in a company's repository, catalogued with keyword
> search, to remain available for keyword searches but be excluded
> from newer comprehensive text searches. For a pre-existing
> document revision to be subject to comprehensive text search
> in an upgrading organization, the user would have to check it out
> and check it in again under the new version.
>
> So, there could be a problem for early adopters in going from
> simpler metadata to more comprehensive metadata in the future.
> Administrators will be unhappy if it's not easy to migrate.
>
> CONCLUSION
>
> I'd still choose the first hand: RDF and searching on metadata
> about repository documents to begin with. It increases the
> likelihood that eDocs 1.0 happens (if eDocs 1.0 doesn't happen,
> there'll be no first release to migrate from, and so nobody has
> that problem).

I think that for now we need to use metadata to enable document search, so that people can use the system with any type of document and so that things are easier for us to maintain. Maybe in a later release we can think about full-text search for the particular sets of documents that need this functionality.

Ashwini is working on defining a strategy to implement the RDF framework in our product. We hope to see his proposal soon.

Sergio
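To make the second migration option quoted above more concrete, here is a small Python sketch of a repository in which every revision stays reachable by keyword, while full-text search only covers revisions that carry extracted text. The record layout (`id`, `keywords`, `body_text`) and the three function names are assumptions for illustration only, not part of any eDocs design.

```python
def keyword_search(repository, keyword):
    """Search metadata keywords; works for every revision, old or new."""
    wanted = keyword.lower()
    return [
        doc["id"]
        for doc in repository
        if wanted in (k.lower() for k in doc["keywords"])
    ]


def full_text_search(repository, phrase):
    """Search extracted text; revisions stored without text are skipped."""
    wanted = phrase.lower()
    return [
        doc["id"]
        for doc in repository
        if doc.get("body_text") is not None and wanted in doc["body_text"].lower()
    ]


def needs_recheckin(repository):
    """List revisions that must be checked out and in again to gain full-text search."""
    return [doc["id"] for doc in repository if doc.get("body_text") is None]
```

A report built on something like `needs_recheckin` would at least tell an administrator how much of an existing repository is invisible to the newer search, which is the cost an upgrading organization would want to see before migrating.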