[Carrot2-cvs] website/site/architecture index.xml,1.1.1.1,1.2 usergain.xml,1.1.1.1,1.2
From: <daw...@us...> - 2003-10-06 17:13:39
Update of /cvsroot/carrot2/website/site/architecture
In directory sc8-pr-cvs1:/tmp/cvs-serv9307/site/architecture

Modified Files:
  index.xml usergain.xml
Log Message:
site update - downloads especially

Index: index.xml
===================================================================
RCS file: /cvsroot/carrot2/website/site/architecture/index.xml,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -C2 -d -r1.1.1.1 -r1.2
*** index.xml 20 Sep 2003 18:14:14 -0000 1.1.1.1
--- index.xml 6 Oct 2003 17:13:31 -0000 1.2
***************
*** 12,73 ****
  <title lng="pl">Architektura</title>
! <lang:pl>
! <frame>
! Ta strona jeszcze nie została przepisana na język polski. Proszę
! przełączyć się na <a href="/architecture/index.xml?lang=en">wersję angielską</a>.
! </frame>
! </lang:pl>
!
! <lang:en>
! <chapter level="1">
! <title>Architecture</title>
! <p><carrot-text/> is based on a concept of separate components, which communicate
! only by passing XML data. The communication protocol is restricted to HTTP,
! POST method specifically. This allows for great flexibility in adding new components,
! as the language of implementation and physical location may remain unknown.
! </p>
! <p>
! <illustration src="/gfx/carrot2/figures/components-dataflow.gif" float="right">
! <description>1: A scenario of data flow in <carrot-text/> architecture</description>
! </illustration>
! There are four component types in Carrot<sup>2</sup>:
! <ul>
! <li><b>Input</b> - This type of component accepts user query request (wrapped
! in standard XML and passed via HTTP POST), and is in charge of producing some
! document list, which should "match" the query. Upon successful processing,
! the component is required to produce a valid XML result stream.
! </li>
! <li><b>Filter</b> - This type of component accepts result stream from Input, or
! Filter components, and does some processing on it. At the end of processing,
! it is required to return unchanged input stream, with perhaps intermixed custom
! tags (the result of processing). Such tags may include, for instance, alternate
! relevance ranking of results, grouping of similar documents, or other.
! </li>
! <li><b>Output</b> - Output component type is in charge of somehow presenting the
! results to the user. The results, which this component produces are not defined (it
! may produce HTML page, display a Swing applet, or write results to disk). Components
! of this type usually interact with Controllers to present processing results to the user.
! </li>
! <li><b>Controller</b> - A component, which binds all other together to form a processing stream.
! Carrot<sup>2</sup> is a Controller component, because it allows to select input, filter and
! output components and facilitates communication among them. However, other controller
! components are possible, such as command-line processors, or local application (as opposed to
! Web-accessible) controllers.
! </li>
! </ul>
!
! It should be clearly stated that the scenario of data flow presented in figure 1
! is not optimal (because data is sent back and forth between components and the controller),
! but it was a design-decision to simplify component-side programming.
! </p>
! <p>
! A detailed description of architecture, data exchange protocols and other elements of the
! framework is given in the official Developers Manual (see
! <a href="/developers/index.xml">developers section</a>).
! </p>
! </chapter>
!
! </lang:en>
  </page>
--- 12,68 ----
  <title lng="pl">Architektura</title>
! <lang:pl>
! <frame>
! Ta strona tylko w wersji angielskiej, przepraszamy.
! </frame>
! </lang:pl>
! <chapter level="1">
! <title>Architecture</title>
! <p><carrot-text/> is based on a concept of separate components, which communicate
! only by passing XML data. The communication protocol is restricted to HTTP,
! POST method specifically. This allows for great flexibility in adding new components,
! as the language of implementation and physical location may remain unknown.
! </p>
! <p>
! <illustration src="/gfx/carrot2/figures/components-dataflow.gif" float="right">
! <description>1: A scenario of data flow in <carrot-text/> architecture</description>
! </illustration>
! There are four component types in Carrot<sup>2</sup>:
! <ul>
! <li><b>Input</b> - This type of component accepts user query request (wrapped
! in standard XML and passed via HTTP POST), and is in charge of producing some
! document list, which should "match" the query. Upon successful processing,
! the component is required to produce a valid XML result stream.
! </li>
! <li><b>Filter</b> - This type of component accepts result stream from Input, or
! Filter components, and does some processing on it. At the end of processing,
! it is required to return unchanged input stream, with perhaps intermixed custom
! tags (the result of processing). Such tags may include, for instance, alternate
! relevance ranking of results, grouping of similar documents, or other.
! </li>
! <li><b>Output</b> - Output component type is in charge of somehow presenting the
! results to the user. The results, which this component produces are not defined (it
! may produce HTML page, display a Swing applet, or write results to disk). Components
! of this type usually interact with Controllers to present processing results to the user.
! </li>
! <li><b>Controller</b> - A component, which binds all other together to form a processing stream.
! Carrot<sup>2</sup> is a Controller component, because it allows to select input, filter and
! output components and facilitates communication among them. However, other controller
! components are possible, such as command-line processors, or local application (as opposed to
! Web-accessible) controllers.
! </li>
! </ul>
!
! It should be clearly stated that the scenario of data flow presented in figure 1
! is not optimal (because data is sent back and forth between components and the controller),
! but it was a design-decision to simplify component-side programming.
! </p>
! <p>
! A detailed description of architecture, data exchange protocols and other elements of the
! framework is given in the official Developers Manual (see
! <a href="/developers/index.xml">developers section</a>).
! </p>
! </chapter>
  </page>

Index: usergain.xml
===================================================================
RCS file: /cvsroot/carrot2/website/site/architecture/usergain.xml,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -C2 -d -r1.1.1.1 -r1.2
*** usergain.xml 20 Sep 2003 18:14:15 -0000 1.1.1.1
--- usergain.xml 6 Oct 2003 17:13:31 -0000 1.2
***************
*** 12,61 ****
  <title lng="pl">Korzyści</title>
! <lang:pl>
! <frame>
! Ta strona jeszcze nie została przepisana na język polski. Proszę
! przełączyć się na <a href="/architecture/usergain.xml?lang=en">wersję angielską</a>.
! </frame>
! </lang:pl>
! <chapter level="1">
! <title>What a researcher may gain from <carrot-text/>?</title>
! <p>
! A number of scientific papers concerning search results processing have been published.
! Each and every author had to go through the same tedious (and quite fruitless) tasks
! of building a search engine wrapper, incorporating stemming/ light language processing
! algorithms, and finally displaying a result of the algorithm in some form.
! </p>
! <p>
! In our opinion this is highly reduntant work and most of the components (wrappers, processing,
! displaying) can be reused. Thus, we think <carrot-text/> gives a number of possible
! shortcuts for those willing to work with search result clustering or any other form
! of textual data processing (we envision some applications of information retrieval could also
! applied easily using the architecture we proposed):
! <ul>
! <li><b>Reusable components</b> - for a certain application, a standard data exchange
! format can be proposed, which makes components using this format replaceable and reusable.
! We proposed such data exchange format for application in search results clustering --
! an XML-based query and search result.
! </li>
! <li><b>Platform and language independency</b> - data transmission protocol in <carrot-text/>
! is fixed to HTTP POST. Almost any programming language can be adopted to work as a web server
! or in web-server enabled mode (via CGI for instance). This allows for true independency of
! components within the architecture -- they can be physically distributed and running
! on different hardware platforms for instance. Also, the language of implementation does not matter,
! so for research objectives slower, but more elegant languages can be used (Prolog, Lisp, Java?),
! while for production systems, if ever, the code can be rewritten for performance and the component
! reused without any changes to other parts of the system.
! </li>
! <li><b>Start your experiments fast</b> - because <carrot-text/> comes with a set of template
! component classes and utilities in Java, a researches can almost immediately set up his/ her own
! component within the framework and start working on specific thing he wants to deal with.
! </li>
! <li><b>OpenSource</b> - <carrot-text/> is an open source initiative. It is available free of charge
! and the code base can be modified to be adjusted to one's needs.
! </li>
! </ul>
! </p>
! </chapter>
  </page>
--- 12,60 ----
  <title lng="pl">Korzyści</title>
! <lang:pl>
! <frame>
! Ta strona tylko w wersji angielskiej, przepraszamy.
! </frame>
! </lang:pl>
! <chapter level="1">
! <title>What a researcher may gain from <carrot-text/>?</title>
! <p>
! A number of scientific papers concerning search results processing have been published.
! Each and every author had to go through the same tedious (and quite fruitless) tasks
! of building a search engine wrapper, incorporating stemming/ light language processing
! algorithms, and finally displaying a result of the algorithm in some form.
! </p>
! <p>
! In our opinion this is highly reduntant work and most of the components (wrappers, processing,
! displaying) can be reused. Thus, we think <carrot-text/> gives a number of possible
! shortcuts for those willing to work with search result clustering or any other form
! of textual data processing (we envision some applications of information retrieval could also
! applied easily using the architecture we proposed):
! <ul>
! <li><b>Reusable components</b> - for a certain application, a standard data exchange
! format can be proposed, which makes components using this format replaceable and reusable.
! We proposed such data exchange format for application in search results clustering --
! an XML-based query and search result.
! </li>
! <li><b>Platform and language independency</b> - data transmission protocol in <carrot-text/>
! is fixed to HTTP POST. Almost any programming language can be adopted to work as a web server
! or in web-server enabled mode (via CGI for instance). This allows for true independency of
! components within the architecture -- they can be physically distributed and running
! on different hardware platforms for instance. Also, the language of implementation does not matter,
! so for research objectives slower, but more elegant languages can be used (Prolog, Lisp, Java?),
! while for production systems, if ever, the code can be rewritten for performance and the component
! reused without any changes to other parts of the system.
! </li>
! <li><b>Start your experiments fast</b> - because <carrot-text/> comes with a set of template
! component classes and utilities in Java, a researches can almost immediately set up his/ her own
! component within the framework and start working on specific thing he wants to deal with.
! </li>
! <li><b>OpenSource</b> - <carrot-text/> is an open source initiative. It is available free of charge
! and the code base can be modified to be adjusted to one's needs.
! </li>
! </ul>
! </p>
! </chapter>
  </page>