
#29 Web interface

closed
nobody
None
5
2022-09-02
2010-01-23
No

With a web interface, users could search for documents over the network. The basic requirement is a search box where the user can enter queries, and a result list for displaying the matching documents. The user must also be able to retrieve and open a selected file.

In addition to that, the following features are (very) desirable, but not absolutely necessary:
- The result list is divided into pages, so that very long result lists can be displayed properly. This requires 'back' and 'forward' buttons and a display of the current page number vs. total page numbers. Also, displaying the number of results is desirable.
- A text preview of the document selected in the result list, with highlighting of the search terms.
- The user can filter the results by filesize, filetype and/or location, similar to what is possible on the desktop interface of DocFetcher.
- The user can sort the results by certain criteria, e.g. title or filesize. On the desktop interface, this is done by clicking on the column headers.
- A help button or link that leads to a help page.
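
The paging requirement in the list above could be handled by a small helper like the following Java sketch. `ResultPager` and its method names are hypothetical and not part of DocFetcher; the sketch assumes the full result list is already held in memory and that page numbers are 1-based.

```java
import java.util.List;

/** Hypothetical helper (not part of DocFetcher): slices a result list into pages. */
final class ResultPager {

    /** Total number of pages; an empty result list still counts as one (empty) page. */
    static int pageCount(int totalResults, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return Math.max(1, (totalResults + pageSize - 1) / pageSize);
    }

    /** Returns the results on the given 1-based page; out-of-range page numbers are clamped. */
    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        int pages = pageCount(results.size(), pageSize);
        int clamped = Math.min(Math.max(pageNumber, 1), pages);
        int from = Math.min((clamped - 1) * pageSize, results.size());
        int to = Math.min(from + pageSize, results.size());
        return results.subList(from, to);
    }

    /** Text for the "current page vs. total pages" display, including the result count. */
    static String label(int pageNumber, int totalResults, int pageSize) {
        return "Page " + pageNumber + " of " + pageCount(totalResults, pageSize)
                + " (" + totalResults + " results)";
    }
}
```

The 'back' and 'forward' buttons would then only need to request `pageNumber - 1` or `pageNumber + 1`; clamping keeps stale links from failing.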

In contrast to the desktop interface, it is neither necessary nor desirable for the web interface to allow the user to modify the indexes (e.g. add/remove/update/rebuild). Also, unlike the desktop interface, the web interface doesn't need a preferences panel.

Discussion

  • Nam-Quang Tran

    Nam-Quang Tran - 2010-01-23

    Overview of the relevant methods in DocFetcher:

    When the user enters a query in the search box and presses the Enter key, net.sourceforge.docfetcher.DocFetcher.doSearch(String) is called. This leads to net.sourceforge.docfetcher.model.ScopeRegistry.search(String), which returns an array containing the result objects.

    Each result is represented as a ResultDocument object, which has various fields containing information about the result object:
    - score
    - title
    - the query that led to the result
    It also inherits some fields from the Document class, e.g.
    - file
    - author
    - which parser was used for text extraction

    For the text preview, you'll need to know which parser to use in order to extract text from the document.
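
    Once the text has been extracted, the highlighting step of the preview might look roughly like the sketch below. `PreviewHighlighter` is a hypothetical name, and the sketch assumes a single plain search term rather than a full query expression, so it is only an illustration of the idea.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of DocFetcher): marks a search term in preview text. */
final class PreviewHighlighter {

    /** Escapes the characters that would otherwise be interpreted as HTML. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps every case-insensitive occurrence of term in <mark> tags and escapes the rest. */
    static String highlight(String text, String term) {
        if (term.isEmpty()) {
            return escapeHtml(text);
        }
        Matcher m = Pattern
                .compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append("<mark>").append(escapeHtml(m.group())).append("</mark>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }
}
```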

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2010-03-27

    1) Purpose of the web interface:
    The purpose of the web interface is to allow DocFetcher to be used in a network so that one computer becomes the "search server" and provides Google-like access to a central document repository to all the other computers in the network. This is what you would want in an enterprise or in a school network, or something like that. (In fact, I once received an e-mail from a school, asking me how to do that.)
    A web interface also comes in handy if you're at work and want to access the documents on your home computer, and vice versa.

    2) Web interface from other programs:
    Beagle, one of DocFetcher's (many) competitors, already has a web interface; you can check it out if you want. (AFAIK, Beagle only runs on Linux.)
    VLC also has a web interface, but the configuration is a little bit clumsy, IMO.

    3) DocFetcher's web interface:
    For starters, a hotkey for turning the web interface on and off should suffice. After the basic functionality is implemented, we can wrap a nicer user interface around it. See, for example, the web interface dialog for Transmission (a bittorrent client), in the attachment below.
    As for the web interface inside the browser, I hope we can make it look (and "feel") as similar to DocFetcher's desktop interface as possible - unless it's too difficult on the technical side.

    4) Implementation:
    We'll probably need Jetty, an embedded HTTP server for Java. Here's a page that describes how to set it up: http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty
    (Actually, I have no idea what this server/HTTP/whatever stuff is all about, that's why I posted the job offer ;-))
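
    To give a rough idea of what an embedded search endpoint involves, here is a minimal sketch. It deliberately does not use Jetty: it uses the small HTTP server that ships with the JDK (`com.sun.net.httpserver`) as a stand-in, so that it runs without extra libraries. The class name `SearchServerSketch`, the `/search` path, the `q` parameter, and the `search` function are all assumptions made for illustration; in DocFetcher the function would presumably delegate to the existing search code.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch using the JDK's built-in HTTP server instead of Jetty. */
final class SearchServerSketch {

    /** Returns the decoded value of the "q" parameter of a raw query string, or "" if absent. */
    static String queryParam(String rawQuery) {
        if (rawQuery == null) {
            return "";
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("q")) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return "";
    }

    /** Starts a server on the given port that answers /search?q=... with one result per line. */
    static HttpServer start(int port, Function<String, List<String>> search) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/search", exchange -> {
            String query = queryParam(exchange.getRequestURI().getRawQuery());
            byte[] body = String.join("\n", search.apply(query)).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

    A caller could start it with something like `SearchServerSketch.start(8080, q -> List.of("result for " + q))` and stop it again with `server.stop(0)`.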

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2010-03-27

    Transmission Web Interface Configuration Dialog

     
  • MasterWizely

    MasterWizely - 2010-03-27

    I will answer using your numbering, adding my own numbers for any question or item that is new (hopefully this makes the answers easier to trace).

    RE: 1) Purpose of the web interface
    If it is planned to be used in some enterprise network, what about security? I think there might be some need for at least some kind of authorization and authentication. Maybe something like usergroups and permissions might be useful.
    Additionally some kind of encryption might be needed as well.

    While the scenario mentioned by you includes a central instance that unifies the results of a set of observed clients, any search result might be extended by the id of the client the file was found on.

    RE: 3) DocFetcher's web interface
    What about a standalone server? Maybe it could be useful to split the application into a core (search, indexing and other backend stuff) and a user interface (e.g. either the SWT frontend or the webserver). Of course some kind of hybrid / combination might also be useful.
    I wonder whether requiring a GUI might limit usage on enterprise servers, since these may run without any X server (or an equivalent). Text-based access and web access (including administration) would still be possible there, but the GUI would not.

    RE: 4) Implementation
    Well, I agree that Jetty is a nice application. As you mentioned before, its advantage is that it can be embedded into an application. But in fact any servlet container might do (getting a server like Tomcat working takes some more work, but I think we could offer some choice).

    5) I wonder if this part: "In contrast to the desktop interface, it is neither necessary nor desirable
    for the web interface to allow the user to modify the indexes (e.g.
    add/remove/update/rebuild)." is still up to date. If a centralized server manages the clients the build/rebuild of indexes might be a feature that is not necessary but desirable, isn't it?

    6) What about the control? What is to be done to access any index by the webinterface? How is the index created, how published or accessed by the server? And what actions are needed to get a client observed by a centralized server?

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2010-03-27

    If it is planned to be used in some enterprise network,
    what about security? I think there might be some need
    for at least some kind of authorization and
    authentication. Maybe something like usergroups and
    permissions might be useful. Additionally some kind
    of encryption might be needed as well.

    If I understand correctly, this issue can be addressed in the way Transmission (see attached screenshot) handles it. In the screenshot, you can see two settings: First, access can be restricted with a username and a password, and second, there's the "Only allow these IP addresses to connect" option.
    I don't think we need any kind of encryption. At least I haven't seen anything like that in other web interfaces.
    Also, I don't think we need user groups, because something like that can be emulated by running multiple instances of DocFetcher, each serving on a different port.
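
    The two Transmission-style settings could be checked along the lines of the sketch below. `AccessCheck` and its methods are hypothetical; the sketch assumes HTTP Basic authentication for the username/password setting and treats an empty allow-list as "option switched off", both of which are my assumptions rather than anything decided here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Set;

/** Hypothetical helper (not part of DocFetcher): username/password and IP allow-list checks. */
final class AccessCheck {

    /** An empty allow-list is treated as "allow everyone" (assumption: the option is off). */
    static boolean ipAllowed(String remoteIp, Set<String> allowedIps) {
        return allowedIps.isEmpty() || allowedIps.contains(remoteIp);
    }

    /**
     * Checks an HTTP "Authorization: Basic ..." header value against the configured credentials.
     * Note that Basic credentials are only base64-encoded, not encrypted.
     */
    static boolean credentialsOk(String authorizationHeader, String user, String password) {
        if (authorizationHeader == null || !authorizationHeader.startsWith("Basic ")) {
            return false;
        }
        byte[] supplied;
        try {
            supplied = Base64.getDecoder().decode(authorizationHeader.substring(6).trim());
        } catch (IllegalArgumentException e) {
            return false; // header value was not valid base64
        }
        byte[] expected = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison, so the check does not leak how many bytes matched.
        return MessageDigest.isEqual(supplied, expected);
    }
}
```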

    While the scenario mentioned by you includes a central
    instance that unifies the results of a set of observed
    clients, any search result might be extended by the id
    of the client the file was found on.

    The server is basically a read-only database which each client connects to, like a LAN version of Google. I'm not sure what you mean by "unifies the results".

    What about a standalone server? Maybe it could be useful
    to split the application into a core (search, indexing and
    other backend stuff) and a user interface (e.g. either the
    SWT frontend or the webserver).

    IMO, an easier solution would be to add a command line parameter that allows the user to start the web interface without the desktop interface. This is how the VLC team did it (I think). If we do it that way, I think Jetty is all we need (and note that most DocFetcher users are ordinary non-developer Windows folks who just want to search their files).
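
    As a rough sketch of that idea, the startup code could read its mode from the command line like this. The flag names `--web-only` and `--port=` and the class `LaunchOptions` are made up for illustration, and 8080 is just an arbitrary default.

```java
/** Hypothetical sketch (flag names are assumptions): decides how the program should start. */
final class LaunchOptions {
    final boolean webOnly; // true: start only the web interface, without the desktop interface
    final int port;        // port the web interface would listen on

    private LaunchOptions(boolean webOnly, int port) {
        this.webOnly = webOnly;
        this.port = port;
    }

    /** Parses the command-line arguments; unknown arguments are ignored. */
    static LaunchOptions parse(String[] args) {
        boolean webOnly = false;
        int port = 8080; // arbitrary default for this sketch
        for (String arg : args) {
            if (arg.equals("--web-only")) {
                webOnly = true;
            } else if (arg.startsWith("--port=")) {
                port = Integer.parseInt(arg.substring("--port=".length()));
            }
        }
        return new LaunchOptions(webOnly, port);
    }
}
```

    The main method would then skip creating the desktop window when `webOnly` is set and only start the embedded server.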

    I wonder if this part: "In contrast to the desktop
    interface, it is neither necessary nor desirable for the
    web interface to allow the user to modify the indexes
    (e.g. add/remove/update/rebuild)." is still up to date. If
    a centralized server manages the clients the build/rebuild
    of indexes might be a feature that is not necessary but
    desirable, isn't it?

    Only the server should be allowed to perform index operations. If we allow the clients to do it, we'll get into all kinds of trouble. For example, imagine you have one server and 50 clients, and one of the clients decides, for whatever reason (stupidity, malice, etc.), to remove all indexes. Then bam! All indexes are gone, and none of the 50 clients can do any more searches.

    What about the control? What is to be done to access
    any index by the webinterface? How is the index created,
    how published or accessed by the server? And what actions
    are needed to get a client observed by a centralized
    server?

    As said before, DocFetcher can be thought of as a LAN version of Google. All documents and all indexes are on the server, and the indexes are created, updated and removed by the server. A client can send a query to the server, and the server searches its indexes and returns a list of results.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2010-03-27

    What operating system are you working on?

     
  • MasterWizely

    MasterWizely - 2010-03-27

    Windows XP mostly, but also SUSE Linux (Enterprise as well as openSUSE) and CentOS. Currently the Linux distributions are running on servers, although I used to use Linux mainly as my desktop OS (which I stopped for reasons of hardware compatibility).

     
  • Tonio Rush

    Tonio Rush - 2010-12-19

    Hello guys,

    I've got the new version from SVN, and I see that a lot has been done for the web interface. But I don't understand where the entry point is. Is it a special URL?

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2010-12-19

    Yes, there's a special URL :-) You have to launch DocFetcher, then go to localhost:8080.

    Btw, I just sent you an e-mail and uploaded the Outlook extractor to a folder named 'sandbox' in the SVN repository.

     
  • Bert Bouwen

    Bert Bouwen - 2012-09-17

    It is possible to search the DocFetcher indexes through Apache Solr.

    Just merge the indexes into one ( java -cp lucene-core-3.6.1.jar:lucene-misc-3.6.1.jar org.apache.lucene.misc.IndexMergeTool ./Merged_Index ./DocFetcher/indexes/* ) and copy the merged index to Solr's data directory. Then you only need to modify Solr's schema.xml and the default search field in solrconfig.xml.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2022-09-02
    • status: open --> closed
    • Group: --> Next_Release_(example)
     
  • Nam-Quang Tran

    Nam-Quang Tran - 2022-09-02

    Now that the commercial software DocFetcher Server is out, this feature request can finally be marked as resolved.

     
