Remote index - corporate environment (2)

2013-02-26
2013-03-07
  • Nam-Quang Tran
    2013-02-26

    I opened up this thread so that the original posters on the other thread don't get notification e-mails about new posts all the time.

    Hi Tran

    You keep calling me 'Tran', but my first name is actually 'Quang' :) 'Tran' is my last name. It's because in Vietnam and other South-East Asian countries the order of the names is reversed.

    it might be nice to be able to have a setting for update-index frequency too.

    I didn't put an update-index frequency setting in the settings file because there's no need to reinvent the wheel. You see, there are lots of programs out there that allow you to schedule the execution of tasks such as updating indexes. On Linux, people use cron for this; I don't know what the equivalents for Windows are. Googling for "cron for windows" or something like that will give you some results.
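As a rough illustration of the scheduling idea, a small script along these lines could be launched by cron or a comparable scheduler. This is only a sketch under stated assumptions: the executable path is a hypothetical placeholder, the helper names are made up for this example, and the only DocFetcher-specific detail is the '--update-indexes' parameter that comes up later in this thread.

```python
import subprocess
from datetime import datetime, timedelta
from typing import List

# Hypothetical install location (an assumption); adjust it to your own setup.
DOCFETCHER_EXE = r"C:\Program Files\DocFetcher\DocFetcher.exe"


def build_update_command(executable: str = DOCFETCHER_EXE) -> List[str]:
    """Return the command line that asks DocFetcher to update its indexes."""
    return [executable, "--update-indexes"]


def seconds_until(hour: int, minute: int, now: datetime) -> float:
    """Return the seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # The time has already passed today, so schedule it for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_update(executable: str = DOCFETCHER_EXE) -> int:
    """Run one index update and return the process exit code."""
    return subprocess.run(build_update_command(executable)).returncode
```

With an external scheduler, only run_update() is needed; seconds_until() is there in case you would rather keep a long-running script that sleeps until, say, 3:00 at night.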

    Keep us posted if you can get the network located index files working better too.

    Since I'm currently involved in other projects, there won't be much progress on DocFetcher in general, and on the remote index problem in particular for at least a year or so. Is the remote index problem really such a big issue?

    Regarding the IndexExcelFormulas variable, I'm not clear how this works with true vs false; please advise.

    For example, if you have a formula "=2+2" in an Excel file and this variable is set to 'true', then DocFetcher will see the literal text of the formula, i.e. "=2+2". On the other hand, if the value is 'false', DocFetcher will see "4", because that's what pops out when you evaluate the formula.
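The difference can be pictured with a toy model. This is not DocFetcher's actual code; the Cell class and the indexed_text function are invented for illustration, and the sketch assumes each formula cell carries both its literal text and its evaluated result.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    formula: str       # literal formula text, e.g. "=2+2"
    cached_value: str  # result of evaluating the formula, e.g. "4"


def indexed_text(cell: Cell, index_excel_formulas: bool) -> str:
    """Return the text the indexer would see for this cell."""
    # True: index the formula itself; False: index the evaluated result.
    return cell.formula if index_excel_formulas else cell.cached_value
```

So a search for "4" would only match this cell when the setting is 'false', and a search for "2+2" only when it is 'true'.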

     
    • Paul Rubin
      2013-03-05

      Hi again Quang, having installed DocFetcher on over a dozen computers here at the office, all separately indexing the identical directory, we have noticed that opening and saving files has slowed down, probably due to all duplicate indexing and watching. Having a remote index maintained by just one computer would be a big improvement, so please consider fixing this feature.

      Since I'm currently involved in other projects, there won't be much progress on DocFetcher in general, and on the remote index problem in particular for at least a year or so. Is the remote index problem really such a big issue?

       
  • Paul Rubin
    2013-02-26

    Thank you Quang, and sorry I had your name incorrect! Interestingly, I don't get any email notifications from this discussion, even though I tried to subscribe to them.

    The program does update the index automatically, but it's unclear to me how often. Sometimes new entries are picked up very quickly, and other times it can take a couple of hours. How does the program decide when to update the index?

    The idea of a network-located index file would simplify the indexing process by eliminating duplicate indexing, but it's not essential, I suppose.

     
  • Nam-Quang Tran
    2013-02-26

    Thank you Quang, and sorry I had your name incorrect!

    No big deal. People get my name wrong all the time. :-)

    Interestingly, I don't get any email notifications from this discussion, even though I tried to subscribe to them.

    Hmm... it looks like you can only subscribe either to the entire forum, via the subscribe button on the forum page, or to an individual thread, via its RSS button. At any rate, it appears to me that when you start a new thread, you automatically get subscribed to it.

    The program does update the index automatically, but it's unclear to me how often. Sometimes new entries are picked up very quickly, and other times it can take a couple of hours. How does the program decide when to update the index?

    I thought you were talking about the '--update-indexes' parameter. As far as the folder-watching based updating is concerned, this explanation from the DocFetcher manual is relevant:

    If DocFetcher is running and the folder watching for the modified folder is enabled, DocFetcher detects the changes and updates its indexes immediately. [...] If DocFetcher isn't running, the changes are recorded by a small daemon program that runs in the background; the affected indexes will then be updated the next time DocFetcher starts.

    However, if you use the portable version, you have to install the daemon by hand (i.e. add it to the autostart program list).

    The idea of a network-located index file would simplify the indexing process by eliminating duplicate indexing, but it's not essential, I suppose.

    Ah, okay. Not a high priority issue, I'm afraid, so this will have to wait.

     
  • Paul Rubin
    2013-02-27

    Thanks Tran, I'm still not getting any email notices.

    Regarding the folder watching, I've been using the portable version, which seems to watch the folders; should I be running the regular version instead?

     
  • Nam-Quang Tran
    2013-02-27

    Thanks Tran, I'm still not getting any email notices.

    Consider submitting a support request to the SourceForge.net team:
    https://sourceforge.net/p/forge/site-support/new/

    Regarding the folder watching, I've been using the portable version, which seems to watch the folders; should I be running the regular version instead?

    Both versions support folder watching while they're running, and both have a daemon that takes care of the folder watching when DocFetcher isn't running. However, in the regular version the daemon is automatically added to your operating system's autostart program list (during installation of DocFetcher), while in the portable version you have to set up the daemon by hand. That's the only difference between the two versions as far as the folder watching is concerned.

     
    Last edit: Nam-Quang Tran 2013-02-27
  • Paul Rubin
    2013-02-27

    Hi Quang, does the latest download of the full version contain all the latest jar file updates that you've been sending to me?

     
  • Paul Rubin
    2013-02-28

    Also, since I can't get the full install to work (error messages previously reported), what is the specific file I need to execute to get the daemon running?

     
  • Nam-Quang Tran
    2013-02-28

    The file you're looking for is: docfetcher-daemon-windows.exe

     
  • Nam-Quang Tran
    2013-02-28

    Btw, can you post the exact error messages you're seeing with the installed version of DocFetcher? Maybe I can fix it...

     
  • Paul Rubin
    2013-02-28

    I just downloaded the complete Windows installer version, and for some reason, this time it installed OK. I'll let you know if I have the issue again.

     
  • Nam-Quang Tran
    2013-03-05

    Hi again Quang, having installed DocFetcher on over a dozen computers here at the office, all separately indexing the identical directory, we have noticed that opening and saving files has slowed down, probably due to all duplicate indexing and watching. Having a remote index maintained by just one computer would be a big improvement, so please consider fixing this feature.

    A couple of remarks on this:

    1) The proper solution would be to have a single server instance that maintains the indexes, accepts queries from client instances and sends search results back to the latter. Years ago, an effort was made to implement something like this (a web interface), but it was never finished because the guy who did the main work left the project. The feature is still on my roadmap, although as I said I don't have the time right now to do anything in that direction. The shared index approach on the other hand is likely a dead end as far as efficiency is concerned and will thus not be pursued any further.

    2) It seems your performance issue boils down to the folder watching triggering full index updates whenever files are created or modified. Here's one idea to alleviate this: If you have your document repository indexed as a single index in the Search Scope pane, split it up into as many indexes as can be reasonably maintained. To understand why this makes sense, consider the following two setups:

    Scenario A: You have a document repository with 100,000 files and create a single index out of that. Consequently, if folder watching is on and you modify just a single file in the repository, DocFetcher will have to scan all 100,000 files. This is relatively fast since DocFetcher only looks at the last-modified attributes of these files, but if this happens all the time and dozens of DocFetcher instances do it simultaneously, the impact on performance is substantial.

    Scenario B: You have a document repository with 100,000 files and create, say, 10 indexes out of these. So each index corresponds to 10,000 files. Now with folder watching turned on, if you modify a single file in the repository, DocFetcher will only have to scan 10,000 files rather than 100,000 files.

    Note that this is of course only a theoretical consideration. I can't guarantee that it will actually work in practice.

    3) Consider turning off the folder watching altogether and instead relying on either manual index updates or on regularly scheduled index updates via the "--update-indexes" program parameter. After all, do you really need to have the latest changes immediately available in the search results? Wouldn't an index update once per day suffice?

    4) It may be a good idea to split up the indexes according to how frequently the associated files are modified. For example, if you have 100,000 files, but only 1,000 of them are frequently modified, then put those 1,000 files in a separate index. That way, DocFetcher will only have to scan 1,000 files whenever one of them is modified.
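To put rough numbers on remarks 2) and 4), here is a back-of-the-envelope model. It is a sketch, not a measurement: it assumes, as described above, that one modified file triggers a scan of every file in the index containing it, on every DocFetcher instance watching that index. The function name, the index names and the figure of 12 instances are made up for illustration.

```python
from typing import Dict


def files_scanned(files_per_index: Dict[str, int],
                  modified_index: str,
                  instances: int = 1) -> int:
    """Return how many last-modified checks one file modification causes.

    Only the index containing the modified file is rescanned, but every
    watching instance rescans it independently.
    """
    return files_per_index[modified_index] * instances


# Scenario A: a single index over the whole repository.
single = {"all": 100_000}
# Scenario B: the same repository split into 10 equally sized indexes.
split = {f"part{i}": 10_000 for i in range(10)}
# Remark 4: frequently modified files kept in their own small index.
by_activity = {"hot": 1_000, "cold": 99_000}
```

For example, files_scanned(single, "all", instances=12) gives 1,200,000 checks per modification, while files_scanned(by_activity, "hot", instances=12) gives 12,000. As noted above, this is a theoretical consideration only.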

     
  • Paul Rubin
    2013-03-05

    Thank you for your thoughtful reply, Quang. The central server / web idea sounds great for the future. Meanwhile, I'd like to try turning off the watching altogether; how would you suggest I turn this off? When I go to 'Rebuild Index...' and uncheck the 'Watch folders for file changes' option, it doesn't keep the setting unless I click on 'Run', which would then re-index the entire directory. Or should I disable the daemon? If so, how can I disable the daemon? What does the Windows installer modify to make the daemon start automatically?

     
  • Nam-Quang Tran
    2013-03-05

    Meanwhile, I'd like to try turning off the watching altogether; how would you suggest I turn this off? When I go to 'Rebuild Index...' and uncheck the 'Watch folders for file changes' option, it doesn't keep the setting unless I click on 'Run', which would then re-index the entire directory. Or should I disable the daemon? If so, how can I disable the daemon? What does the Windows installer modify to make the daemon start automatically?

    Yes, you have to rebuild the index. This is one part of DocFetcher that obviously needs fixing, but for now you can't change the folder watching setting without actually recreating the index.

     
  • Paul Rubin
    2013-03-05

    Hi Quang, what about just disabling the daemon? How can I do this?

     
  • Nam-Quang Tran
    2013-03-05

    I can't say off the top of my head what the exact effect of disabling the daemon is, but I guess it wouldn't hurt if you disable it. Here's how:

    In the portable version, the daemon is disabled by default and you have to create an autostart entry to actually enable it. In the installed version, just move the daemon elsewhere or rename it to prevent it from being started.
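For the installed version, the renaming step could be scripted roughly as follows. This is a sketch under assumptions: the daemon file name is the one given earlier in this thread, but the install directory is something you have to supply yourself, and the ".disabled" suffix and both function names are made up for this example.

```python
from pathlib import Path

# File name of the Windows daemon, as given earlier in this thread.
DAEMON_NAME = "docfetcher-daemon-windows.exe"
# Arbitrary suffix chosen for this example.
DISABLED_SUFFIX = ".disabled"


def disable_daemon(install_dir: Path) -> Path:
    """Rename the daemon so that it can no longer be started under its usual name."""
    daemon = install_dir / DAEMON_NAME
    disabled = install_dir / (DAEMON_NAME + DISABLED_SUFFIX)
    daemon.rename(disabled)
    return disabled


def enable_daemon(install_dir: Path) -> Path:
    """Undo disable_daemon by restoring the original file name."""
    disabled = install_dir / (DAEMON_NAME + DISABLED_SUFFIX)
    daemon = install_dir / DAEMON_NAME
    disabled.rename(daemon)
    return daemon
```

Renaming rather than deleting keeps the change easy to reverse if disabling the daemon turns out to have unwanted side effects.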

     
  • Paul Rubin
    2013-03-06

    I'm still trying some options with the single network-located index, maintained by only one computer and read by all the others. The documentation suggests setting these variables to false: AllowIndexCreation, AllowIndexUpdate, AllowIndexRebuild and AllowIndexDeletion. Would it also make sense to turn off the 'watch' on all the other computers? Or does turning off these other variables take care of this?

     
    Last edit: Paul Rubin 2013-03-06
  • Nam-Quang Tran
    2013-03-06

    AllowIndexCreation, AllowIndexUpdate, AllowIndexRebuild and AllowIndexDeletion just hide the respective entries in the context menu of the Search Scope. They do not affect the folder watching in any way. Therefore, it would indeed make sense to turn off the folder watching.

     
  • Paul Rubin
    2013-03-06

    Hi Quang, the issue here is that the only way to turn off the watch requires it to re-index, correct? And how does that affect the original computer's settings to watch and index? Is there some way to turn off the 'watch' without re-indexing?

     
  • Nam-Quang Tran
    2013-03-06

    Hi Quang, the issue here is that the only way to turn off the watch requires it to re-index, correct?

    Unfortunately yes.

    And how does that affect the original computer's settings to watch and index?

    I'm not sure what you're trying to say here... What should affect what?

     
  • Paul Rubin
    2013-03-06

    I'm confused about how to approach this. Here's the progression of events. First I set up the admin computer to index and watch the files. Then I set up the other computers to use the network-located index files. However, my concern is that when I do this, DocFetcher will start watching these folders on those computers too, correct? And there doesn't appear to be any way to turn watching off on these other computers, since I don't want them to re-index.

     
  • Nam-Quang Tran
    2013-03-06

    I think you're right. The folder watching setting is stored in the index, so if you share the index, the folder watching setting is also shared. As far as I can tell, there's currently no way around this. You can only either turn the folder watching on for all computers, or turn it off for all computers.

     
  • Paul Rubin
    2013-03-06

    Ah, that's a serious issue indeed. The whole point of having a single, centralized index is to avoid multiple 'watching' and 're-indexing'. Here's another idea. What if only the admin computer has write permission on the indexing files? Would that stop the other computers from trying to watch and update the indexes?

     
  • Nam-Quang Tran
    2013-03-07

    What if only the admin computer has write permission on the indexing files? Would that stop the other computers from trying to watch and update the indexes?

    In principle, write permission and folder watching are two unrelated things, so taking away write permission shouldn't stop a DocFetcher instance from watching a folder. However, I think you said earlier that setting read-only permissions caused the folder watching to fail, so I'm not entirely sure whether these two are related or not.