Filtering special files or folders

2006-02-17
2013-04-09
  • Nobody/Anonymous

    How can I filter a path, a file, or a URL containing particular phrases so that it is not added to the index?
    For example, I want to crawl the site

    www.example.com

    and want to exclude:
    - the complete folder
    www.example.com/content/privat
    - and any URL that contains:
    * sendit *

     
    • netzgoetter

      netzgoetter - 2006-02-17

      I have the same question.

      How can I configure it?

       
    • Darin Kuntze

      Darin Kuntze - 2006-02-24

      Oxyus uses regular expressions to specify which content is indexed. The config file is located at WEB-INF/conf/initial_pages.conf.
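
      The exact entry syntax of initial_pages.conf is not shown in this thread, so the following is only a sketch of the regular expressions themselves, written against the standard java.util.regex API. The class UrlFilterSketch and the method isExcluded are hypothetical names for illustration and are not part of Oxyus; the two patterns are assumed from the original question (the folder /content/privat and any URL containing "sendit").

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Oxyus.
class UrlFilterSketch {

    // Assumed exclusion rules, taken from the question in this thread.
    private static final Pattern[] EXCLUDES = {
        // The folder /content/privat itself and everything below it.
        Pattern.compile("^https?://www\\.example\\.com/content/privat(/.*)?$"),
        // Any URL that contains the phrase "sendit" anywhere.
        Pattern.compile(".*sendit.*")
    };

    // Returns true if the whole URL matches at least one exclusion pattern.
    static boolean isExcluded(String url) {
        for (Pattern p : EXCLUDES) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/index.html",
            "http://www.example.com/content/privat/index.html",
            "http://www.example.com/forms/sendit.php"
        };
        for (String url : urls) {
            System.out.println(url + " excluded=" + isExcluded(url));
        }
    }
}
```

      Note that the dots in the host name are escaped, and that the first pattern requires a "/" or the end of the URL after "privat", so a page such as /content/private.html would not be excluded by it.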

       
      • netzgoetter

        netzgoetter - 2006-02-24

        As far as I understand, initial_pages.conf is used to define the URL where the crawl starts.

        What is the syntax for an entry in initial_pages.conf that filters out some URLs?

        Example:

        http://www.example.com

        This starts the crawl at that point. How can I exclude, for example, URLs that contain the folder cgi?

         
