How can I filter out a path, a file, or a URL containing particular phrases, so that it is not added to the index?
For example, I want to crawl the site
and want to exclude:
- the complete folder
- any URL that contains:
*sendit*
I have the same questions.
How can I configure it?
Oxyus uses regular expressions to specify which content is indexed. The config file is located at WEB-INF/conf/initial_pages.conf.
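To illustrate the idea of regex-based exclusion in general, here is a minimal sketch in Python. It is not Oxyus configuration syntax and does not show the format of initial_pages.conf. The pattern list, the function names and the example.com URLs are all made up for the illustration. It only shows what patterns for "contains sendit" and "has a folder named cgi" could look like.

```python
import re

# Hypothetical exclusion patterns (assumptions for this sketch,
# not taken from Oxyus or its initial_pages.conf).
EXCLUDE_PATTERNS = [
    re.compile(r"sendit"),      # any URL containing the phrase "sendit"
    re.compile(r"/cgi(/|$)"),   # any URL with a path segment named "cgi"
]


def is_excluded(url):
    """Return True if any exclusion pattern matches somewhere in the URL."""
    return any(p.search(url) for p in EXCLUDE_PATTERNS)


def filter_urls(urls):
    """Keep only the URLs that no exclusion pattern matches."""
    return [u for u in urls if not is_excluded(u)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/cgi/form.pl",
        "http://example.com/docs/sendit.html",
    ]
    for url in filter_urls(candidates):
        print(url)
```

The patterns use search semantics, so they match anywhere in the URL. If a tool requires the regex to match the whole URL, the same rules would need to be written as `.*sendit.*` and `.*/cgi(/.*)?`.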
So far I have understood that initial_pages.conf is used to define the start URL for the crawl.
What is the syntax for an entry in initial_pages.conf that filters out some URLs?
That starts the crawl at that point. How can I exclude, for example, URLs that contain the folder cgi?