How can I filter out a path, a file, or a URL containing certain phrases so that it will not be added to the index? For example, I want to crawl the site
www.example.com
and exclude:
- the complete folder www.example.com/content/privat
- any URL that contains "sendit"
I have the same question.
How can I configure it?
Oxyus uses regular expressions to specify which content is indexed. The config file is located at WEB-INF/conf/initial_pages.conf.
So far I understand that initial_pages.conf is used to define the start URL for the crawl.
What is the syntax for an entry in initial_pages.conf that filters out certain URLs?
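As a rough illustration of how regular-expression filtering can express the exclusions from the original question, here is a minimal Python sketch. It does not reproduce the syntax of initial_pages.conf, which isn't shown in this thread; the pattern lists and the `should_index` helper are hypothetical.

```python
import re

# Hypothetical patterns for illustration only; this is plain Python regex,
# NOT the format Oxyus expects in initial_pages.conf.
INCLUDE = re.compile(r"^https?://www\.example\.com/")  # stay on the site

EXCLUDE = [
    # The whole /content/privat folder (and the folder URL itself).
    re.compile(r"^https?://www\.example\.com/content/privat(/|$)"),
    # Any URL containing "sendit" anywhere.
    re.compile(r"sendit"),
]

def should_index(url: str) -> bool:
    """Return True if the URL is on the site and matches no exclusion pattern."""
    if not INCLUDE.match(url):
        return False
    return not any(p.search(url) for p in EXCLUDE)

for url in [
    "http://www.example.com/index.html",
    "http://www.example.com/content/privat/notes.html",
    "http://www.example.com/form?action=sendit",
]:
    print(url, "->", "index" if should_index(url) else "skip")
```

The `(/|$)` at the end of the folder pattern keeps it from also excluding a sibling such as /content/private.html.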
Example:
http://www.example.com
This starts the crawl at that point. How can I exclude, for example, URLs that contain the folder cgi?
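One detail worth getting right for the cgi case: a bare pattern like `cgi` would also match unrelated URLs such as /magic/page.html. The hypothetical Python sketch below anchors the match to a whole path segment instead. Whether Oxyus accepts an equivalent pattern in initial_pages.conf is an assumption not confirmed in this thread.

```python
import re
from urllib.parse import urlsplit

# Matches a path segment named exactly "cgi" (e.g. /cgi/form.pl or /scripts/cgi),
# but not "cgi" inside a longer name such as /magic/ or /cgi-bin/.
CGI_FOLDER = re.compile(r"(^|/)cgi(/|$)")

def is_in_cgi_folder(url: str) -> bool:
    """Hypothetical helper: True if the URL path contains a 'cgi' folder."""
    path = urlsplit(url).path  # ignore host, query string and fragment
    return CGI_FOLDER.search(path) is not None

for url in [
    "http://www.example.com/cgi/form.pl",
    "http://www.example.com/magic/page.html",
]:
    print(url, "->", "skip" if is_in_cgi_folder(url) else "index")
```

If /cgi-bin/ should be excluded as well, it needs its own pattern, e.g. `(^|/)cgi-bin(/|$)`.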