I want to index the web, but only pages in a specific language. Is this possible at all?
So far the web crawler does not leave the domain I provided as a starting point.
I once had it crawling URLs that it retrieved from the starting URL, but somehow the index got corrupted and I deleted it. Now I have a new index that has been crawling the same starting URL for 1 week, and so far it hasn't left the starting domain...
I'm not quite sure what to do here.
Thanks in advance for any help.
OpenSearchServer v1.4 - stable - rev 2274 - build 240
Crawling a specific language is not possible.
OpenSearchServer works with patterns.
You can add the specific domain in the pattern list and crawl the entire domain.
If you need to crawl other domains too, you can add them manually to the pattern list, or un-check the check-box in "Crawler/Web/PatternList", in which case the crawler follows links into all domains, starting from the patterns specified.
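To illustrate the idea of a pattern list, here is a minimal Python sketch of wildcard matching against URLs. It is not OpenSearchServer code: the function names, the example patterns, and the assumption that `*` stands for "any run of characters" are all illustrative only.

```python
import re

def pattern_to_regex(pattern):
    # Escape everything literally, then let each "*" match any run of characters.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")

def is_allowed(url, patterns):
    # A URL is kept if it matches at least one pattern in the list.
    return any(pattern_to_regex(p).match(url) for p in patterns)

# Hypothetical pattern list: one whole domain plus one section of another site.
patterns = [
    "http://www.example.com/*",
    "http://docs.example.org/sv/*",
]

print(is_allowed("http://www.example.com/a/b.html", patterns))
print(is_allowed("http://other.example.net/", patterns))
```

With a list like this, a link that points outside the listed domains is simply never matched, which would explain a crawl that stays on the starting domain.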
How about indexing a specific language?
The web crawler detects the language of each web page and indexes it accordingly.
You can check the "Language" of each URL in the "Crawler/Web/URL browser".
By default the "Language" column is not visible.
It can be added from the "Column view" select box.
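This thread does not say how OpenSearchServer detects the language. As a way to check what your own pages declare, the sketch below reads the `lang` attribute of the `<html>` tag using only the Python standard library. The class and function names are hypothetical, and this is just one common signal, not necessarily the one the crawler uses.

```python
from html.parser import HTMLParser

class LangSniffer(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")

def declared_language(html_text):
    # Returns the declared language code, or None if the page declares none.
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    return sniffer.lang

page = '<html lang="sv"><head><title>Hej</title></head><body>Hej!</body></html>'
print(declared_language(page))
```

You can compare the declared value with what the "Language" column shows for the same URL.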
What data is used to detect the language of a specific web page? Is a specific meta tag needed to specify it? I need my pages to be detected as Swedish so that the Swedish stop word list is used for them.