Hi,
Please help me with this question: can a web crawler be called a search engine?
I found this statement:
WebCrawler, the Web’s first comprehensive full-text search engine, is a tool that assists users in their Web navigation by automating the task of link traversal, creating a searchable index of the web, and fulfilling searchers’ queries from the index.
How accurate is this statement?
Everything I have read on the internet says that a web crawler extracts information from the Web and stores it in the search engine's database.
So my question is: who indexes that data inside the search engine?
Who performs the search when a user submits a query to the search engine?
Are a web crawler and a search engine the same thing?
I understand those terms as follows:
A search engine is a large database that provides a search interface, like Google or Yahoo.
This engine is fed by the data that the web crawler collects and sends back to its mother ship (the search engine).
So Google is a search engine that has a web crawler (Googlebot) collecting data for it.
You can say that a web crawler is one part of a web search engine.
And here are two definitions, for "crawlers" and "index":
Index: this is where documents are stored, sorted and analysed using algorithms that allow for faster searches.
Crawler: a "web crawler" explores websites to index their pages. It can follow every link it finds, or it can be limited to exploring certain URL patterns. A modern web crawler can read many types of documents: web pages, files, images, etc. There are also crawlers that index file systems and databases rather than websites.
The crawler is responsible for giving data to the index. The "querying" part of the search engine can then be used to access this data.
Regards,
Alexandre
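To make the three roles described above concrete (crawler, index, querying), here is a minimal Python sketch. It is only an illustration: the in-memory `FAKE_WEB` dictionary, the `example.test` URLs and the function names `crawl`, `build_index` and `search` are all made up for this example, and this is not how OpenSearchServer or Google is actually implemented.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
FAKE_WEB = {
    "http://example.test/": (
        "welcome to the video site",
        ["http://example.test/a", "http://example.test/b"],
    ),
    "http://example.test/a": ("funny cat video", ["http://example.test/b"]),
    "http://example.test/b": ("cooking video tutorial", ["http://example.test/"]),
}


def crawl(start_url, max_urls=100):
    """Crawler: follow links breadth-first and return {url: text}."""
    seen = {start_url}
    queue = deque([start_url])
    documents = {}
    while queue and len(documents) < max_urls:
        url = queue.popleft()
        if url not in FAKE_WEB:
            continue  # unknown page, nothing to fetch
        text, links = FAKE_WEB[url]
        documents[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return documents


def build_index(documents):
    """Index: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Querying: return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


documents = crawl("http://example.test/")   # the crawler collects data
index = build_index(documents)              # the index stores it for fast lookup
print(sorted(search(index, "cat video")))   # the query part reads the index
```

In this sketch the crawler never answers a query and the query function never touches the web. They only meet through the index, which is why the crawler is one part of a search engine rather than the same thing.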
Hello Sachin,
Here is a picture showing the different parts of a global search engine: http://jaeksoft.github.io/opensearchserver/assets/tutorial/schema3_en.png
Hi all,
Can anyone tell me approximately how much time the crawler needs to index the pages of a site with about 1,000,000 videos?
Thanks
Hello,
It depends on your network speed.
OpenSearchServer limits crawling to one thread per domain, so if all your URLs are on the same domain it will be a little slower.
Naveen.A.N
Thank you for your reply.
I have this configuration in the crawler:
Number of URLs to crawl: 50000
Fetch interval between re-fetches: 30
Number of simultaneous threads: 10
Maximum number of URLs per host: 500
Delay between each successive access, in seconds: 10
It has been running for two days and there is about 150 MB of data in the index. Is this good, or do I have to tweak the configuration?
The goal is to fetch all the information about these 1,000,000 videos.
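A rough back-of-the-envelope estimate may help here. The Python sketch below assumes one page per video, all pages on a single host, and that the configured delay applies between successive requests to that host (consistent with the one-thread-per-domain limit mentioned earlier in the thread). It ignores download and indexing time, so it is a lower bound, not a prediction. The function name `estimate_crawl_days` is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def estimate_crawl_days(total_urls, delay_seconds, parallel_hosts=1):
    """Lower bound on crawl time when each host is fetched one URL at a time.

    Assumption: URLs are spread evenly over `parallel_hosts` hosts and the
    delay is enforced per host, so extra threads only help with extra hosts.
    """
    seconds = total_urls * delay_seconds / parallel_hosts
    return seconds / SECONDS_PER_DAY


# Values from the configuration above: 10 seconds between accesses.
print(round(estimate_crawl_days(1_000_000, 10), 1))  # 115.7
# Same site with a 1-second delay instead.
print(round(estimate_crawl_days(1_000_000, 1), 1))   # 11.6
```

Under these assumptions, a 10-second delay puts a single-host crawl of 1,000,000 pages at almost four months at the very least, and the 10 simultaneous threads do not shorten that. If the site allows it, the delay between accesses is the setting with the biggest effect.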