
Is a Web Crawler the Same as a Search Engine?

2013-12-16
2014-03-18
  • Sachin Gupta

    Sachin Gupta - 2013-12-16

    Hi,
    Please answer this:
    Can I call a web crawler a search engine?

    WebCrawler, the Web’s first comprehensive full-text search engine, is a tool that assists users in their Web navigation by automating the task of link traversal, creating a searchable index of the web, and fulfilling searchers’ queries from the index.

    How accurate is this statement?
    From what I have read on the internet, everywhere it is written that a web crawler extracts information from the web and stores it in the search engine's database.

    My question is: who indexes it in the search engine, then?

    Who performs the search when a user submits a query to the search engine?

    Are a web crawler and a search engine the same?

     
  • Oliver Froitzheim

    I understand these terms as follows:

    A search engine is a big database which provides a search interface, like Google, Yahoo, etc.
    This engine is "fed" by the data that the web crawler collects and sends back to its mother ship (the search engine).

    Therefore, Google is a search engine which has a web crawler (Googlebot) which collects data for it.

    You can say a web crawler is part of a web search engine.

     
  • Alexandre Toyer

    Alexandre Toyer - 2014-03-03

    Hello Sachin,

    Here is a picture showing the different parts of a global search engine: http://jaeksoft.github.io/opensearchserver/assets/tutorial/schema3_en.png

    And here are two definitions, for "crawlers" and "index":

    • Index: this is where documents are stored, sorted and analysed using algorithms that allow for faster searches.
    • Crawler: a "web crawler" explores websites to index their pages. It can follow every link it finds, or it can be limited to exploring certain URL patterns. A modern web crawler can read many types of document: web pages, files, images, etc. There are also crawlers that index file systems and databases rather than websites.

    The crawler is responsible for giving data to the index. The "querying" part of the search engine can then be used to access this data.
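To make the crawler / index / query split concrete, here is a minimal Python sketch. It is only an illustration, not how OpenSearchServer or any real search engine is implemented: the `PAGES` dictionary is a hypothetical in-memory stand-in for the web, and the function names `crawl`, `build_index` and `search` are made up for this example.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
PAGES = {
    "http://example.com/": ("welcome to the video site",
                            ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("funny cat video", ["http://example.com/b"]),
    "http://example.com/b": ("cooking video tutorial", ["http://example.com/"]),
}

def crawl(start_url):
    """Crawler: follow links breadth-first and yield (url, text) per page."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(documents):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Query: return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# The crawler feeds the index; the query part only ever reads the index.
index = build_index(crawl("http://example.com/"))
print(sorted(search(index, "video")))
print(sorted(search(index, "cat video")))
```

The three functions never call each other directly: the crawler produces documents, the index stores them, and the query code reads only from the index. That separation is why a crawler is a part of a search engine rather than the same thing.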

    Regards,
    Alexandre

     
  • Lev

    Lev - 2014-03-18

    Hi all,

    Can anyone tell me approximately how much time the crawler needs to index the pages of a site with about 1,000,000 videos?

    thanks

     
  • Naveen A.N

    Naveen A.N - 2014-03-18

    Hello,

    It depends on your network speed.

    OpenSearchServer limits crawling to one thread per domain, so if you are crawling a single domain it will be a little slower.

    Naveen.A.N

     
  • Lev

    Lev - 2014-03-18

    Thank you for your reply,

    I have this configuration in the crawler:

    Number of URLs to crawl: 50000
    Fetch interval between re-fetches: 30
    Number of simultaneous threads: 10
    Maximum number of URLs per host: 500
    Delay between each successive access, in seconds: 10

    It has been running for two days and there is about 150 MB of data in the index. Is this good, or do I have to tweak the configuration?

    The goal is to fetch all the information about these 1,000,000 videos.
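As a rough back-of-the-envelope check, the delay setting alone puts a floor on the crawl time. The Python sketch below assumes one page fetch per video, that all videos are on a single host, and that the 10-second delay applies between every access to that host (in line with the one-thread-per-domain remark above). These are assumptions made for the estimate, not confirmed OpenSearchServer behaviour, and `min_crawl_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400

def min_crawl_days(total_pages, delay_seconds, parallel_hosts=1):
    """Lower bound on crawl time when only the per-host delay is counted.

    Ignores download time, parsing time, and re-fetches, so the real
    crawl can only be slower than this.
    """
    seconds = total_pages * delay_seconds / parallel_hosts
    return seconds / SECONDS_PER_DAY

# 1,000,000 pages, 10 seconds between accesses, a single host (assumed).
days = min_crawl_days(1_000_000, 10)
print(f"{days:.1f} days")
```

Under these assumptions the delay by itself amounts to 10,000,000 seconds, which is well over three months, so lowering the delay would matter much more than adding threads if everything is on one domain.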

     
