
Are a Web Crawler and a Search Engine the Same?

2013-12-16
2014-03-18
  • Sachin Gupta
    2013-12-16

    Hi,
    Please answer this:
    Can I call a web crawler a search engine?

    WebCrawler, the Web’s first comprehensive full-text search engine, is a tool that assists users in their Web navigation by automating the task of link traversal, creating a searchable index of the web, and fulfilling searchers’ queries from the index.

    How accurate is this statement?
    From what I have read on the internet, everywhere it is written that a web crawler extracts information from the Web and stores it in the search engine's database.

    My question is: who indexes it in the search engine, then?

    Who performs the search when a user submits a query to the search engine?

    Are a web crawler and a search engine the same?

     
  • I understand these terms as follows:

    A search engine is a big database that provides a search interface, like Google, Yahoo, etc.
    This engine is "fed" by the data the web crawler collects and sends back to its mother ship (the search engine).

    Therefore, Google is a search engine that has a web crawler (Googlebot) collecting data for it.

    You could say a web crawler is part of a web search engine.

     
  • Hello Sachin,

    Here is a picture showing the different parts of a global search engine: http://jaeksoft.github.io/opensearchserver/assets/tutorial/schema3_en.png

    And here are two definitions, for "crawlers" and "index":

    • Index: this is where documents are stored, sorted and analysed using algorithms that allow for faster searches.
    • Crawler: a "web crawler" explores websites to index their pages. It can follow every link it finds, or it can be limited to exploring certain URL patterns. A modern web crawler can read many types of document: web pages, files, images, etc. There are also crawlers that index filesystems and databases rather than websites.

    The crawler is responsible for giving data to the index. The "querying" part of the search engine can then be used to access this data.
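
As a rough illustration of that split, here is a minimal Python sketch with the three parts kept separate: a crawler that follows links, an index that maps words to pages, and a query function that reads only from the index. Everything in it is hypothetical (the `PAGES` dictionary stands in for the real web, and the function names are made up); it is not how OpenSearchServer is implemented.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://example.com/": ("welcome to the video site",
                            ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("funny cat video", ["http://example.com/b"]),
    "http://example.com/b": ("cooking video tutorial", ["http://example.com/"]),
}

def crawl(start_url):
    """Crawler: follow links breadth-first and yield (url, text) per page."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(documents):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Querying: return URLs containing every query word, using the index only."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(crawl("http://example.com/"))
print(search(index, "cat video"))  # ['http://example.com/a']
```

Note that `search` never touches the pages themselves: the crawler only feeds the index, and queries are answered from the index.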

    Regards,
    Alexandre

     
  • Lev
    2014-03-18

    Hi all,

    Can anyone tell me approximately how much time the crawler needs to index pages from a site with about 1,000,000 videos?

    Thanks

     
  • Naveen A.N
    2014-03-18

    Hello,

    It depends on your network speed.

    OpenSearchServer limits crawling to one thread per domain, so if you are crawling a single domain it will be a little slower.

    Naveen.A.N

     
  • Lev
    2014-03-18

    Thank you for your reply,

    I have this configuration in the crawler:

    Number of URLs to crawl: 50000
    Fetch interval between re-fetches: 30
    Number of simultaneous threads: 10
    Maximum number of URLs per host: 500
    Delay between each successive access, in seconds: 10

    It has been running for two days and there is about 150 MB of data in the index. Is this good, or do I have to tweak the configuration?

    The goal is to fetch all the info about these 1,000,000 videos.
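
A back-of-the-envelope estimate may help here. The sketch below assumes one page per video, that the 10-second delay applies between successive requests to the same host, that only one thread works on the site at a time (the one-thread-per-domain limit mentioned above), and that download and indexing time are negligible. The function name `crawl_days` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def crawl_days(pages, delay_seconds, parallel_hosts=1):
    """Lower bound on crawl time in days when the politeness delay dominates."""
    seconds = pages * delay_seconds / parallel_hosts
    return seconds / SECONDS_PER_DAY

# 1,000,000 pages on a single host with a 10-second delay between accesses.
print(round(crawl_days(1_000_000, 10), 1))  # 115.7
```

Under these assumptions the delay alone accounts for roughly 116 days on a single host, so the delay between accesses is likely the first setting to look at, within whatever request rate the target site tolerates.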