How Search Engines Work and a Web Crawler Application
The objective of this project is to describe how an HTTP-based search engine works and which tasks it performs. The key component of a search engine is the web crawler. The web crawler for this project is developed and implemented in Java, version 1.4.2. The HTTP-based search engine follows the system architecture of existing search engines such as Google and Yahoo. This raises the question of what a web crawler is. A web crawler is a program that downloads and stores web pages. It takes a URL, downloads the page at that URL, extracts the other URLs that the page contains, and places them in a queue to be visited later. A search engine uses this kind of software, also called a spider, to find information about a specific word among hundreds of millions of web pages.
Architecting a search engine is a difficult task. Every day we search hundreds or thousands of pages, and a search engine typically takes at most 0.001 seconds to find the required word or phrase. Every search engine differs from the others in its details, but the underlying search methodology is the same: they all perform three major tasks. First, the search engine breaks the text it finds into pieces based on important words. Second, it records each word and its location, that is, where the word was found. Third, it lets the user search for a single word or a combination of words.
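The crawl loop and the three tasks described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the project's implementation: it uses modern Java (8 or later) rather than 1.4.2 syntax, it replaces real HTTP downloads with a small in-memory map of pages, and the names MiniCrawler, Page, crawlAndIndex, search and sampleWeb, as well as the example.com URLs, are hypothetical. The "location" of a word is simplified to the URL of the page that contains it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MiniCrawler {

    /** Hypothetical stand-in for a downloaded page: its text and its outgoing links. */
    static final class Page {
        final String text;
        final List<String> links;

        Page(String text, String... links) {
            this.text = text;
            this.links = Arrays.asList(links);
        }
    }

    /** Crawls breadth-first from the seed URL and returns a map of word -> URLs containing it. */
    static Map<String, Set<String>> crawlAndIndex(String seed, Map<String, Page> web) {
        Map<String, Set<String>> index = new TreeMap<>();
        Set<String> seen = new HashSet<>();          // URLs already queued, to avoid revisiting
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            Page page = web.get(url);                // simulated "download" (assumption: no real HTTP)
            if (page == null) {
                continue;                            // dead link: nothing to index
            }
            // Tasks 1 and 2: break the text into words and record where each word was found.
            for (String word : page.text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(url);
                }
            }
            // Place the page's other URLs in the queue.
            for (String link : page.links) {
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return index;
    }

    /** Task 3: returns the URLs that contain every word of the query. */
    static Set<String> search(Map<String, Set<String>> index, String query) {
        Set<String> result = null;
        for (String word : query.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            Set<String> hits = index.getOrDefault(word, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** A tiny made-up web of three pages, one of which links to a missing page. */
    static Map<String, Page> sampleWeb() {
        Map<String, Page> web = new HashMap<>();
        web.put("http://example.com/a", new Page("Search engines crawl the web",
                "http://example.com/b", "http://example.com/c"));
        web.put("http://example.com/b", new Page("A crawler downloads web pages",
                "http://example.com/a"));
        web.put("http://example.com/c", new Page("Engines index every word",
                "http://example.com/missing"));
        return web;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = crawlAndIndex("http://example.com/a", sampleWeb());
        System.out.println(search(index, "web"));
        System.out.println(search(index, "web crawler"));
    }
}
```

The queue gives a breadth-first visiting order, and the set of seen URLs keeps pages that link to each other from being queued twice. A real crawler would also need HTTP fetching, HTML link extraction and politeness rules, none of which are shown here.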
There are basically two types of search engine: crawler-based search engines and human-powered directories. The difference between them is that a crawler-based search engine creates its listings automatically, whereas a human-powered directory builds its listings through human selection.
Although search engines have advanced considerably in recent years, there is still room for improvement. There are several ways to improve a search engine. The first is to improve the user interface so that the search engine is more user-friendly. The second is to improve the filtering process. The third is to improve the algorithms applied to web pages. In conclusion, this research explains the working of the search engine and the web crawler (the software used for searching). It discusses how they work, their functionality, their architecture, their types, and the ways in which they can be improved.