Evaluating and comparing search engines in retrieving text information from the web

Weldeghebriel, Zemichael Fesahatsion (2004-03)

Thesis (MPhil)--Stellenbosch University, 2004

Thesis

ENGLISH ABSTRACT: With the introduction of the Internet and the World Wide Web (WWW), information can be easily accessed and retrieved from the web using information retrieval systems such as web search engines, or simply search engines. A number of search engines have been developed to provide access to the resources available on the web and to help users retrieve relevant information. In particular, they are essential for finding text information on the web for academic purposes. But how effective and efficient are those search engines in retrieving the most relevant text information from the web, and which of them are more effective and efficient? This study was conducted to answer both questions. Knowing the most effective and efficient search engines matters because such engines can be used to retrieve a higher number of the most relevant text web pages with minimum time and effort. The study was based on nine major search engines, four search queries, and relevancy judgments of relevant/partly-relevant/non-relevant. Precision and recall were calculated from the experimental test results and were used as the basis for the statistical evaluation and comparison of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded, examined separately, and used as an additional measure of search engine effectiveness. Response time was also recorded and used as the basis for the statistical evaluation and comparison of the retrieval efficiency of the nine search engines.
Additionally, since search engines rely on indexing and searching to retrieve information from the web, this study first discusses, from a theoretical point of view, how the indexing and searching processes are performed in an information retrieval environment. It also discusses the influence of the indexing and searching processes on the effectiveness and efficiency of information retrieval systems in general, and of search engines in particular, in retrieving the most relevant text information from the web.
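The abstract names precision and recall over three-level relevancy judgments but does not give the formulas used in the thesis. The following is a minimal sketch of how such measures might be computed, under stated assumptions: the weight of 0.5 for a partly-relevant page, the use of relative recall (each engine's relevant pages measured against the pooled relevant pages of all engines, since the full set of relevant pages on the web is unknown), and all function and variable names are illustrative and not taken from the thesis.

```python
from typing import Dict, List, Set

# Assumed scoring of the three judgment levels; the 0.5 weight for
# "partly-relevant" is a hypothetical choice, not the thesis's value.
WEIGHTS = {"relevant": 1.0, "partly-relevant": 0.5, "non-relevant": 0.0}


def precision(judgments: List[str]) -> float:
    """Weighted precision: summed relevance scores over pages examined."""
    if not judgments:
        return 0.0
    return sum(WEIGHTS[j] for j in judgments) / len(judgments)


def relative_recall(relevant_by_engine: Dict[str, Set[str]]) -> Dict[str, float]:
    """Relative recall per engine against the pool of all relevant URLs.

    The pool is the union of relevant URLs found by any engine, which
    stands in for the unknown total number of relevant pages on the web.
    """
    pool: Set[str] = set().union(*relevant_by_engine.values())
    if not pool:
        return {engine: 0.0 for engine in relevant_by_engine}
    return {
        engine: len(urls) / len(pool)
        for engine, urls in relevant_by_engine.items()
    }


# Hypothetical judgments for the first four results of one query.
sample_judgments = ["relevant", "partly-relevant", "non-relevant", "relevant"]
sample_precision = precision(sample_judgments)

# Hypothetical relevant URLs retrieved by two engines for the same query.
sample_recall = relative_recall(
    {"engine_a": {"u1", "u2"}, "engine_b": {"u2", "u3", "u4"}}
)
```

With these sample inputs, the weighted precision is 2.5 / 4 and the pool holds four distinct relevant URLs, so the two hypothetical engines score 2/4 and 3/4 on relative recall.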

AFRIKAANS ABSTRACT: With the arrival of the Internet and the World Wide Web (WWW), information is easily obtainable. It can be retrieved by using information retrieval systems such as search engines. A number of such search engines have been developed to provide access to the resources available on the web and to help users retrieve relevant information from the web. This is especially necessary for obtaining text information for academic purposes. But how effective and efficient are the search engines in retrieving the most relevant text information from the web? Which of the search engines is the most effective? This study was undertaken to see which search engines are the most effective and efficient in retrieving the required text information. It is important to know which search engine is the most effective, because such an engine can be used to retrieve a higher number of the most relevant text web pages with minimum time and effort. This study is based on nine major search engines, four search queries, and relevancy judgments of relevant/partly-relevant/non-relevant. Precision and recall were calculated from the experimental and test results and were used as the basis for the statistical evaluation and comparison of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded and examined separately, and were used as an additional measure of effectiveness. Response time was also recorded and used as the basis for the statistical evaluation and comparison of the retrieval efficiency of the nine search engines. Since search engines involve indexing and searching processes, this study first discusses, from a theoretical point of view, how indexing and searching processes are carried out in an information retrieval environment.
The influence of indexing and searching processes on the effectiveness of retrieval systems in general, and of search engines in particular, in retrieving the most relevant text information from the web is also discussed.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/53740