Collection, evaluation and selection of scientific literature : machine learning, bibliometrics and the World Wide Web

Connan, James (2004-12)

Thesis (MSc)--University of Stellenbosch, 2004.

Thesis

ENGLISH ABSTRACT: We present a system that uses statistical machine learning to identify and extract bibliographic information from scientific literature. We investigate techniques for finding and gathering useful information from the ever-growing volume of knowledge on the World Wide Web (WWW). We use hidden Markov models both to recognise bibliography styles and to extract bibliographic information, with an accuracy of up to 97%. This level of extraction accuracy allows us to present a case study in which we apply methods of citation analysis to information extracted from three areas of machine learning. We use this information to identify core sets of papers that have made significant contributions to the fields of hidden Markov models, neural networks and recurrent neural networks.

AFRIKAANS ABSTRACT (English translation): We present a system that uses statistical machine learning to identify and extract bibliographic information from scientific literature. Techniques for exploring and gathering useful information from the rapidly growing knowledge source of the WWW are investigated. We use hidden Markov models for the recognition of reference styles and the extraction of reference information, with an accuracy of up to 97%. This high extraction accuracy enables us to apply the technique to the field of machine learning. We report how we used the techniques to identify literature that has made significant contributions to the fields of hidden Markov models, neural networks and recurrent neural networks.
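
The abstract describes using hidden Markov models to extract bibliographic fields from references. As a rough illustration of that general idea only, the sketch below labels the tokens of a reference string with Viterbi decoding. It is not the thesis's implementation: the three states, the token features and every probability are hand-picked assumptions for this example, whereas a real system would estimate its parameters from labelled references.

```python
import math

# Hypothetical field labels; the thesis's actual state set is not given here.
STATES = ["AUTHOR", "YEAR", "TITLE"]

# Hand-picked toy probabilities (assumptions for illustration, not trained values).
START = {"AUTHOR": 0.8, "YEAR": 0.1, "TITLE": 0.1}
TRANS = {
    "AUTHOR": {"AUTHOR": 0.70, "YEAR": 0.25, "TITLE": 0.05},
    "YEAR":   {"AUTHOR": 0.05, "YEAR": 0.05, "TITLE": 0.90},
    "TITLE":  {"AUTHOR": 0.05, "YEAR": 0.05, "TITLE": 0.90},
}
EMIT = {
    "AUTHOR": {"name": 0.60, "initial": 0.35, "year": 0.01, "word": 0.04},
    "YEAR":   {"name": 0.01, "initial": 0.01, "year": 0.97, "word": 0.01},
    "TITLE":  {"name": 0.30, "initial": 0.02, "year": 0.03, "word": 0.65},
}


def feature(token):
    """Map a raw token to one of four coarse observation symbols."""
    core = token.strip(".,()")
    if core.isdigit() and len(core) == 4:
        return "year"
    if len(core) == 1 and core.isupper():
        return "initial"
    if core[:1].isupper():
        return "name"
    return "word"


def viterbi(tokens):
    """Return (token, label) pairs for the most probable state sequence."""
    if not tokens:
        return []
    obs = [feature(t) for t in tokens]
    # Work in log space to avoid underflow on long references.
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the highest-scoring final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return list(zip(tokens, path))
```

With these toy parameters, `viterbi("Connan, J. (2004). Collection of scientific literature".split())` labels the first two tokens AUTHOR, the third YEAR and the remaining four TITLE. The transition table is what lets the capitalised word "Collection" be read as the start of a title rather than another author name, since it follows a year.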

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/49886