Fahmi, Ismail. Examining learning algorithms for text classification in digital libraries. 2004. Master's thesis, University of Groningen, Netherlands. [Thesis]
English abstract
Information presentation in a digital library plays an important role, especially in improving the usability of collections and helping users get started with a collection. One approach is to provide an overview through large topical category hierarchies associated with the documents of a collection. But as the amount of information grows, this manual classification becomes a new problem for users: navigating the hierarchy can be a time-consuming and frustrating process. In this master's thesis, we examine the performance of machine learning algorithms for automatic text classification. We examine three learning algorithms, namely ID3, Instance-Based Learning, and Naive Bayes, for classifying documents according to their category hierarchies. We focus on effectiveness measures such as recall, precision, the F1-measure, error, and the learning curve in learning from a manually classified metadata collection from the Indonesian Digital Library Network (IndonesiaDLN), and we compare the results with an examination of the Reuters-21578 dataset. We summarize which algorithm is most suitable for the digital library collection and how the algorithms perform on these datasets.
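The effectiveness measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard per-category definitions of precision, recall, and F1, plus error as the fraction of misclassified documents; the thesis may define or average them differently. The function names and the toy labels are hypothetical and do not come from the thesis or its datasets.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    category: str,
) -> Tuple[float, float, float]:
    """Per-category precision, recall, and F1 (standard definitions, assumed)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)

    # Guard against empty denominators (category never predicted or never present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


def error_rate(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of documents assigned to the wrong category."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


# Toy example with made-up category labels (not from IndonesiaDLN or Reuters-21578).
true = ["a", "a", "b", "b", "a"]
pred = ["a", "b", "b", "a", "a"]
p, r, f = precision_recall_f1(true, pred, "a")
err = error_rate(true, pred)
```

In this toy example, category "a" has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3, and two of the five documents are misclassified.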
Item type: | Thesis (UNSPECIFIED) |
---|---|
Keywords: | dataset; algorithms; digital library; software |
Subjects: | A. Theoretical and general aspects of libraries and information. |
Depositing user: | Imam Budi Prasetiawan |
Date deposited: | 14 Apr 2007 |
Last modified: | 02 Oct 2014 12:07 |
URI: | http://hdl.handle.net/10760/9315 |