Examining learning algorithms for text classification in digital libraries

Fahmi, Ismail. Examining learning algorithms for text classification in digital libraries. 2004. Master's thesis, University of Groningen, Netherlands. [Thesis]

PDF: IsmailFahmi_Thesis_master.pdf (399kB)

English abstract

Information presentation in a digital library plays an important role, especially in improving the usability of collections and helping users get started with a collection. One approach is to provide an overview through large topical category hierarchies associated with the documents of a collection. But as the amount of information grows, this manual classification becomes a new problem for users: navigating the hierarchy can be a time-consuming and frustrating process. In this master's thesis, we examine the performance of machine learning algorithms for automatic text classification. We examine three learning algorithms, namely ID3, Instance-Based Learning, and Naive Bayes, to classify documents according to their category hierarchies. We focus on effectiveness measures such as recall, precision, the F1-measure, error, and the learning curve when learning from a manually classified metadata collection from the Indonesian Digital Library Network (IndonesiaDLN), and we compare the results with an examination of the Reuters-21578 dataset. We identify the algorithm that is most suitable for the digital library collection and summarize the performance of the algorithms on these datasets.
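
As an illustration of the effectiveness measures named in the abstract, the following is a minimal Python sketch of how precision, recall, and the F1-measure could be computed for one category, together with a simple error rate. It is not taken from the thesis: the function names, the single-label setting, and the definition of error as the overall fraction of misclassified documents are assumptions made for this example.

```python
def precision_recall_f1(true_labels, predicted_labels, category):
    """Per-category precision, recall, and F1 (hypothetical helper, single-label setting)."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == category and truth == category:
            tp += 1  # correctly assigned to the category
        elif predicted == category:
            fp += 1  # assigned to the category, but belongs elsewhere
        elif truth == category:
            fn += 1  # belongs to the category, but was assigned elsewhere
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def error_rate(true_labels, predicted_labels):
    """Fraction of documents whose predicted category differs from the true one (assumed definition)."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from IndonesiaDLN or Reuters-21578.
    truth = ["thesis", "thesis", "journal", "journal", "thesis"]
    guess = ["thesis", "journal", "journal", "thesis", "thesis"]
    print(precision_recall_f1(truth, guess, "thesis"))
    print(error_rate(truth, guess))
```

Evaluating such measures at increasing training-set sizes would give one point per size on a learning curve.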

Item type:	Thesis (UNSPECIFIED)
Keywords:	dataset; algorithms; digital library; software
Subjects:	A. Theoretical and general aspects of libraries and information.
Depositing user:	Imam Budi Prasetiawan
Date deposited:	14 Apr 2007
Last modified:	02 Oct 2014 12:07
URI:	http://hdl.handle.net/10760/9315
