Sever, Hayri and Tonta, Yaşar Truncation of Content Terms for Turkish., 2006 (Unpublished) [Report]
English abstract
Stemming, truncation, suffix stripping and decompounding algorithms are used in information retrieval (IR) to reduce content terms to their conflated forms. They are well known for improving retrieval performance as well as space and processing efficiency. In this paper we investigate the statistical characteristics of truncated terms for Turkish on a text corpus of more than 50 million words and attempt to measure the vocabulary growth rates for both whole and truncated words. Findings indicate that truncated words in Turkish exhibit Zipfian behavior and that whole words can successfully be truncated to the average word length (6.2 characters) without compromising performance effectiveness. The vocabulary growth rate for truncated words is about one third of that for whole words. The result of our study is twofold. First, it opens the door to truncation of content terms for Turkish, for which there is no publicly available stemming code equipped with morphological analysis capability. Second, using a truncation algorithm for indexing Turkish text may yield effectiveness values comparable to those of a stemming algorithm; hence the need for stemming may become obsolete, given that morphological analyzers for Turkish are highly complex in nature.
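As a rough illustration of the idea in the abstract, the sketch below truncates each term to a fixed prefix and tracks how the vocabulary grows for whole versus truncated words. It is not the authors' code: the function names `truncate`, `vocabulary_growth`, and `rank_frequency` are hypothetical, the cutoff of 6 characters is an assumption based on rounding the reported 6.2-character average down, and the six-word sample is invented for demonstration only.

```python
from collections import Counter


def truncate(term, length=6):
    """Keep only the first `length` characters of a term (assumed cutoff)."""
    return term[:length]


def vocabulary_growth(tokens, length=None):
    """Return the number of distinct terms seen after each token.

    If `length` is None, whole words are counted; otherwise each
    token is truncated to `length` characters before counting.
    """
    seen = set()
    growth = []
    for tok in tokens:
        key = tok if length is None else truncate(tok, length)
        seen.add(key)
        growth.append(len(seen))
    return growth


def rank_frequency(tokens, length=6):
    """Return (term, count) pairs for truncated terms, most frequent first.

    Plotting count against rank on log-log axes is one common way to
    check for Zipf-like behavior.
    """
    return Counter(truncate(t, length) for t in tokens).most_common()


# Invented toy sample: inflected forms of two Turkish words.
tokens = ["kitaplar", "kitaplardan", "kitaplarda",
          "gelecek", "geleceksin", "gelecekler"]

whole = vocabulary_growth(tokens)            # grows with every inflected form
truncated = vocabulary_growth(tokens, 6)     # grows only with new prefixes
```

On this toy sample every inflected form counts as a new whole word, while the truncated vocabulary grows only when a new six-character prefix appears. This shows the mechanism only; it does not reproduce the reported one-third growth rate.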
Item type: | Report |
---|---|
Keywords: | Stemming algorithms, truncation, Turkish language, information retrieval, indexing |
Subjects: | L. Information technology and library technology > LM. Automatic text retrieval; I. Information treatment for information services > ID. Knowledge representation |
Depositing user: | Prof. Yaşar Tonta |
Date deposited: | 05 May 2007 |
Last modified: | 02 Oct 2014 12:07 |
URI: | http://hdl.handle.net/10760/9494 |